How to stop YADE from exiting automatically on PBS?

Asked by Matt Kesseler

Hi all,

I've been able to get my YADE script working on my university's HPC via PBS, and have managed to get it to save output files. However, the script only runs for a second or so before an auto-exit dialog appears in YADE and seems to stop the script. This is probably a simple question but how can I prevent YADE from doing this and make sure it completes the whole script?

Below is the PBS script I am using.

#!/bin/bash
#OPTIONS FOR PBS PRO ==============================================================

#PBS -P HPCA-02414-PGR
#PBS -l walltime=2:00:00
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -k oe
#PBS -o gds-test
export PBS_O_PATH="$PBS_O_PATH:$HOME/YADE"

#OPTIONS FOR PBS PRO ================================================================
echo ------------------------------------------------------
echo -n 'Job is running on node '; cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------

cd $PBS_O_WORKDIR/YADE
yade -n --threads=2 gds.py
exit 0

Please let me know if you need me to provide any more information; as far as I can tell the YADE implementation and script are fine, as the code works properly outside of PBS.

Matt.

Question information

Language: English
Status: Solved
For: Yade
Assignee: No assignee
Solved by: Bruno Chareyre
Revision history for this message
Matt Kesseler (evxmk9) said :
#1

This is the output from YADE, alongside some data files I told it to create. The script prints every 1000th iteration to show it's working during debugging.

------------------------------------------------------
Job is running on node node068
------------------------------------------------------
PBS: qsub is running on login01.cm.cluster
PBS: originating queue is submit
PBS: executing queue is HPCA-02414-PGR
PBS: working directory is /panfs/panasas01.panfs.cluster/evxmk9
PBS: execution mode is PBS_BATCH
PBS: job identifier is 2083995.master.cm.cluster
PBS: job name is gds.sh
PBS: node file is /cm/local/apps/pbspro/var/spool/aux/2083995.master.cm.cluster
PBS: current home directory is /panfs/panasas01.panfs.cluster/evxmk9
PBS: PATH = /cm/shared/apps/gts-0.7.6/gcc/bin:/cm/shared/apps/python/2.7.12/bin:/cm/shared/apps/yade-2$
------------------------------------------------------
TCP python prompt on localhost:9000, auth cookie `escydu'
Welcome to Yade 2016.06a
XMLRPC info provider on http://localhost:21000
Running script gds.py
/cm/shared/apps/python/2.7.12/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPyt$
  "You should import from traitlets.config instead.", ShimWarning)
/cm/shared/apps/python/2.7.12/lib/python2.7/site-packages/IPython/core/interactiveshell.py:440: UserWa$
  warn('As of IPython 5.0 `PromptManager` config will have no effect'
1000
2000
3000
[[ ^L clears screen, ^U kills line. F8 plot. ]]

In [1]: Do you really want to exit ([y]/n)?

Jérôme Duriez (jduriez) said :
#2

Hi,

A copy of your YADE Python script is probably at least as necessary here as the copy of your PBS script.
Your issue may depend on the commands you use to make YADE run. For instance, O.run(someNumberOfIterations,wait=1) would probably work much better in your case than O.run().

Matt Kesseler (evxmk9) said :
#3

Here is the YADE script.

from yade import pack,bodiesHandling,utils,export
from math import tan,sin,cos,sqrt

internalfrictionangle=28*(pi/180)
externalfrictionangle=20*(pi/180)
width=0.1
rampangle=40*(pi/180)
shutterheight=0.2
radius=0.01
cwidth=width
insertradius=0.2464
shutterpoint=1.448+insertradius*tan(rampangle/2)
ramplength=shutterpoint+shutterheight/tan(rampangle)
runofflength=1.7
generationheight=1.25*cos(rampangle)*shutterheight
damp=0.01

finish=6000000
interval=1000

O.materials.append(FrictMat(density=8050,young=210e6,poisson=0.25,frictionAngle=externalfrictionangle,label='steel'))

#shutter plate(physical)
O.bodies.append(geom.facetBox((cwidth/2,cos(rampangle)*shutterpoint-sin(rampangle)*shutterheight/2,sin(rampangle)*shutterpoint+cos(rampangle)*shutterheight/2),(cwidth/2,shutterheight/2,0), orientation=Quaternion((1,0,0),rampangle-90*(pi/180)),wallMask=63,color=(1,1,1),wire=False))

#shutter wall(for generation)
O.bodies.append(wall((0,cos(rampangle)*shutterpoint-sin(rampangle)*shutterheight,0),axis=1,sense=0,color=(1,1,1)))

O.bodies.append(wall((0,cos(rampangle)*shutterpoint+cos(rampangle)/tan(rampangle)*shutterheight,0),axis=1,sense=0,color=(1,1,1)))

#ramp surface
O.bodies.append(geom.facetBox((cwidth/2,cos(rampangle)*ramplength/2,sin(rampangle)*ramplength/2),(cwidth/2,ramplength/2,0), orientation=Quaternion((1,0,0),rampangle),wallMask=63,color=(1,1,1),wire=False))

#runoff zone
O.bodies.append(geom.facetBox((cwidth/2,-(runofflength+insertradius*tan(rampangle/2))/2,0),(cwidth/2,(runofflength+insertradius*tan(rampangle/2))/2,0),orientation=Quaternion((1,0,0),0),wallMask=63,color=(1,1,1),wire=False))

#curved transition (circular for now)
O.bodies.append(geom.facetCylinder((cwidth/2,-insertradius*tan(rampangle/2),insertradius),insertradius,cwidth,orientation=Quaternion((0,1,0),pi/2),segmentsNumber=16, wallMask=7,angleRange=(0,rampangle),closeGap=False,radiusTopInner=0.999*insertradius,radiusBottomInner=0.999*insertradius,color=(1,1,1),wire=False))

#invisible sidewalls
O.bodies.append(wall((0,0,0),axis=0,sense=0,color=(1,1,1)))
O.bodies.append(wall((cwidth,0,0),axis=0,sense=0,color=(1,1,1)))

O.materials.append(FrictMat(density=1730,young=65e6,poisson=0.2,frictionAngle=internalfrictionangle,label='sand'))

#particle generation
# sphere packing is not equivalent to particles in simulation, it contains only the pure geometry
sp=pack.SpherePack()
# generate randomly spheres with uniform radius distribution
sp.makeCloud((0,cos(rampangle)*shutterpoint-sin(rampangle)*shutterheight,sin(rampangle)*shutterpoint+cos(rampangle)*shutterheight),(cwidth,cos(rampangle)*shutterpoint+cos(rampangle)/tan(rampangle)*shutterheight,generationheight+sin(rampangle)*shutterpoint+cos(rampangle)*shutterheight),rMean=radius,rRelFuzz=0)
# add the sphere pack to the simulation
sp.toSimulation()

O.engines=[
   ForceResetter(),
   InsertionSortCollider([Bo1_Sphere_Aabb(),Bo1_Facet_Aabb(),Bo1_Wall_Aabb()]),
   InteractionLoop(
      [Ig2_Sphere_Sphere_ScGeom(),Ig2_Facet_Sphere_ScGeom(),Ig2_Wall_Sphere_ScGeom()],
      [Ip2_FrictMat_FrictMat_FrictPhys()],
      [Law2_ScGeom_FrictPhys_CundallStrack()]
   ),
   NewtonIntegrator(gravity=(0,0,-9.81),damping=damp),
   PyRunner(command='addData()',iterPeriod=interval),
   PyRunner(command='finish()',iterPeriod=finish),
   PyRunner(command='vtk()',iterPeriod=interval),
]
#O.dt=.5*PWaveTimeStep()
O.dt=5e-7

vtkExporter = export.VTKExporter('/panfs/panasas01.panfs.cluster/evxmk9/YADE/test-data/data')
vtkExporter.exportFacets(what=[('pos','b.state.pos')])

def vtk():
   vtkExporter.exportSpheres(what=[('dist','b.state.pos.norm()'),('particleVelocity','b.state.vel')])

O.run()

def finish():
   O.pause()

#def checkUnbalanced():

#O.trackEnergy=True

def addData():
   print O.iter
   for b in O.bodies:
      if isinstance(b.shape,Sphere): b.shape.color=scalarOnColorScale(b.state.vel.norm(),0,2.5)
   if O.iter==900000:
      for b in O.bodies:
         if isinstance(b.shape,Sphere):
            if b.state.pos[2]>(sin(rampangle)*shutterpoint+cos(rampangle)*shutterheight):
               O.bodies.erase(b.id)
   if O.iter==1000000:
      for b in O.bodies:
         if b.id<4:
            O.bodies.erase(b.id)

O.saveTmp()

Again, this code did not exit automatically and ran normally both on my local PC and on the HPC's local server (that is, without PBS).

Jérôme Duriez (jduriez) said :
#4

I have little or no idea what HPC or PBS are :-) but you do have an O.run(), which returns to the terminal immediately (with the simulation running in the background), provided the terminal is "active" (as in a "normal" interactive YADE session).

Hence I'm not surprised by the behaviour you're facing; I guess the same would happen with yade-batch.

I think adding O.wait() after O.run() may solve your problems without further changes in your script. See
https://yade-dem.org/doc/yade.wrapper.html#yade.wrapper.Omega.wait
and
https://yade-dem.org/doc/user.html#stop-conditions

Jerome

Best Bruno Chareyre (bruno-chareyre) said :
#5

As Jerome guessed, O.run() was the problem.
"O.run()" means to Python, roughly, "spawn an independent process running the DEM in the background and resume executing the script immediately".
At the end of the script the master process terminates. From the job scheduler's point of view the job is finished, even if the DEM is still running in the background.
If you mean to wait until the DEM is finished, the command is O.run(N,True) (the default is O.run(N,False)).
Bruno
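
The difference between a blocking and a non-blocking run can be illustrated with a small toy model in plain Python. This is only a sketch: the `ToySimulation` class, its method names and the 1 ms step cost are hypothetical stand-ins chosen for illustration, and it uses a background thread rather than whatever YADE does internally.

```python
import threading
import time


class ToySimulation:
    """Hypothetical stand-in for a solver with a non-blocking run(); not YADE code."""

    def __init__(self):
        self.iter = 0
        self._thread = None

    def _loop(self, n_steps):
        for _ in range(n_steps):
            time.sleep(0.001)  # pretend each time step costs about 1 ms
            self.iter += 1

    def run(self, n_steps, wait=False):
        # Start stepping in a daemon thread and return at once. A daemon
        # thread is killed when the main script ends, which mimics a job
        # that stops as soon as the script reaches its last line.
        self._thread = threading.Thread(
            target=self._loop, args=(n_steps,), daemon=True
        )
        self._thread.start()
        if wait:
            self.wait()

    def wait(self):
        # Block the calling script until the background loop has finished.
        if self._thread is not None:
            self._thread.join()


# Non-blocking: control comes back long before the 200 steps are done.
background = ToySimulation()
background.run(200)
steps_seen_right_after_run = background.iter
background.wait()  # without this, a script ending here would cut the run short

# Blocking: control comes back only after all 200 steps.
blocking = ToySimulation()
blocking.run(200, wait=True)

print(steps_seen_right_after_run, background.iter, blocking.iter)
```

In the YADE script, the corresponding change would presumably be either O.wait() after O.run(), as Jerome suggested, or O.run(N,True), as described above.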

Bruno Chareyre (bruno-chareyre) said :
#6

Note that for the same reason your script is weird - at least unpredictable - independently of PBS:

O.run()
def addData():
   [...]
O.saveTmp()

If you are lucky, Python will read the definition of addData() before the function is first used at the first iteration; if you are not lucky, you may get an "unknown function" error. As for saveTmp(), nobody can tell whether it will save the initial state or the state after 1-2 iterations. What is certain is that it does not save the final state.

B
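
The ordering problem can be reproduced in plain Python without any background process. The sketch below assumes that a periodic callback given as a command string (like the 'addData()' passed to PyRunner in the script) is looked up by name each time it fires; the `run_command` helper and `script_namespace` dictionary are hypothetical names used only for this illustration.

```python
def run_command(command, namespace):
    """Evaluate a command string at call time, as a periodic callback might."""
    try:
        exec(command, namespace)
        return "ok"
    except NameError as err:
        return "NameError: %s" % err


# Stands in for the script's global namespace, which fills up line by line.
script_namespace = {}

# Case 1: the callback fires before the script has reached the 'def' line.
before = run_command("addData()", script_namespace)

# The script now reaches the definition.
exec("def addData():\n    pass\n", script_namespace)

# Case 2: the same command string succeeds from here on.
after = run_command("addData()", script_namespace)

print(before)
print(after)
```

Which case a real run hits depends on timing, which is why the original script is unpredictable. Defining addData(), finish() and vtk() before starting the run, and calling O.saveTmp() only after the run has finished, would remove that uncertainty.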

Matt Kesseler (evxmk9) said :
#7

Thanks Bruno Chareyre, that solved my question.

Matt Kesseler (evxmk9) said :
#8

I can confirm that this issue was with O.run(); with a few other minor tweaks the script is now working fine. :D