HPCaVe servers use a queuing system to match users' jobs with the available computing resources, thereby optimizing the servers' workload.

Users submit their programs to the job scheduler (PBS), which maintains a queue of jobs and distributes them across the compute nodes according to the servers' status, the scheduling policies, and the jobs' parameters:

  • Number of compute nodes / cores
  • Estimated execution time
  • Required memory
  • Number of jobs already queued by the user

The interface with PBS is a text file, the PBS script, that you create to define your job requirements and execution steps. This file consists of two main sections:

  1. the header, in which you specify the job requirements (execution time, number of CPU cores to use, memory requirements…) in the form of PBS directives;
  2. the body, in which you write the commands to load specific software, define environment variables, and run your job.

PBS script: header

The header is a succession of PBS directives, whose syntax is:
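    #PBS -<option> <value>

Each directive is a comment line beginning with the #PBS token, followed by an option and, when applicable, its value.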

The most relevant options on HPCaVe servers are:

  • #PBS -S /bin/bash Shell to use
  • #PBS -N jobName Name of the job
  • #PBS -o stdOut.txt Standard output file
  • #PBS -e stdErr.txt Standard error file
  • #PBS -j oe Redirects stderr into stdout
  • #PBS -q qalpha Defines the target queue (either qalpha or qbeta)
  • #PBS -l walltime=10:00:00 Maximal execution time (10h here)
  • #PBS -l select=2:ncpus=24 Resource requirements (CPU cores, MPI processes, memory…). That is the only “required” option.

Here is, for instance, a typical header for an MPI job spanning 2 compute nodes, with 12 MPI processes on each node, requesting a maximum of 64 GB of memory and a maximum execution time of 20 minutes:
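Such a header could look as follows (a sketch: the job name is illustrative, and mem in a select statement applies per chunk, i.e. per node; halve it if 64 GB is meant as a total):

    #PBS -S /bin/bash
    #PBS -N myMpiJob
    #PBS -j oe
    #PBS -l walltime=00:20:00
    #PBS -l select=2:ncpus=12:mpiprocs=12:mem=64gb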

PBS script: script body

After defining your resource needs in the header, you have to define the execution steps of your job. This includes (but is not limited to) the following steps; a minimal sketch is given after the list:

  • Setting environment variables
  • Copying data files, executables
  • Specifying the output directories
  • Specifying the execution command …
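As an illustration, a minimal body could look as follows (the module and executable names are hypothetical; adapt them to your software):

    # Move to the directory the job was submitted from (see the environment variables below)
    cd $PBS_O_WORKDIR

    # Load the required software (hypothetical module name)
    module load openmpi

    # Set environment variables and an output directory (illustrative)
    export OMP_NUM_THREADS=1
    mkdir -p results

    # Run the job (hypothetical executable name)
    mpirun ./myApp > results/run.log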


Environment variables

Several environment variables are accessible in the PBS script. They allow you to interact with PBS and with the temporary directories created at job launch, and they expose various job-related values:

  • $PBS_O_WORKDIR Directory where the qsub command was executed. Useful with the cd (change directory) command to change your current directory to your working directory.
  • $TMPDIR Local temporary disk storage unique to each node and each job. This directory is automatically created at the beginning of the job and deleted at its end.
  • $USER User name (NetID). Useful if you would like to dynamically generate a directory on some scratch space.
  • $HOSTNAME Name of the computer currently running the script. This should be one of the nodes listed in the file $PBS_NODEFILE.
  • $HOST Same as $HOSTNAME.
  • $PBS_JOBID Job ID number given to this job. This number is used by many of the job monitoring programs such as qstat, showstart, and dque.
  • $PBS_JOBNAME Name of the job. This can be set using the -N option in the PBS script (or from the command line). The default job name is the name of the PBS script.
  • $PBS_NODEFILE Name of the file that contains the list of hosts provided for the job.
  • $PBS_ARRAYID Array ID number for jobs submitted with the -t flag. For example, a job submitted with #PBS -t 1-8 will run eight identical copies of the shell script; the value of $PBS_ARRAYID will be an integer between 1 and 8.
  • $PBS_VNODENUM Used with pbsdsh to determine the task number of each processor. For more information, see http://www.ep.ph.bham.ac.uk/general/support/torquepbsdsh.html.
  • $PBS_O_PATH Original PBS path. Used with pbsdsh.
  • $PBS_NUM_PPN Number of cores requested per node.
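As an example of how these variables combine in practice, here is a common staging pattern (a sketch; file and executable names are hypothetical):

    # Move to the directory the job was submitted from
    cd $PBS_O_WORKDIR

    # Stage the input to the node-local temporary storage
    cp input.dat $TMPDIR
    cd $TMPDIR

    # Run the job, then copy the results back before $TMPDIR is deleted
    ./myApp input.dat > output.dat
    cp output.dat $PBS_O_WORKDIR/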

 

Submit your script/job

Submitting your job to the PBS scheduler is easy once you have written the corresponding PBS script file; it is done with the qsub command.

For instance, if you saved your script file as myScript.sh, just run the following command in a terminal:
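    qsub myScript.sh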

 

Note that you can also specify PBS directives on the command line as qsub options, which will override those defined in the script.

For instance, to override the CPU requirements and target queue of a PBS script, run a command of the following form (the values here are illustrative):
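    qsub -l select=1:ncpus=24 -q qbeta myScript.sh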

 

The common approach, however, is to specify your CPU requirements in the header of the script file, and to pass to qsub only the target queue, and hence the machine, you wish your job to run on. In the script header, for example:
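    #PBS -l select=2:ncpus=24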

And at the command line (targeting qbeta here, for illustration):
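    qsub -q qbeta myScript.sh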

There is a difference in submission options between Scratchbeta (ICE XA) and Scratchalpha (UV-2000):

  • On MeSU-alpha, the whole machine, with its 1024 cores, is seen as a single entity by both the OS and PBS. When specifying the number of CPUs you need, you can simply write select=1:ncpus=32, even though one blade only has 8 cores.
  • The behaviour on MeSU-beta is different: every blade runs its own OS and PBS daemon, and is seen as a separate node with its 24 cores. So when you select your CPU numbers, you CANNOT say select=1 if you need more than 24 CPUs. A request such as select=1:ncpus=32 will never run, because it means that you want one node with 32 cores.
    Prefer multiples of 2, with a maximum of 24 cores per node, in your select lines:
    select=3:ncpus=8 will give you 24 cores in total (3 nodes with 8 cores each).
    select=4:ncpus=12 will give you 48 cores in total (4 nodes with 12 cores each).