This guide is dedicated to the use of MeSU Supercomputer and should help you better understand the machines you are going to use.
Make sure you fully understand the content of the first sections so that you can follow the walkthrough in the “try it yourself” section.
A note on parallel computing: newcomers unfamiliar with the basics of parallel computing are invited to consult the excellent LLNL tutorial on the subject, as well as an explanation of the differences between the two main programming models used in parallel computing: OpenMP and MPI.
1 – System Overview
The architecture of MeSU Supercomputer is more complex than that of a personal computer. It is made of different kinds of nodes (a node is a subset of a server dedicated to specific tasks), namely login nodes and computing nodes. A dedicated program called a batch scheduler coordinates the work between them.
1.1 – Login nodes
To run a program on a computing resource, you first have to log into the login nodes. Login nodes are small, separate servers that act as a secure gateway between your computer and the computing nodes, on which your program will run.
Note: login nodes are not meant for heavy computation; any intensive job running on them will be automatically killed.
The login nodes are dedicated to the following tasks:
- Transferring files from/to your personal computer,
- Compiling your source code,
- Setting up your jobs’ parameters and variables,
- Monitoring the execution of your jobs.
1.2 – Computing nodes
Computing nodes constitute the core of the supercomputer.
Your jobs will run on those nodes, and will be submitted, managed and monitored with the batch scheduler.
1.3 – Data directories
Several filesystems are mounted on MeSU Supercomputer. When logged in, you have direct access to your /home directory, which is meant to host your source code and programs, and from which you submit your jobs. This directory is not meant for data storage and is limited to 30 GB per user.
The following directories are to be used for data storage and computation:
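To see how close you are to the 30 GB limit, you can measure the size of your /home directory with du. The sketch below assumes bash. The function name dir_usage_gb is hypothetical, and du only approximates what the quota system actually counts.

```shell
#!/bin/bash
# Hypothetical helper: print the size of a directory in whole gigabytes (rounded down).
# du -sk reports the total size in kilobytes; errors (e.g. unreadable files) are discarded.
dir_usage_gb() {
    local dir="${1:-$HOME}"
    local used_kb
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    echo $(( ${used_kb:-0} / 1024 / 1024 ))
}

LIMIT_GB=30   # /home limit stated above
used=$(dir_usage_gb "$HOME")
echo "Using ${used} GB of ${LIMIT_GB} GB in $HOME"
if (( used >= LIMIT_GB )); then
    echo "Over the /home limit: move data to your scratch directory." >&2
fi
```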
- /scratchalpha/username: Data directory, dedicated to MeSU-alpha
- /scratchbeta/username: Data directory, dedicated to MeSU-beta and MeSU-gamma
Note: data located on /scratchalpha is only accessible from MeSU-alpha, and data on /scratchbeta from MeSU-beta and MeSU-gamma.
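Each scratch space is tied to specific machines, so a job script can pick the right one from the machine name. The sketch below shows that mapping. The function scratch_dir and the short names alpha, beta and gamma are illustrative only; this is not a MeSU-provided tool.

```shell
#!/bin/bash
# Hypothetical helper: print the scratch directory to use on a given MeSU machine
# (alpha -> /scratchalpha, beta and gamma -> /scratchbeta, as described above).
scratch_dir() {
    case "$1" in
        alpha)      echo "/scratchalpha/$USER" ;;
        beta|gamma) echo "/scratchbeta/$USER" ;;
        *)          echo "unknown machine: $1" >&2; return 1 ;;
    esac
}

# Example: where should input data for a MeSU-gamma job go?
scratch_dir gamma
```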
1.4 – The batch scheduler
A batch scheduler is a specialized piece of software that manages all users’ job submissions in order to prioritize them and allocate resources to each of them. It acts as a gateway between the login nodes and the computing nodes. Its role is to optimize cluster utilization according to the stack of pending jobs and their requested resources (execution time, number of CPUs, maximum memory…). The batch scheduler used on MeSU Supercomputer is PBS.
2 – Connection to HPCaVe servers
If you have not opened an account yet, please consult the dedicated page.
To launch your first jobs, you first need to log into the login nodes using the Secure Shell protocol (ssh), which guarantees a secure transfer of data.
You can either use ssh via a command line interface (CLI) or a graphical application.
To connect, type in a terminal (make sure to replace “your-username” with your real username):
ssh your-username@mesu.dsi.upmc.fr
or use X11 forwarding if you want access to the GUI (Graphical User Interface) of the software you are using:
ssh -Y your-username@mesu.dsi.upmc.fr
After typing your password, you should be greeted by a message from the system administrator with information about the current status of MeSU and upcoming maintenance operations.
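If you connect often, an entry in your OpenSSH client configuration saves you from typing the full command. The sketch below prints such an entry. The alias mesu is an arbitrary, hypothetical name, and the host name is the one used for Filezilla in section 3.1. ForwardX11 together with ForwardX11Trusted reproduces the behaviour of ssh -Y. Once the entry is appended to ~/.ssh/config, typing ssh mesu should be enough to connect.

```shell
#!/bin/bash
# Print an OpenSSH client configuration stanza for MeSU.
# "mesu" is a hypothetical alias; pass your real MeSU username as the argument.
# ForwardX11 + ForwardX11Trusted correspond to the -Y option of ssh.
mesu_ssh_config() {
    local user="$1"
    cat <<EOF
Host mesu
    HostName mesu.dsi.upmc.fr
    User ${user}
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
}

# Review the output, then append it once to your config file:
#   mesu_ssh_config your-username >> ~/.ssh/config
mesu_ssh_config your-username
```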
3 – Uploading files from your computer (and back!)
To upload files to MeSU Supercomputer, you can either use a graphical SFTP client such as Filezilla, or UNIX command-line tools such as scp, rsync or sftp.
Note that the servers are connected to the internet, so you can also download items directly, for instance with the wget command, or by checking out a git, svn or mercurial repository:
svn checkout https://www.domain.com/path/to/svn/repo
git clone https://www.domain.com/path/to/git/repo
hg clone https://www.domain.com/path/to/mercurial/repo
3.1 – Filezilla (Windows, MacOsX and linux)
Filezilla is one of the most popular FTP (File Transfer Protocol) clients and allows easy data transfers between two networked machines.
In the following screenshot, fill in the fields of zone 2 as follows:
- Host: sftp://mesu.dsi.upmc.fr
- Username: Your MeSU username
- Port: 22 (the standard port for the ssh protocol)
Once you are done, click the Quickconnect button, then transfer files between zone 4 (your local computer) and zone 5 (HPCaVe servers).
3.2 – scp (linux & MacOsX)
On a linux or MacOSX system (or through MobaXTerm, for instance, for Windows users), transfer a source file called main.cpp to your session on HPCaVe servers with:
scp /local/path/to/main.cpp your-username@mesu.dsi.upmc.fr:/path/on/HPCaVE/
The scp command also supports wildcard (*) selection and, with -r, recursive directory copying:
scp *.txt your-username@mesu.dsi.upmc.fr:/path/on/HPCaVE/
scp -r directory/ your-username@mesu.dsi.upmc.fr:/path/on/HPCaVE/
To copy a file back from HPCaVe servers to your personal computer from a session on the login nodes, you first have to retrieve your computer’s IP address:
- “ifconfig” (or “ip addr”) in a terminal on linux
- Apple Menu > System Preferences > Network on MacOSX
then, on the login nodes, type in a terminal (“user” is the username on your personal computer, which must run an SSH server reachable from HPCaVe):
scp /path/to/file.txt user@your-IP-address:/path/on/your/computer/
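Alternatively, you can skip the IP address lookup by running scp on your own computer and using the MeSU path as the source. This also works when your computer cannot be reached from outside, for instance behind a home router. The function fetch_from_mesu below is a hypothetical convenience wrapper. Setting SCP=echo turns it into a dry run that only prints the command.

```shell
#!/bin/bash
# Hypothetical helper, to run on YOUR computer (not on the login nodes):
# copy a file from MeSU into a local directory, using the remote path as the scp source.
# Setting SCP=echo prints the command instead of running it (dry run).
MESU_HOST="mesu.dsi.upmc.fr"
fetch_from_mesu() {
    local user="$1" remote_path="$2" local_dir="${3:-.}"
    ${SCP:-scp} "${user}@${MESU_HOST}:${remote_path}" "$local_dir"
}

# Dry run: shows the scp command that would be executed
SCP=echo fetch_from_mesu your-username /path/on/HPCaVE/file.txt /path/on/your/computer/
```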
4 – Running a job
You will mainly use HPCaVe servers to run two very distinct types of programs:
- Already installed software.
- Software for which you have the source code (on your computer, or in a remote source control system) and which you need to install (compile) yourself.
Although the main linux utilities and compilers (grep, cat, gcc…) are available in your environment as soon as you connect to HPCaVe servers, not all functionalities and software are accessible by default.
If you wish to use a specific scientific software package (OpenFOAM, Gromacs, FreeFem++ …) or a particular suite of development tools (Intel MKL, MPI implementation, specific compiler versions, etc.), you will have to use modules.
4.1 – A note on “modules” (working with an installed software)
Modules act as software packages that you can load interactively to gain access to their contents. They are used to avoid conflicts between software and library versions, and to give you a clean working environment at all times. For instance, they prevent environment variables that define the location of software on the servers from having the same name but pointing to different paths.
To see which modules are available and whether you need them, either consult the dedicated page or type:
module avail
To load a module and gain access to its content, type:
module load name-of-the-module
The specific program or software should now be available at the command line.
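In a job script, it can be worth stopping early when a module failed to load, rather than discovering the problem at run time. The sketch below assumes the Environment Modules convention of recording loaded modules in the colon-separated LOADEDMODULES variable. module_loaded is a hypothetical helper, and mpt/2.18 is simply the module used in the job scripts of this guide.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a module (full name, e.g. mpt/2.18) is currently loaded.
# Environment Modules records loaded modules in the colon-separated LOADEDMODULES variable.
module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: load a module (when the module command exists), then check the result.
if type module >/dev/null 2>&1; then
    module load mpt/2.18
fi
module_loaded mpt/2.18 || echo "mpt/2.18 is not loaded; check 'module avail'" >&2
```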
4.2 – Compiling your code
If you have access to the source code of an external software, or if you developed your own code, you will most probably be able to compile and run it on HPCaVe servers.
Compiling code (transforming source code into a binary executable) is a whole topic in itself, but many code development tools are available on HPCaVe servers, for instance:
- Collection of open source compilers (gcc, gfortran, mpifortran, mpic++, mpicc…) in recent and older versions
- Extensive collection of scientific libraries
- Optimized versions of MPI and OpenMP for the MeSU architecture
- Intel compilers and scientific libraries such as Intel MKL
- Profiling and debugging tools (gdb, gprof, valgrind, Intel VTune…)
- Text editors (vi, vim, GNU Emacs, nano)
Note: if you did not load the correct modules before compiling your code, the compiler should output error messages pointing to the missing headers or libraries.
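You can run a quick check before a long build to make sure the compilers you need are on your PATH, which usually means the matching modules are loaded. This is a hypothetical helper, not a MeSU tool. The compiler names passed to it are only examples.

```shell
#!/bin/bash
# Hypothetical helper: print the given tools that are NOT found on the PATH,
# and succeed only if all of them are available.
missing_tools() {
    local tool
    local -a missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    echo "${missing[*]}"
    [ "${#missing[@]}" -eq 0 ]
}

missing_tools gcc mpicc icc || echo "Load the matching modules before compiling."
```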
4.3 – Set up your job script from /home
As explained in the Overview section, communication between the login nodes and the compute nodes is managed by PBS. You request computing resources from PBS, and specify your job parameters and the commands to execute, through a shell or Python file called a PBS script.
Whether you wish to work with existing software or with a program you compiled, you will therefore have to write such a file.
Minimal script example for MeSU-alpha:
#!/bin/bash
#PBS -q alpha
#PBS -l select=1:ncpus=64
# Load the mpt module (SGI MPT: an optimized MPI implementation, which also provides the omplace tool for OpenMP programs)
module load mpt/2.18
# Copy the executable to the temporary directory (not necessary for installed software)
# The variable $PBS_O_WORKDIR holds the path of the directory from which qsub was run
cp "$PBS_O_WORKDIR"/myProgram .
# Set any environment variables your program needs here
# ./myProgram can be your own compiled executable
./myProgram > log.txt
# Transfer the log file from the compute nodes back to the submission directory
cp log.txt "$PBS_O_WORKDIR"
Lines starting with a “#” are treated as comments by the shell. However, PBS reads the lines starting with “#PBS” at the top of the file as directives that specify the requested resources.
For instance, to request 32 cores on MeSU-alpha, replace the “#PBS -l select=1:ncpus=64” line with:
#PBS -l select=1:ncpus=32
To work on MeSU-beta with 96 cores, request the beta queue and 4 nodes of 24 cores each (see the MeSU-beta technical specifications):
#PBS -q beta
#PBS -l select=4:ncpus=24
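A MeSU-beta node has 24 cores, so the select statement for a given total core count follows from a simple division. The sketch below uses a hypothetical function, beta_select, and only accepts multiples of 24 to keep things simple. MPI jobs would also set mpiprocs, as in the MeSU-beta script of section 4.4.

```shell
#!/bin/bash
# Hypothetical helper: build a MeSU-beta "select" statement for a total core count,
# using whole 24-core nodes. Non-multiples of 24 are rejected to keep the example simple.
CORES_PER_BETA_NODE=24
beta_select() {
    local cores="$1" per_node="$CORES_PER_BETA_NODE"
    if (( cores <= 0 || cores % per_node != 0 )); then
        echo "core count must be a positive multiple of $per_node" >&2
        return 1
    fi
    echo "select=$(( cores / per_node )):ncpus=${per_node}"
}

# Prints the directive for 96 cores: "#PBS -l select=4:ncpus=24"
echo "#PBS -l $(beta_select 96)"
```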
If you know how long your job will run (for example 3 hours, 20 minutes and 10 seconds), add the line:
#PBS -l walltime=03:20:10
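The walltime is written as hours, minutes and seconds. If your estimate is in seconds, the conversion can be scripted as below. to_walltime is a hypothetical helper. Hours are not capped at 99, so longer walltimes such as 100:00:00 also format correctly.

```shell
#!/bin/bash
# Hypothetical helper: format a duration given in seconds as a PBS walltime (HH:MM:SS).
to_walltime() {
    local total="$1"
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# 3 h 20 min 10 s = 3*3600 + 20*60 + 10 = 12010 seconds
echo "#PBS -l walltime=$(to_walltime 12010)"   # prints "#PBS -l walltime=03:20:10"
```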
NB: your program runs in a temporary directory on the compute nodes, which is no longer accessible once the execution ends. You must therefore transfer any output data back to your working environment ($PBS_O_WORKDIR) on the login nodes, as the last line of the script above does.
4.4 – Set up your job script from /scratch directories
The scratch directories are created at your first login and are located at /scratchalpha/$USER and /scratchbeta/$USER.
If you do not tell PBS to use those spaces explicitly, your job will be executed from your current working directory (most often your home directory).
Steps to use the scratch directories:
- Compile your code if needed (you can do it in your /home directory or in your scratch spaces).
- Move any input data files to the appropriate scratch directory. If you plan on using MeSU-alpha, for instance, move your input files to /scratchalpha/$USER.
- If you wish to run your job directly from /scratchalpha:
  - Copy your executable and your PBS script to a dedicated directory in your scratch space.
  - Change your working directory to that directory with cd.
  - Launch your script with qsub from there.
- If you wish to run your job from your /home directory:
  - Edit your PBS script to use the appropriate environment variables and to read from and write to the correct directories.
  - Launch your script from your home directory.
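The steps for running directly from a scratch directory can be scripted as shown below. This is a minimal sketch. submit_from_scratch and the file names in the usage comment are hypothetical. Setting QSUB=echo prints the submission command instead of calling qsub. Because qsub is run from the scratch directory, $PBS_O_WORKDIR will point there inside the job.

```shell
#!/bin/bash
# Hypothetical helper: copy a PBS script plus any other files (executable, input data)
# into a run directory, move into it, and submit the script from there.
# Set QSUB=echo to print the submission command instead of submitting (dry run).
submit_from_scratch() {
    local run_dir="$1" script="$2"
    shift 2
    mkdir -p "$run_dir" || return 1
    cp "$script" "$@" "$run_dir"/ || return 1
    cd "$run_dir" || return 1
    ${QSUB:-qsub} "$(basename "$script")"
}

# Example on a MeSU-alpha login node (file names are placeholders):
#   submit_from_scratch /scratchalpha/$USER/my_run my_script.sh myProgram input.dat
```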
Here is a script you could launch from your /home directory, which writes to your /scratchalpha directory:
#!/bin/bash
#PBS -q alpha
#PBS -l select=1:ncpus=16
#PBS -l walltime=10:00:00
#PBS -N myTestJob
# Load appropriate modules
module purge
module load intel/intel-compilers-18.0/18.0
module load mpt/2.18
# Move to PBS_O_WORKDIR
cd "$PBS_O_WORKDIR"
# Define the scratch space: /scratchalpha for MeSU-alpha (UV), /scratchbeta for MeSU-beta (ICE)
SCRATCH=/scratchalpha/$USER/myprogram_scratch_space
PROJECT=my_project_name
mkdir -p "$SCRATCH/$PROJECT"
# Copy some input files to the $SCRATCH directory
cp some_input_files "$SCRATCH/$PROJECT"
# Execute your program (use its full path if it is not on your PATH)
cd "$SCRATCH/$PROJECT" || exit 1
myprogram 1> myprogram.out 2> myprogram.err
# Copy some output files back to the submission directory
cp -p some_output_files "$PBS_O_WORKDIR" || exit 1
# Clean the temporary directory
rm -rf "$SCRATCH/$PROJECT"/*
Here is a similar example for MeSU-beta (requesting 48 cores distributed over 2 nodes):
#!/bin/bash
#PBS -q beta
#PBS -l select=2:ncpus=24:mpiprocs=24
#PBS -l walltime=60:00:00
#PBS -N mytestjob
#PBS -j oe
## Use a multiple of 2, with a maximum of 24, for the 'ncpus' parameter: one node has at most 24 cores
## With the 'select=3:ncpus=10:mpiprocs=10' option you get 30 cores on 3 nodes
## If you use select=1:ncpus=30 your job will NEVER run, because no node has 30 cores
# Uncomment the next line if the module command is not found
#. /etc/profile.d/modules.sh
# Load appropriate modules
module purge
module load intel-compilers-18.0/18.0
module load mpt/2.18
# Move to PBS_O_WORKDIR
cd "$PBS_O_WORKDIR"
# Define the scratch space: /scratchbeta on ICE XA
SCRATCH=/scratchbeta/$USER/myprogram_scratch_space
PROJECT=my_project_name
mkdir -p "$SCRATCH/$PROJECT"
# Copy some input files to the $SCRATCH directory
cp some_input_files "$SCRATCH/$PROJECT"
# Execute your program
## With SGI MPT, use for example 'mpiexec_mpt -np 48 myprogram' (matching select=2:ncpus=24:mpiprocs=24)
cd "$SCRATCH/$PROJECT" || exit 1
myprogram 1> myprogram.out 2> myprogram.err
# Copy some output files back to the submission directory
cp -p some_output_files "$PBS_O_WORKDIR" || exit 1
# Clean the temporary directory
rm -rf "$SCRATCH/$PROJECT"/*
4.5 – Submit a job
Once the PBS script has been correctly set up, submitting your job to the compute nodes is as simple as running the qsub command:
qsub name-of-your-script.sh
On success, qsub outputs a unique string, the job identifier (referred to below as jobID), of the form “123456.mesu2”. You will need it to get information on your job status.
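Scripts usually capture the identifier printed by qsub so they can query the job later. The sketch below keeps only the numeric part of the identifier, which is the number appended to the .o log file name. job_number is a hypothetical helper, and 123456.mesu2 is the example identifier from the text.

```shell
#!/bin/bash
# Hypothetical helper: keep only the numeric part of a PBS job identifier,
# e.g. 123456.mesu2 -> 123456.
job_number() {
    echo "${1%%.*}"
}

# On a login node you would capture the identifier printed by qsub, e.g.:
#   JOBID=$(qsub my_script.sh)
JOBID="123456.mesu2"   # example value in the format shown above
echo "Job number: $(job_number "$JOBID")"
```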
5 – Checking your job status
The jobs you submit to PBS do not usually run as soon as you have run the qsub command. PBS first has to parse your script file in order to insert your job into the queue system. Depending on the resources you requested in the header of your PBS script (number of nodes and CPU cores, maximum memory, and walltime, i.e. the maximum duration of your job), the batch scheduler places your job in an appropriate waiting queue.
You can obtain information about the status of all jobs by typing one of the following commands:
# PBS built-in tools
qstat # to list all jobs
qstat -u $USER # to get information about your jobs
# MeSU specific tools (recommended)
qqueue # to list all jobs
qqueue -u $USER # to get information about your jobs
These commands should tell you if your job is waiting to run (Q or PD), running (R), on hold (H) or finishing (E).
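To wait until a job has left the queue (for instance before post-processing its results), you can poll qstat. This sketch assumes the PBS Professional behaviour where plain qstat JOBID fails once the job has finished; the finished job is then only visible with qstat -x. wait_for_job is hypothetical, and QSTAT and POLL_SECONDS can be overridden for testing. Polling once a minute keeps the load on the scheduler low.

```shell
#!/bin/bash
# Hypothetical helper: wait until a PBS job is no longer queued or running.
# Assumption: plain "qstat JOBID" exits with an error once the job has finished
# (PBS Professional then only shows it with "qstat -x").
wait_for_job() {
    local jobid="$1"
    while ${QSTAT:-qstat} "$jobid" >/dev/null 2>&1; do
        sleep "${POLL_SECONDS:-60}"
    done
    echo "Job $jobid is no longer queued or running."
}

# Usage on a login node:
#   JOBID=$(qsub my_script.sh)
#   wait_for_job "$JOBID" && qstat -fx "$JOBID"
```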
If you want to check whether resources are available for your job to start immediately, use the following command:
qinfo # to list compute resources status
If your job does not appear in the output of qstat or qqueue, it may already have finished, or it may have failed because of an incorrect PBS script. You can then run the tracejob command to get the history of your job:
$ tracejob 300026.mesu2
06/18/2018 14:44:19 S enqueuing into f1032c_p, state 1 hop 1
06/18/2018 14:44:19 S Job Queued at request of firstname.lastname@example.org, owner = email@example.com, job name = greenCB, queue = f1032c_p
06/18/2018 14:44:19 A queue=f1032c_p
06/18/2018 14:44:20 L Considering job to run
06/18/2018 14:44:20 S Job Modified at request of Scheduler@mesu2.ib0.xa.dsi.upmc.fr
06/18/2018 14:44:20 S Job Run at request of Scheduler@mesu2.ib0.xa.dsi.upmc.fr on exec_vnode (mesu3:ncpus=8)
06/18/2018 14:44:20 L Job run
06/18/2018 14:44:20 A user=bernardg group=ldapg project=_pbs_project_default jobname=greenCB queue=f1032c_p ctime=1529325859 qtime=1529325859 etime=1529325859 start=1529325860 exec_host=mesu3/0*8
exec_vnode=(mesu3:ncpus=8) Resource_List.ncpus=8 Resource_List.nodect=1 Resource_List.place=free Resource_List.select=1:ncpus=8 Resource_List.walltime=100:00:00 resource_assigned.ncpus=8
Once your job has finished running (or crashed), an output file ending in .o followed by the job number is written to the directory from which you submitted the job with qsub, for instance myJob.o123456. Unless the “#PBS -j oe” directive is used, an error file ending in .e is written alongside it.
Finally, qstat can be used with the -f (full output) and -x (include finished jobs) options to get more details about a finished job:
$ qstat -fx 300026.mesu2
Job Id: 300026.mesu2
Job_Name = greenCB
Job_Owner = firstname.lastname@example.org
resources_used.cpupercent = 99
resources_used.cput = 00:27:49
resources_used.mem = 7426560kb
resources_used.ncpus = 8
resources_used.vmem = 7467252kb
resources_used.walltime = 00:27:50
job_state = R
queue = f1032c_p
server = mesu2
Checkpoint = u
Resource_List.ncpus = 8
Resource_List.nodect = 1
Resource_List.place = free
Resource_List.select = 1:ncpus=8
Resource_List.walltime = 100:00:00
Submit_arguments = -q f1032c_p ../MultiTwin_2018/BlastProg/launch_cleanb.sh
pset = iru2=""
project = _pbs_project_default
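Comparing resources_used.walltime with Resource_List.walltime is a simple way to tune the walltime you request next time. The helper below, with the hypothetical name qstat_field, extracts one attribute from the “name = value” lines of qstat -fx output. It only reads the first line of values that qstat wraps over several lines.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one attribute from "qstat -fx" output read on stdin.
# Attribute lines have the form "    name = value" (leading indentation is optional).
qstat_field() {
    awk -v key="$1" '{
        line = $0
        sub(/^[ \t]+/, "", line)                 # drop leading indentation
        if (index(line, key " = ") == 1) {       # line starts with key followed by " = "
            print substr(line, length(key) + 4)  # value after the separator
            exit
        }
    }'
}

# Usage on a login node: compare the walltime used with the walltime requested.
#   qstat -fx 300026.mesu2 | qstat_field resources_used.walltime
#   qstat -fx 300026.mesu2 | qstat_field Resource_List.walltime
```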
6 – Try it yourself!
You should now be able to run your first computing jobs!
- To follow the example showing you how to compile and run a program from source code, download and extract the tutorial archive.
- If you wish to use your own code, try to replicate the steps in 6.1 – Using a dummy parallel C code; if you want to try an existing software package, refer to 6.2 – Running an existing software.
6.0 – Common part
- First, connect to HPCaVe and create a directory to host your test code, data and outputs. If you are using Windows, you can use a graphical client such as MobaXTerm for this purpose. On linux and MacOSX, type in a terminal:
ssh your-username@mesu.dsi.upmc.fr
mkdir ~/hpcaveTest
logout # to quit the ssh session
You can log out either with the logout command or by pressing Ctrl + D.
- Transfer the necessary files from your computer to HPCaVe servers. Windows users can use Filezilla, for instance, to copy the source code and scripts from the archive, while linux and MacOSX users can type in a terminal:
scp -r /path/to/wiki your-username@mesu.dsi.upmc.fr:~/hpcaveTest/
- You can now connect back to HPCaVe servers, and check that your files have properly been copied to the hpcaveTest directory.
6.1 – Using a dummy parallel C code (using OpenMP on MeSU-alpha)
In this section, we are going to compile and run a dummy OpenMP parallel code on MeSU-alpha, which approximates the value of pi in parallel using the Monte Carlo method.
Once the source code has been transferred to HPCaVe servers, navigate to the wiki/openmp directory and compile the source code with gcc, adding the -fopenmp flag to enable OpenMP support:
gcc pi_openmp.c -o pi_openmp -fopenmp
You should now have an executable file, called pi_openmp, which we will submit to PBS through the script file called script_openmp.sh (this script will use /home as working space):
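#!/bin/bash
#PBS -q alpha
#PBS -l select=1:ncpus=16
#PBS -l walltime=04:00:00
# The mpt module provides the omplace command used below
module load mpt/2.18
# Copy the executable to the temporary directory
cp "$PBS_O_WORKDIR"/pi_openmp .
# Run with 16 OpenMP threads (matching ncpus=16); omplace places the threads on the allocated cores
omplace -nt 16 ./pi_openmp > log.txt
# Copy the results back to the submission directory
cp log.txt "$PBS_O_WORKDIR"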
To launch the job, run:
qsub script_openmp.sh
If MeSU-alpha is not fully loaded, your job should start quite quickly; remember that you can check its execution status with qstat and qstat -fx jobID.
Once your job has been processed, a file called log.txt should have been written in the directory wiki/openmp, containing the result of your computation (the approximation of pi), as well as information on the time spent running.
Feel free to experiment by changing the submission parameters or parts of the code, or by running on MeSU-beta instead of MeSU-alpha (by modifying the #PBS -q parameter).
Note: other samples are provided in the archive, giving a quick preview of working with MPI and hybrid MPI/OpenMP.
Keep in mind that when using MPI, you have to compile your code with an MPI compiler wrapper such as mpicc, and run it with mpirun or mpiexec in the PBS script:
mpicc pi_mpi.c -o pi_mpi
mpirun -n 16 ./pi_mpi
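Instead of hard-coding the process count, an MPI job script can count the entries in the file named by $PBS_NODEFILE, which PBS fills with one line per MPI process. This sketch assumes the select statement sets mpiprocs (as in the MeSU-beta script of section 4.4), because PBS otherwise defaults to one MPI process per chunk. count_mpi_ranks is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper for an MPI job script: count the MPI processes PBS allocated.
# PBS writes one line per MPI process to the file named by $PBS_NODEFILE, so
# "select=2:ncpus=24:mpiprocs=24" yields 48 lines.
count_mpi_ranks() {
    wc -l < "${1:-$PBS_NODEFILE}" | tr -d ' '
}

# In the PBS script, instead of a fixed process count:
#   NP=$(count_mpi_ranks)
#   mpirun -n "$NP" ./pi_mpi
```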
6.2 – Running an existing software
Running existing software (GROMACS in this example) is simpler than compiling and running your own source code.
After transferring your data, you only need to create an appropriate PBS script containing the commands that run the software. For instance, to run gromacs on 48 cores of MeSU-beta on data stored in the /work directory (the gromacs command line below is schematic; check the software’s documentation for its actual options):
#PBS -q beta
#PBS -l select=2:ncpus=24
#PBS -l walltime=04:00:00
module load gromacs
gromacs -i /work/username/myData.bin -n 48
cp output $PBS_O_WORKDIR