This page lists important points concerning different aspects of the MeSU supercomputer.
- Test the scalability: before running your jobs on a large number of cores, make sure that they scale as expected. Even if your program has been shown to scale well on other systems, test its scalability on MeSU as thoroughly as you can: start with one core and increase the core count gradually until adding cores no longer reduces the execution time significantly.
- Implement checkpointing: if your jobs run for a long time, consider implementing checkpointing. We cannot guarantee continuous availability of the servers (electricity or cooling problems, maintenance, and unexpected crashes may occur). Making sure that your job frequently dumps its current state to disk will allow you to resume your work after an interruption.
- Mind the walltime: do not risk being stopped at the time limit. Most queues will interrupt your job after 12 hours of computation (you can check the limits with qstat -q).
- Don’t request too much: only request the number of processors you actually need to get results in a reasonable amount of time. Requesting too many resources and consuming a large share of the machine will reduce your chances of being scheduled quickly.
- Launch your jobs during off-peak hours: server usage is lower early in the morning and during the night, so do not hesitate to launch large jobs at those times.
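The checkpointing and walltime advice above can be sketched in a few lines of Python. This is only a minimal illustration, not a MeSU-specific recipe: the file name checkpoint.json, the functions load_state, save_state and run, and the time_budget parameter are all hypothetical names chosen for the example, and the loop body is a placeholder for real work. The sketch resumes from the last checkpoint if one exists, saves its state at regular intervals, and stops cleanly once a time budget (to be set somewhat below the queue walltime) is exceeded.

```python
import json
import os
import tempfile
import time

CHECKPOINT = "checkpoint.json"  # hypothetical file name, ideally on a scratch volume


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path):
    """Write the state to a temporary file, then rename it over the old
    checkpoint, so a crash during the write cannot corrupt the last good copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, every=100, time_budget=None):
    """Toy computation that resumes from `path` if a checkpoint is present.

    every:       save the state each time this many steps have completed.
    time_budget: seconds after which the loop saves and stops
                 (assumption: chosen by the user, below the walltime limit).
    """
    state = load_state(path)
    start = time.monotonic()
    while state["step"] < n_steps:
        state["total"] += state["step"]  # placeholder for the real work
        state["step"] += 1
        out_of_time = (
            time_budget is not None
            and time.monotonic() - start > time_budget
        )
        if state["step"] % every == 0 or out_of_time:
            save_state(state, path)
        if out_of_time:
            break  # stop cleanly; the next job resumes from the checkpoint
    save_state(state, path)
    return state
```

Resubmitting the same job after an interruption then simply calls run again with the same path, and the computation continues from the last saved step instead of starting over.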
Files and data management
- Use /scratch volumes, which are optimized for performance. You can find more information on those file volumes here.
- Back up your data: /scratchalpha and /scratchbeta are not backed up, and we cannot guarantee a 100% fail-safe backup of your /home directories.
- Backups are not an “undelete”: whether your /home can be restored after an incident depends mainly on the time elapsed between the deletion of the files and the restore request sent to the HPCaVe admins, as well as on the life span of the file (did it exist long enough to be captured by a backup?).
- Do not let your job write heavy output to your /home directory. Not only will your quota be exceeded quickly, but performance will also be poor compared with the scratch directories.
- Respect your quota: 30 GB on /home. Do not leave unused data in scratch volumes (unused data may be purged arbitrarily and without notice if the volumes become overloaded).
- Keep your data secure: the MeSU supercomputer is secured, but it is frequently scanned over the network by would-be intruders. Although it is highly improbable, we cannot guarantee that data theft will never happen.
- Use a strong personal password: your password must be strong, long and complex enough, and you must not share it with others (DSI policies).
- Respect the DSI policy.
- Keep the software on your laptops/PC up to date.
- Keep in mind that network traffic is monitored by the DSI.