...
Slurm offers many options for job allocation, process placement, job dependencies, job arrays, and much more. We cannot cover all topics exhaustively here. As mentioned at the top of the page, please consult the official documentation and the man pages for an in-depth description of all parameters.
48h jobs
Most compute node partitions have a maximum wall time of 12h. However, large96:shared (and, at Emmy only, medium40, large40, and large40:shared) offer 48h. During normal office hours you can request an extension of the wall time of a running job (mail with user and job ID to). Alternatively, on request () you can get special access to run 48h jobs on all partitions of Lise and/or Emmy by adding #SBATCH -q 48h to your job script. However, we recommend requesting special access to the 48h QoS only as a last resort: we do not guarantee to refund your NPL spent on the 48h QoS if something goes wrong. Before requesting access, you should make sure that you have exploited all possibilities to parallelize or speed up your code, or to make it restartable (see below).
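As an illustration, a job script using the 48h QoS could look like the following sketch. It assumes special access has already been granted; the partition, resource values, and application name are placeholders to be adapted to your case.

```shell
#!/bin/bash
#SBATCH --partition=large96:shared   # example partition offering 48h; adjust as needed
#SBATCH --nodes=1                    # placeholder resource request
#SBATCH --time=48:00:00              # request the full 48h wall time
#SBATCH -q 48h                       # 48h QoS (requires special access)

# placeholder for starting the actual application
srun ./my_application
```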
...
Dependent & Restartable Jobs - How to pass the wall time limit
If your simulation is restartable, it might be handy to automatically trigger a follow-up job. Simply provide the ID of the previous job as an additional sbatch argument:
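For instance (a sketch only: the job ID 123456 and the script name job.slurm are placeholders):

```shell
# Submit the follow-up job so it starts only after job 123456
# (the previous job, placeholder ID) has completed successfully:
sbatch --dependency=afterok:123456 job.slurm
```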
...