...
Slurm offers many options for job allocation, process placement, job dependencies, job arrays, and much more. We cannot cover all topics exhaustively here. As mentioned at the top of the page, please consult the official Slurm documentation and the man pages for an in-depth description of all parameters.
+48 Hour Jobs & beyond
We prefer short job times and enable +48h jobs only as a last resort. Short wall times have several advantages for efficient queuing (backfilling). If all jobs ran longer than 48 hours, our machine would be empty for the two days before each maintenance...
Most Slurm partitions of the CPU cluster (CLX) have a maximum wall time of 12 hours. In contrast, 48 hours is offered by default on all shared partitions, all grete partitions, and the medium40 and large40 partitions of Emmy.
During normal office hours, one can request an extension of the wall time of any running job (send a mail with your user ID and job ID to support@nhr.zib.de). Alternatively, also per mail request (including user and project ID), permanent access to run 48h jobs on all partitions of Lise and/or Emmy can be granted; it is then used by adding e.g. #SBATCH -q 48h to the job script. Other Quality of Service (QoS) levels for even longer runtimes can also be requested, but they come with additional restrictions regarding job size (number of nodes).
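As a minimal sketch, a job script using the 48h QoS could look as follows. The partition name and the program invocation are placeholders; only the #SBATCH -q 48h directive is taken from the text above, and it works only after access has been granted:

```
#!/bin/bash
#SBATCH --partition=cpu-clx   # example partition, adjust to your cluster
#SBATCH --nodes=2
#SBATCH --time=48:00:00       # request the full 48 hours
#SBATCH -q 48h                # long-running QoS (access granted on request)

srun ./my_program             # placeholder for your actual application
```

Without the -q 48h line, such a job would be rejected on partitions whose default wall-time limit is below 48 hours.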
However, we recommend permanent access to the long-running QoS only as a last resort. We do not guarantee to refund your NPL compute time on the long-running QoS if something fails. You should first exploit all possibilities to parallelize/speed up your code or make it restartable (see also below).
Dependent & Restartable Jobs - How to pass the wall time limit
...