This page contains the most important information about the batch system Slurm that you need to run software on the HLRN systems. It does not cover every feature that Slurm offers; for those, please consult the official documentation and the man pages.

Jobs are mainly submitted with the sbatch command and a job script, but interactive jobs and node allocations are also possible with srun or salloc. Resources (e.g. the number of nodes or cores) are selected via command-line options or via directives in the job script.

Partitions

...

Lise

...

Memory physical

...

Memory allowed

...

1485 GB

...

very fat nodes for data pre- and postprocessing

...

16 dedicated

+48 on demand

...

If you do not request a partition, your job is placed in the default partition, which is standard96 in Berlin and medium40 in Göttingen.

The default partitions are suitable for most calculations. The :test partitions are, as the name suggests, intended for short and small test runs. They have a higher priority and a few dedicated nodes, but are limited in run time and number of nodes. The :shared partitions are mainly intended for postprocessing. Nearly all nodes are allocated exclusively to one job; only the nodes in the :shared partitions can be used by several jobs at once.

Slurm partitions

To match your job requirements with the hardware, you choose among the available Slurm partitions.

Important slurm commands

The commands normally used for job control and management are:

  • Job submission:
    sbatch <jobscript>
    srun <arguments> <command>
  • Job status of a specific job:
    squeue -j <jobID> for queued/running jobs
    scontrol show job <jobID> for full job information (also available for a short time after the job has finished)
  • Job cancellation:
    scancel <jobID>
    scancel -i -u $USER cancels all your jobs (-u $USER) but asks for confirmation for every job (-i)
    scancel --signal=KILL <jobID> sends SIGKILL instead of SIGTERM
  • Job overview:
    squeue -l --me
  • Estimated job start:
    squeue --start -j <jobID>
  • Workload overview of the whole system: sinfo (especially sinfo --format="%25C %A") and squeue -l
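To use these commands from your own scripts, it helps to keep the job ID returned at submission. The sketch below assumes sbatch's --parsable option, which prints only the job ID (optionally followed by ";cluster"); submit_job and job_state are hypothetical helper names, not tools provided on the HLRN systems.

```shell
#!/bin/bash
# Sketch: hypothetical helpers around sbatch/squeue for use on a login node.

# Submit a job script and print only its job ID.
# sbatch --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    printf '%s\n' "${out%%;*}"
}

# Print the state of a job (e.g. PENDING or RUNNING). Prints nothing if
# squeue no longer knows the job (its error message is discarded).
job_state() {
    squeue -h -j "$1" -o %T 2>/dev/null
}
```

For example, jobid=$(submit_job myjob.sh) can be followed by job_state "$jobid" or scancel "$jobid".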

Job Scripts

A job script can be any script that contains special instructions for Slurm (#SBATCH lines). The most common form is a shell script, such as bash or plain sh, but other scripting languages (e.g. Python, Perl, R) are also possible.

Example batch script:
#!/bin/bash

#SBATCH -p cpu-clx:test
#SBATCH -N 16
#SBATCH -t 06:00:00

# Load the Intel MPI environment
module load impi
# Start the MPI program on the allocated nodes
srun mybinary

...

More examples can be found at Examples and Recipes.

...

Parameters

Parameter          SBATCH flag             Comment
Number of nodes    -N <#>
Number of tasks    -n <#>
Tasks per node     --tasks-per-node <#>    Different defaults between mpirun and srun
Partition          -p <name>               e.g. cpu-clx; overview: Slurm partition CPU CLX
CPUs per task      -c <#>                  Interesting for OpenMP/hybrid jobs
Wall time limit    -t hh:mm:ss
Mail               --mail-type=ALL         See the sbatch man page for the different types
Project/Account    -A <project>            Specify the project for core hour accounting
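For OpenMP/hybrid jobs requested with -c, the job script typically derives the OpenMP thread count from the CPUs per task. A minimal sketch, assuming Slurm sets SLURM_CPUS_PER_TASK when -c is given; passing --cpus-per-task to srun explicitly is a cautious choice, because not every Slurm version lets srun inherit -c from the batch allocation. The helper names omp_threads and launch are hypothetical.

```shell
#!/bin/bash
# Sketch for the body of a hybrid MPI/OpenMP job script, e.g. submitted with
#   #SBATCH --tasks-per-node 4
#   #SBATCH -c 24
# (numbers are illustrative only).

# One OpenMP thread per CPU of each task; 1 if -c was not given.
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}
OMP_NUM_THREADS=$(omp_threads)
export OMP_NUM_THREADS

# Start the program, handing the CPUs per task to srun explicitly.
launch() {
    srun --cpus-per-task="$OMP_NUM_THREADS" "$@"
}

# launch ./mybinary
```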

Job Walltime

The maximum run time is set per partition and can be viewed on the system with sinfo (e.g. sinfo -o "%P %l") or in the partition overview. There is no minimum walltime (we cannot stop your jobs from finishing, obviously), but a walltime of at least 1 hour is encouraged. A large number of small, short jobs can cause problems for our accounting system. The occasional short job is fine, but if you submit many jobs that finish (or crash) quickly, we may have to intervene and temporarily suspend your account. If you have many small workloads, please consider combining them into a single job that runs for at least 1 hour.
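One way to combine many small workloads into a single job is to run them from one job script, several at a time. The following is a minimal sketch, assuming the items are independent and each needs one CPU; process_item is a hypothetical placeholder for your real workload, and the concurrency is taken from Slurm's SLURM_CPUS_ON_NODE variable, with nproc as a fallback outside of Slurm.

```shell
#!/bin/bash
# Sketch: run many short, independent work items inside ONE Slurm job.

# Hypothetical placeholder: replace with the real short workload for one item.
process_item() {
    echo "processed $1"
}

# Run all items given as arguments, at most max_jobs at the same time.
run_bundle() {
    # CPUs Slurm allocated on this node; outside of Slurm use all local CPUs.
    local max_jobs=${SLURM_CPUS_ON_NODE:-$(nproc)}
    local running=0 item
    for item in "$@"; do
        process_item "$item" &
        running=$((running + 1))
        if (( running >= max_jobs )); then
            wait            # let the current wave of items finish
            running=0
        fi
    done
    wait                    # wait for the last, possibly partial wave
}
```

In a job script you would call, e.g., run_bundle input_*.dat. Waiting for a whole wave keeps the sketch simple and portable; with bash 4.3 or newer, wait -n could start a new item as soon as any running one finishes.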

Select the project account

Batch jobs are submitted by a user account to the compute system.

  • For each job, the user chooses one project that will be charged for the job. When a user account is created, its default project is the Test Project.
  • The user selects the project for a job with the option --account at submit time.
  • The default project of a user account can be changed on the Portal NHR@ZIB under User Data.

Example: account for one job
To charge the account myaccount, add the following line to the job script:
#SBATCH --account=myaccount

After submission, the batch system checks whether the project has sufficient core hours left (account coverage) and, if so, authorizes the job for scheduling. Otherwise, the job is rejected; please pay attention to the error message:

Example: out of core hours
You can check the account of a job whose project is out of core hours:
> squeue
... myaccount ... AccountOutOfNPL ...
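To list only your own jobs in this state, the squeue output can be filtered. The sketch below uses squeue's --me, -h (no header) and -o format options with %i (job ID), %a (account) and %r (reason); it assumes the reason column shows exactly AccountOutOfNPL, as in the example above. jobs_out_of_core_hours is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print "jobID account" for each of your jobs that is held because the
# project ran out of core hours (reason string assumed: AccountOutOfNPL).
jobs_out_of_core_hours() {
    squeue --me -h -o "%i %a %r" | awk '$3 == "AccountOutOfNPL" { print $1, $2 }'
}
```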

Interactive jobs

For using compute resources interactively, e.g. to follow the execution of MPI programs, the following steps are required. Note that non-interactive batch jobs via job scripts (see Job Scripts above) are the primary way of using the compute resources.

  1. First, request a resource allocation for interactive use with the salloc --interactive command, including your resource requirements.
  2. Once salloc has allocated the requested resources, issue an additional srun command to get a shell on one of the allocated nodes (see the example below) if you want to work on a compute node.
  3. Afterwards, srun or MPI launch commands such as mpirun or mpiexec can be used to start parallel programs (see the corresponding user guides).

Example session:
blogin1 ~ $ salloc -t 00:10:00 -p cpu-clx:test -N2 --tasks-per-node 24
salloc: Granted job allocation [...]
salloc: Waiting for resource configuration
salloc: Nodes bcn[1001,1003] are ready for job
# To get a shell on one of the allocated nodes
blogin1 ~ $ srun --pty --interactive --preserve-env ${SHELL}
bcn1001 ~ $ srun hostname | sort | uniq -c
     24 bcn1001
     24 bcn1003
bcn1001 ~ $ exit
# Exit a second time for Berlin/Lise 
blogin1 ~ $ exit
salloc: Relinquishing job allocation [...]
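Because the session above involves two nested shells, it can help to check where the current shell runs. This sketch relies on the assumption that salloc sets SLURM_JOB_ID and SLURM_JOB_NODELIST in the shell on the login node, while SLURMD_NODENAME is only set inside job steps running on a compute node (for instance the shell started with srun --pty). where_am_i is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report whether this shell is outside any allocation, in the salloc
# shell on the login node, or inside a job step on a compute node.
where_am_i() {
    if [[ -z ${SLURM_JOB_ID:-} ]]; then
        echo "no allocation"
    elif [[ -z ${SLURMD_NODENAME:-} ]]; then
        echo "job ${SLURM_JOB_ID} (nodes ${SLURM_JOB_NODELIST:-unknown}), shell on login node"
    else
        echo "job ${SLURM_JOB_ID}, shell on compute node ${SLURMD_NODENAME}"
    fi
}
```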

Using the Shared Nodes

We provide a varying number of nodes from the large40 and large96 partitions as postprocessing nodes in shared mode, so that multiple jobs can run on a single node at the same time. You can request CPUs and memory and should take care that you do not exceed your limits. Per CPU (hyperthread), there is about 9.6 GB of memory on the large40:shared partition and about 4 GB on the large96:shared partition.

The maximum walltime on the shared partitions is currently 2 days.

Example job for the shared partition:

This is an example of a job script using 10 cores. As this is not an MPI job, srun/mpirun is not needed. The job's memory usage should not exceed the share of its 10 CPUs, i.e. about 10 × 4 GB = 40 GB on large96:shared.

#!/bin/bash
#SBATCH -p large96:shared
# Walltime of one day (format days-hours)
#SBATCH -t 1-0
#SBATCH -n 10
#SBATCH -N 1

python postprocessing.py
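Inside such a job it can be useful to know the memory budget, e.g. to hand it on to the postprocessing program in whatever form it expects. The following sketch reads Slurm's SLURM_MEM_PER_NODE, or SLURM_MEM_PER_CPU multiplied by SLURM_CPUS_ON_NODE; it assumes these variables hold plain numbers in MB and are present when memory was requested with --mem or --mem-per-cpu. job_mem_limit_mb is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print this job's memory budget in MB, or nothing if no memory
# request is visible in the environment (variables assumed to be in MB).
job_mem_limit_mb() {
    if [[ -n ${SLURM_MEM_PER_NODE:-} ]]; then
        # --mem=<size> was requested: limit per node
        echo "$SLURM_MEM_PER_NODE"
    elif [[ -n ${SLURM_MEM_PER_CPU:-} && -n ${SLURM_CPUS_ON_NODE:-} ]]; then
        # --mem-per-cpu=<size> was requested: scale by the CPUs on this node
        echo $(( SLURM_MEM_PER_CPU * SLURM_CPUS_ON_NODE ))
    fi
}
```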

Advanced Options

Slurm offers many options for job allocation, process placement, job dependencies, job arrays and much more. We cannot cover all topics exhaustively here. As mentioned at the top of the page, please consult the official documentation and the man pages for an in-depth description of all parameters.

Job Arrays

If you need to submit a large number of similar jobs, please do not use for loops to submit them; use job arrays instead (this reduces the load on the scheduler). Arrays are defined with the -a <indices> option, e.g. -a 0-3 for four jobs with the indices 0 to 3. To divide your workload among the jobs of the array, your job script can use the following environment variables:

SLURM_ARRAY_TASK_COUNT
    Total number of tasks in the array.
SLURM_ARRAY_TASK_ID
    Job array ID (index) number.
SLURM_ARRAY_TASK_MAX
    Job array's maximum ID (index) number.
SLURM_ARRAY_TASK_MIN
    Job array's minimum ID (index) number.
SLURM_ARRAY_TASK_STEP
    Job array's index step size.
SLURM_ARRAY_JOB_ID
    Job array's master job ID number.

Example of an array job:
#!/bin/bash
#SBATCH -p standard96
# 12 hours
#SBATCH -t 12:00:00
#SBATCH -N 16
#SBATCH --tasks-per-node 96
#SBATCH -a 0-3
# "%A" is replaced by the job ID and "%a" by the array index
#SBATCH -o arrayjob-%A_%a
[...]
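Within the array job script, the variables listed above can give each array task its own share of the work. The sketch below distributes a list of work items round-robin: each task takes every SLURM_ARRAY_TASK_COUNT-th item, starting at its 0-based position (SLURM_ARRAY_TASK_ID - SLURM_ARRAY_TASK_MIN) / SLURM_ARRAY_TASK_STEP. The helper my_items and the input file pattern are hypothetical; outside of an array job, every item is returned.

```shell
#!/bin/bash
# Sketch: print the work items (given as arguments) that belong to this array task.
# Round-robin split: item number i goes to the task at position i % COUNT.
my_items() {
    local id=${SLURM_ARRAY_TASK_ID:-0}
    local min=${SLURM_ARRAY_TASK_MIN:-0}
    local step=${SLURM_ARRAY_TASK_STEP:-1}
    local count=${SLURM_ARRAY_TASK_COUNT:-1}
    # 0-based position of this task within the array (also for -a 0-6:2 etc.)
    local pos=$(( (id - min) / step ))
    local i=0 item
    for item in "$@"; do
        if (( i % count == pos )); then
            printf '%s\n' "$item"
        fi
        i=$((i + 1))
    done
}

# Example use in the array job script (hypothetical input files); mapfile
# reads the list first, so srun cannot consume the remaining items from stdin:
#   mapfile -t files < <(my_items input_*.dat)
#   for f in "${files[@]}"; do srun mybinary "$f"; done
```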