...
The compute system Lise at NHR@ZIB contains different compute partitions for CPUs and GPUs. Your choice of partition affects the specific configuration of:
- Login nodes,
- Slurm partition (compute nodes of the compute partitions and Accounting), and
- Software.
Login nodes
...
Info |
---|
Hints for fair usage of the shared WORK resource: Metadata Usage on WORK |
Using the Slurm batch system
To run your applications on the systems, you need to go through our batch system/scheduler: Slurm. The scheduler uses meta information about the job (requested node and core count, wall time, etc.) and runs your program on the compute nodes once the resources are available and your job is next in line. For a more in-depth introduction, visit our Slurm documentation.
We distinguish two kinds of jobs:
- Interactive job execution
- Job script execution
Resource specification
To request resources, pass one or more of the following flags when submitting the job.
...
-p <name>
...
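As a minimal sketch of how such flags combine into one request, the snippet below assembles a submission command from a few variables and prints it instead of running it. The values mirror the salloc example on this page; `--ntasks-per-node` is the spelling documented in the Slurm man pages, and `jobscript.sh` is only a placeholder file name.

```shell
# Sketch: collect resource flags in one place and reuse them for a submission.
# All values are examples; adjust them to your own job.
walltime="00:10:00"       # -t: wall time limit (HH:MM:SS)
partition="cpu-clx:test"  # -p: Slurm partition name
nodes=2                   # -N: number of nodes
tasks_per_node=24         # number of tasks (e.g. MPI ranks) per node

flags="-t $walltime -p $partition -N $nodes --ntasks-per-node $tasks_per_node"

# Print the command instead of executing it, so the sketch also works off-cluster.
request="sbatch $flags jobscript.sh"
echo "$request"
```

The same flag string could be passed to salloc for an interactive allocation.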
To use compute resources interactively, e.g. to follow the execution of MPI programs, the following steps are required. Note that non-interactive batch jobs via job scripts (see below) are the primary way of using the compute resources.
- A resource allocation for interactive usage has to be requested first with the salloc --interactive command, which should also include your resource requirements.
- When salloc has successfully allocated the requested resources, you have to issue an additional srun command to get a shell on one of the allocated nodes (see example below) if you want to work on the compute node.
- Afterwards, srun or MPI launch commands, like mpirun or mpiexec, can be used to start parallel programs (see the corresponding user guides).
```
blogin1 ~ $ salloc -t 00:10:00 -p cpu-clx:test -N2 --tasks-per-node 24
salloc: Granted job allocation [...]
salloc: Waiting for resource configuration
salloc: Nodes bcn[1001,1003] are ready for job
# To get a shell on one of the allocated nodes
blogin1 ~ $ srun --pty --interactive --preserve-env ${SHELL}
bcn1001 ~ $ srun hostname | sort | uniq -c
24 bcn1001
24 bcn1003
bcn1001 ~ $ exit
# Exit a second time for Berlin/Lise
blogin1:~ > exit
salloc: Relinquishing job allocation [...]
```
Job scripts
See our CPU CLX partition page for more details about job scripts. As an introduction, standard batch jobs are executed in the following steps:
- Provide (write) a batch job script, see the examples below.
- Submit the job script with the command sbatch (sbatch jobscript.sh).
- Monitor and control the job execution, e.g. with the commands squeue and scancel (cancel the job).
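The three steps can be sketched as follows. This is an illustrative example only, not the site's reference job script: the resource values are borrowed from the interactive example on this page, `hostname` stands in for a real program, and the submission commands are shown as comments because they only work on a login node.

```shell
# Step 1: write a minimal job script (file name jobscript.sh as used in the text).
cat > jobscript.sh <<'EOF'
#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -p cpu-clx:test
#SBATCH -N 2
#SBATCH --ntasks-per-node=24

# Launch one task per allocated slot; replace hostname with your program.
srun hostname | sort | uniq -c
EOF

# Optional: check the script for shell syntax errors before submitting.
sh -n jobscript.sh && echo "jobscript.sh: syntax ok"

# Step 2: submit the script (on a login node):
#   sbatch jobscript.sh
# Step 3: monitor and, if needed, cancel the job:
#   squeue -u "$USER"
#   scancel <jobid>
```

Keeping the resource request in `#SBATCH` lines means the script documents its own requirements, so `sbatch jobscript.sh` needs no further flags.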
Job Accounting
The Accounting page gives you more information about job accounting.
...