List of Slurm Partitions
The compute nodes of Lise in Berlin (blogin.hlrn.de) and Emmy in Göttingen (glogin.hlrn.de) are organized into the following Slurm partitions:
Lise (Berlin)
Partition name | Nodes | CPU | Main memory (GB) | Max. nodes per job | Max. jobs per user (running / queued) | Max. walltime | Remark
---|---|---|---|---|---|---|---
standard96 | 1204 | Cascade 9242 | 362 | 512 | 16 / 500 | 12:00:00 | default partition
standard96:test | 32 dedicated + 128 on demand | Cascade 9242 | 362 | 16 | 1 / 500 | 1:00:00 | test nodes with higher priority but lower walltime
large96 | 28 | Cascade 9242 | 747 | 8 | 16 / 500 | 12:00:00 | fat memory nodes
large96:test | 2 dedicated + 2 on demand | Cascade 9242 | 747 | 2 | 1 / 500 | 1:00:00 | fat memory test nodes with higher priority but lower walltime
large96:shared | 2 dedicated | Cascade 9242 | 747 | 1 | 16 / 500 | 48:00:00 | fat memory nodes for data pre- and postprocessing
huge96 | 2 | Cascade 9242 | 1522 | 1 | 16 / 500 | 24:00:00 | very fat memory nodes for data pre- and postprocessing
See Slurm usage for how to work around the 12 h walltime limit with job dependencies.
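A minimal sketch of such a job chain, assuming two job scripts part1.slurm and part2.slurm (the file names are placeholders):

```bash
# Submit the first job and capture its job ID (--parsable prints only the ID).
jobid=$(sbatch --parsable part1.slurm)

# The second job becomes eligible only after the first has completed successfully.
sbatch --dependency=afterok:${jobid} part2.slurm
```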
Fat-Tree Network of Lise
See OPA Fat Tree network of Lise
Emmy (Göttingen)
Partition (number in name = cores per node) | Node name | Max. walltime | Nodes | Max. nodes per job | Max. jobs per user | Usable memory (MB per node) | CPU, GPU type | Shared | NPL per node hour | Remark
---|---|---|---|---|---|---|---|---|---|---
standard96 | gcn# | 12:00:00 | 996 | 256 | unlimited | 362 000 | Cascade 9242 | ✘ | 96 | default partition
standard96:test | gcn# | 1:00:00 | 8 dedicated + 128 on demand | 16 | unlimited | 362 000 | Cascade 9242 | ✘ | 96 | test nodes with higher priority but lower walltime
large96 | gfn# | 12:00:00 | 12 | 2 | unlimited | 747 000 | Cascade 9242 | ✘ | 144 | fat memory nodes
large96:test | gfn# | 1:00:00 | 2 dedicated + 2 on demand | 2 | unlimited | 747 000 | Cascade 9242 | ✘ | 144 | fat memory test nodes with higher priority but lower walltime
large96:shared | gfn# | 48:00:00 | 2 dedicated + 6 on demand | 1 | unlimited | 747 000 | Cascade 9242 | ✓ | 144 | fat memory nodes for data pre- and postprocessing
huge96 | gsn# | 24:00:00 | 2 | 1 | unlimited | 1 522 000 | Cascade 9242 | ✘ | 192 | very fat memory nodes for data pre- and postprocessing
medium40 | gcn# | 48:00:00 | 424 | 128 | unlimited | 181 000 | Skylake 6148 | ✘ | 40 | 
medium40:test | gcn# | 1:00:00 | 8 dedicated + 64 on demand | 8 | unlimited | 181 000 | Skylake 6148 | ✘ | 40 | test nodes with higher priority but lower walltime
large40 | gfn# | 48:00:00 | 12 | 4 | unlimited | 764 000 | Skylake 6148 | ✘ | 80 | fat memory nodes
large40:test | gfn# | 1:00:00 | 2 dedicated + 2 on demand | 2 | unlimited | 764 000 | Skylake 6148 | ✘ | 80 | fat memory test nodes with higher priority but lower walltime
large40:shared | gfn# | 48:00:00 | 2 dedicated + 6 on demand | 1 | unlimited | 764 000 | Skylake 6148 | ✓ | 80 | fat memory nodes for data pre- and postprocessing
grete | ggpu# | 48:00:00 | 33 | 8 | unlimited | 500 000 (40 GB HBM per GPU) | Zen3 EPYC 7513 + 4 NVIDIA A100 40GB | ✘ | 600* | see GPU Usage
grete:shared | ggpu# | 48:00:00 | 38 | 1 | unlimited | 500 000, 764 000, or 1 000 000 (32 GB, 40 GB, or 80 GB HBM per GPU) | Skylake 6148 + 4 NVIDIA V100 32GB; Zen3 EPYC 7513 + 4 NVIDIA A100 40GB; Zen2 EPYC 7662 + 8 NVIDIA A100 80GB | ✓ | 150 per GPU | 
grete:interactive | ggpu# | 48:00:00 | 6 | 1 | unlimited | 764 000 (32 GB per GPU) or 500 000 (10 GB or 20 GB HBM per MIG slice) | Skylake 6148 + 4 NVIDIA V100 32GB; Zen3 EPYC 7513 + 4 NVIDIA A100 40GB split into 2g.10gb and 3g.20gb slices | ✓ | 150 per GPU (V100) or 47 per MIG slice (A100) | see GPU Usage; A100 GPUs are split into slices via MIG (3 slices per GPU)
grete:preemptible | ggpu# | 48:00:00 | 6 | 1 | unlimited | 764 000 (32 GB per GPU) or 500 000 (10 GB or 20 GB HBM per MIG slice) | Skylake 6148 + 4 NVIDIA V100 32GB; Zen3 EPYC 7513 + 4 NVIDIA A100 40GB split into 2g.10gb and 3g.20gb slices | ✓ | 150 per GPU (V100) or 47 per MIG slice (A100) | 

* 600 for the nodes with 4 GPUs, and 1200 for the nodes with 8 GPUs
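As a rough sketch, a single-GPU job in grete:shared could be requested as follows; --gpus-per-node is generic Slurm syntax and the program name is a placeholder, so consult GPU Usage for the site-specific details:

```bash
#!/bin/bash
#SBATCH --partition=grete:shared   # shared GPU partition, accounted per GPU
#SBATCH --gpus-per-node=1          # request one of the node's GPUs
#SBATCH --time=02:00:00            # within the 48 h walltime limit

srun ./my_gpu_app                  # placeholder GPU application
```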
Which partition to choose?
If you do not request a partition, your job will be placed in the default partition, which is standard96.
The default partition is suitable for most calculations. The :test partitions are, as the name suggests, intended for short, small test runs; they have a higher priority and a few dedicated nodes, but are limited in walltime and node count. The shared partitions are suitable for pre- and postprocessing: a job on a shared node is accounted only for its core fraction (cores used by the job / total cores of the node), e.g. a 24-core job on a 96-core node is charged 24/96 of the node's NPL. All non-shared nodes are allocated exclusively to one job, so the full NPL of each node is charged (see the sketch below).
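For illustration, a minimal job script requesting the shared fat-memory partition might look like this (a sketch only; the core count and program name are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=large96:shared   # shared node: charged only for the cores requested
#SBATCH --ntasks=24                  # 24 of 96 cores -> 24/96 of the node's NPL per hour
#SBATCH --time=12:00:00              # must stay below the 48 h limit of this partition

srun ./preprocess_data               # placeholder for your pre-/postprocessing tool
```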
Details about the CPU/GPU types can be found below.
The network topology is described here.
The available storage systems (home, local-ssd, work, perm) are discussed in Storage Systems.
An overview of all partitions and node states is provided by `sinfo -r`.
Detailed information about a particular node is shown by `scontrol show node <nodename>`.
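For example (the node name is illustrative):

```bash
# Partition overview, restricted to responding nodes (-r).
sinfo -r

# Full configuration of a single node, here one of the gcn# compute nodes.
scontrol show node gcn1001
```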
List of CPUs and GPUs
Short name | Link to manufacturer specifications | Where to find | Units per node | Cores per unit | Clock speed (GHz)
---|---|---|---|---|---
Cascade 9242 | Intel Cascade Lake Platinum 9242 (CLX-AP) | Lise and Emmy compute partitions | 2 | 48 | 2.3
Cascade 4210 | Intel Cascade Lake Silver 4210 (CLX) | blogin[1-8], glogin[3-8] | 2 | 10 | 2.2
Skylake 6148 | Intel Skylake Gold 6148 | Emmy compute partitions | 2 | 20 | 2.4
Skylake 4110 | Intel Skylake Silver 4110 | glogin[1-2] | 2 | 8 | 2.1
Tesla V100 | NVIDIA Tesla V100 32GB | Emmy grete partitions | 4 | 640 / 5120* | 
Tesla A100 | NVIDIA Tesla A100 40GB and 80GB | Emmy grete partitions | 4 or 8 | 432 / 6912* | 

*Tensor cores / CUDA cores per unit