...
- By default, the `srun` command gets exclusive access to all resources of the job allocation and uses all tasks. You therefore need to limit `srun` to only use part of the allocation. This includes implicitly granted resources, i.e. memory and GPUs, so the `--exact` flag is needed.
- If running non-MPI programs, use the `-c` option to set the number of cores each process should have access to.
- `srun` waits for the program to finish, so you need to start concurrent processes in the background.

Good default memory per CPU values (without hyperthreading) usually are:
| Partition | standard96 | large96 | huge96 | medium40 | large40/gpu |
|---|---|---|---|---|---|
| `--mem-per-cpu` | 3770M | 7781M | 15854M | 4525M | 19075M |
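These per-CPU defaults are essentially the usable node memory divided by the number of physical cores. A minimal sketch of that arithmetic, where the total of 361920 MB for a 96-core node is an assumed value chosen to match the standard96 figure above:

```shell
# Hedged sketch: derive a --mem-per-cpu value from assumed node figures.
usable_mem_mb=361920   # assumed usable memory of a 96-core node, in MB
cores=96               # physical cores (no hyperthreading)
echo "$(( usable_mem_mb / cores ))M"   # prints 3770M
```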
Examples
```bash
#!/bin/bash
#SBATCH -p standard96
#SBATCH -t 06:00:00
#SBATCH -N 1

srun --exact -n1 -c 10 --mem-per-cpu 3770M ./program1 &
srun --exact -n1 -c 80 --mem-per-cpu 3770M ./program2 &
srun --exact -n1 -c 6  --mem-per-cpu 3770M ./program3 &
wait
```
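Stripped of Slurm, the pattern above is ordinary shell job control: start each process with `&` and block on `wait`. A minimal local sketch, with plain `sleep` processes standing in for the `srun` job steps:

```shell
# Background two processes, then wait for both before continuing.
sleep 1 &
pid1=$!
sleep 1 &
pid2=$!
wait "$pid1" "$pid2"   # returns once both background jobs have finished
echo "all programs finished"
```

Without the final `wait`, the batch script would exit immediately and Slurm would terminate any still-running job steps.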
...
```bash
#!/bin/bash
#SBATCH --partition standard96:test   # adjust partition as needed
#SBATCH --nodes 1                     # more than 1 node can be used
#SBATCH --tasks-per-node 96           # one task per CPU core, adjust for partition

# set memory available per core
MEM_PER_CORE=3770   # must be set to a value that corresponds with the partition
# see https://www.hlrn.de/doc/display/PUB/Multiple+concurrent+programs+on+a+single+node

# Define srun arguments:
srun="srun -n1 -N1 --exclusive --mem-per-cpu $MEM_PER_CORE"
# --exclusive  ensures srun uses distinct CPUs for each job step
# -N1 -n1      allocates a single core to each task

# Define parallel arguments:
parallel="parallel -N 1 --delay .2 -j $SLURM_NTASKS --joblog parallel_job.log"
# -N         number of arguments you want to pass to the task script
# -j         number of parallel tasks (determined from resources provided by Slurm)
# --delay .2 prevents overloading the controlling node on short jobs
# --resume   add if needed to use the joblog to continue an interrupted run (job resubmitted)
# --joblog   creates a log file, required for resuming

# Run the tasks in parallel
$parallel "$srun ./task.sh {1}" ::: {1..100}
# task.sh    executable(!) script with the task to complete, may depend on some input parameter
# ::: {a..b} range of parameters; alternatively $(seq 100) should also work
# {1}        parameter from the range is passed here; multiple parameters can be used with
#            additional {i}, e.g. {2} {3} (refer to the parallel documentation)
```
...
```bash
$ parallel --xapply echo {1} {2} ::: 1 2 3 ::: a b c
1 a
2 b
3 c
$ parallel echo {1} {2} ::: 1 2 3 ::: a b c
1 a
1 b
1 c
2 a
2 b
2 c
3 a
3 b
3 c
```
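The index-by-index pairing that `--xapply` performs can also be mirrored with plain bash arrays, which may help for a quick check when GNU parallel is unavailable. A small sketch:

```shell
# Pair elements index-by-index, like parallel --xapply does.
nums=(1 2 3)
letters=(a b c)
for i in "${!nums[@]}"; do
  echo "${nums[$i]} ${letters[$i]}"   # prints "1 a", "2 b", "3 c"
done
```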
...