Content
Code execution
For code execution examples, see Slurm partition CPU CLX.
Code compilation
...
Serial icc
    module load intel
    icc -o hello.bin hello.c
    ifort -o hello.bin hello.f90
    icpc -o hello.bin hello.cpp
OpenMP icc
    module load intel
    icc -qopenmp -o hello.bin hello.c
    ifort -qopenmp -o hello.bin hello.f90
    icpc -qopenmp -o hello.bin hello.cpp
GNU compiler
Serial gcc
    module load gcc
    gcc -o hello.bin hello.c
    gfortran -o hello.bin hello.f90
    g++ -o hello.bin hello.cpp
...
OpenMP gcc
    module load gcc
    gcc -fopenmp -o hello.bin hello.c
    gfortran -fopenmp -o hello.bin hello.f90
    g++ -fopenmp -o hello.bin hello.cpp
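The compile commands above assume a source file named hello.c, which this page does not show. The following is a minimal sketch of what such a file could contain; its contents are hypothetical. The `_OPENMP` guard lets the same file build with both the serial and the OpenMP commands.

```shell
# Write a hypothetical hello.c (illustrative content, not taken from this page).
# The quoted 'EOF' keeps the shell from expanding anything inside the C source.
cat > hello.c <<'EOF'
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
#ifdef _OPENMP
    /* Each thread of the parallel region prints its own id. */
    #pragma omp parallel
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
#else
    /* Built without -fopenmp / -qopenmp: plain serial program. */
    printf("Hello from a serial build\n");
#endif
    return 0;
}
EOF
```

After this, any of the C compile lines listed above should apply unchanged, for example the `gcc -fopenmp` variant.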
Slurm job script
The example Slurm job scripts, e.g. myjobscript.slurm, cover the setup
...
    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --partition=cpu-clx:test
    ./hello.bin
OpenMP, full node
    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --partition=cpu-clx:test
    export OMP_PROC_BIND=spread
    export OMP_NUM_THREADS=96
    ./hello.bin
OpenMP, half node
    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --partition=cpu-clx:test
    export OMP_PROC_BIND=spread
    export OMP_NUM_THREADS=48
    ./hello.bin
OpenMP, hyperthreading
    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --partition=cpu-clx:test
    export OMP_PROC_BIND=spread
    export OMP_NUM_THREADS=192
    ./hello.bin
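The thread counts in the full-node, half-node, and hyperthreading examples all follow from the core count of one node. A minimal sketch of that arithmetic, assuming 96 physical cores per node and two hardware threads per core (both values are assumptions inferred from the full-node example, and the variable names are illustrative):

```shell
# Assumptions: 96 physical cores per node, 2 hardware threads per core.
CORES_PER_NODE=96
THREADS_PER_CORE=2

FULL_NODE=$CORES_PER_NODE                              # one thread per core
HALF_NODE=$((CORES_PER_NODE / 2))                      # every second core
HYPERTHREADING=$((CORES_PER_NODE * THREADS_PER_CORE))  # all hardware threads

echo "full node:      OMP_NUM_THREADS=$FULL_NODE"
echo "half node:      OMP_NUM_THREADS=$HALF_NODE"
echo "hyperthreading: OMP_NUM_THREADS=$HYPERTHREADING"

# Pick one of the three for the job, e.g. the half-node case.
export OMP_NUM_THREADS=$HALF_NODE
```

On a partition with a different core count, only `CORES_PER_NODE` would need to change.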
You can run different OpenMP codes at the same time. The examples cover the following setup:
- 2 nodes,
- 4 OpenMP codes run simultaneously,
- the code is not MPI parallel; mpirun is used only to start the codes.
OpenMP simultaneously
    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --partition=cpu-clx:test
    module load impi/2019.5
    export SLURM_CPU_BIND=none
    export OMP_PROC_BIND=spread
    export OMP_NUM_THREADS=48
    mpirun -ppn 2 \
        -np 1 ./code1.bin : -np 1 ./code2.bin : -np 1 ./code3.bin : -np 1 ./code4.bin
OpenMP simultaneously hyperthreading
    #!/bin/bash
    #SBATCH --nodes=2
    #SBATCH --partition=standard96:test
    module load impi/2019.5
    export SLURM_CPU_BIND=none
    export OMP_PROC_BIND=spread
    export OMP_NUM_THREADS=96
    mpirun -ppn 2 \
        -np 1 ./code1.bin : -np 1 ./code2.bin : -np 1 ./code3.bin : -np 1 ./code4.bin
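The per-code thread count and the colon-separated mpirun argument list in the two scripts above can be derived from the list of binaries. The sketch below only builds and prints the command line, it does not launch anything. It assumes 96 physical cores per node, and the variable names are illustrative.

```shell
# Assumptions: 2 nodes, 96 physical cores per node, binaries named as above.
NODES=2
CORES_PER_NODE=96
CODES="./code1.bin ./code2.bin ./code3.bin ./code4.bin"

# Count the codes to be started.
NUM_CODES=0
for code in $CODES; do
    NUM_CODES=$((NUM_CODES + 1))
done

# Codes sharing one node (the -ppn value) and threads available to each.
CODES_PER_NODE=$((NUM_CODES / NODES))
THREADS_PER_CODE=$((CORES_PER_NODE / CODES_PER_NODE))

# Build "-np 1 code1 : -np 1 code2 : ..." for mpirun's multi-program syntax.
ARGS=""
for code in $CODES; do
    if [ -n "$ARGS" ]; then
        ARGS="$ARGS : "
    fi
    ARGS="${ARGS}-np 1 $code"
done

CMD="mpirun -ppn $CODES_PER_NODE $ARGS"
echo "export OMP_NUM_THREADS=$THREADS_PER_CODE"
echo "$CMD"
```

For the hyperthreading variant, the same arithmetic would start from the number of hardware threads per node rather than the number of physical cores.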
Intel compiler flags
To make full use of the vectorizing capabilities of the CPUs, AVX512 instructions and the 512bit ZMM registers can be used with the following compile flags with the Intel compilers:
-xCORE-AVX512 -qopt-zmm-usage=high
However, high ZMM usage is not recommended in all cases.
With GNU compilers (GCC 7.x and later), architecture-specific optimization for Skylake and Cascade Lake CPUs is enabled with
-march=skylake-avx512
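Binaries built with the flags above contain AVX-512 instructions and will only run on CPUs that support them. As a hedged sketch, the helper below checks the Linux CPU flag list for `avx512f` (the AVX-512 Foundation flag) before such a binary is started. The function name `has_avx512` is hypothetical, and the check assumes a Linux system that exposes /proc/cpuinfo.

```shell
# Return success if the given cpuinfo file (default: /proc/cpuinfo)
# lists the avx512f flag. Hypothetical helper, Linux only.
has_avx512() {
    grep -qw 'avx512f' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_avx512; then
    echo "AVX-512 is available on this node"
else
    echo "AVX-512 not detected; avoid binaries built for AVX-512 here"
fi
```

Running the check on a compute node rather than a login node is the safer choice, since the two may have different CPUs.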
Using the Intel MKL
The Intel® Math Kernel Library (Intel® MKL) is designed to run on multiple processors and operating systems. It is also compatible with several compilers and third-party libraries, and provides different interfaces to its functionality. To support these different environments, tools, and interfaces, Intel MKL provides multiple libraries to choose from.
...