...

CP2K is an MPI-parallel application. You can use either mpirun or srun as the job starter for CP2K. If you opt for mpirun, load the corresponding impi or openmpi module and set up CPU and/or GPU pinning carefully.

| CP2K Version | Modulefile | Requirement | Compute Partitions | Support | CPU / GPU | Lise / Emmy |
|---|---|---|---|---|---|---|
| 2022.2 | cp2k/2022.2 | intel/2021.2 (Lise), intel/2022.2 (Emmy) | CentOS 7 | libint, fftw3, libxc, elpa, scalapack, cosma, xsmm, spglib, mkl, sirius, libvori and libbqb | yes / no | yes / yes |
| 2023.1 | cp2k/2023.1 | intel/2021.2 (Lise), intel/2022.2 (Emmy) | CentOS 7 | Lise: libint, fftw3, libxc, elpa, scalapack, cosma, xsmm, spglib, mkl, sirius, libvori and libbqb. Emmy: libint, fftw3, libxc, elpa, scalapack, cosma, xsmm, spglib, mkl and sirius. | yes / no | yes / yes |
| 2023.1 | cp2k/2023.1 | openmpi/gcc.11/4.1.4, cuda/11.8 | GPU A100 | libint, fftw3, libxc, elpa, elpa_nvidia_gpu, scalapack, cosma, xsmm, dbcsr_acc, spglib, mkl, sirius, offload_cuda, spla_gemm, m_offloading, libvdwxc | no / yes | yes / no |
| 2023.2 | cp2k/2023.2 | intel/2021.2, impi/2021.7.1 | CentOS 7 | libint, fftw3, libxc, elpa, scalapack, cosma, xsmm, spglib, mkl, sirius, libvori and libbqb | yes / no | yes / no |
| 2023.2 | cp2k/2023.2 | openmpi/gcc.11/4.1.4, cuda/11.8 | GPU A100 | libint, fftw3, libxc, elpa, elpa_nvidia_gpu, scalapack, cosma, xsmm, dbcsr_acc, spglib, mkl, sirius, offload_cuda, spla_gemm, m_offloading, libvdwxc | no / yes | yes / no |
| 2024.1 | cp2k/2024.1 | impi/2021.13 | Rocky Linux 9 | omp, libint, fftw3, fftw3_mkl, libxc, elpa, parallel, mpi_f08, scalapack, xsmm, spglib, mkl, sirius, hdf5 | yes / no | yes / no |

Remark: cp2k needs special attention when running on GPUs.

...

For compute nodes with Rocky Linux 9

```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --partition=cpu-clx
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k

export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core

# Our tests have shown that CP2K performs better with psm2 as the libfabric provider.
# Check whether this also applies to your system.
# To stick to the default provider, comment out the following line.
export FI_PROVIDER=psm2

module load intel/2021.2 impi/2021.13 cp2k/2024.1
mpirun cp2k.psmp input > output
```
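
The job script above requests 24 MPI tasks per node with 4 OpenMP threads each, which is 96 cores in total. When changing these numbers, it can help to check that the product still fits on a node before submitting. The following is only a minimal sketch: the function name `check_layout` is hypothetical, and the default of 96 cores per node is an assumption that you should replace with the core count of your partition.

```shell
#!/bin/bash
# Hypothetical helper (not part of CP2K or Slurm): check that
# tasks-per-node x cpus-per-task does not exceed the cores of one node.
# The default of 96 cores per node is an assumption; adjust it as needed.
check_layout() {
    local tasks_per_node=$1
    local cpus_per_task=$2
    local cores_per_node=${3:-96}
    local used=$(( tasks_per_node * cpus_per_task ))

    if (( used > cores_per_node )); then
        echo "oversubscribed: ${used} > ${cores_per_node}"
        return 1
    fi
    echo "ok: ${used} of ${cores_per_node} cores used"
}

# Layout used in the job script: 24 tasks x 4 threads
check_layout 24 4
```

Other splits of the same node, for example 48 tasks with 2 threads each, can be checked the same way; which split is fastest depends on the CP2K input.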


For compute nodes with CentOS 7 (using mpirun)

```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --partition standard96
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k

export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close

# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core

module load intel/2021.2 impi/2021.7.1 cp2k/2023.2
mpirun cp2k.psmp input > output
```

...

For compute nodes with CentOS 7 (using srun)

```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --partition standard96
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

module load intel/2021.2 impi/2021.7.1 cp2k/2023.2
srun cp2k.psmp input > output
```


For Nvidia A100 GPU nodes

```bash
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --job-name=cp2k

export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export OMP_PLACES=cores
export OMP_PROC_BIND=close

module load gcc/11.3.0 openmpi/gcc.11/4.1.4 cuda/11.8 cp2k/2023.2

# gpu_bind.sh (see the following script) should be placed in the directory where cp2k will be executed
# Don't forget to make gpu_bind.sh executable by running: chmod +x gpu_bind.sh
mpirun --bind-to core --map-by numa:PE=${SLURM_CPUS_PER_TASK} ./gpu_bind.sh cp2k.psmp input > output
```

...

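gpu_bind.sh

```bash
#!/bin/bash
# Give each MPI rank on a node its own GPU, selected by the node-local rank
export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK
# Run the wrapped command; quoting keeps arguments with spaces intact
"$@"
```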
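
gpu_bind.sh gives each node-local MPI rank the GPU with the same index, which matches the 4 tasks per node requested in the A100 job script. If you ever start more ranks per node than there are GPUs, the index has to wrap around. The following is a sketch of such a variant, not the wrapper used on this page: the function name `gpu_for_rank` is hypothetical, and the default of 4 GPUs per node is an assumption derived from `--ntasks-per-node=4`.

```shell
#!/bin/bash
# Hypothetical variant of gpu_bind.sh: map node-local ranks to GPUs round-robin.
# The default of 4 GPUs per node is an assumption; adjust it to your nodes.
gpu_for_rank() {
    local local_rank=$1
    local gpus_per_node=${2:-4}
    echo $(( local_rank % gpus_per_node ))
}

# Only act as a wrapper when started by Open MPI with a command to run.
if [[ -n "${OMPI_COMM_WORLD_LOCAL_RANK:-}" && $# -gt 0 ]]; then
    CUDA_VISIBLE_DEVICES=$(gpu_for_rank "${OMPI_COMM_WORLD_LOCAL_RANK}")
    export CUDA_VISIBLE_DEVICES
    exec "$@"
fi
```

With 8 ranks on a 4-GPU node, ranks 0 and 4 would share GPU 0, ranks 1 and 5 GPU 1, and so on. Whether sharing a GPU between ranks pays off for CP2K depends on the workload and should be measured.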


HTML comment

Commenting out this block, as Lise and Emmy have separate documentation pages now.

Emmy (using srun)

```bash
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --cpus-per-task=4
#SBATCH --job-name=cp2k

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

module load intel/2022.2 impi/2021.6 cp2k/2023.1
srun cp2k.psmp input > output
```


Remark on OpenMP

Depending on the problem size, the code may stop with a segmentation fault due to insufficient stack size or due to threads exceeding their stack space. To circumvent this, we recommend inserting the following lines in the job script:

```bash
export OMP_STACKSIZE=512M
ulimit -s unlimited
```
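
Raising the stack limit can fail when the hard limit on a node is lower than the requested value, and the job would then continue with the old limit. The following sketch applies the two settings above slightly more defensively and prints what is actually in effect. It assumes bash as the job shell; the warning text is illustrative only.

```shell
#!/bin/bash
# Keep a value that is already set; otherwise use the 512M recommended above.
export OMP_STACKSIZE="${OMP_STACKSIZE:-512M}"

# Raising the soft limit fails if the hard limit is lower,
# so report the problem instead of silently continuing.
if ! ulimit -s unlimited 2>/dev/null; then
    echo "warning: could not raise the stack limit; current limit: $(ulimit -s)" >&2
fi

# Record the effective settings in the job output for later debugging.
echo "OMP_STACKSIZE=${OMP_STACKSIZE}, stack limit=$(ulimit -s)"
```

Placing these lines before the mpirun or srun call lets the job log show which limits were active if a segmentation fault still occurs.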

...