...
HTML Comment |
---|
Commenting out this block, since Berlin and Göttingen have separate documentation pages now.
Codeblock |
---|
language | bash |
---|
title | For Intel Skylake CPU compute nodes (Phase 1, Göttingen only): |
---|
| #!/bin/bash
#SBATCH --time 12:00:00
#SBATCH --nodes 2
#SBATCH --tasks-per-node 40
export SLURM_CPU_BIND=none
module load impi/2019.5
module load vasp/5.4.4.p1
mpirun vasp_std |
|
The following example shows a job script that will run on the Nvidia A100 GPU nodes (Berlin). By default, VASP uses one GPU per MPI task. If you plan to use 4 GPUs per node, you need to set 4 MPI tasks per node. Then, set the number of OpenMP threads to 18 (because 4x18=72, which is the number of CPU cores per node in the GPU A100 partition) to speed up your calculation. This, however, also requires proper process pinning, as shown below.
Codeblock |
---|
language | bash |
---|
title | For Nvidia A100 GPU compute nodes (Berlin) |
---|
|
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=4
#SBATCH --cpus-per-task=18
#SBATCH --partition=gpu-a100
# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Avoid hcoll as MPI collective algorithm
export OMPI_MCA_coll="^hcoll"
# You may need to adjust this limit, depending on the case
export OMP_STACKSIZE=512m
module load nvhpc-hpcx/23.1
module load vasp/6.4.1
# Carefully adjust ppr:2, if you don't use 4 MPI processes per node
mpirun --bind-to core --map-by ppr:2:socket:PE=${SLURM_CPUS_PER_TASK} vasp_std |
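To confirm that the pinning described above is actually in effect, lines like the following can be added to the job script just before the mpirun call. This is only a sketch: it assumes an OpenMP 5.0 runtime (for OMP_DISPLAY_AFFINITY) and the nvidia-smi tool on the GPU nodes, and it is not part of the official example.
Codeblock |
---|
language | bash |
---|
title | Optional sanity check for thread and GPU layout (sketch) |
---|
|
# Report each OpenMP thread's binding at startup (OpenMP 5.0 runtimes)
export OMP_DISPLAY_AFFINITY=TRUE
# List the GPUs visible on one node of the allocation
srun --nodes=1 --ntasks=1 nvidia-smi -L |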
The following job script exemplifies how to run vasp 6.4.1 making use of OpenMP threads. Here, we have 2 OpenMP threads and 48 MPI tasks per node (the product of these two numbers should ideally be equal to the number of CPU cores per node).
...
Codeblock |
---|
language | bash |
---|
title | For compute nodes with CentOS 7 |
---|
|
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=48
#SBATCH --cpus-per-task=2
#SBATCH --partition=standard96
export SLURM_CPU_BIND=none
# Set the number of OpenMP threads as given by the SLURM parameter "cpus-per-task"
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Adjust the maximum stack size of OpenMP threads
export OMP_STACKSIZE=512m
# Binding OpenMP threads
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Binding MPI tasks
export I_MPI_PIN=yes
export I_MPI_PIN_DOMAIN=omp
export I_MPI_PIN_CELL=core
module load impi/2021.7.1
module load vasp/6.4.1
mpirun vasp_std |
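If you change the task or thread counts, a small pre-flight check can catch combinations that leave cores idle or oversubscribed. The sketch below assumes a standard96 node with 96 cores and that both SLURM variables are set by the #SBATCH options above; adjust or drop it as needed.
Codeblock |
---|
language | bash |
---|
title | Optional pre-flight check of the MPI x OpenMP decomposition (sketch) |
---|
|
# 96 cores per standard96 node; change this value for other partitions
CORES_PER_NODE=96
# Warn if tasks-per-node times cpus-per-task does not fill the node
if [ $(( SLURM_NTASKS_PER_NODE * SLURM_CPUS_PER_TASK )) -ne ${CORES_PER_NODE} ]; then
    echo "Warning: ${SLURM_NTASKS_PER_NODE} tasks x ${SLURM_CPUS_PER_TASK} threads does not match ${CORES_PER_NODE} cores per node" >&2
fi |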
The last example demonstrates how to run a job with vasp 5.4.4.p1 on nodes with CentOS 7.
Codeblock |
---|
language | bash |
---|
title | For compute nodes with CentOS 7 |
---|
|
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=96
#SBATCH --partition=standard96
export SLURM_CPU_BIND=none
module load impi/2019.5
module load vasp/5.4.4.p1
mpirun vasp_std |
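Any of the job scripts above is submitted with sbatch in the usual way; the file name vasp.slurm below is only a placeholder for whatever you save the script as.
Codeblock |
---|
language | bash |
---|
title | Submitting and monitoring the job (example) |
---|
|
# Submit the job script; sbatch prints the job ID
sbatch vasp.slurm
# Check the state of your queued and running jobs
squeue -u $USER |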