CUDA (Compute Unified Device Architecture) is an interface for programming Nvidia GPUs. It supports languages such as C, C++, and Fortran.
To build and execute code on the A100 GPU partition, please log in to
- an A100 GPU login node, e.g. bgnlogin.nhr.zib.de
- see also Quickstart
Code build
For building code we recommend the software package NVIDIA nvhpc-hpcx, which combines compilers with powerful libraries such as CUDA, BLAS (cuBLAS), and MPI.
```
bgnlogin1 $ module load nvhpc-hpcx/23.1
bgnlogin1 $ module list
Currently Loaded Modulefiles: ... 4) hpcx 5) nvhpc-hpcx/23.1
bgnlogin1 $ nvc -cuda -gpu=cc8.0 cuda.c -o cuda.bin
bgnlogin1 $ nvc -cuda -gpu=cc8.0 -cudalib=cublas cuda_cublas.c -o cuda_cublas.bin
```
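The source file `cuda.c` compiled by the first `nvc` command above is not shown on this page; a minimal sketch of what such a program could look like follows (the vector-addition kernel, array size, and use of managed memory are illustrative assumptions, not the actual course material):

```c
#include <stdio.h>
#include <cuda_runtime.h>

// Illustrative kernel: element-wise vector addition.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed memory keeps the example short; explicit
    // cudaMalloc/cudaMemcpy would work equally well.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }
    add<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The flag `-gpu=cc8.0` targets compute capability 8.0, which corresponds to the A100 architecture.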
- CUDA offers cuBLAS, a BLAS library for the GPU,
- which can also be used in combination with MPI.
```
bgnlogin1 $ module load nvhpc-hpcx/23.1
bgnlogin1 $ mpicc -cuda -gpu=cc8.0 -cudalib=cublas mpi_cuda_cublas.c -o mpi_cuda_cublas.bin
```
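The file `mpi_cuda_cublas.c` is likewise not reproduced here; as a hedged sketch of the combination, each MPI rank could select a GPU, run a cuBLAS routine locally, and combine the partial results with MPI (the choice of `cublasSasum`, the vector size, and the rank-to-GPU mapping are illustrative assumptions):

```c
#include <mpi.h>
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Map each rank to one of the node's GPUs, assuming ranks are
    // distributed evenly across the GPUs of a node.
    int ngpus;
    cudaGetDeviceCount(&ngpus);
    cudaSetDevice(rank % ngpus);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; i++) x[i] = 1.0f;

    // Each rank computes a local sum of absolute values on its GPU ...
    float local = 0.0f, global = 0.0f;
    cublasSasum(handle, n, x, 1, &local);

    // ... and the partial results are combined across ranks with MPI.
    MPI_Reduce(&local, &global, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", global);

    cublasDestroy(handle);
    cudaFree(x);
    MPI_Finalize();
    return 0;
}
```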
Code execution
All Slurm partitions available for the A100 GPU nodes are listed on Slurm partitions GPU A100.
```
#!/bin/bash
#SBATCH --partition=gpu-a100:shared
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=72
./cuda.bin
./cuda_cublas.bin
```
```
#!/bin/bash
#SBATCH --partition=gpu-a100
#SBATCH --gres=gpu:4
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72
module load nvhpc-hpcx/23.1
mpirun --np 8 --map-by ppr:2:socket:pe=1 ./mpi_cuda_cublas.bin
```
GPU-aware MPI
For efficient use of MPI-distributed GPU codes, a GPU/CUDA-aware installation of Open MPI is available in the openmpi/gcc.11/4.1.4 environment module. Open MPI respects the resource requests made to Slurm, so no special arguments to mpiexec/mpirun are required. Nevertheless, please check that your application is bound correctly to CPU cores and GPUs; the --report-bindings option of mpiexec/mpirun shows the actual binding.
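With a CUDA-aware MPI, device buffers can be passed directly to MPI calls without staging them through host memory first. A minimal sketch under assumed conditions (two ranks, an illustrative buffer size and message tag):

```c
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1024;
    double *dev;
    cudaMalloc(&dev, n * sizeof(double));

    if (rank == 0) {
        // Fill the device buffer (zeros here, purely for illustration).
        cudaMemset(dev, 0, n * sizeof(double));
        // A CUDA-aware MPI accepts the device pointer directly --
        // no explicit cudaMemcpy to a host staging buffer is needed.
        MPI_Send(dev, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(dev, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(dev);
    MPI_Finalize();
    return 0;
}
```

With a non-CUDA-aware MPI, the same code would require copying the buffer to host memory before the send and back to the device after the receive.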