PVC GPU Programming and Usage
Programs for Intel GPUs can be written using open industry and cross-platform standards, such as:

- OpenMP offloading in C/C++ and Fortran, i.e. with the help of compiler directives and a supporting compiler, e.g. Intel's icpx (C/C++ compiler) and ifx (Fortran compiler), which are available via the intel/... environment modules (a minimal offload sketch follows this list).
- SYCL for C++, i.e. in a data-parallel and explicit fashion (similar to CUDA). In contrast to OpenMP, SYCL provides more control over the code, data movements, allocations, and so forth that are actually executed on the GPU. The SYCL implementation for Intel GPUs is provided by oneAPI. On the PVC partition, SYCL code can be compiled with the icpx compiler from the intel environment modules. Migration of existing CUDA code can be assisted by the DPC++ Compatibility Tool; the dpct binary is available via the intel/... environment modules. Note that SYCL code can also be executed on Nvidia (and AMD) GPUs (a SYCL sketch follows this list as well).
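To illustrate the directive-based approach, here is a minimal sketch of a saxpy loop offloaded with OpenMP target directives. The file name and the compile line (icpx with -fiopenmp -fopenmp-targets=spir64) are assumptions to be checked against the currently loaded intel module; they are not prescribed by this page.

// Minimal OpenMP offload sketch (saxpy); hypothetical file name saxpy_omp.cpp.
// Assumed compile line: icpx -fiopenmp -fopenmp-targets=spir64 saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 0.5f;
    float *px = x.data(), *py = y.data();

    // The directive maps x and y to the GPU, runs the loop there, and copies y back.
    #pragma omp target teams distribute parallel for map(to: px[0:n]) map(tofrom: py[0:n])
    for (int i = 0; i < n; ++i)
        py[i] = a * px[i] + py[i];

    std::printf("y[0] = %f\n", static_cast<double>(py[0]));
    return 0;
}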
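For the SYCL route, a minimal vector-addition sketch using shared USM allocations might look as follows; the file name is hypothetical, and icpx -fsycl is the assumed compile line.

// Minimal SYCL sketch (vector addition); hypothetical file name vadd_sycl.cpp.
// Assumed compile line: icpx -fsycl vadd_sycl.cpp -o vadd_sycl
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    constexpr size_t n = 1 << 20;
    sycl::queue q{sycl::gpu_selector_v};  // require a GPU device

    // Shared USM allocations are accessible from both host and device.
    float *a = sycl::malloc_shared<float>(n, q);
    float *b = sycl::malloc_shared<float>(n, q);
    float *c = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Explicitly submit the kernel; each work item adds one element.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    std::printf("c[0] = %f\n", static_cast<double>(c[0]));
    sycl::free(a, q); sycl::free(b, q); sycl::free(c, q);
    return 0;
}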
Please refer to the oneAPI Programming Guide for further details about programming. Contact NHR@ZIB support for assistance with getting your application to run on Intel GPUs.
The following tools may be useful to check the availability and performance indicators of the GPUs:

- xpu-smi (available without any environment module being loaded) lists the available GPUs as seen by the drivers, along with selected metrics/properties.
- sycl-ls [--verbose] (available in the intel/... environment modules) shows the GPUs and their properties as available to applications.
- nvtop (available in the nvtop environment module) shows GPU compute and memory usage.
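If you prefer to check availability from within an application, the following sketch (our own illustration, not an official tool) enumerates the platforms and devices visible to SYCL, similar in spirit to the output of sycl-ls.

// Sketch: list the platforms and devices an application will see (comparable to sycl-ls).
// Assumed compile line: icpx -fsycl list_devices.cpp -o list_devices
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto &platform : sycl::platform::get_platforms()) {
        std::cout << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &device : platform.get_devices()) {
            std::cout << "  " << device.get_info<sycl::info::device::name>()
                      << " (max compute units: "
                      << device.get_info<sycl::info::device::max_compute_units>() << ")\n";
        }
    }
    return 0;
}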
MPI Usage
For the PVC partition, Intel MPI is the preferred GPU-aware MPI implementation. Load an impi environment module to make it available.
To enable GPU support, set the environment variable I_MPI_OFFLOAD to "1" (in your job script). If you use GPUs on multiple nodes, it is strongly recommended to use the psm3 libfabric provider (FI_PROVIDER=psm3).
Depending on your application's needs, set I_MPI_OFFLOAD_CELL to either tile or device to assign each MPI rank either one tile or the whole GPU device.
It is recommended to check the pinning by setting I_MPI_DEBUG to (at least) 3 and I_MPI_OFFLOAD_PRINT_TOPOLOGY to 1.
Refer to the Intel MPI documentation on GPU support for further information.
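To illustrate what GPU support in Intel MPI means in practice, here is a minimal sketch that passes a device (USM) buffer directly to MPI calls, which is what I_MPI_OFFLOAD=1 enables. The mpiicpx wrapper and the file name are assumptions; check them against the loaded impi module.

// Sketch of GPU-aware point-to-point communication: rank 0 sends a device buffer to rank 1.
// Assumed compile line: mpiicpx -fsycl pingpong.cpp -o pingpong   (run with I_MPI_OFFLOAD=1)
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sycl::queue q{sycl::gpu_selector_v};
    const int n = 1 << 20;
    float *buf = sycl::malloc_device<float>(n, q);
    q.fill(buf, static_cast<float>(rank), n).wait();  // initialize the buffer on the device

    // With GPU support enabled, the device pointer can be handed to MPI directly.
    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            float first = 0.0f;
            q.memcpy(&first, buf, sizeof(float)).wait();
            std::printf("rank 1 received a buffer starting with %.1f\n", static_cast<double>(first));
        }
    }

    sycl::free(buf, q);
    MPI_Finalize();
    return 0;
}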
Example Job Script:
#!/bin/bash
# example to use 2 x (2 x 4) = 16 MPI processes, each assigned
# to one of the two tiles (stacks) of a PVC GPU
#SBATCH --partition=gpu-pvc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=pin-check

# required for usage of Intel GPUs
module load intel

# required for MPI
module load impi/2021.11

# required for GPU usage with MPI across nodes
export FI_PROVIDER=psm3

# to enable GPU support in Intel MPI
export I_MPI_OFFLOAD=1

# assign each rank a tile of a GPU
export I_MPI_OFFLOAD_CELL=tile

# for checking the process pinning
export I_MPI_DEBUG=3
export I_MPI_OFFLOAD_PRINT_TOPOLOGY=1

mpirun ./application