PVC GPU Programming and Usage
Programs for Intel GPUs can be written using open industry and cross-platform standards, such as:

- OpenMP offloading in C/C++ and Fortran, i.e. with the help of compiler directives and a supporting compiler, e.g. Intel's icpx (C/C++ compiler) and ifx (Fortran compiler), which are available via the intel/... environment modules (a minimal offload sketch follows this list).
- SYCL for C++, i.e. in a data-parallel and explicit fashion (similar to CUDA). In contrast to OpenMP, SYCL provides more control over the code, data movements, allocations, and so forth that are actually executed on the GPU. The SYCL implementation for Intel GPUs is provided by oneAPI. On the PVC partition, SYCL code can be compiled with the icpx compiler from the intel environment modules. Migration of existing CUDA code can be assisted by the DPC++ Compatibility Tool; the dpct binary is available via the intel/... environment modules. Note that SYCL code can also be executed on Nvidia (and AMD) GPUs (a SYCL sketch follows this list as well).
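To illustrate the directive-based approach, here is a minimal sketch of a saxpy loop offloaded with OpenMP target directives. The file name and the compile line (icpx with -fiopenmp -fopenmp-targets=spir64) are assumptions to be checked against the currently loaded intel module; they are not prescribed by this page.

// Minimal OpenMP offload sketch (saxpy); hypothetical file name saxpy_omp.cpp.
// Assumed compile line: icpx -fiopenmp -fopenmp-targets=spir64 saxpy_omp.cpp -o saxpy_omp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 0.5f;
    float *px = x.data(), *py = y.data();

    // The directive maps x and y to the GPU, runs the loop there, and copies y back.
    #pragma omp target teams distribute parallel for map(to: px[0:n]) map(tofrom: py[0:n])
    for (int i = 0; i < n; ++i)
        py[i] = a * px[i] + py[i];

    std::printf("y[0] = %f\n", static_cast<double>(py[0]));
    return 0;
}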
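For the SYCL route, a minimal vector-addition sketch using shared USM allocations might look as follows; the file name is hypothetical, and icpx -fsycl is the assumed compile line.

// Minimal SYCL sketch (vector addition); hypothetical file name vadd_sycl.cpp.
// Assumed compile line: icpx -fsycl vadd_sycl.cpp -o vadd_sycl
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
    constexpr size_t n = 1 << 20;
    sycl::queue q{sycl::gpu_selector_v};  // require a GPU device

    // Shared USM allocations are accessible from both host and device.
    float *a = sycl::malloc_shared<float>(n, q);
    float *b = sycl::malloc_shared<float>(n, q);
    float *c = sycl::malloc_shared<float>(n, q);
    for (size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Explicitly submit the kernel; each work item adds one element.
    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        c[i] = a[i] + b[i];
    }).wait();

    std::printf("c[0] = %f\n", static_cast<double>(c[0]));
    sycl::free(a, q); sycl::free(b, q); sycl::free(c, q);
    return 0;
}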
Please refer to the oneAPI Programming Guide for further details about programming. Contact NHR@ZIB support for assistance with getting your application to run on Intel GPUs.
The following tools may be useful to check the availability and performance indicators of the GPUs:

- xpu-smi (available without any environment module being loaded) lists the available GPUs as seen by the drivers, along with selected metrics/properties.
- sycl-ls [--verbose] (available in the intel/... environment modules) shows the GPUs and their properties as available to applications.
- nvtop (available in the nvtop environment module) shows GPU compute and memory usage.
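If you prefer to check availability from within an application, the following sketch (our own illustration, not an official tool) enumerates the platforms and devices visible to SYCL, similar in spirit to the output of sycl-ls.

// Sketch: list the platforms and devices an application will see (comparable to sycl-ls).
// Assumed compile line: icpx -fsycl list_devices.cpp -o list_devices
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto &platform : sycl::platform::get_platforms()) {
        std::cout << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &device : platform.get_devices()) {
            std::cout << "  " << device.get_info<sycl::info::device::name>()
                      << " (max compute units: "
                      << device.get_info<sycl::info::device::max_compute_units>() << ")\n";
        }
    }
    return 0;
}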
MPI Usage
For the PVC partition, Intel MPI is the preferred GPU-aware MPI implementation. Load an impi environment module to make it available.
To enable GPU support, set the environment variable I_MPI_OFFLOAD to "1" (in your job script). If you use GPUs on multiple nodes, it is strongly recommended to use the psm3 libfabric provider (FI_PROVIDER=psm3).
Depending on your application's needs, set I_MPI_OFFLOAD_CELL to either tile or device to assign each MPI rank either one tile or the whole GPU device.
It is recommended to check the pinning by setting I_MPI_DEBUG to (at least) 3 and I_MPI_OFFLOAD_PRINT_TOPOLOGY to 1.
Refer to the Intel MPI documentation on GPU support for further information.
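To illustrate what GPU support in Intel MPI means in practice, here is a minimal sketch that passes a device (USM) buffer directly to MPI calls, which is what I_MPI_OFFLOAD=1 enables. The mpiicpx wrapper and the file name are assumptions; check them against the loaded impi module.

// Sketch of GPU-aware point-to-point communication: rank 0 sends a device buffer to rank 1.
// Assumed compile line: mpiicpx -fsycl pingpong.cpp -o pingpong   (run with I_MPI_OFFLOAD=1)
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sycl::queue q{sycl::gpu_selector_v};
    const int n = 1 << 20;
    float *buf = sycl::malloc_device<float>(n, q);
    q.fill(buf, static_cast<float>(rank), n).wait();  // initialize the buffer on the device

    // With GPU support enabled, the device pointer can be handed to MPI directly.
    if (size >= 2) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            float first = 0.0f;
            q.memcpy(&first, buf, sizeof(float)).wait();
            std::printf("rank 1 received a buffer starting with %.1f\n", static_cast<double>(first));
        }
    }

    sycl::free(buf, q);
    MPI_Finalize();
    return 0;
}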
Example Job Script:
#!/bin/bash
# example to use 2 x (2 x 4) = 16 MPI processes, each assigned
# to one of the two tiles (stacks) of a PVC GPU
#SBATCH --partition=gpu-pvc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=pin-check

# required for usage of Intel GPUs
module load intel

# required for MPI
module load impi/2021.11

# required for GPU usage with MPI across nodes
export FI_PROVIDER=psm3

# to enable GPU support in Intel MPI
export I_MPI_OFFLOAD=1

# assign each rank a tile of a GPU
export I_MPI_OFFLOAD_CELL=tile

# for checking the process pinning
export I_MPI_DEBUG=3
export I_MPI_OFFLOAD_PRINT_TOPOLOGY=1

mpirun ./application