
This partition provides access to eight compute nodes, each equipped with four Intel Data Center GPU Max 1550 GPUs (codenamed Ponte Vecchio, PVC). The following sections describe how to use them.

Hardware Overview

Property	Login Nodes	Compute Nodes
Count	2	8
CPU	2x Intel Xeon Platinum 8480L (Sapphire Rapids; 56 cores and 105 MB L3 cache each)	2x Intel Xeon Platinum 8480L (Sapphire Rapids; 56 cores and 105 MB L3 cache each)
RAM	512 GB	1024 GB (1 TB)
Local storage	2x 1.8 TB SATA SSD	1x 3.6 TB NVMe drive
GPUs	none	4x Intel Data Center GPU Max 1550 (128 GB HBM each; Xe Link all-to-all topology between GPUs; 2 GPUs per socket/NUMA domain)
Fabric	InfiniBand HDR 200 Gbit/s (1 HFI per node, on NUMA domain 0)	InfiniBand HDR 200 Gbit/s (1 HFI per node, on NUMA domain 0)
Operating system	Rocky Linux 8	Rocky Linux 8

Note that a single GPU Max 1550 consists of two "tiles" (also called "stacks"), which can be treated as separate NUMA domains. Depending on your workload, you can restrict your application to a single tile or use a full GPU.
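To make the tile arithmetic concrete: one compute node offers 4 GPUs x 2 tiles = 8 tiles. The following Python sketch enumerates them and formats a value for the Level Zero variable ZE_AFFINITY_MASK, which restricts the devices a process can see. It assumes the composite device hierarchy, in which `g.t` denotes tile t of GPU g; depending on the driver configuration (for example ZE_FLAT_DEVICE_HIERARCHY), tiles may instead appear as separate, flatly numbered devices, so please verify the mask format on the nodes. The helper names are hypothetical.

```python
# Sketch: enumerate the GPU tiles of one PVC compute node and build a
# ZE_AFFINITY_MASK value restricting a process to selected tiles.
# Assumption: composite device hierarchy, where "g.t" means tile t of GPU g.

GPUS_PER_NODE = 4   # Intel Data Center GPU Max 1550 per compute node
TILES_PER_GPU = 2   # each Max 1550 consists of two tiles (stacks)


def node_tiles(gpus=GPUS_PER_NODE, tiles=TILES_PER_GPU):
    """Return all (gpu, tile) pairs of one node."""
    return [(g, t) for g in range(gpus) for t in range(tiles)]


def affinity_mask(pairs):
    """Format (gpu, tile) pairs as a comma-separated 'gpu.tile' list."""
    return ",".join(f"{g}.{t}" for g, t in pairs)


if __name__ == "__main__":
    tiles = node_tiles()
    print(len(tiles), "tiles per node")   # 4 GPUs x 2 tiles
    print(affinity_mask([tiles[0]]))      # only the first tile of GPU 0
    print(affinity_mask(tiles[:2]))       # both tiles of GPU 0
```

Under these assumptions, exporting ZE_AFFINITY_MASK=0.0 before a single-tile run would expose only the first tile of GPU 0 to the application.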

Login Nodes

Access to the GPU PVC partition is provided through dedicated login nodes, reachable via SSH at bgilogin.nhr.zib.de:

Codeblock

language	bash

$ ssh -i $HOME/.ssh/id_rsa_nhr nhr_username@bgilogin.nhr.zib.de
Enter passphrase for key '...':
bgilogin1 $

File systems

The HOME and WORK file systems of the GPU system are the same as on the CPU system; see Quickstart. Node-local SSD space on the compute nodes is accessible via the environment variable LOCAL_TMPDIR, which is defined during a Slurm session (batch or interactive job).
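A job written in Python can pick up this node-local space as sketched below. The sketch falls back to the system temporary directory when LOCAL_TMPDIR is not set (for example on a login node); the function name and the default subdirectory name `myjob` are hypothetical.

```python
# Sketch: choose a scratch directory for temporary job data.
# Uses node-local storage via LOCAL_TMPDIR inside a Slurm job and falls back
# to the system temporary directory elsewhere (e.g. on a login node).
import os
import tempfile
from pathlib import Path


def scratch_dir(subdir="myjob"):
    """Return (and create) a per-job working directory on local storage."""
    base = os.environ.get("LOCAL_TMPDIR") or tempfile.gettempdir()
    path = Path(base) / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    work = scratch_dir()
    print("writing temporary files to", work)
```

Results that should outlive the job are best written to WORK or HOME rather than to this directory.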

Software and environment modules

Login and compute nodes of the PVC GPU partition run Rocky Linux 8.

...

For compiling applications for the PVC GPU partition, we recommend using the PVC GPU login nodes or, for very demanding compilations or when the GPU drivers need to be present, a PVC GPU compute node in an interactive Slurm job session.

Slurm Partitions

Partition Name	Nodes	GPUs per Node	GPU Hardware	Description
gpu-pvc	8	4	Intel Data Center GPU Max 1550	full node exclusive

...

Example usage of two nodes (eight GPUs in total). Note that it is currently not required to request GPU resources via Slurm to use the nodes and GPUs of the PVC partition.

Codeblock
$ srun --nodes=2 --partition=gpu-pvc example_cmd
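When several tasks share a node in such an srun call and the application does not select a GPU itself, one option is to derive a tile from each task's node-local ID, which srun exports as SLURM_LOCALID (the global rank is in SLURM_PROCID). The sketch below assumes 8 tiles per node (4 GPUs x 2 tiles) and labels them `gpu.tile` in the style of Level Zero's ZE_AFFINITY_MASK; the function name and the round-robin mapping are illustrative choices, not a site convention.

```python
# Sketch: map each Slurm task on a node to one GPU tile, based on the
# node-local task ID that srun exports as SLURM_LOCALID.
# Assumption: 8 tiles per node (4 GPUs x 2 tiles), "gpu.tile" naming.
import os

TILES_PER_NODE = 8
TILES_PER_GPU = 2


def tile_for_task(local_id):
    """Return the 'gpu.tile' identifier assigned to a node-local task ID."""
    slot = local_id % TILES_PER_NODE   # wrap around if more tasks than tiles
    return f"{slot // TILES_PER_GPU}.{slot % TILES_PER_GPU}"


if __name__ == "__main__":
    local_id = int(os.environ.get("SLURM_LOCALID", "0"))
    rank = int(os.environ.get("SLURM_PROCID", "0"))
    print(f"task {rank} (local {local_id}) -> tile {tile_for_task(local_id)}")
```

Launched with, e.g., `srun --nodes=2 --ntasks-per-node=8 --partition=gpu-pvc python3 which_tile.py` (hypothetical script name), each of the 16 tasks would print its own tile; a wrapper could export the result as ZE_AFFINITY_MASK before starting the actual application.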

PVC GPU Programming and Usage

Programs for Intel GPUs can be written using open, cross-platform industry standards, such as:

OpenMP offloading in C/C++ and Fortran, i.e. using compiler directives and a supporting compiler, e.g. Intel's icpx (C/C++ compiler) and ifx (Fortran compiler), which are available in the intel/... environment modules.
SYCL for C++, i.e. explicit data-parallel programming (similar to CUDA). Unlike OpenMP, SYCL gives more control over the code, data movements, allocations and so forth that are actually executed on the GPU. The SYCL implementation for Intel GPUs is provided by oneAPI. On the PVC partition, SYCL code can be compiled with the icpx compiler from the intel environment module(s). Migration of existing CUDA codes can be assisted by the DPC++ Compatibility Tool; the dpct binary is available via the intel/... environment modules. Note that SYCL codes can also be executed on Nvidia (and AMD) GPUs.

Please refer to the oneAPI Programming Guide for further details about programming. Contact NHR@ZIB support for assistance with running your application on Intel GPUs.

The following tools may be useful to check the availability and performance indicators of the GPUs:

xpu-smi (available without loading any environment module) lists the GPUs available from the drivers together with selected metrics and properties.
sycl-ls [--verbose] (available in the intel/... environment modules) shows the GPUs and their properties as available to applications.
nvtop (available in the nvtop environment module) shows GPU compute and memory usage.
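Because these tools come from different places (the driver installation, the intel/... modules, the nvtop module), a quick check of which of them are on the PATH can save some confusion. This Python sketch uses only the standard library; it runs sycl-ls only if it is found and makes no assumptions about its output format. The helper name is hypothetical.

```python
# Sketch: report which GPU inspection tools are on the PATH and, if present,
# run `sycl-ls` to list the devices visible to applications.
import shutil
import subprocess

TOOLS = ("xpu-smi", "sycl-ls", "nvtop")


def available_tools(tools=TOOLS):
    """Map each tool name to its full path, or None if it is not found."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    found = available_tools()
    for name, path in found.items():
        print(f"{name:8s} {path or 'not found (module not loaded?)'}")
    if found["sycl-ls"]:
        subprocess.run(["sycl-ls"], check=False)
```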

MPI Usage

For the PVC partition, Intel MPI is the preferred GPU-aware MPI implementation. Load an impi environment module to make it available.

To enable GPU support, set the environment variable I_MPI_OFFLOAD to "1" (in your job script). If you use GPUs on multiple nodes, we strongly recommend using the psm3 libfabric provider (FI_PROVIDER=psm3).

Depending on your application's needs, set I_MPI_OFFLOAD_CELL to either tile or device to assign each MPI rank either a single tile or a whole GPU.

We recommend checking the pinning by setting I_MPI_DEBUG to (at least) 3 and I_MPI_OFFLOAD_PRINT_TOPOLOGY to 1.

Refer to the Intel MPI documentation on GPU support for further information.
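The settings described above can be collected in one place, for instance when a Python driver script launches mpirun. The following sketch returns them as a dictionary; the function name and its parameters are hypothetical, while the variable names and values are the ones given in this section.

```python
# Sketch: assemble the Intel MPI settings for GPU-aware runs described above.


def impi_gpu_env(cell="tile", multi_node=True, check_pinning=False):
    """Return environment variables for Intel MPI GPU offloading."""
    if cell not in ("tile", "device"):
        raise ValueError("I_MPI_OFFLOAD_CELL must be 'tile' or 'device'")
    env = {"I_MPI_OFFLOAD": "1", "I_MPI_OFFLOAD_CELL": cell}
    if multi_node:
        env["FI_PROVIDER"] = "psm3"  # recommended libfabric provider across nodes
    if check_pinning:
        env["I_MPI_DEBUG"] = "3"
        env["I_MPI_OFFLOAD_PRINT_TOPOLOGY"] = "1"
    return env


if __name__ == "__main__":
    for key, value in impi_gpu_env(check_pinning=True).items():
        print(f"export {key}={value}")
```

A launcher could then merge the result into its environment, e.g. `subprocess.run(["mpirun", "./application"], env={**os.environ, **impi_gpu_env()})`.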

Example Job Script:

Codeblock

language	bash

#!/bin/bash

# example using 2 nodes x (4 GPUs x 2 tiles) = 16 MPI processes,
# each assigned to one of the two tiles (stacks) of a PVC GPU

#SBATCH --partition=gpu-pvc
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --job-name=pin-check

# required for usage of Intel GPUs
module load intel
# required for Intel MPI
module load impi/2021.11

# recommended for GPU usage with MPI across multiple nodes
export FI_PROVIDER=psm3

# to enable GPU support in Intel MPI
export I_MPI_OFFLOAD=1

# assign each rank a tile of a GPU
export I_MPI_OFFLOAD_CELL=tile

# for checking the process pinning
export I_MPI_DEBUG=3
export I_MPI_OFFLOAD_PRINT_TOPOLOGY=1

mpirun ./application
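In this example, --ntasks-per-node=8 follows from assigning one rank per tile (4 GPUs x 2 tiles); with I_MPI_OFFLOAD_CELL=device, one rank per GPU gives 4 ranks per node. The following sketch derives the task count from the cell choice and renders a job script along the lines of the one above (module versions copied from it, debug settings omitted). The function names are hypothetical.

```python
# Sketch: derive --ntasks-per-node from I_MPI_OFFLOAD_CELL and render a job
# script for the gpu-pvc partition (hypothetical helper names).
GPUS_PER_NODE = 4   # GPU Max 1550 per compute node
TILES_PER_GPU = 2   # tiles (stacks) per GPU


def ranks_per_node(cell):
    """One MPI rank per tile or per full GPU, depending on the cell type."""
    if cell == "tile":
        return GPUS_PER_NODE * TILES_PER_GPU
    if cell == "device":
        return GPUS_PER_NODE
    raise ValueError("cell must be 'tile' or 'device'")


def job_script(nodes, cell, app="./application"):
    """Return the text of a batch script using nodes x ranks_per_node ranks."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=gpu-pvc",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ranks_per_node(cell)}",
        "module load intel",
        "module load impi/2021.11",
        "export FI_PROVIDER=psm3",
        "export I_MPI_OFFLOAD=1",
        f"export I_MPI_OFFLOAD_CELL={cell}",
        f"mpirun {app}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for cell in ("tile", "device"):
        print(f"{cell}: {2 * ranks_per_node(cell)} ranks on 2 nodes")
    print(job_script(2, "tile"))
```

For example, `Path("pvc_job.sh").write_text(job_script(2, "device"))` (with `from pathlib import Path`; the file name is hypothetical) would produce a script for 2 x 4 = 8 ranks that can be submitted with sbatch.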

AI Tools and Frameworks

Popular frameworks such as PyTorch, TensorFlow, and JAX can be used with the Intel Distribution for Python (use the offline installer on the login nodes) together with framework-specific Intel extensions. For use with Intel GPUs, prepare a separate environment for each framework as described below. Note that the module intel/2024.0.0 must be loaded to install and run these frameworks properly.

Note
The latest Intel AI tools have specific Intel GPU driver requirements. Currently, only the PVC compute nodes `bgi1007` and `bgi1008` have these drivers installed; these nodes are reserved under `pvcup`.
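Since all frameworks below rely on the intel/2024.0.0 module, a setup or job script may want to check for it before doing anything else. This sketch assumes that the module system lists the loaded modules, colon-separated, in the LOADEDMODULES environment variable (as Lmod and Environment Modules do); the helper name is hypothetical.

```python
# Sketch: check that the intel/2024.0.0 module is loaded before setting up
# or running the AI frameworks. Assumption: loaded modules are listed,
# colon-separated, in the LOADEDMODULES environment variable.
import os
import sys

REQUIRED_MODULE = "intel/2024.0.0"


def module_loaded(name, loaded=None):
    """Return True if `name` appears in the list of loaded modules."""
    if loaded is None:
        loaded = os.environ.get("LOADEDMODULES", "")
    return name in loaded.split(":")


if __name__ == "__main__":
    if not module_loaded(REQUIRED_MODULE):
        print(f"warning: run 'module load {REQUIRED_MODULE}' first", file=sys.stderr)
```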

PyTorch

Load the Intel oneAPI module and create a new conda environment within your Intel Python distribution:

Codeblock
module load intel/2024.0.0
conda create -n intel_pytorch_gpu python=3.9
conda activate intel_pytorch_gpu

Once the new environment has been activated, the following command installs PyTorch:

Codeblock
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

This installs PyTorch together with the Intel Extension for PyTorch, which is required to run operations on Intel GPUs (non-CUDA devices). On a compute node, you can check whether the GPUs are visible:

Codeblock

language	py

Python 3.9.18 (tags/v3.9.18-26-g6b320c3b2f6-dirty:6b320c3b2f6, Sep 28 2023, 00:35:27)
[GCC 13.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import torch
>>> import intel_extension_for_pytorch as ipex
My guessed rank = 0
>>> for i in range(torch.xpu.device_count()):
...     print(f'[{i}]: {torch.xpu.get_device_properties(i)}')
...
[0]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[1]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[2]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[3]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[4]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[5]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[6]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
[7]: _DeviceProperties(name='Intel(R) Data Center GPU Max 1550', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=65536MB, max_compute_units=512, gpu_eu_count=512)
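The interactive check can also be put into a script, for example to log the visible devices at the start of a batch job. This sketch uses the same calls as the session above (torch.xpu.device_count and torch.xpu.get_device_properties, made available by importing intel_extension_for_pytorch) and returns an empty list if these packages cannot be imported. The function name is hypothetical.

```python
# Sketch: list the XPU devices visible to PyTorch in the environment
# created above (torch + intel-extension-for-pytorch).


def list_xpu_devices():
    """Return one description string per XPU device, or [] if unavailable."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401 (provides torch.xpu)
    except ImportError:
        return []  # not running inside the intel_pytorch_gpu environment
    return [
        f"[{i}]: {torch.xpu.get_device_properties(i)}"
        for i in range(torch.xpu.device_count())
    ]


if __name__ == "__main__":
    devices = list_xpu_devices()
    if not devices:
        print("no XPU devices found (wrong environment or not on a compute node?)")
    for line in devices:
        print(line)
```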

Examples of how to use the Intel Extension for PyTorch can be found here.

TensorFlow

Similar to PyTorch, an Intel Extension for TensorFlow exists. To prepare a TensorFlow environment for use with Intel GPUs, first create a new conda environment:

Codeblock
module load intel/2024.0.0
conda create -n intel_tensorflow_gpu python=3.9
conda activate intel_tensorflow_gpu

Once the new environment has been activated, the following commands install TensorFlow:

Codeblock
pip install tensorflow==2.14.0
pip install --upgrade intel-extension-for-tensorflow[xpu]

This installs TensorFlow together with its Intel extension, which is required to run operations on Intel GPUs (non-CUDA devices). On a compute node, you can check whether the GPUs are visible:

Codeblock

language	py

Python 3.9.18 (tags/v3.9.18-26-g6b320c3b2f6-dirty:6b320c3b2f6, Sep 28 2023, 00:35:27)
[GCC 13.2.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
Intel(R) Distribution for Python is brought to you by Intel Corporation.
Please check out: https://software.intel.com/en-us/python-distribution
>>> import tensorflow
2024-02-09 14:26:07.737940: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-02-09 14:26:07.740082: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 14:26:07.764245: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-09 14:26:07.764268: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-09 14:26:07.764290: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-09 14:26:07.769201: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-09 14:26:07.769345: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-09 14:26:08.459403: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-02-09 14:26:09.416471: I itex/core/wrapper/itex_gpu_wrapper.cc:35] Intel Extension for Tensorflow* GPU backend is loaded.
2024-02-09 14:26:09.457055: I itex/core/wrapper/itex_cpu_wrapper.cc:60] Intel Extension for Tensorflow* AVX512 CPU backend is loaded.
2024-02-09 14:26:09.551955: I itex/core/devices/gpu/itex_gpu_runtime.cc:129] Selected platform: Intel(R) Level-Zero
2024-02-09 14:26:09.552267: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552272: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552276: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552279: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552283: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552286: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552290: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
2024-02-09 14:26:09.552293: I itex/core/devices/gpu/itex_gpu_runtime.cc:154] number of sub-devices is zero, expose root device.
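The log above shows only indirectly that the GPUs were found. To list them explicitly, the devices registered by the Intel Extension for TensorFlow can be queried with tf.config.list_physical_devices. The sketch below assumes they are registered under the device type "XPU", as in Intel's extension documentation, and returns an empty list if TensorFlow is not installed; the function name is hypothetical.

```python
# Sketch: list the Intel GPUs that TensorFlow sees through the Intel
# Extension for TensorFlow. Assumption: devices use the type "XPU".
import os


def list_tf_xpu_devices():
    """Return TensorFlow's physical XPU devices, or [] without TensorFlow."""
    # hide TensorFlow's INFO log messages (only effective before the import)
    os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "1")
    try:
        import tensorflow as tf
    except ImportError:
        return []  # not running inside the intel_tensorflow_gpu environment
    return tf.config.list_physical_devices("XPU")


if __name__ == "__main__":
    devices = list_tf_xpu_devices()
    print(f"{len(devices)} XPU device(s):")
    for dev in devices:
        print(" ", dev.name)
```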

Examples of how to use the Intel extension for TensorFlow can be found here.

JAX

Note
Intel XPU support is still experimental for JAX.

Like PyTorch and TensorFlow, JAX has an Intel extension, based on OpenXLA. To prepare a JAX environment for use with Intel GPUs, first create a new conda environment:

Codeblock
module load intel/2024.0.0
conda create -n intel_jax_gpu python=3.9
conda activate intel_jax_gpu

Once the environment is activated, the following commands install JAX:

Codeblock
pip install jax==0.4.20 jaxlib==0.4.20
pip install --upgrade intel-extension-for-openxla

This installs JAX together with its Intel extension, which is required to run operations on Intel GPUs (non-CUDA devices). On a compute node, you can check whether the GPUs are visible:

Codeblock

language	py

Python 3.9.18 (main, Sep 11 2023, 13:41:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jax
>>> print("jax.local_devices(): ", jax.local_devices())
Platform 'xpu' is experimental and not all JAX functionality may be correctly supported!
jax.local_devices():  [xpu(id=0), xpu(id=1), xpu(id=2), xpu(id=3), xpu(id=4), xpu(id=5), xpu(id=6), xpu(id=7)]

...

Charge rates

Charge rates for the Slurm partitions can be found in Accounting. The PVC partitions are currently available free of charge.
