...
Popular tools such as PyTorch, TensorFlow, and JAX can be used with the Intel Distribution for Python (use the offline installer on the login nodes) together with certain framework-specific extensions. An environment can be prepared separately for each framework below for use with Intel GPUs. Note that the module intel/2024.0.0 (under sw.pvc) must be loaded for these frameworks to be installed or run properly.
Note: The latest Intel AI tools have specific Intel GPU driver requirements. Currently, only the PVC compute nodes meet these requirements.
We also offer a standalone module (intel_AI_tools/2024.0.0) that loads a conda installation with the following pre-installed, Intel GPU/XPU-ready environments:
intel_pytorch_2.1.0a0
intel_tensorflow_2.14.0
intel_jax_0.4.20
Note: PVC nodes currently run Rocky Linux 8, and so only Python versions <= 3.9 are supported.
Info: NumPy 2.0.0 breaks binary backwards compatibility. If NumPy-related runtime errors are encountered, please consider downgrading to a version < 2.0.0.
PyTorch
Load the Intel oneAPI module and create a new conda environment within your Intel Python distribution:
Code block:
module load intel/2024.0.0
conda create -n intel_pytorch_gpu python=3.9
conda activate intel_pytorch_gpu
Once the new environment has been activated, the following command installs PyTorch:
Code block:
python -m pip install torch==2.1.0a0 torchvision==0.16.0a0 torchaudio==2.1.0a0 intel-extension-for-pytorch==2.1.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
This installs PyTorch together with the Intel Extension for PyTorch, which is necessary to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be assessed:
...
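As a rough sketch (assuming the environment created above is active), the XPU devices exposed via the Intel Extension for PyTorch can be listed like this:
Code block:
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables the torch.xpu device type)

# Intel GPUs appear as "xpu" devices rather than "cuda" devices
print("XPU available:", torch.xpu.is_available())
print("XPU count    :", torch.xpu.device_count())
for i in range(torch.xpu.device_count()):
    print(f"  xpu:{i} ->", torch.xpu.get_device_name(i))

If no devices are reported, check that the intel/2024.0.0 module is loaded and that the job is running on a PVC node.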
Examples of how to use the Intel Extension for PyTorch can be found here.
TensorFlow
Similar to PyTorch, an Intel extension for TensorFlow exists. To prepare a TensorFlow environment for use with Intel GPUs, first create a new conda environment:
...
Code block:
pip install tensorflow==2.14.0
pip install --upgrade intel-extension-for-tensorflow[xpu]
...
This installs TensorFlow together with its Intel extension, which is necessary to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be assessed:
...
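As a rough sketch (assuming the environment created above is active), the XPU devices registered by the Intel Extension for TensorFlow can be listed like this:
Code block:
import tensorflow as tf

# The Intel extension registers Intel GPUs with TensorFlow as "XPU" devices
print("TensorFlow version:", tf.__version__)
print(tf.config.list_physical_devices("XPU"))
print(tf.config.list_logical_devices("XPU"))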
Examples of how to use the Intel Extension for TensorFlow can be found here.
JAX
Note: Intel XPU support is still experimental for JAX, as of version 0.4.20.
Like PyTorch and TensorFlow, JAX also has an extension via OpenXLA. To prepare a JAX environment for use with Intel GPUs, first create a new conda environment:
...
Once the environment is activated, the following commands install JAX:
Code block:
pip install numpy==1.24.4
pip install jax==0.4.20 jaxlib==0.4.20
pip install --upgrade intel-extension-for-openxla==0.2.1
This installs JAX together with its Intel extension, which is necessary to run non-CUDA operations on Intel GPUs. On a compute node, the presence of GPUs can be assessed:
...
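As a rough sketch (assuming the environment created above is active), the devices visible to JAX can be listed like this; with the OpenXLA extension installed, Intel GPUs are expected to appear as xpu devices:
Code block:
import jax

# With intel-extension-for-openxla installed, Intel GPUs are exposed as "xpu" devices
print(jax.devices())
print("local device count:", jax.local_device_count())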
Examples for using the Intel extension for JAX can be found here.
Distributed Training
Multi-GPU and multi-node jobs can be executed using the following strategy in a job submission script:
Code block:
module load intel/2024.0.0
module load impi
export CCL_ROOT=/sw/compiler/intel/oneapi/ccl/2021.12
export LD_LIBRARY_PATH=$I_MPI_ROOT/lib:$LD_LIBRARY_PATH
hnode=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_ADDR=$(scontrol getaddrs $hnode | cut -d' ' -f 2 | cut -d':' -f 1)
export MASTER_PORT=29500
It is advantageous to define the GPU tile usage (each Intel Max 1550 has two compute “tiles”) using affinity masks, wherein the format GPU_ID.TILE_ID (zero-based index) specifies which GPU(s) and tile(s) to use. E.g., to use two GPUs and four tiles, one can specify:
Code block:
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1
To use four GPUs and eight tiles, one would specify:
Code block:
export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
export ZE_AFFINITY_MASK=0.0,0.1,1.0,1.1,2.0,2.1,3.0,3.1
These specifications are applied to all nodes of a job. For more information, and alternative modes, please see the Intel Level Zero documentation.
Intel MPI can then be used to distribute and run your job, e.g.:
Code block:
mpirun -np 8 -ppn 8 your_exe your_exe_flags
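For a PyTorch workload, your_exe would typically be a Python script that initializes torch.distributed with the oneCCL ("ccl") backend. The following is only a minimal sketch, assuming the oneCCL bindings for PyTorch (oneccl_bindings_for_pytorch) are installed in the environment; it consumes the MASTER_ADDR/MASTER_PORT variables exported above and the PMI rank information provided by Intel MPI:
Code block:
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex      # noqa: F401  (enables the xpu device type)
import oneccl_bindings_for_pytorch              # noqa: F401  (registers the "ccl" backend)

# Intel MPI provides rank/size through PMI environment variables when launched via mpirun
os.environ.setdefault("RANK", os.environ.get("PMI_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", os.environ.get("PMI_SIZE", "1"))

# MASTER_ADDR and MASTER_PORT come from the job script shown above
dist.init_process_group(backend="ccl", init_method="env://")

rank = dist.get_rank()
device = torch.device(f"xpu:{rank % torch.xpu.device_count()}")

# Wrap a toy model in DistributedDataParallel; gradients are averaged across ranks via oneCCL
model = torch.nn.Linear(16, 16).to(device)
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

out = ddp_model(torch.randn(4, 16, device=device))
out.sum().backward()

dist.destroy_process_group()

The rank-to-device mapping shown here is purely illustrative; how GPUs and tiles are exposed as devices depends on the ZE_FLAT_DEVICE_HIERARCHY and ZE_AFFINITY_MASK settings described above.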