Preface
In September 2024, NHR@ZIB will replace the global file systems HOME and WORK on all systems of Lise in Berlin. This affects all login nodes and all compute partitions. Please be aware of the following activities.
For September 2024, we plan a 3-day maintenance window to switch to the new HOME and WORK file systems.
All data in the HOME file system will be copied by NHR@ZIB; users do not have to do anything for HOME.
Data in the WORK file system will not be copied automatically. Within a period of three weeks, users will have the opportunity to copy data from the old WORK to the new WORK file system.
Current migration state
| nodes | CentOS 7 | Rocky Linux 9 |
|---|---|---|
| login | blogin[1-6] | blogin[7-8] |
| compute (384 GB RAM) | 832 | 112 |
| compute (768 GB RAM) | 32 | 0 |
| compute (1536 GB RAM) | 2 | 0 |
Latest news
| date | subject |
|---|---|
| 2024-07-03 | official start of the migration phase with 2 login and 112 compute nodes running Rocky Linux 9 |
What has changed
SLURM partitions
| old partition name (CentOS 7) | new partition name (Rocky Linux 9) | current job limits |
|---|---|---|
| ● | ● | 40 nodes, 12 h wall time |
| ● | ● | 16 nodes, 1 h wall time |
| ● | ● | |
| ● | ● | |
| ● | | |
| ● | | |
| ● | ● | |

( ● available, ● closed/not available yet )
Software and environment modules
| | CentOS 7 | Rocky Linux 9 |
|---|---|---|
| OS components | glibc 2.17 | glibc 2.34 |
| | Python 3.6 | Python 3.9 |
| | GCC 4.8 | GCC 11.4 |
| | bash 4.2 | bash 5.1 |
| check disk quota | | |
| Environment modules version | 4.8 (Tmod) | 5.4 (Tmod) |
| Modules loaded initially | | |
| compiler modules | | |
| MPI modules | | |
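To see which side of the table a given login node is on, the component versions can be queried directly. A quick sketch; note that compilers provided via environment modules may report different versions than the base OS toolchain listed above:

```shell
# Print the OS release and the versions of the base components from the
# table (glibc, bash, Python, GCC). The command -v guards keep the script
# from aborting on systems where a tool is not installed.
head -n 2 /etc/os-release 2>/dev/null   # distribution name and version
ldd --version | head -n 1               # glibc version
bash --version | head -n 1              # bash version
command -v python3 >/dev/null && python3 --version
command -v gcc >/dev/null && gcc --version | head -n 1
true
```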
Shell environment variables
| CentOS 7 | Rocky Linux 9 |
|---|---|
| | (undefined, local |
| (undefined) | |
| (undefined) | |
| | |
What remains unchanged
- node hardware and node names
- communication network (Intel Omnipath)
- file systems (HOME, WORK, PERM) and disk quotas
- environment modules system (still based on Tcl, a.k.a. “Tmod”)
- access credentials (user IDs, SSH keys) and project IDs
- charge rates and CPU time accounting (early migrators’ jobs are free of charge)
- Lise’s Nvidia-A100 and Intel-PVC partitions
Special remarks
For users of SLURM’s `srun` job launcher: Open MPI 5.x has dropped support for the PMI-2 API; it depends solely on PMIx to bootstrap MPI processes. For this reason, the environment setting was changed from `SLURM_MPI_TYPE=pmi2` to `SLURM_MPI_TYPE=pmix`, so binaries linked against Open MPI can be started as usual “out of the box” using `srun mybinary`. A binary linked against Intel-MPI also works this way when a recent version (≥ 2021.11) of Intel-MPI was used. If an older version of Intel-MPI was used and relinking/recompiling is not possible, one can follow the workaround for PMI-2 with `srun` as described in the Q&A section below. Switching from `srun` to `mpirun` should also be considered.

Using more processes per node than available physical cores (PPN > 96; hyperthreads) with the OPX provider: The OPX provider currently does not support hyperthreads/PPN > 96 on the clx partitions; doing so may result in segmentation faults in libfabric during process startup. If a high number of PPN is really required, the libfabric provider has to be changed to PSM2 by setting `FI_PROVIDER=psm2`. Note that using hyperthreads may not be advisable; we encourage users to test performance before using more threads than there are physical cores.
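Expressed as job-script environment settings, the remarks above amount to the following sketch; the commented-out lines cover the fallback cases only, so verify against the Q&A section before relying on them:

```shell
# Default on the migrated (Rocky Linux 9) nodes: bootstrap MPI processes
# via PMIx, matching Open MPI 5.x, which no longer supports PMI-2.
export SLURM_MPI_TYPE=pmix

# Workaround for binaries built with Intel-MPI < 2021.11 that cannot be
# relinked: fall back to PMI-2 for this job only (see the Q&A section).
# export SLURM_MPI_TYPE=pmi2

# Only if PPN > 96 (hyperthreads) is really needed on the clx partitions:
# switch the libfabric provider from OPX to PSM2.
# export FI_PROVIDER=psm2

echo "SLURM_MPI_TYPE=$SLURM_MPI_TYPE"   # the binary then starts via: srun mybinary
```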
Action items for users
All users of Lise are recommended to
- log in to an already migrated login node (see the current migration state table) and get familiar with the new environment
- check self-compiled software for continued operability
- relink/recompile software as needed
- adapt and test job scripts and workflows
- submit test jobs to the new "cpu-clx:test" SLURM partition
- read the Q&A section and ask for support in case of further questions, problems, or software requests (support@nhr.zib.de)
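A test-job skeleton for the new partition could look like the following sketch; node count, task count, and wall time are illustrative placeholders to adjust within the current job limits:

```shell
#!/bin/bash
#SBATCH --partition=cpu-clx:test   # new SLURM test partition
#SBATCH --nodes=2                  # placeholder; stay within the current job limits
#SBATCH --ntasks-per-node=96       # 96 physical cores; PPN > 96 needs the PSM2 provider
#SBATCH --time=00:10:00
#SBATCH --job-name=rocky9-test

# PMIx is the default bootstrap on migrated nodes, so binaries linked
# against Open MPI 5.x start directly with srun.
srun hostname
```

Submitting with `sbatch` and checking that each node reports its hostname is a quick way to confirm that job scripts still work on the Rocky Linux 9 side.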