HLRN NHR provides tailored WORK file systems for improved IO throughput of IO-intensive job workloads.
...
WORK is the default shared file system for all jobs and can be accessed using the $WORK
variable. WORK is accessible to all users and consists of 8 Metadata Targets (MDTs) backed by NVMe SSDs, with 28 Object Storage Targets (OSTs) on Lise and 100 OSTs on Emmy handling the data. The OSTs on both systems use classical hard drives.
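To inspect the Lustre targets backing WORK, the standard Lustre client tools can be used; the following is a minimal check, assuming a login or compute node with the Lustre client installed:

```bash
# List the MDTs and OSTs backing $WORK together with their current usage
lfs df -h $WORK
```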
...
Large shared-file IO patterns in particular benefit from striping. Up to 28 OSTs on Lise and up to 100 OSTs on Emmy can be used; we recommend up to 8 OSTs on Lise and up to 32 OSTs on Emmy. We have preconfigured a progressive file layout (PFL), which sets an automatic striping based on the file size.
Access: create a new directory in $WORK
and set the stripe count with lfs setstripe -c <stripe_count> <dir>
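For example, a directory for large shared files could be striped across 8 OSTs on Lise; a minimal sketch, with a hypothetical directory name:

```bash
# Create a directory in WORK and stripe new files in it across 8 OSTs
mkdir -p $WORK/shared_output
lfs setstripe -c 8 $WORK/shared_output

# Verify the resulting layout
lfs getstripe $WORK/shared_output
```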
...
Some compute nodes are equipped with local SSD storage: up to 2 TB on Lise and 400 GB on Emmy.
...
These nodes share the following properties.
- SSD storage locally attached to the node (up to 2 TB on Lise, 400 GB on Emmy)
- Data on the SSD is deleted after the job finishes.
- Data on the SSD cannot be shared across nodes
...
For unshared, node-local IO this is the best-performing file system to use.
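The usual pattern is to stage input onto the local SSD, work there, and copy results back to WORK before the job ends, since the SSD is wiped afterwards. A minimal sketch of such a job script; the partition, file names, and application are placeholders:

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=standard96:ssd   # placeholder: any partition with node-local SSDs

# Stage input onto the node-local SSD
cp $WORK/input.dat $LOCAL_TMPDIR/
cd $LOCAL_TMPDIR

# Run with all temporary IO on the local SSD (placeholder application)
./my_app input.dat output.dat

# Copy results back to WORK; $LOCAL_TMPDIR is deleted after the job
cp output.dat $WORK/
```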
|  | Lise: SSD | Lise: CAS | Emmy: SSD | Emmy: NVMe |
|---|---|---|---|---|
| Slurm partition | standard96:ssd (cpu-clx:large), using $LOCAL_TMPDIR | large96 and huge96, using $LOCAL_TMPDIR | medium40, large40, using $LOCAL_TMPDIR | using $LOCAL_TMPDIR |
| Type and size | 2 TB Intel NVMe SSD DC P4511 | Intel NVMe SSD DC P4511 (2 TB) with Intel Optane SSD DC P4801X (200 GB) as write-through cache | Intel DC S4500 (400 GB) | Intel NVMe SSD (1 TB) |
FastIO
...
WORK is extended with 4 additional OSTs using NVMe SSDs to accelerate heavy (random) IO demands. To accelerate specific IO demands even further, striping across up to these 4 OSTs is available.
Access:
create a new directory in $WORK
and set lfs setstripe -p flash <dir>
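For example, to place a directory on the NVMe pool and stripe it across all 4 flash OSTs; a minimal sketch, with a hypothetical directory name:

```bash
# Assign the flash pool and stripe new files across all 4 NVMe OSTs
mkdir -p $WORK/fastio
lfs setstripe -p flash -c 4 $WORK/fastio

# Verify pool assignment and stripe count
lfs getstripe $WORK/fastio
```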
Size:
55 TiB (subject to quota)
...
IME - Emmy only
...
Using the burst buffer for random IO helps to avoid overloading the global file system, which would otherwise slow down job runtimes for all users. Besides the POSIX interface, a native API and an MPI-IO module are available for further acceleration (see the sketch below).
IME servers are currently available on Emmy only.
Access: ask support@hlrn.de for access
Size: 48 TiB
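As an illustration only: DDN IME installations typically ship a client-side staging tool (ime-ctl) next to the POSIX mount. The mount point, flags, and workflow below are assumptions, not confirmed details of the Emmy setup, so check with support before relying on them:

```bash
# Hypothetical IME staging workflow; paths and flags are assumptions
ime-ctl --prefetch /mnt/ime/$USER/input.dat   # stage input into the burst buffer
./my_app /mnt/ime/$USER/input.dat             # placeholder application using the POSIX interface
ime-ctl --sync /mnt/ime/$USER/output.dat      # flush results back to the backing file system
```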
Finding the right File System
If your jobs have a significant IO part, we recommend asking your consultant via support@nhr.zib.de to recommend the right file system for you.
...
If you have a significant amount of node-local IO that does not need to be accessed after the job ends and fits within 2 TB on Lise or 400 GB on Emmy, we recommend using $LOCAL_TMPDIR. Depending on your IO pattern, this may accelerate IO by up to 100%.
...
Global IO is defined as shared IO which can be accessed from multiple nodes at the same time and persists after the job ends.
Especially random IO on small files can be accelerated by up to 200% using FastIO on Lise or IME on Emmy.