...
- Write intermediate results and checkpoints as seldom as possible.
- Try to write/read larger data volumes (>1 MiB) and reduce the number of files concurrently managed in WORK.
- For inter-process communication use proper protocols (e.g. MPI) instead of files in WORK.
- If you want to control your jobs externally, consider to use POSIX signals, instead of using files frequently opened/read/closed by your program. You can send signals e.g. to batch jobs via "scancel --signal..."
- Use MPI-IO to coordinate your I/O instead of each MPI task doing individual POSIX I/O (HDF5 and netCDF make may help you with this).
For some of the codes we are aware of certain issues:
...