
GROMACS is a versatile package to perform molecular dynamics for systems with hundreds to millions of particles.

...

  • GROMACS provides extremely high performance compared to all other programs.

  • GROMACS can make simultaneous use of both CPU and GPU available in a system. There are options to statically and dynamically balance the load between the different resources.

  • GROMACS is user-friendly, with topologies and parameter files written in clear text format.

  • Both run input files and trajectories are independent of hardware endianness and can thus be read by any version of GROMACS.

  • GROMACS comes with a large selection of flexible tools for trajectory analysis.

  • GROMACS can be run in parallel, using the standard MPI communication protocol.

  • GROMACS contains several state-of-the-art algorithms.

  • GROMACS is Free Software, available under the GNU Lesser General Public License (LGPL).

Weaknesses

  • To achieve very high simulation speed, GROMACS performs little additional analysis on the fly.

  • It can sometimes be challenging to extract non-standard information about the simulated system.

  • Different versions sometimes differ in default parameters/methods, so reproducing simulations from an older version with a newer one can be difficult.

  • Additional tools and utilities provided by GROMACS are sometimes not of the highest quality.

...

Version | Module file | Thread-MPI (gmx) | MPI (gmx_mpi) | PLUMED (gmx_mpi_plumed) | Prerequisites

CPU CLX partition

2021.7 | gromacs/2021.7 | no | yes | no | impi/2021.13
2023.0 | gromacs/2023.0 | no | yes | yes | impi/2021.13

CPU Genoa partition

GPU A100 partition

2022.5 | gromacs/2022.5 | no | yes | yes | gcc/11.3.0 cuda/11.8 openmpi/gcc.11/4.1.4
2023.0 | gromacs/2023.0_tmpi | yes | no | no | gcc/11.3.0 intel/2023.0.0 cuda/11.8
2024.0 | gromacs/2024.0_tmpi | yes | no | no | gcc/11.3.0 intel/2023.0.0 cuda/12.3

GPU PVC partition
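
For example, following the CPU CLX rows of the table above, the 2023.0 module can be loaded together with its Intel MPI prerequisite (a minimal illustration; the version check at the end only confirms that the binary is available):

Codeblock
languagebash
# load the prerequisite listed in the table first, then the GROMACS module (CPU CLX example)
module load impi/2021.13
module load gromacs/2023.0

# confirm that the MPI-enabled binary is on PATH
gmx_mpi --version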

Version | Installation path | Module file | Compiler | Comment

Modules for running on CPUs

Deprecated versions

2018.4 | /sw/chem/gromacs/2018.4/skl/impi | gromacs/2018.4 | intelmpi |
2018.4 | /sw/chem/gromacs/2018.4/skl/impi-plumed | gromacs/2018.4-plumed | intelmpi | with plumed
2019.6 | /sw/chem/gromacs/2019.6/skl/impi | gromacs/2019.6 | intelmpi |
2019.6 | /sw/chem/gromacs/2019.6/skl/impi-plumed | gromacs/2019.6-plumed | intelmpi | with plumed
2021.2 | /sw/chem/gromacs/2021.2/skl/impi | gromacs/2021.2 | intelmpi |
2021.2 | /sw/chem/gromacs/2021.2/skl/impi-plumed | gromacs/2021.2-plumed | intelmpi | with plumed
2022.5 | /sw/chem/gromacs/2022.5/skl/impi | gromacs/2022.5 | intelmpi |
2022.5 | /sw/chem/gromacs/2022.5/skl/impi-plumed | gromacs/2022.5-plumed | intelmpi | with plumed
2023.0 | /sw/chem/gromacs/2023/clx.el9/mpi & /sw/chem/gromacs/2023/clx.el9/mpi_plumed | gromacs/2023.0 | gcc/g++ with Intel MPI | contains normal gmx_mpi binary and PLUMED-patched gmx_mpi_plumed binary

Modules for running on GPUs

2022.5 | /sw/chem/gromacs/2022.5/a100/impi | gromacs/2022.5 | gcc/g++ with Intel MPI | contains normal gmx_mpi binary and PLUMED-patched gmx_mpi_plumed binary
2023.0 | /sw/chem/gromacs/2023.0/a100/tmpi_gcc | gromacs/2023.0_tmpi | |
2024.0 | /sw/chem/gromacs/2024.0/a100/tmpi | gromacs/2024.0_tmpi | |

*Release notes can be found here

These modules can be loaded with the module load command. Note that the Intel MPI module file must be loaded first:

module load impi/2019.5 gromacs/2019.6

This provides access to the gmx_mpi binary, which can be used to run simulations via sub-commands such as gmx_mpi mdrun.

To run simulations, an MPI launcher such as mpirun should be used:

mpirun gmx_mpi mdrun MDRUNARGUMENTS
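
For instance, with typical mdrun arguments (the input file name, output prefix, and wall-time limit below are illustrative, not prescribed by this documentation):

mpirun gmx_mpi mdrun -s topol.tpr -deffnm md -maxh 12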

To load the GPU-enabled version (available only on the bgn nodes), the required compiler and CUDA module files must be loaded before the GROMACS module:

module load gcc/11.3.0 intel/2023.0.0 cuda/11.8 gromacs/2023.0_tmpi

Submission script examples

Simple CPU job script 

A simple case of a GROMACS job using a total of 640 CPU cores for 12 hours. The requested number of cores in this example does not fill all available cores on the allocated nodes: the job will execute 92 ranks on each of 3 nodes plus 91 ranks on each of 4 nodes. Use this example if you know the exact number of ranks you want to use.

Codeblock
languagebash
#!/bin/bash
#SBATCH -t 12:00:00
#SBATCH -p standard96
#SBATCH -n 640

export SLURM_CPU_BIND=none

module load impi/2019.5
module load gromacs/2019.6

mpirun gmx_mpi mdrun MDRUNARGUMENTS
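
Assuming the script above has been saved to a file (the name below is only an example), it can be submitted and monitored with the usual Slurm commands:

Codeblock
languagebash
# submit the job script and check its status (file name is illustrative)
sbatch gromacs_cpu.slurm
squeue -u $USER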

Whole node CPU job script

If you want to use all cores on the allocated nodes, the batch system offers other options to request the number of nodes and the number of tasks per node. The example below will result in 672 ranks.

Codeblock
languagebash
#!/bin/bash
#SBATCH -t 12:00:00
#SBATCH -p standard96
#SBATCH -N 7
#SBATCH --ntasks-per-node=96

export SLURM_CPU_BIND=none

module load impi/2019.5
module load gromacs/2019.6

mpirun gmx_mpi mdrun MDRUNARGUMENTS

...

GPU job script

The following script uses four thread-MPI ranks, one of which is dedicated to the long-range PME calculation. With the -gputasks 0001 keyword, the first three ranks offload their short-range non-bonded calculations to the GPU with ID 0, while the fourth (PME) rank offloads its calculations to the GPU with ID 1.

Codeblock
languagebash
#!/bin/bash 
#SBATCH --time=12:00:00
#SBATCH --partition=gpu-a100
#SBATCH --ntasks=72

export SLURM_CPU_BIND=none

module load gcc/11.3.0 intel/2023.0.0 cuda/11.8
module load gromacs/2023.0_tmpi

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true

export OMP_NUM_THREADS=9

gmx mdrun -ntomp 9 -ntmpi 4 -nb gpu -pme gpu -npme 1 -gputasks 0001 OTHER MDRUNARGUMENTS
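
As a variation (illustrative, not taken from this documentation): each digit of the -gputasks string is the GPU ID used by the corresponding GPU task, so with the same four thread-MPI ranks but two GPUs the work can be split across both devices:

Codeblock
languagebash
# two ranks on GPU 0, two ranks (including the PME rank) on GPU 1
gmx mdrun -ntomp 9 -ntmpi 4 -nb gpu -pme gpu -npme 1 -gputasks 0011 OTHER MDRUNARGUMENTS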

If you are using an MPI build of GPU-accelerated GROMACS (non-thread-MPI, e.g. to take advantage of PLUMED), you can proceed in a similar fashion, but use the mpirun/mpiexec task launcher in front of the GROMACS binary. An example job script asking for 2 A100 GPUs across 2 nodes is shown below:

Codeblock
languagebash
#!/bin/bash 
#SBATCH --time=12:00:00
#SBATCH --partition=gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

export SLURM_CPU_BIND=none

module load gcc/11.3.0 cuda/11.8 openmpi/gcc.11/4.1.4
module load gromacs/2022.5

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true
export GMX_ENABLE_DIRECT_GPU_COMM=true

export OMP_NUM_THREADS=9

mpiexec -np 8 -npernode 4 gmx_mpi mdrun -ntomp 9 -nb gpu -pme gpu -npme 1 -gpu_id 01 OTHER MDRUNARGUMENTS

...


Whole node GPU job script

To set up a whole-node GPU job, use the -gputasks keyword.

Codeblock
languagebash
#!/bin/bash 
#SBATCH --time=12:00:00
#SBATCH --partition=gpu-a100
#SBATCH --ntasks=72

export SLURM_CPU_BIND=none

module load gcc/11.3.0 intel/2023.0.0 cuda/11.8
module load gromacs/2023.0_tmpi

export GMX_GPU_DD_COMMS=true
export GMX_GPU_PME_PP_COMMS=true

export OMP_NUM_THREADS=9

gmx mdrun -ntomp 9 -ntmpi 16 -gputasks 0000111122223333 MDRUNARGUMENTS

Note: The settings for thread-MPI ranks and OpenMP threads above are chosen to achieve optimal performance. The number of ranks should be a multiple of the number of sockets, and the number of cores per node should be a multiple of the number of threads per rank.
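
As a quick sanity check of these rules, the sketch below verifies a candidate layout (the socket and core counts are placeholder values, not a statement about any particular partition):

Codeblock
languagebash
# illustrative check of the rank/thread layout rules stated above
SOCKETS=2                             # sockets per node (placeholder)
CORES_PER_NODE=96                     # physical cores per node (placeholder)
NTMPI=4                               # thread-MPI ranks per node
NTOMP=$(( CORES_PER_NODE / NTMPI ))   # OpenMP threads per rank

if (( NTMPI % SOCKETS == 0 )) && (( CORES_PER_NODE % NTOMP == 0 )); then
    echo "ntmpi=$NTMPI, ntomp=$NTOMP fits the layout rules"
else
    echo "adjust ntmpi/ntomp: ranks should be a multiple of $SOCKETS sockets"
fi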

Related Modules

Gromacs-Plumed

PLUMED is an open-source, community-developed library that provides a wide range of different methods, such as enhanced-sampling algorithms, free-energy methods and tools to analyze the vast amounts of data produced by molecular dynamics (MD) simulations. PLUMED works together with some of the most popular MD engines.


Since the migration of the CPU partition from CentOS to Rocky 9 Linux, all GROMACS-PLUMED modules have been combined with the normal GROMACS modules. For example, to use GROMACS 2023.0 with PLUMED, one can load gromacs/2023.0 and have access to both the normal (gmx_mpi) and the PLUMED-patched (gmx_mpi_plumed) binaries.

PLUMED can be used to bias GROMACS simulations by supplying an appropriate PLUMED data file via the -plumed option of the gmx_mpi_plumed mdrun command:

Codeblock
languagebash
#!/bin/bash 
#SBATCH --time=12:00:00
#SBATCH --partition=cpu-clx
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=72

export SLURM_CPU_BIND=none

module load gcc/11.3.0 cuda/11.8 openmpi/gcc.11/4.1.4
module load gromacs/2022.5

export OMP_NUM_THREADS=2

mpiexec -np 144 -npernode 72 gmx_mpi mdrun -ntomp 2 -npme 1 -pin on -plumed plumed.dat OTHER MDRUNARGUMENTS
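
For completeness, a minimal PLUMED input might look as follows (a sketch only; the atom indices, file names, and monitored distance are illustrative and not part of this documentation):

Codeblock
languagebash
# write a minimal PLUMED input that monitors a single distance (illustrative values)
cat > plumed.dat << 'EOF'
d1: DISTANCE ATOMS=1,10              # collective variable: distance between atoms 1 and 10
PRINT ARG=d1 FILE=COLVAR STRIDE=100  # write the value to COLVAR every 100 steps
EOF

# pass it to the PLUMED-patched binary via -plumed, as in the job script above
mpirun gmx_mpi_plumed mdrun -plumed plumed.dat MDRUNARGUMENTS
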
Note

Not every GROMACS GPU option is compatible with PLUMED. For example, -update gpu, which can greatly accelerate plain GROMACS runs by forcing the integration update to run on the GPU, leads to incorrect results when combined with PLUMED (https://github.com/plumed/plumed2/commit/20f8be272efa268a31af65c56c3c71af8c13402c#diff-f21f46830c34c99766c30157316251b8354efbf1f6359f18b961c7339af97a77R2144). Please familiarize yourself with the PLUMED/GROMACS/GPU limitations, watch for warnings in the output, and report any issues to NHR@ZIB staff for assistance.

For additional information about PLUMED, please visit the official website.

Analyzing results

GROMACS Tools

...

Turbo-boost has been mostly disabled on Emmy at GWDG (partitions medium40, large40, standard96, large96, and huge96) in order to save energy. However, this has a particularly strong performance impact on GROMACS, in the range of 20-40%. We therefore recommend submitting GROMACS jobs with turbo-boost enabled by passing the --constraint=turbo_on option to srun or sbatch.
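
The constraint can be added either to the job script header or to the sbatch call (the job script name below is illustrative):

Codeblock
languagebash
# either request turbo-boost in the job script header ...
#SBATCH --constraint=turbo_on

# ... or pass it on the command line when submitting
sbatch --constraint=turbo_on gromacs_job.slurm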

Useful links

References

  1. GROMACS User-Guide

  2. PLUMED Home