2. MPSD HPC system#

Note

Following a major upgrade in early 2023, the documentation and functionality may be suboptimal in places. Please do not hesitate to get in touch with questions and error reports.

For the Raven and Ada machines, please check Overview computing services.

2.1. Login nodes#

Login nodes are mpsd-hpc-login1.desy.de and mpsd-hpc-login2.desy.de.

If you do not yet have access to the system and are a member of the Max Planck Institute for the Structure and Dynamics of Matter (MPSD), please request access to the MPSD HPC system by emailing MPSD IT at support[at]mpsd[dot]mpg[dot]de. Please provide your DESY username in the email and send it from your Max Planck email account.

Note

During the first login your home directory will be created. This could take up to a minute. Please be patient.

2.2. Job submission#

Job submission is via Slurm.

Example Slurm submission scripts are available below (Example batch scripts).
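
As a first orientation, a submission script generally has the following shape (a sketch only: the job name, resource values and program are placeholders; complete, tested scripts are provided in Example batch scripts):

#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=public        # partition to use (default: public)
#SBATCH --time=0-01:00:00         # wall-clock limit (days-hours:minutes:seconds)
#SBATCH --ntasks=1                # number of (MPI) tasks
#SBATCH --cpus-per-task=1         # CPUs (in Slurm terminology) per task
#SBATCH --mem=4000                # memory per node in MB

./my_program

Such a script, saved for example as my_job.sh, is submitted with sbatch my_job.sh and monitored with squeue --me.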

2.2.1. Partitions#

The following partitions are available to all (partial output from sinfo):

PARTITION  AVAIL  TIMELIMIT  NODES NODELIST
bigmem        up 7-00:00:00      8 mpsd-hpc-hp-[001-008]
gpu           up 7-00:00:00      2 mpsd-hpc-gpu-[001-002]
gpu-ayyer     up 7-00:00:00      3 mpsd-hpc-gpu-[003-005]
public*       up 7-00:00:00     49 mpsd-hpc-ibm-[001-030,035-036,043-049,053-062]
public2       up 7-00:00:00     63 mpsd-hpc-pizza-[001-063]

Please use the machines in the gpu partition only if your code supports NVIDIA CUDA.

Hardware resources per node:

  • public

    • 16 physical cores (no hyperthreading, 16 CPUs in Slurm terminology)

    • 64GB RAM

    • at most 2 nodes can be used for multi-node jobs (as the 10 Gbit/s Ethernet used for MPI has a relatively high latency)

    • microarchitecture: sandybridge

  • public2

    • 40 physical cores (80 with hyperthreading, 80 CPUs in Slurm terminology)

    • at least 256GB RAM (some nodes have up to 768GB RAM)

    • only single-node jobs are permitted (as the 1 Gbit/s Ethernet is inefficient for MPI jobs across nodes)

    • microarchitecture: broadwell

  • bigmem

    • 96 physical cores (192 with hyperthreading, 192 CPUs in Slurm terminology)

    • 2TB RAM

    • fast FDR infiniband for MPI communication

    • microarchitecture: broadwell

  • gpu

    • 16 physical cores (32 with hyperthreading, 32 CPUs in Slurm terminology)

    • 1.5TB RAM

    • fast FDR infiniband for MPI communication

    • 8 Tesla V100 GPUs

    • microarchitecture: skylake_avx512

  • gpu-ayyer

    • 16 physical cores (32 with hyperthreading, 32 CPUs in Slurm terminology)

    • 372GB RAM

    • fast FDR infiniband for MPI communication

    • 4 Tesla V100 GPUs

    • microarchitecture: cascadelake
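
To place a job in a particular partition, the partition (and, for GPU jobs, the GPUs) is requested explicitly. As a sketch (the resource values are placeholders; the same flags also work with sbatch):

user@mpsd-hpc-login1:~$ salloc -p bigmem --cpus-per-task=12
user@mpsd-hpc-login1:~$ salloc --partition gpu --gpus 1 --cpus-per-gpu 4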

2.2.2. Slurm default values#

Slurm defaults are an execution time of 1 hour, one task, one CPU, 4000MB of RAM, and the public partition.

The maximum runtime for any job is 7 days. Interactive jobs are limited to 12 hours.

A node is a physical multi-core shared-memory computer and can (by default) be shared between multiple users, i.e. use is not exclusive unless requested.

Logging onto nodes via ssh is only possible once the nodes are allocated (either via sbatch when the job starts, or using salloc for interactive use, see Interactive use of HPC nodes). This avoids accidental over-use of resources and enables energy saving measures (such as switching compute nodes off automatically if they are not in use).
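
For example, one possible way to open an additional shell on a node of a running job is the following sketch (the node name is an example and has to be taken from the squeue output):

user@mpsd-hpc-login1:~$ squeue --me               # the NODELIST column shows the allocated node(s)
user@mpsd-hpc-login1:~$ ssh mpsd-hpc-ibm-058      # works only while the job is running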

2.2.3. Slurm CPUs#

Whereas in everyday language we would consider a “CPU” to be the entire processor package (i.e. the device attached to the motherboard socket), in Slurm terminology a “CPU” is a computational core (or a thread if hyperthreading is configured). This is sometimes called a “logical core”. A compute node with an 8-core processor and Simultaneous Multithreading (Hyperthreading) would appear to Slurm as a node with “16 CPUs”.

As this document refers to Slurm and its various commands, we use the Slurm terminology throughout.
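
For example, Slurm's view of a node can be inspected with scontrol (a sketch; mpsd-hpc-pizza-001 is one of the public2 nodes listed above, output omitted):

user@mpsd-hpc-login1:~$ scontrol show node mpsd-hpc-pizza-001

The fields CPUTot, CoresPerSocket and ThreadsPerCore in the output show how the CPU count of a node is composed.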

2.2.4. Interactive use of HPC nodes#

For production computation, we typically write a batch file (see Example batch scripts), and submit these using the sbatch command.

Sometimes, it can be helpful to login into an HPC node for example to compile software or run interactive tests. The command to use in this case is salloc.

For example, requesting a job with all default settings:

user@mpsd-hpc-login1:~$ salloc
salloc: Granted job allocation 1272
user@mpsd-hpc-ibm-058:~$

We can see from the prompt (user@mpsd-hpc-ibm-058:~$) that the Slurm system has allocated the requested resources on node mpsd-hpc-ibm-058 to us.

We can use the mpsd-show-job-resources command to check some details of the allocation:

user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
 345415 Nodes: mpsd-hpc-ibm-058
 345415 Local Node: mpsd-hpc-ibm-058

 345415 CPUSET: 0
 345415 MEMORY: 4000 M

Here we see (CPUSET: 0) that we have been allocated one CPU (in Slurm terminology) and that this CPU has index 0. If we had requested multiple CPUs, we would find multiple indices displayed (see below).

We can finish our interactive session by typing exit:

user@mpsd-hpc-ibm-058:~$ exit
exit
salloc: Relinquishing job allocation 1272
user@mpsd-hpc-login1:~$

Using a tmux session while working interactively is advisable, as it allows you to get back to the terminal in case you lose connection to the session (e.g. due to network issues). A quick start for tmux can be found at tmux/tmux.
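
A minimal tmux workflow could look like this sketch (hpc is just an example session name):

user@mpsd-hpc-login1:~$ tmux new -s hpc       # start a named session, then run salloc etc. inside it
user@mpsd-hpc-login1:~$ tmux attach -t hpc    # reattach after a lost connection (detach manually with Ctrl-b d)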

If we desire exclusive use of a node (i.e. not shared with others), we can use salloc --exclusive (here we request a session time of 120 minutes):

user@mpsd-hpc-login2:~$ salloc  --exclusive --time=120
salloc: Granted job allocation 1279
user@mpsd-hpc-ibm-061:~$ mpsd-show-job-resources
  65911 Nodes: mpsd-hpc-ibm-061
  65911 Local Node: mpsd-hpc-ibm-061

  65911 CPUSET: 0-15
  65911 MEMORY: 56000 M

We can see (in the output above) that all 16 CPUs of the node are allocated to us.

Assume we need 16 CPUs and 10 GB of RAM for our interactive session (the 16 CPUs correspond to the number of OpenMP threads, see OpenMP):

user@mpsd-hpc-login1:~$ salloc --mem=10000 --cpus-per-task=16
salloc: Granted job allocation 1273
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
 345446 Nodes: mpsd-hpc-ibm-058
 345446 Local Node: mpsd-hpc-ibm-058

 345446 CPUSET: 0-15
 345446 MEMORY: 10000 M
user@mpsd-hpc-ibm-058:~$
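
For an OpenMP program, a common pattern is to derive the number of threads from the allocation before starting the program. A sketch (my_openmp_program is a placeholder):

user@mpsd-hpc-ibm-058:~$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # 16 in this allocation
user@mpsd-hpc-ibm-058:~$ ./my_openmp_program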

If we execute MPI programs, we can specify the number of nodes (a node is a physical computer, typically with one, two or four CPU sockets) and how many (MPI) tasks (= processes) we want to run on each node. Imagine we ask for two nodes and want to run 4 MPI processes on each:

user@mpsd-hpc-login1:~$ salloc --nodes=2 --tasks-per-node=4
salloc: Granted job allocation 1276
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
 345591 Nodes: mpsd-hpc-ibm-[058-059]
 345591 Local Node: mpsd-hpc-ibm-058

 345591 CPUSET: 0-3
 345591 MEMORY: 14000 M
user@mpsd-hpc-ibm-058:~$ srun hostname
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-059
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058
mpsd-hpc-ibm-058

The srun command starts the execution of our (MPI) tasks. We use the hostname command above and can see that it is executed 4 times on each node.

Jobs default to the public partition, but specifying -p followed by a partition name directs them to a different partition.

user@mpsd-hpc-login1:~$ salloc --mem=1000 -p bigmem --cpus-per-task=12
salloc: Granted job allocation 1277
salloc: Waiting for resource configuration
salloc: Nodes mpsd-hpc-hp-002 are ready for job
user@mpsd-hpc-hp-002:~$ mpsd-show-job-resources
  32114 Nodes: mpsd-hpc-hp-002
  32114 Local Node: mpsd-hpc-hp-002

  32114 CPUSET: 48-53,144-149
  32114 MEMORY: 1000 M

This allocates 12 CPUs and 1000 MB of memory on a node in the bigmem partition for the job.

2.2.5. Finding out about my jobs#

There are multiple ways of finding out about your Slurm jobs:

  • squeue --me lists only your jobs (see below for output)

  • mpsd-show-job-resources can be used ‘inside’ the job (to verify hardware allocation is as desired)

  • scontrol show job JOBID provides a lot of detail

Example: We request 2 nodes, with 4 tasks per node (and by default one CPU per task)

user@mpsd-hpc-login1:~$ squeue --me
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
user@mpsd-hpc-login1:~$ salloc --nodes=2 --tasks-per-node=4
salloc: Granted job allocation 1276
user@mpsd-hpc-ibm-058:~$ squeue --me
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1276    public interact  user  R      11:37      2 mpsd-hpc-ibm-[058-059]
user@mpsd-hpc-ibm-058:~$ mpsd-show-job-resources
 345591 Nodes: mpsd-hpc-ibm-[058-059]
 345591 Local Node: mpsd-hpc-ibm-058

 345591 CPUSET: 0-3
 345591 MEMORY: 14000 M
user@mpsd-hpc-ibm-058:~$ scontrol show job 1276
JobId=1276 JobName=interactive
   UserId=user(28479) GroupId=cfel(3512) MCS_label=N/A
   <...>
   RunTime=00:14:08 TimeLimit=01:00:00 TimeMin=N/A
   Partition=public AllocNode:Sid=mpsd-hpc-login1.desy.de:3116660
   NodeList=mpsd-hpc-ibm-[058-059]
   BatchHost=mpsd-hpc-ibm-058
   NumNodes=2 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=8,mem=28000M,node=2,billing=8
   Socks/Node=* NtasksPerN:B:S:C=4:0:*:1 CoreSpec=*
   MinCPUsNode=4 MinMemoryCPU=4000M MinTmpDiskNode=0
   <...>
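
In addition, the standard Slurm commands can be useful, for example (a sketch; 1276 is the job ID from above):

user@mpsd-hpc-login1:~$ scancel 1276                                 # cancel the job
user@mpsd-hpc-login1:~$ sacct -j 1276 --format=JobID,Elapsed,State   # accounting information, also for finished jobs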

2.3. Storage and quotas#

The MPSD HPC system provides two file systems: /home and /scratch:

/home/$USER ($HOME)

  • home file system for code and scripts

  • user quota (storage limit): 100 GB

  • regular backups

  • users have access to the backup of their data under $HOME/.zfs/snapshots

/scratch/$USER

  • scratch file system for simulation output and other temporary data

  • there are no backups for /scratch: hardware error or human error can lead to data loss.

  • A per-user quota of (by default) 25TB is applied. This is in place to prevent jobs that (unintentionally) write arbitrary amounts of data to /scratch from filling up the file system and blocking the system for everyone.

  • The following policy is applied to manage overall usage of /scratch:

    If /scratch fills up, the cluster becomes unusable. Should this happen, we will make space available through the following actions:

    1. purchase and installation of additional hardware to increase the storage available on /scratch (if funding and other constraints allow this)

    2. ask users to voluntarily reduce their usage of /scratch (by, for example, deleting some data, or archiving completed projects elsewhere)

    3. if 1. and 2. do not resolve the situation, a script will be started that deletes some of the files on /scratch (starting with the oldest files). Notice will be given of this procedure.

You can view your current file system usage using the mpsd-quota command. Example output:

username@mpsd-hpc-login2:~$ mpsd-quota
location                        used          avail            use%
/home/username               8.74 GB       98.63 GB           8.86%
/scratch/username          705.27 GB       25.00 TB           2.82%

Recommendation for usage of /home/$USER and /scratch/$USER:

Put small files and important data into /home/$USER. For example source code, scripts to compile your source, compiled software, scripts to submit jobs to slurm, post processing scripts, and perhaps also small data files.

Put simulation output (in particular if the files are large) into /scratch/$USER. All the data in /scratch should be re-computable with reasonable effort (typically by running scripts stored somewhere in /home/$USER). This re-computation may be needed if data loss occurs on /scratch, if the hardware is retired, or if data needs to be deleted because /scratch runs out of space.

Note

To facilitate the joint analysis of data, the /home and /scratch directories are set up such that all users can read all directories of all other users. If you want to keep your data in subfolder DIR private, you should run a command like chmod -R og-rx DIR.

The permissions on /home/$USER and /scratch/$USER are such that other users can enter your directory but not run ls in it (i.e. they cannot see what files and directories you have). To share data with someone else you need to tell them the full path to the relevant data. (By default this does not apply to subfolders you create: once others know the path to a subfolder, they can list and read all content inside it.)
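
For example, to keep one subfolder private while sharing another with a colleague, commands along these lines can be used (a sketch; the directory names are placeholders):

user@mpsd-hpc-login1:~$ chmod -R og-rx /scratch/$USER/private-data   # keep this subfolder private
colleague@mpsd-hpc-login1:~$ ls /scratch/user/shared-results         # a colleague can read a subfolder once they know its full path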

2.4. Software#

The software on the MPSD HPC system is provided via environment modules. This facilitates providing different versions of the same software. The software is organised in a hierarchical structure.

First, you need to decide which MPSD software environment version you need. These are named according to calendar years: the most recent one is 24a. We select that version using the mpsd-modules command, for example mpsd-modules 24a.

In order to use a module we first have to load a base compiler and, where relevant, an MPI implementation. That way we can choose between different compilers and MPI implementations for a given piece of software. More details are given below.

From a high-level perspective, the required steps to use a particular module are:

  1. Activate the MPSD software environment version of modules

  2. Search for the module to find available versions and required base modules

  3. Load required base modules (such as a compiler)

  4. Load the desired module
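
Put together, a typical sequence looks like this sketch (the module names are examples that appear later in this document; output omitted):

user@mpsd-hpc-login1:~$ mpsd-modules 24a                        # step 1: activate the 24a software environment
user@mpsd-hpc-login1:~$ module spider fftw                      # step 2: find versions and required base modules
user@mpsd-hpc-login1:~$ module load gcc/11.3.0 openmpi/4.1.4    # step 3: load the base modules
user@mpsd-hpc-login1:~$ module load fftw/3.3.10                 # step 4: load the desired module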

2.4.1. TLDR#

  • Modules are organised in a hierarchical structure with compiler and MPI implementation as base modules.

  • Modules may be compiled with different feature sets. Use mpsd-modules for switching.

    • a generic feature set (runs on all nodes), activated by default

      user@mpsd-hpc-login1:~$ mpsd-modules 24a
      
    • architecture-dependent feature sets (depending on the CPU microarchitecture $MPSD_MICROARCH of the nodes, a suitable optimised set is automatically selected when using the option native)

      user@mpsd-hpc-login1:~$ mpsd-modules 24a native
      

      For more options refer to mpsd-modules --help.

  • To subsequently find and load modules:

    • module avail

    • module spider <module-name>

    • module load <module1> [<module2> ...]

  • toolchain modules replicate easybuild toolchains (and add a few generic packages such as cmake and ninja)

  • octopus-dependencies modules depend on the toolchain modules and contain additional required and optional dependencies to compile Octopus. (Prior to MPSD software release 24a the split into toolchain and octopus-dependencies did not exist; instead a single toolchains module was available to load all modules of both categories.)

  • Configure wrapper scripts to compile Octopus are available under /opt_mpsd/linux-debian11/24a/spack-environments/octopus/. The path of the matching configure wrapper script is also available in the environment variable $MPSD_OCTOPUS_CONFIGURE once a toolchain is loaded.

  • Once a module X is loaded, the environment variable $MPSD_X_ROOT provides the location of the module’s files. For example:

    user@mpsd-hpc-login1:~$ module load gcc/11.3.0 gsl/2.7.1
    user@mpsd-hpc-login1:~$ echo $MPSD_GSL_ROOT
    /opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/gsl-2.7.1-4zajlwxv4rm2mjkjoouvujth6lorbcm6
    
  • Set the rpath for dependencies; do not use LD_LIBRARY_PATH. See Setting the rpath (finding libraries at runtime).

  • If you compile software with CMake you may run into problems with a missing rpath in the resulting binary. If you face that problem you can unset CPATH and unset LIBRARY_PATH as a workaround.

2.4.2. Initial setup#

The MPSD HPC system consists of a heterogeneous set of compute nodes with different CPU features. This is reflected in the available software stack by providing both a generic set of modules that can be used on all nodes as well as specialised sets of modules for the different available (hardware) microarchitectures. The latter will only run on certain nodes.

A versioning scheme is used for the MPSD software environment to improve reproducibility. Currently, all software is available in the 24a release (i.e. the first release in 2024). Additional modules will be added to this environment as long as they do not break anything. Therefore, users should always specify the version of the modules they use (even if only a single version is available). A new release will be made if any addition/change would break backwards compatibility.

The heterogeneous setup makes it necessary to first add an additional path where module files can be found. To activate the different sets of modules we can use mpsd-modules. The function takes two arguments: the release number (of the MPSD software environment, mandatory) and the feature set (optional, the generic set is used by default). Calling mpsd-modules list lists all available releases, mpsd-modules <release number> list lists all available feature sets. Calling mpsd-modules --help will show help and list available options. The microarchitecture of each node is stored in the environment variable $MPSD_MICROARCH (and can also be obtained via archspec cpu).
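
For example (a sketch of these calls; output omitted):

user@mpsd-hpc-login1:~$ mpsd-modules list          # list available releases
user@mpsd-hpc-login1:~$ mpsd-modules 24a list      # list feature sets available in release 24a
user@mpsd-hpc-login1:~$ echo $MPSD_MICROARCH       # microarchitecture of the current node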

To demonstrate the use of mpsd-modules we activate the generic module set of the software environment 24a. These modules can be used on all HPC nodes.

user@mpsd-hpc-login1:~$ mpsd-modules 24a

Now, we can list available modules. At the time of writing this produces (truncated):

user@mpsd-hpc-login1:~$ module avail

------------------------ /opt_mpsd/linux-debian11/24a/sandybridge/lmod/Core ------------------------
  anaconda3/2022.10                              toolchain/foss2022b-serial
  anaconda3/2023.09-0                     (D)    toolchain/foss2023a-mpi
  gcc/11.3.0                                     toolchain/foss2023a-serial
  gcc/12.2.0                                     toolchain/foss2023b-mpi
  gcc/12.3.0                                     toolchain/foss2023b-serial
  gcc/13.2.0                              (D)    toolchain/intel2022a-mpi
  intel-oneapi-compilers-classic/2021.6.0        toolchain/intel2022a-serial
  intel-oneapi-compilers-classic/2021.7.1        toolchain/intel2022b-mpi
  intel-oneapi-compilers-classic/2021.9.0 (D)    toolchain/intel2022b-serial
  toolchain/foss2022a-mpi                        toolchain/intel2023a-mpi
  toolchain/foss2022a-serial                     toolchain/intel2023a-serial (D)
  toolchain/foss2022b-mpi

--------------------------------- /usr/share/lmod/lmod/modulefiles ---------------------------------
  Core/lmod    Core/settarg (D)

---------------------------------- /usr/share/modules/modulefiles ----------------------------------
  mathematica    mathematica12p2    matlab    matlab2021b

  Where:
  D:  Default Module

We can only see a small number of modules. The reason for this is the hierarchical structure mentioned before. The majority of modules are only visible once we load a compiler (and depending on the package an MPI implementation).

We can load a compiler and again list available modules. Now many more are available:

user@mpsd-hpc-login1:~$ module load gcc/13.2.0
user@mpsd-hpc-login1:~$ module avail

--------------------- /opt_mpsd/linux-debian11/24a/sandybridge/lmod/gcc/13.2.0 ---------------------
  autoconf-archive/2023.02.20           libxdmcp/1.1.4
  autoconf/2.72                         libxfont/1.5.4
  automake/1.16.5                       libxml2/2.10.3
  bdftopcf/1.1                          libyaml/0.2.5
  berkeley-db/18.1.40                   lz4/1.9.4
  berkeleygw/3.1.0                      m4/1.4.19
  bigdft-atlab/1.9.3                    metis/5.1.0
  bigdft-futile/1.9.3                   mkfontdir/1.0.7
  bigdft-psolver/1.9.3                  mkfontscale/1.2.2
  binutils/2.40                         mpfr/4.2.0
  bison/3.8.2                           nasm/2.15.05
  boost/1.83.0                          ncurses/6.4
  bzip2/1.0.8                           netcdf-c/4.9.2
  c-blosc/1.21.5                        netcdf-fortran/4.6.1
  ca-certificates-mozilla/2023-05-30    nfft/3.5.3
  cgal/5.6                              nghttp2/1.57.0
  check/0.15.2                          ninja/1.10.2
  cmake/3.27.9                          ninja/1.11.1                (D)
  curl/8.4.0                            nlopt/2.7.1
  dftbplus/23.1                         numactl/2.0.14
  diffutils/3.9                         octopus-dependencies/full
  eigen/3.4.0                           openblas/0.3.24
  etsf-io/1.0.4                         openmpi/4.1.6
  expat/2.5.0                           openssh/9.5p1
  fftw/3.3.10                           openssl/3.1.3
  findutils/4.9.0                       pcre2/10.42
  flex/2.6.3                            perl-yaml/1.30
  font-util/1.4.0                       perl/5.38.0
  fontconfig/2.14.2                     pigz/2.7
  fontsproto/2.1.3                      pkgconf/1.9.5
  freetype/2.11.1                       pmix/5.0.1
  gdbm/1.23                             py-cython/0.29.36
  gettext/0.22.3                        py-docutils/0.20.1
  gmake/4.4.1                           py-flit-core/3.9.0
  gmp/6.2.1                             py-h5py/3.8.0
  gperf/3.1                             py-numpy/1.26.1
  gsl/2.7.1                             py-packaging/23.1
  hdf5/1.14.3                           py-pip/23.1.2
  hwloc/2.9.1                           py-pkgconfig/1.5.5
  inputproto/2.3.2                      py-poetry-core/1.6.1
  kbproto/1.0.7                         py-pyproject-metadata/0.7.1
  knem/1.1.4                            py-pyyaml/6.0
  krb5/1.20.1                           py-setuptools/68.0.0
  libaec/1.0.6                          py-wheel/0.41.2
  libbsd/0.11.7                         python/3.11.7
  libedit/3.1-20210216                  rdma-core/41.0
  libevent/2.1.12                       re2c/2.2
  libffi/3.4.4                          readline/8.2
  libfontenc/1.1.7                      snappy/1.1.10
  libgd/2.3.3                           sparskit/develop
  libiconv/1.17                         spglib/2.1.0
  libjpeg-turbo/3.0.0                   sqlite/3.43.2
  libmd/1.0.4                           swig/4.1.1
  libnl/3.3.0                           tar/1.34
  libpciaccess/0.17                     texinfo/7.0.3
  libpng/1.6.39                         ucx/1.15.0
  libpspio/0.3.0                        util-linux-uuid/2.38.1
  libpthread-stubs/0.4                  util-macros/1.19.3
  libsigsegv/2.14                       valgrind/3.20.0
  libtiff/4.5.1                         xcb-proto/1.15.2
  libtool/2.4.7                         xextproto/7.3.0
  libvdwxc/0.4.0                        xproto/7.0.31
  libx11/1.8.4                          xtrans/1.4.0
  libxau/1.0.8                          xz/5.4.1
  libxc/6.2.2                           zlib-ng/2.1.4
  libxcb/1.14                           zlib/1.3
  libxcrypt/4.4.35                      zstd/1.5.5

------------------------ /opt_mpsd/linux-debian11/24a/sandybridge/lmod/Core ------------------------
  anaconda3/2022.10                                toolchain/foss2022b-serial
  anaconda3/2023.09-0                     (D)      toolchain/foss2023a-mpi
  gcc/11.3.0                                       toolchain/foss2023a-serial
  gcc/12.2.0                                       toolchain/foss2023b-mpi
  gcc/12.3.0                                       toolchain/foss2023b-serial
  gcc/13.2.0                              (L,D)    toolchain/intel2022a-mpi
  intel-oneapi-compilers-classic/2021.6.0          toolchain/intel2022a-serial
  intel-oneapi-compilers-classic/2021.7.1          toolchain/intel2022b-mpi
  intel-oneapi-compilers-classic/2021.9.0 (D)      toolchain/intel2022b-serial
  toolchain/foss2022a-mpi                          toolchain/intel2023a-mpi
  toolchain/foss2022a-serial                       toolchain/intel2023a-serial (D)
  toolchain/foss2022b-mpi

...

We now unload all loaded modules:

user@mpsd-hpc-login1:~$ module purge

2.4.3. Loading specific packages#

To find a specific package we can use the module spider command. Without extra arguments this would list all modules. We can search for a specific module by adding the module name. For example, let us find the anaconda distribution:

user@mpsd-hpc-login1:~$ module spider anaconda

------------------------------------------------------------------------------------------------
  anaconda3:
------------------------------------------------------------------------------------------------
    Versions:
        anaconda3/2022.10
        anaconda3/2023.09-0

------------------------------------------------------------------------------------------------
  For detailed information about a specific "anaconda3" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

    $ module spider anaconda3/2023.09-0
------------------------------------------------------------------------------------------------

We can see that two different versions are available. We can get more specific information if we specify the version:

user@mpsd-hpc-login1:~$ module spider anaconda3/2023.09-0

------------------------------------------------------------------------------------------------
  anaconda3: anaconda3/2023.09-0
------------------------------------------------------------------------------------------------

    This module can be loaded directly: module load anaconda3/2023.09-0

    Help:
      Anaconda is a free and open-source distribution of the Python and R
      programming languages for scientific computing, that aims to simplify
      package management and deployment. Package versions are managed by the
      package management system conda.

We can directly load the anaconda module:

user@mpsd-hpc-login1:~$ module load anaconda3/2023.09-0
user@mpsd-hpc-login1:~$ python --version
Python 3.11.5
user@mpsd-hpc-login1:~$ which python
/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-x86_64_v2/gcc-10.2.1/anaconda3-2023.09-0-w3dmolygyqx4w6teluo3p5bq2taxnouo/bin/python

Most modules cannot be loaded directly. Instead we first have to load a compiler and sometimes also an MPI implementation. As an example we search for FFTW in version 3.3.10 (which we happen to know is available):

user@mpsd-hpc-login1:~$ module spider fftw/3.3.10

----------------------------------------------------------------------------
  fftw: fftw/3.3.10
----------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "fftw/3.3.10" module is available to load.

      gcc/11.3.0
      gcc/11.3.0  openmpi/4.1.4

    Help:
      FFTW is a C subroutine library for computing the discrete Fourier
      ...

FFTW 3.3.10 is available in two different variants, with and without MPI support. We can load the variant with MPI support by first loading gcc and openmpi:

user@mpsd-hpc-login1:~$ module load gcc/11.3.0 openmpi/4.1.4 fftw/3.3.10

Likewise, we can load the version without MPI support by just loading a compiler and FFTW:

user@mpsd-hpc-login1:~$ module purge
user@mpsd-hpc-login1:~$ module load gcc/11.3.0 fftw/3.3.10

If we need to know the location of the files associated with a module X, we can use the MPSD_X_ROOT environment variable. For example:

user@mpsd-hpc-login1:~$ echo $MPSD_FFTW_ROOT
/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-qra6ez6es3unvk2i56hmkpfnmd2oxy3b

To get more detailed information, we can use module show X:

user@mpsd-hpc-login1:~$ module show fftw
--------------------------------------------------------------------------------------------------
  /opt_mpsd/linux-debian11/24a/sandybridge/lmod/openmpi/4.1.4-7imdm7p/gcc/11.3.0/fftw/3.3.10.lua:
--------------------------------------------------------------------------------------------------
whatis("Name : fftw")
whatis("Version : 3.3.10")
whatis("Target : sandybridge")
whatis("Short description : FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST). We believe that FFTW, which is free software, should become the FFT library of choice for most applications.")
...
prepend_path("LIBRARY_PATH","/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-qra6ez6es3unvk2i56hmkpfnmd2oxy3b/lib")
prepend_path("CPATH","/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-qra6ez6es3unvk2i56hmkpfnmd2oxy3b/include")
prepend_path("PATH","/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-qra6ez6es3unvk2i56hmkpfnmd2oxy3b/bin")
...
prepend_path("CMAKE_PREFIX_PATH","/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-11.3.0/fftw-3.3.10-qra6ez6es3unvk2i56hmkpfnmd2oxy3b/.")
...

2.4.4. Octopus#

As a second example for loading pre-compiled packages let us search for octopus:

user@mpsd-hpc-login1:~$ mpsd-modules 24a
user@mpsd-hpc-login1:~$ module spider octopus
------------------------------------------------------------------------------------------------
  octopus:
------------------------------------------------------------------------------------------------
    Versions:
        octopus/13.0
        octopus/14.0
...

Multiple versions of octopus are available. We can specify a particular version in order to get more information on how to load the module:

user@mpsd-hpc-login1:~$ module spider octopus/14.0
------------------------------------------------------------------------------------------------
  octopus: octopus/14.0
------------------------------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "octopus/14.0" module is available to load.

      gcc/11.3.0  openmpi/4.1.4

...

We can see that we have to first load gcc/11.3.0 and openmpi/4.1.4 in order to be able to load and use octopus/14.0.

Note

Sometimes module spider will suggest either loading only a compiler, or a compiler plus an MPI implementation. In that case we generally want to load the MPI implementation as well, since only that variant of the program will use MPI. Loading the MPI-enabled variant of the desired program is crucial when running a Slurm job on multiple nodes.

We load gcc/11.3.0, openmpi/4.1.4 and finally octopus/14.0. All of this can be done in one line as long as the packages are given in the correct order (as shown by module spider):

user@mpsd-hpc-login1:~$ module load gcc/11.3.0 openmpi/4.1.4 octopus/14.0

As a first simple check we display the version number of octopus:

user@mpsd-hpc-login1:~$ octopus --version
octopus 14.0 (git commit )

2.4.5. Octopus with CUDA support#

To use Octopus on the GPU nodes, we need CUDA support. This is provided in the architecture-specific module set (skylake_avx512) of the GPU nodes. To demonstrate this, we first allocate interactive resources in the gpu partition, load the native software stack and then search for Octopus.

user@mpsd-hpc-login1:~$ salloc --partition gpu --gpus 1 --cpus-per-gpu 4
salloc: tres_per_job includes gpu: gpu:1
salloc: Granted job allocation 225605
salloc: Waiting for resource configuration
salloc: Nodes mpsd-hpc-gpu-002 are ready for job
user@mpsd-hpc-gpu-002:~$ mpsd-modules 24a native
user@mpsd-hpc-gpu-002:~$ module spider octopus

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  octopus:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Versions:
        octopus/13.0-cuda-11.4.4
        octopus/13.0
        octopus/14.0-cuda-11.4.4
        octopus/14.0
...

user@mpsd-hpc-gpu-002:~$ module spider octopus/14.0-cuda-11.4.4

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  octopus: octopus/14.0-cuda-11.4.4
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    You will need to load all module(s) on any one of the lines below before the "octopus/14.0-cuda-11.4.4" module is available to load.

      gcc/11.3.0  openmpi/4.1.4-cuda-11.4.4

    Help:
      A real-space finite-difference (time-dependent) density-functional
      theory code.

...

Thus, we can activate Octopus with CUDA support using

user@mpsd-hpc-gpu-002:~$ module load gcc/11.3.0 openmpi/4.1.4-cuda-11.4.4 octopus/14.0-cuda-11.4.4
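
Before starting a calculation, it can be worth verifying that the allocated GPU is visible and that the module loaded correctly (a sketch; output omitted):

user@mpsd-hpc-gpu-002:~$ nvidia-smi            # should list the allocated GPU(s)
user@mpsd-hpc-gpu-002:~$ octopus --version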

2.4.6. Python#

To use Python we can load the anaconda3 module:

user@mpsd-hpc-login1:~$ module load anaconda3/2023.09-0

Anaconda comes with a wide variety of pre-installed Python packages such as numpy, scipy, matplotlib, etc.

  1. Numpy example

    We can execute a small demo program called hello_numpy.py. The file has the following content.

    Todo

    Do we want to make the source for this available or use a different example

    import numpy as np
    
    print("Hello World")
    print(f"numpy version: {np.__version__}")
    x = np.arange(5)
    y = x**2
    print(y)
    
    user@mpsd-hpc-login1:~$ python3 hello_numpy.py
    Hello World
    numpy version: 1.21.5
    [ 0  1  4  9 16]
    
  2. Custom conda environment

    We can also create a separate conda environment if we need additional Python software or different versions. We suggest using Miniconda for this (as documented here in more detail).

    First, we have to download the Miniconda installer from https://docs.conda.io/projects/miniconda/en/latest/#latest-miniconda-installer-links. Then, we can run the installer and follow the instructions.

    user@mpsd-hpc-login1:~$ wget https://repo.anaconda.com/miniconda/Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
    user@mpsd-hpc-login1:~$ bash Miniconda3-py310_22.11.1-1-Linux-x86_64.sh
    

    Recommendations

    • run conda init when asked, so that the conda executable becomes available in your shell, and

    • remove the auto-activation of the base environment to avoid potential conflicts: conda config --set auto_activate_base false

      (if you later need to use the base environment, use conda activate base)

    As an example we now create a new environment, called my_conda_env, with an older version of Python and a specific numpy version from the conda-forge channel.

    user@mpsd-hpc-login1:~$ conda create -n my_conda_env -c conda-forge python=3.9 numpy=1.23
    

    We can now activate the environment and check the versions of Python and numpy.

    user@mpsd-hpc-login1:~$ conda activate my_conda_env
    user@mpsd-hpc-login1:~$ python --version
    Python 3.9.16
    user@mpsd-hpc-login1:~$ python -c "import numpy; print(numpy.__version__)"
    1.23.5
    

    We can deactivate and remove the environment using:

    user@mpsd-hpc-login1:~$ conda deactivate
    user@mpsd-hpc-login1:~$ conda env remove -n my_conda_env
    

    Warning

    The local conda executable becomes unusable if you load and unload the anaconda3 module. If you run into this problem, start a new shell.

    Tip

    When using conda inside a Slurm submission script (or other non-interactive shell), it is necessary to add the following line before activating an environment:

    eval "$(conda shell.bash hook)"
    

    Without this line, you may encounter errors about conda not being able to find or activate the desired environment, or about conda itself not being found.
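
    A corresponding fragment of a submission script could look like this sketch (my_conda_env and my_script.py are placeholders; we assume conda is available in the job environment, e.g. from the Miniconda installation above):

    #!/bin/bash
    #SBATCH --time=0-01:00:00
    #SBATCH --cpus-per-task=1

    eval "$(conda shell.bash hook)"   # make 'conda activate' work in this non-interactive shell
    conda activate my_conda_env
    python my_script.py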

  3. mpi4py example

    Todo

    review and consider suggesting combination with custom miniconda instead of anaconda module

    mpi4py is a Python package that allows you to use MPI from Python. This example requires installing mpi4py in a custom (conda) environment. To be able to install and use it, we need to load the openmpi module along with the anaconda3 module:

    user@mpsd-hpc-login1:~$ mpsd-modules dev-23a
    user@mpsd-hpc-login1:~$ module load gcc/11.3.0 openmpi/4.1.4
    user@mpsd-hpc-login1:~$ module load anaconda3
    user@mpsd-hpc-login1:~$ echo $MPICC
    

    We echo the value of the MPICC environment variable to check that it is set. Its value should match the result of which mpicc. This variable is needed when installing mpi4py so that it compiles and links against the loaded MPI library.

    Anaconda does not come with mpi4py pre-installed, so we install it in an environment called my_conda_env:

    user@mpsd-hpc-login1:~$ conda create -n my_conda_env -c conda-forge python=3.11
    user@mpsd-hpc-login1:~$ conda activate my_conda_env
    user@mpsd-hpc-login1:~$ pip install mpi4py
    

    We use pip install instead of conda install because the mpi4py package from the conda-forge channel is incompatible with the openmpi module we loaded.

    To quickly test the installation, we can run the hello world example:

    user@mpsd-hpc-login1:~$ srun -n 5 python -m mpi4py.bench helloworld
    Hello, World! I am process 0 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 1 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 2 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 3 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 4 of 5 on mpsd-hpc-ibm-023.
    

    Here is how you could replicate the same hello world example in a Python script:

    from mpi4py import MPI
    
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    print(f"Hello, World! I am process {rank} of {size} on {MPI.Get_processor_name()}.")
    

    This can be run, as previously mentioned, using the srun command:

    user@mpsd-hpc-login1:~$ srun -n 5 python hello_mpi4py.py
    Hello, World! I am process 0 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 2 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 1 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 4 of 5 on mpsd-hpc-ibm-023.
    Hello, World! I am process 3 of 5 on mpsd-hpc-ibm-023.
    

    Recommendations

    • One needs to keep track of which conda environment was created with which version of the openmpi module: for example via the environment name, or a suitable script that loads the right openmpi module and activates the corresponding conda environment. If you need to use a different version of openmpi you need to create a new conda environment.

    • Always use the srun command to run MPI programs. This ensures that the MPI processes are started and managed by the Slurm scheduler on the allocated nodes and not on the login node.
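
    For a non-interactive run, a submission script along the following lines could be used (a sketch: it assumes the modules and the my_conda_env environment from above, and that mpsd-modules and conda are available inside the batch job; adjust versions and names to your setup):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=5
    #SBATCH --time=0-00:10:00

    mpsd-modules dev-23a                # alternatively, load the modules before calling sbatch
    module load gcc/11.3.0 openmpi/4.1.4 anaconda3
    eval "$(conda shell.bash hook)"     # make 'conda activate' work in the batch shell
    conda activate my_conda_env
    srun python hello_mpi4py.py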

2.4.7. Jupyter notebooks#

You can use a Jupyter notebook on a dedicated HPC node as follows:

  1. Ensure you are at MPSD or have the DESY VPN set up.

  2. Log in to a login node (for example mosh mpsd-hpc-login1.desy.de; mosh is recommended over ssh to avoid losing the session in case of short connection interruptions, e.g. on WiFi).

  3. Request a node for interactive use. For example, 1 node with 8 CPUs for 6 hours from the public partition:

    user@mpsd-hpc-login1:~$ salloc --nodes=1 --cpus-per-task=8 --time=6:00:00 -p public
    salloc: Granted job allocation 227596
    salloc: Waiting for resource configuration
    salloc: Nodes mpsd-hpc-ibm-021 are ready for job
    
  4. You can install Jupyter yourself, or activate an installed version with the following commands:

    user@mpsd-hpc-ibm-021:~$ mpsd-modules dev-23a
    user@mpsd-hpc-ibm-021:~$ module load anaconda3
    
  5. Limit numpy (and other libraries) to the number of allocated CPUs:

    user@mpsd-hpc-ibm-021:~$ export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    
  6. Start the Jupyter notebook (or Jupyter lab) server on that node with

    user@mpsd-hpc-ibm-021:~$ jupyter-notebook --no-browser --ip=${HOSTNAME}.desy.de
    

    Watch the output displayed in your terminal. There is a line similar to this one:

    http://mpsd-hpc-ibm-055.desy.de:8888/?token=8814fea339b8fe7d3a52e7d03c2ce942a3f35c8c263ff0b8

    which you can paste as a URL into your browser (on your laptop/Desktop), and you should be connected to the Notebook server on the compute node.

2.4.8. Matlab#

To use Matlab, we load the matlab module.

user@mpsd-hpc-login1:~$ module load matlab

We can execute a small demo program called hello_matlab.m. The file has the following content.

% hello_matlab.m
disp('Hello, MATLAB!');
a = [1 2 3; 4 5 6; 7 8 9];
b = ones(3, 3);
result = a + b;
disp('Matrix addition result:');
disp(result);

user@mpsd-hpc-login1:~$ matlab -nodisplay -r "run('hello_matlab');exit;"
Hello, MATLAB!
Matrix addition result:
   2     3     4
   5     6     7
   8     9    10

An interactive interface can be started by running matlab in the terminal.

user@mpsd-hpc-login1:~$ matlab
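
To run the hello_matlab.m example non-interactively, a submission script along these lines could be used (a sketch; the resource requests are placeholders and we assume the module command is available inside the batch job):

#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --time=0-00:30:00

module load matlab
matlab -nodisplay -r "run('hello_matlab');exit;"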

2.4.9. Loading a toolchain to compile Octopus#

There is also a set of special meta-modules, called toolchain. These load groups of modules (e.g. compiler, MPI, blas, cmake, …). The versions of the individual packages follow the easybuild toolchains (some additional packages are made available in our toolchains).

In addition to the toolchains we also provide a meta-module octopus-dependencies that provides all additional required and optional dependencies to compile Octopus from source. This meta-module depends on the toolchain meta-module: depending on the loaded toolchain version/variant different modules are loaded via octopus-dependencies.

Here, we show two examples of how to compile Octopus: a serial and an MPI version. Following this guide is only recommended if you need to compile Octopus from source. We also provide pre-compiled modules for Octopus as outlined in Loading specific packages above.

As mentioned before, different variants of (most) modules are available that support different CPU feature sets. So far we have mainly discussed the generic set that can be used on all nodes. In order to make use of all available features on a specific node we can instead load a more optimised set of modules. The CPU microarchitecture of the current node is available in the environment variable $MPSD_MICROARCH.

First, we remove the generic module set and activate the optimised set for the current node (native automatically selects a suitable optimised module set).

user@mpsd-hpc-login1:~$ module purge
user@mpsd-hpc-login1:~$ mpsd-modules 24a native

  1. Parallel version of octopus

    We can load the toolchain foss2023a-mpi and octopus-dependencies to compile octopus using gcc 12.3.0 and openmpi 4.1.5:

    user@mpsd-hpc-login1:~$ module load toolchain/foss2023a-mpi octopus-dependencies
    

    Next, we clone Octopus:

    user@mpsd-hpc-login1:~$ git clone https://gitlab.com/octopus-code/octopus.git
    

    (If you intend to make changes in the octopus code, and push them back as a merge request later, you may want to use git clone git@gitlab.com:octopus-code/octopus.git to clone using ssh instead of https.)

    After cloning the Octopus repository, proceed with:

    user@mpsd-hpc-login1:~$ cd octopus
    user@mpsd-hpc-login1:~$ autoreconf -fi
    

    The SSU Computational Science maintains a set of configure wrapper scripts that call the configure script of Octopus with the right parameters for each toolchain. These can be used to compile Octopus with standard feature sets. The scripts are available at /opt_mpsd/linux-debian11/24a/spack-environments/octopus/. Here is an example script (foss2023a-mpi-config.sh):

    #!/bin/sh
    export CC="mpicc"
    MARCH_FLAG="-march=${GCC_ARCH:-native}"
    OPTIMISATION_LEVEL="-O3"
    export CFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
    export CXX="mpicxx"
    export CXXFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
    export FC="mpif90"
    export FCFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz"
    # default to the parent directory unless OCTOPUS_SRC is set
    [ "${OCTOPUS_SRC+1}" ] || export OCTOPUS_SRC='..'
    
    # Prepare setting the RPATHs for the executable
    export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
    
    # help configure to find the ELPA F90 modules and libraries
    try_mpsd_elpa_version=`expr match "$MPSD_ELPA_ROOT" '.*/elpa-\([0-9.]\+\)'`
    if [ -n "$try_mpsd_elpa_version" ] ; then
        if [ -r $MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules/elpa.mod ]; then
            export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules"
            export LIBS_ELPA="-lelpa_openmp"
        elif [ -r $MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules/elpa.mod ] ; then
            export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules"
        fi
    fi
    unset try_mpsd_elpa_version
    
    # always keep options in the order listed by ``configure --help``
    $OCTOPUS_SRC/configure \
        --enable-mpi \
        --enable-openmp \
        --with-libxc-prefix="$MPSD_LIBXC_ROOT" \
        --with-libvdwxc-prefix="$MPSD_LIBVDWXC_ROOT" \
        --with-blas="-L$MPSD_OPENBLAS_ROOT/lib -lopenblas" \
        --with-gsl-prefix="$MPSD_GSL_ROOT" \
        --with-fftw-prefix="$MPSD_FFTW_ROOT" \
        --with-pfft-prefix="$MPSD_PFFT_ROOT" \
        --with-nfft="$MPSD_NFFT_ROOT" \
        --with-pnfft-prefix="$MPSD_PNFFT_ROOT" \
        --with-berkeleygw-prefix="$MPSD_BERKELEYGW_ROOT" \
        --with-sparskit="$MPSD_SPARSKIT_ROOT/lib/libskit.a" \
        --with-nlopt-prefix="$MPSD_NLOPT_ROOT" \
        --with-blacs="-L$MPSD_NETLIB_SCALAPACK_ROOT/lib -lscalapack" \
        --with-scalapack="-L$MPSD_NETLIB_SCALAPACK_ROOT/lib -lscalapack" \
        --with-elpa-prefix="$MPSD_ELPA_ROOT" \
        --with-cgal="$MPSD_CGAL_ROOT" \
        --with-boost="$MPSD_BOOST_ROOT" \
        --with-metis-prefix="$MPSD_METIS_ROOT" \
        --with-parmetis-prefix="$MPSD_PARMETIS_ROOT" \
        --with-psolver-prefix="$MPSD_BIGDFT_PSOLVER_ROOT" \
        --with-futile-prefix="$MPSD_BIGDFT_FUTILE_ROOT" \
        --with-atlab-prefix="$MPSD_BIGDFT_ATLAB_ROOT" \
        --with-dftbplus-prefix="$MPSD_DFTBPLUS_ROOT" \
        --with-netcdf-prefix="$MPSD_NETCDF_FORTRAN_ROOT" \
        --with-etsf-io-prefix="$MPSD_ETSF_IO_ROOT" \
        "$@" | tee 00-configure.log 2>&1
    echo "-------------------------------------------------------------------------------" >&2
    echo "configure output has been saved to 00-configure.log" >&2
    if [ "x${GCC_ARCH-}" = x ] ; then
        echo "Microarchitecture optimization: native (set \$GCC_ARCH to override)" >&2
    else
        echo "Microarchitecture optimization: $GCC_ARCH (from override \$GCC_ARCH)" >&2
    fi
    echo "-------------------------------------------------------------------------------" >&2
    

    You can either source this script manually or use the environment variable MPSD_OCTOPUS_CONFIGURE, which points to the right script for the currently loaded toolchain. In the following example to configure and compile Octopus, we show the use of the environment variable:

    1user@mpsd-hpc-login1:~$ autoreconf -i
    2user@mpsd-hpc-login1:~$ mkdir _build && cd _build
    3user@mpsd-hpc-login1:~$ source $MPSD_OCTOPUS_CONFIGURE --prefix=`pwd`
    4user@mpsd-hpc-login1:~$ make
    5user@mpsd-hpc-login1:~$ make check-short
    6user@mpsd-hpc-login1:~$ make install
    

    After line 4 (make) has completed, the Octopus binary is located at _build/src/octopus.

    If you need to use the make install target (line 6), you can define the location to which Octopus installs its binaries (bin), libraries (lib), documentation (doc) and more using the --prefix flag in line 3. In this example, the install prefix is the current working directory (pwd) in line 3, i.e. the _build directory, with the Octopus executable at _build/bin/octopus.

  2. Serial version of octopus

    Compiling the serial version in principle consists of the same steps as the parallel version. We use a different toolchain and configuration script.

    We load the serial toolchain:

    user@mpsd-hpc-login1:~$ module load toolchain/foss2023b-serial octopus-dependencies
    

    and use the following configure script (foss2023b-serial-config.sh):

    #!/bin/sh
    export CC="gcc"
    MARCH_FLAG="-march=${GCC_ARCH:-native}"
    OPTIMISATION_LEVEL="-O3"
    export CFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
    export CXX="g++"
    export CXXFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments"
    export FC="gfortran"
    export FCFLAGS="$MARCH_FLAG $OPTIMISATION_LEVEL -g -fno-var-tracking-assignments -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz"
    # default to the parent directory unless OCTOPUS_SRC is set
    [ "${OCTOPUS_SRC+1}" ] || export OCTOPUS_SRC='..'
    
    # Prepare setting the RPATHs for the executable
    export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
    
    # help configure to find the ELPA F90 modules and libraries
    try_mpsd_elpa_version=`expr match "$MPSD_ELPA_ROOT" '.*/elpa-\([0-9.]\+\)'`
    if [ -n "$try_mpsd_elpa_version" ] ; then
        if [ -r $MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules/elpa.mod ]; then
            export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa_openmp-$try_mpsd_elpa_version/modules"
            export LIBS_ELPA="-lelpa_openmp"
        elif [ -r $MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules/elpa.mod ] ; then
            export FCFLAGS_ELPA="-I$MPSD_ELPA_ROOT/include/elpa-$try_mpsd_elpa_version/modules"
        fi
    fi
    unset try_mpsd_elpa_version
    
    # always keep options in the order listed by ``configure --help``
    $OCTOPUS_SRC/configure \
        --enable-openmp \
        --with-libxc-prefix="$MPSD_LIBXC_ROOT" \
        --with-libvdwxc-prefix="$MPSD_LIBVDWXC_ROOT" \
        --with-blas="-L$MPSD_OPENBLAS_ROOT/lib -lopenblas" \
        --with-gsl-prefix="$MPSD_GSL_ROOT" \
        --with-fftw-prefix="$MPSD_FFTW_ROOT" \
        --with-nfft="$MPSD_NFFT_ROOT" \
        --with-netcdf-prefix="$MPSD_NETCDF_ROOT" \
        --with-etsf-io-prefix="$MPSD_ETSF_IO_ROOT" \
        --with-berkeleygw-prefix="$MPSD_BERKELEYGW_ROOT" \
        --with-sparskit="$MPSD_SPARSKIT_ROOT/lib/libskit.a" \
        --with-nlopt-prefix="$MPSD_NLOPT_ROOT" \
        --with-cgal="$MPSD_CGAL_ROOT" \
        --with-boost="$MPSD_BOOST_ROOT" \
        --with-metis-prefix="$MPSD_METIS_ROOT" \
        --with-psolver-prefix="$MPSD_BIGDFT_PSOLVER_ROOT" \
        --with-futile-prefix="$MPSD_BIGDFT_FUTILE_ROOT" \
        --with-atlab-prefix="$MPSD_BIGDFT_ATLAB_ROOT" \
        --with-dftbplus-prefix="$MPSD_DFTBPLUS_ROOT" \
        --with-netcdf-prefix="$MPSD_NETCDF_FORTRAN_ROOT" \
        --with-etsf-io-prefix="$MPSD_ETSF_IO_ROOT" \
        "$@" | tee 00-configure.log 2>&1
    echo "-------------------------------------------------------------------------------" >&2
    echo "configure output has been saved to 00-configure.log" >&2
    if [ "x${GCC_ARCH-}" = x ] ; then
        echo "Microarchitecture optimization: native (set \$GCC_ARCH to override)" >&2
    else
        echo "Microarchitecture optimization: $GCC_ARCH (from override \$GCC_ARCH)" >&2
    fi
    echo "-------------------------------------------------------------------------------" >&2
    

    We can then follow the instructions from the previous section.

  3. Passing additional arguments to configure

    The *-config.sh scripts shown above accept additional arguments and pass those on to the configure command of Octopus; for example foss2023b-serial-config.sh --prefix=<basedir> would pass --prefix=<basedir> to the configure script (see Octopus documentation).

2.4.10. Compiling custom code#

To compile other custom code we may require a different collection of modules than the one provided by the toolchain and octopus-dependencies meta-modules. In these cases it might be necessary to manually load all required modules. The same general notes about generic and optimised module sets explained in the previous section apply.

Here, we show two different examples. The sources are available under /opt_mpsd/linux-debian11/24a/examples/slurm-examples.

  1. Serial “Hello world” in Fortran

    First, we want to compile the following “hello world” Fortran program using gcc. We assume it is saved in a file hello.f90. The source is available in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/serial-fortran.

    program hello
      write(*,*) "Hello world!"
    end program
    

    We have to load gcc:

    module load gcc/12.3.0
    

    Then, we can compile and execute the program:

    user@mpsd-hpc-login1:~$ gfortran -o hello hello.f90
    user@mpsd-hpc-login1:~$ ./hello
     Hello world!
    
  2. MPI-parallelised “Hello world” in C

    Todo

    consider using CMake for this example to showcase it

    As a second example we compile an MPI-parallelised “Hello world” C program, again using gcc. We assume the source is saved in a file hello-mpi.c (source available under /opt_mpsd/linux-debian11/24a/examples/slurm-examples/mpi-c).

    #include <mpi.h>
    #include <stdio.h>
    
    int main(int argc, char** argv) {
        MPI_Init(NULL, NULL);
    
        int world_size;
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    
        int world_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        int name_len;
        MPI_Get_processor_name(processor_name, &name_len);
    
        printf("Hello world from rank %d out of %d on %s.\n",
               world_rank, world_size, processor_name);
    
        MPI_Finalize();
    }
    

    We have to load gcc and openmpi:

    user@mpsd-hpc-login1:~$ module load gcc/12.3.0 openmpi/4.1.5
    

    Now, we can compile and execute the test program:

    user@mpsd-hpc-login1:~$ mpicc -o hello-mpi hello-mpi.c
    user@mpsd-hpc-login1:~$ orterun -n 4 ./hello-mpi
    Hello world from rank 2 out of 4 on mpsd-hpc-login1.
    Hello world from rank 3 out of 4 on mpsd-hpc-login1.
    Hello world from rank 1 out of 4 on mpsd-hpc-login1.
    Hello world from rank 0 out of 4 on mpsd-hpc-login1.
    

    Note

    Inside a Slurm job, srun has to be used instead of orterun.

2.4.11. Setting the rpath (finding libraries at runtime)#

This section is relevant if you compile your own software and need to link to libraries provided on the MPSD HPC system.

  1. Background

    At compile time (i.e. when compiling and building an executable), we need to tell the linker where to find external libraries. This happens via the -L flags and the environment variable LIBRARY_PATH which the compiler (for example gcc) passes on to the linker.

    At runtime, the dynamic linker ld.so needs to find the libraries (identified by their SONAME) required by our executable by searching through one or more directories. These directories are taken from (in decreasing order of priority):

      1. the directories listed in LD_LIBRARY_PATH, if set,

      2. one or more rpath entries set in the executable,

      3. if not found yet, the default search path defined in /etc/ld.so.conf.

  2. Use rpath; do not set LD_LIBRARY_PATH

    When we compile software on HPC systems, we generally want to use the rpath mechanism. That means:

    • (a) we must not set the LD_LIBRARY_PATH environment variable, and

    • (b) we must set the rpath in the executable. To embed /PATH/TO/LIBRARY in the rpath entry in the header of the executable, we append -Wl,-rpath=/PATH/TO/LIBRARY to the compiler call.

  3. Example: Linking to FFTW

    Given this C program with name fftw_test.c:

    #include <stdio.h>
    #include <fftw3.h>
    
    #define N 32
    
    int main(int argc, char **argv) {
        fftw_complex *in, *out;
        fftw_plan p;
        in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
        out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
        p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_execute(p); /* repeat as needed */
        fftw_destroy_plan(p);
        fftw_free(in); fftw_free(out);
        printf("Done.\n");
        return 0;
    }
    

    we can compile it as follows:

    $ mpsd-modules 24a
    $ module load gcc/12.3.0 fftw
    $ gcc -lfftw3 -L$MPSD_FFTW_ROOT/lib -Wl,-rpath=$MPSD_FFTW_ROOT/lib fftw_test.c -o fftw_test
    

    In the compile (and link) line, we have to specify the path to the directory containing the relevant file libfftw3.so. For every package, the MPSD HPC system provides the relevant path to the package root in an environment variable of the form MPSD_<PACKAGE_NAME>_ROOT:

    $ echo $MPSD_FFTW_ROOT
    /opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-12.3.0/fftw-3.3.10-av3adybtusz4beo3ygg4fhubiezgymgc
    

    With the environment variable expanded, the compiler call looks as follows.

    $ gcc -lfftw3 \
        -L/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-12.3.0/fftw-3.3.10-av3adybtusz4beo3ygg4fhubiezgymgc/lib \
        -Wl,-rpath=/opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-12.3.0/fftw-3.3.10-av3adybtusz4beo3ygg4fhubiezgymgc/lib \
        fftw_test.c   -o fftw_test
    

    Users are strongly advised to use these environment variables: they help ensure you are not pointing to incorrect or stale library versions from outdated modules.

    When loading modules (or complete toolchains) using the module command, the MPSD HPC system also populates the variable LIBRARY_PATH, which the compiler uses like additional -L paths if the variable is set. We can thus omit the -L option in the call:

    $ gcc -lfftw3 -Wl,-rpath=$MPSD_FFTW_ROOT/lib fftw_test.c -o fftw_test
    

    We can use the ldd command to check which libraries the dynamic linker identifies:

    $ ldd fftw_test
      linux-vdso.so.1 (0x00007ffed3cfe000)
      libfftw3.so.3 => /opt_mpsd/linux-debian11/24a/sandybridge/spack/opt/spack/linux-debian11-sandybridge/gcc-12.3.0/fftw-3.3.10-av3adybtusz4beo3ygg4fhubiezgymgc/lib/libfftw3.so.3 (0x00007fa4f1e4f000)
      libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa4f1c67000)
      libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa4f1b23000)
      /lib64/ld-linux-x86-64.so.2 (0x00007fa4f2041000)
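
    To verify that the rpath was actually embedded, we can additionally inspect the dynamic section of the binary (a quick check; depending on the binutils version the entry appears as RPATH or RUNPATH):

    $ readelf -d fftw_test | grep -iE 'rpath|runpath'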
    
  4. Remember to check whether your build system can help you

    Doing these sorts of calls by hand can be tedious and awkward, which in turn makes them error prone. If you are using a modern build system, e.g. CMake, there is a good chance that it can manage the rpath for you. Consult the documentation of your build tool to check whether it supports setting the rpath and how to activate it.

    Note

    You may have to unset some of the environment variables that are exported when loading modules to avoid conflicts.

    E.g. when using CMake, run unset LIBRARY_PATH and unset CPATH after loading all modules to avoid unexpected side effects (e.g. a missing rpath in the resulting binary).
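
    A minimal sketch of such a workflow with CMake (it assumes a project with a CMakeLists.txt in the current directory; cmake -S/-B requires CMake >= 3.13):

    $ mpsd-modules 24a
    $ module load gcc/12.3.0 fftw
    $ unset LIBRARY_PATH CPATH     # avoid unexpected side effects in the CMake build
    $ cmake -S . -B build
    $ cmake --build build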

  5. Setting rpath for multiple dependencies (useful template)

    As the LIBRARY_PATH variable contains the library directories of all loaded modules, we can use it to create the relevant -Wl,-rpath=... arguments automatically, as shown in this example:

    $ export LDFLAGS=`echo ${LIBRARY_PATH:+:$LIBRARY_PATH} | sed -e 's/:/ -Wl,-rpath=/g'`
    $ gcc -lfftw3 $LDFLAGS fftw_test.c -o fftw_test
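
    For illustration only (the two directories below are made up), the sed pipeline turns a colon-separated list into the corresponding rpath flags:

    $ echo :/example/fftw/lib:/example/openblas/lib | sed -e 's/:/ -Wl,-rpath=/g'
     -Wl,-rpath=/example/fftw/lib -Wl,-rpath=/example/openblas/lib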
    
  6. Related documentation

    Related documentation (addressing the same issue and approach) from the MPCDF is available at https://docs.mpcdf.mpg.de/faq/hpc_software.html#how-do-i-set-the-rpath

2.5. Example batch scripts#

Here, we show a number of example batch scripts for different types of jobs. All examples are available on the HPC system under /opt_mpsd/linux-debian11/24a/examples/slurm-examples together with the example programs. One can also get the latest copy of the scripts from the git repository here. We use the public partition and the generic module set for all examples.

To test an example on the HPC system we can copy the relevant directory into our scratch directory. If required, we can compile the code using make and then submit the job using sbatch submission-script.sh.
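
A minimal sketch of these steps for the MPI example (the scratch location is a placeholder; use your own scratch directory):

$ cp -r /opt_mpsd/linux-debian11/24a/examples/slurm-examples/mpi-c /path/to/your/scratch/
$ cd /path/to/your/scratch/mpi-c
$ make                          # compile the example (where a Makefile is provided)
$ sbatch submission-script.sh   # submit the job to Slurm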

2.5.1. MPI#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/mpi-c.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J MPI-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=00:10:00

. setup-env.sh

srun ./hello-mpi
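
This script requests 4 nodes x 16 tasks per node = 64 MPI ranks in total, i.e. 16 ranks per node, matching the 16 physical cores of each node in the public partition.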

2.5.2. MPI + OpenMP#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/mpi-openmp-c.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J MPI-OpenMP-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00

. setup-env.sh

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# For pinning threads correctly:
export OMP_PLACES=cores

srun ./hello-mpi-openmp
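
Here each node runs 2 MPI tasks with 8 OpenMP threads each (2 x 8 = 16 CPUs per node, again filling the 16 physical cores of a public node); across the 4 nodes the job uses 8 MPI ranks and 64 cores in total.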

2.5.3. OpenMP#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/openmp-c.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J OpenMP-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# For pinning threads correctly:
export OMP_PLACES=cores

srun ./hello-openmp

2.5.4. Python with numpy or multiprocessing#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/python-numpy.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J python-numpy-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=00:10:00

module purge
mpsd-modules 24a

module load anaconda3/2022.10
eval "$(conda shell.bash hook)"

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun python3 ./hello-numpy.py

2.5.5. Single-core job#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/serial-fortran.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J serial-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00

srun ./hello

2.5.6. Serial Python#

The source code and submission script are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/python-serial.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p public
#
# job name
#SBATCH -J python-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00

module purge
mpsd-modules 24a

module load anaconda3/2022.10
eval "$(conda shell.bash hook)"

export OMP_NUM_THREADS=1  # restrict numpy (and other libraries) to one core

srun python3 ./hello.py

2.5.7. GPU jobs#

For GPU jobs, we recommend specifying the desired hardware resources as follows. In parentheses we indicate the typical application (MPI, OpenMP) for guidance.

  • nodes - how many computers to use, for example --nodes=1

  • tasks-per-node - how many (MPI) processes to run per node: --tasks-per-node=4

  • gpus-per-task - how many GPUs per (MPI) process to use (often 1): --gpus-per-task=1

  • cpus-per-task - how many CPUs (OpenMP threads) to use: --cpus-per-task=4

Example:

user@mpsd-hpc-login1:~$ salloc --nodes=1 --tasks-per-node=4 --gpus-per-task=1 --cpus-per-task=4 --mem=128G -p gpu
user@mpsd-hpc-gpu-002:~$ mpsd-show-job-resources
   9352 Nodes: mpsd-hpc-gpu-002
   9352 Local Node: mpsd-hpc-gpu-002

   9352 CPUSET: 0-7,16-23
   9352 MEMORY: 131072 M

   9352 GPUs (Interconnects, CPU Affinity, NUMA Affinity):
   9352 GPU0     X  NV1 NV1 NV2 SYS 0-7,16-23   0-1
   9352 GPU1    NV1  X  NV2 NV1 SYS 0-7,16-23   0-1
   9352 GPU2    NV1 NV2  X  NV2 SYS 0-7,16-23   0-1
   9352 GPU3    NV2 NV1 NV2  X  SYS 0-7,16-23   0-1
user@mpsd-hpc-gpu-002:~$

We can see from the output that we have one node (mpsd-hpc-gpu-002), 16 CPUs (ids 0 to 7 and 16 to 23), 128 GB of memory (= 131072 MiB), and 4 GPUs (GPU0 to GPU3) allocated.

We can confirm that the number of (MPI) tasks is 4 (one line of output per task):

user@mpsd-hpc-gpu-002:~$ srun hostname
mpsd-hpc-gpu-002
mpsd-hpc-gpu-002
mpsd-hpc-gpu-002
mpsd-hpc-gpu-002

The source code and submission script for one CUDA example are in /opt_mpsd/linux-debian11/24a/examples/slurm-examples/cuda.

#!/bin/bash --login
#
# Standard output and error
#SBATCH -o ./out.%j
#SBATCH -e ./err.%j
#
# working directory
#SBATCH -D ./
#
# partition
#SBATCH -p gpu
#
# job name
#SBATCH -J CUDA-MPI-example
#
#SBATCH --mail-type=ALL
#
# job requirements
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=2
#SBATCH --time=00:02:00

. setup-env.sh

srun ./hello-cuda

2.5.8. Multiple tasks per GPU#

If multiple tasks (i.e. multiple MPI ranks) should be used per GPU, we recommend requesting resources from the perspective of a GPU:

  • nodes - how many computers to use, for example --nodes=1

  • gpus-per-node - how many GPUs in each node you want to use

  • cpus-per-gpu - how many CPUs per GPU you want to use

  • cpus-per-task - how many CPUs you want to use in each task

So in this case:

  • you set the total number of tasks only implicitly

  • but each task runs on CPUs with the fastest access to the allocated GPU

When we tried other setups for this scenario, Slurm complained and even put array jobs on hold.
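
In a batch script, the same kind of request can be expressed with the corresponding #SBATCH directives (a sketch mirroring the interactive example below; adjust the values to your job):

#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-gpu=8
#SBATCH --cpus-per-task=4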

Example:

user@mpsd-hpc-login1:~$ salloc --nodes=1 --gpus-per-node=1 --cpus-per-gpu=8 --cpus-per-task=4 --mem=128G -p gpu
user@mpsd-hpc-gpu-003:~$ mpsd-show-job-resources
 122314 Nodes: mpsd-hpc-gpu-003
 122314 Local Node: mpsd-hpc-gpu-003

 122314 CPUSET: 0,2,4,6,40,42,44,46
 122314 MEMORY: 65536 M

 122314 GPUs (Interconnects, CPU Affinity, NUMA Affinity):
 122314 GPU0     X  SYS 0,2,4,6,40  0-1
user@mpsd-hpc-gpu-003:~$

We expect 2 tasks because we have 8 CPUs in total and 4 CPUs per task (8 / 4 = 2). We can confirm the number of (MPI) tasks:

user@mpsd-hpc-gpu-003:~$ srun hostname
mpsd-hpc-gpu-003
mpsd-hpc-gpu-003