LUMI (CSC)
The LUMI cluster is located at CSC.
Introduction
If you are new to this system, please see the following resources:
Batch system: Slurm
Production directories:
LUMI-P: 4 independent [Lustre](https://docs.lumi-supercomputer.eu/hardware/storage/lumip/#lustre) file systems
LUMI-F: a fast Lustre file system
LUMI-O: object storage
Installation
Use the following command to download the WarpX source code:
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
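As a sketch of the full fetch step (assuming you want WarpX's development branch, the default branch that new work targets), the clone and branch selection can be combined; the guard skips the clone if the directory already exists:

```shell
# clone WarpX (shallow, skipped if already present) and make sure the
# development branch (WarpX's default branch for new work) is checked out
repo_dir="$HOME/src/warpx"
if [ ! -d "$repo_dir" ]; then
  git clone --depth 1 --branch development \
      https://github.com/ECP-WarpX/WarpX.git "$repo_dir" || true
fi
# only attempt the checkout if a clone is actually present
[ -d "$repo_dir/.git" ] && git -C "$repo_dir" checkout development || true
```

If you need a tagged release instead, pass that tag to `--branch`.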
We use the following modules and environments on the system ($HOME/lumi_warpx.profile):
# please set your project account
#export proj=<yourProject>
# optional: just an additional text editor
# module load nano
# required dependencies
module load CrayEnv
module load craype-accel-amd-gfx90a
module load cray-mpich
module load rocm
module load buildtools/22.08
# optional: faster re-builds
#module load ccache
# optional: for PSATD in RZ geometry support
# TODO: BLAS++, LAPACK++
# optional: for QED lookup table generation support
# TODO: BOOST
# optional: for openPMD support
# TODO: HDF5, ADIOS2
# optional: Ascent in situ support
# TODO
# optional: for Python bindings or libEnsemble
# TODO
if [ -d "$HOME/sw/venvs/warpx-lumi" ]
then
source $HOME/sw/venvs/warpx-lumi/bin/activate
fi
# an alias to request an interactive batch node for two hours
# for parallel execution, start on the batch node: srun <command>
#alias getNode="..."
# an alias to run a command on a batch node for up to 30min
# usage: runNode <command>
#alias runNode="..."
# GPU-aware MPI
export MPICH_GPU_SUPPORT_ENABLED=1
# optimize ROCm/HIP compilation for MI250X
export AMREX_AMD_ARCH=gfx90a
# compiler environment hints
export CC=$(which cc)
export CXX=$(which CC)
export FC=$(which ftn)
We recommend storing the above lines in a file, such as $HOME/lumi_warpx.profile, and loading it into your shell after each login:
source $HOME/lumi_warpx.profile
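To load the profile automatically on every login, one option (a sketch, assuming the profile path used above) is a guarded source line in ~/.bashrc; the file-existence check keeps logins on systems without the profile unaffected:

```shell
# append a guarded "source" line to ~/.bashrc once; the grep keeps this
# snippet idempotent across repeated runs
profile="$HOME/lumi_warpx.profile"
line="[ -f \"$profile\" ] && source \"$profile\""
touch "$HOME/.bashrc"
grep -qF "lumi_warpx.profile" "$HOME/.bashrc" \
  || printf '%s\n' "$line" >> "$HOME/.bashrc"
```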
Then, cd into the directory $HOME/src/warpx and use the following commands to compile:
cd $HOME/src/warpx
rm -rf build
cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=HIP -DWarpX_PSATD=ON
cmake --build build -j 6
The general cmake compile-time options apply as usual.
That’s it!
A 3D WarpX executable is now in build/bin/ and can be run with a 3D example inputs file. Most people execute the binary directly or copy it out to a location in LUMI-P.
Most people execute the binary directly or copy it out to a location in LUMI-P.
Running
MI250X GPUs (2x64 GB)
Use the following batch script for non-interactive runs:
#!/bin/bash
#SBATCH -A <project id>
#SBATCH -J warpx
#SBATCH -o %x-%j.out
#SBATCH -t 00:10:00
# Early access to the GPU partition
#SBATCH -p eap
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-node=8
#SBATCH --gpu-bind=closest
export MPICH_GPU_SUPPORT_ENABLED=1
# note (12-12-22)
# this environment setting is currently needed on LUMI to work-around a
# known issue with Libfabric
#export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor
# note (9-2-22, OLCFDEV-1079)
# this environment setting is needed to avoid that rocFFT writes a cache in
# the home directory, which does not scale.
export ROCFFT_RTC_CACHE_PATH=/dev/null
export OMP_NUM_THREADS=1
srun ../warpx inputs > outputs
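Assuming the script above is saved as lumi.sbatch (the file name is an assumption) next to your inputs file, submission and monitoring follow the usual Slurm workflow; the guard lets the snippet run on machines without Slurm installed:

```shell
# submit the batch script above and inspect the queue; lumi.sbatch is an
# assumed file name, adjust to whatever you saved the script as
job_script=lumi.sbatch
if command -v sbatch >/dev/null 2>&1; then
  sbatch "$job_script"   # submit; prints the assigned job id
  squeue -u "$USER"      # check queue state of your jobs
else
  echo "Slurm not available on this machine"
fi
```

Once the job starts, output appears in warpx-&lt;jobid&gt;.out per the #SBATCH -o pattern above.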
Post-Processing
Note
TODO: Document any Jupyter or data services.
Known System Issues
Warning
December 12th, 2022: There is a caching bug in libFabric that causes WarpX simulations to occasionally hang on LUMI on more than 1 node.
As a work-around, please export the following environment variable in your job scripts until the issue is fixed:
#export FI_MR_CACHE_MAX_COUNT=0 # libfabric disable caching
# or, less invasive:
export FI_MR_CACHE_MONITOR=memhooks # alternative cache monitor
Warning
January 2023: We discovered a performance regression in AMD ROCm 5.3 and 5.4, leading to 2x slower current deposition (and other slowdowns). The issue has been reported to AMD and is under investigation. Stay with the ROCm 5.2 module to avoid it.