Fugaku (Riken)

The Fugaku cluster is located at the Riken Center for Computational Science (Japan).

Introduction

If you are new to this system, please see the following resources:

Installation

Use the following command to download the WarpX source code:

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

Compiling WarpX on Fugaku is more practical on a compute node. Use the following command to acquire a compute node for two hours:

pjsub --interact -L "elapse=02:00:00" -L "node=1" --sparam "wait-time=300" --mpi "max-proc-per-node=48" --all-mount-gfscache

Then, load cmake and ninja using spack:

. /vol0004/apps/oss/spack/share/spack/setup-env.sh
spack load cmake@3.21.4%fj@4.8.1 arch=linux-rhel8-a64fx

# optional: faster builds
spack load ninja@1.11.1%fj@4.8.1

# avoid harmless warning messages "[WARN] xos LPG [...]"
export LD_LIBRARY_PATH=/lib64:$LD_LIBRARY_PATH
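
Before configuring, it can help to confirm that the tools used in the next steps are actually on the PATH, since `$(which mpifcc)` silently expands to an empty string if the compiler wrapper is missing. The following is a minimal sketch; the helper name `require_tools` is hypothetical and not part of WarpX or of the Fugaku environment.

```shell
# require_tools NAME...
# Hypothetical helper: prints every listed command that is not on PATH
# and returns non-zero if at least one is missing.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# cmake, mpifcc and mpiFCC are the commands used by the build steps in this section
require_tools cmake mpifcc mpiFCC || echo "some build tools are not on PATH"
```

If anything is reported as missing, re-sourcing the spack setup script and repeating the `spack load` commands is a reasonable first thing to try.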

At this point we need to download and compile the libraries required for OpenPMD support:

Finally, cd into the directory $HOME/src/warpx and use the following commands to compile:

cd $HOME/src/warpx
rm -rf build

export CC=$(which mpifcc)
export CXX=$(which mpiFCC)
export CFLAGS="-Nclang"
export CXXFLAGS="-Nclang"

cmake -S . -B build -DWarpX_COMPUTE=OMP \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_FLAGS_RELEASE="-Ofast -mllvm -polly -mllvm -polly-parallel" \
-DAMReX_DIFFERENT_COMPILER=ON \
-DWarpX_MPI_THREAD_MULTIPLE=OFF

cmake --build build -j 48

The general cmake compile-time options apply as usual.

That’s it! A 3D WarpX executable is now in build/bin/ and can be run with a 3D example inputs file.
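
The batch script in the Running section expects the executable at `./warpx`. As a rough sketch, the built binary can be located and copied into the run directory as shown here. The helper name `find_exe` is hypothetical, and the assumption that the executable's file name starts with `warpx` is ours; check the contents of build/bin/ if nothing is found.

```shell
# find_exe DIR
# Hypothetical helper: prints the first executable file in DIR whose name
# starts with "warpx" (assumed naming), or returns non-zero if there is none.
find_exe() {
    local f
    for f in "$1"/warpx*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Copy the executable next to the inputs file under the name used by submit.sh
if exe=$(find_exe "$HOME/src/warpx/build/bin"); then
    cp "$exe" ./warpx
else
    echo "no WarpX executable found in build/bin" >&2
fi
```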

Running

A64FX CPUs

For non-interactive runs, submit a batch script with pjsub submit.sh, where submit.sh can be adapted from:

Listing 20 You can copy this file from Tools/machines/fugaku-riken/submit.sh.
#!/bin/bash
#PJM -L "node=48"
#PJM -L "rscgrp=small"
#PJM -L "elapse=0:30:00"
#PJM -s
#PJM -L "freq=2200,eco_state=2"
#PJM --mpi "max-proc-per-node=12"
#PJM -x PJM_LLIO_GFSCACHE=/vol0004:/vol0003
#PJM --llio localtmp-size=10Gi
#PJM --llio sharedtmp-size=10Gi

export NODES=48
export MPI_RANKS=$((NODES * 12))
export OMP_NUM_THREADS=4

export EXE="./warpx"
export INPUT="i.3d"

export XOS_MMM_L_PAGING_POLICY=demand:demand:demand

llio_transfer ${EXE}

mpiexec -stdout-proc ./output.%j/%/1000r/stdout -stderr-proc ./output.%j/%/1000r/stderr -n ${MPI_RANKS} ${EXE} ${INPUT}

llio_transfer --purge ${EXE}

Note: the Boost Eco Mode set in this example raises the default frequency of the A64FX from 2 GHz to 2.2 GHz while switching off one of the two floating-point arithmetic pipelines. Preliminary tests with WarpX show that this mode achieves performance similar to that of the normal mode while reducing energy consumption by approximately 20%.
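
When adapting the script to a different node count or MPI/OpenMP split, the three values `node`, `max-proc-per-node` and `OMP_NUM_THREADS` have to stay consistent. The sketch below checks a layout before submission. The helper name `check_layout` is hypothetical, and the value of 48 cores per node is an assumption taken from the `max-proc-per-node=48` setting used for the interactive build job.

```shell
# Assumption: 48 compute cores per A64FX node
CORES_PER_NODE=48

# check_layout NODES RANKS_PER_NODE OMP_THREADS
# Hypothetical helper: prints the total number of MPI ranks and returns
# non-zero if ranks-per-node times threads exceeds the cores of one node.
check_layout() {
    local nodes=$1 ranks_per_node=$2 threads=$3
    local used=$((ranks_per_node * threads))
    if [ "$used" -gt "$CORES_PER_NODE" ]; then
        echo "oversubscribed: $used threads on $CORES_PER_NODE cores" >&2
        return 1
    fi
    echo $((nodes * ranks_per_node))
}

# Values from submit.sh: 48 nodes, 12 ranks per node, 4 OpenMP threads per rank
check_layout 48 12 4   # prints 576
```

With the values from the script, 12 ranks times 4 threads uses all 48 cores of a node, and the printed rank count matches `MPI_RANKS=$((NODES * 12))`.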