Perlmutter (NERSC)
The Perlmutter cluster is located at NERSC.
Introduction
If you are new to this system, please see the following resources:
- Batch system: Slurm
- Filesystems:
  - $PSCRATCH: per-user production directory, purged every 30 days (<TBD>TB)
  - /global/cscratch1/sd/m3239: shared production directory for users in the project m3239, purged every 30 days (50TB)
  - /global/cfs/cdirs/m3239/: community file system for users in the project m3239 (100TB)
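For example, you can quickly confirm where your per-user scratch directory points and how much space its filesystem has, using plain shell commands on a login node:

# print the per-user scratch location and the space on its filesystem
echo $PSCRATCH
df -h $PSCRATCH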
Installation
Use the following commands to download the WarpX source code and switch to the correct branch:
git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx
On Perlmutter, you can run either on GPU nodes with fast A100 GPUs (recommended) or CPU nodes.
For runs on the GPU nodes, we use the following modules and environments on the system ($HOME/perlmutter_gpu_warpx.profile):

# please set your project account
#export proj="<yourProject>_g"  # change me

# required dependencies
module load cmake/3.22.0

# optional: for QED support with detailed tables
export BOOST_ROOT=/global/common/software/spackecp/perlmutter/e4s-22.05/78535/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/boost-1.79.0-lmdngktoeuwdi6ty55noznunah2mvk5w

# optional: for openPMD and PSATD+RZ support
module load cray-hdf5-parallel/1.12.2.1
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/adios2-2.8.3:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/lapackpp-master:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/adios2-2.8.3/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/lapackpp-master/lib64:$LD_LIBRARY_PATH

# optional: CCache
export PATH=/global/common/software/spackecp/perlmutter/e4s-22.05/78535/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/ccache-4.5.1-ybl7xefvggn6hov4dsdxxnztji74tolj/bin:$PATH

# optional: for Python bindings or libEnsemble
module load cray-python/3.9.13.1

if [ -d "${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/venvs/warpx" ]
then
    source ${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/venvs/warpx/bin/activate
fi

# an alias to request an interactive batch node for one hour
#   for parallel execution, start on the batch node: srun <command>
alias getNode="salloc -N 1 --ntasks-per-node=4 -t 1:00:00 -q interactive -C gpu --gpu-bind=single:1 -c 32 -G 4 -A $proj"
# an alias to run a command on a batch node for up to 30min
#   usage: runNode <command>
alias runNode="srun -N 1 --ntasks-per-node=4 -t 0:30:00 -q interactive -C gpu --gpu-bind=single:1 -c 32 -G 4 -A $proj"

# necessary to use CUDA-Aware MPI and run a job
export CRAY_ACCEL_TARGET=nvidia80

# optimize CUDA compilation for A100
export AMREX_CUDA_ARCH=8.0

# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
# note: the cc/CC/ftn wrappers below add those
export CXXFLAGS="-march=znver3"
export CFLAGS="-march=znver3"

# compiler environment hints
export CC=cc
export CXX=CC
export FC=ftn
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=CC

We recommend storing the above lines in a file, such as $HOME/perlmutter_gpu_warpx.profile, and loading it into your shell after login:

source $HOME/perlmutter_gpu_warpx.profile
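If you prefer, you can load this profile automatically on every login, e.g. from your $HOME/.bashrc (a minimal sketch; adjust the file name if you chose a different one):

# optional: auto-load the WarpX GPU profile at login
if [ -f $HOME/perlmutter_gpu_warpx.profile ]; then
    source $HOME/perlmutter_gpu_warpx.profile
fi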
And since Perlmutter does not yet provide modules for them, install c-blosc, ADIOS2, BLAS++, and LAPACK++:

# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-pm-build
cmake -S src/c-blosc -B src/c-blosc-pm-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/c-blosc-1.21.1
cmake --build src/c-blosc-pm-build --target install --parallel 16

# ADIOS2
git clone -b v2.8.3 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-pm-build
cmake -S src/adios2 -B src/adios2-pm-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/adios2-2.8.3
cmake --build src/adios2-pm-build --target install -j 16

# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-pm-build
CXX=$(which CC) cmake -S src/blaspp -B src/blaspp-pm-build -Duse_openmp=OFF -Dgpu_backend=cuda -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/blaspp-master
cmake --build src/blaspp-pm-build --target install --parallel 16

# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-pm-build
CXX=$(which CC) CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S src/lapackpp -B src/lapackpp-pm-build -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/lapackpp-master
cmake --build src/lapackpp-pm-build --target install --parallel 16

Optionally, download and install Python packages for PICMI or dynamic ensemble optimizations (libEnsemble):
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv
python3 -m pip cache purge
rm -rf ${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/venvs/warpx
python3 -m venv ${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/venvs/warpx
source ${CFS}/${proj%_g}/${USER}/sw/perlmutter/gpu/venvs/warpx/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
MPICC="cc -target-accel=nvidia80 -shared" python3 -m pip install --upgrade mpi4py --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
# optional: for optimas (based on libEnsemble & ax->botorch->gpytorch->pytorch)
python3 -m pip install --upgrade torch # CUDA 11.7 compatible wheel
python3 -m pip install -r $HOME/src/warpx/Tools/optimas/requirements.txt
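After the installs above finish, a quick import check helps catch a broken environment early (a minimal sketch; package names as installed above):

# optional sanity check: all key Python packages should import cleanly
python3 -c "import numpy, scipy, pandas, mpi4py, openpmd_api; print('Python dependencies OK')"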
Then, cd into the directory $HOME/src/warpx and use the following commands to compile:
cd $HOME/src/warpx
rm -rf build
cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=CUDA -DWarpX_PSATD=ON
cmake --build build -j 16
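Before submitting batch jobs, you can smoke-test the freshly built binary on an interactive node, using the runNode alias from the profile above (a sketch; the executable name under build/bin/ depends on your build options, and <inputs file> is one of the WarpX example inputs):

# inspect the build products; the executable name encodes the build options
ls build/bin/
# run a short test on an interactive GPU node
runNode ./build/bin/<executable> <inputs file>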
For runs on the CPU nodes, we use the following modules and environments on the system ($HOME/perlmutter_cpu_warpx.profile):

# please set your project account
#export proj="<yourProject>"  # change me

# required dependencies
module load cpu
module load cmake/3.22.0
module load cray-fftw/3.3.10.3

# optional: for QED support with detailed tables
export BOOST_ROOT=/global/common/software/spackecp/perlmutter/e4s-22.05/78535/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/boost-1.79.0-lmdngktoeuwdi6ty55noznunah2mvk5w

# optional: for openPMD and PSATD+RZ support
module load cray-hdf5-parallel/1.12.2.1
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/c-blosc-1.21.1:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/adios2-2.8.3:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/blaspp-master:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/lapackpp-master:$CMAKE_PREFIX_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/c-blosc-1.21.1/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/adios2-2.8.3/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/blaspp-master/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/lapackpp-master/lib64:$LD_LIBRARY_PATH

# optional: CCache
export PATH=/global/common/software/spackecp/perlmutter/e4s-22.05/78535/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/ccache-4.5.1-ybl7xefvggn6hov4dsdxxnztji74tolj/bin:$PATH

# optional: for Python bindings or libEnsemble
module load cray-python/3.9.13.1

if [ -d "${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/venvs/warpx" ]
then
    source ${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/venvs/warpx/bin/activate
fi

# an alias to request an interactive batch node for one hour
#   for parallel execution, start on the batch node: srun <command>
alias getNode="salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account=$proj"
# an alias to run a command on a batch node for up to 30min
#   usage: runNode <command>
alias runNode="srun --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account=$proj"

# optimize CPU microarchitecture for AMD EPYC 3rd Gen (Milan/Zen3)
# note: the cc/CC/ftn wrappers below add those
export CXXFLAGS="-march=znver3"
export CFLAGS="-march=znver3"

# compiler environment hints
export CC=cc
export CXX=CC
export FC=ftn

We recommend storing the above lines in a file, such as $HOME/perlmutter_cpu_warpx.profile, and loading it into your shell after login:

source $HOME/perlmutter_cpu_warpx.profile
And since Perlmutter does not yet provide modules for them, install c-blosc, ADIOS2, BLAS++, and LAPACK++:

# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-pm-build
cmake -S src/c-blosc -B src/c-blosc-pm-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/c-blosc-1.21.1
cmake --build src/c-blosc-pm-build --target install --parallel 16

# ADIOS2
git clone -b v2.8.3 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-pm-build
cmake -S src/adios2 -B src/adios2-pm-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_CUDA=OFF -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/adios2-2.8.3
cmake --build src/adios2-pm-build --target install -j 16

# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-pm-build
CXX=$(which CC) cmake -S src/blaspp -B src/blaspp-pm-build -Duse_openmp=ON -Dgpu_backend=OFF -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/blaspp-master
cmake --build src/blaspp-pm-build --target install --parallel 16

# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-pm-build
CXX=$(which CC) CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S src/lapackpp -B src/lapackpp-pm-build -DCMAKE_CXX_STANDARD=17 -Dbuild_tests=OFF -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_INSTALL_PREFIX=${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/lapackpp-master
cmake --build src/lapackpp-pm-build --target install --parallel 16

Optionally, download and install Python packages for PICMI or dynamic ensemble optimizations (libEnsemble):
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv
python3 -m pip cache purge
rm -rf ${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/venvs/warpx
python3 -m venv ${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/venvs/warpx
source ${CFS}/${proj%_g}/${USER}/sw/perlmutter/cpu/venvs/warpx/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
MPICC="cc -shared" python3 -m pip install --upgrade mpi4py --no-build-isolation --no-binary mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt
Then, cd into the directory $HOME/src/warpx and use the following commands to compile:
cd $HOME/src/warpx
rm -rf build
cmake -S . -B build -DWarpX_DIMS=3 -DWarpX_COMPUTE=OMP -DWarpX_PSATD=ON
cmake --build build -j 16
The general cmake compile-time options apply as usual.
That’s it!
A 3D WarpX executable is now in build/bin/ and can be run with a 3D example inputs file.
Most people execute the binary directly or copy it out to a location in $PSCRATCH.
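For example, a run directory on scratch can be staged like this (a sketch; <executable> and <inputs file> are placeholders, and naming the copy warpx matches the EXE=./warpx setting in the batch scripts below):

# stage the executable and inputs into a scratch run directory
mkdir -p $PSCRATCH/warpx_run
cp build/bin/<executable> $PSCRATCH/warpx_run/warpx
cp <inputs file> $PSCRATCH/warpx_run/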
For a full PICMI install, follow the instructions for Python (PICMI) bindings:
# choose the backend matching your target nodes:
# GPU nodes
export WARPX_COMPUTE=CUDA
# or CPU nodes
export WARPX_COMPUTE=OMP
# PICMI build
cd $HOME/src/warpx
# install or update dependencies
python3 -m pip install -r requirements.txt
# compile parallel PICMI interfaces in 3D, 2D, 1D and RZ
WARPX_MPI=ON WARPX_PSATD=ON BUILD_PARALLEL=16 python3 -m pip install --force-reinstall --no-deps -v .
Or, if you are developing, do a quick PICMI install of a single geometry (see: WarpX_DIMS) using:
# find dependencies & configure
cmake -S . -B build -DWarpX_COMPUTE=${WARPX_COMPUTE} -DWarpX_PSATD=ON -DWarpX_LIB=ON -DWarpX_DIMS=RZ
# build and then call "python3 -m pip install ..."
cmake --build build --target pip_install -j 16
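Either way, you can verify that the resulting Python module is importable (the pywarpx package is what the PICMI install above provides):

# optional: confirm the PICMI interface imports
python3 -c "from pywarpx import picmi; print('PICMI import OK')"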
Running
A100 GPUs (40 GB)
The batch script below can be used to run a WarpX simulation on multiple nodes (change -N
accordingly) on the supercomputer Perlmutter at NERSC.
This partition has up to 1536 nodes.
Replace descriptions between chevrons <> with relevant values; for instance, <input file> could be plasma_mirror_inputs.
Note that we run one MPI rank per GPU.
#!/bin/bash -l
# Copyright 2021-2023 Axel Huebl, Kevin Gott
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL
#SBATCH -t 00:10:00
#SBATCH -N 2
#SBATCH -J WarpX
# note: <proj> must end on _g
#SBATCH -A <proj>
#SBATCH -q regular
# A100 40GB (most nodes)
#SBATCH -C gpu
# A100 80GB (256 nodes)
#S BATCH -C gpu&hbm80g
#SBATCH --exclusive
#SBATCH --gpu-bind=none
#SBATCH --gpus-per-node=4
#SBATCH -o WarpX.o%j
#SBATCH -e WarpX.e%j
# executable & inputs file or python interpreter & PICMI script here
EXE=./warpx
INPUTS=inputs_small
# pin to closest NIC to GPU
export MPICH_OFI_NIC_POLICY=GPU
# threads for OpenMP and threaded compressors per MPI rank
export SRUN_CPUS_PER_TASK=32
# depends on https://github.com/ECP-WarpX/WarpX/issues/2009
#GPU_AWARE_MPI="amrex.the_arena_is_managed=0 amrex.use_gpu_aware_mpi=1"
GPU_AWARE_MPI=""
# CUDA visible devices are ordered inverse to local task IDs
# Reference: nvidia-smi topo -m
srun --cpu-bind=cores bash -c "
export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID));
${EXE} ${INPUTS} ${GPU_AWARE_MPI}" \
> output.txt
To run a simulation, copy the lines above to a file perlmutter_gpu.sbatch and run

sbatch perlmutter_gpu.sbatch

to submit the job.
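Standard Slurm commands can be used to monitor and manage the job after submission:

# check the status of your jobs in the queue
squeue -u $USER
# show details of a specific job (replace <jobid> with the ID printed by sbatch)
scontrol show job <jobid>
# cancel a job if needed
scancel <jobid>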
A100 GPUs (80 GB)
Perlmutter has 256 nodes that provide 80 GB HBM per A100 GPU.
Replace -C gpu with -C gpu&hbm80g in the above job script to use these large-memory GPUs.
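The swap can also be scripted, e.g. with GNU sed (a convenience sketch, assuming the job script is named perlmutter_gpu.sbatch as above):

# switch the 40 GB constraint line to the 80 GB nodes in place
sed -i 's/^#SBATCH -C gpu$/#SBATCH -C gpu\&hbm80g/' perlmutter_gpu.sbatch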
CPUs: 2x AMD EPYC 7763
The Perlmutter CPU partition has up to 3072 nodes.
#!/bin/bash -l
# Copyright 2021-2023 WarpX
#
# This file is part of WarpX.
#
# Authors: Axel Huebl
# License: BSD-3-Clause-LBNL
#SBATCH -t 00:10:00
#SBATCH -N 2
#SBATCH -J WarpX
#SBATCH -A <proj>
#SBATCH -q regular
#SBATCH -C cpu
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive
#SBATCH -o WarpX.o%j
#SBATCH -e WarpX.e%j
# executable & inputs file or python interpreter & PICMI script here
EXE=./warpx
INPUTS=inputs_small
# each CPU node on Perlmutter (NERSC) has 64 hardware cores with
# 2x Hyperthreading/SMP
# https://en.wikichip.org/wiki/amd/epyc/7763
# https://www.amd.com/en/products/cpu/amd-epyc-7763
# Each CPU is made up of 8 chiplets, each sharing 32MB L3 cache.
# This will be our MPI rank assignment (2x8 is 16 ranks/node).
# threads for OpenMP and threaded compressors per MPI rank
export SRUN_CPUS_PER_TASK=16 # 8 cores per chiplet, 2x SMP
export OMP_PLACES=threads
export OMP_PROC_BIND=spread
srun --cpu-bind=cores \
${EXE} ${INPUTS} \
> output.txt
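To run a simulation, copy the lines above to a file perlmutter_cpu.sbatch and submit it with:

sbatch perlmutter_cpu.sbatch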
Post-Processing
For post-processing, most users use Python via NERSC’s Jupyter service (Docs).
Please follow the same process as for NERSC Cori post-processing. Important: The environment and Jupyter kernel must be separate from the ones you create for Cori.
The Perlmutter $PSCRATCH filesystem is only available on Perlmutter Jupyter nodes.
Likewise, Cori’s $SCRATCH filesystem is only available on Cori Jupyter nodes.
You can use the Community FileSystem (CFS) from everywhere.
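As a starting point, the openpmd-api package installed above can open a WarpX output series directly, whether in a notebook cell or from a shell (a minimal sketch; the diagnostics path and file pattern depend on your inputs file):

# list the iterations of an openPMD series written by WarpX
python3 << 'EOF'
import openpmd_api as io
series = io.Series("diags/diag1/openpmd_%T.h5", io.Access.read_only)
print("iterations:", list(series.iterations))
EOF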