Cori (NERSC)

The Cori cluster is located at NERSC.

Introduction

If you are new to this system, please see the following resources:

GPU nodes
Cori user guide
Batch system: Slurm
Jupyter service
Production directories:
- $SCRATCH: per-user production directory, purged every 30 days (20TB)
- /global/cscratch1/sd/m3239: shared production directory for users in the project m3239, purged every 30 days (50TB)
- /global/cfs/cdirs/m3239/: community file system for users in the project m3239 (100TB)

Installation

Use the following commands to download the WarpX source code and switch to the correct branch:

git clone https://github.com/ECP-WarpX/WarpX.git $HOME/src/warpx

KNL

We use the following modules and environments on the system ($HOME/knl_warpx.profile).

Listing 1 You can copy this file from Tools/machines/cori-nersc/knl_warpx.profile.example.

module swap craype-haswell craype-mic-knl
module swap PrgEnv-intel PrgEnv-gnu
module load cmake/3.22.1
module switch cray-libsci cray-libsci/20.09.1
module load cray-hdf5-parallel/1.10.5.2
module load cray-fftw/3.3.8.10
module load cray-python/3.9.7.1

export PKG_CONFIG_PATH=$FFTW_DIR/pkgconfig:$PKG_CONFIG_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/knl/c-blosc-1.12.1-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/knl/adios2-2.7.1-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/knl/blaspp-master-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/knl/lapackpp-master-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/knl/venvs/knl_warpx" ]
then
  source $HOME/sw/knl/venvs/knl_warpx/bin/activate
fi

export CXXFLAGS="-march=knl"
export CFLAGS="-march=knl"

And install ADIOS2, BLAS++ and LAPACK++:

source $HOME/knl_warpx.profile

# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-knl-build
cmake -S src/c-blosc -B src/c-blosc-knl-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/knl/c-blosc-1.12.1-install
cmake --build src/c-blosc-knl-build --target install --parallel 16

# ADIOS2
git clone -b v2.7.1 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-knl-build
cmake -S src/adios2 -B src/adios2-knl-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/knl/adios2-2.7.1-install
cmake --build src/adios2-knl-build --target install --parallel 16

# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-knl-build
cmake -S src/blaspp -B src/blaspp-knl-build -Duse_openmp=ON -Duse_cmake_find_blas=ON -DBLAS_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/knl/blaspp-master-install
cmake --build src/blaspp-knl-build --target install --parallel 16

# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-knl-build
CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S src/lapackpp -B src/lapackpp-knl-build -Duse_cmake_find_lapack=ON -DBLAS_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DLAPACK_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/knl/lapackpp-master-install
cmake --build src/lapackpp-knl-build --target install --parallel 16

For PICMI and Python workflows, also install a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/knl/venvs/knl_warpx
source $HOME/sw/knl/venvs/knl_warpx/bin/activate

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
MPICC="cc -shared" python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
#python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt

Haswell

We use the following modules and environments on the system ($HOME/haswell_warpx.profile).

Listing 2 You can copy this file from Tools/machines/cori-nersc/haswell_warpx.profile.example.

module swap PrgEnv-intel PrgEnv-gnu
module load cmake/3.22.1
module switch cray-libsci cray-libsci/20.09.1
module load cray-hdf5-parallel/1.10.5.2
module load cray-fftw/3.3.8.10
module load cray-python/3.9.7.1

export PKG_CONFIG_PATH=$FFTW_DIR/pkgconfig:$PKG_CONFIG_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/haswell/c-blosc-1.12.1-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/haswell/adios2-2.7.1-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/haswell/blaspp-master-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/haswell/lapackpp-master-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/haswell/venvs/haswell_warpx" ]
then
  source $HOME/sw/haswell/venvs/haswell_warpx/bin/activate
fi

And install ADIOS2, BLAS++ and LAPACK++:

source $HOME/haswell_warpx.profile

# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-haswell-build
cmake -S src/c-blosc -B src/c-blosc-haswell-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/haswell/c-blosc-1.12.1-install
cmake --build src/c-blosc-haswell-build --target install --parallel 16

# ADIOS2
git clone -b v2.7.1 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-haswell-build
cmake -S src/adios2 -B src/adios2-haswell-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/haswell/adios2-2.7.1-install
cmake --build src/adios2-haswell-build --target install --parallel 16

# BLAS++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/blaspp.git src/blaspp
rm -rf src/blaspp-haswell-build
cmake -S src/blaspp -B src/blaspp-haswell-build -Duse_openmp=ON -Duse_cmake_find_blas=ON -DBLAS_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/blaspp-master-haswell-install
cmake --build src/blaspp-haswell-build --target install --parallel 16

# LAPACK++ (for PSATD+RZ)
git clone https://github.com/icl-utk-edu/lapackpp.git src/lapackpp
rm -rf src/lapackpp-haswell-build
CXXFLAGS="-DLAPACK_FORTRAN_ADD_" cmake -S src/lapackpp -B src/lapackpp-haswell-build -Duse_cmake_find_lapack=ON -DBLAS_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DLAPACK_LIBRARIES=${CRAY_LIBSCI_PREFIX_DIR}/lib/libsci_gnu.a -DCMAKE_CXX_STANDARD=17 -DCMAKE_INSTALL_PREFIX=$HOME/sw/haswell/lapackpp-master-install
cmake --build src/lapackpp-haswell-build --target install --parallel 16

For PICMI and Python workflows, also install a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/haswell/venvs/haswell_warpx
source $HOME/sw/haswell/venvs/haswell_warpx/bin/activate

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
MPICC="cc -shared" python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
#python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt

GPU (V100)

Cori provides a partition with 18 nodes that include V100 (16 GB) GPUs. We use the following modules and environments on the system ($HOME/gpu_warpx.profile). You can copy this file from Tools/machines/cori-nersc/gpu_warpx.profile.example:

Listing 3 You can copy this file from Tools/machines/cori-nersc/gpu_warpx.profile.example.

export proj="m1759"

module purge
module load modules
module load cgpu
module load esslurm
module load gcc/8.3.0 cuda/11.4.0 cmake/3.22.1
module load openmpi

export CMAKE_PREFIX_PATH=$HOME/sw/cori_gpu/c-blosc-1.12.1-install:$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$HOME/sw/cori_gpu/adios2-2.7.1-install:$CMAKE_PREFIX_PATH

if [ -d "$HOME/sw/cori_gpu/venvs/cori_gpu_warpx" ]
then
  source $HOME/sw/cori_gpu/venvs/cori_gpu_warpx/bin/activate
fi

# compiler environment hints
export CC=$(which gcc)
export CXX=$(which g++)
export FC=$(which gfortran)
export CUDACXX=$(which nvcc)
export CUDAHOSTCXX=$(which g++)

# optimize CUDA compilation for V100
export AMREX_CUDA_ARCH=7.0

# allocate a GPU, e.g. to compile on
#   10 logical cores (5 physical), 1 GPU
function getNode() {
    salloc -C gpu -N 1 -t 30 -c 10 --gres=gpu:1 -A $proj
}

And install ADIOS2:

source $HOME/gpu_warpx.profile

# c-blosc (I/O compression)
git clone -b v1.21.1 https://github.com/Blosc/c-blosc.git src/c-blosc
rm -rf src/c-blosc-gpu-build
cmake -S src/c-blosc -B src/c-blosc-gpu-build -DBUILD_TESTS=OFF -DBUILD_BENCHMARKS=OFF -DDEACTIVATE_AVX2=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/cori_gpu/c-blosc-1.12.1-install
cmake --build src/c-blosc-gpu-build --target install --parallel 16

git clone -b v2.7.1 https://github.com/ornladios/ADIOS2.git src/adios2
rm -rf src/adios2-gpu-build
cmake -S src/adios2 -B src/adios2-gpu-build -DADIOS2_USE_Blosc=ON -DADIOS2_USE_Fortran=OFF -DADIOS2_USE_Python=OFF -DADIOS2_USE_ZeroMQ=OFF -DCMAKE_INSTALL_PREFIX=$HOME/sw/cori_gpu/adios2-2.7.1-install
cmake --build src/adios2-gpu-build --target install --parallel 16

For PICMI and Python workflows, also install a virtual environment:

# establish Python dependencies
python3 -m pip install --user --upgrade pip
python3 -m pip install --user virtualenv

python3 -m venv $HOME/sw/cori_gpu/venvs/gpu_warpx
source $HOME/sw/cori_gpu/venvs/gpu_warpx/bin/activate

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade wheel
python3 -m pip install --upgrade cython
python3 -m pip install --upgrade numpy
python3 -m pip install --upgrade pandas
python3 -m pip install --upgrade scipy
python3 -m pip install -U --no-cache-dir -v mpi4py
python3 -m pip install --upgrade openpmd-api
python3 -m pip install --upgrade matplotlib
python3 -m pip install --upgrade yt
# optional: for libEnsemble
#python3 -m pip install -r $HOME/src/warpx/Tools/LibEnsemble/requirements.txt

Building WarpX

We recommend to store the above lines in individual warpx.profile files, as suggested above. If you want to run on either of the three partitions of Cori, open a new terminal, log into Cori and source the environment you want to work with:

# KNL:
source $HOME/knl_warpx.profile

# Haswell:
#source $HOME/haswell_warpx.profile

# GPU:
#source $HOME/gpu_warpx.profile

Warning

Consider that all three Cori partitions are incompatible.

Do not source multiple ...warpx.profile files in the same terminal session. Open a new terminal and log into Cori again, if you want to switch the targeted Cori partition.

If you re-submit an already compiled simulation that you ran on another day or in another session, make sure to source the corresponding ...warpx.profile again after login!

Then, cd into the directory $HOME/src/warpx and use the following commands to compile:

cd $HOME/src/warpx
rm -rf build

#                       append if you target GPUs:    -DWarpX_COMPUTE=CUDA
cmake -S . -B build -DWarpX_DIMS=3
cmake --build build -j 16

The general cmake compile-time options apply as usual.

That’s it! A 3D WarpX executable is now in build/bin/ and can be run with a 3D example inputs file. Most people execute the binary directly or copy it out to a location in $SCRATCH.

The general cmake compile-time options and instructions for Python (PICMI) bindings apply as usual:

# PICMI build
cd $HOME/src/warpx

# install or update dependencies
python3 -m pip install -r requirements.txt

# compile parallel PICMI interfaces with openPMD support and 3D, 2D and RZ
WARPX_MPI=ON BUILD_PARALLEL=16 python3 -m pip install --force-reinstall --no-deps -v .

Testing

To run all tests (here on KNL), do:

change in Regressions/WarpX-tests.ini from mpiexec to srun: MPIcommand = srun -n @nprocs@ @command@

# set test directory to a shared directory available on all nodes
#   note: the tests will create the directory automatically
export WARPX_CI_TMP="$HOME/warpx-regression-tests"

# compile with more cores
export WARPX_CI_NUM_MAKE_JOBS=16

# run all integration tests
#   note: we set MPICC as a build-setting for mpi4py on KNL/Haswell
MPICC="cc -shared" ./run_test.sh

Running

Navigate (i.e. cd) into one of the production directories (e.g. $SCRATCH) before executing the instructions below.

KNL

The batch script below can be used to run a WarpX simulation on 2 KNL nodes on the supercomputer Cori at NERSC. Replace descriptions between chevrons <> by relevant values, for instance <job name> could be laserWakefield.

Do not forget to first source $HOME/knl_warpx.profile if you have not done so already for this terminal session.

For PICMI Python runs, the <path/to/executable> has to read python3 and the <input file> is the path to your PICMI input script.

Listing 4 You can copy this file from Tools/machines/cori-nersc/cori_knl.sbatch.

#!/bin/bash -l

# Copyright 2019 Maxence Thevenet
#
# This file is part of WarpX.
#
# License: BSD-3-Clause-LBNL


#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -q regular
#SBATCH -C knl
#SBATCH -S 4
#SBATCH -J <job name>
#SBATCH -A <allocation ID>
#SBATCH -e WarpX.e%j
#SBATCH -o WarpX.o%j

export OMP_PLACES=threads
export OMP_PROC_BIND=spread

# KNLs have 4 hyperthreads max
export CORI_MAX_HYPETHREAD_LEVEL=4
# We use 64 cores out of the 68 available on Cori KNL,
# and leave 4 to the system (see "#SBATCH -S 4" above).
export CORI_NCORES_PER_NODE=64

# Typically use 8 MPI ranks per node without hyperthreading,
# i.e., OMP_NUM_THREADS=8
export WARPX_NMPI_PER_NODE=8
export WARPX_HYPERTHREAD_LEVEL=1

# Compute OMP_NUM_THREADS and the thread count (-c option)
export CORI_NHYPERTHREADS_MAX=$(( ${CORI_MAX_HYPETHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export WARPX_NTHREADS_PER_NODE=$(( ${WARPX_HYPERTHREAD_LEVEL} * ${CORI_NCORES_PER_NODE} ))
export OMP_NUM_THREADS=$(( ${WARPX_NTHREADS_PER_NODE} / ${WARPX_NMPI_PER_NODE} ))
export WARPX_THREAD_COUNT=$(( ${CORI_NHYPERTHREADS_MAX} / ${WARPX_NMPI_PER_NODE} ))

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${WARPX_NMPI_PER_NODE} )) -c ${WARPX_THREAD_COUNT} \
  <path/to/executable> <input file> \
  > output.txt

To run a simulation, copy the lines above to a file cori_knl.sbatch and run

sbatch cori_knl.sbatch

to submit the job.

For a 3D simulation with a few (1-4) particles per cell using FDTD Maxwell solver on Cori KNL for a well load-balanced problem (in our case laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance:

amr.max_grid_size=64 and amr.blocking_factor=64 so that the size of each grid is fixed to 64**3 (we are not using load-balancing here).
8 MPI ranks per KNL node, with OMP_NUM_THREADS=8 (that is 64 threads per KNL node, i.e. 1 thread per physical core, and 4 cores left to the system).
2 grids per MPI, i.e., 16 grids per KNL node.

Haswell

The batch script below can be used to run a WarpX simulation on 1 Haswell node on the supercomputer Cori at NERSC.

Do not forget to first source $HOME/haswell_warpx.profile if you have not done so already for this terminal session.

Listing 5 You can copy this file from Tools/machines/cori-nersc/cori_haswell.sbatch.

#!/bin/bash -l

# Just increase this number of you need more nodes.
#SBATCH -N 1
#SBATCH -t 03:00:00
#SBATCH -q regular
#SBATCH -C haswell
#SBATCH -J <job name>
#SBATCH -A <allocation ID>
#SBATCH -e error.txt
#SBATCH -o output.txt
# one MPI rank per half-socket (see below)
#SBATCH --tasks-per-node=4
# request all logical (virtual) cores per half-socket
#SBATCH --cpus-per-task=16


# each Cori Haswell node has 2 sockets of Intel Xeon E5-2698 v3
# each Xeon CPU is divided into 2 bus rings that each have direct L3 access
export WARPX_NMPI_PER_NODE=4

# each MPI rank per half-socket has 8 physical cores
#   or 16 logical (virtual) cores
# over-subscribing each physical core with 2x
#   hyperthreading leads to a slight (3.5%) speedup
# the settings below make sure threads are close to the
#   controlling MPI rank (process) per half socket and
#   distribute equally over close-by physical cores and,
#   for N>8, also equally over close-by logical cores
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_NUM_THREADS=16

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

EXE="<path/to/executable>"

srun --cpu_bind=cores -n $(( ${SLURM_JOB_NUM_NODES} * ${WARPX_NMPI_PER_NODE} )) \
  ${EXE} <input file> \
  > output.txt

To run a simulation, copy the lines above to a file cori_haswell.sbatch and run

sbatch cori_haswell.sbatch

to submit the job.

For a 3D simulation with a few (1-4) particles per cell using FDTD Maxwell solver on Cori Haswell for a well load-balanced problem (in our case laser wakefield acceleration simulation in a boosted frame in the quasi-linear regime), the following set of parameters provided good performance:

4 MPI ranks per Haswell node (2 MPI ranks per Intel Xeon E5-2698 v3), with OMP_NUM_THREADS=16 (which uses 2x hyperthreading)

GPU (V100)

Do not forget to first source $HOME/gpu_warpx.profile if you have not done so already for this terminal session.

Due to the limited amount of GPU development nodes, just request a single node with the above defined getNode function. For single-node runs, try to run one grid per GPU.

A multi-node batch script template can be found below:

Listing 6 You can copy this file from Tools/machines/cori-nersc/cori_gpu.sbatch.

#!/bin/bash -l

# Copyright 2021 Axel Huebl
# This file is part of WarpX.
# License: BSD-3-Clause-LBNL
#
# Ref:
# - https://docs-dev.nersc.gov/cgpu/hardware/
# - https://docs-dev.nersc.gov/cgpu/access/
# - https://docs-dev.nersc.gov/cgpu/usage/#controlling-task-and-gpu-binding

# Just increase this number of you need more nodes.
#SBATCH -N 2
#SBATCH -t 03:00:00
#SBATCH -J <job name>
#SBATCH -A m1759
#SBATCH -q regular
#SBATCH -C gpu
# 8 V100 GPUs (16 GB) per node
#SBATCH --gres=gpu:8
#SBATCH --exclusive
# one MPI rank per GPU (a quarter-socket)
#SBATCH --tasks-per-node=8
# request all logical (virtual) cores per quarter-socket
#SBATCH --cpus-per-task=10
#SBATCH -e WarpX.e%j
#SBATCH -o WarpX.o%j


# each Cori GPU node has 2 sockets of Intel Xeon Gold 6148 ('Skylake') @ 2.40 GHz
export WARPX_NMPI_PER_NODE=8

# each MPI rank per half-socket has 10 physical cores
#   or 20 logical (virtual) cores
# we split half-sockets again by 2 to have one MPI rank per GPU
# over-subscribing each physical core with 2x
#   hyperthreading leads to often to slight speedup on Intel
# the settings below make sure threads are close to the
#   controlling MPI rank (process) per half socket and
#   distribute equally over close-by physical cores and,
#   for N>20, also equally over close-by logical cores
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_NUM_THREADS=10

# for async_io support: (optional)
export MPICH_MAX_THREAD_SAFETY=multiple

EXE="<path/to/executable>"

srun --cpu_bind=cores --gpus-per-task=1 --gpu-bind=map_gpu:0,1,2,3,4,5,6,7 \
  -n $(( ${SLURM_JOB_NUM_NODES} * ${WARPX_NMPI_PER_NODE} )) \
  ${EXE} <input file> \
  > output.txt

Post-Processing

For post-processing, most users use Python via NERSC’s Jupyter service (Docs).

As a one-time preparatory setup, create your own Conda environment as described in NERSC docs. In this manual, we often use this conda create line over the officially documented one:

conda create -n myenv -c conda-forge python mamba ipykernel ipympl==0.8.6 matplotlib numpy pandas yt openpmd-viewer openpmd-api h5py fast-histogram dask dask-jobqueue pyarrow

We then follow the Customizing Kernels with a Helper Shell Script section to finalize the setup of using this conda-environment as a custom Jupyter kernel.

kernel_helper.sh should read:

#!/bin/bash
module load python
source activate myenv
exec "$@"

When opening a Jupyter notebook, just select the name you picked for your custom kernel on the top right of the notebook.

Additional software can be installed later on, e.g., in a Jupyter cell using !mamba install -c conda-forge .... Software that is not available via conda can be installed via !python -m pip install ....

Warning

Jan 6th, 2022 (NERSC-INC0179165 and ipympl #416): Above, we fixated the ipympl version to not take the latest release of Matplotlib Jupyter Widgets. This is an intentional work-around; the ipympl version needs to exactly fit the version pre-installed on the Jupyter base system.