Deploying a new spack environment
Where do we update polaris dependencies?
Mache
If system modules change in E3SM, we try to stay in sync:
compilers
MPI libraries
netcdf-C
netcdf-fortran
pnetcdf
mkl (or other linear algebra libs)
When we update the mache version in Polaris, we also need to bump the Polaris version (typically either the major or the minor version) and then re-deploy shared spack environments on each supported machine.
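As a sketch of where these versions live (the exact file paths are assumptions; check your checkout for the actual locations):
# hypothetical locations: the mache version pinned for deployment and the
# Polaris version itself
grep mache deploy/default.cfg
grep __version__ polaris/version.py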
Spack
Spack is used to build libraries used by E3SM components and tools that need system MPI:
ESMF
MOAB
SCORPIO
Metis
Parmetis
Trilinos
Albany
PETSc
Netlib LAPACK
We build one spack environment for tools (e.g. ESMF and MOAB) and another for
libraries. This allows us to build the tools with one set of compilers and
MPI libraries and the libraries with another. This is sometimes necessary,
since ESMF, MOAB and/or their dependencies can’t always be built or don’t
run correctly with all compiler and MPI combinations. For example, we have
experienced problems running ESMF built with Intel compilers on Perlmutter. We are also not able to build ESMF or the Eigen dependency of MOAB using nvidiagpu compilers.
When we update the versions of any of these packages in Polaris, we also need to bump the Polaris version (typically either the major or the minor version) and then re-deploy shared spack environments on each supported machine.
Conda
Conda (via conda-forge) is used for Python packages and related dependencies that don't need system MPI. Conda environments aren't shared between developers because the polaris Python package you're developing is part of the conda environment.
When we update the constraints on conda dependencies, we also need to bump the Polaris alpha, beta or rc version. We do not need to re-deploy spack environments on shared machines because they remain unaffected.
Mache
This section gives a brief tour of mache. For additional information, check out the mache documentation.
Identifying E3SM machines
Polaris and other packages use mache to identify what machine they’re on (when you’re on a login node). This is used when configuring polaris and creating a conda environment. Because of this, there is a “bootstrapping” process where a conda environment is created with mache installed in it, then the machine is identified and the rest of the polaris setup can continue.
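As a minimal sketch of what this identification looks like (run on a login node of a supported machine, in an environment with mache installed):
# ask mache which E3SM machine we're on; prints e.g. "chrysalis"
python -c "from mache import discover_machine; print(discover_machine())"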
Config options describing E3SM machines
Mache has config files for each E3SM machine that tell us where E3SM-Unified is, where to find diagnostics, and much more; see machines. These config options are shared across packages including:
MPAS-Analysis
E3SM_Diags
zppy
polaris
compass
E3SM-Unified
Polaris uses these config options to know how to make a job script, where to find locally cached files in its “databases” and much more.
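For example, a quick way to see what mache knows about the current machine, including the config options from its machine config file (a sketch using mache's MachineInfo class, again run on a supported machine's login node):
# print a summary of the machine description and its config options
python -c "from mache import MachineInfo; print(MachineInfo())"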
Modules, env. variables, etc. for E3SM machines
Mache keeps its own copy of the E3SM file config_machines.xml in the package here. We try to keep a close eye on E3SM master and update mache when system modules for machines that mache knows about get updated. When this happens, we update mache's copy of config_machines.xml, which tells us which modules to update in spack (see below).
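A sketch of how to spot such module updates, assuming local clones of E3SM and mache (the paths within each repo are best guesses; adjust to your checkouts):
# compare E3SM's config_machines.xml with mache's copy
diff <e3sm>/cime_config/machines/config_machines.xml \
    <mache>/mache/cime_machine_config/config_machines.xml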
Creating spack environments
Mache has templates for making spack environments on some of the E3SM supported machines. See spack. It also has functions for building the spack environments with these templates using E3SM’s fork of spack (see below).
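For instance, you can list the templates that ship with the package (the directory layout here is an assumption; check the mache repo for the actual location):
# template names include the machine, compiler and MPI library
ls <mache>/mache/spack/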
Updating spack from polaris with mache from a remote branch
If you haven’t cloned polaris and added Xylar’s fork, here’s the process:
mkdir polaris
cd polaris/
git clone git@github.com:E3SM-Project/polaris.git main
cd main/
git remote add xylar/polaris git@github.com:xylar/polaris.git
Now, we need to set up polaris and build spack packages making use of the updated mache. This involves changing the mache version in a couple of places in polaris and updating the version of polaris itself to a new alpha, beta or rc. As an example, we will use the branch simplify_local_mache.
Often, we will need to test with a mache branch that has changes needed by polaris. Here, we will use <fork> as a stand-in for the fork of mache to use (e.g. E3SM-Project/mache) and <branch> as the stand-in for a branch on that fork (e.g. main).
We also need to make sure there is a spack branch for the version of Polaris. The spack branch is a branch off of the develop branch on E3SM's spack repo that has any updates to packages required for this version of mache. The remote branch is named after the release version of mache (omitting any alpha, beta or rc suffix because it is intended to be the spack branch we will use once the mache release happens). In this example, we will work with the branch spack_for_mache_1.12.0.
The local clone is instead named after the Polaris version (again omitting any alpha, beta or rc) plus the compiler and MPI library, because we have discovered that two users cannot make modifications to the same git clone. Giving each clone of the spack branch a unique name ensures that they are independent.
Here’s how to get a branch of polaris we’re testing (simplify_local_mache in this case) as a local worktree:
# get my branch
git fetch --all -p
# make a worktree for checking out my branch
git worktree add ../simplify_local_mache -b simplify_local_mache \
--checkout xylar/polaris/simplify_local_mache
cd ../simplify_local_mache/
You will also need a local installation of Miniforge3. Polaris can do this for you if you haven’t already installed it. If you want to download it manually, use the Linux x86_64 version for all our supported machines.
Note
We have found that an existing Miniconda3 installation does not always work well for polaris, so please start with Miniforge3 instead.
Note
You definitely need your own local Miniforge3 installation – you can’t use a system version or a shared one like E3SM-Unified.
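If you do install Miniforge3 manually, here is a minimal sketch (assuming the standard conda-forge download URL for the Linux x86_64 installer):
# download and install Miniforge3 into ${HOME}/miniforge3
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p ${HOME}/miniforge3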
Define a location where Miniforge3 is installed or where you want to install it:
# change to your conda installation
export CONDA_BASE=${HOME}/miniforge3
Okay, we’re finally ready to do a test spack build for polaris. To do this, we call the configure_polaris_envs.py script using --mache_fork, --mache_branch, --update_spack, --spack and --tmpdir. Here is an example appropriate for Anvil or Chrysalis:
export TMPDIR=/lcrc/group/e3sm/${USER}/spack_temp
./configure_polaris_envs.py \
--conda ${CONDA_BASE} \
--mache_fork <fork> \
--mache_branch <branch> \
--update_spack \
--spack /lcrc/group/e3sm/${USER}/spack_test \
--tmpdir ${TMPDIR} \
--compiler intel intel gnu \
--mpi openmpi impi openmpi \
--recreate \
--verbose
The directory you point to with --conda should either not exist yet or contain your existing installation of Miniforge3.
When you supply --mache_fork and --mache_branch, polaris will clone a fork of the mache repo and check out the requested branch, then install that version of mache into both the polaris installation conda environment and the final polaris environment.
mache gets installed twice because the deployment tools need mache even to know how to install polaris and build the spack environment on supported machines. The “prebootstrap” step in deployment creates the installation conda environment. The “bootstrap” step creates the conda environment that polaris will actually use and (in this case, with --update_spack) builds spack packages, then creates the “load” or “activation” script that you will need to build MPAS components and run polaris.
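For example, once deployment completes, you would source the resulting script; the name below is illustrative (it encodes the Polaris version, machine, compiler and MPI library):
source load_dev_polaris_0.2.0_anvil_intel_openmpi.sh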
For testing, you want to point to a different location for installing spack using --spack. On many machines, the /tmp directory is not a safe place to build spack packages. Use --tmpdir to point to another place, e.g., your scratch space.
The --recreate flag may not be strictly necessary, but it’s a good idea. It makes sure both the bootstrapping conda environment (the one that installs mache to identify the machine) and the polaris conda environment are created fresh.
The --compiler flag is a list of one or more compilers to build for and the --mpi flag is the corresponding list of MPI libraries. To see what is supported on each machine, take a look at Supported Machines.
Be aware that not all compilers and MPI libraries support Albany and PETSc, as discussed below.
Testing spack with PETSc (and Netlib LAPACK)
If you want to build PETSc (and Netlib LAPACK), use the --with_petsc flag. Currently, this only works with some compilers, but that restriction may have more to do with limiting the workload for the polaris support team than with any fundamental incompatibility. There is a file, petsc_supported.txt, that lists supported compilers and MPI libraries on each machine.
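For instance, you can check the supported combinations for your machine before launching a build (the file’s location in the repo is an assumption; adjust the path as needed):
# list the compiler/MPI combinations that support PETSc on Chrysalis
grep -i chrysalis deploy/petsc_supported.txt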
Here is an example deployment command:
export TMPDIR=/lcrc/group/e3sm/${USER}/spack_temp
./configure_polaris_envs.py \
--conda ${CONDA_BASE} \
--mache_fork <fork> \
--mache_branch <branch> \
--update_spack \
--spack /lcrc/group/e3sm/${USER}/spack_test \
--tmpdir ${TMPDIR} \
--compiler intel gnu \
--mpi openmpi \
--with_petsc \
--recreate \
--verbose
Testing spack with Albany
If you also want to build Albany, use the --with_albany flag. Currently, this only works with Gnu compilers. There is a file, albany_supported.txt, that lists supported compilers and MPI libraries on each machine.
Here is an example:
export TMPDIR=/lcrc/group/e3sm/${USER}/spack_temp
./configure_polaris_envs.py \
--conda ${CONDA_BASE} \
--mache_fork <fork> \
--mache_branch <branch> \
--update_spack \
--spack /lcrc/group/e3sm/${USER}/spack_test \
--tmpdir ${TMPDIR} \
--compiler gnu \
--mpi openmpi \
--with_albany \
--recreate \
--verbose
Troubleshooting spack
If you encounter an error like:
==> spack env activate dev_polaris_0_2_0_gnu_mpich
==> Error: Package 'armpl' not found.
You may need to run 'spack clean -m'.
during the attempt to build spack, you will first need to find the path to setup-env.sh (see polaris/build_*/build*.sh) and source that script to get the spack command, e.g.:
source ${PSCRATCH}/spack_test/dev_polaris_0_2_0_gnu_mpich/share/spack/setup-env.sh
Then run the suggested command:
spack clean -m
After that, re-running ./configure_polaris_envs.py should work correctly.
This issue seems to be related to switching between spack v0.18 and v0.19 (used by different versions of polaris).
Testing polaris
Testing MPAS-Ocean without PETSc
Please use the E3SM-Project submodule in polaris for testing, rather than E3SM’s master branch. The submodule is the version we know works with polaris and serves as a kind of baseline for other testing.
# source whichever load script is appropriate
source load_dev_polaris_0.2.0_chrysalis_intel_openmpi.sh
git submodule update --init --recursive
cd E3SM-Project/components/mpas-ocean
# this will build with PIO and OpenMP
make ifort
polaris suite -c ocean -t pr -p . \
-w /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/ocean_pr_chrys_intel_openmpi
cd /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/ocean_pr_chrys_intel_openmpi
sbatch job_script.pr.bash
You can make other worktrees of E3SM-Project for testing other compilers if that’s helpful. It also might be good to open a fresh terminal before sourcing a new load script; this isn’t required, but you’ll see some warnings if you don’t.
source load_dev_polaris_0.2.0_chrysalis_gnu_openmpi.sh
cd E3SM-Project
git worktree add ../e3sm_chrys_gnu_openmpi
cd ../e3sm_chrys_gnu_openmpi
git submodule update --init --recursive
cd components/mpas-ocean
make gfortran
polaris suite -c ocean -t pr -p . \
-w /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/ocean_pr_chrys_gnu_openmpi
cd /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/ocean_pr_chrys_gnu_openmpi
sbatch job_script.pr.bash
You can also explore the utility in utils/matrix to test on several compilers automatically.
Testing MALI with Albany
Please use the MALI-Dev submodule in polaris for testing, rather than MALI-Dev’s develop branch. The submodule is the version we know works with polaris and serves as a kind of baseline for other testing.
# source whichever load script is appropriate
source load_dev_polaris_0.2.0_chrysalis_gnu_openmpi_albany.sh
git submodule update --init --recursive
cd MALI-Dev/components/mpas-albany-landice
# you need to tell it to build with Albany
make ALBANY=true gfortran
polaris suite -c landice -t full_integration -p . \
-w /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/landice_full_chrys_gnu_openmpi
cd /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/landice_full_chrys_gnu_openmpi
sbatch job_script.full_integration.bash
Testing MPAS-Ocean with PETSc
The tests for PETSc use nonhydrostatic capabilities that have not yet been integrated into E3SM, so you can’t use the E3SM-Project submodule. Instead, you need to use Sara Calandrini’s nonhydro branch.
# source whichever load script is appropriate
source load_dev_polaris_0.2.0_chrysalis_intel_openmpi_petsc.sh
git submodule update --init
cd E3SM-Project
git remote add scalandr/E3SM git@github.com:scalandr/E3SM.git
git worktree add ../nonhydro_chrys_intel_openmpi -b nonhydro_chrys_intel_openmpi \
--checkout scalandr/E3SM/ocean/nonhydro
cd ../nonhydro_chrys_intel_openmpi
git submodule update --init --recursive
cd components/mpas-ocean
# this will build with PIO, Netlib LAPACK and PETSc
make ifort
polaris list | grep nonhydro
# update these numbers for the 2 nonhydro tasks
polaris setup -n 245 246 -p . \
-w /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/nonhydro_chrys_intel_openmpi
cd /lcrc/group/e3sm/ac.xylar/polaris/test_20230202/nonhydro_chrys_intel_openmpi
sbatch job_script.custom.bash
As with non-PETSc MPAS-Ocean and MALI, you can have different worktrees with Sara’s nonhydro branch for building with different compilers or use utils/matrix to build (and run).