Deploying on HPCs
Once a release candidate of E3SM-Unified is ready, it must be deployed and
tested on HPC systems using a combination of Spack and Conda-based tools.
Deployment scripts and configurations live within the e3sm_supported_machines
directory of the E3SM-Unified repo.
This document explains the deployment workflow, what needs to be updated, and how to test and validate the install.
Deployment Components
Deployment happens via the following components:
🔧 deploy_e3sm_unified.py
The main entry point for deploying E3SM-Unified
Installs the combined Conda + Spack environment on supported systems
Reads its deployment configuration from default.cfg and shared logic from shared.py
You can find the full list of command-line flags with:
./deploy_e3sm_unified.py --help
You must supply --conda at a minimum. This is the path to a conda installation (typically in your home directory) where the deployment tool can create a conda environment (temp_e3sm_unified_install) used to install E3SM-Unified. This environment includes the mache package, which can automatically recognize the machine you are on and configure the deployment accordingly.
For release builds (but not release candidates), you should supply --release. If this flag is not supplied, the activation scripts created during deployment will start with test_e3sm_unified_..., whereas the release versions will be called load_latest_e3sm_unified_... and load_e3sm_unified_<version>... instead.
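To make the naming convention concrete, here is a minimal Python sketch that classifies an activation script by its prefix. Only the prefixes come from this page; the helper name and its return values are hypothetical, and the deployment tool does not necessarily provide anything like it.

```python
from typing import Optional

# Prefixes described on this page. Test deployments (no --release) use the
# first; release deployments use the other two.
TEST_PREFIX = "test_e3sm_unified_"
RELEASE_PREFIXES = ("load_latest_e3sm_unified_", "load_e3sm_unified_")


def classify_activation_script(filename: str) -> Optional[str]:
    """Return "test", "release", or None based only on the file-name prefix.

    Hypothetical helper for checking a deployment; not part of the
    deployment tool itself.
    """
    if filename.startswith(TEST_PREFIX):
        return "test"
    if filename.startswith(RELEASE_PREFIXES):
        return "release"
    return None


if __name__ == "__main__":
    # Follows the test_e3sm_unified_<version>_<machine>.sh pattern used later
    # on this page; the version and machine here are placeholders.
    print(classify_activation_script("test_e3sm_unified_1.0.0rc1_chrysalis.sh"))
```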
Other flags are optional and will be discussed below.
📁 default.cfg
Specifies which packages and versions to install via Spack, as well as the versions of some conda packages required in the installation environment (notably mache). Version numbers here should match meta.yaml unless there is a specific reason to diverge. A special case is esmpy = None, which is required so that ESMPy comes from conda-forge rather than Spack.
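A minimal sketch of the consistency check described above, using Python's configparser. It assumes a hypothetical [spack] section with one package = version option per package; the real layout of default.cfg may differ, and the meta.yaml pins are passed in as a plain dictionary rather than parsed from the recipe.

```python
import configparser
from typing import Dict, List

# Hypothetical layout: the real default.cfg may use different section and
# option names. "None" means "do not build with Spack; use conda-forge".
EXAMPLE_CFG = """
[spack]
esmpy = None
ilamb = 1.0.0
xesmf = 2.0.0
"""


def spack_versions(cfg_text: str, section: str = "spack") -> Dict[str, str]:
    """Return {package: version} for a section, skipping packages set to None."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {
        name: value
        for name, value in parser[section].items()
        if value.strip() != "None"
    }


def version_mismatches(cfg_versions: Dict[str, str],
                       meta_pins: Dict[str, str]) -> List[str]:
    """List packages whose version differs between default.cfg and meta.yaml."""
    return sorted(
        name for name, version in cfg_versions.items()
        if name in meta_pins and meta_pins[name] != version
    )


if __name__ == "__main__":
    versions = spack_versions(EXAMPLE_CFG)
    # The meta.yaml pins below are placeholders, not real E3SM-Unified pins.
    print(version_mismatches(versions, {"ilamb": "1.0.0", "xesmf": "2.1.0"}))
```

Any package reported here either needs its version updated or a deliberate reason to diverge.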
🧰 bootstrap.py
Used by deploy_e3sm_unified.py to build and configure environments once the temp_e3sm_unified_install conda environment has been created.
🧪 Templates
The e3sm_supported_machines/templates
subdirectory contains jinja2 templates
used during deployment.
Build script template:
build.template: Used during deployment to build and install the following packages with system compilers and MPI (if requested): mpi4py, ilamb, esmpy, xesmf
Maintainers may need to add new packages to the template over time. Typically, the dependencies here are Python-based but use system compilers and/or MPI. Spack must not install Python itself, as this would conflict with the Conda-managed Python environment; all Python packages need to be installed into the Conda environment.
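As a rough guard for the rule that Spack must not install Python, a script could lint spec strings before building. This is a hypothetical sketch, not part of the deployment tools, and it only catches packages named explicitly in a spec (in Spack the interpreter package is python, while Python libraries are named py-*). Implicit dependencies or variants can still pull Python in, so inspecting the concretized output of spack spec remains the more reliable check.

```python
import re
from typing import Iterable, List

# A simple spec starts with a package name, optionally followed by version
# (@), variant (+/~) or compiler (%) qualifiers; "^" introduces dependencies.
_NAME = re.compile(r"[A-Za-z0-9_-]+")


def package_names(spec: str) -> List[str]:
    """Names of the root package and any ^dependencies in a spec string."""
    names = []
    for part in spec.split("^"):
        match = _NAME.match(part.strip())
        if match:
            names.append(match.group(0))
    return names


def assert_no_spack_python(specs: Iterable[str]) -> None:
    """Raise ValueError if any spec explicitly asks Spack for Python."""
    for spec in specs:
        if "python" in package_names(spec):
            raise ValueError(f"Spack spec would install Python: {spec!r}")


if __name__ == "__main__":
    # Placeholder specs for illustration only.
    assert_no_spack_python(["esmf@8.6.0+mpi", "moab@5.5.1 ^netcdf-c"])
    print("no explicit python in specs")
```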
Activation script templates:
load_e3sm_unified.sh.template
load_e3sm_unified.csh.template
Since E3SM itself cannot be built when E3SM-Unified is active, these scripts set:
CIME_MODEL="ENVIRONMENT_RUNNING_E3SM_UNIFIED_USE_ANOTHER_TERMINAL"
This value is meant to tell users that they cannot build E3SM in this terminal window (because E3SM-Unified is loaded) and should open a new one. Some users have not found this very intuitive, but we don't currently have a better way for E3SM to detect that E3SM-Unified is active.
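Because the sentinel is an ordinary environment variable, any script can check for it. A minimal Python sketch (the function name is hypothetical; this is not necessarily how E3SM or CIME reacts to the value):

```python
import os
from typing import Mapping, Optional

# Value set by the E3SM-Unified activation scripts, as described above.
UNIFIED_SENTINEL = "ENVIRONMENT_RUNNING_E3SM_UNIFIED_USE_ANOTHER_TERMINAL"


def e3sm_unified_active(environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if CIME_MODEL carries the E3SM-Unified sentinel value."""
    env = os.environ if environ is None else environ
    return env.get("CIME_MODEL") == UNIFIED_SENTINEL


if __name__ == "__main__":
    if e3sm_unified_active():
        print("E3SM-Unified is loaded; open a new terminal to build E3SM.")
```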
These scripts also detect whether the user is on a compute or login node via the $SLURM_JOB_ID or $COBALT_JOBID environment variables (which should only be set on compute nodes). Maintainers will need to edit these scripts to support new queuing systems (e.g. PBS).
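The node detection amounts to checking whether a batch-system job ID is set. The templates implement this in sh and csh; the Python sketch below only mirrors the logic. PBS_JOBID, which PBS sets inside jobs, is included as an assumption about how PBS support might be added and is not in the current templates.

```python
import os
from typing import Mapping, Optional

# SLURM_JOB_ID and COBALT_JOBID come from this page; PBS_JOBID is an
# assumed addition for a PBS-based machine.
JOB_ID_VARS = ("SLURM_JOB_ID", "COBALT_JOBID", "PBS_JOBID")


def on_compute_node(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Guess whether we are inside a batch job (compute node) or on a login node.

    An unset or empty job ID is treated as "login node".
    """
    env = os.environ if environ is None else environ
    return any(env.get(name) for name in JOB_ID_VARS)


if __name__ == "__main__":
    print("compute node" if on_compute_node() else "login node")
```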
Typical Deployment Steps
Update config files:
Set the target version in shared.py
Update default.cfg with package versions (Spack + Conda)
Update mache config files (see Updating mache)
Test the build on one or more HPC machines:
cd e3sm_supported_machines
./deploy_e3sm_unified.py --conda ~/miniforge3
Note: This can take a long time. If your connection to the HPC machine is unstable, run the deployment inside screen (or a similar tool) so it survives a dropped connection, and pipe the output (including errors) to a log file, e.g.:
./deploy_e3sm_unified.py --conda ~/miniforge3 2>&1 | tee deploy.log
Note: Avoid deploying E3SM-Unified simultaneously on two different machines that share the same base conda environment (e.g. Anvil and Chrysalis). The two deployments will interfere with each other.
Check terminal output and validate that:
Spack built the expected packages
Conda environment was created and activated
Activation scripts were generated and symlinked correctly
Permissions have been updated successfully (read-only for everyone except the E3SM-Unified maintainer)
Manually test tools in the installed environment
Load via:
source test_e3sm_unified_<version>_<machine>.sh
Run tools like zppy, e3sm_diags and mpas_analysis
Deploy more broadly once core systems pass testing
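For the permissions check in the validation step, a small script can flag anything that group members or other users can still write to. A minimal sketch, assuming you pass in the root directory of the installation; the function name is hypothetical and not part of the deployment tools.

```python
import os
import stat
from typing import List


def writable_by_others(root: str) -> List[str]:
    """Return paths under root (including root) that group or others can write.

    After the permission update, only the owner (the maintainer) should have
    write access, so this list is expected to be empty.
    """
    offenders = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each directory is visited once as dirpath, so check it plus its files.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # symlink permission bits are not meaningful on Linux
            if mode & (stat.S_IWGRP | stat.S_IWOTH):
                offenders.append(path)
    return sorted(offenders)
```

Symlinks (such as the activation-script links) are skipped because their own permission bits do not control access; their targets are checked if they live under the same root.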
Optional flags to deploy_e3sm_unified.py
Here, we start with the flags that a maintainer is most likely to need, with less useful flags at the bottom.
--recreate: Rebuilds the Conda environment if it already exists. This will also recreate the installation environment temp_e3sm_unified_install.
Note: This will not rebuild Spack packages from scratch. To do that, manually delete the corresponding Spack directory before running the deployment script again. These directories are typically located under:
spack/e3sm_unified_<version>_<machine>_<compiler>_<mpi>
--mache_fork and --mache_branch: It is common to need to co-develop E3SM-Unified and mache, and it is impractical to tag a release candidate and build the associated conda-forge package every time. Instead, use these flags to point to your fork and branch of mache to install into both the installation and testing conda environments. Do not use these flags for release deployments.
--tmpdir: Set the $TMPDIR environment variable for Spack to use in case /tmp is full or not a desirable place to install.
--version: Typically you want to deploy the latest release candidate or release, which should be the hard-coded default. You can set this to a different value to deploy an earlier version if needed.
--python: Deploy with a different version of Python than specified in default.cfg.
-m or --machine: Specify the machine if mache did not detect it correctly for some reason.
-c or --compiler: Specify a different compiler than the default. To determine the default compiler, find the machine in mache's machine config files. To determine which other compilers are supported, look at the list of mache Spack templates (yaml files).
-i or --mpi: Similar to compilers, use this flag to specify an MPI variant other than the default. As above, you can determine the defaults and supported alternatives by looking at the configs and templates in mache.
-f or --config_file: You can provide a config file to override defaults from default.cfg or from the machine-specific config file in mache. Use this with caution, because this approach will be hard for other maintainers to reproduce in the future.
--use_local: Typically not useful, but can be used in a pinch if you have built conda packages locally in the installation you pointed to with --conda and want to use them in the deployment.
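When the same flag combinations are used repeatedly, it can help to assemble the command in one place. A minimal sketch, assuming only the flags listed above; the helper is hypothetical, and the value formats (for example, what --mache_fork expects) should be checked against ./deploy_e3sm_unified.py --help.

```python
import os
import shlex
from typing import List, Optional


def deploy_command(conda: str,
                   release: bool = False,
                   recreate: bool = False,
                   machine: Optional[str] = None,
                   compiler: Optional[str] = None,
                   mpi: Optional[str] = None,
                   mache_fork: Optional[str] = None,
                   mache_branch: Optional[str] = None,
                   tmpdir: Optional[str] = None) -> List[str]:
    """Assemble arguments for ./deploy_e3sm_unified.py (hypothetical helper)."""
    if release and (mache_fork or mache_branch):
        # mache forks/branches are for RC co-development, not releases.
        raise ValueError("--mache_fork/--mache_branch must not be used with --release")
    # Expand "~" here: when run without a shell, nothing else would expand it.
    args = ["./deploy_e3sm_unified.py", "--conda", os.path.expanduser(conda)]
    if release:
        args.append("--release")
    if recreate:
        args.append("--recreate")
    optional = [("--machine", machine), ("--compiler", compiler), ("--mpi", mpi),
                ("--mache_fork", mache_fork), ("--mache_branch", mache_branch),
                ("--tmpdir", tmpdir)]
    for flag, value in optional:
        if value is not None:
            args += [flag, value]
    return args


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(deploy_command("~/miniforge3", recreate=True,
                                    machine="chrysalis")))
```

To execute rather than print, run it from e3sm_supported_machines with subprocess.run(deploy_command(...), check=True).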
Notes for Maintainers
A partial deployment is expected during RC testing; not all systems must be built initially. Chrysalis and Perlmutter are good places to start.
Always ensure that the E3SM Spack fork has a spack_for_mache_<version> branch (e.g. spack_for_mache_1.32.0) for the version of mache you are testing (e.g. mache 1.32.0rc1).
Be aware of potential permission or filesystem issues when writing to shared software locations.
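In the example above, the branch name keeps only the X.Y.Z part of the mache version. The sketch below assumes that rule also holds for other pre-release suffixes (only the rc case appears on this page) and checks for the branch with git ls-remote. The function names are hypothetical, and the remote URL of the E3SM Spack fork is left for you to supply.

```python
import re
import subprocess


def spack_branch_for_mache(mache_version: str) -> str:
    """Map a mache version such as "1.32.0rc1" to "spack_for_mache_1.32.0".

    Assumption: any suffix after the leading X.Y.Z (rc, dev, ...) is dropped.
    """
    match = re.match(r"(\d+\.\d+\.\d+)", mache_version)
    if match is None:
        raise ValueError(f"unrecognized mache version: {mache_version!r}")
    return f"spack_for_mache_{match.group(1)}"


def remote_branch_exists(remote_url: str, branch: str) -> bool:
    """Return True if the remote has refs/heads/<branch> (needs git and network)."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote_url, f"refs/heads/{branch}"],
        capture_output=True, text=True, check=True,
    )
    return bool(result.stdout.strip())
```

Usage: remote_branch_exists(fork_url, spack_branch_for_mache("1.32.0rc1")), where fork_url is the E3SM Spack fork you deploy from.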
➡ Next: Troubleshooting Deployment