Deploying on HPCs
Once a release candidate of E3SM-Unified is ready, it must be deployed and
tested on HPC systems using the deployment configuration in this repository’s
deploy/ directory together with mache deploy. Pixi manages the deployed
conda environments, and Spack is used only when the selected package variant
requires machine-specific system libraries.
This document explains the deployment workflow, what needs to be updated, and how to test and validate the install.
Deployment Components
Deployment happens via the following components:
🔧 ./deploy.py
The main entry point for deploying E3SM-Unified from this repository
Reads
deploy/pins.cfg,deploy/cli_spec.json, anddeploy/custom_cli_spec.jsonCreates a bootstrap Pixi environment and then invokes
mache deploy runinternally
The workflow should start from the repository root with ./deploy.py.
The CLI exposed by ./deploy.py is assembled from deploy/cli_spec.json
plus deploy/custom_cli_spec.json.
You can inspect the routed arguments in:
./deploy.py --help
cat deploy/cli_spec.json
cat deploy/custom_cli_spec.json
The wrapper also supports repository-specific flags such as --release,
--package-source, --package-mpi, --env-layout, and
--load-script-dir.
When mache itself changes, refresh the mache-owned deployment assets with
mache deploy update before testing deployments. See
Updating mache for the exact sequence and for the list of
files that still must be edited manually afterward.
📁 deploy/pins.cfg
Pins the bootstrap Python version, deployed Python version, and
macheversionPins versions for Spack-managed packages and optional post-install packages
Is not updated by
mache deploy update; maintainers must edit it manually after refreshing mache-owned assetsShould stay aligned with
recipes/e3sm-unified/e3sm-unified-feedstock/recipe/recipe.yamlunless there is an intentional reason to diverge
⚙️ deploy/config.yaml.j2
Defines the generic
mache deployproject configurationDelegates machine-specific and E3SM-Unified-specific decisions to hooks
Controls Pixi prefixes, permissions, runtime version checks, and Spack behavior
🧰 deploy/hooks.py
Resolves the deployed E3SM-Unified version from the feedstock recipe unless overridden
Selects channels, package source, package MPI, and environment layout
Enables or disables Spack dynamically based on the chosen package variant
Adds E3SM-Unified-specific environment variables through
deploy/load.sh
🧪 Templates
The key templates live in the deploy/ directory:
deploy/pixi.toml.j2deploy/spack.yaml.j2deploy/config.yaml.j2
mache contributes the higher-level deployment templates and generated wrapper
scripts. E3SM-Unified supplies the repository-specific pieces above.
When you run mache deploy update, the mache-owned wrapper and template files
in deploy/ are refreshed automatically. The maintainer-owned version pins in
deploy/pins.cfg and the CI dependency pin in pixi.toml still need manual
updates.
🧪 ci/render_pixi_manifest.py
Used only in CI to validate that locally built packages can be installed with Pixi from the generated local channel
Typical Deployment Steps
Update config files:
If you updated
mache, first runmache deploy updateas described in UpdatingmacheUpdate the version and dependency pins in
recipes/e3sm-unified/e3sm-unified-feedstock/recipe/recipe.yamlUpdate
deploy/pins.cfgwith bootstrap,mache, Spack, and post-install pinsUpdate
pixi.tomlso the CI/test environment uses the intendedmacheversionReview
deploy/hooks.pyif package-variant or channel behavior changedUpdate
macheconfig files (see Updatingmache)
Test the build on one or more HPC machines:
./deploy.py --machine <machine>
Note: This can take a lot of time. If the connection to the HPC machine is not stable, you should use
screenor similar to preserve your connection and you should pipe the output to a log file, e.g.:./deploy.py --machine <machine> 2>&1 | tee deploy.log
Note: It is not recommended that you try to deploy E3SM-Unified simultaneously on two different machines that share the same deployment prefix. The runs will step on each other’s state.
Check terminal output and validate that:
The expected Pixi environment or environments were created
Spack built the expected packages when
--package-mpi hpcwas usedActivation scripts were generated and symlinked correctly
Permissions have been updated successfully (read only for everyone except the E3SM-Unified maintainer)
Verify compute-node activation
When
--env-layout dualis used, the load scripts are designed to load a login-friendly environment on login nodes and the compute-node environment when scheduler variables indicate that you are inside an allocation. Before manual testing, confirm that sourcing the script on a compute node loads the expected package variant.Steps:
Start an interactive job on a compute node, for example:
Slurm:
salloc -N 1 -t 10:00
Cobalt:
qsub -I -n 1 -t 10
PBS (example):
qsub -I -l select=1:ncpus=1:mpiprocs=1,walltime=00:10:00
On the compute node, source the activation script:
Bash/zsh:
source test_e3sm_unified_<version>_<machine>.sh
csh/tcsh:
source test_e3sm_unified_<version>_<machine>.csh
For release builds, use the corresponding
load_e3sm_unified_<version>_<machine>.*orload_latest_e3sm_unified_<machine>.*script names.Verify that the MPI environment is active (not the no-MPI one):
echo "$E3SMU_MPI" # should reflect the compute-node variant which python # should point to the active Pixi env python -c "import mpi4py, xarray; print('mpi4py:', mpi4py.__version__)"
Optional quick MPI sanity check (if mpirun/srun is available on the node):
mpirun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_size())" # or, for Slurm srun -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.Get_size())"
If the script loads the login environment on a compute node, check that the scheduler environment variables are present on compute nodes for this machine and update the relevant
machedeployment logic if needed.Manually test tools in the installed environment
Load via:
source test_e3sm_unified_<version>_<machine>.shRun tools like
zppy,e3sm_diags,mpas_analysis
Deploy more broadly once core systems pass testing
Common flags
These are the repository-specific flags maintainers are most likely to use.
--recreate: Rebuilds the Pixi environment if it already exists.This will not necessarily rebuild Spack packages from scratch. To do that, manually delete the corresponding Spack environment before rerunning.
--release: Performs a shared release deployment. This forbids local package builds and forkedmachesources.--package-source: Choose betweenconda-forgeandlocal-build.local-buildis useful while validating a release candidate before the package has been published.--package-mpi: Choosenompi,mpich,openmpi, orhpc.On supported HPC systems,
hpcis typically the target release variant.--env-layout: Choosesingleordual.dualis typically used for shared HPC deployments so login and compute nodes can use different runtime environments through one load script.--e3sm-unified-version: Override the version resolved from the feedstock recipe.--mache-forkand--mache-branch: Test against a development branch ofmacheinstead of a tagged release. Do not use these for release deployments.--spack-path: Point to a specific Spack checkout.--spack-tmpdir: Set a temporary directory for Spack builds.--python: Override the Python version pinned indeploy/pins.cfg.-mor--machine: Specify the machine ifmachedid not detect it correctly for some reason.-cor--compiler: Specify a different compiler than the default. To determine the default compiler, find the machine under mache’s machine config files. To determine which other compilers are supported, look at the list of mache spack templates (yamlfiles).-ior--mpi: Similar to compilers, use this flag to specify an MPI variant other than the default. As above, you can determine the defaults and supported alternatives by looking in the configs and templates inmache.--load-script-dir: Write a copy of the generated load script to a specific directory.
Notes for Maintainers
A partial deployment is expected during RC testing; not all systems must be built initially. Chrysalis and Perlmutter are good places to start.
Always ensure that the E3SM spack fork has a
spack_for_mache_<version>branch (e.g.spack_for_mache_1.32.0) for the version ofmacheyou are testing (e.g.mache1.32.0rc1).Be aware of potential permission or filesystem issues when writing to shared software locations.
➡ Next: Troubleshooting Deployment