Testing E3SM Diagnostics

Testing at a Glance

E3SM Diagnostics uses four test layers across local, CI/CD, and LCRC environments. The diagram below shows how they fit together.

Testing architecture diagram showing test layers by environment.

Local Workflows

Default Local Check

To run the repository’s default automated local checks in one command:

./tests/test.sh

For Layer 3, ./tests/test.sh first looks for a local downloaded-data tree at /e3sm_diags_downloaded_data. If it is not present, the helper pulls the same OCI test-data image used by GitHub Actions and copies tests/integration/integration_test_data from that image into the working tree. If you need a nonstandard setup, use --source-root or --image with tests.integration.download_data directly.
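
The lookup order described above can be sketched as a small shell function. This is only an illustration of the documented behavior, not the code in ./tests/test.sh; the function name pick_data_source and its output strings are hypothetical.

```shell
# Hypothetical helper: report where Layer 3 test data would come from.
# Mirrors the documented order: local tree first, OCI image otherwise.
pick_data_source() {
    # $1: path of the local downloaded-data tree (defaults to the documented path)
    root="${1:-/e3sm_diags_downloaded_data}"
    if [ -d "$root" ]; then
        echo "local:$root"
    else
        echo "image"
    fi
}
```

Calling pick_data_source with no argument checks the default /e3sm_diags_downloaded_data path; passing a directory lets you try the same check against another location.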

Layer 1: Unit Tests

Covers: unit-level code correctness and API stability.

When to run: first, before any other layer, during local development.

Run:

pytest tests/e3sm_diags

Layer 2: Targeted Image-Regression Tests

Covers: pixel-level regressions from code or dependency changes using targeted synthetic cases with committed baselines.

When to run: after Layer 1, especially for changes that may affect plotting or rendered output.

Run:

pytest tests/integration/test_plot_image_regressions.py -m image_regression

How it works:

This suite compares generated PNGs against committed baselines in tests/integration/baselines/ and writes dependency metadata for provenance.

It currently covers targeted synthetic regressions for lat_lon, polar, zonal_mean_2d, and cosp_histogram.

If a test fails:

Rerun with a persistent artifact directory:

IMAGE_REGRESSION_ARTIFACT_DIR=tests/integration/image_check_failures \
pytest tests/integration/test_plot_image_regressions.py -m image_regression

Inspect tests/integration/image_check_failures to determine whether the change is expected. Each failed image artifact directory includes the generated runtime_metadata.json and a dependency_diff.json comparing the runtime environment to the committed baseline_metadata.json.
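
When several cases fail, it can help to list the dependency diffs first, since a changed dependency often explains a changed image. The following is a minimal sketch; the function name list_dependency_diffs is hypothetical, and it assumes only that each dependency_diff.json sits somewhere under the artifact directory.

```shell
# Hypothetical helper: list every dependency_diff.json under an artifact directory.
list_dependency_diffs() {
    # $1: artifact directory (defaults to the path used in the rerun command above)
    dir="${1:-tests/integration/image_check_failures}"
    find "$dir" -type f -name 'dependency_diff.json' | sort
}
```

Each listed file can then be opened next to the runtime_metadata.json in the same directory.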

Note

In GitHub Actions, build artifacts for failed image-regression tests are saved and can be downloaded from the bottom of the workflow run summary page.

How to update baselines:

If a targeted image change is intentional:

python -m tests.integration.refresh_plot_image_baselines
pytest tests/integration/test_plot_image_regressions.py -m image_regression

The refresh command regenerates all targeted Layer 2 baselines by default. To refresh only one targeted case:

python -m tests.integration.refresh_plot_image_baselines --case polar

Commit the updated PNGs and baseline_metadata.json.

Layer 3: Broad Downloaded-Data Integration Tests

Covers: broader workflow smoke coverage using downloaded data to ensure diagnostic workflows complete and generate outputs, without pixel-level image matching.

When to run: when you want wider integration coverage than Layers 1 and 2, but do not need exact image comparisons.

Run:

python -m tests.integration.download_data --data-only
CHECK_IMAGES=False pytest tests/integration -m 'not image_regression'

How it works:

These tests exercise broader diagnostics workflows with downloaded test data. They run with CHECK_IMAGES=False, so they are intended to catch integration and workflow regressions rather than serve as the visual regression authority.

By default, tests.integration.download_data uses the local /e3sm_diags_downloaded_data tree when it exists. Otherwise it pulls the same OCI image used by CI and copies the requested test-data directory from /e3sm_diags_downloaded_data inside that image using crane export. For nonstandard setups, use the --source-root or --image command-line options.

Role relative to Layer 2:

Layer 2 is the primary image-regression gate. Layer 3 provides wider smoke coverage.

CI/CD Workflows

Main GitHub Actions Workflow

The main GitHub Actions CI/CD workflow runs on pull requests and on main. It runs:

  1. Layer 1 unit tests

  2. Layer 2 targeted image-regression tests

  3. Layer 3 broad integration smoke tests with CHECK_IMAGES=False

CI/CD is the enforcement backstop. Contributors should still run relevant local checks before opening a pull request.

Within CI, Layer 2 is the primary visual regression gate. Layer 3 provides wider smoke coverage, but is not the image-matching authority.
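
To reproduce the CI ordering by hand, the three layer commands from the Local Workflows section can be chained so that a failure stops the run. This is a sketch only: run_layer and run_ci_order are hypothetical names, ./tests/test.sh remains the documented single-command entry point, and Layer 3 assumes the test data has already been downloaded.

```shell
# Hypothetical wrapper: run one layer's command and report the result.
run_layer() {
    name="$1"
    shift
    echo "== $name =="
    if "$@"; then
        echo "$name: passed"
    else
        echo "$name: FAILED"
        return 1
    fi
}

# Run the layers in the same order as the main workflow, stopping at the first failure.
# Assumes `python -m tests.integration.download_data --data-only` has already been run.
run_ci_order() {
    run_layer "Layer 1" pytest tests/e3sm_diags &&
    run_layer "Layer 2" pytest tests/integration/test_plot_image_regressions.py -m image_regression &&
    run_layer "Layer 3" env CHECK_IMAGES=False pytest tests/integration -m 'not image_regression'
}
```

Calling run_ci_order from the repository root runs the layers in sequence and returns a nonzero status as soon as one fails.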

E3SM-Unified Compatibility Workflow

GitHub Actions also runs a separate E3SM Unified Latest Release Compatibility job.

Purpose:

This job checks e3sm_diags against the most recent released linux-64 nompi e3sm-unified package on conda-forge. It is a production-regression check against the latest published E3SM-Unified environment, not a preview of unreleased feedstock changes.

What it runs:

This workflow runs Layer 2 in an environment derived from the latest released E3SM-Unified package on conda-forge, which may differ from the main CI environment if dependencies have changed since the last E3SM-Unified release.

Note

Implementation details: this job performs the following steps:

  1. Starts from conda-env/ci.yml

  2. Resolves the latest released e3sm-unified package metadata from conda-forge/linux-64/repodata.json.bz2

  3. Substitutes the released package dependency set into the CI environment

  4. Caches conda packages with the generated environment hash

  5. Runs Layer 2

The compatibility workflow uses the same targeted image baselines as the main Layer 2 suite.

Manual LCRC Validation

Layer 4: Complete-Run Validation

Covers: full-run validation of all diagnostics against LCRC-hosted expected results.

When to run: when you need complete-run validation and have access to the LCRC-hosted data and an E3SM-Unified environment on Anvil or Chrysalis.

How it works:

tests/integration/complete_run.py compares images generated by a full diagnostics run against LCRC-hosted expected results. It is separate from CI/CD because it depends on the LCRC data installation and an E3SM-Unified environment on Anvil or Chrysalis.

Warning

You must run this test manually. It is not part of the CI/CD workflow.

On Anvil or Chrysalis:

git fetch <fork-name> <branch-name>
git checkout -b run-lcrc-test <fork-name>/<branch-name>
# On Chrysalis:
source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
# On Anvil, use this script instead:
source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_anvil.sh
pip install .
pytest tests/integration/complete_run.py

If the test fails:

Inspect the reported image differences and determine whether the change is intentional.

Updating Expected Results on LCRC

If a complete-run image change is intentional:

cd /lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/expected
cat README.md
mv all_sets previous_output/all_sets_<version>_<date>_<hash>
mv image_list_all_sets.txt previous_output/image_list_all_sets_<version>_<date>_<hash>.txt
mv <version>_all_sets/ /lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/expected/all_sets
cd /lcrc/group/e3sm/public_html/e3sm_diags_test_data/unit_test_complete_run/expected/all_sets
find . -type f -name '*.png' > ../image_list_all_sets.txt
cd ..
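
After regenerating the list, a quick consistency check can confirm that image_list_all_sets.txt has one entry per PNG under all_sets. This is a sketch under that assumption; the function name check_image_list is hypothetical and is not part of the repository.

```shell
# Hypothetical check: the image list should contain one line per PNG under all_sets.
check_image_list() {
    # $1: directory containing all_sets/ and image_list_all_sets.txt
    expected_dir="$1"
    png_count=$(cd "$expected_dir/all_sets" && find . -type f -name '*.png' | wc -l | tr -d ' ')
    list_count=$(wc -l < "$expected_dir/image_list_all_sets.txt" | tr -d ' ')
    echo "PNGs: $png_count, listed: $list_count"
    # Succeeds only when the two counts agree.
    [ "$png_count" -eq "$list_count" ]
}
```

From the expected directory, check_image_list . prints both counts and returns a nonzero status if they differ.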

After the pull request is merged, update the LCRC README.md metadata to match the E3SM Diags version, date, and git commit used to generate the new expected images.