Troubleshooting Deployment
Even with well-maintained tools, deployment of Polaris on HPC systems often encounters system-specific or environment-specific problems. This page outlines common categories of issues and how to diagnose and resolve them.
This is an evolving list. Please make PRs to add descriptions of issues you have encountered and solutions you have found.
1. 🛠️ Spack Build Failures
Common Causes
- Missing or incompatible system modules (`cmake`, `perl`, `bison`, etc.)
- Outdated Spack package definitions in the `spack_for_mache_<version>` branch on the E3SM fork
- Spack build cache pollution 
- Environment not set correctly for Spack to detect compilers/libraries 
Solutions
- If Spack is attempting to build common system tools (`cmake`, `tar`, etc.), add their system versions to the Spack templates in `mache` with `buildable: false` to save time and prevent build problems.
- Check installed packages and the compiler and module configuration with `spack find`, `spack config get compilers`, and `spack config get modules`
- Load required modules manually before re-running 
- Rebuild: `spack uninstall -y <package>` or delete the full deployment directory
- Double-check you are using the correct `spack_for_mache_<version>` branch
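Before adding `buildable: false` entries, it can help to confirm which tools the system already provides. The following is a minimal sketch, assuming a bash shell; `check_system_tools` is a hypothetical helper and is not part of Spack or `mache`.

```bash
# Report which build tools are already available on PATH.
# check_system_tools is a hypothetical helper, not part of Spack or mache.
check_system_tools() {
    local tool path
    for tool in "$@"; do
        # command -v prints the resolved path and fails if the tool is absent
        if path="$(command -v "$tool" 2>/dev/null)"; then
            echo "found: $tool ($path)"
        else
            echo "missing: $tool"
        fi
    done
}
```

For example, `check_system_tools cmake perl bison tar` lists each tool as found or missing. Tools reported as found after loading the required modules are candidates for system-version entries in the Spack templates.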
2. 🔢 Activation Script or Module Issues
Symptoms
- Scripts not found or symlinks broken 
- Compute node not detected 
Fixes
- Inspect Jinja2 templates for logic errors (especially for new systems) 
- Re-run deployment with `--recreate`
- Validate compute node detection logic (`$SLURM_JOB_ID`, `$COBALT_JOBID`, etc.)
- For new schedulers (e.g., PBS), extend template logic accordingly 
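To see what scheduler-based detection would conclude in the current shell, the check can be reproduced by hand. This is a minimal sketch, not the logic from the actual templates: `detect_scheduler_job` is a hypothetical function, and the use of `$PBS_JOBID` for PBS is an assumption about how a new scheduler branch might be added.

```bash
# Guess the scheduler from job-ID environment variables.
# detect_scheduler_job is hypothetical; the real templates may differ.
detect_scheduler_job() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "slurm"
    elif [ -n "${COBALT_JOBID:-}" ]; then
        echo "cobalt"
    elif [ -n "${PBS_JOBID:-}" ]; then
        # Assumption: PBS jobs are identified by PBS_JOBID
        echo "pbs"
    else
        # No job ID found, so this is probably a login node
        echo "none"
    fi
}
```

Running `detect_scheduler_job` on a login node and again inside an interactive job shows whether the variables the templates rely on are actually set on the system.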
3. 🚫 Conda Environment Problems
Symptoms
- Conda fails to resolve dependencies 
- Environments install but are missing key packages 
Fixes
- Run with `--recreate` to force a rebuild
- Inspect logs carefully for root cause messages 
- Use troubleshooting scripts to bisect failing specs 
- Check for channel mismatches or conflicting dev-label dependencies 
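One way to narrow down a failing set of specs is to test each spec on its own. The sketch below is a simplified stand-in for the project's troubleshooting scripts, not a copy of them: it checks specs one at a time rather than bisecting, and `check_specs_individually`, `conda_dry_run`, the environment name `polaris_spec_check`, and the `conda-forge` channel are all assumptions made for illustration.

```bash
# Run a checker command on each spec and report which ones fail.
# Usage: check_specs_individually <checker> <spec>...
check_specs_individually() {
    local check_cmd="$1"
    shift
    local spec
    for spec in "$@"; do
        if "$check_cmd" "$spec" >/dev/null 2>&1; then
            echo "ok: $spec"
        else
            echo "FAILED: $spec"
        fi
    done
}

# Hypothetical checker: ask conda to solve for one spec without installing.
# The channel and environment name are assumptions; adjust to your setup.
conda_dry_run() {
    conda create --dry-run --name polaris_spec_check \
        --override-channels --channel conda-forge "$1"
}
```

Calling `check_specs_individually conda_dry_run <spec1> <spec2>` prints one line per spec. A spec that solves alone but fails in combination points to a conflict between packages rather than a missing package.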
4. 💾 Filesystem and Permission Issues
Symptoms
- Scripts not executable by collaborators 
- Environment directories not group-readable 
Fixes
- Run `chmod -R g+rx` and `chgrp -R <group>` on the deployment directory as needed
- Confirm deployment messages show permission updates succeeded 
- Use `ls -l` to inspect group ownership and mode bits
- You may need to coordinate with administrators or previous maintainers to set permissions 
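The checks and fixes above can be wrapped in two small helpers. This is a sketch under stated assumptions: both function names are hypothetical, and the fix uses `g+rX` (capital `X`) as a variation on the `g+rx` shown above, so that execute permission is added only to directories and to files that are already executable by someone.

```bash
# List paths under a directory that are not group-readable.
list_not_group_readable() {
    find "$1" ! -perm -g=r -print
}

# Set the group and grant group read access recursively.
# g+rX adds execute only for directories and already-executable files.
fix_group_permissions() {
    local dir="$1" group="$2"
    chgrp -R "$group" "$dir" && chmod -R g+rX "$dir"
}
```

Running `list_not_group_readable <deploy_dir>` before and after `fix_group_permissions <deploy_dir> <group>` confirms whether the fix took effect. Files owned by another user will still fail with a permission error and need that owner or an administrator.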
5. 🧰 mache Configuration Problems
Symptoms
- Unknown machine error during deployment 
- Spack fails to load environment due to incorrect module list 
Fixes
- Ensure the correct `mache` version or branch is being installed
- Ensure that the machine has been added to `mache`, both under machine config files and in the logic for machine discovery
- Validate updates to `config_machines.xml` and Spack YAML templates
6. 🪖 Spack Caching and Environment Contamination
Symptoms
- Builds complete but produce incorrect or stale binaries 
- Environment behaves inconsistently between deploys 
Fixes
- Clear Spack caches manually if needed 
- Always deploy from a clean `$TMPDIR` and fresh clone if unsure
- Delete the entire directory: `rm -rf <spack_env_dir>`
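A clean `$TMPDIR` can be set up just before deploying. The following is a minimal sketch, assuming bash and a writable base directory; `use_fresh_tmpdir` and the `polaris_deploy` prefix are hypothetical names chosen for illustration.

```bash
# Create an empty temporary directory and point TMPDIR at it.
# Call this directly (not inside $(...)) so the export affects your shell.
use_fresh_tmpdir() {
    local base="${1:-${TMPDIR:-/tmp}}"
    local fresh
    fresh="$(mktemp -d "$base/polaris_deploy.XXXXXX")" || return 1
    export TMPDIR="$fresh"
    echo "TMPDIR=$TMPDIR"
}
```

After calling `use_fresh_tmpdir`, run the deployment from the same shell so that it inherits the new `$TMPDIR`. If the deployment's Spack is on your `PATH`, `spack clean --all` is one way to clear Spack's own caches as well.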
7. ⚠️ Common Fix: Full Clean + Re-run
When in doubt, remove and rebuild everything:
rm -rf <spack_env_dir>
./configure_polaris_envs.py --conda ~/miniforge3 --recreate
This often resolves cases where previous state is interfering with a clean build.
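Because `rm -rf` with an empty or mistyped variable can be destructive, a guarded wrapper may be worth using for the removal step. This is a sketch, not part of the Polaris tooling; `clean_deploy_dir` is a hypothetical helper.

```bash
# Remove a deployment directory, refusing obviously dangerous targets.
clean_deploy_dir() {
    local dir="${1:-}"
    # Guard against an unset variable expanding to nothing, /, or $HOME
    if [ -z "$dir" ] || [ "$dir" = "/" ] || [ "$dir" = "${HOME:-}" ]; then
        echo "refusing to remove '$dir'" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "nothing to remove: $dir"
        return 0
    fi
    rm -rf -- "$dir"
    echo "removed: $dir"
}
```

After `clean_deploy_dir <spack_env_dir>` succeeds, re-run the `configure_polaris_envs.py` command shown above with `--recreate`.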