Architecture
This document describes the technical architecture of the Emulator Components framework.
Design Philosophy
The framework follows these key principles:
- Minimal Fortran — Thin Fortran wrappers delegate to C++ implementations
- Extensibility — Abstract base class enables new emulator types
- Backend Flexibility — Pluggable inference backends for different deployment scenarios
- E3SM Integration — Native MCT coupling support
- Portable Configuration — YAML-based config parseable by Python and C++
Class Hierarchy
classDiagram
class EmulatorComp {
<<abstract>>
+create_instance()
+set_grid_data()
+setup_coupling()
+initialize()
+run(dt)
+finalize()
#init_impl()*
#run_impl(dt)*
#final_impl()*
}
class EmulatorAtm {
+init_coupling_indices()
-prepare_inputs()
-process_outputs()
-run_inference()
-import_coupling_fields()
-export_coupling_fields()
}
class EmulatorConfig {
+build: BuildConfig
+runtime: RuntimeConfig
+model_io: ModelIOConfig
+coupling: CouplingConfig
}
class InferenceBackend {
<<interface>>
+initialize(config)*
+infer(inputs, outputs, batch_size)*
+finalize()*
}
EmulatorComp <|-- EmulatorAtm
EmulatorAtm --> EmulatorConfig
EmulatorAtm --> InferenceBackend
Directory Structure
emulator_comps/
├── common/
│ └── src/
│ ├── emulator_comp.hpp/cpp # Base component class
│ ├── emulator_config.hpp/cpp # YAML configuration parsing
│ ├── emulator_context.hpp # Singleton context manager
│ ├── emulator_io.hpp/cpp # PIO-based I/O
│ ├── emulator_logger.hpp/cpp # Logging utility
│ ├── coupling_fields.hpp/cpp # Field index registry
│ └── inference/ # Backend implementations
│ ├── inference_backend.hpp # Abstract interface
│ ├── inference_factory.cpp # Backend factory
│ ├── stub_backend.hpp/cpp # No-op testing backend
│ └── libtorch_backend.hpp/cpp # LibTorch backend
├── eatm/
│ ├── cime_config/ # CIME integration
│ │ ├── buildnml # YAML config generator
│ │ ├── defaults_yaml_eatm # Default configuration
│ │ └── user_yaml_eatm # User override template
│ └── src/
│ ├── atm_comp_mct.F90 # Fortran MCT wrapper
│ ├── emulator_atm.hpp/cpp # Atmosphere emulator
│ ├── emulator_atm_interface.cpp # C interface
│ ├── emulator_atm_f2c.F90 # Fortran-C bindings
│ └── impl/ # ATM-specific helpers
│ ├── atm_coupling.hpp/cpp # Coupling indices
│ ├── atm_field_manager.hpp/cpp # Field storage
│ └── atm_io.hpp/cpp # IC/restart I/O
└── docs/ # Documentation
Fortran-C++ Interoperability
The framework uses a thin Fortran wrapper pattern similar to EAMxx:
┌─────────────────────┐
│ E3SM Driver │
│ (Fortran MCT) │
└──────────┬──────────┘
│ calls
▼
┌─────────────────────┐
│ atm_comp_mct.F90 │ Thin Fortran wrapper
│ (atm_init/run/ │ - Receives MCT data structures
│ final_mct) │ - Calls C interface functions
└──────────┬──────────┘
│ via iso_c_binding
▼
┌─────────────────────┐
│ emulator_atm_ │ C interface layer
│ interface.cpp │ - Manages global instance
│ │ - Type conversions
└──────────┬──────────┘
│ C++ calls
▼
┌─────────────────────┐
│ EmulatorAtm │ Full C++ implementation
│ (C++ class) │ - Grid management
│ │ - Coupling field exchange
│ │ - AI inference
└─────────────────────┘
Configuration System
EATM uses YAML-based configuration for portability:
- defaults_yaml_eatm — Shipped default values
- user_yaml_eatm — User overrides in case directory
- atm_in — Merged config in run directory (YAML format)
# Example atm_in structure
eatm:
build:
grid_name: gauss180x360
inference_backend: libtorch
runtime:
model_path: /path/to/model.pt
ic_file: /path/to/initial_conditions.nc
enabled: true
model_io:
spatial_mode: true
input_variables:
- Ta
- Qa
- PRESsfc
output_variables:
- prec
- lwdn
- swdn
coupling:
debug: false
Component Lifecycle
Initialization (atm_init_mct)
- Create C++ emulator instance via
EmulatorContext - Load YAML configuration (
atm_in) - Read grid from config-specified file
- Initialize MCT gsMap and domain
- Setup coupling field pointers
- Load AI model and initialize inference backend
Run (atm_run_mct)
- Import fields from coupler (
x2a→ internal fields) - Pack input fields into tensor (
prepare_inputs) - Run AI inference via backend
- Unpack outputs from tensor (
process_outputs) - Export fields to coupler (internal fields →
a2x)
Finalization (atm_final_mct)
- Finalize inference backend
- Deallocate field storage
- Cleanup I/O subsystem
- Release context singleton
Tensor Data Layout
The framework supports two data layouts based on the spatial_mode configuration:
Spatial Mode (CNN models like ACE2)
- Input:
[1, C, H, W]- single batch with all channels and spatial dims prepare_inputs()packs from[H*W, C]to[C, H, W](flattened)process_outputs()unpacks from[C, H, W]to[H*W, C]- Backend called with
batch_size=1
Pointwise Mode (MLP models)
- Input:
[batch_size, C]- each grid point is a sample - Data remains in
[H*W, C]format - Backend called with
batch_size=H*W
Coupling Fields
Imported Fields (x2a)
| Field | Description |
|---|---|
Sx_t |
Surface temperature [K] |
So_t |
Ocean temperature [K] |
Faxx_sen |
Sensible heat flux [W/m²] |
Faxx_lat |
Latent heat flux [W/m²] |
Sf_ifrac |
Ice fraction [-] |
| ... | See atm_coupling.hpp |
Exported Fields (a2x)
| Field | Description |
|---|---|
Sa_z |
Bottom level height [m] |
Sa_u, Sa_v |
Wind components [m/s] |
Sa_tbot |
Bottom temperature [K] |
Sa_pbot |
Bottom pressure [Pa] |
Faxa_lwdn |
Downward longwave [W/m²] |
Faxa_rainc/l |
Precipitation [kg/m²/s] |
| ... | See atm_coupling.hpp |
Grid Management
The EmulatorComp base class handles:
- Reading SCRIP-format grid files via PIO
- 1D domain decomposition across MPI ranks
- Column ID mapping for MCT gsMap
- Coordinate and area storage
Grid data is set from the YAML configuration (grid_file option) or
provided by the driver via set_grid_data().
Restart System
The emulator supports three types of restart files:
| File Type | Pattern | Purpose |
|---|---|---|
| Model Restart | atm.r.*.nc |
Full model state |
| History Restart | atm.rh{N}.*.nc |
Averaging state |
| Restart Pointer | rpointer.atm |
Index of restart files |
Restart Workflow
EmulatorOutputManagertracks restart frequency viaOutputControl- At restart steps,
write_restart()saves all prognostic fields rpointer.atmis updated with the new restart filename- For continuation runs,
find_restart_file()readsrpointer.atm
CIME Integration
Restart Frequency Sync
CIME controls restart frequency globally via REST_N and REST_OPTION.
The buildnml script reads these at case setup and writes them to atm_in:
# In buildnml._get_restart_config()
rest_n = case.get_value("REST_N")
rest_option = case.get_value("REST_OPTION")
This ensures EATM matches the global E3SM restart frequency.
If the user manually edits atm_in, those changes persist until the next
buildnml invocation.
Configuration Merging
At case setup (buildnml), configuration is merged:
defaults_yaml_eatm— Shipped defaultsuser_yaml_eatm— User overrides from case directory- CIME settings —
REST_N/REST_OPTION(override user settings) - Output:
atm_in— Merged YAML in run directory
Future Enhancements
Note
The following items are planned but not yet implemented.
Performance Optimizations
-
Batch Inference: Accumulate multiple timesteps before inference to amortize kernel launch overhead on GPUs. Add
batch_sizeconfig option. -
Async Inference Pipeline: Overlap compute and data transfer. Push inputs to GPU while previous batch is still computing.
Inference Backends
-
PYTORCH: Embed Python interpreter for native PyTorch without tracing. Useful for rapid prototyping and models that don't trace well.
-
ONNX: Use ONNX Runtime for cross-framework inference. Run models from PyTorch, TensorFlow, JAX in a unified, optimized runtime.
-
LAPIS: Kokkos-based inference for GPU memory sharing with E3SM components (EAMxx). Avoid data movement between Kokkos and ML tensors.
Usability
-
Model Metadata: Embed expected input/output variable names in TorchScript model metadata. Auto-validate at load time.
-
Multi-Model Support: Support ensemble runs or multi-fidelity models (cheap model for most timesteps, expensive model for key intervals).
Data Infrastructure
-
DataView Integration: Extend
FieldDataProviderto returnDataViewfor zero-copy field access. Current implementation indata_view.hpp. -
Memory Layout Specification: Add explicit row-major/column-major flags to
InferenceConfigfor correct tensor reshaping.