Post-Processing
This guide is intended to walk users step-by-step through post-processing a simulation.
Short Term Archiving
Short-term archiving is a useful first step for post-processing. By default, E3SM stores all output files under the <simulations_dir>/<case_name>/run/ directory. For long simulations, there can be tens of thousands to hundreds of thousands of output files, and having that many files in a single directory is impractical, slowing even simple operations like ls to a crawl. CIME includes a short-term archiving utility that neatly organizes output files into a separate <simulations_dir>/<case_name>/archive/ directory. Short-term archiving can be accomplished with the following steps.
Tip
This can be done while the model is still running.
Use --force-move to move files instead of copying them, which can take a long time. Set --last-date to the latest date in the simulation you want to archive. You do not have to specify a beginning date.
```
cd <simulations_dir>/<case_name>/case_scripts
./case.st_archive --last-date <yyyy-mm-dd> --force-move --no-incomplete-logs
ls <simulations_dir>/<case_name>/archive
```
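For example, to archive everything the model has written through the end of simulation year 50 (the date below is purely illustrative), the call might look like:

```
cd <simulations_dir>/<case_name>/case_scripts
# Illustrative date: archive all output up to and including 0050-12-31
./case.st_archive --last-date 0050-12-31 --force-move --no-incomplete-logs
```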
Each component of the model has a directory under archive/. There are also two additional directories under archive/: logs holds the gzipped log files, and rest holds the restart files.
| Component | Directory | File naming pattern |
| --- | --- | --- |
| Atmosphere (Earth Atmospheric Model) | archive/atm/hist | *.eam.h* |
| Coupler | archive/cpl/hist | *.cpl.h* |
| Sea Ice (MPAS-Sea-Ice) | archive/ice/hist | *.mpassi.hist.* |
| Land (Earth Land Model) | archive/lnd/hist | *.elm.h* |
| Ocean (MPAS-Ocean) | archive/ocn/hist | *.mpaso.hist.* |
| River Runoff (MOSART) | archive/rof/hist | *.mosart.h* |
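As a quick sanity check of the archive, the naming patterns in the table can be used to list or count files per component. A sketch (the h0 stream and year 0001 below are just illustrative choices):

```
cd <simulations_dir>/<case_name>/archive
# Count archived atmosphere history files in the h0 stream
ls atm/hist/*.eam.h0.* | wc -l
# List ocean history files for an illustrative year
ls ocn/hist/*.mpaso.hist.*0001*
```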
Post-Processing with zppy
To post-process a model run, follow the steps below.
Warning
To post-process up to year n, you must have short-term archived up to year n.
You can ask questions about zppy on the zppy discussion board.
Install zppy
Load the E3SM Unified environment.
Tip
The E3SM Unified environment activation commands can be found on zppy's Getting started page. Alternatively, they can be found using Mache: click the relevant machine and find the base_path listed under [e3sm_unified] -- the activation command will be source <base_path>/load_latest_e3sm_unified_<machine_name>.sh.
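For illustration only (substitute the base_path and machine name that Mache or zppy's Getting started page reports for your machine), the activation step then looks like:

```
# Hypothetical placeholders; use the values for your machine
source <base_path>/load_latest_e3sm_unified_<machine_name>.sh
# Confirm zppy is now available
which zppy
```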
If you need a feature in zppy that has not yet been included in the E3SM Unified environment, you can construct a development environment.
Configuration File
In <run_scripts_dir>, create a new post-processing configuration file, or copy an existing one, and call it post.<case_name>.cfg.
Tip
Good example configuration files can be found in the zppy integration test directory -- test_complete_run_<machine_name>.cfg.
Edit the file and customize as needed. The file is structured with [section] and [[sub-sections]]. There is a [default] section, followed by additional sections for each available zppy task (climo, ts, e3sm_diags, mpas_analysis, ...). Sub-sections can be used to have multiple instances of a particular task, for example having both regridded monthly and globally averaged time series files. Refer to the zppy schematics documentation for more details.
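As a rough sketch of that structure, a ts task with two sub-sections might look like the following (the sub-section names are illustrative; you choose your own):

```
[ts]
# Settings placed here apply to every [[sub-section]] of the ts task
years = "1:80:10",

  # Sub-section for regridded monthly time series (illustrative name)
  [[ atm_monthly_180x360_aave ]]
  # ...sub-section-specific settings...

  # Sub-section for globally averaged time series (illustrative name)
  [[ atm_monthly_glb ]]
  # ...sub-section-specific settings...
```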
The key sections of the configuration file are described below; a minimal example follows the list.

[default]

- The input, output, and www paths will likely need to be edited.

Note
The output of your simulation (<simulations_dir>/<case_name>) is the input to zppy. You can use the same directory for zppy output as well, since zppy will generate its output under <output>/post.
[climo]

- The mapping_file path may need to be edited.
- Typically you want to generate climatology files over 20- or 50-year periods, using years = begin_year:end_yr:averaging_period -- e.g., years = "1:80:20", "1:50:50",

[ts]

- The mapping_file path may need to be edited.
- Typically you want to generate time series files in 10-year chunks -- e.g., years = "1:80:10".
[e3sm_diags]

- reference_data_path may need to be edited.
- short_name is a shortened version of the case_name.
- years should match the years in the [climo] section.
[mpas_analysis]

- Years can be specified separately for time series, climatology, and ENSO plots. The lists must have the same length, and each entry will be mapped to one realization of mpas_analysis:

climo_years = "21-50", "51-100",
enso_years = "11-50", "11-100",
ts_years = "1-50", "1-100",

- In this particular example, MPAS Analysis will be run twice. The first realization will produce climatology plots averaged over years 21-50, ENSO plots for years 11 to 50, and time series plots covering years 1 to 50. The second realization will cover years 51-100 for climatologies, 11-100 for ENSO, and 1-100 for time series.
[global_time_series]

- climo_years and ts_years should match their equivalents in the [mpas_analysis] section.
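Putting the pieces above together, a minimal sketch of post.<case_name>.cfg is shown below. All paths are placeholders and only the parameters discussed above appear; a real configuration needs additional settings (task activation, machine-specific options, and so on), so treat this as a skeleton and consult the example configuration files and the zppy parameters documentation.

```
[default]
# Short-term archived simulation output is the input to zppy;
# zppy output will be written under <output>/post.
input = <simulations_dir>/<case_name>
output = <simulations_dir>/<case_name>
# Web-accessible directory where diagnostic figures will be published
www = /path/to/www/<username>

[climo]
mapping_file = /path/to/mapping_file.nc
# 20-year and 50-year climatologies over years 1-80 and 1-50
years = "1:80:20", "1:50:50",

[ts]
mapping_file = /path/to/mapping_file.nc
# Time series in 10-year chunks
years = "1:80:10",

[e3sm_diags]
reference_data_path = /path/to/observational/data
short_name = <short_case_name>
# Should match the [climo] years
years = "1:80:20", "1:50:50",

[mpas_analysis]
climo_years = "21-50", "51-100",
enso_years = "11-50", "11-100",
ts_years = "1-50", "1-100",

[global_time_series]
# Should match the [mpas_analysis] equivalents
climo_years = "21-50", "51-100",
ts_years = "1-50", "1-100",
```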
Tip
See the zppy parameters documentation for more information on parameters.
Launch zppy
Run zppy -c post.<case_name>.cfg. This will submit a number of jobs. Run sq to see what jobs are running.
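Concretely, launching and monitoring might look like the following (this assumes a Slurm batch system; if the sq shortcut is not defined on your machine, squeue -u $USER gives the same information):

```
cd <run_scripts_dir>
# Submit all post-processing jobs described in the configuration file
zppy -c post.<case_name>.cfg
# Check which post-processing jobs are queued or running
sq
```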
zppy automatically handles dependencies of jobs. E.g., e3sm_diags jobs are dependent on climo and ts jobs, so they wait for those to finish. MPAS Analysis jobs re-use computations, so they are chained. Most jobs run quickly, though E3SM Diags may take around an hour and MPAS Analysis may take several hours.
zppy creates a new directory, <simulations_dir>/<case_name>/post. Each realization has a shell script (typically bash); this is the actual file that is submitted to the batch system. There will also be a log file *.o<job ID> as well as a *.status file. The status file indicates the state (WAITING, RUNNING, OK, ERROR). These files can be found in <simulations_dir>/<case_name>/post/scripts. Once all the jobs are complete, you can check their status.
```
cd <simulations_dir>/<case_name>/post/scripts
cat *.status          # should be a list of "OK"
grep -v "OK" *.status # lists files without "OK"
```
If you re-run zppy, it will check the status of each task and skip any task whose status is "OK". As your simulation progresses, you can update the post-processing years in the configuration file and re-run zppy. Newly added tasks will be submitted, while previously completed ones will be skipped.
Tasks
If you run ls <simulations_dir>/<case_name>/post/scripts, you'll see files like e3sm_diags_180x360_aave_model_vs_obs_0001-0020.status. This is one e3sm_diags job. The parts of this file name are explained below:
| Part of File Name | Meaning |
| --- | --- |
| e3sm_diags | Task |
| 180x360_aave | Grid |
| model_vs_obs | model_vs_model or model_vs_obs |
| 0001-0020 | First and last years |
There is also a corresponding output file for each job. It will have the same name but end with .o<job ID> instead of .status.
Output
The post-processing output is organized hierarchically. Examples:
- <simulations_dir>/<case_name>/post/atm/180x360_aave/ts/monthly/10yr has the time series files -- one variable per file, in 10-year periods as defined in <run_scripts_dir>/post.<case_name>.cfg.
- <simulations_dir>/<case_name>/post/atm/180x360_aave/clim/20yr similarly has climatology files for 20-year periods, as defined in <run_scripts_dir>/post.<case_name>.cfg.
- <simulations_dir>/<case_name>/post/atm/glb/ts/monthly/10yr has globally averaged files for 10-year periods, as defined in <run_scripts_dir>/post.<case_name>.cfg.
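If you want to inspect these files directly, the standard netCDF command-line tools (available once the E3SM Unified environment is loaded) are usually enough; for example:

```
cd <simulations_dir>/<case_name>/post/atm/180x360_aave/ts/monthly/10yr
# One file per variable per 10-year chunk
ls *.nc | head
# Dump the header of any one file to see its dimensions and attributes
ncdump -h <one_of_the_files>.nc | head -n 40
```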