Quick guide for NERSC Cori using Shifter (v1)¶
Obtaining the container image¶
Shifter is the container runtime used at NERSC to run containers. It’s an alternative to Docker, but supports Docker containers.
1. View the e3sm_diags images available at NERSC.
shifterimg images | grep e3sm_diags
If the version you want to use is already available, then please continue to step 3.
Otherwise, you’ll need to download the image you want, shown in step 2.
2. If the specific version you want or the latest image is not shown, download it.
You can view all of the images available on the e3sm_diags Docker Hub.
Below, we are getting the image with the latest tag.
shifterimg -v pull docker:e3sm/e3sm_diags:latest
You’ll see the same timestamped message printed multiple times while the image downloads. This is normal, and the pull takes around 10 minutes. We don’t know why Shifter repeats the message.
Once an image is downloaded from a public repo like this one, all users on NERSC can use it.
For now, you cannot delete an image that you downloaded via Shifter yourself. Please email NERSC support, and they can delete it for you.
3. wget the following script.
wget https://raw.githubusercontent.com/E3SM-Project/e3sm_diags/master/e3sm_diags/container/e3sm_diags_container.py
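Steps 1 and 2 amount to checking whether a tag is already listed and pulling it only if it is missing. The sketch below illustrates that check in Python. It assumes each line of the shifterimg images output contains the image as a whitespace-separated name:tag token. The helper names available_tags and pull_command are hypothetical, and the sample listing is made up for illustration rather than copied from NERSC.

```python
def available_tags(listing, image="e3sm/e3sm_diags"):
    """Return the sorted tags of `image` found in `shifterimg images` output.

    Assumption: the image appears on each line as a `name:tag` token.
    """
    tags = set()
    for line in listing.splitlines():
        for token in line.split():
            # Split on the last colon so timestamps never match the image name.
            name, sep, tag = token.rpartition(":")
            if sep and name == image:
                tags.add(tag)
    return sorted(tags)


def pull_command(tag, image="e3sm/e3sm_diags"):
    """Build the pull command from step 2 for the given tag."""
    return ["shifterimg", "-v", "pull", f"docker:{image}:{tag}"]


# Made-up sample listing, for illustration only.
sample = """\
cori docker READY 0123abcd 2019-01-01T00:00:00 e3sm/e3sm_diags:latest
cori docker READY 4567ef01 2019-01-02T00:00:00 e3sm/e3sm_diags:v1.5.0
cori docker READY 89ab2345 2019-01-03T00:00:00 ubuntu:18.04
"""

wanted = "v9.9.9"  # hypothetical tag that is not in the sample
if wanted not in available_tags(sample):
    print(" ".join(pull_command(wanted)))
```

In practice the listing would come from running shifterimg images on a Cori login node; the parsing step is kept separate so it can be checked without access to NERSC.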
Running the entire annual latitude-longitude contour set¶
Copy and paste the code below into myparams.py using your favorite text editor. Adjust any options as you like.
Tip: Make a folder named after your username in the following directory: /global/project/projectdirs/acme/www/
Then you can set results_dir to /global/project/projectdirs/acme/www/<username>/lat_lon_demo in myparams.py below to view the results via a web browser here: http://portal.nersc.gov/project/acme/<username>/lat_lon_demo
reference_data_path = '/global/project/projectdirs/acme/e3sm_diags/obs_for_e3sm_diags/climatology/'
test_data_path = '/global/project/projectdirs/acme/e3sm_diags/test_model_data_for_acme_diags/climatology/'
test_name = '20161118.beta0.FC5COSP.ne30_ne30.edison'
sets = ["lat_lon"]
seasons = ["ANN"]
# 'mpl' and 'vcs' are for matplotlib or vcs plots respectively.
backend = 'mpl'
# Name of folder where all results will be stored.
results_dir = 'lat_lon_demo'
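The tip above pairs a directory under the project www folder with a portal URL. A minimal sketch of that mapping follows. The function name web_results and the username jdoe are placeholders, and the two root constants are copied from the tip.

```python
import posixpath

# Roots taken from the tip above.
WWW_ROOT = "/global/project/projectdirs/acme/www"
PORTAL_ROOT = "http://portal.nersc.gov/project/acme"


def web_results(username, run_name):
    """Return (results_dir, url) for a run stored under the user's www folder."""
    results_dir = posixpath.join(WWW_ROOT, username, run_name)
    url = f"{PORTAL_ROOT}/{username}/{run_name}"
    return results_dir, url


# "jdoe" is a placeholder username.
results_dir, url = web_results("jdoe", "lat_lon_demo")
print(results_dir)
print(url)
```

Because myparams.py is itself a Python file, the first value could be assigned to results_dir there directly.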
Since Shifter cannot be run on the login nodes, it must be run either in an interactive session on compute nodes or as a batch job.
There are two kinds of compute nodes on Cori:
- Cori KNL:
  - 68 cores/node
  - 128 GB/node
  - Use -C knl with any salloc or srun commands.
- Cori Haswell:
  - 32 cores/node
  - 128 GB/node
  - Use -C haswell with any salloc or srun commands.
For more information on how to run any batch job on Cori, consult the documentation here.
Interactive session on compute nodes¶
First, request an interactive session with a single node (32 cores with Cori Haswell, 68 cores with Cori KNL) for one hour (running this example should take much less than this).
If obtaining a session takes too long, try to use the debug partition.
Note that the maximum time allowed for this partition is 00:30:00.
salloc --nodes=1 --partition=regular --time=01:00:00 -C haswell
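The request above varies only in node type, partition, and time limit. The sketch below assembles the same salloc arguments in Python and caps the time at 00:30:00 when the debug partition is chosen. The function name salloc_command is hypothetical, and the time comparison assumes zero-padded HH:MM:SS strings.

```python
# Node types and core counts as listed for Cori above.
CORES_PER_NODE = {"haswell": 32, "knl": 68}
# Maximum time allowed on the debug partition, per the note above.
DEBUG_MAX_TIME = "00:30:00"


def salloc_command(constraint="haswell", partition="regular", time="01:00:00"):
    """Build the argument list for a single-node interactive request."""
    if constraint not in CORES_PER_NODE:
        raise ValueError(f"unknown node type: {constraint!r}")
    # Zero-padded HH:MM:SS strings compare correctly as plain strings.
    if partition == "debug" and time > DEBUG_MAX_TIME:
        time = DEBUG_MAX_TIME
    return [
        "salloc",
        "--nodes=1",
        f"--partition={partition}",
        f"--time={time}",
        "-C",
        constraint,
    ]


print(" ".join(salloc_command()))
print(" ".join(salloc_command("knl", partition="debug")))
```

The list form can be passed to subprocess.run on a login node, or joined with spaces as shown to copy into a terminal.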
Once the session is available, launch E3SM Diagnostics.
python e3sm_diags_container.py --shifter -p myparams.py
Tip: You can select the version of the container you want to run with the --container_version argument.
If this argument isn’t defined, it defaults to the latest container.
python e3sm_diags_container.py --shifter --container_version v1.5.0 -p myparams.py
Batch job¶
Alternatively, you can create a script and submit it to the batch system.
Copy and paste the code below into a file named diags.bash.
Please remember to change what directory you’re in to one accessible to you.
#!/bin/bash -l
#SBATCH --job-name=diags
#SBATCH --output=diags.o%j
#SBATCH --partition=regular
#SBATCH --account=acme
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH -C haswell
# Please change the directory below.
cd /global/cscratch1/sd/golaz/tmp
wget https://raw.githubusercontent.com/E3SM-Project/e3sm_diags/master/e3sm_diags/container/e3sm_diags_container.py
python e3sm_diags_container.py --shifter -p myparams.py
Then submit it:
sbatch diags.bash
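If you submit this job often with a different working directory or parameter file, the script can be generated instead of edited by hand. The following is a sketch using Python's string.Template. The function name render_batch_script and the path /path/to/workdir are placeholders, and the #SBATCH values are the ones from diags.bash above.

```python
from string import Template

# Same content as diags.bash above, with the variable parts as placeholders.
BATCH_TEMPLATE = Template("""\
#!/bin/bash -l
#SBATCH --job-name=diags
#SBATCH --output=diags.o%j
#SBATCH --partition=regular
#SBATCH --account=acme
#SBATCH --nodes=1
#SBATCH --time=$time
#SBATCH -C $constraint

cd $workdir
wget https://raw.githubusercontent.com/E3SM-Project/e3sm_diags/master/e3sm_diags/container/e3sm_diags_container.py
python e3sm_diags_container.py --shifter -p $params
""")


def render_batch_script(workdir, params="myparams.py",
                        constraint="haswell", time="01:00:00"):
    """Fill in the batch script for one run."""
    return BATCH_TEMPLATE.substitute(
        workdir=workdir, params=params, constraint=constraint, time=time
    )


# "/path/to/workdir" is a placeholder; use a directory accessible to you.
print(render_batch_script("/path/to/workdir"))
```

Writing the returned text to diags.bash gives a file that can be submitted with sbatch in the usual way.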
View the status of your job with squeue -u <username>.
Here’s the meaning of some values under the State (ST) column:
- PD: Pending
- R: Running
- CA: Cancelled
- CD: Completed
- F: Failed
- TO: Timeout
- NF: Node Failure
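When checking jobs from a script, the codes above can be kept in a small lookup table. This sketch covers only the codes listed here. The helper names describe_state and is_finished are hypothetical, and treating CA, CD, F, TO, and NF as final states is this sketch's own grouping.

```python
# State codes from the list above.
STATE_CODES = {
    "PD": "Pending",
    "R": "Running",
    "CA": "Cancelled",
    "CD": "Completed",
    "F": "Failed",
    "TO": "Timeout",
    "NF": "Node Failure",
}

# Codes after which the job will not run any further (this sketch's grouping).
FINISHED = {"CA", "CD", "F", "TO", "NF"}


def describe_state(code):
    """Translate an ST column value into a readable label."""
    return STATE_CODES.get(code, f"Unknown state {code!r}")


def is_finished(code):
    """Return True if the job has stopped, successfully or not."""
    return code in FINISHED


for code in ("PD", "R", "CD"):
    print(code, describe_state(code), is_finished(code))
```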
Back to running the latitude-longitude contour set¶
Once you have run the diagnostics in an interactive session or via a batch job, open the following webpage to view the results.
lat_lon_demo/viewer/index.html
Tip: Once you’re on the webpage for a specific plot, click on the ‘Output Metadata’ drop-down menu to view the metadata for the displayed plot. Running the command shown there recreates the displayed plot. Changing any of its options modifies only that resulting figure.
Running all of the diagnostics sets¶
Copy and paste the following into all_sets.py using your favorite text editor:
reference_data_path = '/global/project/projectdirs/acme/e3sm_diags/obs_for_e3sm_diags/climatology/'
test_data_path = '/global/project/projectdirs/acme/e3sm_diags/test_model_data_for_acme_diags/climatology/'
test_name = '20161118.beta0.FC5COSP.ne30_ne30.edison'
# Not defining a sets parameter runs all of the default sets:
# ['zonal_mean_xy', 'zonal_mean_2d', 'lat_lon', 'polar', 'cosp_histogram']
# 'mpl' and 'vcs' are for matplotlib or vcs plots respectively.
backend = 'mpl'
# Name of folder where all results will be stored.
results_dir = 'diag_demo'
# Optional settings below:
diff_title = 'Model - Obs'
multiprocessing = True
# You can set this to 64 if running on the KNL nodes.
num_workers = 32
Compared to the previous short test above, note the following changes:
- Plots for all the available sets (‘zonal_mean_xy’, ‘zonal_mean_2d’, ‘lat_lon’, ‘polar’, ‘cosp_histogram’) are generated.
- Multiprocessing with 32 workers is enabled.
Again, run the diagnostics with this new parameter file (all_sets.py), either in an interactive session or via a batch job.
Open the following webpage to view the results.
diag_demo/viewer/index.html
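Parameter files such as myparams.py and all_sets.py are plain Python, so a typo in one can be caught before requesting compute nodes. The sketch below executes a parameter file with runpy and reports which expected names are missing. The helper name missing_parameters is hypothetical, and the list of names treated as required is an assumption drawn from the examples in this guide, not from the tool's documentation.

```python
import os
import runpy
import tempfile

# Assumption: names that every example parameter file in this guide defines.
REQUIRED = ("reference_data_path", "test_data_path", "test_name", "results_dir")


def missing_parameters(path, required=REQUIRED):
    """Execute a parameter file and list the required names it does not define."""
    namespace = runpy.run_path(path)
    return [name for name in required if name not in namespace]


# Demonstration with a deliberately incomplete file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "all_sets.py")
    with open(demo_path, "w") as handle:
        handle.write("test_name = 'demo'\nresults_dir = 'diag_demo'\n")
    print(missing_parameters(demo_path))
```

A syntax error in the parameter file surfaces as an exception from runpy.run_path, which is also useful to see before the job starts.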
Advanced: Running custom diagnostics¶
The following steps are for advanced users who want to run custom diagnostics; most users will not need to run the software this way.
By default, all of the E3SM diagnostics are run for the sets defined above. This takes some time, so here we instead define our own diagnostics to run.
8. Copy and paste the code below into mydiags.cfg.
Check Available Parameters for all available parameters.
For more examples of these types of files, look here for the cfg file that was used to create all of the latitude-longitude sets.
[#]
sets = ["lat_lon"]
case_id = "GPCP_v2.2"
variables = ["PRECT"]
ref_name = "GPCP_v2.2"
reference_name = "GPCP (yrs1979-2014)"
seasons = ["ANN", "DJF"]
regions = ["global"]
test_colormap = "WhiteBlueGreenYellowRed.rgb"
reference_colormap = "WhiteBlueGreenYellowRed.rgb"
diff_colormap = "BrBG"
contour_levels = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16]
diff_levels = [-5, -4, -3, -2, -1, -0.5, 0.5, 1, 2, 3, 4, 5]

[#]
sets = ["lat_lon"]
case_id = "SST_CL_HadISST"
variables = ["SST"]
ref_name = "HadISST_CL"
reference_name = "HadISST/OI.v2 (Climatology) 1982-2001"
seasons = ["ANN", "MAM"]
contour_levels = [-1, 0, 1, 3, 6, 9, 12, 15, 18, 20, 22, 24, 26, 28, 29]
diff_levels = [-5, -4, -3, -2, -1, -0.5, -0.2, 0.2, 0.5, 1, 2, 3, 4, 5]
Run E3SM diagnostics with the -d parameter.
python e3sm_diags_container.py --shifter -p myparams.py -d mydiags.cfg
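Each [#] header in mydiags.cfg starts one diagnostic run, and the lines that follow are key = value pairs whose values look like Python literals. To inspect such a file before running it, a small reader can split it into one dictionary per run. This is only an illustrative sketch and not how e3sm_diags itself reads the file. The function name parse_diags_cfg is hypothetical, and the sample is a shortened copy of the file above.

```python
import ast


def parse_diags_cfg(text):
    """Split cfg text into one dict per [#] block, evaluating values as literals."""
    runs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line == "[#]":
            current = {}
            runs.append(current)
            continue
        if current is None:
            raise ValueError("parameter found before the first [#] header")
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        current[key.strip()] = ast.literal_eval(value.strip())
    return runs


# Shortened copy of mydiags.cfg from above.
sample_cfg = """\
[#]
sets = ["lat_lon"]
case_id = "GPCP_v2.2"
variables = ["PRECT"]
seasons = ["ANN", "DJF"]

[#]
sets = ["lat_lon"]
case_id = "SST_CL_HadISST"
variables = ["SST"]
seasons = ["ANN", "MAM"]
"""

for run in parse_diags_cfg(sample_cfg):
    print(run["case_id"], run["variables"], run["seasons"])
```

A hand-written splitter is used here because the file repeats the same [#] header, which Python's configparser rejects as a duplicate section under its default strict mode.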