Parallel IO (IO)

For input and output of data needed for Omega, we use the Software for Caching Output and Reads for Parallel I/O (SCORPIO) library. This library supports parallel reading and writing of distributed arrays in various self-describing formats like NetCDF, HDF5, and ADIOS. SCORPIO is responsible for rearranging data from the data decomposition used for computation to a different rearrangement for parallel IO. In particular, for optimal IO, the user would specify a different number of IO tasks than for computation. For example, the user could specify the number of IO tasks to match underlying hardware like the network interfaces (NICs) on a node. Users and developers will generally access IO via IOStreams and other interfaces. The base IO layer described here only provides an Omega-aware wrapper around SCORPIO calls.

The base interfaces provide functions for file operations (open/close), reading and writing of metadata, and reading and writing of data arrays. Interfaces at this level utilize raw pointers to data and assume contiguous storage for arrays. SCORPIO utilizes integer handles to various files and data types so these must often be defined or retrieved for many operations. Although there is no IO class, we encapsulate the IO routines within the Omega and IO namespaces.

Before using any IO functions, the parallel IO system must be initialized using:

   int Err = IO::init(Comm);

where Comm is an MPI communicator and should in most cases be the communicator from the Omega default MachEnv (see MachEnv). This function also extracts the user-defined variables from the model configuration, include the number of IO tasks, the IO task stride, the default data rearranger method, and the default file format (see User Guide).

As mentioned above, most I/O operations will take place within the IOStreams module, but the base IO functions can be accessed directly. To open and close files for reading/writing, use:

   int Err = IO::openFile(FileID, Filename, Mode);
   Err = IO::closeFile(FileID);

or

   int Err = IO::openFile(FileID, Filename, Mode, Format, IfExists);
   Err = IO::closeFile(FileID);

In both case, an integer FileID is assigned to the open file which is then used by all subsequent operations. The Filename is a std::string that should include the full path and name of the file. The mode can be either OMEGA::IO::ModeRead or OMEGA::IO::ModeWrite. The shorter interface uses default values for the file format, overwrite behavior and precision. In the longer interface, the user must supply these values using the enums:

/// Supported file formats
enum FileFmt {
   FmtNetCDF3  = PIO_IOTYPE_NETCDF,   ///< NetCDF3 classic format
   FmtPnetCDF  = PIO_IOTYPE_PNETCDF,  ///< Parallel NetCDF3
   FmtNetCDF4c = PIO_IOTYPE_NETCDF4C, ///< NetCDF4 (HDF5-compatible) cmpressed
   FmtNetCDF4p = PIO_IOTYPE_NETCDF4P, ///< NetCDF4 (HDF5-compatible) parallel
   FmtNetCDF4  = PIO_IOTYPE_NETCDF4P, ///< NetCDF4 (HDF5-compatible) parallel
   FmtHDF5     = PIO_IOTYPE_HDF5,     ///< native HDF5 format
   FmtADIOS    = PIO_IOTYPE_ADIOS,    ///< ADIOS format
   FmtUnknown  = -1,                  ///< Unknown or undefined
   FmtDefault  = PIO_IOTYPE_NETCDF4C, ///< NetCDF4 is default
};

/// Behavior (for output files) when a file already exists
enum class IfExists {
   Fail,    /// Fail with an error
   Replace, /// Replace the file
   Append,  /// Append to the existing file
};

with the defaults for each being IO::FmtDefault, and IO::IfExists::Fail. The E3SM standard format is currently NetCDF4. Earlier NetCDF formats should be avoided, but are provided in case an input file is in an earlier format.

Once the file is open, data is read/written using one of two interfaces, depending on whether the array is decomposed across MPI tasks or not. For large decomposed arrays, the interface is:

int Err = IO::readArray (&Array, Size, VariableName, FileID, DecompID, VarID);
int Err = IO::writeArray(&Array, Size, &FillValue,   FileID, DecompID, VarID);

where the pointer to the data array is passed and data is assumed to be contiguous with the full local size of the array passed as Size. The FileID is the integer ID for the open file. The DecompID is a defined data decomposition as described below. For reading, the variable name (as a std::string) is supplied and the variable ID (VarID) is returned in case it is needed for reading of variable metadata. For writing, a FillValue is supplied to fill undefined locations in an array and the variable ID must have been assigned in a prior defineVar call prior to the write as described below.

Writing or reading multiple time slices (where there in an unlimited time dimension) is also possible and an additional optional Frame argument specifies the time index along that dimension that should be read/written:

int Err = IO::readArray (&Array, Size, VariableName, FileID, DecompID, VarID, Frame);
int Err = IO::writeArray(&Array, Size, &FillValue,   FileID, DecompID, VarID, Frame);

For arrays or scalars that are not distributed, the non-distributed variable interface must be used:

int Err = IO::readNDVar(&Array, VariableName, FileID, VarID);
int Err = IO::writeNDVar(&Array, FileID, VarID);

with arguments similar to the distributed array calls above. Note that when defining dimensions for these fields, the dimensions must be non-distributed. For scalars, the number of dimensions should be zero. Multiple time slices can be also be read/written for non-distributed fields, but require two additional arguments. As in the distributed array, the Frame (index of the time slice) must be provided. In addition, a vector std::vector<int> DimLengths containing the length of the non-time dimensions must be provided:

int Err = IO::readNDVar(&Array, VariableName, FileID, VarID, Frame, DimLengths);
int Err = IO::writeNDVar(&Array, FileID, VarID, Frame, DimLengths);

Note that the full arrays in this case are written so if any masking or pruning of points is required, it should be performed before the call.

The IO subsystem must know how the data is laid out in the parallel decomposition. Both the dimensions of the array and the decomposition across tasks must be defined. For each dimension, a dimension must be defined using:

   int Err = IO::defineDim(FileID, DimName, Length, DimID);

where FileID is the ID of an open file, the DimName is a std::string with the dimension name (eg NCells, NEdges, NVertices, NVertLevels or NTracers), length is the length of the full global array and DimID is the ID assigned to this dimension. Note that for reading a file, we supply the function:

   int Err = IO::getDimFromFile(FileID, DimName, DimID, DimLength);

that can be used to inquire about the dimension length and retrieve the dimension ID from the file.

Once the dimensions are defined, the decomposition of an array must be defined using:

   int Err = createDecomp(DecompID, IODataType, NDims, DimLengths,
                          Size, GlobalIndx, Rearr);

where DecompID is the ID assigned to the newly created decomposition, NDims is the number of dimensions for the array, DimLengths is an integer std::vector of the NDims local dimension lengths, Size is the full size of the local array, and Rearr is the data rearranger method to use to map to the number of IO tasks. The Rearr can be set to OMEGA::IO::DefaultRearr to make use of the overall default defined when the IO system was initialized, but can also be set explicitly to OMEGA::IO::RearrBox or OMEGA::IO::RearrSubset. The box rearranger is generally preferred (see UserGuide). The GlobalIndx array describes the global location (as a zero-based offset) of each local array entry. This can be computed from the Omega Default Decomp arrays. For example, an array dimensioned (NCellsAll,NVertLevels) would have an offset computed using:

   std::vector<int> OffsetCell(NCellsAll*NVertLevels,-1);
   int Add = 0;
   for (int Cell; Cell = 0; Cell < NCellsOwned){
      for (int k; k = 0; k < NVertLevels){
         OffsetCell[Add] = (CellIDH(Cell)-1)*NVertLevels + k;
         Add++;
      }
   }

Note that we exclude Halo layers by assigning an offset of -1. Finally, the data type of the array must be supplied. To map the standard Omega data types to the data types used in the IO subsystem, we define:

enum IODataType {
   IOTypeI4      = PIO_INT,    /// 32-bit integer
   IOTypeI8      = PIO_INT64,  /// 64-bit integer
   IOTypeR4      = PIO_REAL,   /// 32-bit real
   IOTypeR8      = PIO_DOUBLE, /// 64-bit real
   IOTypeChar    = PIO_CHAR,   /// Character/string
   IOTypeLogical = PIO_INT     /// Logicals are converted to ints for IO
};

so that in the above interface, we would supply for example IO::IOTypeI4 for an Omega I4 data type.

Now that dimensions and decompositions have been defined, a variable can be defined (this is required for writing only) using:

   int Err = IO::defineVar(FileID, VarName, IODataType, NDims, DimIDs, VarID);

where VarID is the ID assigned to the variable, FileID is the usual ID of the data file, VarName is a std::string holding the variable name, IODataType is the data type described above, NDims are the number of dimensions, and DimIDs are an integer std::vector holding the dimension IDs for each dimension. For scalar variables, NDims should be set to zero and a null pointer should be used in place of the DimID argument. Once defined, the variable ID is used in all IO calls related to this variable.

In addition to data in a file, we can also read and write metadata. As with the data itself, metadata is typically managed by the IOStreams and Metadata interfaces, but the base IO module contains interfaces for reading and writing metadata associated with either an array or the file or simulation itself. To read/write metadata, use:

   int Err = IO::writeMeta(MetaName, MetaValue, FileID, VarID);
   int Err = IO::readMeta (MetaName, MetaValue, FileID, VarID);

where MetaName is a std::string holding the name of the metadata and the MetaValue is the value of the MetaData. All supported Omega data types are allowed except boolean which must be converted to an integer type. The FileID is once again the ID of the open data file and VarID is the variable to which this metadata is attached. For global file and simulation metadata not attached to a variable, the ID IO::GlobalID is used to denote global metadata. If data is being appended to a file (eg for multiple time slices), writeMeta will check first to see if metadata already exists with the same value. If the entry is absent, new metadata will be written. If it exists but with a different value, the writeMeta function will replace the current entry with the new value.

For an example of the full read/write process, the IO unit test contains the full reading and writing of a data file and associated metadata. We will repeat again that users and developers are not expected to learn and use the interfaces above, but should access the IO system from the IOStreams interfaces.