SCORPIO  1.7.0
Describing decompositions

One of the advantages of using SCORPIO is the ability to describe the way data is decomposed across compute processes and reusing it for reading and writing any data that uses the same decomposition. Another major advantage is the feature in SCORPIO to rearrange data before using low level I/O libraries like NetCDF and PnetCDF to write data to disk.

The I/O decompositions can be defined by the user using the PIO_initdecomp API. The algorithm used to rearrange data can be specified when initializing the SCORPIO I/O system using the PIO_init API. All decompositions created on the I/O system, using PIO_initdecomp, use the rearrangment method specified while initializing the I/O system. SCORPIO provides two methods to rearrange data from compute processes to I/O processes,

  • Box rearrangement
  • Subset rearrangement

Box rearrangement

In this method data is rearranged from compute to IO tasks such that each I/O task contains a contiguous block of data. In this case each compute task will transfer data to one or more IO tasks.

Subset rearrangement

In this method each IO task is associated with a unique subset of compute tasks and data is rearranged to each I/O task by rearranging data between compute processes belonging to the subset associated with the I/O task. Since data rearrangement is confined to a subset of tasks, each I/O task is not guaranteed to have a single contiguous block of data. Any futher data rearrangment required before writing data to disk is delegated to the low level I/O libraries like NetCDF and PnetCDF.

As an example suppose we have a global two dimensional grid of size 4x5 decomposed over 5 tasks. We represent the two dimensional grid in terms of offset from the initial element ie

     0  1  2  3 
     4  5  6  7 
     8  9 10 11
    12 13 14 15
    16 17 18 19 

Now suppose this data is distributed over the compute tasks as follows:

0: {   0  4 8 12  } 
1: {  16 1 5 9  } 
2: {  13 17 2 6  } 
3: {  10 14 18 3  } 
4: {   7 11 15 19  } 

If we have 2 io tasks the Box rearranger would give:

0: { 0  1  2  3  4  5  6  7  8  9  }
1: { 10 11 12 13 14 15 16 17 18 19 }

While the subset rearranger would give:

0: { 0  1  4  5  8  9  12 16 }
1: { 2  3  6  7  10 11 13 14 17 18 19 }

Note that while the box rearranger gives a data layout which is well balanced and well suited for the underlying io library, it had to communicate with every compute task to do so. On the other hand the subset rearranger communicated with only a portion of the compute tasks but requires more work on the part of the underlying io library to complete the operation.

Also note if every task is an IO task then the box rearranger will need to do an alltoall communication, while the subset rearranger does none. In fact using the subset rearranger with every compute task an IO task provides a measure of what you might expect the performance of the underlying IO library to be if it were used without SCORPIO.