(omega-dev-decomp)=
## Domain Decomposition (Decomp)

In order to run across nodes of a parallel computer, Omega subdivides the
horizontal domain into subdomains that are distributed across the machine and
communicate using message passing via the Message Passing Interface (MPI). To
decompose the domain, we utilize the
[METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) library. An
Omega mesh is fully described by the
[MPAS Mesh Specification](https://mpas-dev.github.io/files/documents/MPAS-MeshSpec.pdf),
which we will reproduce here eventually. The Decomp class decomposes the domain
based on the index space and the number of MPI tasks (currently one subdomain
per task). Once decomposed, the Decomp class holds all of the index space
information as described below.

Within Omega, a default decomposition of the mesh is first created with the
call:
```c++
Decomp::init();
```
This must be called very early in the init process, just after initializing the
MachEnv, Config and IO. Mesh information is first read using parallel IO into
an equally-spaced linear decomposition, then partitioned by METIS into a more
optimal decomposition. The parallel METIS implementation (ParMETIS) will
eventually be used to reduce the memory footprint for high-resolution
configurations, but currently we use the serial METIS library for the
partitioning. METIS requires information about the connectivity in the mesh: in
particular, the total number of cells and edges in the mesh and the
connectivity given by the CellsOnCell array, which stores the indices of
neighboring cells for each cell.

Once the cells have been partitioned, we determine multiple layers of halo
cells that will be needed to avoid excessive communication with remote
neighboring subdomains. The CellsOnCell, EdgesOnCell and VerticesOnCell arrays
are then distributed according to this cell partitioning. Edges and vertices
are partitioned next. Ownership of each edge/vertex is determined by the first
(valid) cell index in the CellsOnEdge or CellsOnVertex entry for that
edge/vertex. The remaining connectivity arrays (CellsOnEdge, EdgesOnEdge,
VerticesOnEdge, CellsOnVertex, EdgesOnVertex) are then redistributed to match
the edge and vertex distributions. Halos are filled to ensure all necessary
edge and vertex information for the cell decomposition (and cell halos) is
present in the subdomain.

After the call to the Decomp initialization routine, a Decomp named Default has
been created and can be retrieved with
```c++
OMEGA::Decomp *DefDecomp = OMEGA::Decomp::getDefault();
```
Once retrieved, all Decomp members are public and can be accessed using
```c++
OMEGA::I4 NCellsOwned = DefDecomp->NCellsOwned;
OMEGA::HostArray1DI4 CellIDH = DefDecomp->CellIDH;
```
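For example, a downstream module would typically use these public members
(described in detail below) to size its own arrays. A minimal sketch is shown
here; the field name `CellTemp` is purely illustrative and `Array1DReal` is
assumed to be Omega's device array alias for 1-D real arrays:
```c++
// Illustrative only: allocate a cell-based device field sized by the full
// local cell dimension (NCellsSize) from the default decomposition.
OMEGA::Decomp *DefDecomp = OMEGA::Decomp::getDefault();
OMEGA::Array1DReal CellTemp("CellTemp", DefDecomp->NCellsSize);
```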
Decomp is a container for all mesh index and connectivity arrays as described
in the mesh specification above. In particular, it contains:

- NCellsGlobal: the total number of cells in the global mesh
- NCellsAll: the total number of cells in the local partition
- NCellsSize: the length of the cell array axis (typically NCellsAll+1)
- NCellsOwned: the number of cells owned by this task
- NCellsHalo(i): the number of owned+halo cells for each halo layer
- Analogous size variables for Edges and Vertices
- MaxEdges: the max number of edges on a cell (and the associated array dimension)
- VertexDegree: the number of cells/edges meeting at each vertex
- CellID(NCellsSize): the global index for each local cell
- CellLoc(NCellsSize,2): the task and local index for every local cell (redundant for owned cells, but gives the remote location for halo cells)
- Analogous arrays for Edges and Vertices
- CellsOnCell(NCellsSize,MaxEdges): the local index of the neighbor cell across each cell edge
- EdgesOnCell(NCellsSize,MaxEdges): the local edge index for each cell edge
- VerticesOnCell(NCellsSize,MaxEdges): the local index for each cell vertex
- CellsOnEdge(NEdgesSize,2): the indices of the 2 cells sharing an edge
- VerticesOnEdge(NEdgesSize,2): the vertex index at each edge endpoint
- EdgesOnEdge(NEdgesSize,MaxEdges x 2): the indices of all edges contributing to an edge (all edges of the 2 cells sharing the edge)
- CellsOnVertex(NVerticesSize,VertexDegree): the indices of all cells meeting at a vertex
- EdgesOnVertex(NVerticesSize,VertexDegree): the indices of all edges meeting at a vertex
- NEdgesOnCell(NCellsSize): the number of actual edges on each cell
- NEdgesOnEdge(NEdgesSize): the number of actual edges contributing to each edge

For each of the arrays above, there is a copy of the array on both the host and
the device (GPU), with the host array named with an extra H on the end (e.g.
CellsOnCellH). All are Kokkos arrays, so they are accessed with (index) rather
than [index], and some of the arrays noted above are multi-dimensional. A
typical host loop might then look something like:
```c++
for (int Cell = 0; Cell < NCellsOwned; ++Cell) {
   for (int Edge = 0; Edge < NEdgesOnCellH(Cell); ++Edge) {
      VarH(Cell) = VarH(Cell) + FluxH(Cell, Edge);
   }
}
```
And on the device, we use `OMEGA::parallelFor` in place of
`Kokkos::parallel_for`:
```c++
OMEGA::parallelFor(
   {NCellsOwned, MaxEdges}, KOKKOS_LAMBDA(int Cell, int Edge) {
      if (Edge < NEdgesOnCell(Cell)) {
         Var(Cell) = Var(Cell) + Flux(Cell, Edge);
      }
   });
```

Any defined decomposition can be removed by name using
```c++
Decomp::erase(Name);
```
and all decompositions *must* be removed before the Kokkos finalize call using
```c++
Decomp::clear();
```
which destroys all host and device arrays before Kokkos finalizes and removes
the memory pool in which all the arrays are allocated. Failure to call clear
before `Kokkos::finalize()` will result in an error.
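To illustrate the required ordering at shutdown, a minimal sketch of a driver's
finalization might look like the following; the surrounding calls are
assumptions about a typical driver rather than Omega's actual finalization
code:
```c++
// Sketch only: clear all decompositions while the Kokkos memory pool still
// exists, then finalize Kokkos, then MPI.
OMEGA::Decomp::clear(); // destroys all Decomp host and device arrays
Kokkos::finalize();     // safe now that the arrays have been released
MPI_Finalize();
```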