# Unified Mesh: River Network Preparation

date: 2026/04/19

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

## Summary

This design describes the shared `prepare_river_network` step and associated tasks that can run the shared river steps on their own for the unified global base-mesh workflow. The purpose of the step is to simplify a global river dataset into products that can be consumed directly by `build_sizing_field` without re-reading or reinterpreting the raw source data.

The shared river-network workflow is implemented in Polaris pull request . The preferred first source is HydroRIVERS or an equivalent global flowline dataset.

Unlike the standalone [`mpas_land_mesh`](https://github.com/changliao1025/mpas_land_mesh) workflow, the Polaris design makes the downstream interface explicit. In particular, the workflow distinguishes between the authoritative simplified river network, the target-grid products needed by `build_sizing_field`, and the mesh-conditioned products needed by `create_base_mesh`, rather than overloading a single raster with mixed semantics.

Because river-network simplification and river-driven meshing are the parts of the workflow where Xylar's design intuition is currently weakest, the first Polaris design should preserve the [`mpas_land_mesh`](https://github.com/changliao1025/mpas_land_mesh) river algorithms as closely as is practical. The implementation aligns `prepare_river_network` with the shared target-grid tier and coastline interpretation chosen for the workflow, while deferring river-outlet reconciliation until after an MPAS base mesh exists.

Success means that Polaris gains a documented, reusable river-network preprocessing workflow that preserves the major hydrographic controls relevant for mesh generation and makes its outputs easy to inspect and easy for downstream steps to consume.

## Workflow Context

The overall unified-mesh workflow is described in [Unified Mesh: Global Base Mesh Workflow](unified_base_mesh.md).
The upstream unified-mesh workflow design is:

- [Unified Mesh: Coastline Preparation](unified_mesh_prepare_coastline.md)

The downstream unified-mesh workflow designs are:

- [Unified Mesh: Sizing-Field Construction](unified_mesh_build_sizing_field.md)
- [Unified Mesh: Base-Mesh Creation and Downstream Integration](unified_mesh_create_base_mesh.md)

## Requirements

### Requirement: Downstream-Ready River Network Products

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex

`prepare_river_network` shall provide source-level, target-grid, and mesh-conditioned river products that can be consumed directly by `build_sizing_field` and `create_base_mesh`.

The shared products shall retain the major river-network information needed for mesh refinement and direct cell-center placement, including channel locations and basin-root provenance.

The downstream sizing-field and base-mesh steps shall not need to rerun HydroRIVERS filtering, network reconstruction, or coastline-aware river clipping and simplification.

Coastline-aware river clipping shall be local to river-line geometry. It shall remove only the portions of each retained river line that fall inside the coastal exclusion band, preserving valid inland pieces rather than pruning whole trees or short inland fragments.

### Requirement: Hydrologically Meaningful Simplification

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex

The first implementation shall preserve the dominant global river main stems and major tributaries needed to inform mesh resolution. Terminal river segments shall be retained as basin roots for traversal and grouping, not as coastline-reconciled outlet products.

The design shall support filtering by drainage area and by proximity so the retained network reflects the target mesh scale rather than the full source dataset density.

The simplification shall preserve connectivity and confluence structure rather than reducing the product to disconnected local segments.
Where practical, the first Polaris design shall preserve the existing [`mpas_land_mesh`](https://github.com/changliao1025/mpas_land_mesh) river-network algorithms rather than redesigning them.

### Requirement: Deferred Outlet Reconciliation

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex

The pre-base-mesh river workflow shall not snap river outlets to the coastline, write separate outlet products, or refine the sizing field based on outlet mask cells.

The workflow shall preserve enough basin-root provenance, through `outlet_hyriv_id`, `outlet_drainage_area`, and `river_network_rank`, for downstream workflows to identify, select, and optionally write per-catchment products without rerunning HydroRIVERS simplification.

Outlet/coastline reconciliation shall still occur after the MPAS base mesh exists.

### Requirement: Standalone River-Network Task

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Polaris shall provide a standalone task per named unified mesh that runs the full shared river-network workflow for that mesh, including HydroRIVERS simplification, channel rasterization, and coastline-aware clipping, together with the shared upstream steps it depends on (for example `e3sm/init/topo/combine` and `prepare_coastline`).

The standalone task shall make it practical to inspect retained basins, target-grid river-channel masks, and clipped river geometry without running the full unified mesh workflow.

The same shared steps and configuration shall be reusable from the full unified workflow when settings match.

### Requirement: Reproducible Source Data Access

Date last modified: 2026/04/19

Contributors:

- Xylar Asay-Davis
- Codex

All source datasets needed by `prepare_river_network` shall be obtained either from documented public sources or, if that is not feasible, from the Polaris database.
The preferred implementation shall download raw source data from public sources and perform any needed preprocessing within Polaris rather than requiring users to provide local input-file paths.

Adding preprocessed artifacts to the Polaris database should be treated as a fallback for cases where the source data are not publicly distributable or the required preprocessing cannot be reproduced robustly within Polaris.

## Algorithm Design

### Algorithm Design: Downstream-Ready River Network Products

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The current implementation separates source-level hydrographic products from target-grid products rather than trying to make one step serve both roles. This aligns with the design intent that downstream consumers should not need to reinterpret HydroRIVERS or infer outlet semantics from one overloaded raster.

At the source level, the workflow writes:

- `simplified_river_network.geojson`, containing retained segments with `hyriv_id`, `main_riv`, `ord_stra`, `drainage_area`, `next_down`, `endorheic`, `outlet_hyriv_id`, `outlet_drainage_area`, and `river_network_rank`; networks are ordered largest-first by terminal-root drainage area, and the rank field makes the N largest networks directly selectable without relying on feature order alone.

The `outlet_hyriv_id` field is retained as basin-root provenance for future catchment grouping, not as a coastline-reconciled outlet product.

At the target-grid level, the workflow writes:

- `river_network.nc`, with `river_channel_mask`.

This is intentionally clearer than the standalone workflow's mixed raster semantics. The present implementation does not yet add stream-order rasters or basin IDs, but it does establish a clean product split that the `build_sizing_field` implementation now consumes directly.
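As an illustration of how the rank field might be used, the following minimal sketch selects the N largest networks from a feature collection shaped like `simplified_river_network.geojson`. The helper name `select_largest_networks` and the small in-memory collection are hypothetical; only the property names are taken from the product description above. Because selection is by `river_network_rank`, the result does not depend on feature order.

```python
import json


def select_largest_networks(feature_collection, n_networks):
    """Keep features whose river_network_rank lies in 1..n_networks.

    Hypothetical helper: it assumes every feature carries the
    `river_network_rank` property described for the simplified network.
    """
    selected = [
        feature
        for feature in feature_collection["features"]
        if 1 <= feature["properties"]["river_network_rank"] <= n_networks
    ]
    return {"type": "FeatureCollection", "features": selected}


def _segment(hyriv_id, outlet_hyriv_id, rank):
    """Build a tiny stand-in feature with made-up coordinates."""
    return {
        "type": "Feature",
        "properties": {
            "hyriv_id": hyriv_id,
            "outlet_hyriv_id": outlet_hyriv_id,
            "river_network_rank": rank,
        },
        "geometry": {
            "type": "LineString",
            "coordinates": [[0.0, 0.0], [1.0, 1.0]],
        },
    }


# Hand-made stand-in for the real file, deliberately not in rank order.
example = {
    "type": "FeatureCollection",
    "features": [
        _segment(31, 30, 3),
        _segment(11, 10, 1),
        _segment(21, 20, 2),
        _segment(12, 10, 1),
    ],
}

top_two = select_largest_networks(example, n_networks=2)
print(json.dumps([f["properties"]["hyriv_id"] for f in top_two["features"]]))
```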
For base-mesh consumers, the workflow also writes a mesh-conditioned product set:

- `clipped_river_network.geojson`, containing river segments clipped inland of the coastline and simplified for direct JIGSAW geometry use, with valid inland pieces preserved even when one source feature is split by the coastal exclusion band, and with networks ordered largest-first by terminal-root drainage area; and
- `clipped_river_network.nc`, containing masks regenerated from the clipped network for diagnostics.

These products are where the river workflow becomes aware of the selected unified mesh and its direct cell-placement needs. `build_sizing_field` uses the target-grid masks, while `create_base_mesh` consumes the conditioned vector geometry.

Generating the clipped products requires evaluating the coastline's `signed_distance` field along each retained river line. The implementation first densifies each line at the coastline-grid scale, then batches all sampled coordinates from all segments into a single array, performs one vectorised bilinear-interpolation call over the entire network, and splits the resulting distance values back to the corresponding per-segment slices. The clipped geometry is then built by retaining sampled intervals farther inland than the configured clip distance and linearly interpolating exact threshold crossings. This makes clipping local to the geometry near the coastline while avoiding artificial inland gaps. Short retained inland pieces are preserved; only degenerate pieces with fewer than two distinct points are removed.

### Algorithm Design: Hydrologically Meaningful Simplification

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex

The current Polaris implementation is a focused reimplementation built around HydroRIVERS attributes such as `HYRIV_ID`, `MAIN_RIV`, `ORD_STRA`, `UPLAND_SKM`, `NEXT_DOWN`, and `ENDORHEIC`. Its staged logic is:

1. Filter source flowlines by a minimum drainage-area threshold tied to the intended river-refinement scale.
2. Merge multiple source features with the same `hyriv_id` into one canonical segment when needed.
3. Validate that the retained `NEXT_DOWN` graph is acyclic before attempting basin traversal.
4. Identify terminal basin roots from segments with `next_down == 0`.
5. Traverse upstream iteratively from each terminal root, keeping the largest upstream segment at each confluence as the main stem.
6. Retain additional tributaries when either their drainage area exceeds a configurable fraction of the largest upstream branch at the current confluence or their minimum distance from the already retained basin skeleton exceeds the branch-distance tolerance.

The key point is that simplification should be basin-aware and topology-aware. The Polaris design should preserve connectivity and confluences, not just apply independent Douglas-Peucker style simplification to each source feature.

The Polaris implementation intentionally differs from the standalone `mpas_land_mesh` simplification algorithm in the mechanics of basin construction. The standalone workflow performs a greedy reverse search for each individual basin: it rebuilds a `pyrivergraph`, updates headwater stream order, merges and defines stream segments, and recursively grows an R-tree of retained flowlines from the outlet upstream. At each step, nearby branches can be kept, rejected, or replaced by a larger branch depending on the order in which the greedy search encounters them.

Polaris keeps the same design intent but uses a smaller algorithm tied directly to HydroRIVERS. It uses the `NEXT_DOWN` attributes as the authoritative downstream graph, validates that the retained graph is acyclic, constructs an upstream adjacency map, and processes each terminal root independently.
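The graph preparation described above (acyclicity check, upstream adjacency map, terminal roots) can be sketched in a few lines. This is only an illustration under stated assumptions, not the Polaris code: the function names are hypothetical, and the input is assumed to be a plain dictionary mapping each retained segment id to its `next_down` value, with `0` marking a terminal root.

```python
from collections import defaultdict


def validate_acyclic(next_down):
    """Raise ValueError if following next_down pointers ever loops.

    Each segment has at most one downstream neighbour, so it is enough
    to walk downstream from every segment and watch for a repeat.
    """
    finished = set()
    for start in next_down:
        path = []
        on_path = set()
        current = start
        while (
            current != 0
            and current in next_down
            and current not in finished
        ):
            if current in on_path:
                raise ValueError(
                    f"cycle in NEXT_DOWN graph at segment {current}"
                )
            on_path.add(current)
            path.append(current)
            current = next_down[current]
        # Everything on this walk is now known to drain without looping.
        finished.update(path)


def build_upstream_map(next_down):
    """Map each downstream segment id to the ids draining into it."""
    upstream = defaultdict(list)
    for segment_id, downstream_id in next_down.items():
        if downstream_id != 0 and downstream_id in next_down:
            upstream[downstream_id].append(segment_id)
    return dict(upstream)


def find_terminal_roots(next_down):
    """Return ids of segments with next_down == 0."""
    return [sid for sid, down in next_down.items() if down == 0]


# Made-up network: segments 2 and 3 join at 1, and 4 drains into 2.
example_next_down = {1: 0, 2: 1, 3: 1, 4: 2}
validate_acyclic(example_next_down)
example_upstream = build_upstream_map(example_next_down)
example_roots = find_terminal_roots(example_next_down)
```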
Branch selection is deterministic and local to each confluence: keep the largest upstream branch, then keep other upstream branches that pass the area-ratio test or the distance-tolerance fallback. The retained set is not later mutated by replacing smaller branches with larger nearby ones. This makes the step easier to test, allows basin traversal to run in parallel, and avoids importing the broader `mpas_land_mesh`/`pyflowline` helper stack into Polaris.

### Algorithm Design: Deferred Outlet Reconciliation

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Outlet and coastline reconciliation is intentionally deferred until after an MPAS base mesh exists. Before that point, snapping HydroRIVERS terminal points to coastline cells and refining outlet mask cells adds complexity without a clear benefit because the base-mesh workflow clips near-coast river geometry and the sizing-field workflow blends land resolution toward ocean resolution near the coastline.

The pre-base-mesh river workflow therefore keeps terminal-root provenance on retained river segments through `outlet_hyriv_id`, `outlet_drainage_area`, and `river_network_rank`. Rasterization produces the channel mask needed by the sizing field, and clipped vector products provide the river geometry needed by JIGSAW. Downstream workflows that need outlet locations or catchment-specific files can group segments by `outlet_hyriv_id`, select the largest basins by `river_network_rank`, and perform outlet/coastline reconciliation later.

### Algorithm Design: Standalone River-Network Task

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The current standalone task design uses one thin wrapper per named unified mesh, `UnifiedRiverNetworkTask`, rather than separate source-level and lat-lon tasks.
Each task wraps the full shared river-network step chain for its mesh (coastline steps, simplification, rasterization, clipping, and visualization) so all products can be inspected together without running the full unified mesh workflow. Organizing by mesh name rather than by resolution keeps the task structure consistent with the sizing-field and base-mesh task families and avoids creating standalone tasks for resolutions that are not tied to a specific mesh configuration.

## Implementation

### Implementation: Downstream-Ready River Network Products

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The file naming and class layout are now concrete. The river implementation is organized under `polaris/tasks/mesh/spherical/unified/river/` as:

- `simplify.py` (`SimplifyRiverNetworkStep`) for HydroRIVERS download, unpacking and source-level simplification;
- `rasterize.py` (`RasterizeRiverLatLonStep`) for target-grid rasterization of retained river channels;
- `clip.py` (`ClipRiverNetworkStep`) for coastline-aware clipping and conditioning of retained river geometry for final mesh generation;
- `viz.py` (`VizRiverStep`) for diagnostic plotting and text summaries;
- `steps.py` for shared-step setup helpers (`get_unified_mesh_river_steps()`);
- `task.py` and `tasks.py` for standalone task wrappers; and
- the configuration sections are loaded from the unified mesh config.

This implementation prioritizes a clean output contract over carrying forward the standalone workflow's mixed raster conventions or writing default per-catchment GeoJSON files. A single ranked GeoJSON keeps the authoritative simplified network in one file while still allowing scripts to reproduce the standalone workflow's "largest N basins" exports by filtering on `river_network_rank`.
The simplification step obtains HydroRIVERS through `add_input_file()` using the public archive URL in the river network config section, with the Polaris database still available as a fallback cache location. The rasterization step then consumes the shared coastline grid for the selected convention and writes a channel-only mask.

The `ClipRiverNetworkStep` consumes the simplified network together with the selected coastline product and writes the clipped river geometry consumed by the unified base-mesh step.

The coastline-aware clipping in `condition_base_mesh_river_segments()` uses coastline-grid-scale line densification before signed-distance sampling, so a river line with endpoints inside the coastal exclusion band can still retain an inland middle portion. All sampled coordinates are stacked into one array, `_interpolate_signed_distance()` is called once, and the resulting signed-distance values are split back to per-segment slices with `np.split()`. The helper then retains only intervals outside the coastal exclusion band, interpolates boundary crossings, preserves all valid inland fragments, and falls back to unsimplified clipped geometry if Douglas-Peucker simplification would make a piece degenerate. The historical minimum-length option is retained for configuration compatibility but no longer removes valid inland pieces.

### Implementation: Hydrologically Meaningful Simplification

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The current simplification logic lives in `simplify_river_network_feature_collection()` in `polaris/tasks/mesh/spherical/unified/river/simplify.py`. It uses small focused helpers for canonicalizing segments, validating downstream topology, filtering by drainage area, and traversing retained basin structure from all terminal roots. The traversal is iterative rather than recursive, so very deep main stems do not depend on Python recursion limits.
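To make the iterative traversal concrete, here is a minimal sketch of an explicit-stack walk upstream from one terminal root. It is not the Polaris implementation: the name `retain_basin` and the `area_ratio` parameter are hypothetical, the inputs are plain dictionaries, and the distance-tolerance fallback is deliberately left out so that only the main-stem rule and the area-ratio test are shown. The deep chain at the end would exceed Python's default recursion limit if the walk were recursive.

```python
def retain_basin(root_id, upstream, drainage_area, area_ratio=0.5):
    """Return retained segment ids for one basin, root first.

    upstream: dict mapping a segment id to the ids draining into it.
    drainage_area: dict mapping a segment id to its drainage area.
    area_ratio: assumed tributary threshold, as a fraction of the
        largest upstream branch at the same confluence.
    """
    retained = [root_id]
    stack = [root_id]
    while stack:
        current = stack.pop()
        branches = upstream.get(current, [])
        if not branches:
            continue  # headwater of the retained network
        # The largest upstream branch is always kept as the main stem.
        main_stem = max(branches, key=lambda seg: drainage_area[seg])
        threshold = area_ratio * drainage_area[main_stem]
        for branch in branches:
            if branch == main_stem or drainage_area[branch] > threshold:
                retained.append(branch)
                stack.append(branch)
    return retained


# Made-up confluence: 2 (area 70) and 3 (area 20) join at root 1.
small_upstream = {1: [2, 3], 2: [4]}
small_area = {1: 100.0, 2: 70.0, 3: 20.0, 4: 60.0}
strict = retain_basin(1, small_upstream, small_area, area_ratio=0.5)
loose = retain_basin(1, small_upstream, small_area, area_ratio=0.25)

# A 5000-segment chain with no tributaries, deeper than the default
# recursion limit.
chain_length = 5000
chain_upstream = {i: [i + 1] for i in range(1, chain_length)}
chain_area = {i: float(chain_length - i + 1) for i in range(1, chain_length + 1)}
deep = retain_basin(1, chain_upstream, chain_area)
```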
When multiple CPUs are available, terminal basins are distributed across forked worker processes that share the read-only HydroRIVERS segment map, upstream adjacency map, and spatial index. Each worker returns the retained segments for one basin root, and the parent process merges those basin-local results before annotating network rank.

After basin traversal, the implementation annotates each retained segment with `outlet_drainage_area` and `river_network_rank`. The rank is 1-based, with rank 1 assigned to the retained terminal basin with the largest outlet drainage area. These properties are preserved by the canonical `RiverSegment` read/write helpers and are carried through coastline conditioning so downstream products do not silently drop the network-selection metadata.

The implementation favors a compact Polaris-native reimplementation over a direct migration of [`mpas_land_mesh`](https://github.com/changliao1025/mpas_land_mesh) helper layers. No clear defect emerged from the current unit tests, but this remains an area where additional comparison against real HydroRIVERS output would strengthen confidence.

### Implementation: Deferred Outlet Reconciliation

Date last modified: 2026/05/15

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The current implementation removes coastline matching and inland-sink treatment from the pre-base-mesh river products. `river_network.nc` contains `river_channel_mask` only, and the simplified/clipped GeoJSON products keep basin-root provenance and network-selection metadata but no coastline-snapped outlet products. Outlet snapping and catchment-specific outlet products are deferred to downstream workflows that operate after the MPAS base mesh exists.
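A downstream workflow that needs per-catchment products could start from the retained provenance alone. The sketch below is a hypothetical illustration, not part of Polaris: it groups GeoJSON-style features by `outlet_hyriv_id` and orders the groups by `river_network_rank`, assuming (as the product description implies) that all segments sharing an outlet also share a rank. Geometry is omitted from the made-up features because only the properties are used.

```python
from collections import defaultdict


def group_by_outlet(features):
    """Group features by their outlet_hyriv_id property."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"]["outlet_hyriv_id"]].append(feature)
    return dict(groups)


def catchments_by_rank(features):
    """Return (rank, outlet_hyriv_id, members) tuples, rank 1 first.

    Assumes every member of a group carries the same
    river_network_rank, so the first member's rank is used.
    """
    catchments = []
    for outlet_id, members in group_by_outlet(features).items():
        rank = members[0]["properties"]["river_network_rank"]
        catchments.append((rank, outlet_id, members))
    catchments.sort(key=lambda item: item[0])
    return catchments


def _feature(hyriv_id, outlet_hyriv_id, rank):
    """Build a made-up feature carrying only the needed properties."""
    return {
        "type": "Feature",
        "properties": {
            "hyriv_id": hyriv_id,
            "outlet_hyriv_id": outlet_hyriv_id,
            "river_network_rank": rank,
        },
    }


example_features = [
    _feature(21, 20, 2),
    _feature(11, 10, 1),
    _feature(22, 20, 2),
    _feature(12, 10, 1),
    _feature(13, 10, 1),
]
ordered = catchments_by_rank(example_features)
summary = [(rank, outlet, len(members)) for rank, outlet, members in ordered]
```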
### Implementation: Standalone River-Network Task

Date last modified: 2026/05/11

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

The current implementation adds one lightweight task wrapper per named unified mesh in `polaris/tasks/mesh/spherical/unified/river/task.py` and avoids any separate task-specific river-processing code path. `UnifiedRiverNetworkTask` wraps the full shared step chain for its mesh (coastline steps, simplification with `SimplifyRiverNetworkStep`, rasterization with `RasterizeRiverLatLonStep`, clipping with `ClipRiverNetworkStep`, and visualization) so all products can be inspected together. Task registration is handled by `add_river_tasks()` in `tasks.py`, which iterates over `UNIFIED_MESH_NAMES` and registers one task per mesh.

## Testing

### Testing and Validation: Downstream-Ready River Network Products

Date last modified: 2026/05/23

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Unit tests in `tests/mesh/spherical/unified/test_river.py` verify the target-grid product contract. Specifically:

- `test_build_river_network_dataset_contract_and_channel_mask` verifies that `build_river_network_dataset()` writes the expected channel-only mask variable (`river_channel_mask`) without outlet-matching attributes.
- `test_mesh_river_step_factories_use_mesh_subdirs` verifies that `get_unified_mesh_river_steps()` creates `SimplifyRiverNetworkStep`, `RasterizeRiverLatLonStep`, and `ClipRiverNetworkStep` with the expected mesh-specific subdirectories.
- `test_mesh_river_step_factories_reuse_shared_configs` verifies step and config identity across multiple calls to `get_unified_mesh_river_steps()`.

The coastline-aware conditioning tests in the same file verify `condition_base_mesh_river_segments()`, including local clipping through multiple entries and exits from the coastal exclusion band, densification before signed-distance sampling, preservation of short inland pieces, and safe simplification fallback.
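The clipping behaviour that these tests exercise can be illustrated with a small pure-Python sketch. It is not `condition_base_mesh_river_segments()` itself: the function names are hypothetical, the line is assumed to be already densified and sampled, and the distances are assumed to follow a convention in which larger values lie farther inland (the actual sign convention of `signed_distance` is not stated here). The sketch keeps intervals farther inland than the clip distance, linearly interpolates the threshold crossings, and drops only degenerate pieces.

```python
def _interpolate_crossing(p0, p1, d0, d1, clip_distance):
    """Linearly interpolate the point where distance equals the threshold."""
    fraction = (clip_distance - d0) / (d1 - d0)
    return (
        p0[0] + fraction * (p1[0] - p0[0]),
        p0[1] + fraction * (p1[1] - p0[1]),
    )


def clip_line(points, inland_distance, clip_distance):
    """Split a sampled line into pieces farther inland than clip_distance.

    points: list of (x, y) tuples along the densified line.
    inland_distance: one sampled distance per point (assumed convention:
        larger means farther inland).
    """
    pieces = []
    current = []
    for index, point in enumerate(points):
        inside = inland_distance[index] > clip_distance
        if index > 0:
            prev_inside = inland_distance[index - 1] > clip_distance
            if inside != prev_inside:
                current.append(
                    _interpolate_crossing(
                        points[index - 1],
                        point,
                        inland_distance[index - 1],
                        inland_distance[index],
                        clip_distance,
                    )
                )
                if prev_inside:
                    # Leaving the retained region closes the piece.
                    pieces.append(current)
                    current = []
        if inside:
            current.append(point)
    if current:
        pieces.append(current)
    # Only degenerate pieces (fewer than two distinct points) are dropped.
    return [piece for piece in pieces if len(set(piece)) >= 2]


# Made-up line whose endpoints sit inside the exclusion band and which
# dips back into the band once in the middle.
line = [(float(x), 0.0) for x in range(7)]
distance = [0.0, 2.0, 4.0, 2.0, 0.0, 2.0, 0.0]
clipped = clip_line(line, distance, clip_distance=1.0)
```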
The `test_base_mesh.py` tests then verify that `UnifiedBaseMeshStep` converts the prepared `clipped_river_network.geojson` product into JIGSAW line constraints rather than raw river geometry.

`build_sizing_field` unit tests consume the target-grid river masks. The full river workflow feeding the sizing-field task and the final base-mesh task has been run on real data for all four named unified meshes.

### Testing and Validation: Hydrologically Meaningful Simplification

Date last modified: 2026/05/23

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Unit tests in `tests/mesh/spherical/unified/test_river.py` validate simplification behavior on synthetic networks:

- `test_simplify_river_network_traverses_all_terminal_segments` verifies that all retained terminal segments are traversed and that `outlet_hyriv_id`, `outlet_drainage_area`, and `river_network_rank` are preserved as basin-root provenance and network-selection metadata.
- `test_simplify_river_network_handles_deep_main_stem` confirms correctness for a 1500-segment chain without Python recursion limits.
- `test_simplify_river_network_rejects_next_down_cycles` verifies that cyclic `NEXT_DOWN` graphs are rejected with a clear error.
- `test_simplify_river_network_preserves_branch_traversal_order` verifies that multi-branch confluence structure is retained correctly.
- `test_convert_hydrorivers_shapefile_to_geojson` verifies shapefile conversion.
- `test_unpack_hydrorivers_archive` verifies archive unpacking.
- `test_drainage_area_threshold_auto_derived_from_config` and `test_branch_distance_tolerance_auto_derived_from_config` verify that simplification thresholds are derived correctly from mesh configs.

The simplification has been exercised on the full global HydroRIVERS dataset for all four named unified meshes, and the resulting river networks were inspected visually and found to reflect the major hydrographic controls at each resolution.
### Testing and Validation: Deferred Outlet Reconciliation

Date last modified: 2026/05/16

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Unit tests in `tests/mesh/spherical/unified/test_river.py` cover the channel-only pre-base-mesh products:

- `test_build_river_network_dataset_contract_and_channel_mask` verifies the channel-only raster contract.
- `test_build_river_network_dataset_applies_physical_channel_buffer` verifies the physical buffer applied to rasterized channel cells.
- `test_condition_base_mesh_river_segments_clips_then_simplifies`, `test_condition_base_mesh_river_segments_keeps_short_fragments`, `test_condition_base_mesh_river_segments_keeps_reentry_pieces`, `test_condition_base_mesh_river_segments_densifies_before_clipping`, and `test_condition_base_mesh_river_segments_simplify_fallback_keeps_geometry` verify the coastline clipping applied before base-mesh conditioning.

The visualization step writes `river_network_overlay.png`, `rasterized_river_network.png`, and `debug_summary.txt`, making the simplified, clipped, and rasterized channel products straightforward to inspect in task runs.

### Testing and Validation: Standalone River-Network Task

Date last modified: 2026/05/23

Contributors:

- Xylar Asay-Davis
- Codex
- Claude

Unit tests in `tests/mesh/spherical/unified/test_river.py` verify the standalone task structure:

- `test_add_river_tasks_registers_mesh_tasks` verifies that `add_river_tasks()` registers one `UnifiedRiverNetworkTask` per name in `UNIFIED_MESH_NAMES`, that each task subdirectory is `spherical/unified//river/task`, and that each task name is `river_network__task`.
- `test_mesh_river_step_factories_use_mesh_subdirs` verifies mesh-specific subdirectories for the simplify, rasterize, and clip steps.
- `test_mesh_river_step_factories_reuse_shared_configs` verifies that step and config instances are shared across multiple `get_unified_mesh_river_steps()` calls for the same mesh.

Standalone river tasks have been run for all four named unified meshes, showing the expected rasterized river networks and visualization overlays at each resolution. The full end-to-end workflow through sizing-field construction, base-mesh generation, topography remap, and mesh culling has been completed for all four meshes. The resulting culled ocean and land meshes were visually verified to be consistent with expectations.