# Shared steps

date: 2023/08/18

Contributors: Carolyn Begeman, Xylar Asay-Davis

## Summary

The capability designed here is the ability to share steps across tasks. In this design document, "shared steps" refers to any step that may be used by multiple tasks available in polaris. The main motivation behind this capability is the computational expense of running steps multiple times when they could be shared across tasks. To make it clear to users that steps are shared, we present a new design for the working directory structure. The design is successful insofar as it guarantees that shared steps are run once per slurm job and that the role of shared steps is clear to users.

## Requirements

### Requirement: Shared steps are run once.

Shared steps should be run once per invocation of `polaris serial` or `polaris run`.

### Requirement: Shared steps are run before steps that depend on their output.

### Requirement: Shared steps are not daughters of a task

A shared step's class attributes do not include any task-related information such as a task it belongs to.

### Requirement: Working directory structure is intuitive.

Shared step directories should be located at the highest level in the working directory structure where all tasks that use that step are run at or below that level.

### Requirement: Working directory step paths are easily discoverable by users.

There should be a way to list the paths within the work directory of all steps in each task. There should also be a way for a user to find the steps in a task from the task's work directory.

### Requirement: The output of shared steps may be used by multiple tasks.

A step may only be shared across multiple tasks if its output would be identical for each task.

### Requirement: tasks do not rely on outputs from steps in other tasks

All tasks are self-contained and rely only on either shared steps or steps they contain.

## Implementation

### Implementation: Shared steps are set up once.

As before, setup of either a list of tasks or a suite proceeds by iterating through the tasks and then through the steps in each task. An attribute `setup_complete` has been added to `Step` and is initialized to `False`. In the `setup_task()` function, setup is skipped for any step where `step.setup_complete == True`, and this attribute is set to `True` once a step's setup has been completed.

### Implementation: Shared steps are run before steps that depend on their output.

This requirement is already satisfied as part of the task parallelism design, which makes use of file dependencies. When running in task-serial mode, the implementation will be to make sure shared steps are added to the dictionary of steps before other steps that rely on them.

### Implementation: Shared steps are not daughters of a task

The `task` attribute and constructor argument of the `Step` class have been replaced by the `component` attribute. The step's `subdir` attribute is now relative to the component's work directory, rather than a parent task's work directory.

### Implementation: Working directory structure is intuitive.

Shared steps reside inside a task's work directory only when another task also lies within that task's work directory. The only such tasks at the moment are the `cosine_bell/with_viz` tasks, which reside inside the `cosine_bell` tasks.
The `cosine_bell/with_viz` tasks share all of the steps of the `cosine_bell` tasks (base mesh, init and forward for each resolution, and a single analysis step) and also add remapping and visualization steps that are not shared with any other tasks:

`cosine_bell`:

* ocean
  * spherical
    * qu
      * base_mesh
        * 60km
        * 90km
        * 120km
        * 150km
        * 180km
        * 210km
        * 240km
      * cosine_bell
        * init
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * forward
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * analysis

`cosine_bell/with_viz`:

* ocean
  * spherical
    * qu
      * base_mesh
        * 60km
        * 90km
        * 120km
        * 150km
        * 180km
        * 210km
        * 240km
      * cosine_bell
        * init
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * forward
          * 60km
          * 90km
          * 120km
          * 150km
          * 180km
          * 210km
          * 240km
        * analysis
        * with_viz
          * map
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * viz
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km

### Implementation: Working directory step paths are easily discoverable by users.

This is implemented in two ways. First, `polaris list --verbose` now lists the path of each step relative to the base work directory, rather than relative to the task's work directory:

```
$ polaris list --verbose

...

10: path:          ocean/spherical/qu/cosine_bell/with_viz
    name:          cosine_bell
    component:     ocean
    subdir:        spherical/qu/cosine_bell/with_viz
    steps:
     - qu_base_mesh_60km:  ocean/spherical/qu/base_mesh/60km
     - qu_init_60km:       ocean/spherical/qu/cosine_bell/init/60km
     - qu_forward_60km:    ocean/spherical/qu/cosine_bell/forward/60km
     - qu_map_60km:        ocean/spherical/qu/cosine_bell/with_viz/map/60km
     - qu_viz_60km:        ocean/spherical/qu/cosine_bell/with_viz/viz/60km
     - qu_base_mesh_90km:  ocean/spherical/qu/base_mesh/90km
     - qu_init_90km:       ocean/spherical/qu/cosine_bell/init/90km
     - qu_forward_90km:    ocean/spherical/qu/cosine_bell/forward/90km
     - qu_map_90km:        ocean/spherical/qu/cosine_bell/with_viz/map/90km
     - qu_viz_90km:        ocean/spherical/qu/cosine_bell/with_viz/viz/90km
     - qu_base_mesh_120km: ocean/spherical/qu/base_mesh/120km
     - qu_init_120km:      ocean/spherical/qu/cosine_bell/init/120km
     - qu_forward_120km:   ocean/spherical/qu/cosine_bell/forward/120km
     - qu_map_120km:       ocean/spherical/qu/cosine_bell/with_viz/map/120km
     - qu_viz_120km:       ocean/spherical/qu/cosine_bell/with_viz/viz/120km
     - qu_base_mesh_150km: ocean/spherical/qu/base_mesh/150km
     - qu_init_150km:      ocean/spherical/qu/cosine_bell/init/150km
     - qu_forward_150km:   ocean/spherical/qu/cosine_bell/forward/150km
     - qu_map_150km:       ocean/spherical/qu/cosine_bell/with_viz/map/150km
     - qu_viz_150km:       ocean/spherical/qu/cosine_bell/with_viz/viz/150km
     - qu_base_mesh_180km: ocean/spherical/qu/base_mesh/180km
     - qu_init_180km:      ocean/spherical/qu/cosine_bell/init/180km
     - qu_forward_180km:   ocean/spherical/qu/cosine_bell/forward/180km
     - qu_map_180km:       ocean/spherical/qu/cosine_bell/with_viz/map/180km
     - qu_viz_180km:       ocean/spherical/qu/cosine_bell/with_viz/viz/180km
     - qu_base_mesh_210km: ocean/spherical/qu/base_mesh/210km
     - qu_init_210km:      ocean/spherical/qu/cosine_bell/init/210km
     - qu_forward_210km:   ocean/spherical/qu/cosine_bell/forward/210km
     - qu_map_210km:       ocean/spherical/qu/cosine_bell/with_viz/map/210km
     - qu_viz_210km:       ocean/spherical/qu/cosine_bell/with_viz/viz/210km
     - qu_base_mesh_240km: ocean/spherical/qu/base_mesh/240km
     - qu_init_240km:      ocean/spherical/qu/cosine_bell/init/240km
     - qu_forward_240km:   ocean/spherical/qu/cosine_bell/forward/240km
     - qu_map_240km:       ocean/spherical/qu/cosine_bell/with_viz/map/240km
     - qu_viz_240km:       ocean/spherical/qu/cosine_bell/with_viz/viz/240km
     - analysis:           ocean/spherical/qu/cosine_bell/analysis
```

Second, we add symlinks within the task to each shared step. In what follows, the subdirectories in bold are shared steps that reside elsewhere, further up the directory tree: each resolution in `base_mesh`, `init` and `forward`, as well as `analysis`.

`cosine_bell/with_viz`:

* ocean
  * spherical
    * qu
      * cosine_bell
        * with_viz
          * base_mesh
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * init
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * forward
            * **60km**
            * **90km**
            * **120km**
            * **150km**
            * **180km**
            * **210km**
            * **240km**
          * map
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * viz
            * 60km
            * 90km
            * 120km
            * 150km
            * 180km
            * 210km
            * 240km
          * **analysis**

Thus, a structure similar to what we had before shared steps is maintained locally, which should make debugging easier.

### Implementation: The output of shared steps may be used by multiple tasks.

Task steps that use the output of shared steps will make use of symbolic links as before.

### Implementation: tasks do not rely on outputs from steps in other tasks

No polaris tasks relied on outputs from other tasks even before the implementation of shared steps. There are tasks in Compass, though, such as global ocean `mesh`, `init` and `dynamic_adjustment`, that do allow outputs from one task to be inputs of another. As these are ported to Polaris, we will make sure they use shared steps instead.

## Testing

### Testing And Validation: Shared steps are run once.

Output from running a series of tasks or a suite indicates when shared steps are skipped because they already ran (`already completed`):

```
ocean/spherical/icos/cosine_bell
  * step: icos_base_mesh_60km
          execution:        SUCCESS
          runtime:          0:01:00
  * step: icos_init_60km
          execution:        SUCCESS
          runtime:          0:00:00
  * step: icos_forward_60km
          execution:        SUCCESS
          runtime:          0:00:38
...
  * step: analysis
          execution:        SUCCESS
          runtime:          0:00:02
  task execution:   SUCCESS
  task runtime:     0:02:59
ocean/spherical/icos/cosine_bell/with_viz
  * step: icos_base_mesh_60km
          already completed
  * step: icos_init_60km
          already completed
  * step: icos_forward_60km
          already completed
  * step: icos_map_60km
          execution:        SUCCESS
          runtime:          0:00:20
  * step: icos_viz_60km
          execution:        SUCCESS
          runtime:          0:00:06
...
  * step: analysis
          already completed
  task execution:   SUCCESS
  task runtime:     0:03:23
```

### Testing And Validation: Shared steps are run before steps that depend on their output.

As before, steps are added to tasks in the order they are to be run, ensuring that shared steps run before steps that require their output when running in task-serial mode (`polaris serial`). Task parallelism already has mechanisms to prevent steps from running before their dependencies are available, and this is not expected to be affected by shared steps. However, no testing with task parallelism will be performed at this time.

### Testing And Validation: Shared steps are not daughters of a task

Steps run successfully even after we have removed the `task` attribute from them, indicating that they no longer rely on information about a task they formerly belonged to.

### Testing And Validation: Working directory structure is intuitive.

The intuitive work structure will need to be maintained by developers as new tasks and steps are added, as this is not enforced by the framework.
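
Since the framework does not enforce this layout, a developer could check it by hand. The following is a minimal sketch, not part of polaris: the helpers `is_at_or_below()` and `misplaced_tasks()` are hypothetical names, and the "level" that holds a shared step is passed in explicitly rather than derived from the step. It only tests the requirement that every task using a shared step runs at or below the level where that step is placed.

```python
from pathlib import PurePosixPath


def is_at_or_below(task_path, level):
    """Return True if task_path equals level or lies beneath it."""
    task_parts = PurePosixPath(task_path).parts
    level_parts = PurePosixPath(level).parts
    # A task is at or below a level when the level's path components
    # form a prefix of the task's path components.
    return task_parts[:len(level_parts)] == level_parts


def misplaced_tasks(level, task_paths):
    """List the tasks that sit outside the level holding a shared step."""
    return [path for path in task_paths if not is_at_or_below(path, level)]


# Work-directory paths taken from the cosine_bell example in this document
tasks = [
    'ocean/spherical/qu/cosine_bell',
    'ocean/spherical/qu/cosine_bell/with_viz',
]

# The shared init and forward steps live under ocean/spherical/qu/cosine_bell,
# so both tasks are at or below that level
print(misplaced_tasks('ocean/spherical/qu/cosine_bell', tasks))

# Placing a shared step under with_viz would leave the plain cosine_bell
# task above it, which the requirement rules out
print(misplaced_tasks('ocean/spherical/qu/cosine_bell/with_viz', tasks))
```

An empty list from `misplaced_tasks()` means the chosen level satisfies the requirement for those tasks. Any path it returns belongs to a task that would have to reach outside its own part of the tree to find the shared step.
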
The proposed implementation ensures that shared steps either reside closer to the root of the directory structure than the tasks that use them or live inside those tasks, which we have deemed an intuitive structure.

### Testing And Validation: Working directory step paths are easily discoverable by users.

Between `polaris list --verbose` and the local symlinks to shared steps within each task, we think the shared steps will be discoverable by users and developers.

### Testing And Validation: The output of shared steps may be used by multiple tasks.

We have implemented shared steps for base meshes, initial conditions and forward runs, and shown that multiple tasks can make use of their output.

### Testing And Validation: tasks do not rely on outputs from steps in other tasks

This is not enforced; it will simply need to be maintained as the preferred convention for future development. Currently, all tasks can be run independently and do not rely on any other tasks.
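
To tie the pieces together, the setup-once bookkeeping described under "Implementation: Shared steps are set up once" can be sketched as follows. This is an illustration under stated assumptions, not the polaris source: the `Step` and `Task` classes are simplified stand-ins, `setup_count` exists only to make the effect visible, and only the `setup_complete` attribute and the `setup_task()` name come from this document.

```python
class Step:
    """Simplified, hypothetical stand-in for the polaris Step class."""

    def __init__(self, name):
        self.name = name
        # Initialized to False, as described in this document
        self.setup_complete = False
        # Illustration only: counts how often setup() actually ran
        self.setup_count = 0

    def setup(self):
        self.setup_count += 1


class Task:
    """Simplified, hypothetical stand-in holding an ordered dict of steps."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = {step.name: step for step in steps}


def setup_task(task):
    """Set up each step in a task, skipping steps already set up."""
    for step in task.steps.values():
        if step.setup_complete:
            # A shared step that was set up by an earlier task
            continue
        step.setup()
        step.setup_complete = True


# One step object is shared by two tasks
base_mesh = Step('qu_base_mesh_60km')
viz = Step('qu_viz_60km')
cosine_bell = Task('cosine_bell', [base_mesh])
with_viz = Task('cosine_bell/with_viz', [base_mesh, viz])

for task in (cosine_bell, with_viz):
    setup_task(task)
```

After the loop, `base_mesh.setup_count` is 1 even though two tasks include the step, because the second call to `setup_task()` finds `setup_complete` already set. The sketch depends on both tasks holding the same step object; two separate objects pointing at the same directory would each be set up.
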