(dev-adding-new-machine)=

# Adding a New Machine to Mache

Adding an E3SM-known machine to mache requires adding a new config file, as well
as updating the list of machines in `discover.py`.

:::{note}
Only machines that are included in mache's
[machine config list](https://github.com/E3SM-Project/mache/blob/main/mache/cime_machine_config/config_machines.xml)
can be added to mache. This list is a *copy* of the
[E3SM cime machine config list](https://github.com/E3SM-Project/E3SM/blob/master/cime_config/machines/config_machines.xml)
which we try to keep up-to-date. If you wish to add a machine that is not
included in this list, you must contact the E3SM-Project developers to add your
machine.
:::

(dev-new-config-file)=

## Adding a new config file

Adding a new config file is usually straightforward if you follow the format of
an existing config file.

For machines with a known E3SM inputdata location, also add an `[inputdata]`
section with:

- `base_path`: base directory for the shared E3SM inputdata tree

When the machine also appears in
`mache/cime_machine_config/config_machines.xml`, this value should match that
machine's `DIN_LOC_ROOT` entry.

### Parallel execution settings

Machine config files now include parallel-resource settings that are consumed by
`mache.parallel`.
At minimum, each machine should define a `[parallel]` section with:

- `system`: one of `slurm`, `pbs`, `single_node`, or `login`
- `parallel_executable`: launcher command (for example, `srun --label` or
  `mpiexec --label`)

Depending on the parallel system, the following options are typically required:

- `cores_per_node`
- `gpus_per_node` (if GPUs are available)
- `max_mpi_tasks_per_node`
- `cpus_per_task_flag` (primarily for PBS launchers)
- `cpu_bind`, `gpu_bind`, `mem_bind`, `placement` (optional launcher tuning)
- `login_cores`, `login_gpus` (for the `login` system)

For machines with hyperthreading, mache's convention is that `cores_per_node`
should normally be the physical-core count, not the hardware-thread count.
Likewise, `max_mpi_tasks_per_node` should normally reflect the default
non-hyperthreaded layout used by E3SM and most downstream software, and
`cpu_bind = cores` is the preferred default when supported by the launcher.

Downstream projects that intentionally want hyperthreading can override these
settings in their own config to use hardware-thread counts and thread binding.
In other words, these config fields are the hyperthreading controls rather than
a dedicated boolean option.

Compiler-specific overrides can be provided in optional `[parallel.]` sections,
e.g. `[parallel.gnu]`.
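
As a rough illustration of the settings described above, the sketch below parses a made-up config fragment with Python's standard `configparser` and layers a compiler-specific section over the base `[parallel]` section. The values, the `EXAMPLE_CFG` string, and the `read_parallel_options()` helper are all hypothetical, and the merge order is an assumption about how such overrides would naturally apply. It is not a description of how `mache.parallel` loads its config.

```python
import configparser

# Hypothetical config fragment: the option names come from the text above,
# but the values are illustrative and do not describe a real machine.
EXAMPLE_CFG = """
[parallel]
system = slurm
parallel_executable = srun --label
cores_per_node = 128
max_mpi_tasks_per_node = 128
cpu_bind = cores

[parallel.gnu]
max_mpi_tasks_per_node = 64
"""


def read_parallel_options(cfg_text, compiler=None):
    """Return the [parallel] options as a dict of strings.

    If a compiler is given and a matching compiler-specific section exists,
    its values replace the base values (an assumed merge order).
    """
    config = configparser.RawConfigParser()
    config.read_string(cfg_text)

    options = dict(config['parallel'])
    override_section = f'parallel.{compiler}'
    if compiler is not None and config.has_section(override_section):
        options.update(config[override_section])
    return options


base = read_parallel_options(EXAMPLE_CFG)
gnu = read_parallel_options(EXAMPLE_CFG, compiler='gnu')
print(base['max_mpi_tasks_per_node'], gnu['max_mpi_tasks_per_node'])
```

In this sketch, options that the compiler-specific section does not mention (such as `system` or `cpu_bind`) fall through unchanged from `[parallel]`.
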

For machines with scheduler-target policy limits, you can also define optional
sections for queue- or partition-based schedulers:

- `[queue.]` sections corresponding to entries in `parallel.queues`
- `[partition.]` sections corresponding to entries in `parallel.partitions`
- `[qos.]` sections corresponding to entries in `parallel.qos`

Supported keys are:

- `min_nodes`: minimum node count for this scheduler target
- `max_nodes`: maximum node count for this scheduler target (leave unset for no
  upper bound)
- `max_wallclock`: maximum allowed wall-clock time (for example, `01:00:00`)

Downstream software can query these values with
`MachineInfo.get_queue_specs()`, `MachineInfo.get_partition_specs()`,
`MachineInfo.get_qos_specs()` or `MachineInfo.get_scheduler_specs()`.

These options are used to:

- detect available resources on the current allocation,
- construct launcher commands via `mache.parallel`, and
- enforce machine-specific limits like max MPI tasks per node.

(dev-discover-new-machine)=

## Adding the new machine to `discover.py`

You will need to amend the list of machine names in `discover.py` so that mache
can identify the new machine via its hostname. This is typically done with a
regular expression, which works whenever the machine's hostnames follow a
standardized format. For example, we can identify known machines from hostnames
with the following regular expressions:

```python
r'^chr-\d{4}'    # Chrysalis compute nodes with hostnames chr-0000 to chr-9999
r'^compy'        # Compy nodes with hostnames starting with compy
r'^dane\d{1,4}'  # Dane nodes with hostnames dane0 to dane9999
```

In some cases, the hostname assigned to a machine is too generic to
differentiate it from other machines. In these cases, we must identify the
machine by its environment variables. However, this is *not* the recommended
procedure and should only be done as a last resort.
For example, we identify `frontier` by its `LMOD_SYSTEM_NAME` environment
variable:

```python
if machine is None and 'LMOD_SYSTEM_NAME' in os.environ:
    hostname = os.environ['LMOD_SYSTEM_NAME']
    if hostname == 'frontier':
        # frontier's hostname is too generic to detect, so relying on
        # LMOD_SYSTEM_NAME
        machine = 'frontier'
```

:::{note}
Identifying the machine by environment variables is **not recommended** unless
absolutely necessary.
:::
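
To show how the two detection strategies fit together, here is a minimal, self-contained sketch that tries hostname patterns first and falls back to `LMOD_SYSTEM_NAME` only when nothing matches. The `HOSTNAME_PATTERNS` dictionary and the `guess_machine()` function are hypothetical names chosen for illustration. The patterns and the `frontier` check are copied from the examples above, but the actual layout of `discover.py` may differ.

```python
import os
import re
import socket

# Illustrative mapping from machine name to hostname pattern; the patterns
# are the ones quoted in the text, the dictionary itself is hypothetical.
HOSTNAME_PATTERNS = {
    'chrysalis': r'^chr-\d{4}',
    'compy': r'^compy',
    'dane': r'^dane\d{1,4}',
}


def guess_machine(hostname=None, environ=None):
    """Return a machine name, or None if the machine is not recognized."""
    if hostname is None:
        hostname = socket.gethostname()
    if environ is None:
        environ = os.environ

    # Preferred route: match the hostname against known patterns.
    for machine, pattern in HOSTNAME_PATTERNS.items():
        if re.match(pattern, hostname):
            return machine

    # Last resort: the hostname is too generic, so check the environment.
    if environ.get('LMOD_SYSTEM_NAME') == 'frontier':
        return 'frontier'

    return None
```

Passing `hostname` and `environ` explicitly, as in `guess_machine('chr-0042', {})`, makes it possible to check a new pattern without being logged in to the machine in question.
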