Calling python code from EAMxx atmosphere processes
Requirements
In order to call python code from an EAMxx atmosphere process,
EAMxx must be build with the CMake option EAMXX_ENABLE_PYTHON=ON
,
and the CMake variable Python_EXECUTABLE
must point to a python3
executable, with python version >= 3.9. Additionally, the python package
pybind11
must be installed (e.g., via pip or conda).
If EAMXX_ENABLE_PYTHON=OFF
, none of the code that is needed to call
python from EAMxx will be compiled.
Usage
If python support is enabled, every atmosphere process stores
data structures that can hold python-compatible arrays and modules.
During construction, if the input parameter list contains a non-trivial
entry for the key py_module_name
, EAMxx will automatically set up
these data structures. In particular, EAMxx will
- create python-compatible arrays for each of the input/output/internal
fields that are registered in the class. These are stored in two maps:
m_py_fields_dev
andm_py_fields_host
, which store python-compatible arrays for the device and host views of the Field, respectively. The maps are in fact nested maps, so that the python-compatible array for field X on grid Y can be retrieved viam_py_fields_host[Y][X]
. - load the python module provided via parameter list, so that its interfaces
can later be called during init/run phases. The module is then stored in
the local member
m_py_module
. If the module is in a non-standard path, the parameter list entrypy_module_path
can be used to specify its path, which will be added to python's search path before loading the module.
Due to implementation details in the pybind11 library, and to avoid compiler warnings,
all the python-compatible data structures are stored wrapped inside std::any
objects.
As such, the need to be properly casted to the correct underlying type before being used.
In particular, the fields and module can be casted as follows:
auto& f = std::any_cast<pybind11::array&>(m_py_fields_host[grid_name][fname]);
auto& pymod = std::any_cast<pybind11::module&>(m_py_module);
Once the module is available, a function from it can be called using the attr
method of
the module. For instance, if the module had a function run
that takes 2 arrays and a double
(in this order), it can be invoked via
pymod.attr("run")(f1,f2,my_double);
where f1
and f2
are of type pybind11::array
(e.g., casted from objects in m_py_fields_host
).
Example
We provided an example of how to use this feature in eamxx_cld_fraction_process_interface.cpp
,
which is a very small and simple atmosphere process. We paste here the code, which shows how
to support both C++ and python implemenation in the same cpp file
#ifdef EAMXX_HAS_PYTHON
if (m_py_module.has_value()) {
// For now, we run Python code only on CPU
const auto& py_fields = m_py_fields_host.at(m_grid->name());
const auto& py_qi = std::any_cast<const py::array&>(py_fields.at("qi"));
const auto& py_liq_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_liq"));
const auto& py_ice_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_ice"));
const auto& py_tot_cld_frac = std::any_cast<const py::array&>(py_fields.at("cldfrac_tot"));
const auto& py_ice_cld_frac_4out = std::any_cast<const py::array&>(py_fields.at("cldfrac_ice_for_analysis"));
const auto& py_tot_cld_frac_4out = std::any_cast<const py::array&>(py_fields.at("cldfrac_tot_for_analysis"));
// Sync input to host
liq_cld_frac.sync_to_host();
const auto& py_module = std::any_cast<const py::module&>(m_py_module);
double ice_threshold = m_params.get<double>("ice_cloud_threshold");
double ice_4out_threshold = m_params.get<double>("ice_cloud_for_analysis_threshold");
py_module.attr("main")(ice_threshold,ice_4out_threshold,py_qi,py_liq_cld_frac,py_ice_cld_frac,py_tot_cld_frac,py_ice_cld_frac_4out,py_tot_cld_frac_4out);
// Sync outputs to dev
qi.sync_to_dev();
liq_cld_frac.sync_to_dev();
ice_cld_frac.sync_to_dev();
tot_cld_frac.sync_to_dev();
ice_cld_frac_4out.sync_to_dev();
tot_cld_frac_4out.sync_to_dev();
} else
#endif
{
auto qi_v = qi.get_view<const Pack**>();
auto liq_cld_frac_v = liq_cld_frac.get_view<const Pack**>();
auto ice_cld_frac_v = ice_cld_frac.get_view<Pack**>();
auto tot_cld_frac_v = tot_cld_frac.get_view<Pack**>();
auto ice_cld_frac_4out_v = ice_cld_frac_4out.get_view<Pack**>();
auto tot_cld_frac_4out_v = tot_cld_frac_4out.get_view<Pack**>();
CldFractionFunc::main(m_num_cols,m_num_levs,m_icecloud_threshold,m_icecloud_for_analysis_threshold,
qi_v,liq_cld_frac_v,ice_cld_frac_v,tot_cld_frac_v,ice_cld_frac_4out_v,tot_cld_frac_4out_v);
}
A few observations:
m_py_module.has_value()
is a good way to check if thestd::any
object is storing anything or it's empty. If empty, it means that the user did not specify thepy_module_name
input option. In this case, we interpret this as "proceed with the C++ implementation", but of course, another process may only offer a python implementation, in which case it would make sense to error out if the check fails.- the namespace alias
py = pybind11
was used in this implementation. This is a common (and sometimes recommended) practice. - when casting to pybind11 data structures, we used const references, for both inputs and outputs. The reason
for using a reference is to avoid copy construction of pybind11 structures (even though they are usually
lightweight). The const qualifier is not really important, as python has no corresponding concept, and the
code would have been perfectly fine (and working the same way) without
const
. - when passing host arrays to python, keep in mind that EAMxx only requires that device views be kept up to date by atmosphere processes. Hence, you must take care of syncing to host all inputs before calling the python interfaces, as well as syncing to device the outputs upon return.
The python implemenetation of the CldFraction process is provided in cld_fraction.py
, in the same folder
as the process interface.