Overlapping Groups¶
This post is motivated by the problem of computing the Meridional Overturning Circulation. One of the steps is a binned average over latitude, over regions of the World Ocean. Commonly we want to average globally, as well as over the Atlantic, and the Indo-Pacific. Generally group-by problems involve non-overlapping groups. In this example, the “global” group overlaps with the “Indo-Pacific” and “Atlantic” groups. Below we consider a simplified version of this problem.
Consider the following labels:
import numpy as np
import xarray as xr
from flox.xarray import xarray_reduce
labels = xr.DataArray(
[1, 2, 3, 1, 2, 3, 0, 0, 0],
dims="x",
name="label",
)
labels
<xarray.DataArray 'label' (x: 9)> Size: 72B array([1, 2, 3, 1, 2, 3, 0, 0, 0]) Dimensions without coordinates: x
These labels are non-overlapping. So when we reduce this data array over those labels along x
da = xr.ones_like(labels)
da
<xarray.DataArray 'label' (x: 9)> Size: 72B array([1, 1, 1, 1, 1, 1, 1, 1, 1]) Dimensions without coordinates: x
we get (note the reduction over x
is implicit here):
xarray_reduce(da, labels, func="sum")
<xarray.DataArray 'label' (label: 4)> Size: 32B array([3, 2, 2, 2]) Coordinates: * label (label) int64 32B 0 1 2 3
Now let’s also calculate the sum
where labels
is either 1
or 2
.
We could easily compute this using the grouped result but here we use this simple example for illustration.
The trick is to add a new dimension with new labels (here 4
) in the appropriate locations.
<xarray.DataArray 'label' (y: 2, x: 9)> Size: 144B array([[ 1, 2, 3, 1, 2, 3, 0, 0, 0], [ 4, 4, -1, 4, 4, -1, -1, -1, -1]]) Dimensions without coordinates: y, x
Now we reduce over x
and the new dimension y
(again implicitly) to get the appropriate sum under
label=4
(and label=-1
). We can discard the value accumulated under label=-1
later.
xarray_reduce(da, expanded, func="sum")
<xarray.DataArray 'label' (label: 6)> Size: 48B array([5, 3, 2, 2, 2, 4]) Coordinates: * label (label) int64 48B -1 0 1 2 3 4
This way we compute all the reductions we need, in a single pass over the data.
This technique generalizes to more complicated aggregations. The trick is to
generate appropriate labels
concatenate these new labels along a new dimension (
y
) absent on the object being reduced (da
), andreduce over that new dimension in addition to any others.