Overlapping Groups
This post is motivated by the problem of computing the Meridional Overturning Circulation (MOC). One of the steps is a binned average over latitude within regions of the World Ocean. Commonly we want to average globally as well as over the Atlantic and the Indo-Pacific basins. Group-by problems generally involve non-overlapping groups, but here the “global” group overlaps with both the “Indo-Pacific” and “Atlantic” groups. Below we consider a simplified version of this problem.
Consider the following labels:
import numpy as np
import xarray as xr
from flox.xarray import xarray_reduce
labels = xr.DataArray(
[1, 2, 3, 1, 2, 3, 0, 0, 0],
dims="x",
name="label",
)
labels
<xarray.DataArray 'label' (x: 9)> Size: 72B
array([1, 2, 3, 1, 2, 3, 0, 0, 0])
Dimensions without coordinates: x
These labels are non-overlapping. When we reduce the following array of ones over these labels along x
da = xr.ones_like(labels)
da
<xarray.DataArray 'label' (x: 9)> Size: 72B
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
Dimensions without coordinates: x
we get (note the reduction over x is implicit here):
xarray_reduce(da, labels, func="sum")
<xarray.DataArray 'label' (label: 4)> Size: 32B
array([3, 2, 2, 2])
Coordinates:
  * label    (label) int64 32B 0 1 2 3
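As a cross-check, the same grouped sum can be reproduced in plain NumPy. This is a minimal sketch for illustration, not how flox computes it: `np.bincount` with `weights` adds up the data values that share each non-negative integer label.

```python
import numpy as np

labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
data = np.ones_like(labels)

# One bin per integer label 0..max(labels); each bin sums its weights
grouped = np.bincount(labels, weights=data)
print(grouped)  # [3. 2. 2. 2.]
```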
Now let’s also calculate the sum over the points where the label is either 1 or 2.
We could easily compute this from the grouped result, but here we use this simple example for illustration.
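Because labels 1 and 2 are disjoint groups, their combined sum is simply the sum of the two per-label sums. A plain NumPy sketch of that shortcut, for illustration only:

```python
import numpy as np

labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
data = np.ones_like(labels)

grouped = np.bincount(labels, weights=data)  # one sum per label 0..3
sum_1_or_2 = grouped[1] + grouped[2]          # disjoint groups, so the sums add
print(sum_1_or_2)  # 4.0
```

The shortcut only works when the new group is a union of existing groups; a region that cuts across the original labels needs the approach described next.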
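The trick is to add a new dimension with new labels (here 4) at the locations we want to include, and a placeholder label (here -1) everywhere else.
# new label 4 where label is 1 or 2, placeholder -1 elsewhere
newlabels = xr.where(labels.isin([1, 2]), 4, -1)
# concatenate along a new dimension y, which is absent on da
expanded = xr.concat([labels, newlabels], dim="y")
expanded
<xarray.DataArray 'label' (y: 2, x: 9)> Size: 144B
array([[ 1,  2,  3,  1,  2,  3,  0,  0,  0],
       [ 4,  4, -1,  4,  4, -1, -1, -1, -1]])
Dimensions without coordinates: y, x
Now we reduce over x and the new dimension y (again implicitly) to get the appropriate sum under label=4 (and label=-1). We can discard the value accumulated under label=-1 later.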
xarray_reduce(da, expanded, func="sum")
<xarray.DataArray 'label' (label: 6)> Size: 48B
array([5, 3, 2, 2, 2, 4])
Coordinates:
  * label    (label) int64 48B -1 0 1 2 3 4
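The flox result above can be reproduced with a plain NumPy sketch of the same idea. This is an illustration under our own assumptions, not flox's implementation: `np.bincount` only accepts non-negative integers, so every label is shifted up by one before counting (sending the placeholder -1 to bin 0), and the data is broadcast along the new leading axis just as `da` is broadcast along `y`.

```python
import numpy as np

labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
data = np.ones_like(labels)

# Second row: new label 4 where labels is 1 or 2, placeholder -1 elsewhere
newlabels = np.where(np.isin(labels, [1, 2]), 4, -1)
expanded = np.stack([labels, newlabels])  # shape (2, 9); new axis plays the role of y

# Broadcast the data along the new axis so every label row sees every value
weights = np.broadcast_to(data, expanded.shape)

# Shift labels by +1 so -1 lands in bin 0, then reduce both axes at once
sums = np.bincount(expanded.ravel() + 1, weights=weights.ravel())
groups = np.arange(sums.size) - 1  # undo the shift: -1, 0, 1, 2, 3, 4
print(dict(zip(groups.tolist(), sums.tolist())))
# {-1: 5.0, 0: 3.0, 1: 2.0, 2: 2.0, 3: 2.0, 4: 4.0}

# Discard the placeholder group, keeping sums for labels 0..4
result = sums[1:]
```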
This way we compute all the reductions we need in a single pass over the data.
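To see why one pass suffices, the NumPy sketch below (again an illustration, not flox code) compares a single `bincount` over the stacked labels with two separate passes, one per label row.

```python
import numpy as np

labels = np.array([1, 2, 3, 1, 2, 3, 0, 0, 0])
data = np.ones_like(labels)
newlabels = np.where(np.isin(labels, [1, 2]), 4, -1)

# Two passes: one reduction per label row (shift by +1 so -1 is allowed)
pass1 = np.bincount(labels + 1, weights=data, minlength=6)
pass2 = np.bincount(newlabels + 1, weights=data, minlength=6)

# One pass: stack the rows, broadcast the data, reduce everything at once
expanded = np.stack([labels, newlabels])
one_pass = np.bincount(
    expanded.ravel() + 1,
    weights=np.broadcast_to(data, expanded.shape).ravel(),
    minlength=6,
)

print(np.array_equal(one_pass, pass1 + pass2))  # True
```

The per-bin totals line up only because the two rows use disjoint label values (0 to 3 in the first row, 4 and -1 in the second); reusing an existing label such as 1 in the new row would silently merge the two groups.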
This technique generalizes to more complicated aggregations. The trick is to:
1. generate appropriate labels;
2. concatenate these new labels along a new dimension (y) that is absent on the object being reduced (da); and
3. reduce over that new dimension in addition to any others.
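The three steps can be sketched in NumPy for the motivating case: a latitude-binned average over overlapping regions (global, Atlantic, Indo-Pacific). Everything here is an assumption for illustration: the helper `overlapping_group_mean` is hypothetical and not part of flox, the data is a toy example, and each (region, latitude bin) pair is encoded as `region_index * nbins + bin`, with -1 as the label to discard.

```python
import numpy as np


def overlapping_group_mean(data, bin_index, masks, nbins):
    """Binned mean of `data` for several possibly overlapping regions.

    Hypothetical helper for illustration. `bin_index` holds an integer
    latitude bin (0..nbins-1) per point; `masks` maps region name to a
    boolean array marking the points in that region.
    """
    names = list(masks)
    rows = []
    for i, name in enumerate(names):
        # Step 1: a distinct label per (region, bin); -1 outside the region
        rows.append(np.where(masks[name], i * nbins + bin_index, -1))
    # Step 2: stack the label rows along a new leading axis absent on `data`
    expanded = np.stack(rows)
    weights = np.broadcast_to(data, expanded.shape).ravel()
    # Step 3: reduce over the new axis and the original axis together
    flat = expanded.ravel() + 1  # shift so the placeholder -1 goes to bin 0
    size = len(names) * nbins + 1
    sums = np.bincount(flat, weights=weights, minlength=size)[1:]
    counts = np.bincount(flat, minlength=size)[1:]
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts  # NaN for empty (region, bin) pairs
    return {name: means[i * nbins:(i + 1) * nbins] for i, name in enumerate(names)}


# Toy data: six points in three latitude bins, split between two basins
lat_bin = np.array([0, 0, 1, 1, 2, 2])
basin = np.array(["atl", "pac", "atl", "pac", "atl", "pac"])
values = np.array([1.0, 3.0, 2.0, 4.0, 5.0, 7.0])
masks = {
    "global": np.ones(6, dtype=bool),
    "atlantic": basin == "atl",
    "indo_pacific": basin == "pac",
}

result = overlapping_group_mean(values, lat_bin, masks, nbins=3)
for name, means in result.items():
    print(name, means)
# global [2. 3. 6.]
# atlantic [1. 2. 5.]
# indo_pacific [3. 4. 7.]
```

Encoding the region into the label keeps the groups of different rows disjoint, so a point in both "global" and "atlantic" contributes once to each, which is exactly the overlap a plain group-by cannot express.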