Overlapping Groups

This post is motivated by the problem of computing the Meridional Overturning Circulation. One of the steps is a binned average over latitude, over regions of the World Ocean. Commonly we want to average globally, as well as over the Atlantic, and the Indo-Pacific. Generally group-by problems involve non-overlapping groups. In this example, the “global” group overlaps with the “Indo-Pacific” and “Atlantic” groups. Below we consider a simplified version of this problem.

Consider the following labels:

import numpy as np
import xarray as xr

from flox.xarray import xarray_reduce

labels = xr.DataArray(
    [1, 2, 3, 1, 2, 3, 0, 0, 0],
    dims="x",
    name="label",
)
labels
<xarray.DataArray 'label' (x: 9)> Size: 72B
array([1, 2, 3, 1, 2, 3, 0, 0, 0])
Dimensions without coordinates: x

These labels are non-overlapping. So when we reduce this data array over those labels along x

<xarray.DataArray 'label' (x: 9)> Size: 72B
array([1, 1, 1, 1, 1, 1, 1, 1, 1])
Dimensions without coordinates: x

we get (note the reduction over x is implicit here):

xarray_reduce(da, labels, func="sum")
<xarray.DataArray 'label' (label: 4)> Size: 32B
array([3, 2, 2, 2])
Coordinates:
  * label    (label) int64 32B 0 1 2 3

Now let’s also calculate the sum where labels is either 1 or 2. We could easily compute this using the grouped result but here we use this simple example for illustration. The trick is to add a new dimension with new labels (here 4) in the appropriate locations.

# assign 4 where label == 1 or 2, and -1 otherwise
newlabels = xr.where(labels.isin([1, 2]), 4, -1)

# concatenate along a new dimension y;
# note y is not present on da
expanded = xr.concat([labels, newlabels], dim="y")
expanded
<xarray.DataArray 'label' (y: 2, x: 9)> Size: 144B
array([[ 1,  2,  3,  1,  2,  3,  0,  0,  0],
       [ 4,  4, -1,  4,  4, -1, -1, -1, -1]])
Dimensions without coordinates: y, x

Now we reduce over x and the new dimension y (again implicitly) to get the appropriate sum under label=4 (and label=-1). We can discard the value accumulated under label=-1 later.

xarray_reduce(da, expanded, func="sum")
<xarray.DataArray 'label' (label: 6)> Size: 48B
array([5, 3, 2, 2, 2, 4])
Coordinates:
  * label    (label) int64 48B -1 0 1 2 3 4

This way we compute all the reductions we need, in a single pass over the data.

This technique generalizes to more complicated aggregations. The trick is to

  • generate appropriate labels

  • concatenate these new labels along a new dimension (y) absent on the object being reduced (da), and

  • reduce over that new dimension in addition to any others.