Custom Aggregations

This notebook is motivated by a post on the Pangeo discourse forum.

The post says: "Even better would be a command that lets me simply do the following."
A = da.groupby(['lon_bins', 'lat_bins']).mode()

This notebook will describe how to accomplish this using a custom Aggregation.

Tip

flox now supports mode, nanmode, quantile, nanquantile, median, and nanmedian, using exactly the same approach shown below.

import numpy as np
import numpy_groupies as npg
import xarray as xr

import flox.xarray
from flox import Aggregation
from flox.aggregations import mean

# define latitude and longitude bins
binsize = 1.0  # 1°x1° bins
lon_min, lon_max, lat_min, lat_max = [-180, 180, -65, 65]
lon_bins = np.arange(lon_min, lon_max + binsize, binsize)  # bin edges, including lon_max
lat_bins = np.arange(lat_min, lat_max + binsize, binsize)  # bin edges, including lat_max
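The bin arrays are bin edges, and N edges delimit N - 1 bins. Because np.arange excludes its stop value, np.arange(lon_min, lon_max, binsize) ends at 179, so points with longitudes between 179 and 180 fall outside every bin; extending the stop by one binsize makes 180 the last edge. The short numpy check below illustrates this for the longitude grid, reusing the values defined above.

```python
import numpy as np

binsize = 1.0
lon_min, lon_max = -180, 180

# np.arange stops before its stop value, so these edges end at 179
edges_excl = np.arange(lon_min, lon_max, binsize)
# extending the stop by one binsize makes 180 the last edge
edges_incl = np.arange(lon_min, lon_max + binsize, binsize)

# N edges delimit N - 1 bins
n_bins_excl = len(edges_excl) - 1
n_bins_incl = len(edges_incl) - 1

# a point near the eastern edge of the globe
lon = 179.5
print(edges_excl[0] <= lon <= edges_excl[-1])  # False: outside every bin
print(edges_incl[0] <= lon <= edges_incl[-1])  # True: inside the last bin
```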

size = 28397


da = xr.DataArray(
    np.random.randint(0, 7, size=size),
    dims="profile",
    coords={
        "lat": (
            "profile",
            (np.random.random(size) - 0.5) * (lat_max - lat_min),
        ),
        "lon": (
            "profile",
            (np.random.random(size) - 0.5) * (lon_max - lon_min),
        ),
    },
    name="label",
)
da

<xarray.DataArray 'label' (profile: 28397)> Size: 227kB
array([2, 5, 2, ..., 2, 6, 3])
Coordinates:
    lat      (profile) float64 227kB 31.11 -1.784 32.01 ... 4.594 -58.08 -60.65
    lon      (profile) float64 227kB 65.21 -126.7 -168.1 ... 172.5 134.5 80.12
Dimensions without coordinates: profile

A built-in reduction

First, a simple example of lat-lon binning with a built-in reduction, mean:

binned_mean = flox.xarray.xarray_reduce(
    da,
    da.lat,
    da.lon,
    func="mean",  # built-in
    expected_groups=(lat_bins, lon_bins),
    isbin=(True, True),
)
binned_mean.plot()

<matplotlib.collections.QuadMesh at 0x7fae1a9a7fe0>

[Figure: binned_mean plotted as a latitude-longitude map]
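Conceptually, binning by two coordinates and taking the mean amounts to assigning each point a (lat bin, lon bin) pair and averaging the values that share a pair. The numpy-only sketch below spells this out on a tiny hypothetical dataset. It is not how flox is implemented, and it assumes bins closed on the right, like the default of pandas.cut.

```python
import numpy as np

# Hypothetical toy data: six values with latitude/longitude coordinates
values = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 2.0])
lat = np.array([-0.5, -0.2, 0.3, 0.7, 0.4, -0.9])
lon = np.array([0.1, 0.9, 0.5, 1.5, 1.2, 0.8])
lat_edges = np.array([-1.0, 0.0, 1.0])  # 3 edges -> 2 latitude bins
lon_edges = np.array([0.0, 1.0, 2.0])  # 3 edges -> 2 longitude bins
nlat, nlon = len(lat_edges) - 1, len(lon_edges) - 1

# 0-based bin index along each axis; right=True means edges[i-1] < x <= edges[i]
ilat = np.digitize(lat, lat_edges, right=True) - 1
ilon = np.digitize(lon, lon_edges, right=True) - 1

# drop points that fall outside the grid
valid = (ilat >= 0) & (ilat < nlat) & (ilon >= 0) & (ilon < nlon)

# one flat group label per (lat bin, lon bin) pair
flat = ilat[valid] * nlon + ilon[valid]
sums = np.bincount(flat, weights=values[valid], minlength=nlat * nlon)
counts = np.bincount(flat, minlength=nlat * nlon)

# mean per cell; cells with no points stay NaN
binned = np.full(nlat * nlon, np.nan)
np.divide(sums, counts, out=binned, where=counts > 0)
binned = binned.reshape(nlat, nlon)
print(binned)
```

The cell at (lat bin 0, lon bin 1) receives no points, so its count is zero and it comes out as NaN.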

Aggregations

flox knows how to interpret func="mean" because it is implemented as an Aggregation in aggregations.py.

An Aggregation is a blueprint for computing a grouped reduction on both numpy and dask arrays.

print(type(mean))
mean

<class 'flox.aggregations.Aggregation'>

'mean', fill: dict_values([<NA>, (0, 0)]), dtype: None
chunk: ('sum', 'nanlen')
combine: ('sum', 'sum')
finalize: <function _mean_finalize at 0x7fae1beb4180>
min_count: 0

Here is, in essence, how the mean Aggregation is created:

mean = Aggregation(
    name="mean",

    # Strings in the following are built-in grouped reductions
    # implemented by the underlying "engine": flox, numpy_groupies, or numbagg.

    # for pure numpy inputs
    numpy="mean",

    # The next arguments are for dask inputs and describe how to reduce
    # the data in parallel.
    chunk=("sum", "nanlen"),  # first compute these blockwise: (grouped sum, grouped count of non-NaN values)
    combine=("sum", "sum"),  # reduce intermediate results (sum the sums, sum the counts)
    finalize=lambda sum_, count: sum_ / count,  # final mean value (divide sum by count)

    fill_value=(0, 0),  # fill value for intermediate sums and counts when groups have no members
    dtypes=(None, np.intp),  # optional dtypes for the intermediates
    final_dtype=np.floating,  # final dtype for the output
)
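The chunk, combine, and finalize steps form a map-reduce. The plain numpy sketch below imitates them on a toy array split into two chunks, to show why (sum, count) intermediates are enough to recover the exact mean. The helper names chunk_step, combine_step, and finalize_step are hypothetical and not part of flox.

```python
import numpy as np

# Hypothetical toy input: 8 values in 3 groups, split into 2 chunks
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
group_idx = np.array([0, 1, 0, 2, 1, 1, 0, 0])
ngroups = 3
chunks = [slice(0, 4), slice(4, 8)]


def chunk_step(vals, groups):
    """Blockwise intermediates, mirroring chunk=("sum", "nanlen")."""
    sums = np.bincount(groups, weights=vals, minlength=ngroups)
    valid = ~np.isnan(vals)
    counts = np.bincount(groups[valid], minlength=ngroups)
    # groups absent from this chunk get sum 0 and count 0, like fill_value=(0, 0)
    return sums, counts


def combine_step(intermediates):
    """Reduce intermediates across chunks, mirroring combine=("sum", "sum")."""
    sums = sum(s for s, _ in intermediates)
    counts = sum(c for _, c in intermediates)
    return sums, counts


def finalize_step(sums, counts):
    """Final mean, mirroring finalize=lambda sum_, count: sum_ / count."""
    return sums / counts


intermediates = [chunk_step(values[sl], group_idx[sl]) for sl in chunks]
result = finalize_step(*combine_step(intermediates))
print(result)  # group means: 4.75, 13/3, 4.0
```

Group 2 has no members in the second chunk, so its intermediates there are (0, 0), which is exactly the role of fill_value=(0, 0) in the mean Aggregation.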

Defining a custom aggregation
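Unlike the mean, the mode cannot be assembled from per-chunk modes alone, because the mode of partial modes is not in general the overall mode. The core of a custom mode aggregation is therefore a grouped reduction over all members of each group. Below is a minimal numpy sketch of such a reduction on an in-memory array, assuming integer labels like those in da and precomputed integer group indices; grouped_mode is a hypothetical helper, not flox's implementation, and fill_value=-1 is an assumed marker for empty groups.

```python
import numpy as np


def grouped_mode(group_idx, values, size, fill_value=-1):
    """Hypothetical helper: most frequent value per group (ties -> smallest value)."""
    out = np.full(size, fill_value, dtype=values.dtype)
    for g in np.unique(group_idx):
        # count how often each distinct value occurs within group g
        uniq, counts = np.unique(values[group_idx == g], return_counts=True)
        # np.unique returns sorted values and argmax picks the first maximum,
        # so ties resolve to the smallest value
        out[g] = uniq[np.argmax(counts)]
    return out


labels = np.array([2, 5, 2, 6, 6, 3, 6, 3])
groups = np.array([0, 0, 0, 1, 1, 1, 3, 3])
modes = grouped_mode(groups, labels, size=4)
print(modes)  # group modes: 2, 6, -1 (empty group 2), 3 (tie between 3 and 6)
```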