Engines
flox provides multiple options, selected with the engine kwarg, for computing the core GroupBy reduction on numpy or other non-dask array types.
engine="numpy" wraps numpy_groupies.aggregate_numpy. This uses indexing tricks and functions like np.bincount, or the ufunc.at methods (e.g. np.maximum.at), to provide reasonably performant aggregations.
engine="numba" wraps numpy_groupies.aggregate_numba. This uses numba kernels for the core aggregation.
engine="flox" uses the ufunc.reduceat method after first argsorting the array so that all group members occur sequentially. This was copied from a gist by Stephan Hoyer.
engine="numbagg" uses the reductions available in numbagg.grouped from the numbagg project.
See Duck Array Support for more details.
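The three numpy-level strategies named above can be sketched with plain numpy. This is only an illustration of the underlying techniques on made-up data, not the code that flox or numpy_groupies actually runs; the variable names are hypothetical.

```python
import numpy as np

# Toy data (assumed for illustration): one integer group label per value.
values = np.array([1.0, 5.0, 2.0, 7.0, 3.0, 4.0])
labels = np.array([0, 1, 0, 2, 1, 0])
ngroups = 3

# 1. np.bincount with weights: a grouped sum in a single call.
sum_bincount = np.bincount(labels, weights=values, minlength=ngroups)

# 2. ufunc.at: unbuffered in-place reduction, here a grouped maximum.
max_at = np.full(ngroups, -np.inf)
np.maximum.at(max_at, labels, values)

# 3. argsort + ufunc.reduceat: sort so each group's members are contiguous,
#    then reduce every contiguous run. This simple version assumes every
#    group has at least one member.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
sorted_values = values[order]
# Index where each group's run starts in the sorted array.
starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
sum_reduceat = np.add.reduceat(sorted_values, starts)

print(sum_bincount, max_at, sum_reduceat)
```

The bincount and reduceat paths give the same grouped sums; they differ in which numpy primitive the array type has to support.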
Tradeoffs
For the common case of reducing an nD array by a 1D array of group labels (e.g. groupby("time.month")), engine="numbagg" is almost always faster than the numpy_groupies-based engines, and engine="flox" can be faster.
The reason is that numpy_groupies converts all groupby problems to a 1D problem, which can involve some overhead.
It is possible to optimize this a bit in flox or numpy_groupies, but the work has not been done yet.
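To make the overhead concrete, here is one way an nD-by-1D reduction can be flattened into a 1D problem. It is a hypothetical sketch of the general idea, not the actual numpy_groupies or flox implementation: each (row, group) pair gets its own 1D label, which means an extra label array as large as the data must be built before the reduction runs.

```python
import numpy as np

# Toy data (assumed for illustration).
array = np.arange(12, dtype=float).reshape(2, 6)  # shape (rows, time)
labels = np.array([0, 1, 0, 2, 1, 0])             # shape (time,)
nrows, ngroups = array.shape[0], 3

# Give every (row, group) pair a unique 1D label: row * ngroups + group.
# Building and raveling this array is the extra work on top of the reduction.
offsets = np.arange(nrows)[:, np.newaxis] * ngroups
flat_labels = (offsets + labels).ravel()

# A single 1D grouped sum, then reshape back to (rows, groups).
flat_sums = np.bincount(
    flat_labels, weights=array.ravel(), minlength=nrows * ngroups
)
sums = flat_sums.reshape(nrows, ngroups)
print(sums)
```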
The advantage of engine="numpy" is that it tends to work for more array types, since it appears to be more common to implement np.bincount than np.add.reduceat.
Tip
One other potential engine we could add is datashader. Contributions and discussion are very welcome!