flox.groupby_reduce
- flox.groupby_reduce(array, *by, func, expected_groups=None, sort=True, isbin=False, axis=None, fill_value=None, dtype=None, min_count=None, method=None, engine=None, reindex=None, finalize_kwargs=None)
GroupBy reductions using tree reductions for dask.array
- Parameters:
- array : ndarray or DaskArray
Array to be reduced, possibly nD
- *by : ndarray or DaskArray
Array of labels to group over. Must be aligned with array so that array.shape[-by.ndim :] == by.shape or any disagreements in that equality check are for dimensions of size 1 in by.
- func : {“all”, “any”, “count”, “sum”, “nansum”, “mean”, “nanmean”, “max”, “nanmax”, “min”, “nanmin”, “argmax”, “nanargmax”, “argmin”, “nanargmin”, “quantile”, “nanquantile”, “median”, “nanmedian”, “mode”, “nanmode”, “first”, “nanfirst”, “last”, “nanlast”} or Aggregation
Single function name or an Aggregation instance
- expected_groups : Sequence, optional
Expected unique labels.
- isbin : bool, optional
Are expected_groups bin edges?
- sort : bool, optional
Whether groups should be returned in sorted order. Only applies for dask reductions when method is not "map-reduce". For "map-reduce", the groups are always sorted.
- axis : None or int or Sequence[int], optional
If None, reduce across all dimensions of by, else reduce across corresponding axes of array. Negative integers are normalized using array.ndim.
- fill_value : Any
Value to assign when a label in expected_groups is not present.
- dtype : data-type, optional
DType for the output. Can be anything that is accepted by np.dtype.
- min_count : int, default: None
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA. Only used if skipna is set to True or defaults to True for the array’s dtype.
- method : {“map-reduce”, “blockwise”, “cohorts”}, optional
Strategy for reduction of dask arrays only. By default, this argument is chosen using heuristics.
"map-reduce": First apply the reduction blockwise on array, then combine a few neighbouring blocks and apply the reduction again. Continue until finalizing. Usually, func will need to be an Aggregation instance for this method to work. Common aggregations are implemented.
"blockwise": Only reduce using blockwise and avoid aggregating blocks together. Useful for resampling-style reductions where group members are always together. If by is 1D, array is automatically rechunked so that chunk boundaries line up with group boundaries, i.e. each block contains all members of any group present in that block. For nD by, you must make sure that all members of a group are present in a single block.
"cohorts": Finds group labels that tend to occur together (“cohorts”), indexes out cohorts and reduces that subset using “map-reduce”, repeating for all cohorts. This works well for many time groupings where the group labels repeat at regular intervals like ‘hour’, ‘month’, ‘dayofyear’ etc. Optimize chunking of array for this method by first rechunking using rechunk_for_cohorts (for 1D by only).
- engine : {“flox”, “numpy”, “numba”, “numbagg”}, optional
Algorithm to compute the groupby reduction on non-dask arrays and on each dask chunk:
"numpy": Use the vectorized implementations in numpy_groupies.aggregate_numpy. This is the default choice because it works for most array types.
"flox": Use an internal implementation where the data is sorted so that all members of a group occur sequentially, and then numpy.ufunc.reduceat is used for the reduction. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.
"numba": Use the implementations in numpy_groupies.aggregate_numba.
"numbagg": Use the reductions supported by numbagg.grouped. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.
- reindex : ReindexStrategy | bool, optional
Whether to “reindex” the blockwise reduced results to expected_groups (possibly automatically detected). If True, the intermediate result of the blockwise groupby-reduction has a value for all expected groups, and the final result is a simple reduction of those intermediates. In nearly all cases, this is a significant boost in computation speed. For cases like time grouping, this may result in large intermediates relative to the original block size. Avoid that by using method="cohorts". By default, it is turned off for argreductions.
By default, the type of array is preserved. You may optionally reindex to a sparse array type to further control memory in the case of expected_groups being very large. Pass a ReindexStrategy instance with the appropriate array_type, for example reindex=ReindexStrategy(blockwise=False, array_type=ReindexArrayType.SPARSE_COO).
- finalize_kwargs : dict, optional
Kwargs passed to finalize the reduction such as ddof for var, std or q for quantile.
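The "map-reduce" method and the reindex option described above can be illustrated with a small NumPy-only sketch. This is not flox's implementation: the helper names blockwise_group_sum and map_reduce_group_sum are hypothetical, plain array slices stand in for dask chunks, and all intermediates are combined in a single step rather than in a tree of neighbouring blocks. It only shows why giving every block a slot for every expected group makes the final combine a simple reduction.

```python
import numpy as np


def blockwise_group_sum(block, labels, num_groups):
    """Sum `block` per integer label, with one slot per expected group."""
    # bincount with weights is a grouped sum; minlength keeps the output
    # aligned across blocks even when a group is absent from a block.
    return np.bincount(labels, weights=block, minlength=num_groups)


def map_reduce_group_sum(array, by, num_groups, block_size):
    """Reduce each block independently, then combine the intermediates."""
    intermediates = [
        blockwise_group_sum(
            array[i : i + block_size], by[i : i + block_size], num_groups
        )
        for i in range(0, array.size, block_size)
    ]
    # For "sum", the combine step is itself a sum over per-block results.
    return np.sum(intermediates, axis=0)


array = np.arange(12, dtype=float)
by = np.tile(np.array([0, 1, 2]), 4)  # labels repeat at a regular interval

combined = map_reduce_group_sum(array, by, num_groups=3, block_size=4)
direct = blockwise_group_sum(array, by, num_groups=3)
print(combined, direct)
```

Because each intermediate has the same length (one entry per expected group), the blocks can be combined in any order, which is what makes a tree reduction possible.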
- Returns:
- result
Aggregated result
- *groups
Group labels
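The "flox" engine is described above as sorting the data so that group members are contiguous and then applying numpy.ufunc.reduceat. A minimal sketch of that idea for a 1D sum follows. The function name sorted_reduceat_sum is hypothetical and the code is an illustration under those assumptions, not the library's internal routine. It returns a result and the group labels, mirroring the pair listed under Returns.

```python
import numpy as np


def sorted_reduceat_sum(array, by):
    """Grouped sum via a stable sort followed by np.add.reduceat."""
    # A stable sort places all members of each group next to each other.
    order = np.argsort(by, kind="stable")
    sorted_by = by[order]
    sorted_values = array[order]
    # Unique labels and the start offset of each run of identical labels.
    groups, starts = np.unique(sorted_by, return_index=True)
    # reduceat sums each slice sorted_values[starts[i]:starts[i + 1]],
    # with the last slice running to the end of the array.
    return np.add.reduceat(sorted_values, starts), groups


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([2, 0, 1, 0, 2, 1])

sums, groups = sorted_reduceat_sum(values, labels)
print(sums, groups)
```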
See also