flox.xarray.xarray_reduce

flox.xarray.xarray_reduce(obj, *by, func, expected_groups=None, isbin=False, sort=True, dim=None, fill_value=None, dtype=None, method=None, engine=None, keep_attrs=True, skipna=None, min_count=None, reindex=None, **finalize_kwargs)

GroupBy reduce operations on xarray objects using numpy-groupies.

Parameters:

obj : DataArray or Dataset

Xarray object to reduce

*by : DataArray or iterable of str or iterable of DataArray

Variables with which to group by obj

func : {“all”, “any”, “count”, “sum”, “nansum”, “mean”, “nanmean”, “max”, “nanmax”, “min”, “nanmin”, “argmax”, “nanargmax”, “argmin”, “nanargmin”, “quantile”, “nanquantile”, “median”, “nanmedian”, “mode”, “nanmode”, “first”, “nanfirst”, “last”, “nanlast”} or Aggregation

Single function name or an Aggregation instance

expected_groups : str or sequence

Expected group labels corresponding to each by variable.

isbin : iterable of bool

If True, the corresponding entry in expected_groups is treated as bin edges. If False, the entry in expected_groups is treated as a simple label.
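The difference between labels and bin edges can be sketched with plain NumPy. This illustrates the concept only and is not flox's implementation. The arrays are made up, and the half-open intervals [left, right) are an assumption of this sketch that may not match the interval closure flox uses.

```python
import numpy as np

# Illustrative data: four values and the variable they are grouped by.
data = np.array([10.0, 20.0, 30.0, 40.0])
by = np.array([0.5, 1.5, 2.5, 0.2])

# With isbin=True the expected groups are bin edges: 4 edges define 3 bins.
edges = np.array([0.0, 1.0, 2.0, 3.0])

# Assign each element of `by` to a bin (assumed half-open: edges[i] <= x < edges[i + 1]).
bin_idx = np.digitize(by, edges) - 1

# Sum the data within each bin.
binned_sum = np.bincount(bin_idx, weights=data, minlength=len(edges) - 1)
print(binned_sum)  # [50. 20. 30.]
```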

sort : bool, optional

Whether groups should be returned in sorted order. Only applies for dask reductions when method is not "map-reduce". For "map-reduce", the groups are always sorted.

dim : hashable

Dimension name along which to reduce. If None, reduces across all dimensions of by.

fill_value : Any

Value used for missing groups in the output i.e. when one of the labels in expected_groups is not actually present in by.
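A minimal NumPy sketch of what "missing group" means here, assuming a sum reduction and integer labels that coincide with their position in the expected groups. The data and the choice of -1.0 as fill value are illustrative, and this is not how flox computes the result internally.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
by = np.array([0, 1, 1, 3])          # label 2 never occurs in `by`
expected = np.array([0, 1, 2, 3])    # but it is listed as an expected group
fill_value = -1.0                    # illustrative choice

# Per-group sums and member counts over all expected groups.
sums = np.bincount(by, weights=data, minlength=expected.size)
counts = np.bincount(by, minlength=expected.size)

# Groups with no members receive the fill value instead of a reduced value.
result = np.where(counts > 0, sums, fill_value)
print(result)  # [ 1.  5. -1.  4.]
```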

dtype : data-type, optional

DType for the output. Can be anything that is accepted by np.dtype.

method : {“map-reduce”, “blockwise”, “cohorts”}, optional

Strategy for reduction of dask arrays only. By default, the method is chosen using heuristics.

"map-reduce": First apply the reduction blockwise on array, then combine a few neighbouring blocks and apply the reduction again. Continue until finalizing. Usually, func will need to be an Aggregation instance for this method to work. Common aggregations are implemented.

"blockwise": Only reduce using blockwise and avoid aggregating blocks together. Useful for resampling-style reductions where group members are always together. If by is 1D, array is automatically rechunked so that chunk boundaries line up with group boundaries i.e. each block contains all members of any group present in that block. For nD by, you must make sure that all members of a group are present in a single block.

"cohorts": Finds group labels that tend to occur together (“cohorts”), indexes out cohorts and reduces that subset using “map-reduce”, repeat for all cohorts. This works well for many time groupings where the group labels repeat at regular intervals like ‘hour’, ‘month’, ‘dayofyear’ etc. Optimize chunking array for this method by first rechunking using rechunk_for_cohorts (for 1D by only).

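The "map-reduce" idea can be sketched without dask: reduce each chunk to per-group intermediates, combine the intermediates, then finalize. The sketch below computes a grouped mean from per-chunk sums and counts. It is a simplified illustration with made-up data and a hypothetical helper chunk_stats, it combines all chunks in one step rather than a few neighbours at a time, and it is not flox's actual code path.

```python
import numpy as np


def chunk_stats(values, labels, ngroups):
    """Blockwise step: per-group sum and count for a single chunk."""
    sums = np.bincount(labels, weights=values, minlength=ngroups)
    counts = np.bincount(labels, minlength=ngroups)
    return sums, counts


values = np.arange(12, dtype=float)
labels = np.tile([0, 1, 2], 4)  # group labels repeat at regular intervals
ngroups = 3

# "Map": reduce each of three equally sized chunks independently.
intermediates = [
    chunk_stats(v, l, ngroups)
    for v, l in zip(np.split(values, 3), np.split(labels, 3))
]

# "Reduce": combine the intermediates by adding sums and counts.
total_sum = sum(s for s, _ in intermediates)
total_count = sum(c for _, c in intermediates)

# Finalize: the mean is the combined sum over the combined count.
group_mean = total_sum / total_count
print(group_mean)  # [4.5 5.5 6.5]
```

Because every chunk here contains every group, each intermediate is as long as the full list of groups, which is the situation that "cohorts" is meant to improve on for such periodic labels.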
engine : {“flox”, “numpy”, “numba”, “numbagg”}, optional

Algorithm to compute the groupby reduction on non-dask arrays and on each dask chunk:

"numpy": Use the vectorized implementations in numpy_groupies.aggregate_numpy. This is the default choice because it works for most array types.
"flox": Use an internal implementation where the data is sorted so that all members of a group occur sequentially, and then numpy.ufunc.reduceat is used for the reduction. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.
"numba": Use the implementations in numpy_groupies.aggregate_numba.
"numbagg": Use the reductions supported by numbagg.grouped. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.

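The sort-then-reduceat technique described for the "flox" engine can be sketched in a few lines of NumPy. This is a minimal illustration of the technique on made-up data for a grouped sum, not a copy of the internal implementation.

```python
import numpy as np

values = np.array([5.0, 1.0, 2.0, 7.0, 3.0])
labels = np.array([2, 0, 2, 1, 0])

# Sort so that all members of a group occur sequentially.
order = np.argsort(labels, kind="stable")
sorted_values = values[order]
sorted_labels = labels[order]

# Index where each group's run starts in the sorted array.
is_start = np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
starts = np.flatnonzero(is_start)

# Reduce each contiguous run with ufunc.reduceat.
groups = sorted_labels[starts]
group_sums = np.add.reduceat(sorted_values, starts)
print(groups, group_sums)  # [0 1 2] [4. 7. 7.]
```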
keep_attrs : bool, optional

Whether to preserve attrs on the result.

skipna : bool, optional

If True, skip missing values (as marked by NaN). By default, only skips missing values for float dtypes; other dtypes either do not have a sentinel missing value (int) or skipna=True has not been implemented (object, datetime64 or timedelta64).

min_count : int, default: None

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA. Only used if skipna is set to True or defaults to True for the array’s dtype.
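The min_count rule can be illustrated with a NaN-skipping grouped sum in plain NumPy. The data are made up and the sketch only mirrors the rule stated above; it does not reproduce flox's internals.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan, np.nan])
labels = np.array([0, 0, 0, 1, 1])  # group 1 contains only NaN
ngroups = 2
min_count = 1

valid = ~np.isnan(values)

# NaN-skipping sum: missing values contribute zero.
sums = np.bincount(labels, weights=np.where(valid, values, 0.0), minlength=ngroups)

# Number of valid (non-NA) values in each group.
n_valid = np.bincount(labels, weights=valid.astype(float), minlength=ngroups)

# Groups with fewer than min_count valid values become NA.
result = np.where(n_valid >= min_count, sums, np.nan)
print(result)  # [ 4. nan]
```

Without the min_count check, the all-NaN group would report a sum of 0.0, which is indistinguishable from a group whose valid values sum to zero.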

reindex : ReindexStrategy | bool, optional

Whether to “reindex” the blockwise reduced results to expected_groups (possibly automatically detected). If True, the intermediate result of the blockwise groupby-reduction has a value for all expected groups, and the final result is a simple reduction of those intermediates. In nearly all cases, this is a significant boost in computation speed. For cases like time grouping, this may result in large intermediates relative to the original block size. Avoid that by using method="cohorts". By default, it is turned off for argreductions. By default, the type of array is preserved. You may optionally reindex to a sparse array type to further control memory in the case of expected_groups being very large. Pass a ReindexStrategy instance with the appropriate array_type, for example (reindex=ReindexStrategy(blockwise=False, array_type=ReindexArrayType.SPARSE_COO)).

**finalize_kwargs: dict, optional

kwargs passed to the finalize function, like ddof for var, std or q for quantile.

Returns:

DataArray or Dataset: Reduced object

Raises:

NotImplementedError
ValueError