flox.groupby_scan
- flox.groupby_scan(array, *by, func, expected_groups=None, axis=-1, dtype=None, method=None, engine=None)
GroupBy reductions using parallel scans for dask.array
- Parameters:
- array : ndarray or DaskArray
Array to be reduced, possibly nD
- *by : ndarray or DaskArray
Array of labels to group over. Must be aligned with array so that array.shape[-by.ndim :] == by.shape, or so that any disagreements in that equality check are for dimensions of size 1 in by.
- func : {"nancumsum", "ffill", "bfill"} or Scan
Single function name or a Scan instance.
- expected_groups : Sequence, optional
Expected unique labels.
- axis : None or int or Sequence[int], optional
If None, reduce across all dimensions of by. Otherwise, reduce across the corresponding axes of array. Negative integers are normalized using array.ndim.
- fill_value : Any
Value to assign when a label in expected_groups is not present.
- dtype : data-type, optional
DType for the output. Can be anything that is accepted by np.dtype.
- method : {"blockwise", "blelloch"}, optional
Strategy for scan of dask arrays only:
  - "blockwise": Only scan using blockwise and avoid aggregating blocks together. Useful for resampling-style groupby problems where group members are always together. If by is 1D, array is automatically rechunked so that chunk boundaries line up with group boundaries, i.e. each block contains all members of any group present in that block. For nD by, you must make sure that all members of a group are present in a single block.
  - "blelloch": Use Blelloch's parallel prefix scan algorithm, which allows scanning across chunk boundaries. This is the default when groups span multiple chunks.
- engine : {"flox", "numpy", "numba", "numbagg"}, optional
Algorithm to compute the groupby reduction on non-dask arrays and on each dask chunk:
  - "numpy": Use the vectorized implementations in numpy_groupies.aggregate_numpy. This is the default choice because it works for most array types.
  - "flox": Use an internal implementation where the data is sorted so that all members of a group occur sequentially, and then numpy.ufunc.reduceat is used for the reduction. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.
  - "numba": Use the implementations in numpy_groupies.aggregate_numba.
  - "numbagg": Use the reductions supported by numbagg.grouped. This will fall back to numpy_groupies.aggregate_numpy for a reduction that is not yet implemented.
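The "blelloch" method refers to a work-efficient parallel prefix scan built from an up-sweep (reduce) phase and a down-sweep phase. As a rough illustration of that idea only, the sketch below runs both phases sequentially on a plain Python list using addition. It is not flox's implementation, which operates on dask chunks; the function name `blelloch_exclusive_scan` and the padding to a power-of-two length are assumptions made for this example.

```python
def blelloch_exclusive_scan(values, identity=0):
    """Exclusive prefix sum via up-sweep / down-sweep (sequential sketch)."""
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two with the identity element (an assumption
    # of this sketch; it keeps the implicit binary tree balanced).
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [identity] * (size - n)

    # Up-sweep: each right child accumulates the sum of its left sibling,
    # so the last slot ends up holding the total.
    step = 1
    while step < size:
        for i in range(2 * step - 1, size, 2 * step):
            tree[i] += tree[i - step]
        step *= 2

    # Down-sweep: replace the total with the identity, then push prefixes
    # down. The left child receives the parent's prefix; the right child
    # receives the parent's prefix plus the left subtree's sum.
    tree[size - 1] = identity
    step = size // 2
    while step >= 1:
        for i in range(2 * step - 1, size, 2 * step):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2

    return tree[:n]


data = [1, 2, 3, 4, 5]
exclusive = blelloch_exclusive_scan(data)
# An inclusive scan (what a cumulative sum returns) adds each input back in.
inclusive = [e + v for e, v in zip(exclusive, data)]
print(exclusive)  # [0, 1, 3, 6, 10]
print(inclusive)  # [1, 3, 6, 10, 15]
```

Within each `step`, the loop iterations touch disjoint slots, which is what makes the two phases amenable to parallel execution; here they simply run one after another.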
- Returns:
- result
Aggregated result
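To make the idea of a grouped scan concrete, here is a minimal pure-Python reference for what a "nancumsum" scan over labels computes on 1D input: a running sum kept separately for each label, with NaN treated as zero, yielding one output value per input element. It does not call flox and says nothing about how flox computes the result; the helper name `grouped_nancumsum_reference` is hypothetical.

```python
import math


def grouped_nancumsum_reference(values, labels):
    """Running sum within each label, treating NaN as zero (reference only)."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")

    running = {}  # label -> running total seen so far
    out = []
    for value, label in zip(values, labels):
        increment = 0.0 if math.isnan(value) else value
        running[label] = running.get(label, 0.0) + increment
        out.append(running[label])
    return out


values = [1.0, 2.0, float("nan"), 4.0, 5.0]
labels = ["a", "b", "a", "b", "a"]
result = grouped_nancumsum_reference(values, labels)
print(result)  # [1.0, 2.0, 1.0, 6.0, 6.0]
```

The output has the same length as the input, because each position holds the running total of its own group up to that point.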
See also