flox: fast & furious GroupBy reductions for dask.array
flox mainly provides strategies for fast GroupBy reductions with dask.array.
flox uses the MapReduce paradigm (or a “tree reduction”) to run the GroupBy operation in a parallel-native way, totally avoiding a sort or shuffle operation.
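To illustrate the idea only, here is a minimal NumPy sketch of a grouped mean computed as a tree reduction. It is not flox's implementation: the helper names (`chunk_stats`, `combine`, `tree_reduce`) are hypothetical, the chunks are plain array slices rather than dask chunks, and the labels are assumed to be integers in `0..ngroups-1`.

```python
import numpy as np


def chunk_stats(values, labels, ngroups):
    """Map step: per-group sums and counts for one chunk, without sorting."""
    sums = np.bincount(labels, weights=values, minlength=ngroups)
    counts = np.bincount(labels, minlength=ngroups)
    return sums, counts


def combine(a, b):
    """Reduce step: merge two intermediate (sums, counts) pairs."""
    return a[0] + b[0], a[1] + b[1]


def tree_reduce(parts):
    """Combine intermediates pairwise until a single result remains."""
    while len(parts) > 1:
        paired = [combine(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            # Carry the unpaired last element to the next level.
            paired.append(parts[-1])
        parts = paired
    return parts[0]


values = np.arange(12, dtype=float)
labels = np.array([0, 1, 2] * 4)
ngroups = 3

# Split into three chunks of four elements; each chunk is processed independently.
chunks = [(values[i : i + 4], labels[i : i + 4]) for i in range(0, 12, 4)]
intermediates = [chunk_stats(v, l, ngroups) for v, l in chunks]

sums, counts = tree_reduce(intermediates)
group_means = sums / counts
```

Each chunk only ever produces a small `(sums, counts)` pair of length `ngroups`, so no element has to move between chunks, which is why a sort or shuffle can be avoided.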
flox integrates with xarray to provide more performant GroupBy and resampling operations.
flox.xarray.xarray_reduce() extends Xarray’s GroupBy operations, allowing lazy grouping by dask arrays, grouping by multiple arrays, and combining categorical grouping with histogram-style binning operations using multiple variables.
flox also provides utility functions for rechunking both dask arrays and Xarray objects along a single dimension using the group labels as a guide.
$ pip install flox
$ conda install -c conda-forge flox
This work was funded in part by
NASA-ACCESS 80NSSC18M0156 “Community tools for analysis of NASA Earth Observing System Data in the Cloud” (PI J. Hamman),
NASA-OSTFL 80NSSC22K0345 “Enhancing analysis of NASA data with the open-source Python Xarray Library” (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and
It was motivated by many discussions in the Pangeo community.