Grid Workflow

class datacube.api.GridWorkflow(index, grid_spec=None, product=None)

GridWorkflow deals with cell- and tile-based processing using a grid defining a projection and resolution.

Use GridWorkflow to specify your desired output grid. The methods list_cells() and list_tiles() query the index and return a dictionary of cell or tile keys, each mapping to a Tile object.

The Tile object can then be used to load the data without needing the index, and can be serialized for use with the distributed package.

Create a grid workflow tool.

Either grid_spec or product must be supplied.

Parameters
  • index (datacube.index.Index) – The database index to use.

  • grid_spec (GridSpec) – The grid projection and resolution.

  • product (str) – The name of an existing product, if no grid_spec is supplied.
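As a rough sketch of how these pieces fit together, the function below builds a GridWorkflow from a product name, lists the matching tiles, and loads one of them. The helper name load_first_tile, the app label, and the choice of picking the first key are illustrative assumptions, not part of the library. It also assumes a configured datacube database, so the imports are deferred until the function is called.

```python
def load_first_tile(product_name, time_range, measurements=None):
    """Illustrative helper (not part of datacube): load one tile of a product."""
    # Imported lazily so this module can be imported without a datacube setup.
    import datacube
    from datacube.api import GridWorkflow

    dc = datacube.Datacube(app="gridworkflow-sketch")  # app name is arbitrary

    # No grid_spec is supplied, so the product name is given instead.
    gw = GridWorkflow(dc.index, product=product_name)

    # Keys identify a cell and time; values are Tile objects.
    tiles = gw.list_tiles(product=product_name, time=time_range)
    if not tiles:
        return None

    first_key = next(iter(tiles))
    # load() is static: it needs only the Tile, not the index.
    return GridWorkflow.load(tiles[first_key], measurements=measurements)
```

A call might look like load_first_tile('ls5_nbar_albers', ('2001-1-1', '2001-3-31')), reusing the product name from the examples further down.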

Methods:

cell_observations([cell_index, geopolygon, ...])

List datasets, grouped by cell.

group_into_cells(observations, group_by)

Group observations into a stack of source tiles.

list_cells([cell_index])

List cells that match the query.

list_tiles([cell_index])

List tiles of data, sorted by cell.

load(tile[, measurements, dask_chunks, ...])

Load data for a cell/tile.

tile_sources(observations, group_by)

Split observations into tiles and group into source tiles.

cell_observations(cell_index=None, geopolygon=None, tile_buffer=None, **indexers)

List datasets, grouped by cell.

Parameters
  • geopolygon (datacube.utils.Geometry) – Only return observations with data inside the polygon.

  • tile_buffer ((float,float)) – Buffer tiles by (y, x) in CRS units.

  • cell_index ((int,int)) – The cell index. E.g. (14, -40)

  • indexers – Query to match the datasets, see datacube.api.query.Query

Returns

Datasets grouped by cell index

Return type

dict[(int,int), list[datacube.model.Dataset]]
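To make the notion of a cell index more concrete, here is a minimal sketch of how a point could be assigned to a cell on a regular grid. It assumes the index is ordered (x, y) and that cells are counted from an origin in steps of the tile size. The function name and parameters are hypothetical, and the library's GridSpec may compute this differently.

```python
import math


def cell_index_for_point(x, y, tile_size_x, tile_size_y, origin_x=0.0, origin_y=0.0):
    """Return the (x, y) cell index containing a point on a regular grid.

    Assumption: cell i covers [origin + i * size, origin + (i + 1) * size).
    """
    return (
        math.floor((x - origin_x) / tile_size_x),
        math.floor((y - origin_y) / tile_size_y),
    )


# A point in a grid of 100 000-unit tiles anchored at (0, 0).
example_cell = cell_index_for_point(1_450_000, -3_950_000, 100_000, 100_000)
```

Under these assumptions the example point falls in cell (14, -40), the index used in the parameter description above.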

static group_into_cells(observations, group_by)

Group observations into a stack of source tiles.

Parameters
  • observations – Datasets grouped by cell index, as returned by cell_observations()

  • group_by (datacube.api.query.GroupBy) – grouping method, as returned by datacube.api.query.query_group_by()

Returns

tiles grouped by cell index

Return type

dict[(int,int), Tile]
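The two-step pattern implied here (query observations once, then group them) can be sketched as below. The helper tiles_from_observations and the FakeGridWorkflow stand-in are hypothetical. The stand-in exists only so the sketch runs without a database and does not reflect the real return values.

```python
def tiles_from_observations(gw, group_by, per_time=False, cell_index=None, **query):
    """Illustrative helper (not part of datacube): query once, then group.

    gw is a GridWorkflow-like object; group_by is whatever the grouping
    methods expect, e.g. the result of datacube.api.query.query_group_by().
    """
    observations = gw.cell_observations(cell_index=cell_index, **query)
    if per_time:
        # Keys combine the cell index with a time.
        return gw.tile_sources(observations, group_by)
    # Keys are cell indexes only.
    return gw.group_into_cells(observations, group_by)


class FakeGridWorkflow:
    """Stand-in used only to exercise the helper without a database."""

    def cell_observations(self, cell_index=None, **query):
        return {(14, -40): ["dataset-a", "dataset-b"]}

    @staticmethod
    def group_into_cells(observations, group_by):
        return {cell: ("stack", len(ds)) for cell, ds in observations.items()}

    @staticmethod
    def tile_sources(observations, group_by):
        return {
            cell + (i,): ("tile", d)
            for cell, ds in observations.items()
            for i, d in enumerate(ds)
        }


cells = tiles_from_observations(FakeGridWorkflow(), group_by=None, product="ls5_nbar_albers")
tiles = tiles_from_observations(FakeGridWorkflow(), group_by=None, per_time=True)
```

With a real GridWorkflow, group_by would come from datacube.api.query.query_group_by(), as the parameter description states.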

list_cells(cell_index=None, **query)

List cells that match the query.

Returns a dictionary of cell indexes to Tile objects.

Cells are included if they contain any datasets that match the query. The query uses the same format as datacube.Datacube.load().

E.g.:

gw.list_cells(product='ls5_nbar_albers',
              time=('2001-1-1 00:00:00', '2001-3-31 23:59:59'))
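Parameters
  • cell_index ((int,int)) – The cell index. E.g. (14, -40)

  • query – Query to match the datasets, see datacube.api.query.Query

Return type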

dict[(int, int), Tile]

list_tiles(cell_index=None, **query)

List tiles of data, sorted by cell.

tiles = gw.list_tiles(product='ls5_nbar_albers',
                      time=('2001-1-1 00:00:00', '2001-3-31 23:59:59'))

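The values can be passed to load().

Parameters
  • cell_index ((int,int)) – The cell index. E.g. (14, -40)

  • query – Query to match the datasets, see datacube.api.query.Query

Return type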

dict[(int, int, numpy.datetime64), Tile]

See also

load()

static load(tile, measurements=None, dask_chunks=None, fuse_func=None, resampling=None, skip_broken_datasets=False)

Load data for a cell/tile.

The data to be loaded is defined by the output of list_tiles().

This is a static function and does not use the index. This can be useful when running as a worker in a distributed environment and you wish to minimize database connections.

See the documentation on using xarray with dask for more information.
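Because load() needs only a Tile, a worker can receive a serialized tile and load it without opening a database connection. The sketch below uses pickle as a stand-in for whatever serialization a distributed scheduler applies. The helper names are hypothetical, and it assumes Tile objects pickle cleanly.

```python
import pickle


def pack_tiles(tiles):
    """Serialize each tile so it can be shipped to a worker process."""
    return {key: pickle.dumps(tile) for key, tile in tiles.items()}


def load_packed_tile(payload, loader, **load_kwargs):
    """Run on the worker: restore the tile and hand it to the loader.

    In real use loader would be GridWorkflow.load; it is passed in here
    so the sketch does not depend on a datacube installation.
    """
    tile = pickle.loads(payload)
    return loader(tile, **load_kwargs)


# Stand-in tile and loader, used only to show the round trip.
packed = pack_tiles({(14, -40): {"name": "stand-in tile"}})
result = load_packed_tile(
    packed[(14, -40)],
    lambda tile, **kwargs: (tile["name"], kwargs),
    measurements=["red"],
)
```

Only the process that calls list_tiles() or list_cells() needs the index; the workers receive tiles and nothing else.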

Parameters
  • tile (Tile) – The tile to load.

  • measurements (list(str)) – The names of measurements to load

  • dask_chunks (dict) –

    If the data should be loaded as needed using dask.array.Array, specify the chunk size in each output direction.

    See the documentation on using xarray with dask for more information.

  • fuse_func – Function to fuse together a tile that has been pre-grouped by calling list_cells() with a group_by parameter.

  • resampling (str|dict) –

    The resampling method to use if re-projection is required. It can be configured per band using a dictionary (see load_data).

    Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average'

    Defaults to 'nearest'.

  • skip_broken_datasets (bool) – If True, ignore broken datasets and continue processing with the data that can be loaded. If False, an exception will be raised on a broken dataset. Defaults to False.

Return type

xarray.Dataset
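The dask_chunks and resampling parameters both take dictionaries, so it can help to build them in one place. The sketch below is one way to do that. The helper names and band names are hypothetical, and the dimension names 'time', 'x' and 'y' are an assumption that depends on the grid's CRS. Every band is listed explicitly in the resampling dictionary, and every method used appears in the list of valid values above.

```python
def per_band_resampling(measurements, categorical=(), continuous_method="bilinear"):
    """Map each band to a resampling method.

    Categorical bands (flags, classifications) get 'nearest' so that
    re-projection does not blend class values.
    """
    categorical = set(categorical)
    return {
        name: "nearest" if name in categorical else continuous_method
        for name in measurements
    }


def lazy_load_kwargs(measurements, categorical=(), chunk_size=1000):
    """Keyword arguments for a dask-backed load() call (illustrative)."""
    return {
        "measurements": list(measurements),
        # Assumed dimension names; a geographic CRS would use different ones.
        "dask_chunks": {"time": 1, "x": chunk_size, "y": chunk_size},
        "resampling": per_band_resampling(measurements, categorical),
    }


kwargs = lazy_load_kwargs(["red", "pixelquality"], categorical=["pixelquality"])
```

The result could then be passed as GridWorkflow.load(tile, **kwargs).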

static tile_sources(observations, group_by)

Split observations into tiles and group into source tiles.

Parameters
  • observations – Datasets grouped by cell index, as returned by cell_observations()

  • group_by (datacube.api.query.GroupBy) – grouping method, as returned by datacube.api.query.query_group_by()

Returns

tiles grouped by cell index and time

Return type

dict[tuple(int, int, numpy.datetime64), Tile]
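Since each key combines a cell index with a time, it is sometimes convenient to regroup the keys per cell, for example to see which time slices exist for each cell. The sketch below works on the keys alone. The function name is hypothetical, and plain strings stand in for numpy.datetime64 values.

```python
from collections import defaultdict


def times_by_cell(tile_keys):
    """Group (i, j, time) keys into {(i, j): [time, ...]} with sorted times."""
    grouped = defaultdict(list)
    for i, j, time in tile_keys:
        grouped[(i, j)].append(time)
    return {cell: sorted(times) for cell, times in grouped.items()}


# Stand-in keys shaped like those returned by tile_sources() or list_tiles().
keys = [
    (14, -40, "2001-02-01"),
    (14, -40, "2001-01-01"),
    (15, -40, "2001-01-01"),
]
summary = times_by_cell(keys)
```

With a real result, the call would be times_by_cell(tiles.keys()).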