Grid Workflow

class datacube.api.GridWorkflow(index, grid_spec=None, product=None)

GridWorkflow deals with cell- and tile-based processing using a grid defining a projection and resolution.

Use GridWorkflow to specify your desired output grid. The methods list_cells() and list_tiles() query the index and return a dictionary of cell or tile keys, each mapping to a Tile object.

The Tile object can then be used to load the data without needing the index, and can be serialized for use with the distributed package.

Create a grid workflow tool.

Either grid_spec or product must be supplied.

Parameters

index (datacube.index.Index) – The database index to use.
grid_spec (GridSpec) – The grid projection and resolution.
product (str) – The name of an existing product, if no grid_spec is supplied.
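
The overall workflow described above might look like the following minimal sketch. It is written against any object exposing list_tiles() and load(), so it can be read and run without a database. The helper name load_all_tiles is hypothetical, and the commented usage assumes a configured datacube and the product name used elsewhere on this page.

```python
def load_all_tiles(gw, **query):
    """Query the index once, then load every matching tile.

    `gw` is expected to behave like a GridWorkflow: list_tiles() returns a
    dict mapping tile keys to Tile objects, and load() turns a Tile into data.
    The helper name itself is hypothetical, not part of datacube.
    """
    tiles = gw.list_tiles(**query)  # the only step that touches the index
    return {key: gw.load(tile) for key, tile in tiles.items()}


# Assumed usage with a configured datacube (not executed here):
#
#   import datacube
#   from datacube.api import GridWorkflow
#
#   dc = datacube.Datacube()
#   gw = GridWorkflow(dc.index, product='ls5_nbar_albers')
#   data = load_all_tiles(gw, product='ls5_nbar_albers',
#                         time=('2001-1-1 00:00:00', '2001-3-31 23:59:59'))
```

In a distributed setting, the list_tiles() call would typically run once on the scheduling side, with each load() call handed to a worker.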

Members

Methods:

`cell_observations`([cell_index, geopolygon, ...])	List datasets, grouped by cell.
`group_into_cells`(observations, group_by)	Group observations into a stack of source tiles.
`list_cells`([cell_index])	List cells that match the query.
`list_tiles`([cell_index])	List tiles of data, sorted by cell.
`load`(tile[, measurements, dask_chunks, ...])	Load data for a cell/tile.
`tile_sources`(observations, group_by)	Split observations into tiles and group into source tiles

cell_observations(cell_index=None, geopolygon=None, tile_buffer=None, **indexers)

List datasets, grouped by cell.

Parameters

geopolygon (datacube.utils.Geometry) – Only return observations with data inside the polygon.
tile_buffer ((float,float)) – Buffer tiles by (y, x) in CRS units.
cell_index ((int,int)) – The cell index. E.g. (14, -40)
indexers – Query to match the datasets, see datacube.api.query.Query

Returns

Datasets grouped by cell index

Return type

dict[(int,int), list[datacube.model.Dataset]]

See also

datacube.Datacube.find_datasets()

datacube.api.query.Query

static group_into_cells(observations, group_by)

Group observations into a stack of source tiles.

Parameters

observations – Datasets grouped by cell index, as returned by cell_observations()
group_by (datacube.api.query.GroupBy) – grouping method, as returned by datacube.api.query.query_group_by()

Returns

tiles grouped by cell index

Return type

dict[(int,int), Tile]

See also

load()

datacube.Datacube.group_datasets()
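
The sketch below shows one way cell_observations() and group_into_cells() might be chained, leaving room to drop cells between the two steps. The helper name cells_from_observations and its keep parameter are hypothetical. group_by is assumed to be a grouping method obtained from datacube.api.query.query_group_by().

```python
def cells_from_observations(gw, group_by, keep=None, **query):
    """Find datasets per cell, optionally filter cells, then group into tiles.

    gw       -- an object behaving like GridWorkflow
    group_by -- grouping method, e.g. from datacube.api.query.query_group_by()
    keep     -- hypothetical predicate taking a cell index; cells for which
                it returns False are dropped before grouping
    """
    observations = gw.cell_observations(**query)
    if keep is not None:
        # Filter on the cell index only; the per-cell values are left untouched.
        observations = {cell: obs for cell, obs in observations.items()
                        if keep(cell)}
    return gw.group_into_cells(observations, group_by)
```

Filtering on the key alone keeps the sketch independent of how the per-cell observations are represented.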

list_cells(cell_index=None, **query)

List cells that match the query.

Returns a dictionary of cell indexes to Tile objects.

A cell is included if it contains any dataset that matches the query. The query uses the same format as datacube.Datacube.load().

E.g.:

# gw is a GridWorkflow instance
cells = gw.list_cells(product='ls5_nbar_albers',
                      time=('2001-1-1 00:00:00', '2001-3-31 23:59:59'))

Parameters

cell_index ((int,int)) – The cell index. E.g. (14, -40)
query – see datacube.api.query.Query

Return type

dict[(int, int), Tile]

list_tiles(cell_index=None, **query)

List tiles of data, sorted by cell.

tiles = gw.list_tiles(product='ls5_nbar_albers',
                      time=('2001-1-1 00:00:00', '2001-3-31 23:59:59'))

The values of the returned dictionary can be passed to load().

Parameters

cell_index ((int,int)) – The cell index (optional). E.g. (14, -40)
query – see datacube.api.query.Query

Return type

dict[(int, int, numpy.datetime64), Tile]

See also

load()
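
Because each key returned by list_tiles() combines the cell index with a timestamp, it can be convenient to regroup the tiles per cell before dispatching work. The following is a minimal sketch using only the standard library. The helper name tiles_by_cell is hypothetical, and the three-part key layout is taken from the return type above.

```python
from collections import defaultdict


def tiles_by_cell(tiles):
    """Regroup {(x, y, time): tile} into {(x, y): [(time, tile), ...]}.

    Within each cell, entries are ordered by time.
    """
    grouped = defaultdict(list)
    for (x, y, time), tile in tiles.items():
        grouped[(x, y)].append((time, tile))
    for entries in grouped.values():
        entries.sort(key=lambda entry: entry[0])
    return dict(grouped)
```

Each (time, tile) pair can then be passed on individually, with the tile going to load().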

static load(tile, measurements=None, dask_chunks=None, fuse_func=None, resampling=None, skip_broken_datasets=False)

Load data for a cell/tile.

The data to be loaded is defined by the output of list_tiles().

This is a static function and does not use the index. This can be useful when running as a worker in a distributed environment and you wish to minimize database connections.

See the documentation on using xarray with dask for more information.

Parameters

tile (Tile) – The tile to load.
measurements (list(str)) – The names of measurements to load
dask_chunks (dict) –
If the data should be loaded as needed using dask.array.Array, specify the chunk size in each output direction.

See the documentation on using xarray with dask for more information.
fuse_func – Function to fuse together a tile that has been pre-grouped by calling list_cells() with a group_by parameter.
resampling (str|dict) –
The resampling method to use if re-projection is required. It can be configured per band using a dictionary (see load_data()).

Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average'

Defaults to 'nearest'.
skip_broken_datasets (bool) – If True, ignore broken datasets and continue processing with the data that can be loaded. If False, an exception will be raised on a broken dataset. Defaults to False.
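
As a sketch of how these parameters might be combined, the hypothetical helper below requests a lazily loaded, dask-backed result for selected measurements and tolerates broken datasets. The chunk size, the dimension names 'time', 'x' and 'y', and the measurement names in the usage comment are assumptions that depend on the product and its CRS.

```python
def load_lazily(gw, tile, measurements, chunk=1000):
    """Call load() with dask chunking so that data is read on demand.

    gw is an object behaving like GridWorkflow; since load() is static, the
    class itself should work as well. The dimension names in dask_chunks are
    an assumption and must match the dimensions of the tile's grid.
    """
    return gw.load(
        tile,
        measurements=measurements,
        dask_chunks={'time': 1, 'x': chunk, 'y': chunk},
        resampling='nearest',        # the documented default, stated explicitly
        skip_broken_datasets=True,   # continue with whatever can be loaded
    )


# Assumed usage (not executed here), with hypothetical measurement names:
#   data = load_lazily(GridWorkflow, tile, ['red', 'nir'])
```

With dask_chunks set, no pixel data should be read until the result is computed or otherwise accessed.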

Return type