Datacube.load(product=None, measurements=None, output_crs=None, resolution=None, resampling=None, skip_broken_datasets=False, dask_chunks=None, like=None, fuse_func=None, align=None, datasets=None, progress_cbk=None, **query)

Load data as an xarray object. Each measurement will be a data variable in the xarray.Dataset.

See the xarray documentation for usage of the xarray.Dataset and xarray.DataArray objects.

Product and Measurements

A product can be specified using the product name, or by search fields that uniquely describe a single product.


See list_products() for the list of products with their names and properties.

A product can also be selected by searching on fields, provided the search matches exactly one product.


The measurements argument is a list of measurement names, as listed in list_measurements(). If not provided, all measurements for the product will be returned.

measurements=['red', 'nir', 'swir2']
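As a minimal sketch of how the product selection and the measurements argument fit together, the helper below merges them into one set of keyword arguments for load(). The name load_bands is hypothetical, dc is assumed to be an existing Datacube instance, and the product name in the usage note is only illustrative.

```python
def load_bands(dc, bands, **product_query):
    """Load only the requested measurements of a single product.

    `dc` is assumed to be a Datacube instance. `product_query` holds either
    product='<name>' or search fields that match exactly one product.
    `load_bands` is a hypothetical helper, not part of the datacube API.
    """
    query = dict(product_query)
    # Measurements are returned in the order they are requested.
    query["measurements"] = list(bands)
    return dc.load(**query)
```

A call might look like load_bands(dc, ['red', 'nir', 'swir2'], product='ls5_nbar_albers').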

Spatial dimensions can be specified using the longitude/latitude and x/y fields.

The CRS of this query is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The dimensions longitude/latitude and x/y can be used interchangeably.

latitude=(-34.5, -35.2), longitude=(148.3, 148.7)

or

x=(1516200, 1541300), y=(-3867375, -3867350), crs='EPSG:3577'
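The two ways of giving a spatial extent can be sketched as plain dictionaries that are later unpacked into a load() call. The helper names lonlat_extent and projected_extent are hypothetical. The coordinate values are the ones used in the examples above.

```python
def lonlat_extent(lat_range, lon_range):
    """Query fields for an extent in the default query CRS (WGS84/EPSG:4326)."""
    return {"latitude": lat_range, "longitude": lon_range}


def projected_extent(x_range, y_range, crs):
    """Query fields for an extent expressed in an explicitly named CRS."""
    return {"x": x_range, "y": y_range, "crs": crs}


# No crs field, so the query is interpreted as EPSG:4326.
geographic = lonlat_extent((-34.5, -35.2), (148.3, 148.7))

# The crs field says the x/y ranges are in EPSG:3577 coordinates.
albers = projected_extent((1516200, 1541300), (-3867375, -3867350), "EPSG:3577")
```

Either dictionary could then be passed on, for instance as dc.load(product='ls5_nbar_albers', **albers), assuming dc is a Datacube instance.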

The time dimension can be specified using a tuple of datetime objects or strings in YYYY-MM-DD hh:mm:ss format. For example:

time=('2001-04', '2001-07')
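The same kind of range can also be written with datetime objects, as the text above allows. This is a small sketch; the variable names are illustrative and the end timestamp is just one possible choice for "end of July".

```python
from datetime import datetime

# Time range given as strings, as in the example above.
time_as_strings = {"time": ("2001-04", "2001-07")}

# A range given as datetime objects; the exact end timestamp is a choice
# made for this sketch.
time_as_datetimes = {
    "time": (datetime(2001, 4, 1), datetime(2001, 7, 31, 23, 59, 59)),
}
```

Either dictionary can be unpacked into the call, for example dc.load(product='ls5_nbar_albers', **time_as_datetimes), assuming dc is a Datacube instance.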

For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level, using solar day to keep scenes together.


For data whose values differ where scenes overlap, and which therefore requires more complex rules for combining, such as GA’s Pixel Quality dataset, a function can be provided to control the merging into a single time slice.

See datacube.helpers.ga_pq_fuser() for an example implementation.
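A sketch of how grouping and fusing might be combined in one call follows. The helper load_daily is hypothetical, dc is assumed to be a Datacube instance, and 'solar_day' is assumed to be the group_by value that selects the solar-day grouping described above. The fuser is passed in rather than written here; it is assumed to follow the same calling convention as datacube.helpers.ga_pq_fuser().

```python
def load_daily(dc, product, fuser=None, **query):
    """Group datasets by solar day, optionally fusing overlaps with `fuser`.

    `load_daily` is a hypothetical helper, not part of the datacube API.
    """
    kwargs = dict(query)
    kwargs["product"] = product
    # Assumed value: group scenes from the same solar day into one time slice.
    kwargs["group_by"] = "solar_day"
    if fuser is not None:
        # Custom rule for merging overlapping scenes within a group.
        kwargs["fuse_func"] = fuser
    return dc.load(**kwargs)
```

For pixel quality data, a call could look like load_daily(dc, 'ls5_pq_albers', fuser=ga_pq_fuser, time=('2001-04', '2001-07')), with ga_pq_fuser imported from datacube.helpers.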


To reproject or resample the data, supply the output_crs, resolution, resampling and align fields.

By default, the resampling method is ‘nearest’. However, any stored overview layers may be used when down-sampling, which may override (or hybridise) the choice of resampling method.

To reproject data to 25m resolution for EPSG:3577:

dc.load(product='ls5_nbar_albers', x=(148.15, 148.2), y=(-35.15, -35.2), time=('1990', '1991'),
        output_crs='EPSG:3577', resolution=(-25, 25), resampling='cubic')
  • product (str) – the product to be included.

  • measurements (list(str), optional) –

    Measurement name or list of names to be included, as listed in list_measurements().

    If a list is specified, the measurements will be returned in the order requested. By default all available measurements are included.

  • query – Search parameters for products and dimension ranges as described above.

  • output_crs (str) – The CRS of the returned data. If no CRS is supplied, the CRS of the stored data is used.

  • resolution ((float,float)) –

    A tuple of the spatial resolution of the returned data. This includes the direction (as indicated by a positive or negative number).

    Typically when using most CRSs, the first number would be negative.

  • resampling (str|dict) –

    The resampling method to use if re-projection is required. This could be a string or a dictionary mapping band name to resampling mode. When using a dict use '*' to indicate “apply to all other bands”, for example {'*': 'cubic', 'fmask': 'nearest'} would use cubic for all bands except fmask for which nearest will be used.

    Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average', 'mode', 'gauss',  'max', 'min', 'med', 'q1', 'q3'

    Default is to use nearest for all bands. See also load_data().

  • align ((float,float)) –

    Load data such that point ‘align’ lies on the pixel boundary. Units are in the co-ordinate space of the output CRS.

    Default is (0,0)

  • dask_chunks (dict) –

    If the data should be lazily loaded using dask.array.Array, specify the chunking size in each output dimension.

    See the documentation on using xarray with dask for more information.

  • like (xarray.Dataset) –

    Uses the output of a previous load() to form the basis of a request for another product. E.g.:

    pq = dc.load(product='ls5_pq_albers', like=nbar_dataset)

  • group_by (str) – When specified, perform basic combining/reducing of the data.

  • fuse_func – Function used to fuse/combine/reduce data with the group_by parameter. By default, data from each dataset is simply copied over the top of the others, in a relatively undefined order. This function can perform a specific combining step, e.g. for combining GA PQ data. This can be a dictionary if different fusers are needed per band.

  • datasets – Optional. If this is a non-empty list of datacube.model.Dataset objects, these will be loaded instead of performing a database lookup.

  • limit (int) – Optional. If provided, limit the maximum number of datasets returned. Useful for testing and debugging.

  • progress_cbk – Callable of type (int, int) -> None. If supplied, it is called for every file read with the arguments files_processed_so_far, total_files. This only applies to non-lazy loads and is ignored when using dask.
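The resampling, dask_chunks and progress_cbk parameters above can be sketched as follows. The helper names, the chunk sizes and the x/y dimension names are assumptions for illustration: the output dimension names depend on the CRS, and a projected CRS such as EPSG:3577 is assumed here.

```python
def lazy_load_kwargs():
    """Keyword arguments for a lazy (dask-backed), reprojected load."""
    return {
        "output_crs": "EPSG:3577",
        "resolution": (-25, 25),
        # Cubic for every band except fmask, which stays nearest.
        "resampling": {"*": "cubic", "fmask": "nearest"},
        # Chunk size per output dimension; the sizes are illustrative.
        "dask_chunks": {"time": 1, "x": 2048, "y": 2048},
    }


def print_progress(files_processed_so_far, total_files):
    """Progress callback for non-lazy loads: report how many files were read."""
    print(f"read {files_processed_so_far} of {total_files} files")
```

Assuming dc is a Datacube instance, a lazy load could be requested with dc.load(product='ls5_nbar_albers', **lazy_load_kwargs()). For an eager load, pass progress_cbk=print_progress and omit dask_chunks, since the callback is ignored when dask is used.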


Requested data in an xarray.Dataset

Return type