datacube.Datacube.load

Datacube.load(product=None, measurements=None, output_crs=None, resolution=None, resampling=None, skip_broken_datasets=False, dask_chunks=None, like=None, fuse_func=None, align=None, datasets=None, progress_cbk=None, **query)

Load data as an xarray object. Each measurement will be a data variable in the xarray.Dataset.

See the xarray documentation for usage of the xarray.Dataset and xarray.DataArray objects.

Product and Measurements

A product can be specified using the product name, or by search fields that uniquely describe a single product.

product='ls5_ndvi_albers'

See list_products() for the list of products with their names and properties.

A product can also be selected by searching using fields, but must only match one product. For example:

platform='LANDSAT_5',
product_type='ndvi'

The measurements argument is a list of measurement names, as listed in list_measurements(). If not provided, all measurements for the product will be returned.

measurements=['red', 'nir', 'swir2']

Dimensions

Spatial dimensions can be specified using the longitude/latitude and x/y fields.

The CRS of this query is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The dimensions longitude/latitude and x/y can be used interchangeably.

latitude=(-34.5, -35.2), longitude=(148.3, 148.7)

or

x=(1516200, 1541300), y=(-3867375, -3867350), crs='EPSG:3577'

The time dimension can be specified using a tuple of datetime objects or strings in YYYY-MM-DD hh:mm:ss format. Partial dates such as a year and month can also be given, e.g.:

time=('2001-04', '2001-07')
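As a minimal sketch, the spatial and time terms above can be collected into a dictionary and passed to load() as keyword arguments. The helper name load_region is hypothetical, dc is assumed to be an already-configured datacube.Datacube instance, and the product and measurement names are taken from the examples on this page, so they may not exist in a given index.

```python
# Query terms taken from the examples on this page; adjust to your own index.
query = {
    "product": "ls5_nbar_albers",
    "measurements": ["red", "nir"],
    "x": (1516200, 1541300),
    "y": (-3867375, -3867350),
    "crs": "EPSG:3577",  # CRS of the x/y values above, not of the output
    "time": ("2001-04", "2001-07"),
}


def load_region(dc, query):
    """Hypothetical helper: forward the query terms to dc.load()."""
    return dc.load(**query)
```

Keeping the query in a dictionary makes it easy to reuse the same extent and time range for several products by changing only the product entry.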

For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level, using solar day to keep scenes together.

group_by='solar_day'
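The idea behind solar-day grouping can be illustrated with a small sketch. It is not the library's implementation: it assumes the solar day is approximated by shifting the UTC acquisition time by four minutes per degree of longitude, and the function name approx_solar_day is hypothetical.

```python
from datetime import datetime, timedelta


def approx_solar_day(utc_time, longitude):
    """Approximate local solar date: shift UTC by 4 minutes per degree of longitude."""
    offset = timedelta(seconds=int(longitude * 24 * 3600 / 360))
    return (utc_time + offset).date()


# Two acquisitions on either side of UTC midnight, at longitude 148.5 E.
a = datetime(2001, 4, 1, 23, 50)
b = datetime(2001, 4, 2, 0, 5)

# Their UTC dates differ, but both fall on the same approximate solar day.
same_day = approx_solar_day(a, 148.5) == approx_solar_day(b, 148.5)
```

Grouping by this shifted date rather than the UTC date keeps scenes from one overpass in a single time slice even when the pass straddles UTC midnight.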

For data where overlapping scenes hold different values and combining them requires more complex rules, such as GA’s Pixel Quality dataset, a function can be provided (the fuse_func argument) to control the merging into a single time slice.

See datacube.helpers.ga_pq_fuser() for an example implementation.
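A custom fuser might look like the sketch below. It assumes the function is called as fuse_func(dest, src) with NumPy arrays and is expected to update dest in place, which should be checked against the installed datacube version. The name fill_nodata_fuser and the nodata value -999 are hypothetical.

```python
import numpy as np

NODATA = -999  # hypothetical nodata value; real products define their own


def fill_nodata_fuser(dest, src):
    """Fill pixels that are still nodata in dest with values from src, in place."""
    np.copyto(dest, src, where=(dest == NODATA))


# Simulate fusing two overlapping scenes by hand.
dest = np.array([[1, NODATA], [NODATA, 4]], dtype="int16")
src = np.array([[9, 2], [NODATA, 9]], dtype="int16")
fill_nodata_fuser(dest, src)
# Valid pixels in dest are kept; only its nodata pixels take values from src.
```

Under those assumptions it would be passed as dc.load(..., group_by='solar_day', fuse_func=fill_nodata_fuser).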

Output

To reproject or resample the data, supply the output_crs, resolution, resampling and align fields.

To reproject data to 25m resolution for EPSG:3577:

dc.load(product='ls5_nbar_albers', x=(148.15, 148.2), y=(-35.15, -35.2), time=('1990', '1991'),
        output_crs='EPSG:3577', resolution=(-25, 25), resampling='cubic')

Parameters:
  • product (str) – the product to be included.
  • measurements (list(str), optional) –

    Measurement name or list of names to be included, as listed in list_measurements().

    If a list is specified, the measurements will be returned in the order requested. By default all available measurements are included.

  • query – Search parameters for products and dimension ranges as described above.
  • output_crs (str) – The CRS of the returned data. If no CRS is supplied, the CRS of the stored data is used.
  • resolution ((float,float)) –

    A tuple of the spatial resolution of the returned data. This includes the direction (as indicated by a positive or negative number).

    Typically when using most CRSs, the first number would be negative.

  • resampling (str|dict) –

    The resampling method to use if re-projection is required. This could be a string or a dictionary mapping band name to resampling mode. When using a dict use '*' to indicate “apply to all other bands”, for example {'*': 'cubic', 'fmask': 'nearest'} would use cubic for all bands except fmask for which nearest will be used.

    Valid values are: 'nearest', 'cubic', 'bilinear', 'cubic_spline', 'lanczos', 'average', 'mode', 'gauss',  'max', 'min', 'med', 'q1', 'q3'

    Default is to use nearest for all bands.

    See also: load_data()

  • align ((float,float)) –

    Load data such that point ‘align’ lies on the pixel boundary. Units are in the co-ordinate space of the output CRS.

    Default is (0,0)

  • dask_chunks (dict) –

    If the data should be lazily loaded using dask.array.Array, specify the chunking size in each output dimension.

    See the documentation on using xarray with dask for more information.

  • like (xarray.Dataset) –

    Uses the output of a previous load() to form the basis of a request for another product. E.g.:

    pq = dc.load(product='ls5_pq_albers', like=nbar_dataset)
    
  • group_by (str) – When specified, perform basic combining/reducing of the data.
  • fuse_func – Function used to fuse/combine/reduce data with the group_by parameter. By default, overlapping data is simply copied over the top of earlier data, in a relatively undefined order. This function can perform a specific combining step, e.g. for combining GA PQ data. This can be a dictionary if different fusers are needed per band.
  • datasets – Optional. If this is a non-empty list of datacube.model.Dataset objects, these will be loaded instead of performing a database lookup.
  • limit (int) – Optional. If provided, limit the maximum number of datasets returned. Useful for testing and debugging.
  • progress_cbk – Callable of type (int, int) -> None. If supplied, it is called for every file read with files_processed_so_far, total_files. This is only applicable to non-lazy loads and is ignored when using dask.
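A progress callback for the progress_cbk parameter can be sketched as follows. The factory name make_progress_recorder is hypothetical, and the loop calls the callback by hand to stand in for a three-file load, since a real call requires a configured datacube.

```python
def make_progress_recorder(log):
    """Return a callback that appends 'done/total' strings to the given list."""
    def progress(files_processed_so_far, total_files):
        log.append(f"{files_processed_so_far}/{total_files}")
    return progress


log = []
cbk = make_progress_recorder(log)

# dc.load(..., progress_cbk=cbk) would invoke the callback once per file read;
# here it is called manually to simulate a three-file, non-lazy load.
for n in range(1, 4):
    cbk(n, 3)
```

Recording into a list keeps the callback free of side effects on the terminal; a real callback might update a progress bar instead.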
Returns:

Requested data in an xarray.Dataset

Return type:

xarray.Dataset