datacube.Datacube.load
- Datacube.load(product=None, measurements=None, output_crs=None, resolution=None, resampling=None, skip_broken_datasets=False, dask_chunks=None, like=None, fuse_func=None, align=None, datasets=None, dataset_predicate=None, progress_cbk=None, patch_url=None, **query)
Load data as an xarray.Dataset object. Each measurement will be a data variable in the xarray.Dataset.
See the xarray documentation for usage of the xarray.Dataset and xarray.DataArray objects.
- Product and Measurements
A product can be specified using the product name, or by search fields that uniquely describe a single product.
product='ls5_ndvi_albers'
See list_products() for the list of products with their names and properties.
A product can also be selected by search fields, provided they match exactly one product. For example:
platform='LANDSAT_5', product_type='ndvi'
The measurements argument is a list of measurement names, as listed in list_measurements(). If not provided, all measurements for the product will be returned.
measurements=['red', 'nir', 'swir2']
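The selection styles above can be sketched as keyword dictionaries that are later unpacked into load(). This is a minimal sketch, not part of the datacube API: the helper names product_query and load_example are hypothetical, the product names are the ones used on this page, and the measurement names are assumed to exist for that product. load_example additionally assumes an installed and configured datacube, so it is defined but not called here.

```python
def product_query(measurements=None, **product_fields):
    """Build keyword arguments for load() (hypothetical helper)."""
    query = dict(product_fields)
    if measurements is not None:
        # Measurements are returned in the order requested.
        query["measurements"] = list(measurements)
    return query


# Select a product by name; all of its measurements would be loaded.
by_name = product_query(product="ls5_ndvi_albers")

# Select a product by search fields; these must match exactly one product.
by_fields = product_query(platform="LANDSAT_5", product_type="ndvi")

# Select a product by name and restrict the measurements (names assumed).
with_bands = product_query(["red", "nir", "swir2"], product="ls5_nbar_albers")


def load_example(query):
    """Run the query against a real datacube (requires a configured index)."""
    import datacube  # imported lazily so the sketch runs without datacube

    dc = datacube.Datacube(app="load-example")
    return dc.load(**query)
```

In practice spatial and time ranges would be added to the same dictionary before calling load_example(query).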
- Dimensions
Spatial dimensions can be specified using the longitude/latitude and x/y fields.
The CRS of this query is assumed to be WGS84/EPSG:4326 unless the crs field is supplied, even if the stored data is in another projection or the output_crs is specified. The dimensions longitude/latitude and x/y can be used interchangeably.
latitude=(-34.5, -35.2), longitude=(148.3, 148.7)
or
x=(1516200, 1541300), y=(-3867375, -3867350), crs='EPSG:3577'
The time dimension can be specified using a tuple of datetime objects or strings with YYYY-MM-DD hh:mm:ss format. Data will be loaded inclusive of the start and finish times. E.g.:
time=('2000-01-01', '2001-12-31')
time=('2000-01', '2001-12')
time=('2000', '2001')
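To illustrate what "inclusive of the start and finish times" means for partial dates, the sketch below expands a date-only string to the first or last instant of the period it names. It assumes that a partial date such as '2001-12' stands for the whole month; the helpers expand_time and expand_time_range are hypothetical and are not datacube's own parsing code. Only date-only strings are handled, not full YYYY-MM-DD hh:mm:ss timestamps.

```python
import calendar
from datetime import datetime


def expand_time(value, end=False):
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to a period boundary."""
    parts = [int(p) for p in value.split("-")]
    year = parts[0]
    if len(parts) == 1:
        # Year only: the period is the whole year.
        month, day = (12, 31) if end else (1, 1)
    elif len(parts) == 2:
        # Year and month: the period is the whole month.
        month = parts[1]
        day = calendar.monthrange(year, month)[1] if end else 1
    else:
        # Full date: the period is that single day.
        month, day = parts[1], parts[2]
    if end:
        return datetime(year, month, day, 23, 59, 59, 999999)
    return datetime(year, month, day)


def expand_time_range(time_range):
    """Return the inclusive (start, finish) datetimes for a time tuple."""
    start, finish = time_range
    return expand_time(start), expand_time(finish, end=True)
```

Under this assumption, the three time tuples shown above all cover the same interval, from the start of 2000 to the end of 2001.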
For 3D datasets, where the product definition contains an extra_dimension specification, these dimensions can be queried using that dimension’s name. E.g.:
z=(10, 30)
or
z=5
or
wvl=(560.3, 820.5)
For EO-specific datasets that are based around scenes, the time dimension can be reduced to the day level, using solar day to keep scenes together.
group_by='solar_day'
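The idea behind solar-day grouping can be sketched with the standard library alone. The sketch assumes that the solar day is approximated by shifting the UTC timestamp by the scene longitude (240 seconds per degree) and taking the calendar date; it illustrates the concept and is not claimed to be datacube's exact implementation. The function names and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def solar_day(utc_time, longitude):
    """Approximate local solar date for a UTC timestamp at a longitude."""
    # The sun covers 360 degrees in 24 hours, i.e. 240 seconds per degree.
    offset = timedelta(seconds=int(longitude * 240))
    return (utc_time + offset).date()


def group_by_solar_day(observations, longitude):
    """Group UTC timestamps by their approximate solar day."""
    groups = {}
    for utc_time in observations:
        groups.setdefault(solar_day(utc_time, longitude), []).append(utc_time)
    return groups


# Two scenes from one overpass straddle UTC midnight; a third is from a
# later overpass.
scenes = [
    datetime(2000, 1, 1, 23, 50),
    datetime(2000, 1, 2, 0, 5),
    datetime(2000, 1, 17, 23, 50),
]
groups = group_by_solar_day(scenes, longitude=148.5)
```

Grouping by UTC date would split the first two scenes across two days, whereas the longitude offset keeps them in one time slice.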
For data that has different values in the scene overlap and requires more complex rules for combining data, a function can be provided to control the merging into a single time slice.
See datacube.helpers.ga_pq_fuser() for an example implementation. See datacube.api.query.query_group_by() for the built-in group_by functions.
- Output
To reproject or resample data, supply the output_crs, resolution, resampling and align fields.
By default, the resampling method is ‘nearest’. However, any stored overview layers may be used when down-sampling, which may override (or hybridise) the choice of resampling method.
To reproject data to 30 m resolution for EPSG:3577:
dc.load(
    product='ls5_nbar_albers',
    x=(148.15, 148.2),
    y=(-35.15, -35.2),
    time=('1990', '1991'),
    output_crs='EPSG:3577',
    resolution=(-30, 30),
    resampling='cubic',
)
- Parameters
product (str) – The product to be loaded.
measurements – Measurement name or list of names to be included, as listed in list_measurements(). These will be loaded as individual xr.DataArray variables in the output xarray.Dataset object.
If a list is specified, the measurements will be returned in the order requested. By default all available measurements are included.
**query – Search parameters for products and dimension ranges as described above. For example: 'x', 'y', 'time', 'crs'.
output_crs (str) – The CRS of the returned data, for example EPSG:3577. If no CRS is supplied, the CRS of the stored data is used if available.
This differs from the crs parameter described above, which is used to define the CRS of the coordinates in the query itself.
resolution – A tuple of the spatial resolution of the returned data. Units are in the coordinate space of output_crs.
This includes the direction (as indicated by a positive or negative number). For most CRSs, the first number will be negative, e.g. (-30, 30).
resampling – The resampling method to use if re-projection is required. This could be a string or a dictionary mapping band name to resampling mode. When using a dict use '*' to indicate “apply to all other bands”, for example {'*': 'cubic', 'fmask': 'nearest'} would use cubic for all bands except fmask for which nearest will be used.
Valid values are:
'nearest', 'average', 'bilinear', 'cubic', 'cubic_spline', 'lanczos', 'mode', 'gauss', 'max', 'min', 'med', 'q1', 'q3'
Default is to use nearest for all bands.
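The per-band lookup described for the resampling argument can be sketched as follows. The helper resampling_for is hypothetical and only mirrors the rules stated above (a single string applies to every band, a dict may use '*' as the fallback, and the default is nearest); it is not the library's internal code.

```python
def resampling_for(band, resampling=None):
    """Return the resampling method that would apply to one band."""
    if resampling is None:
        # Nothing supplied: the documented default applies to all bands.
        return "nearest"
    if isinstance(resampling, str):
        # A single string applies to every band.
        return resampling
    # A dict maps band names to methods; '*' covers all other bands.
    return resampling.get(band, resampling.get("*", "nearest"))


per_band = {"*": "cubic", "fmask": "nearest"}
methods = {band: resampling_for(band, per_band) for band in ("red", "nir", "fmask")}
```

Keeping nearest for a categorical band such as fmask avoids interpolating between class codes while the other bands are resampled with cubic.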
align – Load data such that point ‘align’ lies on the pixel boundary. Units are in the coordinate space of output_crs.
Default is (0, 0).
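How resolution and align together pin down a pixel grid can be illustrated along a single axis. The sketch assumes that pixel edges sit at align + k * |resolution| for integer k and that the requested range is expanded outwards to the nearest edges; snap_bounds is a hypothetical helper, and the exact extent datacube computes may differ.

```python
import math


def snap_bounds(lo, hi, resolution, align=0.0):
    """Expand (lo, hi) outwards so both ends fall on pixel edges."""
    size = abs(resolution)  # the sign only encodes the axis direction
    snapped_lo = math.floor((lo - align) / size) * size + align
    snapped_hi = math.ceil((hi - align) / size) * size + align
    return snapped_lo, snapped_hi


# The x range from the EPSG:3577 query above, on a 30 m grid.
default_grid = snap_bounds(1516200, 1541300, resolution=30)
# The same range with pixel edges shifted by half a pixel.
shifted_grid = snap_bounds(1516200, 1541300, resolution=30, align=15)
```

With the default alignment the lower bound already lies on an edge and only the upper bound moves; shifting align by 15 m moves both.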
dask_chunks (dict) – If the data should be lazily loaded using dask.array.Array, specify the chunking size in each output dimension.
See the documentation on using xarray with dask for more information.
like (xarray.Dataset) – Use the output of a previous load() to load data into the same spatial grid and resolution (i.e. datacube.utils.geometry.GeoBox). E.g.:
pq = dc.load(product='ls5_pq_albers', like=nbar_dataset)
group_by (str) – When specified, perform basic combining/reducing of the data. For example, group_by='solar_day' can be used to combine consecutive observations along a single satellite overpass into a single time slice.
fuse_func – Function used to fuse/combine/reduce data with the group_by parameter. By default, data is simply copied over the top of each other in a relatively undefined manner. This function can perform a specific combining step. This can be a dictionary if different fusers are needed per band.
datasets – Optional. If this is a non-empty list of datacube.model.Dataset objects, these will be loaded instead of performing a database lookup.
skip_broken_datasets (bool) – Optional. If this is True, then don’t break when failing to load a broken dataset. Default is False.
dataset_predicate (function) – Optional. A function that can be passed to restrict loaded datasets. A predicate function should take a datacube.model.Dataset object (e.g. as returned from find_datasets()) and return a boolean. For example, loaded data could be filtered to January observations only by passing the following predicate function that returns True for datasets acquired in January:
def filter_jan(dataset):
    return dataset.time.begin.month == 1
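The predicate can be tried out without a database by applying it to stand-in objects. In this sketch fake_dataset is a hypothetical helper that builds an object exposing only the time.begin attribute used by the predicate; real datacube.model.Dataset objects carry far more information.

```python
from datetime import datetime
from types import SimpleNamespace


def filter_jan(dataset):
    """Return True for datasets acquired in January."""
    return dataset.time.begin.month == 1


def fake_dataset(begin):
    """Stand-in for a Dataset, exposing only .time.begin (assumption)."""
    return SimpleNamespace(time=SimpleNamespace(begin=begin))


candidates = [
    fake_dataset(datetime(2000, 1, 15)),
    fake_dataset(datetime(2000, 6, 1)),
    fake_dataset(datetime(2001, 1, 3)),
]
# Keep only the datasets the predicate accepts.
january = [ds for ds in candidates if filter_jan(ds)]
```

With a real datacube the same function would be passed as dataset_predicate=filter_jan alongside the other query arguments.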
limit (int) – Optional. If provided, limit the maximum number of datasets returned. Useful for testing and debugging.
progress_cbk – Int, Int -> None, if supplied will be called for every file read with files_processed_so_far, total_files. This is only applicable to non-lazy loads, ignored when using dask.
patch_url (Callable[[str], str]) – If supplied, will be used to patch/sign the url(s), as required to access some commercial archives (e.g. Microsoft Planetary Computer).
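The two callback parameters have simple shapes, which the sketch below illustrates. Both factory functions and the token value are hypothetical: a real archive would supply its own signing function, and appending a query-string token is only one assumed way of patching a URL.

```python
def make_progress_cbk(log):
    """Create an (int, int) -> None callback that records progress."""
    def progress_cbk(files_processed_so_far, total_files):
        log.append(f"{files_processed_so_far}/{total_files} files read")
    return progress_cbk


def make_patch_url(token):
    """Create a str -> str callback that appends an access token."""
    def patch_url(url):
        # Use '&' when the URL already carries a query string.
        separator = "&" if "?" in url else "?"
        return f"{url}{separator}{token}"
    return patch_url


log = []
progress_cbk = make_progress_cbk(log)
patch_url = make_patch_url("sig=EXAMPLE")  # placeholder token

# Simulate what a loader would do for two files.
progress_cbk(1, 2)
progress_cbk(2, 2)
signed = patch_url("https://example.com/scene.tif")
```

Both callables would then be passed to load() as progress_cbk=progress_cbk and patch_url=patch_url.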
- Returns
Requested data in an xarray.Dataset
- Return type