Dataset Searching & Querying

Finding Datasets

To search for the individual datasets that belong to a product, use the find_datasets method of a Datacube instance.

For example, we can retrieve a single dataset from the ls9_sr product by setting limit=1:

[1]:
import datacube

dc = datacube.Datacube(app="my_analysis")

datasets = dc.find_datasets(product="ls9_sr", limit=1)
datasets
[1]:
[Dataset <id=d853931f-f37d-5ed0-98a9-20753caf97f8 product=ls9_sr location=s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/042/LC09_L2SP_177042_20220304_20220306_02_T1/LC09_L2SP_177042_20220304_20220306_02_T1_SR_stac.json>]

We can also search for datasets within a specific spatial extent or time period. To do this, we supply a spatiotemporal query (i.e. a range of x- and y-coordinates defining the area to search, and a range of times). Unless a crs is specified, the x and y ranges are interpreted as longitude and latitude in degrees.

dc.find_datasets() then returns only the datasets that match this query:

[2]:
datasets = dc.find_datasets(
    product="ls9_sr",
    x=(29.0, 29.01),
    y=(25.0, 25.01),
    time=("2022-01-01", "2022-02-01")
)
datasets
[2]:
[Dataset <id=8a7ae87d-2032-527f-93af-bb6a59c4f972 product=ls9_sr location=s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/043/LC09_L2SP_177043_20220131_20220202_02_T1/LC09_L2SP_177043_20220131_20220202_02_T1_SR_stac.json>,
 Dataset <id=e83c49c0-a10a-57e4-846b-e07e2ebe1a74 product=ls9_sr location=s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/043/LC09_L2SP_177043_20220115_20220118_02_T1/LC09_L2SP_177043_20220115_20220118_02_T1_SR_stac.json>]
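
When the area of interest is known only as a centre point, the x and y ranges can be derived from that point and a buffer. The helper below is a minimal sketch of this idea; point_query and buffer_deg are hypothetical names chosen for illustration and are not part of the datacube API.

```python
def point_query(product, lon, lat, buffer_deg, time_range):
    """Build keyword arguments for find_datasets around a centre point.

    point_query and buffer_deg are illustrative names, not datacube API.
    Coordinates are assumed to be longitude/latitude in degrees.
    """
    if buffer_deg <= 0:
        raise ValueError("buffer_deg must be positive")
    return {
        "product": product,
        "x": (lon - buffer_deg, lon + buffer_deg),
        "y": (lat - buffer_deg, lat + buffer_deg),
        "time": time_range,
    }


query = point_query(
    "ls9_sr",
    lon=29.0,
    lat=25.0,
    buffer_deg=0.5,
    time_range=("2022-01-01", "2022-02-01"),
)

# With a configured datacube, the query could then be unpacked as:
# datasets = dc.find_datasets(**query)
```

Keeping the query in a dictionary makes it easy to reuse the same extent and time range across several products.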

Inspecting Datasets

Dataset objects contain important metadata that are required for loading and interpreting datacube data. These include the dataset’s URIs:

[3]:
datasets[0].uris
[3]:
['s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/043/LC09_L2SP_177043_20220131_20220202_02_T1/LC09_L2SP_177043_20220131_20220202_02_T1_SR_stac.json']
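
If we need the bucket and object key separately (for example, to pass them to an S3 client), the URI can be split with the Python standard library. This is a sketch only; split_s3_uri is a hypothetical helper, and the URI is copied from the output above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key). Illustrative helper only."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # The bucket is the network location; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


uri = (
    "s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/043/"
    "LC09_L2SP_177043_20220131_20220202_02_T1/"
    "LC09_L2SP_177043_20220131_20220202_02_T1_SR_stac.json"
)
bucket, key = split_s3_uri(uri)
print(bucket)
print(key)
```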

A dictionary of the measurements available within the dataset, keyed by measurement name:

[4]:
datasets[0].measurements
[4]:
{'SR_B1': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B1.TIF'},
 'SR_B2': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B2.TIF'},
 'SR_B3': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B3.TIF'},
 'SR_B4': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B4.TIF'},
 'SR_B5': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B5.TIF'},
 'SR_B6': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B6.TIF'},
 'SR_B7': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_B7.TIF'},
 'QA_PIXEL': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_QA_PIXEL.TIF'},
 'QA_RADSAT': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_QA_RADSAT.TIF'},
 'SR_QA_AEROSOL': {'path': 'LC09_L2SP_177043_20220131_20220202_02_T1_SR_QA_AEROSOL.TIF'}}
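
The measurement paths shown above are relative file names. Assuming they are relative to the directory that holds the dataset's metadata document, a full location for each band can be sketched as follows. resolve_measurement_uris is a hypothetical helper, and absolute paths are not handled. Plain string handling is used because urllib.parse.urljoin does not resolve relative paths for the s3 scheme.

```python
def resolve_measurement_uris(dataset_uri, measurements):
    """Join relative measurement paths onto the dataset URI's directory.

    Illustrative helper only; assumes every 'path' value is relative to
    the directory containing the metadata document.
    """
    base = dataset_uri.rsplit("/", 1)[0]
    return {name: f"{base}/{info['path']}" for name, info in measurements.items()}


dataset_uri = (
    "s3://deafrica-landsat/collection02/level-2/standard/oli-tirs/2022/177/043/"
    "LC09_L2SP_177043_20220131_20220202_02_T1/"
    "LC09_L2SP_177043_20220131_20220202_02_T1_SR_stac.json"
)
# A two-band subset of the dictionary printed above.
measurements = {
    "SR_B4": {"path": "LC09_L2SP_177043_20220131_20220202_02_T1_SR_B4.TIF"},
    "QA_PIXEL": {"path": "LC09_L2SP_177043_20220131_20220202_02_T1_QA_PIXEL.TIF"},
}
band_uris = resolve_measurement_uris(dataset_uri, measurements)
print(band_uris["SR_B4"])
```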

The dataset’s native coordinate reference system (CRS) and geotransform:

[5]:
datasets[0].crs
[5]:
CRS('epsg:32635')
[6]:
datasets[0].transform
[6]:
Affine(226830.0, 0.0, 581385.0,
       0.0, -231030.0, 2831715.0)
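
To see what the geotransform encodes, we can apply its six coefficients by hand. The sketch below uses plain arithmetic rather than the affine package, and apply_affine is a hypothetical helper. The coefficients are copied from the output above; the interpretation in the comments (that grid coordinates (0, 0) and (1, 1) map to opposite corners of the footprint) is our reading of those values, not something the library states.

```python
def apply_affine(coeffs, col, row):
    """Map grid coordinates (col, row) to CRS coordinates (x, y).

    coeffs is (a, b, c, d, e, f), in the order printed by Affine(...):
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


# Coefficients copied from the transform printed above (EPSG:32635, metres).
coeffs = (226830.0, 0.0, 581385.0, 0.0, -231030.0, 2831715.0)

upper_left = apply_affine(coeffs, 0, 0)
lower_right = apply_affine(coeffs, 1, 1)
print(upper_left)   # (581385.0, 2831715.0)
print(lower_right)  # (808215.0, 2600685.0)

# With the real object, the same mapping is: datasets[0].transform * (1, 1)
```

The negative fifth coefficient means that y decreases as the row index increases, which is the usual convention for north-up rasters.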

Other important metadata fields that can be used to query and search for data can be accessed using the metadata property:

[7]:
dir(datasets[0].metadata)
[7]:
['cloud_cover',
 'collection_category',
 'creation_dt',
 'creation_time',
 'crs_raw',
 'data_coverage',
 'eo_gsd',
 'eo_sun_azimuth',
 'eo_sun_elevation',
 'format',
 'grid_spatial',
 'id',
 'instrument',
 'label',
 'lat',
 'lon',
 'measurements',
 'platform',
 'product_family',
 'region_code',
 'rmse',
 'rmse_x',
 'rmse_y',
 'sat_orbit_state',
 'sat_relative_orbit',
 'sources',
 'time']
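
These fields can be read as attributes of the metadata property, which lets us filter search results in Python. The sketch below keeps datasets at or below a cloud cover threshold. filter_by_cloud_cover is a hypothetical helper, cloud_cover is assumed to be a percentage, and SimpleNamespace objects stand in for real Dataset objects so that the example runs without a datacube connection.

```python
from types import SimpleNamespace


def filter_by_cloud_cover(datasets, max_cloud_cover):
    """Keep datasets whose metadata.cloud_cover is at or below a threshold.

    Illustrative helper only. Datasets with no cloud_cover value are dropped.
    """
    kept = []
    for ds in datasets:
        cloud = getattr(ds.metadata, "cloud_cover", None)
        if cloud is not None and cloud <= max_cloud_cover:
            kept.append(ds)
    return kept


# Stand-ins for Dataset objects, with made-up cloud cover values.
fake_datasets = [
    SimpleNamespace(metadata=SimpleNamespace(cloud_cover=value))
    for value in (2.5, 40.0, None)
]
clear = filter_by_cloud_cover(fake_datasets, max_cloud_cover=10)
print(len(clear))

# With real search results: clear = filter_by_cloud_cover(datasets, 10)
```

Depending on how the product is indexed, it may also be possible to pass such a field directly to the search, for example dc.find_datasets(product="ls9_sr", cloud_cover=(0, 10)), which would avoid retrieving datasets that are then discarded.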