API reference

For examples of how to use the API, check out the example Jupyter notebooks.

Datacube Class

Datacube([index, config, app, env, …])

Interface to search, read and write a datacube.

Data Discovery

Datacube.list_products

List products in the datacube

Datacube.list_measurements

List measurements for each product

Data Loading

Datacube.load

Load data as an xarray object.

Internal Loading Functions

These operations are useful if you need to customise the loading process, for example to pre-filter the available datasets before loading.

Datacube.find_datasets(**search_terms)

Search the index and return all datasets for a product matching the search terms.

Datacube.group_datasets(datasets, group_by)

Group datasets along defined non-spatial dimensions (i.e. time).

Datacube.load_data(sources, geobox, measurements)

Load data from group_datasets() into an xarray.Dataset.

Grid Processing API

Tile(sources, geobox)

The Tile object holds a lightweight representation of a datacube result.

GridWorkflow(index[, grid_spec, product])

GridWorkflow deals with cell- and tile-based processing using a grid defining a projection and resolution.

GridWorkflow.list_cells([cell_index])

List cells that match the query.

GridWorkflow.list_tiles([cell_index])

List tiles of data, sorted by cell.

GridWorkflow.load(tile[, measurements, …])

Load data for a cell/tile.

Grid Processing API Internals

GridWorkflow.cell_observations([cell_index, …])

List datasets, grouped by cell.

GridWorkflow.group_into_cells(observations, …)

Group observations into a stack of source tiles.

GridWorkflow.tile_sources(observations, group_by)

Split observations into tiles and group them into source tiles.

Internal Data Model

Dataset(type_, metadata_doc[, uris, …])

A Dataset.

Measurement([canonical_name])

Describes a single data variable of a Product or Dataset.

MetadataType(definition[, …])

Metadata Type definition

DatasetType(metadata_type, definition[, id_])

Product definition

Range(begin, end)

Database Index API

Dataset Querying

When connected to an ODC Database, these methods are available for searching and querying:

dc = Datacube()
dc.index.datasets.{method}

get

Get dataset by id

search

Perform a search, returning results as Dataset objects.

search_by_metadata

Perform a search using arbitrary metadata, returning results as Dataset objects.

search_by_product

Perform a search, returning datasets grouped by product type.

search_eager

Perform a search, returning results as Dataset objects.

search_product_duplicates

Find dataset ids that have duplicates for the given set of field names.

search_returning

Perform a search, returning only the specified fields.

search_summaries

Perform a search, returning just the search fields of each dataset.

has

Have we already indexed this dataset?

bulk_has

Like has but operates on a list of ids.

can_update

Check if dataset can be updated.

count

Perform a search, returning count of results.

count_by_product

Perform a search, returning a count for each matching product type.

count_by_product_through_time

Perform a search, returning counts for each product grouped in time slices of the given period.

count_product_through_time

Perform a search, returning counts for a single product grouped in time slices of the given period.

get_derived

Get all derived datasets

get_field_names

Get the list of possible search fields for a Product

get_locations

Get the list of storage locations for the given dataset id

get_archived_locations

Find locations which have been archived for a dataset

get_datasets_for_location

Find datasets that exist at the given URI

Dataset Writing

When connected to an ODC Database, these methods are available for adding, updating and archiving datasets:

dc = Datacube()
dc.index.datasets.{method}

add

Add dataset to the index.

add_location

Add a location to the dataset if it doesn’t already exist.

archive

Mark datasets as archived

archive_location

Archive a location of the dataset if it exists.

remove_location

Remove a location from the dataset if it exists.

restore

Mark datasets as not archived

restore_location

Un-archive a location of the dataset if it exists.

update

Update dataset metadata and location.

Product Querying

When connected to an ODC Database, these methods are available for discovering information about Products:

dc = Datacube()
dc.index.products.{method}

from_doc

Create a Product from its definition.

add

Add a Product.

can_update

Check if product can be updated.

add_document

Add a Product using its definition

get

Retrieve Product by id

get_by_name

Retrieve Product by name

get_unsafe

Retrieve Product by id, raising if not found.

get_by_name_unsafe

Retrieve Product by name, raising if not found.

get_with_fields

Return dataset types that have all the given fields.

search

Return dataset types that have all the given fields.

search_robust

Return dataset types that match the matchable fields, along with a dict of the remaining unmatchable fields.

get_all

Retrieve all Products

Product Addition/Modification

When connected to an ODC Database, these methods are available for adding and updating Products:

dc = Datacube()
dc.index.products.{method}

from_doc

Create a Product from its definition.

add

Add a Product.

update

Update a product.

update_document

Update a Product using its definition

add_document

Add a Product using its definition

Database Index Connections

index.index_connect

Create a Data Cube Index that can connect to a PostgreSQL server

index.Index

Access to the datacube index.

Dataset to Product Matching

Doc2Dataset

Used for constructing Dataset objects from plain metadata documents.

Geometry Utilities

Open Data Cube includes a set of CRS aware geometry utilities.

Geometry Classes

utils.geometry.Coordinate

utils.geometry.BoundingBox

Bounding box, defining extent in cartesian coordinates.

utils.geometry.CRS

Wrapper around pyproj.CRS for backwards compatibility.

utils.geometry.Geometry

2D Geometry with CRS

utils.geometry.GeoBox

Defines the location and resolution of a rectangular grid of data, including its CRS.

utils.geometry.gbox.GeoboxTiles

Partition GeoBox into sub geoboxes

model.GridSpec

Definition for a regular spatial grid

utils.geometry.CRSError

Raised when a CRS error occurs.

utils.geometry.CRSMismatchError

Raised when geometry operation is attempted on geometries in different coordinate references.

Tools

Creating Geometries

point(x, y, crs)

Create a 2D Point

multipoint(coords, crs)

Create a 2D MultiPoint Geometry

line(coords, crs)

Create a 2D LineString (Connected set of lines)

multiline(coords, crs)

Create a 2D MultiLineString (Multiple disconnected sets of lines)

polygon(outer, crs, *inners)

Create a 2D Polygon

multipolygon(coords, crs)

Create a 2D MultiPolygon

multigeom(geoms)

Construct Multi{Polygon|LineString|Point}

box(left, bottom, right, top, crs)

Create a 2D Box (Polygon)

sides(poly)

Returns a sequence of Geometry[Line] objects.

polygon_from_transform(width, height, …)

Create a 2D Polygon from an affine transform

Multi-geometry ops

unary_union(geoms)

Compute the union of multiple (multi)polygons efficiently.

unary_intersection(geoms)

Compute the intersection of multiple (multi)polygons.

bbox_union(bbs)

Given a stream of bounding boxes, compute the enclosing BoundingBox.

bbox_intersection(bbs)

Given a stream of bounding boxes, compute the overlap BoundingBox.

lonlat_bounds(geom[, mode, resolution])

Return the bounding box of a geometry

projected_lon(crs, lon[, lat, step])

Project vertical line along some longitude into given CRS.

clip_lon180(geom[, tol])

For every point in the lon=180|-180 band, clip to either 180 or -180; the choice is based on where the majority of the other points lie.

chop_along_antimeridian(geom[, precision])

Chop a geometry along the antimeridian

Other Utilities

assign_crs(xx[, crs, crs_coord_name])

Assign CRS for a non-georegistered array or dataset.

crs_units_per_degree(crs, lon[, lat, step])

Compute number of CRS units per degree for a projected CRS at a given location in lon/lat.

geobox_union_conservative(geoboxes)

Union of geoboxes.

geobox_intersection_conservative(geoboxes)

Intersection of geoboxes.

scaled_down_geobox(src_geobox, scaler)

Given a source geobox and integer scaler compute geobox of a scaled down image.

intersects(a, b)

Returns True if geometries intersect, else False

common_crs(geoms)

Return CRS common across geometries, or raise CRSMismatchError

is_affine_st(A[, tol])

True if Affine transform has scale and translation components only.

apply_affine(A, x, y)

Broadcast A*(x_i, y_i) across all elements of the x/y arrays of any shape (usually a 2D image).

roi_boundary(roi[, pts_per_side])

Get boundary points from a 2d roi.

roi_is_empty(roi)

roi_is_full(roi, shape)

Check if ROI covers the entire region.

roi_intersect(a, b)

Compute intersection of two ROIs

roi_shape(roi)

roi_normalise(roi, shape)

Fill in missing .start/.stop and handle negative values, which are treated as offsets from the end.

roi_from_points(xy, shape[, padding, align])

Compute the envelope around a set of points and return it as an ROI (a tuple of row/col slices).

roi_center(roi)

Return center point of roi

roi_pad(roi, pad, shape)

Pad ROI on each side, with clamping (0,..) -> shape

scaled_down_shape(shape, scale)

scaled_down_roi(roi, scale)

scaled_up_roi(roi, scale[, shape])

decompose_rws(A)

Compute decomposition Affine matrix sans translation into Rotation, Shear and Scale.

affine_from_pts(X, Y)

Given points X,Y compute A, such that: Y = A*X.

get_scale_at_point(pt, tr[, r])

Given an arbitrary locally linear transform estimate scale change around a point.

native_pix_transform(src, dst)

Construct a pixel-coordinate transform from src to dst; .back goes the other way, and .linear holds an Affine transform (or None) when the src-to-dst mapping is linear.

compute_reproject_roi(src, dst[, tol, …])

Given two GeoBoxes find the region within the source GeoBox that overlaps with the destination GeoBox, and also compute the scale factor (>1 means shrink).

split_translation(t)

Split translation into pixel aligned and sub-pixel components.

compute_axis_overlap(Ns, Nd, s, t)

s and t define a linear transform from destination coordinate space to source: x_s = s * x_d + t.

w_

Translate numpy slices to rasterio window tuples.

warp_affine(src, dst, A, resampling[, …])

Perform Affine warp using best available backend (GDAL via rasterio is the only one so far).

rio_reproject(src, dst, s_gbox, d_gbox, …)

Perform reproject from ndarray->ndarray using rasterio as backend library.

Masking

masking.mask_invalid_data(data[, keep_attrs])

Sets all nodata values to nan.

masking.describe_variable_flags(variable[, …])

Return either a pandas DataFrame (with_pandas=True, the default) or a string (with_pandas=False) describing the available flags for a masking variable.

masking.make_mask(variable, **flags)

Returns a mask array, based on provided flags

Writing Image Files

write_cog(geo_im, fname[, blocksize, …])

Save xarray.DataArray to a file in Cloud Optimized GeoTiff format.

to_cog(geo_im[, blocksize, ovr_blocksize, …])

Compress xarray.DataArray into Cloud Optimized GeoTiff bytes in memory.

AWS Utilities

s3_client([profile, creds, region_name, …])

Construct s3 client with configured region_name.

s3_open(url[, s3, range])

Open whole or part of S3 object

s3_head_object(url[, s3])

Head object, return object metadata.

s3_fetch(url[, s3, range])

Read all or part of an object into memory and return it as bytes.

s3_dump(data, url[, s3])

Write data to s3 object.

s3_url_parse(url)

Return a (bucket, key) tuple.

auto_find_region([session, default])

Try to figure out which region name to use

get_aws_settings([profile, region_name, …])

Compute aws= parameter for set_default_rio_config.

get_creds_with_retry(session[, max_tries, sleep])

Attempt to obtain credentials up to max_tries times, with back-off: sleep seconds after the first failure, doubling on every consecutive failure.

mk_boto_session([profile, creds, region_name])

Get botocore session with correct region configured

ec2_current_region()

Returns name of the region this EC2 instance is running in.

ec2_metadata([timeout])

When running inside AWS returns dictionary describing instance identity.

configure_s3_access([profile, region_name, …])

Credentialize for S3 bucket access or configure public access.

Dask Utilities

start_local_dask([n_workers, …])

Wrapper around distributed.Client(..) constructor that deals with memory better.

partition_map(n, func, its[, name])

Parallel map in lumps.

pmap(func, its, client[, lump, …])

Parallel map with back pressure.

compute_tasks(tasks, client[, max_in_flight])

Parallel compute stream with back pressure.

save_blob_to_file(data, fname[, with_deps])

Dump from memory to local filesystem as a dask delayed operation.

save_blob_to_s3(data, url[, profile, creds, …])

Dump from memory to S3 as a dask delayed operation.

Query Class

Query([index, product, geopolygon, like])

solar_day(dataset[, longitude])

Adjust Dataset timestamps for "local time" given a location, and convert to numpy datetime64.

solar_offset(geom[, precision])

Given a geometry or a Dataset compute offset to add to UTC timestamp to get solar day right.

User Configuration

LocalConfig(config[, files_loaded, env])

System configuration for the user.

DEFAULT_CONF_PATHS

Config locations in order.

Everything Else

For exploratory data analysis, see the Datacube Class section for more details.

For writing large-scale workflows, see the Grid Processing API section for more details.