Database and Indexing

Public Methods

Modules for interfacing with the index/database.

datacube.index.index_connect(local_config=None, application_name=None, validate_connection=True)

Connect to the index. Default Postgres implementation.

Parameters:
  • application_name – A short, alphanumeric name to identify this application.
  • local_config (datacube.config.LocalConfig, optional) – Config object to use.
  • validate_connection – Validate database connection and schema immediately
Raises:

datacube.index.postgres._api.EnvironmentError

Return type:

Index
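
A minimal usage sketch of connecting, querying, and closing (the application and product names are illustrative; a configured datacube environment is assumed):

```python
def count_datasets(product_name):
    """Connect to the default index, count datasets for a product, then close."""
    from datacube.index import index_connect  # assumes datacube is installed and configured
    index = index_connect(application_name='doc-example')
    try:
        return index.datasets.count(product=product_name)
    finally:
        # Close idle connections when done with the index.
        index.close()
```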

Private Indexing Modules

Access methods for indexing datasets & products.

class datacube.index._api.Index(db)[source]

Access to the datacube index.

Thread safe, but not multiprocess safe once a connection is made (database connections cannot be shared between processes). You can close idle connections before forking by calling close(), provided you know no other connections are active. Otherwise, use a separate instance of this class in each process.

close()[source]

Close any idle database connections.

This is good practice if you are keeping the Index instance in scope but won't be using it for a while.

(Connections are normally closed automatically when this object is deleted, i.e. when no references to it remain.)

API for dataset indexing, access and search.

class datacube.index._datasets.DatasetResource(db, dataset_type_resource)[source]
add(dataset, skip_sources=False, sources_policy='verify')[source]

Ensure a dataset is in the index. Add it if not present.

Parameters:
  • dataset (datacube.model.Dataset) – dataset to add
  • sources_policy (str) – one of ‘verify’ (verify the metadata), ‘ensure’ (add if it doesn’t exist), ‘skip’ (skip source datasets entirely)
  • skip_sources (bool) – don’t attempt to index source datasets (use when sources are already indexed)
Return type:

datacube.model.Dataset
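
A hedged sketch of indexing a dataset with add() (a hypothetical helper; assumes an Index instance and a datacube.model.Dataset already resolved from metadata):

```python
def index_dataset(index, dataset):
    """Add a datacube.model.Dataset to the index, verifying source metadata.

    `index` is an Index instance and `dataset` a datacube.model.Dataset;
    both are assumed to exist already.
    """
    # 'verify' (the default) checks the metadata of already-indexed sources,
    # 'ensure' adds missing sources, and 'skip' ignores them entirely.
    return index.datasets.add(dataset, sources_policy='verify')
```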

add_location(id_, uri)[source]

Add a location to the dataset if it doesn’t already exist.

Parameters:
  • id_ (typing.Union[UUID, str]) – dataset id
  • uri (str) – fully qualified uri
Returns bool:

Was one added?

archive(ids)[source]

Mark datasets as archived

Parameters:ids (list[UUID]) – list of dataset ids to archive
archive_location(id_, uri)[source]

Archive a location of the dataset if it exists.

Parameters:
  • id_ (typing.Union[UUID, str]) – dataset id
  • uri (str) – fully qualified uri
Returns bool:

Whether the location could be archived

can_update(dataset, updates_allowed=None)[source]

Check if the dataset can be updated. Returns (bool, safe_changes, unsafe_changes).

Parameters:
  • dataset (datacube.model.Dataset) – dataset to check
  • updates_allowed – Allowed updates
Return type:

(bool, list[change], list[change])

count(**query)[source]

Perform a search, returning count of results.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Return type:int
count_by_product(**query)[source]

Perform a search, returning a count for each matching product type.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Returns:Sequence of (product, count)
Return type:__generator[(datacube.model.DatasetType, int)]
count_by_product_through_time(period, **query)[source]

Perform a search, returning counts for each product grouped in time slices of the given period.

Parameters:
  • query (dict[str,str|float|datacube.model.Range]) –
  • period (str) – Time range for each slice: ‘1 month’, ‘1 day’ etc.
Returns:

For each matching product type, a list of time ranges and their count.

Return type:

__generator[(datacube.model.DatasetType, list[((datetime.datetime, datetime.datetime), int)])]
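
The time-sliced counting above can be sketched as follows (a hypothetical helper; assumes an Index instance):

```python
def monthly_counts(index, **query):
    """Yield (product_name, [(time_range, count), ...]) in monthly slices."""
    results = index.datasets.count_by_product_through_time('1 month', **query)
    for product, counts in results:
        # `product` is a datacube.model.DatasetType; `counts` pairs each
        # (start, end) time range with the number of matching datasets.
        yield product.name, list(counts)
```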

count_product_through_time(period, **query)[source]

Perform a search, returning counts for a single product grouped in time slices of the given period.

Will raise an error if the search terms match more than one product.

Parameters:
  • query (dict[str,str|float|datacube.model.Range]) –
  • period (str) – Time range for each slice: ‘1 month’, ‘1 day’ etc.
Returns:

A list of time ranges and the dataset count within each.

Return type:

list[(str, list[((datetime.datetime, datetime.datetime), int)])]

get(id_, include_sources=False)[source]

Get dataset by id

Parameters:
  • id_ (UUID) – id of the dataset to retrieve
  • include_sources (bool) – get the full provenance graph?
Return type:

datacube.model.Dataset

get_archived_locations(id_)[source]
Parameters:id_ (typing.Union[UUID, str]) – dataset id
Return type:list[str]
get_derived(id_)[source]

Get all datasets derived from the given dataset.

Parameters:id_ (UUID) – dataset id
Return type:list[datacube.model.Dataset]
get_field_names(type_name=None)[source]
Parameters:type_name (str) –
Return type:set[str]
get_locations(id_)[source]
Parameters:id_ (typing.Union[UUID, str]) – dataset id
Return type:list[str]
has(id_)[source]

Have we already indexed this dataset?

Parameters:id_ (typing.Union[UUID, str]) – dataset id
Return type:bool
remove_location(id_, uri)[source]

Remove a location from the dataset if it exists.

Parameters:
  • id_ (typing.Union[UUID, str]) – dataset id
  • uri (str) – fully qualified uri
Returns bool:

Was one removed?

restore(ids)[source]

Mark datasets as not archived

Parameters:ids (list[UUID]) – list of dataset ids to restore
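
The archive()/restore() pair can be sketched as a round trip (a hypothetical helper; assumes an Index instance and a list of dataset UUIDs):

```python
def archive_round_trip(index, ids):
    """Archive the given dataset ids, then restore them again."""
    index.datasets.archive(ids)   # archived datasets drop out of normal searches
    index.datasets.restore(ids)   # un-archive the same datasets
```
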
restore_location(id_, uri)[source]

Un-archive a location of the dataset if it exists.

Parameters:
  • id_ (typing.Union[UUID, str]) – dataset id
  • uri (str) – fully qualified uri
Returns bool:

Whether the location could be restored

search(**query)[source]

Perform a search, returning results as Dataset objects.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Return type:__generator[datacube.model.Dataset]
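
Search terms are plain keyword arguments; range-valued fields (such as time or lat) take a datacube.model.Range, which is a (begin, end) namedtuple. A self-contained sketch of building a query, with a local namedtuple standing in for datacube.model.Range and an illustrative product name:

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for datacube.model.Range (a (begin, end) namedtuple).
Range = namedtuple('Range', ('begin', 'end'))

query = {
    'product': 'ls8_example_albers',  # illustrative product name
    'time': Range(datetime(2017, 1, 1), datetime(2017, 2, 1)),
    'lat': Range(-35.5, -35.0),
}
# With a real index: results = index.datasets.search(**query)
```
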
search_by_metadata(metadata)[source]

Perform a search using arbitrary metadata, returning results as Dataset objects.

Caution – slow! This will usually not use indexes.

Parameters:metadata (dict) –
Return type:list[datacube.model.Dataset]
search_by_product(**query)[source]

Perform a search, returning datasets grouped by product type.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Return type:__generator[(datacube.model.DatasetType, __generator[datacube.model.Dataset])]
search_eager(**query)[source]

Perform a search, returning results as Dataset objects.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Return type:list[datacube.model.Dataset]
search_product_duplicates(product, *group_fields)[source]

Find the ids of datasets that have duplicate values for the given set of fields.

Product is always inserted as the first grouping field.

Returns each set of those field values and the datasets that have them.

search_returning(field_names, **query)[source]

Perform a search, returning only the specified fields.

This method can be faster than normal search() if you don’t need all fields of each dataset.

It also allows for returning rows other than datasets, such as a row per uri when requesting field ‘uri’.

Parameters:
  • field_names (tuple[str]) –
  • query (dict[str,str|float|datacube.model.Range]) –
Returns __generator[tuple]:

sequence of results; each result is a namedtuple of the requested fields
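
A sketch of requesting only a couple of fields (a hypothetical helper; assumes an Index instance):

```python
def dataset_ids_and_uris(index, **query):
    """Collect (id, uri) pairs without materialising full Dataset objects."""
    # Each result is a namedtuple whose attributes are the requested fields.
    return [(r.id, r.uri)
            for r in index.datasets.search_returning(('id', 'uri'), **query)]
```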

search_summaries(**query)[source]

Perform a search, returning just the search fields of each dataset.

Parameters:query (dict[str,str|float|datacube.model.Range]) –
Return type:__generator[dict]
update(dataset, updates_allowed=None)[source]

Update dataset metadata and location.

Parameters:
  • dataset (datacube.model.Dataset) – Dataset to update
  • updates_allowed – Allowed updates
Return type:

datacube.model.Dataset

class datacube.index._datasets.ProductResource(db, metadata_type_resource)[source]
add(product, allow_table_lock=False)[source]

Add a Product.

Parameters:
  • allow_table_lock

    Allow an exclusive lock to be taken on the table while creating the indexes. This will halt other users’ requests until completed.

    If false, creation will be slightly slower and cannot be done in a transaction.

  • product (datacube.model.DatasetType) – Product to add
Return type:

datacube.model.DatasetType

add_document(definition)[source]

Add a Product using its definition

Parameters:definition (dict) – product definition document
Return type:datacube.model.DatasetType
can_update(product, allow_unsafe_updates=False)[source]

Check if the product can be updated. Returns (bool, safe_changes, unsafe_changes).

(An unsafe change is anything that may potentially make the product incompatible with existing datasets of that type)

Parameters:
  • product (datacube.model.DatasetType) – product to check
  • allow_unsafe_updates (bool) – Allow unsafe changes. Use with caution.
Return type:

(bool, list[change], list[change])

from_doc(definition)[source]

Create a Product from its definition

Parameters:definition (dict) – product definition document
Return type:datacube.model.DatasetType
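
A sketch of the product definition document these methods accept (all names and values are illustrative; consult the product definition schema for the full set of fields):

```python
product_definition = {
    'name': 'example_surface_reflectance',  # illustrative product name
    'description': 'Example surface reflectance product',
    'metadata_type': 'eo',
    'metadata': {'product_type': 'example_sr'},
    'measurements': [
        {'name': 'red', 'dtype': 'int16', 'nodata': -999, 'units': '1'},
        {'name': 'nir', 'dtype': 'int16', 'nodata': -999, 'units': '1'},
    ],
}
# With a real index: product = index.products.from_doc(product_definition)
```
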
get(id_)[source]

Retrieve Product by id

Parameters:id_ (int) – id of the Product
Return type:datacube.model.DatasetType
get_all()[source]

Retrieve all Products

Return type:iter[datacube.model.DatasetType]
get_by_name(name)[source]

Retrieve Product by name

Parameters:name (str) – name of the Product
Return type:datacube.model.DatasetType
get_with_fields(field_names)[source]

Return dataset types that have all the given fields.

Parameters:field_names (tuple[str]) –
Return type:__generator[DatasetType]
search(**query)[source]

Return dataset types that have all the given fields.

Parameters:query (dict) –
Return type:__generator[DatasetType]
search_robust(**query)[source]

Return dataset types that match the matchable fields, along with a dict of the remaining unmatchable fields.

Parameters:query (dict) –
Return type:__generator[(DatasetType, dict)]
update(product, allow_unsafe_updates=False, allow_table_lock=False)[source]

Update a product. Unsafe changes will throw a ValueError by default.

(An unsafe change is anything that may potentially make the product incompatible with existing datasets of that type)

Parameters:
  • product (datacube.model.DatasetType) – Product to update
  • allow_unsafe_updates (bool) – Allow unsafe changes. Use with caution.
  • allow_table_lock

    Allow an exclusive lock to be taken on the table while creating the indexes. This will halt other users’ requests until completed.

    If false, creation will be slower and cannot be done in a transaction.

Return type:

datacube.model.DatasetType

update_document(definition, allow_unsafe_updates=False, allow_table_lock=False)[source]

Update a Product using its definition

Parameters:
  • allow_unsafe_updates (bool) – Allow unsafe changes. Use with caution.
  • definition (dict) – product definition document
  • allow_table_lock

    Allow an exclusive lock to be taken on the table while creating the indexes. This will halt other users’ requests until completed.

    If false, creation will be slower and cannot be done in a transaction.

Return type:

datacube.model.DatasetType