# Virtual Products

## Introduction

Virtual products enable ODC users to declaratively combine data from multiple products and to perform on-the-fly computation while the data is loaded. The workflow is deduced from a lightweight configuration that helps datacube optimize the query and avoid loading data that would eventually be discarded.

An example virtual product would be a cloud-free surface reflectance (SR) product derived from a base surface reflectance product and a pixel quality (PQ) product that classifies cloud. Virtual products are especially useful when the datasets from the different products have the same spatio-temporal extent and the operations are to be applied pixel-by-pixel.

Functionality related to virtual products lives mainly in the `datacube.virtual` module.

## Design

Virtual products are constructed by applying a set of combinators to either existing datacube products or other virtual products. That is, a virtual product can be viewed as a tree whose nodes are combinators and leaves are ordinary datacube products.

Continuing the example in the previous section, consider the configuration (or the “recipe”) for a cloud-free SR product built from the SR products of two sensors (`ls7_nbar_albers` and `ls8_nbar_albers`) and their corresponding PQ products (`ls7_pq_albers` and `ls8_pq_albers`):

```python
from datacube.virtual import construct_from_yaml

cloud_free_ls_nbar = construct_from_yaml("""
    collate:
        - transform: apply_mask
          mask_measurement_name: pixelquality
          input:
              juxtapose:
                  - product: ls7_nbar_albers
                    measurements: [red, green, blue]
                  - transform: make_mask
                    mask_measurement_name: pixelquality
                    input:
                        product: ls7_pq_albers
                    flags:
                        blue_saturated: false
                        cloud_acca: no_cloud
                        contiguous: true
                        green_saturated: false
                        nir_saturated: false
                        red_saturated: false
                        swir1_saturated: false
                        swir2_saturated: false
        - transform: apply_mask
          mask_measurement_name: pixelquality
          input:
              juxtapose:
                  - product: ls8_nbar_albers
                    measurements: [red, green, blue]
                  - transform: make_mask
                    mask_measurement_name: pixelquality
                    input:
                        product: ls8_pq_albers
                    flags:
                        blue_saturated: false
                        cloud_acca: no_cloud
                        contiguous: true
                        green_saturated: false
                        nir_saturated: false
                        red_saturated: false
                        swir1_saturated: false
                        swir2_saturated: false
""")
```


The virtual product `cloud_free_ls_nbar` can now be used to load cloud-free SR imagery. The dataflow for loading the data reflects the tree structure of the recipe.

## Combinators

Currently, there are six combinators for creating virtual products:

### 1. product

The recipe to construct a virtual product from an existing datacube product has the form:

```
{'product': <product-name>, **settings}
```


where `settings` can include `datacube.Datacube.load()` settings such as:

- `measurements`
- `output_crs`, `resolution`, `align`
- `resampling`
- `group_by`, `fuse_func`

The product nodes are at the leaves of the virtual product syntax tree.
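
As a small sketch, a product node recipe selecting two bands and requesting a specific resampling method could read as follows (the band and resampling choices here are illustrative, not required):

```yaml
product: ls8_nbar_albers
measurements: [red, nir]
resampling: bilinear
```

Any of the load settings listed above can appear alongside the product name in this way.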

### 2. collate

This combinator concatenates observations from multiple sensors having the same set of measurements. The recipe for a collate node has the form:

```
{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>]}
```


Observations from different sensors get interlaced.
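
The interleaving can be pictured with a stdlib-only sketch (the dates are made up, and this is an illustration of the behaviour, not the datacube API): collate merges the time axes of its children into one sorted axis, with each slice coming from exactly one child.

```python
from datetime import date

# Hypothetical acquisition dates for two sensors.
ls7_times = [date(2018, 1, 3), date(2018, 1, 19)]
ls8_times = [date(2018, 1, 11), date(2018, 1, 27)]

# Collate behaves as if the two time axes were merged into a single
# sorted axis; each output time slice originates from one input product.
collated = sorted(ls7_times + ls8_times)
```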

Optionally, the source product of each pixel can be captured by introducing an extra measurement into the loaded data that holds the index of the source product:

```
{'collate': [<virtual-product-1>,
             <virtual-product-2>,
             ...,
             <virtual-product-N>],
 'index_measurement_name': <measurement-name>}
```


### 3. transform

This node applies an on-the-fly transformation to the loaded data. The recipe for a transform node has the form:

```
{'transform': <transformation-class>,
 'input': <input-virtual-product>,
 **settings}
```


where `settings` are keyword arguments to the initializer of the transformation class, which implements the `datacube.virtual.Transformation` interface:

```python
class Transformation:
    def __init__(self, **settings):
        """ Initialize the transformation object with the given settings. """

    def compute(self, data):
        """ xarray.Dataset -> xarray.Dataset """

    def measurements(self, input_measurements):
        """ Dict[str, Measurement] -> Dict[str, Measurement] """
```


ODC has a (growing) set of built-in transformations:

- `make_mask`
- `apply_mask`
- `to_float`
- `rename`
- `select`
- `expressions`

For more information on transformations, see User-defined transformations.
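
As a small sketch, a recipe applying the built-in `to_float` transformation (which converts bands to floating point, with nodata mapped to NaN) to a product might read as follows (the product and band names are illustrative):

```yaml
transform: to_float
input:
  product: ls8_nbar_albers
  measurements: [red, nir]
```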

### 4. juxtapose

This node merges disjoint sets of measurements from different products into one. The form of the recipe is:

```
{'juxtapose': [<virtual-product-1>,
               <virtual-product-2>,
               ...,
               <virtual-product-N>]}
```


Observations without corresponding entries in the other products are dropped.
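
For instance, a juxtapose node could bundle SR bands with a second product covering the same extent (the second product name here is purely hypothetical; the measurement sets must be disjoint):

```yaml
juxtapose:
  - product: ls8_nbar_albers
    measurements: [red, nir]
  - product: some_elevation_product   # hypothetical product with its own measurements
```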

### 5. aggregate

This combinator performs a (partial) temporal reduction, that is, statistics, on the loaded data. The recipe has the form:

```
{'aggregate': <transformation-class>,
 'group_by': <grouping-function>,
 'input': <input-virtual-product>,
 **settings}
```


As in `transform`, the settings are keyword arguments to the initializer of the transformation class that performs the time reduction. The grouping function takes the timestamp of an input dataset and returns the timestamp of the group the dataset belongs to. Some grouping functions (`year`, `month`, `week`, `day`) are built-in.
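
A custom grouping function can be sketched as follows; this one assigns every observation to the start of its calendar month, similar in spirit to the built-in `month` grouping (the function name is our own):

```python
from datetime import datetime

# Map the timestamp of an input dataset to the timestamp of the group
# it belongs to: here, the first instant of its calendar month.
def month_start(timestamp):
    return datetime(timestamp.year, timestamp.month, 1)
```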

The only built-in statistics class in ODC at the moment is `xarray_reduction`, which applies a reducing method of the `xarray.DataArray` object to each individual band. Users can define their own aggregate transformations as described in User-defined transformations.
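
As a sketch, and assuming `xarray_reduction` accepts a `method` setting naming the `xarray.DataArray` reduction to apply (product and band names are illustrative), a monthly mean recipe could look like:

```yaml
aggregate: xarray_reduction
group_by: month
method: mean
input:
  product: ls8_nbar_albers
  measurements: [red]
```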

### 6. reproject

This performs an on-the-fly reprojection of loaded data to a specified CRS and resolution. The recipe looks like:

```
{'reproject': {'output_crs': <crs-string>,
               'resolution': [<y-resolution>, <x-resolution>],
               'align': [<y-alignment>, <x-alignment>]},
 'input': <input-virtual-product>,
 'resampling': <resampling-settings>}
```


Here `align` and `resampling` are optional; they default to edge alignment and nearest-neighbour resampling, respectively. This combinator makes it possible to transform raster data on its native grid before aligning different rasters to a common grid.
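
As a concrete sketch (the CRS, resolution, and product name are illustrative), a recipe that reprojects a product to a 30 m Australian Albers grid with average resampling could read:

```yaml
reproject:
  output_crs: EPSG:3577
  resolution: [-30, 30]
input:
  product: ls8_nbar_albers
  measurements: [red, nir]
resampling: average
```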

## Using virtual products

Virtual products provide a common interface to query and then to load the data. The relevant methods are:

- `query(dc, **search_terms)`: retrieves the datasets that match the `search_terms` from the database index of the datacube instance `dc`.
- `group(datasets, **group_settings)`: groups the datasets from `query` by their timestamps. Does not connect to the database.
- `fetch(grouped, **load_settings)`: loads the data from the grouped datasets according to `load_settings`. Does not connect to the database. The on-the-fly transformations are applied at this stage. To load data lazily using dask, the preferred `dask_chunks` size can be specified in `load_settings`.

Currently, virtual products also provide a `load(dc, **query)` method that roughly corresponds to `dc.load`. However, this method exists only to facilitate code migration, and its extensive use is not recommended. It simply applies `query`, `group`, and `fetch` in sequence.

For advanced use cases, the intermediate objects `VirtualDatasetBag` and `VirtualDatasetBox` may be manipulated directly.

## User-defined transformations

Custom transformations must inherit from `datacube.virtual.Transformation`. If the user-defined transformation class is already installed in the Python environment the datacube instance runs in, the recipe may refer to it by its fully qualified name. Otherwise, for example for a transformation defined in a notebook, the virtual product using the custom transformation is best constructed by applying the combinators directly.

For example, calculating the NDVI from an SR product (say, `ls8_nbar_albers`) would look like:

```python
from datacube.virtual import construct, Transformation, Measurement

class NDVI(Transformation):
    def compute(self, data):
        result = (data.nir - data.red) / (data.nir + data.red)
        return result.to_dataset(name='NDVI')

    def measurements(self, input_measurements):
        return {'NDVI': Measurement(name='NDVI', dtype='float32',
                                    nodata=float('nan'), units='1')}

ndvi = construct(transform=NDVI,
                 input=dict(product='ls8_nbar_albers', measurements=['red', 'nir']))

ndvi_data = ndvi.load(dc, **search_terms)
```


for the required geo-spatial `search_terms`. Note that the `measurements` method describes the output of the `compute` method.

> **Note**
>
> User-defined transformations are assumed to be dask-friendly; otherwise, loading data with dask may fail. Also, method names starting with `_transform_` are reserved for internal use.