What’s New

v1.7.1 (???)

  • New virtual product combinator reproject for on-the-fly reprojection of rasters (PR 773)
  • Enhancements to the expressions transformation in virtual products (PR 776, PR 761)
  • Support /vsi** style paths for dataset locations (PR 825)

v1.7.0 (16 May 2019)

Not a lot of changes since rc1.

  • Early exit from dc.load on KeyboardInterrupt, allowing partial loads inside a notebook.
  • Some bug fixes in geometry related code
  • Some cleanups in tests
  • Pre-commit hooks configuration for easier testing
  • Re-enable multi-threaded reads for s3aio driver. Set use_threads to True in dc.load()

v1.7.0rc1 (18 April 2019)

Virtual Products

Add Virtual Products for multi-product loading.

(PR 522, PR 597, PR 601, PR 612, PR 644, PR 677, PR 699, PR 700)

Changes to Data Loading

The internal machinery used when loading and reprojecting data has been completely rewritten. The new code has been tested, but this is a complicated and fundamental part of the codebase, and there is potential for breakage.

When loading reprojected data, the new code will produce slightly different results. We don’t believe that it is any less accurate than the old code, but you cannot expect exactly the same numeric results.

Non-reprojected loads should be identical.

This change has been made for two reasons:

  1. The reprojection is now core Data Cube, and is not the responsibility of the IO driver.
  2. When loading lower resolution data, DataCube can now take advantage of available overviews.
  • New futures-based IO driver interface (PR 686)

Other Changes

  • Allow specifying different resampling methods for different data variables of the same Product. (PR 551)
  • Allow all resampling methods supported by rasterio. (PR 622)
  • Fixed an index-out-of-bounds bug that caused ingestion failures
  • Support indexing data directly from HTTP/HTTPS/S3 URLs (PR 607)
  • Renamed the command line tool datacube metadata_type to datacube metadata (PR 692)
  • More useful output from the command line datacube {product|metadata} {show|list}
  • Added an optional progress_cbk parameter to dc.load and dc.load_data (PR 702), allowing users to monitor data loading progress.
  • Thread-safe netCDF access within dc.load (PR 705)
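
The per-band resampling (PR 551) and progress_cbk (PR 702) options above might be used together roughly as follows. This is a sketch only: the product and band names are hypothetical, and the callback signature (completed, total) is an assumption based on the PR description, not a verified API contract.

```python
# Sketch of per-band resampling plus a progress callback for dc.load.
# Product/band names are hypothetical; the (n_completed, n_total)
# callback signature is an assumption.

def progress_cbk(n_completed, n_total):
    """Format data-loading progress as a short status string."""
    return 'loaded {}/{} ({:.0f}%)'.format(
        n_completed, n_total, 100.0 * n_completed / n_total)

# data = dc.load(product='ls8_example_product',        # hypothetical product
#                resampling={'red': 'bilinear',        # per-band method (PR 551)
#                            '*': 'nearest'},          # fallback for other bands
#                progress_cbk=progress_cbk)            # progress reporting (PR 702)

# Exercising the callback directly:
progress_cbk(3, 12)   # -> 'loaded 3/12 (25%)'
```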

Performance Improvements

  • Use single pass over datasets when computing bounds (PR 660)
  • Bugfixes and improved performance of dask-backed arrays (PR 547, PR 664)

Documentation Improvements

Deprecations

  • From the command line, the old query syntax for searching within vague time ranges, eg: 2018-03 < time < 2018-04 has been removed. It is unclear exactly what that syntax should mean, whether to include or exclude the months specified. It is replaced by time in [2018-01, 2018-02] which has the same semantics as dc.load time queries. (PR 709)

v1.6.1 (27 August 2018)

Correction release. By mistake, v1.6.0 was identical to v1.6rc2!

v1.6.0 (23 August 2018)

  • Enable use of aliases when specifying band names
  • Fix ingestion failing after the first run (PR 510)
  • Docker images now know which version of ODC they contain (PR 523)
  • Fix data loading when nodata is NaN (PR 531)
  • Allow querying based on python datetime.datetime objects. (PR 499)
  • Require rasterio 1.0.2 or higher, which fixes several critical bugs when loading and reprojecting from multi-band files.
  • Assume fixed paths for id and sources metadata fields (issue 482)
  • datacube.model.Measurement was put to use for loading in attributes and made to inherit from dict to preserve current behaviour. (PR 502)
  • Updates when indexing data with datacube dataset add (See PR 485, issue 451 and issue 480)
  • Preliminary API for indexing datasets (PR 511)
  • Enable creation of MetadataTypes without having an active database connection (PR 535)

v1.6rc2 (29 June 2018)

Backwards Incompatible Changes

  • The helpers.write_geotiff() function has been updated to support files smaller than 256x256. It also no longer supports specifying the time index. Before passing data in, use xarray_data.isel(time=<my_time_index>). (PR 277)
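
A minimal sketch of the new calling pattern: select the time slice yourself with isel before writing, since the time index argument was removed. The dataset below is a hypothetical stand-in for a dc.load() result; variable and file names are illustrative.

```python
# Sketch: slice out one time step before calling write_geotiff() (PR 277).
# The tiny Dataset here is a hypothetical stand-in for dc.load() output.
import numpy as np
import xarray as xr

data = xr.Dataset(
    {'red': (('time', 'y', 'x'), np.zeros((2, 4, 4), dtype='int16'))},
    coords={'time': np.array(['2008-01-01', '2008-02-01'],
                             dtype='datetime64[ns]')})

one_slice = data.isel(time=0)   # 2d slice: dims are now just ('y', 'x')

# from datacube.helpers import write_geotiff
# write_geotiff('red_slice.tif', one_slice)   # no time_index argument any more
```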

  • Removed product matching options from datacube dataset update (PR 445). No matching is needed in this case as all datasets are already in the database and are associated to products.

  • Removed --match-rules option from datacube dataset add (PR 447)

  • The seldom-used stack keyword argument has been removed from Datacube.load. (PR 461)

  • The behaviour of the time range queries has changed to be compatible with standard Python searches (eg. time slice an xarray). Now the time range selection is inclusive of any unspecified time units. (PR 440)

    Example 1:

    time=('2008-01', '2008-03') previously would have returned all data from the start of 1st of January, 2008 to the end of 1st of March, 2008. Now, this query returns all data from the start of 1st of January, 2008 to 23:59:59.999 on 31st of March, 2008.

    Example 2:

    To specify a search time between 1st of January and 29th of February, 2008 (inclusive), use a search query like time=('2008-01', '2008-02'). This query is equivalent to using any of the following in the second time element:

    ('2008-02-29')
    ('2008-02-29 23')
    ('2008-02-29 23:59')
    ('2008-02-29 23:59:59')
    ('2008-02-29 23:59:59.999')
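
The inclusive semantics above can be illustrated with pandas Periods. This is an illustration of the behaviour, not the actual implementation; the helper name is made up.

```python
# Illustration (not the actual implementation) of the inclusive time-range
# semantics from PR 440: unspecified units in a bound are filled so the
# whole stated period is covered.
import pandas as pd

def expand_time_range(start, end):
    """Expand partial date strings to the full [start, end] span they cover."""
    return pd.Period(start).start_time, pd.Period(end).end_time

begin, finish = expand_time_range('2008-01', '2008-03')
# begin  -> Timestamp('2008-01-01 00:00:00')
# finish -> Timestamp('2008-03-31 23:59:59.999999999'), i.e. all of March
```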

Changes

  • A --location-policy option has been added to the datacube dataset update command. Previously this command would always add a new location to the list of URIs associated with a dataset. It’s now possible to specify archive and forget options, which will mark previous location as archived or remove them from the index altogether. The default behaviour is unchanged. (PR 469)

  • The masking related function describe_variable_flags() now returns a pandas DataFrame by default. This will display as a table in Jupyter Notebooks. (PR 422)

  • Usability improvements in datacube dataset [add|update] commands (issue 447, issue 448, issue 398)

    • Embedded documentation updates
    • Deprecated --auto-match (it was always on anyway)
    • Renamed --dtype to --product (the old name will still work, but with a warning)
    • Add option to skip lineage data when indexing (useful for saving time when testing) (PR 473)
  • Enable compression for metadata documents stored in NetCDFs generated by stacker and ingestor (issue 452)

  • Implement better handling of stacked NetCDF files (issue 415)

    • Record the slice index as part of the dataset location URI, using #part=<int> syntax, index is 0-based
    • Use this index when loading data instead of fuzzy searching by timestamp
    • Fall back to the old behaviour when #part=<int> is missing and the file is more than one time slice deep
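
Reading the slice index back out of a location URI can be sketched as below. The helper name is hypothetical; the #part=<int> fragment syntax is as described above.

```python
# Sketch: extract the 0-based time-slice index from a '#part=<int>' URI.
# Helper name is hypothetical; only the fragment syntax comes from the text.
from urllib.parse import urlparse

def stacked_slice_index(uri):
    """Return the 0-based time-slice index from '#part=<int>', or None."""
    fragment = urlparse(uri).fragment
    if fragment.startswith('part='):
        return int(fragment[len('part='):])
    return None          # caller falls back to fuzzy timestamp matching

stacked_slice_index('file:///data/ls8_stacked.nc#part=3')   # -> 3
```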
  • Expose the following dataset fields and make them searchable:

    • indexed_time (when the dataset was indexed)
    • indexed_by (user who indexed the dataset)
    • creation_time (creation of dataset: when it was processed)
    • label (the label for a dataset)

    (See PR 432 for more details)

Bug Fixes

  • The .dimensions property of a product no longer crashes when product is missing a grid_spec. It instead defaults to time,y,x
  • Fix a regression in v1.6rc1 which made it impossible to run datacube ingest to create products which were defined in 1.5.5 and earlier versions of ODC. (issue 423, PR 436)
  • Allow specifying the chunking for string variables when writing NetCDFs (issue 453)

v1.6rc1 Easter Bilby (10 April 2018)

This is the first release in a while, so there are a lot of changes, including some significant refactoring, with the potential for issues when upgrading.

Backwards Incompatible Fixes

  • Drop Support for Python 2. Python 3.5 is now the earliest supported Python version.
  • Removed the old ndexpr, analytics and execution engine code. There is work underway in the execution engine branch to replace these features.

Enhancements

  • Support for third party drivers, for custom data storage and custom index implementations

  • The correct way to get an Index connection in code is to use datacube.index.index_connect().

  • Changes in ingestion configuration

    • Must now specify the Data Write Plug-ins to use. For s3 ingestion there was a top level container specified, which has been renamed and moved under storage. The entire storage section is passed through to the Data Write Plug-ins, so drivers requiring other configuration can include them here. eg:

      ...
      storage:
        ...
        driver: s3aio
        bucket: my_s3_bucket
      ...
      
  • Added a Dockerfile to enable automated builds for a reference Docker image.

  • Multiple environments can now be specified in one datacube config. See PR 298 and the Runtime Config

    • Allow specifying which index_driver should be used for an environment.
  • Command line tools can now output CSV or YAML. (Issue issue 206, PR 390)

  • Support for saving data to NetCDF using a Lambert Conformal Conic Projection (PR 329)

  • Lots of documentation updates:

    • Information about Bit Masking.
    • A description of how data is loaded.
    • Some higher level architecture documentation.
    • Updates on how to index new data.

Bug Fixes

  • Allow creation of datacube.utils.geometry.Geometry objects from 3d representations. The Z axis is simply thrown away.
  • The datacube --config_file option has been renamed to datacube --config, which is shorter and more consistent with the other options. The old name can still be used for now.
  • Fix a severe performance regression when extracting and reprojecting a small region of data. (PR 393)
  • Fix for a somewhat rare bug causing read failures by attempting to read data from a negative index into a file. (PR 376)
  • Make CRS equality comparisons a little bit looser. Trust either a Proj.4 based comparison or a GDAL based comparison. (Closed issue 243)

New Data Support

  • Added example prepare script for Collection 1 USGS data; improved band handling and downloads.
  • Add a product specification and prepare script for indexing Landsat L2 Surface Reflectance Data (PR 375)
  • Add a product specification for Sentinel 2 ARD Data (PR 342)

v1.5.4 Dingley Dahu (13th December 2017)

  • Minor features backported from 2.0:

    • Support for limit in searches
    • Alternative lazy search method find_lazy
  • Fixes:

    • Improve native field descriptions
    • Connection should not be held open between multi-product searches
    • Disable prefetch for celery workers
    • Support jsonify-ing decimals

v1.5.3 Purpler Unicorn with Starlight (16 October 2017)

  • Use cloudpickle as the celery serialiser
  • Allow celery tests to run without installing it
  • Move datacube-worker inside the main datacube package
  • Write metadata_type from the ingest configuration if available
  • Support config parsing limitations of Python 2
  • Fix issue 303: resolve GDAL build dependencies on Travis
  • Upgrade rasterio to newer version

v1.5.2 Purpler Unicorn with Stars (28 August 2017)

  • Fix bug when reading data in native projection, but outside source area. Often hit when running datacube-stats
  • Fix error loading and fusing data using dask. (Fixes issue 276)
  • When reading data, implement skip_broken_datasets for the dask case too

v1.5.1 Purpler Unicorn (13 July 2017)

  • Fix bug issue 261. Unable to load Australian Rainfall Grid Data. This was a result of the CRS/Transformation override functionality being broken when using the latest rasterio version 1.0a9

v1.5.0 Purple Unicorn (9 July 2017)

New Features

  • Support for AWS S3 array storage
  • Driver Manager support for NetCDF, S3, S3-file drivers.

Usability Improvements

  • When datacube dataset add is unable to add a Dataset to the index, print out the entire Dataset to make it easier to debug the problem.
  • Give datacube system check prettier and more readable output.
  • Make celery and redis optional when installing.
  • Significantly reduced disk space usage for integration tests
  • Dataset objects now have an is_active field to mirror is_archived.
  • Added index.datasets.get_archived_location_times() to see when each location was archived.

v1.4.1 (25 May 2017)

  • Support for reading multiband HDF datasets, such as MODIS collection 6
  • Workaround for rasterio issue when reprojecting stacked data
  • Bug fixes for command line arg handling

v1.4.0 (17 May 2017)

  • Adds more convenient year/date range search expressions (see PR 226)
  • Adds a simple replication utility (see PR 223)
  • Fixed issue reading products without embedded CRS info, such as bom_rainfall_grid (see issue 224)
  • Fixed issues with stacking and ncml creation for NetCDF files
  • Various documentation and bug fixes
  • Added CircleCI as a continuous build system, for previewing generated documentation on pull requests
  • Require xarray >= 0.9. Solves common problems caused by losing embedded flag_def and crs attributes.

v1.3.1 (20 April 2017)

  • Docs now refer to “Open Data Cube”
  • Docs describe how to use conda to install datacube.
  • Bug fixes for the stacking process.
  • Various other bug fixes and document updates.

v1.3.0

  • Updated the Postgres product views to include the whole dataset metadata document.
  • datacube system init now recreates the product views by default every time it is run, and now supports Postgres 9.6.
  • URI searches are now better supported from the cli: datacube dataset search uri = file:///some/uri/here
  • datacube user now supports a user description (via --description) when creating a user, and delete accepts multiple user arguments.
  • Platform-specific (Landsat) fields have been removed from the default eo metadata type in order to keep it minimal. Users & products can still add their own metadata types to use additional fields.
  • Dataset locations can now be archived, not just deleted. This represents a location that is still accessible but is deprecated.
  • We are now part of Open Data Cube, and have a new home at https://github.com/opendatacube/datacube-core

This release requires the uri index changes to be applied: it will prompt you to rerun init as an administrator to update your existing cubes: datacube -v system init (this command can be run without affecting read-only users, but will briefly pause writes)

v1.2.2

  • Added --allow-exclusive-lock flag to product add/update commands, allowing faster index updates when system usage can be halted.
  • {version} can now be used in ingester filename patterns

v1.2.0 Boring as Batman (15 February 2017)

  • Implemented improvements to dataset search and info cli outputs
  • Can now specify a range of years to process in the ingest cli (e.g. 2000-2005)
  • Fixed metadata_type update cli not creating indexes (running system init will create missing ones)
  • Enable indexing of datacube generated NetCDF files. Making it much easier to pull selected data into a private datacube index. Use by running datacube dataset add selected_netcdf.nc.
  • Switch versioning system to increment the second digit instead of the third.

v1.1.18 Mushroom Milkshake (9 February 2017)

  • Added sources-policy options to dataset add cli
  • Multiple dataset search improvements related to locations
  • Keep hours/minutes when grouping data by solar_day
  • Code Changes: datacube.model.[CRS, BoundingBox, Coordinate, GeoBox] have moved into datacube.utils.geometry. Any code using these should update their imports.

v1.1.17 Happy Festivus Continues (12 January 2017)

  • Fixed several issues with the geometry utils
  • Added more operations to the geometry utils
  • Updated Code Recipes to use geometry utils
  • Enabled Windows CI (python 3 only)

v1.1.16 Happy Festivus (6 January 2017)

  • Added update command to datacube dataset cli
  • Added show command to datacube product cli
  • Added list and show commands to datacube metadata_type cli
  • Added ‘storage unit’ stacker application
  • Replaced model.GeoPolygon with utils.geometry library

v1.1.15 Minion Party Hangover (1 December 2016)

  • Fixed a data loading issue when reading HDF4_EOS datasets.

v1.1.14 Minion Party (30 November 2016)

  • Added support for buffering/padding of GridWorkflow tile searches
  • Improved the Query class to make filtering by a source or parent dataset easier. For example, this can be used to filter Datasets by Geometric Quality Assessment (GQA). Use source_filter when requesting data.
  • Additional data preparation and configuration scripts
  • Various fixes for single point values for lat, lon & time searches
  • Grouping by solar day now overlays scenes in a consistent manner, with the more northern scene taking precedence. Previously it was non-deterministic which scene/tile would be put on top.

v1.1.13 Black Goat (15 November 2016)

  • Added support for accessing data through http and s3 protocols
  • Added dataset search command for filtering datasets (lists id, product, location)
  • ingestion_bounds can again be specified in the ingester config
  • Can now do range searches on non-range fields (e.g. dc.load(orbit=(20, 30)))
  • Merged several bug-fixes from CEOS-SEO branch
  • Added Polygon Drill recipe to Code Recipes

v1.1.12 Unnamed Unknown (1 November 2016)

  • Fixed the affine deprecation warning
  • Added datacube metadata_type cli tool which supports add and update
  • Improved datacube product cli tool logging

v1.1.11 Unnamed Unknown (19 October 2016)

  • Improved ingester task throughput when using distributed executor
  • Fixed an issue where loading tasks from disk would use too much memory
  • model.GeoPolygon.to_crs() now adds additional points (~every 100km) to improve reprojection accuracy

v1.1.10 Rabid Rabbit (5 October 2016)

  • Ingester can now be configured to have WELD/MODIS style tile indexes (thanks Chris Holden)
  • Added --queue-size option to datacube ingest to control number of tasks queued up for execution
  • Product name is now used as primary key when adding datasets. This allows easy migration of datasets from one database to another
  • Metadata type name is now used as primary key when adding products. This allows easy migration of products from one database to another
  • DatasetResource.has() now takes a dataset id instead of model.Dataset
  • Fixed an issue where database connections weren’t recycled fast enough in some cases
  • Fixed an issue where DatasetTypeResource.get() and DatasetTypeResource.get_by_name() would cache None if product didn’t exist

v1.1.9 Pest Hippo (20 September 2016)

  • Added origin, alignment and GeoBox-based methods to model.GridSpec
  • Fixed satellite path/row references in the prepare scripts (Thanks to Chris Holden!)
  • Added links to external datasets in Indexing Data
  • Improved archive and restore command line features: datacube dataset archive and datacube dataset restore
  • Improved application support features
  • Improved system configuration documentation

v1.1.8 Last Mammoth (5 September 2016)

  • GridWorkflow.list_tiles() and GridWorkflow.list_cells() now return a Tile object
  • Added resampling parameter to Datacube.load() and GridWorkflow.load(). Will only be used if the requested data requires resampling.
  • Improved Datacube.load() like parameter behaviour. This allows passing in a xarray.Dataset to retrieve data for the same region.
  • Fixed an issue with passing tuples to functions in Analytics Expression Language
  • Added a User Guide section to the documentation containing useful code snippets
  • Reorganized project dependencies into required packages and optional ‘extras’
  • Added performance dependency extras for improving run-time performance
  • Added analytics dependency extras for analytics features
  • Added interactive dependency extras for interactivity features

v1.1.7 Bit Shift (22 August 2016)

  • Added bit shift and power operators to Analytics Expression Language
  • Added datacube product update which can be used to update product definitions
  • Fixed an issue where dataset geo-registration would be ignored in some cases
  • Fixed an issue where Execution Engine was using dask arrays by default
  • Fixed an issue where int8 data could not sometimes be retrieved
  • Improved search and data retrieval performance

v1.1.6 Lightning Roll (8 August 2016)

  • Improved spatio-temporal search performance. datacube system init must be run to benefit
  • Added info, archive and restore commands to datacube dataset
  • Added product-counts command to datacube-search tool
  • Made Index object thread-safe
  • Multiple masking API improvements
  • Improved database Index API documentation
  • Improved system configuration documentation

v1.1.5 Untranslatable Sign (26 July 2016)

  • Updated the way database indexes are partitioned. Use datacube system init --rebuild to rebuild indexes
  • Added fuse_data ingester configuration parameter to control overlapping data fusion
  • Added --log-file option to datacube dataset add command for saving logs to a file
  • Added index.datasets.count method returning number of datasets matching the query

v1.1.4 Imperfect Inspiration (12 July 2016)

  • Improved dataset search performance
  • Restored ability to index telemetry data
  • Fixed an issue with data access API returning uninitialized memory in some cases
  • Fixed an issue where dataset center_time would be calculated incorrectly
  • General improvements to documentation and usability

v1.1.3 Speeding Snowball (5 July 2016)

  • Added framework for developing distributed, task-based applications
  • Several additional Ingester performance improvements

v1.1.2 Wind Chill (28 June 2016)

This release brings major performance and usability improvements

  • Major performance improvements to GridWorkflow and Ingester
  • Ingestion can be limited to one year at a time to limit memory usage
  • Ingestion can be done in two stages (serial followed by highly parallel) by using the --save-tasks/--load-tasks options. This should help reduce idle time in the distributed processing case.
  • General improvements to documentation.

v1.1.1 Good Idea (23 June 2016)

This release contains lots of fixes in preparation for the first large ingestion of Geoscience Australia data into a production version of AGDCv2.

  • General improvements to documentation and user friendliness.
  • Updated metadata in configuration files for ingested products.
  • Full provenance history is saved into ingested files.
  • Added software versions, machine info and other details of the ingestion run into the provenance.
  • Added valid data region information into metadata for ingested data.
  • Fixed bugs relating to changes in Rasterio and GDAL versions.
  • Refactored GridWorkflow to be easier to use, and include preliminary code for saving created products.
  • Improvements and fixes for bit mask generation.
  • Lots of other minor but important fixes throughout the codebase.

v1.1.0 No Spoon (3 June 2016)

This release includes restructuring of code, APIs, tools, configurations and concepts. The result of this churn is cleaner code, faster performance and the ability to handle provenance tracking of Datasets created within the Data Cube.

The major changes include:

  • The datacube-config and datacube-ingest tools have been combined into datacube.
  • Added dependency on pandas for nicer search results listing and handling.
  • Indexing and Ingesting Data have been split into separate steps.
  • Data that has been indexed can be accessed without going through the ingestion process.
  • Data can be requested in any projection and will be dynamically reprojected if required.
  • Dataset Type has been replaced by Product
  • Storage Type has been removed, and an Ingestion Configuration has taken its place.
  • A new Datacube Class for querying and accessing data.

1.0.4 Square Clouds (3 June 2016)

Pre-Unification release.

1.0.3 (14 April 2016)

Many API improvements.

1.0.2 (23 March 2016)

1.0.1 (18 March 2016)

1.0.0 (11 March 2016)

This release is to support generation of GA Landsat reference data.

pre-v1 (end 2015)

First working Data Cube v2 code.