ODC Configuration (Details)#
Open Data Cube configuration files, supported configuration options, and basic use cases and defaults are described in ODC Configuration (Basics).
The Open Data Cube uses configuration files and/or environment variables to determine how to connect to databases.
Summary#
The default behaviour is to read in configuration from a configuration file.
Alternatively, Raw configuration can be explicitly passed in.
Data in a configuration file can be supplemented or overridden by configuration from environment variables.
A single configuration can define multiple environments, from which the user selects one active environment.
The configuration engine in 1.9 is not 100% compatible with the previous configuration engine. Advanced users and developers upgrading 1.8 systems should read the migration notes.
1. File configuration#
If Raw configuration is not passed in, ODC attempts to find a configuration file in the file system.
Only one configuration file is read.
If your previous practice was to extend a shared system configuration file with a local user configuration file, then you will now need to take a copy of the system configuration file, add your extensions to your copy, and ensure that the Open Data Cube reads from your modified file.
1a. Default Search Paths#
If no config file paths have been specified through any of the methods below (1b. through 1d.), the Open Data Cube will search for the following paths in order and use the first readable file it finds:
- ./datacube.conf (in the current working directory)
- ~/.datacube.conf (in the user’s home directory)
- /etc/default/datacube.conf
- /etc/datacube.conf
If none of the files in the default search path exist, then a basic default configuration is used, equivalent to:
default:
  db_hostname: ''
  db_database: datacube
  index_driver: default
  db_connection_timeout: 60
Note
This default config is only used after exhausting the default search path. If you have provided your own search path via any of the methods below and none of the paths can be read, an error is raised.
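The search behaviour described above can be sketched as follows (an illustrative sketch only, not the actual ODC implementation; the function name is hypothetical):

```python
import os

# Default search order from section 1a; the first readable file wins.
DEFAULT_SEARCH_PATH = [
    "./datacube.conf",
    os.path.expanduser("~/.datacube.conf"),
    "/etc/default/datacube.conf",
    "/etc/datacube.conf",
]

def find_config_file(paths=None):
    """Return the first readable config path.

    With the default search path, None signals that the built-in
    default configuration should be used; with a user-supplied
    search path, failure to find a readable file is an error.
    """
    using_defaults = paths is None
    if using_defaults:
        paths = DEFAULT_SEARCH_PATH
    for path in paths:
        if os.access(path, os.R_OK):
            return path
    if using_defaults:
        return None  # fall back to the built-in default configuration
    raise FileNotFoundError("No readable configuration file found")
```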
1b. In Python#
In Python, the config argument can take a path to a config file:
dc = Datacube(config="/path/to/my/file.conf")
The config argument can also take a priority list of config paths. The first readable path in the list is used. If none of the paths in the list can be read, a ConfigException is raised:
dc = Datacube(config=[
    "/first/path/checked",
    "/second/path/checked",
    "/last/path/checked",
])
The config argument can also take a cfg.ODCConfig object. Refer to the API documentation for more information.
1c. Via the datacube CLI#
Configuration file paths can be passed to the datacube CLI using either the -C or --config option. The option can be specified multiple times; paths are searched in order, and an error is raised if none can be read.
1d. Via an Environment Variable#
ODC_CONFIG_PATH#
If config paths have not been passed in via methods 1b. or 1c. above, they can be read from the ODC_CONFIG_PATH environment variable, as a UNIX PATH-style colon-separated list:
ODC_CONFIG_PATH=/first/path/checked:/second/path/checked:/last/path/checked
2. Raw configuration#
Raw configuration can be passed in explicitly, without ever reading from a configuration file on disk.
Attempting to pass in both raw configuration and a configuration file path simultaneously will raise a ConfigException.
Environment variable overrides do NOT apply to configuration environments defined in raw configuration.
However, new dynamic environments that do not explicitly appear in raw configuration CAN still be defined via environment variables.
2a. Via Python (str or dict)#
A valid configuration dictionary can be passed directly to the Datacube constructor with the raw_config argument, without serialising it to a string:
dc = Datacube(raw_config={
    "default": {
        "index_driver": "postgres",
        "db_url": "postgresql:///mydb"
    }
})
The raw_config argument can also be passed configuration as a string, in either INI or YAML format:
dc = Datacube(raw_config="""
default:
  # Connect to database mydb over local socket with OS authentication.
  db_url: postgresql:///mydb
""")
2b. As a string, via the datacube CLI#
The contents of a configuration file can be passed to the datacube CLI via the -R or --raw-config command line option:
datacube --raw-config "default: {db_database: this_db}"
Output from a script that generates configuration dynamically can be passed in using bash command substitution:
datacube --raw-config "`config_file_generator --option blah`"
2c. As a string, via an Environment Variable#
If raw configuration has not been passed in via methods 2a. or 2b. above, the contents of a configuration file can be written in full to the ODC_CONFIG environment variable:
$ export ODC_CONFIG="default: {db_database: this_db}"
$ datacube check   # will use the this_db database
3. The Active Environment#
Each Datacube object is associated with a particular environment. Multiple environments can be accessed by instantiating multiple Datacube objects. The environment associated with a particular Datacube object is determined when the object is first instantiated and cannot subsequently be changed.
3a. Default Environment#
If not specified by any of the methods 3b. to 3d. below, the default environment is used. If no default environment is known, an error is raised. It is strongly recommended that a default environment be defined in all configuration files - ideally as an alias to an explicitly defined environment.
If no environment named default is known, but one named datacube IS known, the datacube environment is used and a deprecation warning is issued. datacube will be dropped as a legacy default environment name in a future release.
If an environment is requested by name and is not defined in the configuration, an all-defaults dynamic environment is used.
3b. Specifying in Python#
The active environment can be selected in Python with the env argument to the Datacube constructor.
If you wish to work with multiple environments simultaneously, you can create one Datacube object for each environment of interest and use them side by side:
dc_main = Datacube(env="main")
dc_aux = Datacube(env="aux")
dc_private = Datacube(env="private")
3c. Specifying in the CLI#
The active environment can be selected on the command line with the -E or --env option to the datacube CLI tool. CLI commands that require more than one environment provide a second option for specifying the second environment. Refer to the --help text for more information.
3d. Via an Environment Variable#
ODC_ENVIRONMENT#
If not explicitly specified via methods 3b. or 3c. above, the active environment can be specified with the $ODC_ENVIRONMENT environment variable.
4. Generic Environment Variable Overrides#
Configuration values in config files can be overridden by setting the appropriate environment variable. The names of overriding environment variables are all upper-case and structured as:
$ODC_{environment name}_{option name}
E.g. to override the db_password field in the main environment, set the $ODC_MAIN_DB_PASSWORD environment variable.
Environment variable overrides are NOT applied to environments defined in raw configuration that was passed in explicitly as a string or dictionary.
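The naming rule above can be expressed as a one-line helper (a sketch for illustration only; this function is not part of the ODC API):

```python
def override_var_name(environment: str, option: str) -> str:
    # $ODC_{environment name}_{option name}, upper-cased.
    return f"ODC_{environment.upper()}_{option.upper()}"
```

For example, `override_var_name("main", "db_password")` yields `"ODC_MAIN_DB_PASSWORD"`, matching the example in the text.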
4a. Dynamic Environments#
It is possible for environments to be defined dynamically purely in environment variables.
E.g. given the following active configuration file:
default:
  alias: main
main:
  index_driver: postgres
  db_url: postgresql://myuser:mypassword@server.domain/main
and the following defined environment variables:
ODC_AUX_INDEX_DRIVER=postgis
ODC_AUX_DB_URL=postgres://auxuser:secret@backup.domain/aux
You can request the aux environment and its configuration will be dynamically read from the environment variables, even though the “aux” environment is not mentioned in the configuration file at all.
Note
Environment variables are read when a named environment is first accessed (usually just before connecting to a database from that environment). Dynamic changes to environment variables after first access have no effect.
Environment variables cannot override values included in Raw configuration, but can still be used to create dynamic environments.
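As a sketch of what happens conceptually (assumed behaviour for illustration, not the actual ODC implementation), the options for a dynamic aux environment could be assembled from the environment like this:

```python
import os

# The variables from the example above.
os.environ["ODC_AUX_INDEX_DRIVER"] = "postgis"
os.environ["ODC_AUX_DB_URL"] = "postgres://auxuser:secret@backup.domain/aux"

# Collect every ODC_AUX_* variable and strip the prefix, giving the
# option names the config file would otherwise have provided.
prefix = "ODC_AUX_"
aux_options = {
    name[len(prefix):].lower(): value
    for name, value in os.environ.items()
    if name.startswith(prefix)
}
```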
4b. Environment Variable Overrides and Environment Aliases#
Aliases can only be defined in raw configuration or in config files - they cannot be defined through environment variables.
i.e. defining ODC_ENV2_ALIAS=env1 does NOT create an env2 alias to the env1 environment.
A configuration file may define an environment which is an alias to an environment that is to be loaded dynamically and is NOT defined in the configuration file.
Aliases (created in raw config or a config file) ARE honoured when interpreting environment variables.
E.g. given the config file:
default:
  alias: main
common:
  alias: main
main:
  index_driver: postgis
  db_url: postgresql://uid:pwd@server.domain:5432/main
The “main” environment’s db_url can be overridden with ANY of the following environment variables:
$ODC_DEFAULT_DB_URL
$ODC_COMMON_DB_URL
$ODC_MAIN_DB_URL
The environment variable using the canonical environment name ($ODC_MAIN_DB_URL in this case) always takes precedence if it is set. If more than one alias environment name is used (e.g. if both $ODC_DEFAULT_DB_URL AND $ODC_COMMON_DB_URL exist and $ODC_MAIN_DB_URL does not), only one will be read, and the implementation makes no guarantees about which. Canonical environment names are therefore strongly recommended for environment variable names where possible.
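The precedence rule can be sketched like so (illustrative only; the real implementation may differ, and the alias ordering here is arbitrary precisely because no guarantee is made):

```python
import os

def resolve_db_url(canonical, aliases):
    # The canonical name always wins; among aliases, no ordering
    # guarantee exists, so this sketch just takes the first match.
    for env_name in [canonical] + list(aliases):
        value = os.environ.get(f"ODC_{env_name.upper()}_DB_URL")
        if value is not None:
            return value
    return None

# Both an alias variable and the canonical variable are set;
# the canonical one takes precedence.
os.environ["ODC_DEFAULT_DB_URL"] = "postgresql://uid:pwd@other/alt"
os.environ["ODC_MAIN_DB_URL"] = "postgresql://uid:pwd@server.domain:5432/main"
url = resolve_db_url("main", ["default", "common"])
```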
4c. Deprecated Legacy Environment Variables#
Some legacy environment variable names are still read for backwards compatibility; however, they may not work as expected where more than one ODC environment is in use, and a deprecation warning is issued when they are read. The preferred new environment variable name is included in the text of the deprecation warning.
Most notably the old database connection environment variables:
$DB_DATABASE
$DB_HOSTNAME
$DB_PORT
$DB_USERNAME
$DB_PASSWORD
apply to ALL environments, and are deprecated.
The new preferred configuration environment variable names all begin with ODC_.
Migrating from datacube-1.8#
The new configuration engine introduced in datacube-1.9 is not fully backwards compatible with that used previously. This section notes the changes which administrators, maintainers and developers should be aware of before upgrading.
Merging multiple config files#
Previously, multiple config files could be read simultaneously and merged with “higher priority” files being read later, and overriding the contents of “lower priority” files.
This is no longer supported. Only one configuration file is now read.
Where users previously created a local personal configuration file that supplemented a global system configuration file, they should now make a copy of the global system configuration file, edit it with their own personal extensions, and ensure that it is read in preference to the global file - or choose one of the other methods for passing in configuration.
The special “user” section is also no longer supported as it doesn’t make sense without merging of multiple config files.
Legacy Environment Variables#
Legacy environment variables are deprecated, but still read to assist with migration. In all cases there is a new preferred environment variable, as listed in the table below.
| Legacy Environment Variable | New Environment Variable(s) | Notes |
|---|---|---|
| DATACUBE_CONFIG_PATH | ODC_CONFIG_PATH | Behaviour is different for lists of paths, due to only reading a single file. |
| DATACUBE_DB_URL | ODC_<env_name>_DB_URL | These legacy environment variables apply to ALL environments - which is probably not what you want in a multi-db scenario. |
| DB_DATABASE | ODC_<env_name>_DB_DATABASE | |
| DB_HOSTNAME | ODC_<env_name>_DB_HOSTNAME | |
| DB_PORT | ODC_<env_name>_DB_PORT | |
| DB_USERNAME | ODC_<env_name>_DB_USERNAME | |
| DB_PASSWORD | ODC_<env_name>_DB_PASSWORD | |
| DATACUBE_ENVIRONMENT | ODC_ENVIRONMENT | datacube-1.8 used this legacy environment variable fairly inconsistently. There are several corner cases where it is now read where it was not previously. |
API changes#
Details of the new API are described in Configuration API.
The old datacube.config.LocalConfig class has been replaced by the datacube.cfg.ODCConfig and datacube.cfg.ODCEnvironment classes. For most users, the only method you need is ODCConfig.get_environment().
The auto_config() function#
There used to be an undocumented auto_config() function (also available through python -m datacube) that read in the configuration (from multiple files and environment variables) and wrote it out as a single consolidated configuration file.
As the new configuration engine is more clearly documented and more predictable in its behaviour, this functionality is no longer required.