auto Driver

Many of the supported TensorStore drivers support format auto-detection: an existing TensorStore can be opened by specifying just a key-value store, and the appropriate TensorStore driver is determined automatically.

Both TensorStore drivers, such as zarr3 or jpeg, and key-value store drivers, such as zip or ocdbt, can be auto-detected.

Format auto-detection is used implicitly whenever a KvStore JSON spec or KvStore URL is specified in place of a TensorStore JSON spec.

Format auto-detection can also be requested explicitly, using either the auto driver identifier in a JSON spec or the auto: URL scheme (TensorStoreUrl/auto).

Note

Auto-detection is performed when opening the TensorStore. The detected driver can be determined by querying the Spec of the open TensorStore.
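
As a minimal sketch of querying the detected driver, the helper below walks the JSON form of a resolved spec and lists the TensorStore drivers it names, following the top-level base member used by adapters such as cast. The name driver_chain is hypothetical and not part of TensorStore. The input is assumed to be the dict returned by store.spec().to_json() on an open TensorStore.

```python
from typing import Any, Dict, List


def driver_chain(spec_json: Dict[str, Any]) -> List[str]:
    """Return the TensorStore driver names in a spec, outermost first.

    Only the top-level "base" member is followed, so the "base" of a
    key-value store adapter (nested under "kvstore") is not included.
    """
    chain: List[str] = []
    node: Any = spec_json
    while isinstance(node, dict) and "driver" in node:
        chain.append(node["driver"])
        node = node.get("base")
    return chain


# Intended use with an open TensorStore (requires tensorstore):
#   store = await ts.open("file://tmp/dataset.zarr|cast:int64")
#   driver_chain(store.spec().to_json())
```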

Warning

Auto-detection issues additional read requests to determine the format, which adds latency to the open operation. Although the number of bytes read is small, auto-detection should be avoided when the number of read requests or the latency of open operations must be kept low.

Hint

In distributed execution settings, where the same TensorStore may be opened concurrently by many processes, if auto-detection is required, it is recommended to first auto-detect the format from a single process (e.g. a controller process) and then use the resolved spec to open the TensorStore from worker processes.
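
The pattern described in this hint might look roughly like the following sketch. The function names resolved_spec_payload and spec_from_payload are hypothetical, and how the payload travels from the controller to the workers is left to the application. The only TensorStore calls assumed are store.spec() and Spec.to_json().

```python
import json
from typing import Any, Dict


def resolved_spec_payload(store: Any) -> str:
    """Controller side: serialize the resolved spec of an open TensorStore.

    `store` is expected to be a TensorStore that was opened with
    auto-detection, so its spec already names the detected driver.
    """
    return json.dumps(store.spec().to_json())


def spec_from_payload(payload: str) -> Dict[str, Any]:
    """Worker side: recover the JSON spec to pass to the open call."""
    return json.loads(payload)


# Controller (requires tensorstore):
#   store = await ts.open("file://tmp/dataset.zarr")
#   payload = resolved_spec_payload(store)
# Worker:
#   store = await ts.open(spec_from_payload(payload))
```

Because the payload names the detected driver explicitly, the workers do not repeat the detection reads.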

json driver/auto : object
Extends:
  • TensorStore — Specifies a TensorStore to open/create.

Required members:
driver : "auto"
kvstore : KvStore | KvStoreUrl

Specifies the underlying storage mechanism.

Optional members:
context : Context

Specifies context resources that augment/override the parent context.

dtype : dtype

Specifies the data type.

rank : integer[0, 32]

Specifies the rank of the TensorStore.

If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.

transform : IndexTransform

Specifies a transform.

schema : Schema

Specifies constraints on the schema.

When opening an existing array, specifies constraints on the existing schema; opening will fail if the constraints do not match. Any soft constraints specified in the chunk_layout are ignored. When creating a new array, a suitable schema will be selected automatically based on the specified schema constraints in combination with any driver-specific constraints.

<resource-type> : ContextResource

Specifies a context resource for use by the detected format drivers.

Unlike context, these resources do not also apply to kvstore. In almost all cases, context can be used instead. This additional override is needed solely to allow certain edge cases to be correctly represented as JSON.

Example

{
  "driver": "auto",
  "kvstore": {"driver": "gcs", "bucket": "my-bucket", "path": "path/to/dataset.zarr/"}
}
json TensorStoreUrl/auto : string

auto: TensorStore URL scheme

Automatic format detection may be specified using the auto: URL syntax.

This URL scheme is used implicitly if a KvStore URL is specified when a TensorStore URL is required, and does not normally need to be included explicitly.

Examples

URL representation:

"file:///tmp/dataset.zarr/|auto:"

JSON representation:

{"driver": "auto",
 "kvstore": {"driver": "file",
             "path": "/tmp/dataset.zarr/"}
}

URL representation:

"file:///tmp/dataset.zarr/|auto:|cast:uint32"

JSON representation:

{"driver": "cast",
 "dtype": "uint32",
 "base": {"driver": "auto",
          "kvstore": {"driver": "file",
                      "path": "/tmp/dataset.zarr/"}}
}
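
The JSON representations above follow a regular shape, which the sketch below builds programmatically. The helper name auto_spec and its cast_dtype parameter are hypothetical conveniences, not TensorStore API. The result is a plain dict of the kind that could be passed wherever a TensorStore JSON spec is accepted.

```python
from typing import Any, Dict, Optional


def auto_spec(kvstore: Dict[str, Any],
              cast_dtype: Optional[str] = None) -> Dict[str, Any]:
    """Build a JSON spec that requests format auto-detection explicitly.

    If `cast_dtype` is given, the auto spec is wrapped in a cast adapter,
    mirroring the "|auto:|cast:<dtype>" URL form.
    """
    spec: Dict[str, Any] = {"driver": "auto", "kvstore": kvstore}
    if cast_dtype is not None:
        spec = {"driver": "cast", "dtype": cast_dtype, "base": spec}
    return spec


file_kvstore = {"driver": "file", "path": "/tmp/dataset.zarr/"}
plain = auto_spec(file_kvstore)
casted = auto_spec(file_kvstore, cast_dtype="uint32")
```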

Examples

Auto-detecting an array

A zarr3 TensorStore can be detected from its path:

>>> # Create new array
>>> await ts.open("file://tmp/dataset.zarr|zarr3",
...               dtype="int32",
...               shape=[5],
...               create=True)
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
  },
  'driver': 'zarr3',
  'dtype': 'int32',
  'kvstore': {'driver': 'file', 'path': 'tmp/dataset.zarr/'},
  'metadata': {
    'chunk_grid': {'configuration': {'chunk_shape': [5]}, 'name': 'regular'},
    'chunk_key_encoding': {'name': 'default'},
    'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
    'data_type': 'int32',
    'fill_value': 0,
    'node_type': 'array',
    'shape': [5],
    'zarr_format': 3,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Open with auto-detection
>>> await ts.open("file://tmp/dataset.zarr")
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
  },
  'driver': 'zarr3',
  'dtype': 'int32',
  'kvstore': {'driver': 'file', 'path': 'tmp/dataset.zarr/'},
  'metadata': {
    'chunk_grid': {'configuration': {'chunk_shape': [5]}, 'name': 'regular'},
    'chunk_key_encoding': {'name': 'default'},
    'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
    'data_type': 'int32',
    'fill_value': 0,
    'node_type': 'array',
    'shape': [5],
    'zarr_format': 3,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})

Constructing a Spec directly shows that the explicit auto syntax and a bare key-value store URL produce the same spec:

>>> ts.Spec("file://tmp/dataset|auto")
Spec({'driver': 'auto', 'kvstore': {'driver': 'file', 'path': 'tmp/dataset'}})
>>> ts.Spec("file://tmp/dataset")
Spec({'driver': 'auto', 'kvstore': {'driver': 'file', 'path': 'tmp/dataset'}})

Chaining TensorStore adapters

TensorStore adapters like cast can also be used in conjunction with format auto-detection:

>>> ts.Spec("file://tmp/dataset.zarr|cast:int64")
Spec({
  'base': {
    'driver': 'auto',
    'kvstore': {'driver': 'file', 'path': 'tmp/dataset.zarr'},
  },
  'driver': 'cast',
  'dtype': 'int64',
})
>>> ts.Spec("file://tmp/dataset.zarr|auto|cast:int64")
Spec({
  'base': {
    'driver': 'auto',
    'kvstore': {'driver': 'file', 'path': 'tmp/dataset.zarr'},
  },
  'driver': 'cast',
  'dtype': 'int64',
})
>>> await ts.open("file://tmp/dataset.zarr|cast:int64")
TensorStore({
  'base': {
    'driver': 'zarr3',
    'dtype': 'int32',
    'kvstore': {'driver': 'file', 'path': 'tmp/dataset.zarr/'},
    'metadata': {
      'chunk_grid': {
        'configuration': {'chunk_shape': [5]},
        'name': 'regular',
      },
      'chunk_key_encoding': {'name': 'default'},
      'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
      'data_type': 'int32',
      'fill_value': 0,
      'node_type': 'array',
      'shape': [5],
      'zarr_format': 3,
    },
  },
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
  },
  'driver': 'cast',
  'dtype': 'int64',
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})

Multiple auto-detection steps

Multiple steps of auto-detection are also possible. Here, a zarr3 TensorStore at the root of an OCDBT database is detected from just the path to the OCDBT database.

>>> # Create new array within new OCDBT database
>>> await ts.open("file://tmp/dataset.ocdbt|ocdbt|zarr3",
...               dtype="int32",
...               shape=[5],
...               create=True)
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
    'ocdbt_coordinator': {},
  },
  'driver': 'zarr3',
  'dtype': 'int32',
  'kvstore': {
    'base': {'driver': 'file', 'path': 'tmp/dataset.ocdbt/'},
    'config': {
      'compression': {'id': 'zstd'},
      'max_decoded_node_bytes': 8388608,
      'max_inline_value_bytes': 100,
      'uuid': '...',
      'version_tree_arity_log2': 4,
    },
    'driver': 'ocdbt',
  },
  'metadata': {
    'chunk_grid': {'configuration': {'chunk_shape': [5]}, 'name': 'regular'},
    'chunk_key_encoding': {'name': 'default'},
    'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
    'data_type': 'int32',
    'fill_value': 0,
    'node_type': 'array',
    'shape': [5],
    'zarr_format': 3,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Open with auto-detection
>>> await ts.open("file://tmp/dataset.ocdbt")
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
    'ocdbt_coordinator': {},
  },
  'driver': 'zarr3',
  'dtype': 'int32',
  'kvstore': {
    'base': {'driver': 'file', 'path': 'tmp/dataset.ocdbt/'},
    'config': {
      'compression': {'id': 'zstd'},
      'max_decoded_node_bytes': 8388608,
      'max_inline_value_bytes': 100,
      'uuid': '...',
      'version_tree_arity_log2': 4,
    },
    'driver': 'ocdbt',
  },
  'metadata': {
    'chunk_grid': {'configuration': {'chunk_shape': [5]}, 'name': 'regular'},
    'chunk_key_encoding': {'name': 'default'},
    'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
    'data_type': 'int32',
    'fill_value': 0,
    'node_type': 'array',
    'shape': [5],
    'zarr_format': 3,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})

Note that auto-detection fails if the zarr array is not at the root of the OCDBT database:

>>> # Create new array within new OCDBT database
>>> await ts.open(
...     "file://tmp/dataset2.ocdbt|ocdbt:path/within/database|zarr3",
...     dtype="int32",
...     shape=[5],
...     create=True)
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
    'file_io_locking': {},
    'file_io_memmap': False,
    'file_io_sync': True,
    'ocdbt_coordinator': {},
  },
  'driver': 'zarr3',
  'dtype': 'int32',
  'kvstore': {
    'base': {'driver': 'file', 'path': 'tmp/dataset2.ocdbt/'},
    'config': {
      'compression': {'id': 'zstd'},
      'max_decoded_node_bytes': 8388608,
      'max_inline_value_bytes': 100,
      'uuid': '...',
      'version_tree_arity_log2': 4,
    },
    'driver': 'ocdbt',
    'path': 'path/within/database/',
  },
  'metadata': {
    'chunk_grid': {'configuration': {'chunk_shape': [5]}, 'name': 'regular'},
    'chunk_key_encoding': {'name': 'default'},
    'codecs': [{'configuration': {'endian': 'little'}, 'name': 'bytes'}],
    'data_type': 'int32',
    'fill_value': 0,
    'node_type': 'array',
    'shape': [5],
    'zarr_format': 3,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Open with auto-detection
>>> await ts.open("file://tmp/dataset2.ocdbt")
Traceback (most recent call last):
    ...
ValueError: FAILED_PRECONDITION: Error opening "auto" driver: Failed to detect format for "" in OCDBT database at local file "tmp/dataset2.ocdbt/"...

Multi-step auto-detection algorithm

Given a base key-value store, auto-detection of the final TensorStore driver proceeds as follows:

  1. A single auto-detection step is applied to the current base key-value store, which results in a list of candidate formats.

  2. If there are no candidate formats, or more than one candidate format, auto-detection fails with an error.

  3. Two kinds of drivers can be detected:

    1. If the detected format is a TensorStore driver, it is applied to the current base key-value store, opened, and auto-detection is complete.

    2. If the detected format is a key-value store adapter driver, it is applied to the current base key-value store, and opened. The adapted key-value store becomes the new base key-value store and detection continues at step 1.
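
The loop described above can be illustrated with a toy model. This is only a sketch of the control flow, not TensorStore's implementation: Candidate, detect_chain, the callbacks, and the marker keys are all hypothetical, and a "store" is just a Python dict.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Candidate:
    name: str                 # driver name, e.g. "zarr3" or "ocdbt"
    is_kvstore_adapter: bool  # True for key-value store adapter drivers


def detect_chain(base: Any,
                 single_step: Callable[[Any], List[Candidate]],
                 apply_adapter: Callable[[Candidate, Any], Any]) -> List[str]:
    """Return the detected driver names, outermost key-value store first."""
    chain: List[str] = []
    while True:
        candidates = single_step(base)            # step 1
        if len(candidates) != 1:                  # step 2
            raise ValueError(
                f"Failed to detect format: {len(candidates)} candidates "
                f"after {chain}")
        fmt = candidates[0]
        chain.append(fmt.name)
        if not fmt.is_kvstore_adapter:            # step 3.1: TensorStore driver
            return chain
        base = apply_adapter(fmt, base)           # step 3.2: continue at step 1


# Toy single-step detector: marker keys stand in for real format metadata.
def toy_single_step(store: dict) -> List[Candidate]:
    found = []
    if "array-marker" in store:
        found.append(Candidate("zarr3", False))
    if "database-marker" in store:
        found.append(Candidate("ocdbt", True))
    return found


# Toy adapter: the adapted store is the dict held under "contents".
def toy_apply_adapter(fmt: Candidate, store: dict) -> dict:
    return store["contents"]


root_array = {"database-marker": b"", "contents": {"array-marker": b""}}
detected = detect_chain(root_array, toy_single_step, toy_apply_adapter)
```

In this toy model, an array stored under a nested key inside "contents" yields no candidates at the second step, which corresponds to the failure shown earlier for an array that is not at the root of the OCDBT database.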

Single-step auto-detection algorithm

  1. If the base key-value store potentially specifies a single file (i.e. it has a non-empty path not ending in /), single-file format detection is attempted.

    1. Each single-file format that supports auto-detection specifies the number of bytes at the beginning and end of the file that are required for auto-detection.

    2. The prefix and suffix of the file are requested, using the maximum prefix/suffix length required by any format for auto-detection.

    3. If the file is not found, detection continues with directory format detection at step 2.

    4. If the file is found, the single-file formats that match the prefix and suffix read from the file are returned as candidates.

  2. If the base key-value store refers to a directory, directory format detection is attempted.

    1. Each directory format that supports auto-detection specifies one or more relative paths that should be checked to determine if they are present.

    2. The complete set of relative paths required by any directory format is checked.

    3. The directory formats that match (based on the set of relative paths that are present) are returned as candidates.
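
A single detection step can be sketched in the same toy style. Everything here is illustrative: the classes, the single_step function, and the two example formats are hypothetical and do not reproduce TensorStore's actual detection rules. The key-value store is modeled as a dict from key to bytes, so the prefix and suffix are sliced from the full value instead of being fetched with a partial read.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence


@dataclass(frozen=True)
class SingleFileFormat:
    name: str
    prefix_len: int   # bytes needed from the start of the file
    suffix_len: int   # bytes needed from the end of the file
    matches: Callable[[bytes, bytes], bool]


@dataclass(frozen=True)
class DirectoryFormat:
    name: str
    relative_paths: Sequence[str]   # paths whose presence is checked
    matches: Callable[[FrozenSet[str]], bool]


def single_step(store: Dict[str, bytes], path: str,
                file_formats: Sequence[SingleFileFormat],
                dir_formats: Sequence[DirectoryFormat]) -> List[str]:
    # Step 1: single-file detection for a non-empty path not ending in "/".
    if path and not path.endswith("/"):
        data = store.get(path)
        if data is not None:
            n_prefix = max((f.prefix_len for f in file_formats), default=0)
            n_suffix = max((f.suffix_len for f in file_formats), default=0)
            prefix = data[:n_prefix]
            suffix = data[max(0, len(data) - n_suffix):]
            return [f.name for f in file_formats if f.matches(prefix, suffix)]
        # File not found: fall through to directory detection.
    # Step 2: directory detection based on which relative paths exist.
    directory = path if (not path or path.endswith("/")) else path + "/"
    wanted = {rel for f in dir_formats for rel in f.relative_paths}
    present = frozenset(rel for rel in wanted if directory + rel in store)
    return [f.name for f in dir_formats if f.matches(present)]


# Example formats (illustrative matching rules only).
FILE_FORMATS = [
    SingleFileFormat("jpeg", 3, 0, lambda p, s: p.startswith(b"\xff\xd8\xff")),
]
DIR_FORMATS = [
    DirectoryFormat("zarr3", ("zarr.json",), lambda present: "zarr.json" in present),
]
STORE = {
    "img.jpg": b"\xff\xd8\xff\xe0 image bytes",
    "data.zarr/zarr.json": b"{}",
}
```

Calling single_step(STORE, "data.zarr", FILE_FORMATS, DIR_FORMATS) exercises the fallback: no file named "data.zarr" exists, so the path is treated as a directory.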