tensorstore.open(
    spec: Spec | Any,
    *,
    read: bool | None = None,
    write: bool | None = None,
    open_mode: OpenMode | None = None,
    open: bool | None = None,
    create: bool | None = None,
    delete_existing: bool | None = None,
    assume_metadata: bool | None = None,
    assume_cached_metadata: bool | None = None,
    context: Context | None = None,
    transaction: Transaction | None = None,
    batch: Batch | None = None,
    kvstore: KvStore.Spec | None = None,
    recheck_cached_metadata: RecheckCacheOption | None = None,
    recheck_cached_data: RecheckCacheOption | None = None,
    recheck_cached: RecheckCacheOption | None = None,
    rank: int | None = None,
    dtype: dtype | None = None,
    domain: IndexDomain | None = None,
    shape: Sequence[int] | None = None,
    chunk_layout: ChunkLayout | None = None,
    codec: CodecSpec | None = None,
    fill_value: ArrayLike | None = None,
    dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None,
    schema: Schema | None = None,
) -> Future[TensorStore]

Opens or creates a TensorStore from a Spec.

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.int32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(inner_order=[2, 1, 0]),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'int32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<i4',
    'fill_value': None,
    'filters': None,
    'order': 'F',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Parameters:

- spec: Spec | Any
  TensorStore Spec to open. May also be specified as JSON.
- read: bool | None = None
  Allow read access. Defaults to True if neither read nor write is specified.
- write: bool | None = None
  Allow write access. Defaults to True if neither read nor write is specified.
- open_mode: OpenMode | None = None
  Overrides the existing open mode.
- open: bool | None = None
  Allow opening an existing TensorStore. Overrides the existing open mode.
- create: bool | None = None
  Allow creating a new TensorStore. Overrides the existing open mode. To open or create, specify create=True and open=True.
- delete_existing: bool | None = None
  Delete any existing data before creating a new array. Overrides the existing open mode. Must be specified in conjunction with create=True.
- assume_metadata: bool | None = None
  Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Overrides the existing open mode. Requires that open is True and delete_existing is False. This option takes precedence over assume_cached_metadata if that option is also specified.

  Warning: This option can lead to data corruption if the assumed metadata does not match the stored metadata, or if multiple concurrent writers use different assumed metadata.
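As a quick illustration (not taken from the library itself), the open-mode flags above combine as follows; this is a plain-Python sketch using dicts whose keys correspond to the keyword arguments documented above:

```python
# Sketch of common open-mode keyword combinations for tensorstore.open.
# These are illustrative dicts; in practice pass the keys as keyword arguments.
open_existing = {'open': True}                        # fail if the store is missing
create_new = {'create': True}                         # fail if the store already exists
open_or_create = {'open': True, 'create': True}       # open if present, else create
recreate = {'create': True, 'delete_existing': True}  # delete existing data, then create

# Open-or-create is simply both flags together.
assert open_or_create == {**open_existing, **create_new}
```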
- assume_cached_metadata: bool | None = None
  Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is True and delete_existing is False. The assume_metadata option takes precedence if also specified.

  Warning: This option can lead to data corruption if the assumed metadata does not match the stored metadata, or if multiple concurrent writers use different assumed metadata.
- context: Context | None = None
  Shared resource context. Defaults to a new (unshared) context with default options, as returned by tensorstore.Context(). To share resources, such as cache pools, between multiple open TensorStores, you must specify a context.
- transaction: Transaction | None = None
  Transaction to use for opening/creating, and for subsequent operations. By default, the open is non-transactional.

  Note: To perform transactional operations using a TensorStore that was previously opened without a transaction, use TensorStore.with_transaction.
- batch: Batch | None = None
  Batch to use for reading any metadata required for opening.

  Warning: If specified, the returned Future will not, in general, become ready until the batch is submitted. Therefore, immediately awaiting the returned future will lead to deadlock.
- kvstore: KvStore.Spec | None = None
  Sets the associated key-value store used as the underlying storage. If the kvstore has already been set, it is overridden. It is an error to specify this if the TensorStore driver does not use a key-value store.
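For illustration, the underlying key-value store can be given either inline in the spec JSON or via the kvstore argument; the 'memory://' URL shorthand that appears in the examples below denotes the same in-memory kvstore as the JSON form. A minimal sketch (plain dicts, not the library API):

```python
# Two equivalent ways of stating the kvstore within a spec (illustrative only).
spec_with_json_kvstore = {'driver': 'zarr', 'kvstore': {'driver': 'memory'}}
spec_with_url_kvstore = {'driver': 'zarr', 'kvstore': 'memory://'}  # URL shorthand
```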
- recheck_cached_metadata: RecheckCacheOption | None = None
  Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.

  Specifying True means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.
- recheck_cached_data: RecheckCacheOption | None = None
  Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.

  The default value of True means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify False, "open", or an explicit time bound for recheck_cached_data.
- recheck_cached: RecheckCacheOption | None = None
  Sets both recheck_cached_data and recheck_cached_metadata.
- rank: int | None = None
  Constrains the rank of the TensorStore. If there is an index transform, the rank constraint must match the rank of the input space.
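To make the caching requirements above concrete, here is a hedged sketch of the JSON fragments involved; the total_bytes_limit value is an arbitrary example, not a recommendation:

```python
# Illustrative JSON fragments for enabling in-memory data caching.
# A non-zero total_bytes_limit on the cache_pool is required; the value
# below is a made-up example.
context_json = {'cache_pool': {'total_bytes_limit': 100_000_000}}

# recheck_cached_data must also be relaxed from its default of True,
# e.g. to 'open' (revalidate only when the TensorStore is opened):
open_kwargs = {'recheck_cached_data': 'open'}

# Roughly: ts.open(spec, context=ts.Context(context_json), **open_kwargs)
```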
- dtype: dtype | None = None
  Constrains the data type of the TensorStore. If a data type has already been set, it is an error to specify a different data type.
- domain: IndexDomain | None = None
  Constrains the domain of the TensorStore. If there is an existing domain, the specified domain is merged with it as follows:

  - The rank must match the existing rank.
  - All bounds must match, except that a finite or explicit bound is permitted to match an infinite and implicit bound, and takes precedence.
  - If both the new and existing domain specify non-empty labels for a dimension, the labels must be equal. If only one of the domains specifies a non-empty label for a dimension, the non-empty label takes precedence.

  Note that if there is an index transform, the domain must match the input space, not the output space.
- shape: Sequence[int] | None = None
  Constrains the shape and origin of the TensorStore. Equivalent to specifying a domain of ts.IndexDomain(shape=shape).

  Note: This option also constrains the origin of all dimensions to be zero.
- chunk_layout: ChunkLayout | None = None
  Constrains the chunk layout. If there is an existing chunk layout constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- codec: CodecSpec | None = None
  Constrains the codec. If there is an existing codec constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- fill_value: ArrayLike | None = None
  Specifies the fill value for positions that have not been written.

  The fill value data type must be convertible to the actual data type, and the shape must be broadcast-compatible with the domain.

  If an existing fill value has already been set as a constraint, it is an error to specify a different fill value (where the comparison is done after normalization by broadcasting).
- dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None
  Specifies the physical units of each dimension of the domain.

  The physical unit for a dimension is the physical quantity corresponding to a single index increment along each dimension.

  A value of None indicates that the unit is unknown. A dimension-less quantity can be indicated by a unit of "".
- schema: Schema | None = None
  Additional schema constraints to merge with existing constraints.
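The accepted element forms of dimension_units follow directly from the annotation above; a plain list can illustrate them (the specific values here are made-up examples):

```python
# Each element of dimension_units may be a unit string, a (number, unit)
# tuple, a bare number, "" for a dimension-less quantity, or None for an
# unknown unit (illustrative values only).
dimension_units = ['4nm', (4, 'nm'), 5, '', None]
# '4nm' and (4, 'nm') describe the same physical quantity; '' marks a
# dimension-less dimension; None leaves the unit unspecified.
```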
Examples

Opening an existing TensorStore

To open an existing TensorStore, you can use a minimal Spec that specifies required driver-specific options, like the storage location. Information that can be determined automatically from the existing metadata, like the data type, domain, and chunk layout, may be omitted:

>>> store = await ts.open(
...     {
...         'driver': 'neuroglancer_precomputed',
...         'kvstore': {
...             'driver': 'gcs',
...             'bucket': 'neuroglancer-janelia-flyem-hemibrain',
...             'path': 'v1.2/segmentation/',
...         },
...     },
...     read=True)
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'gcs_request_concurrency': {},
    'gcs_request_retries': {},
    'gcs_user_project': {},
  },
  'driver': 'neuroglancer_precomputed',
  'dtype': 'uint64',
  'kvstore': {
    'bucket': 'neuroglancer-janelia-flyem-hemibrain',
    'driver': 'gcs',
    'path': 'v1.2/segmentation/',
  },
  'multiscale_metadata': {'num_channels': 1, 'type': 'segmentation'},
  'scale_index': 0,
  'scale_metadata': {
    'chunk_size': [64, 64, 64],
    'compressed_segmentation_block_size': [8, 8, 8],
    'encoding': 'compressed_segmentation',
    'key': '8.0x8.0x8.0',
    'resolution': [8.0, 8.0, 8.0],
    'sharding': {
      '@type': 'neuroglancer_uint64_sharded_v1',
      'data_encoding': 'gzip',
      'hash': 'identity',
      'minishard_bits': 6,
      'minishard_index_encoding': 'gzip',
      'preshift_bits': 9,
      'shard_bits': 15,
    },
    'size': [34432, 39552, 41408],
    'voxel_offset': [0, 0, 0],
  },
  'transform': {
    'input_exclusive_max': [34432, 39552, 41408, 1],
    'input_inclusive_min': [0, 0, 0, 0],
    'input_labels': ['x', 'y', 'z', 'channel'],
  },
})
Creating a new TensorStore

To create a new TensorStore, you must specify required driver-specific options, like the storage location, as well as Schema constraints like the data type and domain. Suitable defaults are chosen automatically for schema properties that are left unconstrained:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     fill_value=42)
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': 42.0,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Partial constraints may be specified on the chunk layout, and the driver will determine a matching chunk layout automatically:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [10, 1414, 707],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
The schema constraints allow key storage characteristics to be specified independently of the driver/format:

>>> store = await ts.open(
...     {
...         'driver': 'n5',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'n5',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'blockSize': [10, 1414, 707],
    'compression': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'shuffle': 1,
      'type': 'blosc',
    },
    'dataType': 'float32',
    'dimensions': [1000, 2000, 3000],
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Driver-specific constraints can be used in combination with, or instead of, schema constraints:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...         'metadata': {
...             'dtype': '>f4'
...         },
...     },
...     create=True,
...     shape=[1000, 2000, 3000])
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '>f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Using assume_metadata for improved concurrent open efficiency

Normally, when opening or creating a chunked format like zarr, TensorStore first attempts to read the existing metadata (and confirms that it matches any specified constraints), or (if creating is allowed) creates a new metadata file based on any specified constraints.

When the same TensorStore stored on a distributed filesystem or cloud storage is opened concurrently from many machines, the simultaneous requests to read and write the metadata file by every machine can create contention and result in high latency on some distributed filesystems.

The assume_metadata open mode allows redundant reading and writing of the metadata file to be avoided, but requires careful use to avoid data corruption.

Example of skipping reading the metadata when opening an existing array:

>>> context = ts.Context()
>>> # First create the array normally
>>> store = await ts.open({
...     "driver": "zarr",
...     "kvstore": "memory://"
... },
...     context=context,
...     dtype=ts.float32,
...     shape=[5],
...     create=True)
>>> # Note that the .zarray metadata has been written.
>>> await store.kvstore.list()
[b'.zarray']
>>> await store.write([1, 2, 3, 4, 5])
>>> spec = store.spec()
>>> spec
Spec({
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [5],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [5],
    'zarr_format': 2,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Re-open later without re-reading metadata
>>> store2 = await ts.open(spec,
...     context=context,
...     open=True,
...     assume_metadata=True)
>>> # Read data using the unverified metadata from `spec`
>>> await store2.read()
Example of skipping writing the metadata when creating a new array:

>>> context = ts.Context()
>>> spec = ts.Spec(json={"driver": "zarr", "kvstore": "memory://"})
>>> spec.update(dtype=ts.float32, shape=[5])
>>> # Open the array without writing the metadata.  If using a distributed
>>> # filesystem, this can safely be executed on multiple machines concurrently,
>>> # provided that the `spec` is identical and the metadata is either fully
>>> # constrained, or exactly the same TensorStore version is used to ensure the
>>> # same defaults are applied.
>>> store = await ts.open(spec,
...     context=context,
...     open=True,
...     create=True,
...     assume_metadata=True)
>>> await store.write([1, 2, 3, 4, 5])
>>> # Note that the data chunk has been written but not the .zarray metadata
>>> await store.kvstore.list()
[b'0']
>>> # From a single machine, actually write the metadata to ensure the array
>>> # can be re-opened knowing the metadata.  This can be done in parallel with
>>> # any other writing.
>>> await ts.open(spec, context=context, open=True, create=True)
>>> # Metadata has now been written.
>>> await store.kvstore.list()
[b'.zarray', b'0']