tensorstore.open(
    spec: Spec | Any,
    *,
    read: bool | None = None,
    write: bool | None = None,
    open_mode: OpenMode | None = None,
    open: bool | None = None,
    create: bool | None = None,
    delete_existing: bool | None = None,
    assume_metadata: bool | None = None,
    assume_cached_metadata: bool | None = None,
    context: Context | None = None,
    transaction: Transaction | None = None,
    batch: Batch | None = None,
    kvstore: KvStore.Spec | None = None,
    recheck_cached_metadata: RecheckCacheOption | None = None,
    recheck_cached_data: RecheckCacheOption | None = None,
    recheck_cached: RecheckCacheOption | None = None,
    rank: int | None = None,
    dtype: dtype | None = None,
    domain: IndexDomain | None = None,
    shape: Sequence[int] | None = None,
    chunk_layout: ChunkLayout | None = None,
    codec: CodecSpec | None = None,
    fill_value: ArrayLike | None = None,
    dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None,
    schema: Schema | None = None,
) -> Future[TensorStore]

Opens or creates a TensorStore from a Spec.

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.int32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(inner_order=[2, 1, 0]),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'int32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<i4',
    'fill_value': None,
    'filters': None,
    'order': 'F',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Parameters:

- spec: Spec | Any
  TensorStore Spec to open. May also be specified as JSON.
- read: bool | None = None
  Allow read access. Defaults to True if neither read nor write is specified.
- write: bool | None = None
  Allow write access. Defaults to True if neither read nor write is specified.
- open_mode: OpenMode | None = None
  Overrides the existing open mode.
- open: bool | None = None
  Allow opening an existing TensorStore. Overrides the existing open mode.
- create: bool | None = None
  Allow creating a new TensorStore. Overrides the existing open mode. To open or create, specify create=True and open=True.
- delete_existing: bool | None = None
  Delete any existing data before creating a new array. Overrides the existing open mode. Must be specified in conjunction with create=True.
- assume_metadata: bool | None = None
  Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Overrides the existing open mode. Requires that open is True and delete_existing is False. This option takes precedence over assume_cached_metadata if that option is also specified.

  Warning: This option can lead to data corruption if the assumed metadata does not match the stored metadata, or if multiple concurrent writers use different assumed metadata.
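As a quick illustration (not taken from the library itself), the open-mode flags above combine as follows; this is a plain-Python sketch using dicts whose keys correspond to the keyword arguments documented above:

```python
# Sketch of common open-mode keyword combinations for tensorstore.open.
# These are illustrative dicts; in practice pass the keys as keyword arguments.
open_existing = {'open': True}                        # fail if the store is missing
create_new = {'create': True}                         # fail if the store already exists
open_or_create = {'open': True, 'create': True}       # open if present, else create
recreate = {'create': True, 'delete_existing': True}  # delete existing data, then create

# Open-or-create is simply both flags together.
assert open_or_create == {**open_existing, **create_new}
```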
- assume_cached_metadata: bool | None = None
  Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is True and delete_existing is False. The assume_metadata option takes precedence if also specified.

  Warning: This option can lead to data corruption if the assumed metadata does not match the stored metadata, or if multiple concurrent writers use different assumed metadata.
- context: Context | None = None
  Shared resource context. Defaults to a new (unshared) context with default options, as returned by tensorstore.Context(). To share resources, such as cache pools, between multiple open TensorStores, you must specify a context.
- transaction: Transaction | None = None
  Transaction to use for opening/creating, and for subsequent operations. By default, the open is non-transactional.

  Note: To perform transactional operations using a TensorStore that was previously opened without a transaction, use TensorStore.with_transaction.
- batch: Batch | None = None
  Batch to use for reading any metadata required for opening.

  Warning: If specified, the returned Future will not, in general, become ready until the batch is submitted. Therefore, immediately awaiting the returned future will lead to deadlock.
- kvstore: KvStore.Spec | None = None
  Sets the associated key-value store used as the underlying storage. If the kvstore has already been set, it is overridden. It is an error to specify this if the TensorStore driver does not use a key-value store.
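For illustration, the underlying key-value store can be given either inline in the spec JSON or via the kvstore argument; the 'memory://' URL shorthand that appears in the examples below denotes the same in-memory kvstore as the JSON form. A minimal sketch (plain dicts, not the library API):

```python
# Two equivalent ways of stating the kvstore within a spec (illustrative only).
spec_with_json_kvstore = {'driver': 'zarr', 'kvstore': {'driver': 'memory'}}
spec_with_url_kvstore = {'driver': 'zarr', 'kvstore': 'memory://'}  # URL shorthand
```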
- recheck_cached_metadata: RecheckCacheOption | None = None
  Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.

  Specifying True means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.
- recheck_cached_data: RecheckCacheOption | None = None
  Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.

  The default value of True means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify False, "open", or an explicit time bound for recheck_cached_data.
- recheck_cached: RecheckCacheOption | None = None
  Sets both recheck_cached_data and recheck_cached_metadata.
- rank: int | None = None
  Constrains the rank of the TensorStore. If there is an index transform, the rank constraint must match the rank of the input space.
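To make the caching requirements above concrete, here is a hedged sketch of the JSON fragments involved; the total_bytes_limit value is an arbitrary example, not a recommendation:

```python
# Illustrative JSON fragments for enabling in-memory data caching.
# A non-zero total_bytes_limit on the cache_pool is required; the value
# below is a made-up example.
context_json = {'cache_pool': {'total_bytes_limit': 100_000_000}}

# recheck_cached_data must also be relaxed from its default of True,
# e.g. to 'open' (revalidate only when the TensorStore is opened):
open_kwargs = {'recheck_cached_data': 'open'}

# Roughly: ts.open(spec, context=ts.Context(context_json), **open_kwargs)
```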
- dtype: dtype | None = None
  Constrains the data type of the TensorStore. If a data type has already been set, it is an error to specify a different data type.
- domain: IndexDomain | None = None
  Constrains the domain of the TensorStore. If there is an existing domain, the specified domain is merged with it as follows:

  - The rank must match the existing rank.
  - All bounds must match, except that a finite or explicit bound is permitted to match an infinite and implicit bound, and takes precedence.
  - If both the new and existing domain specify non-empty labels for a dimension, the labels must be equal. If only one of the domains specifies a non-empty label for a dimension, the non-empty label takes precedence.

  Note that if there is an index transform, the domain must match the input space, not the output space.
- shape: Sequence[int] | None = None
  Constrains the shape and origin of the TensorStore. Equivalent to specifying a domain of ts.IndexDomain(shape=shape).

  Note: This option also constrains the origin of all dimensions to be zero.
- chunk_layout: ChunkLayout | None = None
  Constrains the chunk layout. If there is an existing chunk layout constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- codec: CodecSpec | None = None
  Constrains the codec. If there is an existing codec constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- fill_value: ArrayLike | None = None
  Specifies the fill value for positions that have not been written.

  The fill value data type must be convertible to the actual data type, and the shape must be broadcast-compatible with the domain.

  If an existing fill value has already been set as a constraint, it is an error to specify a different fill value (where the comparison is done after normalization by broadcasting).
- dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None
  Specifies the physical units of each dimension of the domain.

  The physical unit for a dimension is the physical quantity corresponding to a single index increment along each dimension.

  A value of None indicates that the unit is unknown. A dimension-less quantity can be indicated by a unit of "".
- schema: Schema | None = None
  Additional schema constraints to merge with existing constraints.
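The accepted element forms of dimension_units follow directly from the annotation above; a plain list can illustrate them (the specific values here are made-up examples):

```python
# Each element of dimension_units may be a unit string, a (number, unit)
# tuple, a bare number, "" for a dimension-less quantity, or None for an
# unknown unit (illustrative values only).
dimension_units = ['4nm', (4, 'nm'), 5, '', None]
# '4nm' and (4, 'nm') describe the same physical quantity; '' marks a
# dimension-less dimension; None leaves the unit unspecified.
```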
Examples

Opening an existing TensorStore

To open an existing TensorStore, you can use a minimal Spec that specifies required driver-specific options, like the storage location. Information that can be determined automatically from the existing metadata, like the data type, domain, and chunk layout, may be omitted:

>>> store = await ts.open(
...     {
...         'driver': 'neuroglancer_precomputed',
...         'kvstore': {
...             'driver': 'gcs',
...             'bucket': 'neuroglancer-janelia-flyem-hemibrain',
...             'path': 'v1.2/segmentation/',
...         },
...     },
...     read=True)
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'gcs_request_concurrency': {},
    'gcs_request_retries': {},
    'gcs_user_project': {},
  },
  'driver': 'neuroglancer_precomputed',
  'dtype': 'uint64',
  'kvstore': {
    'bucket': 'neuroglancer-janelia-flyem-hemibrain',
    'driver': 'gcs',
    'path': 'v1.2/segmentation/',
  },
  'multiscale_metadata': {'num_channels': 1, 'type': 'segmentation'},
  'scale_index': 0,
  'scale_metadata': {
    'chunk_size': [64, 64, 64],
    'compressed_segmentation_block_size': [8, 8, 8],
    'encoding': 'compressed_segmentation',
    'key': '8.0x8.0x8.0',
    'resolution': [8.0, 8.0, 8.0],
    'sharding': {
      '@type': 'neuroglancer_uint64_sharded_v1',
      'data_encoding': 'gzip',
      'hash': 'identity',
      'minishard_bits': 6,
      'minishard_index_encoding': 'gzip',
      'preshift_bits': 9,
      'shard_bits': 15,
    },
    'size': [34432, 39552, 41408],
    'voxel_offset': [0, 0, 0],
  },
  'transform': {
    'input_exclusive_max': [34432, 39552, 41408, 1],
    'input_inclusive_min': [0, 0, 0, 0],
    'input_labels': ['x', 'y', 'z', 'channel'],
  },
})
Creating a new TensorStore

To create a new TensorStore, you must specify required driver-specific options, like the storage location, as well as Schema constraints like the data type and domain. Suitable defaults are chosen automatically for schema properties that are left unconstrained:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     fill_value=42)
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': 42.0,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Partial constraints may be specified on the chunk layout, and the driver will determine a matching chunk layout automatically:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [10, 1414, 707],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
The schema constraints allow key storage characteristics to be specified independently of the driver/format:

>>> store = await ts.open(
...     {
...         'driver': 'n5',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'n5',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'blockSize': [10, 1414, 707],
    'compression': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'shuffle': 1,
      'type': 'blosc',
    },
    'dataType': 'float32',
    'dimensions': [1000, 2000, 3000],
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Driver-specific constraints can be used in combination with, or instead of, schema constraints:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...         'metadata': {
...             'dtype': '>f4'
...         },
...     },
...     create=True,
...     shape=[1000, 2000, 3000])
>>> store
TensorStore({
  'context': {
    'cache_pool': {},
    'data_copy_concurrency': {},
    'memory_key_value_store': {},
  },
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [101, 101, 101],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '>f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [1000, 2000, 3000],
    'zarr_format': 2,
  },
  'transform': {
    'input_exclusive_max': [[1000], [2000], [3000]],
    'input_inclusive_min': [0, 0, 0],
  },
})
Using assume_metadata for improved concurrent open efficiency

Normally, when opening or creating a chunked format like zarr, TensorStore first attempts to read the existing metadata (and confirms that it matches any specified constraints), or (if creating is allowed) creates a new metadata file based on any specified constraints.

When the same TensorStore stored on a distributed filesystem or cloud storage is opened concurrently from many machines, the simultaneous requests to read and write the metadata file by every machine can create contention and result in high latency on some distributed filesystems.

The assume_metadata open mode allows redundant reading and writing of the metadata file to be avoided, but requires careful use to avoid data corruption.

Example of skipping reading the metadata when opening an existing array:

>>> context = ts.Context()
>>> # First create the array normally
>>> store = await ts.open({
...     "driver": "zarr",
...     "kvstore": "memory://"
... },
...     context=context,
...     dtype=ts.float32,
...     shape=[5],
...     create=True)
>>> # Note that the .zarray metadata has been written.
>>> await store.kvstore.list()
[b'.zarray']
>>> await store.write([1, 2, 3, 4, 5])
>>> spec = store.spec()
>>> spec
Spec({
  'driver': 'zarr',
  'dtype': 'float32',
  'kvstore': {'driver': 'memory'},
  'metadata': {
    'chunks': [5],
    'compressor': {
      'blocksize': 0,
      'clevel': 5,
      'cname': 'lz4',
      'id': 'blosc',
      'shuffle': -1,
    },
    'dimension_separator': '.',
    'dtype': '<f4',
    'fill_value': None,
    'filters': None,
    'order': 'C',
    'shape': [5],
    'zarr_format': 2,
  },
  'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Re-open later without re-reading metadata
>>> store2 = await ts.open(spec,
...     context=context,
...     open=True,
...     assume_metadata=True)
>>> # Read data using the unverified metadata from `spec`
>>> await store2.read()
Example of skipping writing the metadata when creating a new array:

>>> context = ts.Context()
>>> spec = ts.Spec(json={"driver": "zarr", "kvstore": "memory://"})
>>> spec.update(dtype=ts.float32, shape=[5])
>>> # Open the array without writing the metadata.  If using a distributed
>>> # filesystem, this can safely be executed on multiple machines concurrently,
>>> # provided that the `spec` is identical and the metadata is either fully
>>> # constrained, or exactly the same TensorStore version is used to ensure the
>>> # same defaults are applied.
>>> store = await ts.open(spec,
...     context=context,
...     open=True,
...     create=True,
...     assume_metadata=True)
>>> await store.write([1, 2, 3, 4, 5])
>>> # Note that the data chunk has been written but not the .zarray metadata
>>> await store.kvstore.list()
[b'0']
>>> # From a single machine, actually write the metadata to ensure the array
>>> # can be re-opened knowing the metadata.  This can be done in parallel with
>>> # any other writing.
>>> await ts.open(spec, context=context, open=True, create=True)
>>> # Metadata has now been written.
>>> await store.kvstore.list()
[b'.zarray', b'0']