tensorstore.open(
    spec: Spec | Any,
    *,
    read: bool | None = None,
    write: bool | None = None,
    open_mode: OpenMode | None = None,
    open: bool | None = None,
    create: bool | None = None,
    delete_existing: bool | None = None,
    assume_metadata: bool | None = None,
    assume_cached_metadata: bool | None = None,
    context: Context | None = None,
    transaction: Transaction | None = None,
    batch: Batch | None = None,
    kvstore: KvStore.Spec | KvStore | None = None,
    recheck_cached_metadata: RecheckCacheOption | None = None,
    recheck_cached_data: RecheckCacheOption | None = None,
    recheck_cached: RecheckCacheOption | None = None,
    rank: int | None = None,
    dtype: DTypeLike | None = None,
    domain: IndexDomain | None = None,
    shape: Iterable[int] | None = None,
    chunk_layout: ChunkLayout | None = None,
    codec: CodecSpec | None = None,
    fill_value: ArrayLike | None = None,
    dimension_units: Iterable[Unit | str | Real | tuple[Real, str] | None] | None = None,
    schema: Schema | None = None,
) -> Future[TensorStore]

Opens or creates a TensorStore from a Spec.

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.int32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(inner_order=[2, 1, 0]),
... )
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'memory_key_value_store': {},
    },
    'driver': 'zarr',
    'dtype': 'int32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'chunks': [101, 101, 101],
        'compressor': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'id': 'blosc',
            'shuffle': -1,
        },
        'dimension_separator': '.',
        'dtype': '<i4',
        'fill_value': None,
        'filters': None,
        'order': 'F',
        'shape': [1000, 2000, 3000],
        'zarr_format': 2,
    },
    'transform': {
        'input_exclusive_max': [[1000], [2000], [3000]],
        'input_inclusive_min': [0, 0, 0],
    },
})

Parameters:
- spec: Spec | Any
  TensorStore Spec to open. May also be specified as JSON or a URL.
- read: bool | None = None
  Allow read access. Defaults to True if neither read nor write is specified.
- write: bool | None = None
  Allow write access. Defaults to True if neither read nor write is specified.
- open_mode: OpenMode | None = None
  Overrides the existing open mode.
- open: bool | None = None
  Allow opening an existing TensorStore. Overrides the existing open mode.
- create: bool | None = None
  Allow creating a new TensorStore. Overrides the existing open mode. To open or create, specify create=True and open=True.
- delete_existing: bool | None = None
  Delete any existing data before creating a new array. Overrides the existing open mode. Must be specified in conjunction with create=True.
- assume_metadata: bool | None = None
  Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Overrides the existing open mode. Requires that open is True and delete_existing is False. This option takes precedence over assume_cached_metadata if that option is also specified.
  Warning
  This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
- assume_cached_metadata: bool | None = None
  Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is True and delete_existing is False. The assume_metadata option takes precedence if also specified.
  Warning
  This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
- context: Context | None = None
  Shared resource context. Defaults to a new (unshared) context with default options, as returned by tensorstore.Context(). To share resources, such as cache pools, between multiple open TensorStores, you must specify a context.
- transaction: Transaction | None = None
  Transaction to use for opening/creating, and for subsequent operations. By default, the open is non-transactional.
Note
  To perform transactional operations using a TensorStore that was previously opened without a transaction, use TensorStore.with_transaction.
- batch: Batch | None = None
  Batch to use for reading any metadata required for opening.
Warning
  If specified, the returned Future will not, in general, become ready until the batch is submitted. Therefore, immediately awaiting the returned future will lead to deadlock.
- kvstore: KvStore.Spec | KvStore | None = None
  Sets the associated key-value store used as the underlying storage.
  If the kvstore has already been set, it is overridden.
  It is an error to specify this if the TensorStore driver does not use a key-value store.
- recheck_cached_metadata: RecheckCacheOption | None = None
  Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.
  Specifying True means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.
- recheck_cached_data: RecheckCacheOption | None = None
  Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.
  The default value of True means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify False, "open", or an explicit time bound for recheck_cached_data.
- recheck_cached: RecheckCacheOption | None = None
  Sets both recheck_cached_data and recheck_cached_metadata.
- rank: int | None = None
  Constrains the rank of the TensorStore. If there is an index transform, the rank constraint must match the rank of the input space.
- dtype: DTypeLike | None = None
  Constrains the data type of the TensorStore. If a data type has already been set, it is an error to specify a different data type.
- domain: IndexDomain | None = None
  Constrains the domain of the TensorStore. If there is an existing domain, the specified domain is merged with it as follows:
  - The rank must match the existing rank.
  - All bounds must match, except that a finite or explicit bound is permitted to match an infinite and implicit bound, and takes precedence.
  - If both the new and existing domain specify non-empty labels for a dimension, the labels must be equal. If only one of the domains specifies a non-empty label for a dimension, the non-empty label takes precedence.
  Note that if there is an index transform, the domain must match the input space, not the output space.
- shape: Iterable[int] | None = None
  Constrains the shape and origin of the TensorStore. Equivalent to specifying a domain of ts.IndexDomain(shape=shape).
  Note
  This option also constrains the origin of all dimensions to be zero.
- chunk_layout: ChunkLayout | None = None
  Constrains the chunk layout. If there is an existing chunk layout constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- codec: CodecSpec | None = None
  Constrains the codec. If there is an existing codec constraint, the constraints are merged. If the constraints are incompatible, an error is raised.
- fill_value: ArrayLike | None = None
  Specifies the fill value for positions that have not been written.
  The fill value data type must be convertible to the actual data type, and the shape must be broadcast-compatible with the domain.
  If an existing fill value has already been set as a constraint, it is an error to specify a different fill value (where the comparison is done after normalization by broadcasting).
- dimension_units: Iterable[Unit | str | Real | tuple[Real, str] | None] | None = None
  Specifies the physical units of each dimension of the domain.
  The physical unit for a dimension is the physical quantity corresponding to a single index increment along each dimension.
  A value of None indicates that the unit is unknown. A dimension-less quantity can be indicated by a unit of "".
- schema: Schema | None = None
  Additional schema constraints to merge with existing constraints.
Examples
Opening an existing TensorStore
To open an existing TensorStore, you can use a minimal Spec that specifies required driver-specific options, like the storage location. Information that can be determined automatically from the existing metadata, like the data type, domain, and chunk layout, may be omitted:

>>> store = await ts.open(
...     {
...         'driver': 'neuroglancer_precomputed',
...         'kvstore': {
...             'driver': 'gcs',
...             'bucket': 'neuroglancer-janelia-flyem-hemibrain',
...             'path': 'v1.2/segmentation/',
...         },
...     },
...     read=True)
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'gcs_request_concurrency': {},
        'gcs_request_retries': {},
        'gcs_user_project': {},
    },
    'driver': 'neuroglancer_precomputed',
    'dtype': 'uint64',
    'kvstore': {
        'bucket': 'neuroglancer-janelia-flyem-hemibrain',
        'driver': 'gcs',
        'path': 'v1.2/segmentation/',
    },
    'multiscale_metadata': {'num_channels': 1, 'type': 'segmentation'},
    'scale_index': 0,
    'scale_metadata': {
        'chunk_size': [64, 64, 64],
        'compressed_segmentation_block_size': [8, 8, 8],
        'encoding': 'compressed_segmentation',
        'key': '8.0x8.0x8.0',
        'resolution': [8.0, 8.0, 8.0],
        'sharding': {
            '@type': 'neuroglancer_uint64_sharded_v1',
            'data_encoding': 'gzip',
            'hash': 'identity',
            'minishard_bits': 6,
            'minishard_index_encoding': 'gzip',
            'preshift_bits': 9,
            'shard_bits': 15,
        },
        'size': [34432, 39552, 41408],
        'voxel_offset': [0, 0, 0],
    },
    'transform': {
        'input_exclusive_max': [34432, 39552, 41408, 1],
        'input_inclusive_min': [0, 0, 0, 0],
        'input_labels': ['x', 'y', 'z', 'channel'],
    },
})

Opening by URL
The same TensorStore opened in the previous section can be specified more concisely using a TensorStore URL:

>>> store = await ts.open(
...     'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation/|neuroglancer-precomputed:',
...     read=True)

Note
The URL syntax is very limited in the options and parameters that may be specified but is convenient in simple cases.
Opening with format auto-detection
Many formats can be auto-detected from a KvStore URL alone:

>>> store = await ts.open(
...     'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation/',
...     read=True)
>>> store.url
'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation/|neuroglancer-precomputed:'

A full KvStore JSON spec can also be specified instead of a URL:

>>> store = await ts.open(
...     {
...         'driver': 'gcs',
...         'bucket': 'neuroglancer-janelia-flyem-hemibrain',
...         'path': 'v1.2/segmentation/'
...     },
...     read=True)
>>> store.url
'gs://neuroglancer-janelia-flyem-hemibrain/v1.2/segmentation/|neuroglancer-precomputed:'

Creating a new TensorStore
To create a new TensorStore, you must specify required driver-specific options, like the storage location, as well as Schema constraints like the data type and domain. Suitable defaults are chosen automatically for schema properties that are left unconstrained:

>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     fill_value=42)
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'memory_key_value_store': {},
    },
    'driver': 'zarr',
    'dtype': 'float32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'chunks': [101, 101, 101],
        'compressor': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'id': 'blosc',
            'shuffle': -1,
        },
        'dimension_separator': '.',
        'dtype': '<f4',
        'fill_value': 42.0,
        'filters': None,
        'order': 'C',
        'shape': [1000, 2000, 3000],
        'zarr_format': 2,
    },
    'transform': {
        'input_exclusive_max': [[1000], [2000], [3000]],
        'input_inclusive_min': [0, 0, 0],
    },
})

Partial constraints may be specified on the chunk layout, and the driver will determine a matching chunk layout automatically:
>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'memory_key_value_store': {},
    },
    'driver': 'zarr',
    'dtype': 'float32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'chunks': [10, 1414, 707],
        'compressor': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'id': 'blosc',
            'shuffle': -1,
        },
        'dimension_separator': '.',
        'dtype': '<f4',
        'fill_value': None,
        'filters': None,
        'order': 'C',
        'shape': [1000, 2000, 3000],
        'zarr_format': 2,
    },
    'transform': {
        'input_exclusive_max': [[1000], [2000], [3000]],
        'input_inclusive_min': [0, 0, 0],
    },
})

The schema constraints allow key storage characteristics to be specified independent of the driver/format:
>>> store = await ts.open(
...     {
...         'driver': 'n5',
...         'kvstore': {
...             'driver': 'memory'
...         },
...     },
...     create=True,
...     dtype=ts.float32,
...     shape=[1000, 2000, 3000],
...     chunk_layout=ts.ChunkLayout(
...         chunk_shape=[10, None, None],
...         chunk_aspect_ratio=[None, 2, 1],
...         chunk_elements=10000000,
...     ),
... )
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'memory_key_value_store': {},
    },
    'driver': 'n5',
    'dtype': 'float32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'blockSize': [10, 1414, 707],
        'compression': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'shuffle': 1,
            'type': 'blosc',
        },
        'dataType': 'float32',
        'dimensions': [1000, 2000, 3000],
    },
    'transform': {
        'input_exclusive_max': [[1000], [2000], [3000]],
        'input_inclusive_min': [0, 0, 0],
    },
})

Driver-specific constraints can be used in combination with, or instead of, schema constraints:
>>> store = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         },
...         'metadata': {
...             'dtype': '>f4'
...         },
...     },
...     create=True,
...     shape=[1000, 2000, 3000])
>>> store
TensorStore({
    'context': {
        'cache_pool': {},
        'data_copy_concurrency': {},
        'memory_key_value_store': {},
    },
    'driver': 'zarr',
    'dtype': 'float32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'chunks': [101, 101, 101],
        'compressor': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'id': 'blosc',
            'shuffle': -1,
        },
        'dimension_separator': '.',
        'dtype': '>f4',
        'fill_value': None,
        'filters': None,
        'order': 'C',
        'shape': [1000, 2000, 3000],
        'zarr_format': 2,
    },
    'transform': {
        'input_exclusive_max': [[1000], [2000], [3000]],
        'input_inclusive_min': [0, 0, 0],
    },
})

Using assume_metadata for improved concurrent open efficiency

Normally, when opening or creating a chunked format like zarr, TensorStore first attempts to read the existing metadata (and confirms that it matches any specified constraints), or (if creating is allowed) creates a new metadata file based on any specified constraints.
When the same TensorStore stored on a distributed filesystem or cloud storage is opened concurrently from many machines, the simultaneous requests to read and write the metadata file by every machine can create contention and result in high latency on some distributed filesystems.
The assume_metadata open mode allows redundant reading and writing of the metadata file to be avoided, but requires careful use to avoid data corruption.

Example of skipping reading the metadata when opening an existing array
>>> context = ts.Context()
>>> # First create the array normally
>>> store = await ts.open({
...     "driver": "zarr",
...     "kvstore": "memory://"
... },
...                       context=context,
...                       dtype=ts.float32,
...                       shape=[5],
...                       create=True)
>>> # Note that the .zarray metadata has been written.
>>> await store.kvstore.list()
[b'.zarray']
>>> await store.write([1, 2, 3, 4, 5])
>>> spec = store.spec()
>>> spec
Spec({
    'driver': 'zarr',
    'dtype': 'float32',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'chunks': [5],
        'compressor': {
            'blocksize': 0,
            'clevel': 5,
            'cname': 'lz4',
            'id': 'blosc',
            'shuffle': -1,
        },
        'dimension_separator': '.',
        'dtype': '<f4',
        'fill_value': None,
        'filters': None,
        'order': 'C',
        'shape': [5],
        'zarr_format': 2,
    },
    'transform': {'input_exclusive_max': [[5]], 'input_inclusive_min': [0]},
})
>>> # Re-open later without re-reading metadata
>>> store2 = await ts.open(spec,
...                        context=context,
...                        open=True,
...                        assume_metadata=True)
>>> # Read data using the unverified metadata from `spec`
>>> await store2.read()

Example of skipping writing the metadata when creating a new array
>>> context = ts.Context()
>>> spec = ts.Spec(json={"driver": "zarr", "kvstore": "memory://"})
>>> spec.update(dtype=ts.float32, shape=[5])
>>> # Open the array without writing the metadata.  If using a distributed
>>> # filesystem, this can safely be executed on multiple machines concurrently,
>>> # provided that the `spec` is identical and the metadata is either fully
>>> # constrained, or exactly the same TensorStore version is used to ensure the
>>> # same defaults are applied.
>>> store = await ts.open(spec,
...                       context=context,
...                       open=True,
...                       create=True,
...                       assume_metadata=True)
>>> await store.write([1, 2, 3, 4, 5])
>>> # Note that the data chunk has been written but not the .zarray metadata
>>> await store.kvstore.list()
[b'0']
>>> # From a single machine, actually write the metadata to ensure the array
>>> # can be re-opened knowing the metadata.  This can be done in parallel with
>>> # any other writing.
>>> await ts.open(spec, context=context, open=True, create=True)
>>> # Metadata has now been written.
>>> await store.kvstore.list()
[b'.zarray', b'0']