zarr3 Driver

Zarr v3 is a chunked array storage format. The zarr3 driver provides access to Zarr v3-format arrays backed by any supported Key-Value Storage Layer. It supports reading, writing, creating new arrays, and resizing arrays.
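For illustration, here is a minimal sketch that creates an array and writes to it (the memory kvstore, data type, and shape are arbitrary choices for this example):

>>> import tensorstore as ts
>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 200],
... ).result()
>>> store[0, 0] = 42  # synchronous write of a single element

Subsequent Python examples in this section assume tensorstore has been imported as ts in the same way.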
- json driver/zarr3 : object
- Required members:
- driver : "zarr3"
- kvstore : KvStore | KvStoreUrl
Specifies the underlying storage mechanism.
- Optional members:
- rank : integer[0, 32]
Specifies the rank of the TensorStore.
If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.
- transform : IndexTransform
Specifies a transform.
- schema : Schema
Specifies constraints on the schema.
When opening an existing array, specifies constraints on the existing schema; opening will fail if the constraints do not match. Any soft constraints specified in the chunk_layout are ignored. When creating a new array, a suitable schema will be selected automatically based on the specified schema constraints in combination with any driver-specific constraints.
- path : string = ""
This is joined as an additional "/"-separated path component after any path member directly within kvstore. This is supported for backwards compatibility only; the KvStore.path member should be used instead.
Example
"path/to/data"
- open : boolean
Open an existing TensorStore. If neither open nor create is specified, defaults to true.
- create : boolean = false
Create a new TensorStore. Specify true for both open and create to permit either opening an existing TensorStore or creating a new TensorStore if it does not already exist.
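For example, a sketch of open-or-create usage (the memory kvstore and other parameters are arbitrary):

>>> store = ts.open(
...     {'driver': 'zarr3', 'kvstore': {'driver': 'memory'}},
...     open=True,
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()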
- delete_existing : boolean = false
Delete any existing data at the specified path before creating a new TensorStore. Requires that create is true, and that open is false.
- assume_metadata : boolean = false
Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Requires that open is true and delete_existing is false. This option takes precedence over assume_cached_metadata if that option is also specified.
Warning
This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
- assume_cached_metadata : boolean = false
Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is true and delete_existing is false. The assume_metadata option takes precedence if also specified.
Note
Unlike the assume_metadata option, operations such as resizing that modify the stored metadata are supported (and access the stored metadata).
Warning
This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
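For example, a sketch that opens without reading stored metadata, supplying the required metadata through spec constraints (all values here are arbitrary; unspecified metadata falls back to creation defaults):

>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...         'metadata': {'shape': [100, 100], 'data_type': 'uint16'},
...     },
...     open=True,
...     assume_metadata=True,
... ).result()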
- cache_pool : ContextResource = "cache_pool"
Cache pool for data.
Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.
- metadata_cache_pool : ContextResource
Cache pool for metadata only.
Specifies or references a previously defined Context.cache_pool. If not specified, defaults to the value of cache_pool.
- data_copy_concurrency : ContextResource = "data_copy_concurrency"
Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.
- recheck_cached_metadata : CacheRevalidationBound = "open"
Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.
Specifying true means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.
- recheck_cached_data : CacheRevalidationBound = true
Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.
The default value of true means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify false, "open", or an explicit time bound for recheck_cached_data.
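For example, a sketch of a spec that enables in-memory caching (the 100 MB limit is an arbitrary choice):

>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...         'cache_pool': {'total_bytes_limit': 100_000_000},
...         'recheck_cached_data': 'open',
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()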
- fill_missing_data_reads = true
Replace missing chunks with the fill value when reading.
If disabled, reading a missing chunk will result in an error. Note that the fill value may still be used when writing a partial chunk. Typically this should only be set to false in the case that store_data_equal_to_fill_value was enabled when writing.
- store_data_equal_to_fill_value = false
Store all explicitly written data, even if it is equal to the fill value.
This ensures that explicitly written data, even if it is equal to the fill value, can be distinguished from missing data. If disabled, chunks equal to the fill value may be represented as missing chunks.
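For example, a sketch combining these two options so that explicitly written fill-value data can be distinguished from never-written data (all other parameters are arbitrary):

>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...         'store_data_equal_to_fill_value': True,
...         'fill_missing_data_reads': False,
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()
>>> store[0, 0] = 0  # stored explicitly, even though 0 equals the fill value
>>> # Reading a chunk that was never written would result in an error.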
- metadata : driver/zarr3/Metadata
Zarr v3 array metadata.
Specifies constraints on the metadata, as in the zarr.json metadata file, except that all members are optional and codecs may be left partially-specified, in which case default options are chosen automatically. When creating a new array, the new metadata is obtained by combining these metadata constraints with any Schema constraints.
Example
{
  "driver": "zarr3",
  "kvstore": {"driver": "gcs", "bucket": "my-bucket", "path": "path/to/array/"},
  "metadata": {
    "shape": [1000, 1000],
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 100]}},
    "chunk_key_encoding": {"name": "default"},
    "codecs": [{"name": "blosc", "configuration": {"cname": "lz4", "clevel": 5}}],
    "data_type": "int4"
  }
}
- json driver/zarr3/Metadata : object
- Optional members:
- zarr_format : 3
Identifies the zarr specification version.
- node_type : "array"
Identifies the zarr node type.
- shape : array of integer[0, +∞)
Dimensions of the array.
Required when creating a new array if the Schema.domain is not otherwise specified.
Example
[300, 400, 500]
- data_type : driver/zarr3/DataType
Data type of the array.
- chunk_grid : object
- Optional members:
- name : "regular"
- configuration : object
- Optional members:
- chunk_shape : array of integer[1, +∞)
Chunk dimensions.
Specifies the chunk size for each dimension. Must have the same length as shape. If not specified when creating a new array, the chunk dimensions are chosen automatically according to the Schema.chunk_layout.
Example
[64, 64, 64]
- chunk_key_encoding : driver/zarr3/ChunkKeyEncoding
- fill_value
Specifies the fill value.
When creating a new array, defaults to 0 for numeric data types and false for bool.
- codecs : driver/zarr3/CodecChain
Specifies the chunk encoding.
- attributes : object
Specifies user-defined attributes.
Certain attributes are interpreted specially by TensorStore.
- Optional members:
- dimension_units
Physical units corresponding to each dimension of the array.
Optional. If specified, the length must match the rank of the array. A value of null indicates an unspecified unit, while a value of "" indicates a unitless quantity. If omitted, equivalent to specifying an array of all null values.
Example
For a 3-dimensional array where each voxel has a physical size of 2nm by 3nm by 50nm, the dimension_units should be specified as ["2 nm", "3 nm", "50 nm"].
- dimension_names : array of string | null
Specifies an optional name for each dimension.
Optional. If not specified when creating a new array (and also unspecified by the Schema.domain), all dimensions are unlabeled (equivalent to specifying an empty string for each dimension). Labels are specified in the same order as the shape property. Note that this specifies the stored dimension labels. As with any TensorStore driver, dimension labels may also be overridden by specifying a transform.
Example
["x", "y", "z"]
Codecs

Chunk data is encoded according to the codecs specified in the metadata.
- json driver/zarr3/CodecChain : array of string | driver/zarr3/SingleCodec
Specifies a chain of codecs.
Each chunk of the array is converted to its stored representation by a sequence of zero or more array -> array codecs, a single array -> bytes codec, and a sequence of zero or more bytes -> bytes codecs. While required in the actual zarr.json metadata, in the TensorStore spec it is permitted to omit the array -> bytes codec, in which case the array -> bytes codec is unconstrained when opening an existing array, and chosen automatically when creating a new array.
Each codec is specified either by an object, or as a string. A plain string is equivalent to an object with the string as its name. For example, "crc32c" is equivalent to {"name": "crc32c"}.
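For example, a sketch specifying a codec chain through the metadata, using the string shorthand for the final codec (the codec choices here are arbitrary):

>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...         'metadata': {
...             'codecs': [
...                 {'name': 'bytes', 'configuration': {'endian': 'little'}},
...                 'crc32c',  # shorthand for {'name': 'crc32c'}
...             ],
...         },
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()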
- json driver/zarr3/SingleCodec : object
Specifies a single codec.
- Subtypes:
- driver/zarr3/Codec/blosc — Specifies Blosc compression.
- driver/zarr3/Codec/bytes — Fixed-size encoding for numeric types.
- driver/zarr3/Codec/crc32c — Appends a CRC-32C checksum to detect data corruption.
- driver/zarr3/Codec/gzip — Specifies gzip compression.
- driver/zarr3/Codec/sharding_indexed — Sharding codec that enables hierarchical chunking.
- driver/zarr3/Codec/transpose — Transposes the dimensions of an array.
- driver/zarr3/Codec/zstd — Specifies Zstd compression.
Array -> array codecs

- json driver/zarr3/Codec/transpose : object
Transposes the dimensions of an array.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
- Optional members:
- configuration : object
- Optional members:
- order : array of integer | "C" | "F"
Permutation of the dimensions.
When an array is specified, the ith dimension of the encoded representation corresponds to dimension order[i] of the decoded (original) representation.
The special value of "C" indicates the identity permutation [0, 1, ..., n-1] of unspecified length (equivalent to not specifying the driver/zarr3/Codec/transpose codec at all), and the special value of "F" indicates the dimension reversal permutation [n-1, ..., 1, 0] of unspecified length.
If combined with the driver/zarr3/Codec/bytes codec and no other transformations are applied, specifying "C" results in chunks stored in C order (i.e. lexicographic or row-major order), and specifying "F" results in chunks stored in Fortran order (i.e. colexicographic or column-major order). However, given the possible presence of other transformations, it is recommended to instead specify a permutation explicitly.
Example
{"name": "transpose", "configuration": {"order": [2, 0, 1]}}
Array -> bytes codecs

- json driver/zarr3/Codec/bytes : object
Fixed-size encoding for numeric types.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
Example
{"name": "bytes", "configuration": {"endian": "little"}}
- json driver/zarr3/Codec/sharding_indexed : object
Sharding codec that enables hierarchical chunking.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
- Optional members:
- configuration : object
- Optional members:
- chunk_shape : array of integer[1, +∞)
Shape of each sub-chunk.
- codecs : driver/zarr3/CodecChain
Sub-chunk codec chain.
Codec chain used to encode/decode each individual sub-chunk.
- index_codecs : driver/zarr3/CodecChain
Shard index codec chain.
Codec chain used to encode/decode the shard index.
- index_location : "start" | "end" = "end"
Location of the shard index within the shard.
Example
{
  "name": "sharding_indexed",
  "configuration": {
    "chunk_shape": [64, 64, 64],
    "codecs": [
      {"name": "bytes", "configuration": {"endian": "little"}},
      {"name": "gzip", "configuration": {"level": 5}}],
    "index_codecs": [
      {"name": "bytes", "configuration": {"endian": "little"}},
      {"name": "crc32c"}],
    "index_location": "end"
  }
}
Bytes -> bytes codecs

Compression

- json driver/zarr3/Codec/gzip : object
Specifies gzip compression.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
Example
{"name": "gzip", "configuration": {"level": 9}}
- json driver/zarr3/Codec/blosc : object
Specifies Blosc compression.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
- Optional members:
- configuration : object
- Optional members:
- cname : "blosclz" | "lz4" | "lz4hc" | "snappy" | "zlib" | "zstd" = "lz4"
Specifies the compression method used by Blosc.
- clevel : integer[0, 9] = 5
Specifies the Blosc compression level to use.
Higher values are slower but achieve a higher compression ratio.
- shuffle : "noshuffle" | "shuffle" | "bitshuffle"
- typesize : integer[1, 255]
Specifies the stride in bytes for shuffling.
If not specified when creating an array, it is chosen automatically based on the data type.
- blocksize : integer[0, +∞)
Specifies the Blosc blocksize.
The default value of 0 causes the block size to be chosen automatically.
Example
{
  "name": "blosc",
  "configuration": {"cname": "blosclz", "clevel": 9, "typesize": 2, "shuffle": "bitshuffle"}
}
- json driver/zarr3/Codec/zstd : object
Specifies Zstd compression.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
Example
{"name": "zstd", "configuration": {"level": 6}}
Checksum

- json driver/zarr3/Codec/crc32c : object
Appends a CRC-32C checksum to detect data corruption.
- Extends:
driver/zarr3/SingleCodec — Specifies a single codec.
Example
{"name": "crc32c"}
Chunk key encodings

The position of each chunk is encoded as a key according to the chunk_key_encoding specified in the metadata.
- json driver/zarr3/ChunkKeyEncoding : object
Specifies the encoding of chunk grid positions as keys in the underlying kvstore.
If not specified when creating a new array, the default chunk key encoding is used.
- Subtypes:
- driver/zarr3/ChunkKeyEncoding.default — Default chunk key encoding.
- driver/zarr3/ChunkKeyEncoding.v2 — Zarr v2-compatible chunk key encoding.
- json driver/zarr3/ChunkKeyEncoding.default : object
Default chunk key encoding.
Refer to the zarr v3 spec for details.
- Extends:
driver/zarr3/ChunkKeyEncoding — Specifies the encoding of chunk grid positions as keys in the underlying kvstore.
- json driver/zarr3/ChunkKeyEncoding.v2 : object
Zarr v2-compatible chunk key encoding.
Refer to the zarr v3 spec for details.
- Extends:
driver/zarr3/ChunkKeyEncoding — Specifies the encoding of chunk grid positions as keys in the underlying kvstore.
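For example, a sketch selecting the v2-compatible encoding when creating a new array (other parameters are arbitrary). With the default separators, this encoding stores the chunk at grid position (0, 0) under a key like "0.0", rather than the "c/0/0" form produced by the default encoding:

>>> store = ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {'driver': 'memory'},
...         'metadata': {'chunk_key_encoding': {'name': 'v2'}},
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()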
Mapping to TensorStore Schema

Example without sharding
For the following zarr driver/zarr3/Metadata:
{
"zarr_format": 3,
"node_type": "array",
"shape": [1000, 2000, 3000],
"chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 200, 300]}},
"chunk_key_encoding": {"name": "default"},
"data_type": "uint16",
"codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
"fill_value": 42
}
the corresponding Schema is:
{
"chunk_layout": {
"grid_origin": [0, 0, 0],
"inner_order": [0, 1, 2],
"read_chunk": {"shape": [100, 200, 300]},
"write_chunk": {"shape": [100, 200, 300]}
},
"codec": {
"codecs": [{"configuration": {"endian": "little"}, "name": "bytes"}],
"driver": "zarr3"
},
"domain": {"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]},
"dtype": "uint16",
"fill_value": 42,
"rank": 3
}
Data type

Zarr v3 data types correspond to the TensorStore data type of the same name.
- json driver/zarr3/DataType : "bool" | "int4" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "float16" | "bfloat16" | "float32" | "float64" | "complex64" | "complex128"
Specifies the zarr data type.
Refer to the zarr v3 spec for details.
- One of:
- "bool"
Boolean value.
- "int4"
4-bit signed two's-complement integer.
Warning
Supported as a non-standard extension.
- "int8"
8-bit signed two's-complement integer.
- "uint8"
8-bit unsigned integer.
- "int16"
16-bit signed two's-complement integer.
- "uint16"
16-bit unsigned integer.
- "int32"
32-bit signed two's-complement integer.
- "uint32"
32-bit unsigned integer.
- "int64"
64-bit signed two's-complement integer.
- "uint64"
64-bit unsigned integer.
- "float16"
IEEE 754 binary16 half-precision floating-point number.
- "bfloat16"
bfloat16 floating-point format number.
Warning
Supported as a non-standard extension.
- "float32"
IEEE 754 binary32 single-precision floating-point number.
- "float64"
IEEE 754 binary64 double-precision floating-point number.
- "complex64"
Complex number with float32 real and imaginary parts.
- "complex128"
Complex number with float64 real and imaginary parts.
Domain

The shape of the Schema.domain corresponds to driver/zarr3/Metadata.shape.
Dimension labels may be specified in the Schema.domain, and correspond to driver/zarr3/Metadata.dimension_names, but with the following differences:
- The Zarr v3 specification distinguishes between an empty string ("") and an unspecified dimension name (indicated by null). In either case, the corresponding TensorStore dimension label is the empty string.
- The Zarr v3 specification also permits the same non-empty name to be used for more than one dimension, but TensorStore requires that all non-empty dimension labels are unique. If the Zarr metadata specifies dimension names that are not valid TensorStore dimension labels, the corresponding TensorStore domain simply leaves all dimensions unlabeled.
The upper bounds of the domain are resizable (i.e. implicit).
As Zarr v3 does not natively support a non-zero origin, the underlying domain always has a zero origin (IndexDomain.inclusive_min is all zero), but it may be translated by the transform.
Example
For the following driver/zarr3/Metadata:
{
"zarr_format": 3,
"node_type": "array",
"shape": [1000, 2000, 3000],
"dimension_names": ["x", "y", "z"],
"chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 200, 300]}},
"chunk_key_encoding": {"name": "default"},
"data_type": "uint16",
"codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
"fill_value": 0
}
the corresponding IndexDomain is:
{
"exclusive_max": [[1000], [2000], [3000]],
"inclusive_min": [0, 0, 0],
"labels": ["x", "y", "z"]
}
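Because the stored origin is always zero, a non-zero origin can be obtained by translating the domain. A sketch using a dimension expression (the memory kvstore and offset are arbitrary):

>>> store = ts.open(
...     {'driver': 'zarr3', 'kvstore': {'driver': 'memory'}},
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
... ).result()
>>> view = store[ts.d[:].translate_to[10]]  # origin becomes [10, 10]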
Chunk layout

The ChunkLayout.write_chunk shape, specifying the granularity at which writes may be performed efficiently, corresponds to the top-level chunk_shape.
The ChunkLayout.grid_origin is always the zero vector.
The ChunkLayout.inner_order depends on the driver/zarr3/Metadata.codecs that are in use. With just the default driver/zarr3/Codec/bytes codec, the inner order is [0, 1, ..., n-1] (C order); this order may be altered by the driver/zarr3/Codec/transpose codec.
When no sharding codec is in use, the ChunkLayout.read_chunk shape is equal to the ChunkLayout.write_chunk shape.
When using a sharding codec, the ChunkLayout.read_chunk shape corresponds to the inner-most sub-chunk shape.
Selection of chunk layout when creating a new array

When creating a new array, the read and write chunk shapes may be constrained explicitly via ChunkLayout/Grid.shape or implicitly via ChunkLayout/Grid.aspect_ratio and ChunkLayout/Grid.elements. If ChunkLayout/Grid.elements is not specified, the default is 1 million elements per chunk. Suitable read and write chunk shapes are chosen automatically based on these constraints, in combination with any constraints implied by the specified metadata.
If the chosen read chunk shape is not equal to the chosen write chunk shape, a sharding codec is inserted into the codec chain automatically if not already specified.
If a ChunkLayout.inner_order constraint is specified, a driver/zarr3/Codec/transpose codec may be inserted automatically just before the inner-most array -> bytes codec.
Example of unconstrained chunk layout
>>> ts.open(
... {
... 'driver': 'zarr3',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [0, 1, 2],
'read_chunk': {'shape': [101, 101, 101]},
'write_chunk': {'shape': [101, 101, 101]},
})
Example of chunk layout with separate read and write chunk constraints
>>> ts.open(
... {
... 'driver': 'zarr3',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... chunk_layout=ts.ChunkLayout(
... chunk_aspect_ratio=[2, 1, 1],
... read_chunk_elements=2000000,
... write_chunk_elements=1000000000,
... ),
... shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [0, 1, 2],
'read_chunk': {'shape': [200, 100, 100]},
'write_chunk': {'shape': [1000, 1000, 1000]},
})
Example of chunk layout with explicit chunk shapes
>>> ts.open(
... {
... 'driver': 'zarr3',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... chunk_layout=ts.ChunkLayout(
... read_chunk_shape=[64, 64, 64],
... write_chunk_shape=[512, 512, 512],
... ),
... shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [0, 1, 2],
'read_chunk': {'shape': [64, 64, 64]},
'write_chunk': {'shape': [512, 512, 512]},
})
Codec

Within the Schema.codec, the chunk codec chain is represented in the same way as in the driver/zarr3/Metadata:
- json driver/zarr3/Codec : object
- Optional members:
- codecs : driver/zarr3/CodecChain
It is an error to specify any other Codec.driver.
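For example, a sketch constraining the codec chain through the schema rather than the metadata (the gzip choice is arbitrary, and uses the string shorthand described above):

>>> store = ts.open(
...     {'driver': 'zarr3', 'kvstore': {'driver': 'memory'}},
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
...     codec=ts.CodecSpec({'driver': 'zarr3', 'codecs': ['gzip']}),
... ).result()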
Fill value

The Schema.fill_value must be a scalar (rank 0).
As an optimization, chunks that are entirely equal to the fill value are not stored.
Dimension units

The Schema.dimension_units property corresponds to the dimension_units metadata attribute. The base unit is used directly; it is not converted in any way.
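For example, a sketch specifying dimension units as a schema constraint when creating a new array (the units and other parameters are arbitrary):

>>> store = ts.open(
...     {'driver': 'zarr3', 'kvstore': {'driver': 'memory'}},
...     create=True,
...     dtype=ts.uint16,
...     shape=[100, 100],
...     dimension_units=['2 nm', '3 nm'],
... ).result()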