zarr3 Driver

Zarr v3 is a chunked array storage format.

The zarr3 driver provides access to Zarr v3-format arrays backed by any supported Key-Value Storage Layer. It supports reading, writing, creating new arrays, and resizing arrays.

json driver/zarr3 : object
Extends:
Required members:
driver : "zarr3"
kvstore : KvStore | KvStoreUrl

Specifies the underlying storage mechanism.

Optional members:
context : Context

Specifies context resources that augment/override the parent context.

dtype : dtype

Specifies the data type.

rank : integer[0, 32]

Specifies the rank of the TensorStore.

If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.

transform : IndexTransform

Specifies a transform.

schema : Schema

Specifies constraints on the schema.

When opening an existing array, specifies constraints on the existing schema; opening will fail if the constraints do not match. Any soft constraints specified in the chunk_layout are ignored. When creating a new array, a suitable schema will be selected automatically based on the specified schema constraints in combination with any driver-specific constraints.

path : string = ""

Additional path within the KvStore specified by kvstore.

This is joined as an additional "/"-separated path component after any path member directly within kvstore. This is supported for backwards compatibility only; the KvStore.path member should be used instead.

Example

"path/to/data"
open : boolean

Open an existing TensorStore. If neither open nor create is specified, defaults to true.

create : boolean = false

Create a new TensorStore. Specify true for both open and create to permit either opening an existing TensorStore or creating a new TensorStore if it does not already exist.

delete_existing : boolean = false

Delete any existing data at the specified path before creating a new TensorStore. Requires that create is true, and that open is false.

assume_metadata : boolean = false

Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Requires that open is true and delete_existing is false. This option takes precedence over assume_cached_metadata if that option is also specified.

Warning

This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.

assume_cached_metadata : boolean = false

Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is true and delete_existing is false. The assume_metadata option takes precedence if also specified.

Note

Unlike the assume_metadata option, operations such as resizing that modify the stored metadata are supported (and access the stored metadata).

Warning

This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
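
The requirements that open, create, delete_existing, and the two assume_* options place on each other can be summarized as a small consistency check. The following is a minimal sketch in plain Python, not TensorStore's own validation logic. The helper name check_open_flags is hypothetical, and the way it defaults open (true only when neither open nor create appears in the spec) is an assumption based on the wording above.

```python
def check_open_flags(spec):
    """Return a list of rule violations for the open-mode flags in `spec`.

    Hypothetical helper: it only mirrors the requirements stated in this
    section and is not part of the TensorStore API.
    """
    create = spec.get("create", False)
    # Assumption: `open` defaults to true only if neither flag is given.
    open_ = spec.get("open", "create" not in spec)
    delete_existing = spec.get("delete_existing", False)

    errors = []
    if delete_existing and not (create and not open_):
        errors.append("delete_existing requires create=true and open=false")
    for flag in ("assume_metadata", "assume_cached_metadata"):
        if spec.get(flag, False) and not (open_ and not delete_existing):
            errors.append(f"{flag} requires open=true and delete_existing=false")
    return errors


# Opening or creating is fine; deleting while also opening is not.
print(check_open_flags({"open": True, "create": True}))
print(check_open_flags({"open": True, "create": True, "delete_existing": True}))
```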

cache_pool : ContextResource = "cache_pool"

Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.

recheck_cached_metadata : CacheRevalidationBound = "open"

Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.

Specifying true means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.

recheck_cached_data : CacheRevalidationBound = true

Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.

The default value of true means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify false, "open", or an explicit time bound for recheck_cached_data.
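
To illustrate the two conditions for in-memory data caching, the spec below sets a cache_pool limit in the context and relaxes recheck_cached_data. It is a sketch built as a plain Python dict; the 100 MB limit and the memory kvstore are arbitrary choices made for this example.

```python
import json

# Both settings are needed for in-memory data caching:
# a cache_pool with a non-zero total_bytes_limit, and a
# recheck_cached_data value other than the default of true.
spec = {
    "driver": "zarr3",
    "kvstore": {"driver": "memory"},
    "context": {"cache_pool": {"total_bytes_limit": 100_000_000}},
    "recheck_cached_data": "open",
}

print(json.dumps(spec, indent=2))
```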

metadata : driver/zarr3/Metadata

Zarr v3 array metadata.

Specifies constraints on the metadata, as in the zarr.json metadata file, except that all members are optional and codecs may be left partially-specified, in which case default options are chosen automatically. When creating a new array, the new metadata is obtained by combining these metadata constraints with any Schema constraints.

Example

{
  "driver": "zarr3",
  "kvstore": {"driver": "gcs", "bucket": "my-bucket", "path": "path/to/array/"},
  "metadata": {
    "shape": [1000, 1000],
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 100]}},
    "chunk_key_encoding": {"name": "default"},
    "codecs": [{"name": "blosc", "configuration": {"cname": "lz4", "clevel": 5}}],
    "data_type": "int4"
  }
}
json driver/zarr3/Metadata : object
Optional members:
zarr_format : 3

Identifies the zarr specification version.

node_type : "array"

Identifies the zarr node type.

shape : array of integer[0, +∞)

Dimensions of the array.

Required when creating a new array if the Schema.domain is not otherwise specified.

Example

[300, 400, 500]
data_type : driver/zarr3/DataType

Data type of the array.

chunk_grid : object
Optional members:
name : "regular"
configuration : object
Optional members:
chunk_shape : array of integer[1, +∞)

Chunk dimensions.

Specifies the chunk size for each dimension. Must have the same length as shape. If not specified when creating a new array, the chunk dimensions are chosen automatically according to the Schema.chunk_layout.

Example

[64, 64, 64]
chunk_key_encoding : driver/zarr3/ChunkKeyEncoding
fill_value

Specifies the fill value.

When creating a new array, defaults to 0 for numeric data types and false for bool.

codecs : driver/zarr3/CodecChain

Specifies the chunk encoding.

attributes : object

Specifies user-defined attributes.

Certain attributes are interpreted specially by TensorStore.

Optional members:
dimension_units

Physical units corresponding to each dimension of the array.

Optional. If specified, the length must match the rank of the array. A value of null indicates an unspecified unit, while a value of "" indicates a unitless quantity. If omitted, equivalent to specifying an array of all null values.

Example

For a 3-dimensional array where each voxel has a physical size of 2nm by 3nm by 50nm, the dimension_units should be specified as ["2 nm", "3 nm", "50 nm"].

dimension_names : array of string | null

Specifies an optional name for each dimension.

Optional. If not specified when creating a new array (and also unspecified by the Schema.domain), all dimensions are unlabeled (equivalent to specifying an empty string for each dimension). Labels are specified in the same order as the shape property. Note that this specifies the stored dimension labels. As with any TensorStore driver, dimension labels may also be overridden by specifying a transform.

Example

["x", "y", "z"]

Codecs

Chunk data is encoded according to the codecs specified in the metadata.

json driver/zarr3/CodecChain : array of string | driver/zarr3/SingleCodec

Specifies a chain of codecs.

Each chunk of the array is converted to its stored representation by a sequence of zero or more array -> array codecs, a single array -> bytes codec, and a sequence of zero or more bytes -> bytes codecs. While required in the actual zarr.json metadata, in the TensorStore spec it is permitted to omit the array -> bytes codec, in which case the array -> bytes codec is unconstrained when opening an existing array, and chosen automatically when creating a new array.

Each codec is specified either by an object, or as a string. A plain string is equivalent to an object with the string as its name. For example, "crc32c" is equivalent to {"name": "crc32c"}.
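
The string shorthand can be expanded mechanically. The following sketch shows one way to normalize a codec chain so that every entry is an object; normalize_codec_chain is a hypothetical helper written for illustration, not a TensorStore function.

```python
def normalize_codec_chain(chain):
    """Expand string shorthands in a codec chain into {"name": ...} objects."""
    normalized = []
    for codec in chain:
        if isinstance(codec, str):
            normalized.append({"name": codec})
        else:
            # Copy so that the caller's objects are not shared with the result.
            normalized.append(dict(codec))
    return normalized


chain = [{"name": "bytes", "configuration": {"endian": "little"}}, "crc32c"]
print(normalize_codec_chain(chain))
```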

json driver/zarr3/SingleCodec : object

Specifies a single codec.

Subtypes:
Required members:
name : string

Identifies the codec.

Optional members:
configuration : object

Specifies codec-specific configuration options.

Array -> array codecs

json driver/zarr3/Codec/transpose : object

Transposes the dimensions of an array.

Extends:
Required members:
name : "transpose"
Optional members:
configuration : object
Optional members:
order : array of integer | "C" | "F"

Permutation of the dimensions.

When an array is specified, the i-th dimension of the encoded representation corresponds to dimension order[i] of the decoded (original) representation.

The special value of "C" indicates the identity permutation [0, 1, ..., n-1] of unspecified length (equivalent to not specifying the driver/zarr3/Codec/transpose codec at all), and the special value of "F" indicates the dimension reversal permutation [n-1, ..., 1, 0] of unspecified length.

If combined with the driver/zarr3/Codec/bytes codec and no other transformations are applied, specifying "C" results in chunks stored in C (i.e. lexicographic or row-major) order, and specifying "F" results in chunks stored in Fortran (i.e. colexicographic or column-major) order. However, given the possible presence of other transformations, it is recommended to instead just specify a permutation explicitly.

Example

{"name": "transpose", "configuration": {"order": [2, 0, 1]}}
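
The meaning of order can be checked with a few lines of plain Python. The sketch below resolves "C" and "F" to explicit permutations for a given rank and computes the shape of the encoded array from the shape of the decoded array. The helper names resolve_order and encoded_shape are hypothetical.

```python
def resolve_order(order, rank):
    """Turn "C", "F", or an explicit list into a permutation of range(rank)."""
    if order == "C":
        return list(range(rank))
    if order == "F":
        return list(range(rank - 1, -1, -1))
    if sorted(order) != list(range(rank)):
        raise ValueError(f"{order!r} is not a permutation of 0..{rank - 1}")
    return list(order)


def encoded_shape(decoded_shape, order):
    """Shape of the encoded array: encoded dim i is decoded dim order[i]."""
    perm = resolve_order(order, len(decoded_shape))
    return [decoded_shape[axis] for axis in perm]


print(encoded_shape([100, 200, 300], [2, 0, 1]))  # [300, 100, 200]
print(encoded_shape([100, 200, 300], "F"))        # [300, 200, 100]
```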

Array -> bytes codecs

json driver/zarr3/Codec/bytes : object

Fixed-size encoding for numeric types.

Extends:
Required members:
name : "bytes"
Optional members:
configuration : object
Optional members:
endian : "little" | "big"

Example

{"name": "bytes", "configuration": {"endian": "little"}}
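
The effect of the endian option can be seen by packing the same values both ways. This sketch uses Python's struct module on two uint16 values purely to illustrate byte order; it is not how TensorStore implements the codec.

```python
import struct

values = [1, 258]  # 258 == 0x0102

# "<" selects little-endian, ">" big-endian; "H" is an unsigned 16-bit integer.
little = struct.pack("<2H", *values)
big = struct.pack(">2H", *values)

print(little.hex())  # 01000201
print(big.hex())     # 00010102
```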
json driver/zarr3/Codec/sharding_indexed : object

Sharding codec that enables hierarchical chunking.

Extends:
Required members:
name : "sharding_indexed"
Optional members:
configuration : object
Optional members:
chunk_shape : array of integer[1, +∞)

Shape of each sub-chunk.

codecs : driver/zarr3/CodecChain

Sub-chunk codec chain

Codec chain used to encode/decode each individual sub-chunk.

index_codecs : driver/zarr3/CodecChain

Shard index codec chain

Codec chain used to encode/decode the shard index.

index_location : "start" | "end" = "end"

Location of the shard index within the shard.

Example

{
  "name": "sharding_indexed",
  "configuration": {
    "chunk_shape": [64, 64, 64],
    "codecs": [ {"name": "bytes", "configuration": {"endian": "little"}},
      {"name": "gzip", "configuration": {"level": 5}}],
    "index_codecs": [ {"name": "bytes", "configuration": {"endian": "little"}},
      {"name": "crc32c"}],
    "index_location": "end"
  }
}
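
The following sketch works out how many sub-chunks fit in one shard and how large the shard index would be. It rests on assumptions that this section does not state: the outer shard shape of [512, 512, 512] is hypothetical, the requirement that the sub-chunk shape evenly divides the shard shape is taken from the Zarr v3 sharding specification, and the index layout (two uint64 values per sub-chunk, plus 4 bytes when crc32c is among the index_codecs) is likewise taken from that specification.

```python
import math


def sub_chunks_per_shard(shard_shape, sub_chunk_shape):
    """Number of sub-chunks along each dimension of one shard."""
    counts = []
    for shard, sub in zip(shard_shape, sub_chunk_shape):
        if shard % sub != 0:
            raise ValueError("shard shape must be a multiple of the sub-chunk shape")
        counts.append(shard // sub)
    return counts


def shard_index_nbytes(counts, with_crc32c=True):
    """Index size, assuming 2 uint64 entries per sub-chunk plus a 4-byte CRC-32C."""
    return math.prod(counts) * 2 * 8 + (4 if with_crc32c else 0)


counts = sub_chunks_per_shard([512, 512, 512], [64, 64, 64])
print(counts)                      # [8, 8, 8]
print(shard_index_nbytes(counts))  # 8196
```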

Bytes -> bytes codecs

Compression

json driver/zarr3/Codec/gzip : object

Specifies gzip compression.

Extends:
Required members:
name : "gzip"
Optional members:
configuration : object

Example

{"name": "gzip", "configuration": {"level": 9}}
json driver/zarr3/Codec/blosc : object

Specifies Blosc compression.

Extends:
Required members:
name : "blosc"
Optional members:
configuration : object
Optional members:
cname : "blosclz" | "lz4" | "lz4hc" | "snappy" | "zlib" | "zstd" = "lz4"

Specifies the compression method used by Blosc.

clevel : integer[0, 9] = 5

Specifies the Blosc compression level to use.

Higher values are slower but achieve a higher compression ratio.

shuffle : "noshuffle" | "shuffle" | "bitshuffle"
One of:
"noshuffle"

No shuffling.

"shuffle"

Byte-wise shuffle

"bitshuffle"

Bit-wise shuffle

typesize : integer[1, 255]

Specifies the stride in bytes for shuffling.

If not specified when creating an array, it is chosen automatically based on the data type.

blocksize : integer[0, +∞)

Specifies the Blosc blocksize.

The default value of 0 causes the block size to be chosen automatically.

Example

{
  "name": "blosc",
  "configuration": {"cname": "blosclz", "clevel": 9, "typesize": 2, "shuffle": "bitshuffle"}
}
json driver/zarr3/Codec/zstd : object

Specifies Zstd compression.

Extends:
Required members:
name : "zstd"
Optional members:
configuration : object
Optional members:
level : integer[-131072, 22] = 1

Specifies the compression level to use.

A higher compression level provides improved density but reduced compression speed.

Example

{"name": "zstd", "configuration": {"level": 6}}

Checksum

json driver/zarr3/Codec/crc32c : object

Appends a CRC-32C checksum to detect data corruption.

Extends:
Required members:
name : "crc32c"
Optional members:
configuration : object

No configuration options are supported.

Example

{"name": "crc32c"}
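
To make the checksum step concrete, here is a minimal pure-Python sketch of appending and verifying a CRC-32C. The bitwise implementation is slow and for illustration only. Storing the checksum as 4 little-endian bytes is an assumption based on the Zarr v3 codec specification, and the helper names are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def append_checksum(payload: bytes) -> bytes:
    # Assumption: the checksum is stored as 4 little-endian bytes.
    return payload + struct.pack("<I", crc32c(payload))


def verify_and_strip(encoded: bytes) -> bytes:
    """Check the trailing checksum and return the payload without it."""
    payload, stored = encoded[:-4], struct.unpack("<I", encoded[-4:])[0]
    if crc32c(payload) != stored:
        raise ValueError("CRC-32C mismatch: data is corrupted")
    return payload


encoded = append_checksum(b"123456789")
print(hex(crc32c(b"123456789")))  # 0xe3069283
print(verify_and_strip(encoded))  # b'123456789'
```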

Chunk key encodings

The position of each chunk is encoded as a key according to the chunk_key_encoding specified in the metadata.

json driver/zarr3/ChunkKeyEncoding : object

Specifies the encoding of chunk grid positions as keys in the underlying kvstore.

If not specified when creating a new array, the default chunk key encoding is used.

Subtypes:
Optional members:
name : string

Identifies the chunk key encoding.

configuration : object

Configuration options.

Specifies configuration options specific to the particular chunk key encoding.

json driver/zarr3/ChunkKeyEncoding.default : object

Default chunk key encoding.

Refer to the zarr v3 spec for details.

Extends:
Optional members:
name : "default"
configuration : object
Optional members:
separator : "/" | "." = "/"

Separator character between dimensions

json driver/zarr3/ChunkKeyEncoding.v2 : object

Zarr v2-compatible chunk key encoding.

Refer to the zarr v3 spec for details.

Extends:
Optional members:
name : "v2"
configuration : object
Optional members:
separator : "/" | "." = "."

Separator character between dimensions
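
The two encodings can be sketched as a single function. This follows the rules in the Zarr v3 specification as understood here (the "c" prefix for the default encoding, and the key "0" for a rank-0 array under v2); treat those details as assumptions and consult the specification for the authoritative definition. The function name encode_chunk_key is hypothetical.

```python
def encode_chunk_key(indices, name="default", separator=None):
    """Encode chunk grid indices as a key relative to the array path."""
    if name == "default":
        sep = "/" if separator is None else separator
        # The default encoding prefixes every key with "c".
        return sep.join(["c"] + [str(i) for i in indices])
    if name == "v2":
        sep = "." if separator is None else separator
        # Assumption from the Zarr spec: a rank-0 array uses the key "0".
        return sep.join(str(i) for i in indices) or "0"
    raise ValueError(f"unknown chunk key encoding: {name!r}")


print(encode_chunk_key([1, 23, 45]))                 # c/1/23/45
print(encode_chunk_key([1, 23, 45], separator="."))  # c.1.23.45
print(encode_chunk_key([1, 23, 45], name="v2"))      # 1.23.45
```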

Mapping to TensorStore Schema

Example without sharding

For the following zarr driver/zarr3/Metadata:

{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [1000, 2000, 3000],
  "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 200, 300]}},
  "chunk_key_encoding": {"name": "default"},
  "data_type": "uint16",
  "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
  "fill_value": 42
}

the corresponding Schema is:

{
  "chunk_layout": {
    "grid_origin": [0, 0, 0],
    "inner_order": [0, 1, 2],
    "read_chunk": {"shape": [100, 200, 300]},
    "write_chunk": {"shape": [100, 200, 300]}
  },
  "codec": {
    "codecs": [{"configuration": {"endian": "little"}, "name": "bytes"}],
    "driver": "zarr3"
  },
  "domain": {"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]},
  "dtype": "uint16",
  "fill_value": 42,
  "rank": 3
}

Data type

Zarr v3 data types correspond to the TensorStore data type of the same name.

json driver/zarr3/DataType : "bool" | "int4" | "int8" | "uint8" | "int16" | "uint16" | "int32" | "uint32" | "int64" | "uint64" | "float16" | "bfloat16" | "float32" | "float64" | "complex64" | "complex128"

Specifies the zarr data type.

Refer to the zarr v3 spec for details.

One of:
"bool"

Boolean value.

"int4"

4-bit signed two’s-complement integer.

Warning

Supported as a non-standard extension.

"int8"

8-bit signed two’s-complement integer.

"uint8"

8-bit unsigned integer.

"int16"

16-bit signed two’s-complement integer.

"uint16"

16-bit unsigned integer.

"int32"

32-bit signed two’s-complement integer.

"uint32"

32-bit unsigned integer.

"int64"

64-bit signed two’s-complement integer.

"uint64"

64-bit unsigned integer.

"float16"

IEEE 754 binary16 half-precision floating-point number.

"bfloat16"

bfloat16 floating-point format number.

Warning

Supported as a non-standard extension.

"float32"

IEEE 754 binary32 single-precision floating-point number.

"float64"

IEEE 754 binary64 double-precision floating-point number.

"complex64"

Complex number, where the real and imaginary components are each represented by a float32.

"complex128"

Complex number, where the real and imaginary components are each represented by a float64.

Domain

The shape of the Schema.domain corresponds to driver/zarr3/Metadata.shape.

Dimension labels may be specified in the Schema.domain, and correspond to driver/zarr3/Metadata.dimension_names, but with the following differences:

  • The Zarr v3 specification distinguishes between an empty string ("") and an unspecified dimension name (indicated by null). In either case, the corresponding TensorStore dimension label is the empty string.

  • The Zarr v3 specification also permits the same non-empty name to be used for more than one dimension, but TensorStore requires that all non-empty dimension labels are unique. If the Zarr metadata specifies dimension names that are not valid TensorStore dimension labels, the corresponding TensorStore domain simply leaves all dimensions unlabeled.

The upper bounds of the domain are resizable (i.e. implicit).

As Zarr v3 does not natively support a non-zero origin, the underlying domain always has a zero origin (IndexDomain.inclusive_min is all zero), but it may be translated by the transform.

Example

For the following driver/zarr3/Metadata:

{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [1000, 2000, 3000],
  "dimension_names": ["x", "y", "z"],
  "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 200, 300]}},
  "chunk_key_encoding": {"name": "default"},
  "data_type": "uint16",
  "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
  "fill_value": 0
}

the corresponding IndexDomain is:

{
  "exclusive_max": [[1000], [2000], [3000]],
  "inclusive_min": [0, 0, 0],
  "labels": ["x", "y", "z"]
}
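
The mapping rules above can be sketched in plain Python. The function below is an illustration, not TensorStore's implementation: metadata_to_domain is a hypothetical name, duplicate non-empty names are the only kind of invalid label it detects, and omitting labels when every dimension is unlabeled is a choice made for this example.

```python
def metadata_to_domain(metadata):
    """Derive an IndexDomain-like dict from zarr3 metadata (sketch only)."""
    shape = metadata["shape"]
    names = metadata.get("dimension_names") or [None] * len(shape)
    # null and "" both map to the empty (unlabeled) TensorStore label.
    labels = [name or "" for name in names]
    non_empty = [label for label in labels if label]
    if len(set(non_empty)) != len(non_empty):
        # Duplicate names are not valid labels: leave everything unlabeled.
        labels = [""] * len(shape)
    domain = {
        # Single-element lists mark the upper bounds as implicit (resizable).
        "exclusive_max": [[size] for size in shape],
        "inclusive_min": [0] * len(shape),
    }
    if any(labels):
        domain["labels"] = labels
    return domain


print(metadata_to_domain({"shape": [1000, 2000, 3000],
                          "dimension_names": ["x", "y", "z"]}))
```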

Chunk layout

The ChunkLayout.write_chunk shape, specifying the granularity at which writes may be performed efficiently, corresponds to the top-level chunk_shape.

The ChunkLayout.grid_origin is always the zero vector.

The ChunkLayout.inner_order depends on the driver/zarr3/Metadata.codecs that are in use. With just the default driver/zarr3/Codec/bytes codec, the inner order is [0, 1, ..., n-1] (C order); this order may be altered by the driver/zarr3/Codec/transpose codec.

When no sharding codec is in use, the ChunkLayout.read_chunk is equal to the ChunkLayout.write_chunk shape.

When using a sharding codec, the ChunkLayout.read_chunk shape corresponds to the inner-most sub-chunk shape.

Selection of chunk layout when creating a new array

When creating a new array, the read and write chunk shapes may be constrained explicitly via ChunkLayout/Grid.shape or implicitly via ChunkLayout/Grid.aspect_ratio and ChunkLayout/Grid.elements. If ChunkLayout/Grid.elements is not specified, the default is 1 million elements per chunk. Suitable read and write chunk shapes are chosen automatically based on these constraints, in combination with any constraints implied by the specified metadata.

If the chosen read chunk shape is not equal to the chosen write chunk shape, a sharding codec is inserted into the codec chain automatically if not already specified.

If a ChunkLayout.inner_order constraint is specified, a driver/zarr3/Codec/transpose codec may be inserted automatically just before the inner-most array -> bytes codec.

Example of unconstrained chunk layout

>>> ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.uint16,
...     shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [101, 101, 101]},
  'write_chunk': {'shape': [101, 101, 101]},
})

Example of chunk layout with separate read and write chunk constraints

>>> ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.uint16,
...     chunk_layout=ts.ChunkLayout(
...         chunk_aspect_ratio=[2, 1, 1],
...         read_chunk_elements=2000000,
...         write_chunk_elements=1000000000,
...     ),
...     shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [200, 100, 100]},
  'write_chunk': {'shape': [1000, 1000, 1000]},
})

Example of chunk layout with explicit chunk shapes

>>> ts.open(
...     {
...         'driver': 'zarr3',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     create=True,
...     dtype=ts.uint16,
...     chunk_layout=ts.ChunkLayout(
...         read_chunk_shape=[64, 64, 64],
...         write_chunk_shape=[512, 512, 512],
...     ),
...     shape=[1000, 2000, 3000],
... ).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [64, 64, 64]},
  'write_chunk': {'shape': [512, 512, 512]},
})

Codec

Within the Schema.codec, the chunk codec chain is represented in the same way as in the driver/zarr3/Metadata:

json driver/zarr3/Codec : object
Extends:
Required members:
driver : "zarr3"
Optional members:
codecs : driver/zarr3/CodecChain

It is an error to specify any other Codec.driver.

Fill value

The Schema.fill_value must be a scalar (rank 0).

As an optimization, chunks that are entirely equal to the fill value are not stored.
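
The idea behind this optimization can be sketched with a plain dict standing in for the key-value store. This is a simplified model, not TensorStore's code: the helper names are hypothetical, chunks are flat Python lists, and removing an existing key when a chunk becomes all fill value is an assumption of this sketch.

```python
def write_chunk(store, key, values, fill_value):
    """Store a chunk unless every element equals the fill value."""
    if all(v == fill_value for v in values):
        # Nothing to store: a missing key reads back as the fill value.
        store.pop(key, None)
    else:
        store[key] = list(values)


def read_chunk(store, key, size, fill_value):
    """Return the stored chunk, or a chunk of fill values if the key is absent."""
    return store.get(key, [fill_value] * size)


store = {}
write_chunk(store, "c/0", [42, 42, 42], fill_value=42)
write_chunk(store, "c/1", [42, 7, 42], fill_value=42)
print(sorted(store))                       # ['c/1']
print(read_chunk(store, "c/0", 3, 42))     # [42, 42, 42]
```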

Dimension units

The Schema.dimension_units property corresponds to the dimension_units attribute in driver/zarr3/Metadata.attributes. The base unit is used directly; it is not converted in any way.