zarr Driver

Zarr is a chunked array storage format based on the NumPy data type model.

The zarr driver provides access to Zarr-format arrays backed by any supported Key-Value Storage Layer. It supports reading, writing, creating new arrays, and resizing arrays.

Zarr supports arrays with structured data types specifying multiple named fields that are packed together. TensorStore fully supports such arrays, but each field must be opened separately.

json driver/zarr : object

Extends

KeyValueStoreBackedChunkDriver

Required members

driver : "zarr"
kvstore : KeyValueStore

Specifies the underlying storage mechanism.

Optional members

context : Context

Specifies context resources that augment/override the parent context.

dtype : dtype

Specifies the data type.

rank : integer[0, 32]

Specifies the rank of the TensorStore.

If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.

transform : IndexTransform

Specifies a transform.

path : string = ""

Path within the KeyValueStore specified by kvstore.

Example

"path/to/data"

open : boolean

Open an existing TensorStore. If neither open nor create is specified, defaults to true.

create : boolean = false

Create a new TensorStore. Specify true for both open and create to permit either opening an existing TensorStore or creating a new TensorStore if it does not already exist.

delete_existing : boolean = false

Delete any existing data at the specified path before creating a new TensorStore. Requires that create is true, and that open is false.
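The interaction between open, create, and delete_existing can be summarized as a small validation routine. The following is a minimal sketch and not TensorStore's implementation: the helper name resolve_open_mode is hypothetical, and it assumes that open falls back to false whenever create is true and open is omitted.

```python
def resolve_open_mode(spec):
    """Hypothetical helper: apply the documented defaults and constraints."""
    create = bool(spec.get("create", False))  # documented default: false
    delete_existing = bool(spec.get("delete_existing", False))
    if "open" in spec:
        open_existing = bool(spec["open"])
    else:
        # Documented: open defaults to true if neither open nor create is given.
        # Assumption: open defaults to false when create is true.
        open_existing = not create
    # Documented: delete_existing requires create=true and open=false.
    if delete_existing and (not create or open_existing):
        raise ValueError("delete_existing requires create=true and open=false")
    return {
        "open": open_existing,
        "create": create,
        "delete_existing": delete_existing,
    }


# Open-or-create, as described for specifying true for both members.
print(resolve_open_mode({"open": True, "create": True}))
```

Under these assumptions, an empty spec resolves to open=true, while {"create": true, "delete_existing": true} resolves to open=false.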

cache_pool : ContextResource = "cache_pool"

Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.

recheck_cached_metadata : CacheRevalidationBound = "open"

Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.

Specifying true means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.

recheck_cached_data : CacheRevalidationBound = true

Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.

The default value of true means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify false, "open", or an explicit time bound for recheck_cached_data.
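As an illustration of the two conditions above, the following sketch builds a spec that sets a non-zero total_bytes_limit on the cache_pool through the context member and relaxes recheck_cached_data to "open". The byte limit is an arbitrary illustrative value, and the helper in_memory_caching_enabled is hypothetical: it only inspects a cache_pool given inline in the spec's own context and assumes an unspecified total_bytes_limit is 0.

```python
import json

# Spec fragment: in-memory kvstore plus a 100 MB cache pool (illustrative size).
spec = {
    "driver": "zarr",
    "kvstore": {"driver": "memory"},
    "context": {"cache_pool": {"total_bytes_limit": 100_000_000}},
    "recheck_cached_data": "open",
}


def in_memory_caching_enabled(spec):
    """Hypothetical check mirroring the two documented conditions."""
    cache_pool = spec.get("context", {}).get("cache_pool", {})
    if not isinstance(cache_pool, dict):
        # References to a cache_pool defined elsewhere are not resolved here.
        return False
    limit = cache_pool.get("total_bytes_limit", 0)  # assumption: default is 0
    recheck = spec.get("recheck_cached_data", True)  # documented default: true
    return limit > 0 and recheck is not True


print(json.dumps(spec, indent=2))
print(in_memory_caching_enabled(spec))
```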

field : string | null = null

Name of field to open.

Must be specified if the metadata.dtype specified in the array metadata has more than one field.

metadata : object

Zarr array metadata.

Specifies constraints on the metadata, exactly as in the .zarray metadata file, except that all members are optional. When creating a new array, the new metadata is obtained by combining these metadata constraints with any Schema constraints.

Optional members

zarr_format : 2
shape : array of integer[0, +∞)

Chunked dimensions of the array.

Required when creating a new array if the Schema.domain is not otherwise specified.

Example

[500, 500, 500]

chunks : array of integer[1, +∞)

Chunk dimensions.

Specifies the chunk size for each chunked dimension. Must have the same length as shape. If not specified when creating a new array, the chunk dimensions are chosen automatically according to the Schema.chunk_layout.

Example

[64, 64, 64]

dtype

Specifies the scalar or structured data type.

Refer to the Zarr data type encoding specification. As an extension, TensorStore also supports "bfloat16" for specifying the bfloat16 data type with little endian byte order.

fill_value

Specifies the fill value.

When creating a new array, defaults to null.

order : "C" | "F"

Specifies the data layout for encoded chunks.

"C" for C order, "F" for Fortran order. When creating a new array, defaults to "C".

compressor : driver/zarr/Compressor | null

Specifies the chunk compression method.

Specifying null disables compression. When creating a new array, if not specified, the default compressor of {"id": "blosc"} is used.

filters : null

Specifies the filters to apply to chunks.

When encoding a chunk, filters are applied before the compressor. Currently, filters are not supported.

dimension_separator : "." | "/"

Specifies the encoding of chunk indices into key-value store keys.

The default value of "." corresponds to the default encoding used by Zarr, while "/" corresponds to the encoding used by zarr.storage.NestedDirectoryStore.

key_encoding : "." | "/" = "."

Specifies the encoding of chunk indices into key-value store keys.

Deprecated. Equivalent to specifying metadata.dimension_separator.

Example

{
  "driver": "zarr",
  "kvstore": {"driver": "gcs", "bucket": "my-bucket"},
  "path": "path/to/array",
  "key_encoding": ".",
  "metadata": {
    "shape": [1000, 1000],
    "chunks": [100, 100],
    "dtype": "<i2",
    "order": "C",
    "compressor": {"id": "blosc", "shuffle": -1, "clevel": 5, "cname": "lz4"}
  }
}
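The effect of dimension_separator on chunk keys can be sketched as follows. This is a simplified illustration rather than the driver's code: the helper chunk_key is hypothetical, it assumes at least one chunked dimension, and it assumes the chunk name is appended to path with a "/" separator.

```python
def chunk_key(path, chunk_indices, dimension_separator="."):
    """Hypothetical helper: build the key for one chunk of a zarr array."""
    if dimension_separator not in (".", "/"):
        raise ValueError('dimension_separator must be "." or "/"')
    # Chunk grid indices are written in decimal and joined by the separator.
    name = dimension_separator.join(str(i) for i in chunk_indices)
    # Assumption: the chunk name is placed directly under the array path.
    return f"{path.rstrip('/')}/{name}" if path else name


print(chunk_key("path/to/array", (1, 2, 3)))        # default "." encoding
print(chunk_key("path/to/array", (1, 2, 3), "/"))   # nested directory encoding
```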

Compressors

Chunk data is encoded according to the driver/zarr.metadata.compressor specified in the metadata.

json driver/zarr/Compressor : object

Compressor

The id member identifies the compressor. The remaining members are specific to the compressor.

Subtypes

driver/zarr/Compressor/blosc
driver/zarr/Compressor/bz2
driver/zarr/Compressor/zlib

Required members

id : string

Identifies the compressor.

The following compressors are supported:

json driver/zarr/Compressor/zlib : object

Specifies zlib compression with a zlib or gzip header.

Extends

driver/zarr/Compressor

Required members

id : "zlib" | "gzip"

Optional members

level : integer[0, 9] = 1

Specifies the zlib compression level to use.

Level 0 indicates no compression (fastest), while level 9 indicates the best compression ratio (slowest).

Example

{"id": "gzip", "level": 9}
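The difference between the two headers can be seen with Python's standard zlib and gzip modules. This is a sketch only: the helper encode_chunk is hypothetical, and it assumes that the id "zlib" selects the zlib header and the id "gzip" selects the gzip header.

```python
import gzip
import zlib


def encode_chunk(raw, compressor):
    """Hypothetical helper: compress raw chunk bytes per a compressor spec."""
    level = compressor.get("level", 1)  # documented default level: 1
    if compressor["id"] == "zlib":
        return zlib.compress(raw, level)  # zlib header
    if compressor["id"] == "gzip":
        return gzip.compress(raw, compresslevel=level)  # gzip header
    raise ValueError(f"unsupported compressor id: {compressor['id']!r}")


raw = b"\x00\x01" * 1000
as_zlib = encode_chunk(raw, {"id": "zlib", "level": 9})
as_gzip = encode_chunk(raw, {"id": "gzip", "level": 9})

# Both encodings carry the same DEFLATE payload and differ in the wrapper.
print(as_zlib[:1].hex(), as_gzip[:2].hex())
print(zlib.decompress(as_zlib) == gzip.decompress(as_gzip) == raw)
```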
json driver/zarr/Compressor/blosc : object

Specifies Blosc compression.

Extends

driver/zarr/Compressor

Required members

id : "blosc"

Optional members

cname : "blosclz" | "lz4" | "lz4hc" | "snappy" | "zlib" | "zstd" = "lz4"

Specifies the compression method used by Blosc.

clevel : integer[0, 9] = 5

Specifies the Blosc compression level to use.

Higher values are slower but achieve a higher compression ratio.

shuffle : -1 | 0 | 1 | 2 = -1

One of

-1

Automatic shuffle.

Bit-wise shuffle if the element size is 1 byte, otherwise byte-wise shuffle.

0

No shuffle

1

Byte-wise shuffle

2

Bit-wise shuffle
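The automatic shuffle rule can be written out as a short function. This is a sketch of the rule as described above, with a hypothetical helper name; it is not taken from the TensorStore or Blosc sources.

```python
def resolve_blosc_shuffle(shuffle, element_size):
    """Hypothetical helper: map the shuffle setting to an effective mode.

    Returns 0 (no shuffle), 1 (byte-wise shuffle), or 2 (bit-wise shuffle).
    """
    if shuffle == -1:
        # Automatic: bit-wise for 1-byte elements, otherwise byte-wise.
        return 2 if element_size == 1 else 1
    if shuffle in (0, 1, 2):
        return shuffle
    raise ValueError("shuffle must be one of -1, 0, 1, 2")


print(resolve_blosc_shuffle(-1, element_size=1))  # uint8 data
print(resolve_blosc_shuffle(-1, element_size=2))  # uint16 data
```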

blocksize : integer[0, +∞)

Specifies the Blosc blocksize.

The default value of 0 causes the block size to be chosen automatically.

Example

{"id": "blosc", "cname": "blosclz", "clevel": 9, "shuffle": 2}

json driver/zarr/Compressor/bz2 : object

Specifies bzip2 compression.

Extends

driver/zarr/Compressor

Required members

id : "bz2"

Optional members

level : integer[1, 9] = 1

Specifies the bzip2 buffer size/compression level to use.

A level of 1 indicates the smallest buffer (fastest), while level 9 indicates the best compression ratio (slowest).

Mapping to TensorStore Schema

Example with scalar data type

For the following zarr metadata:

{
  "zarr_format": 2,
  "shape": [1000, 2000, 3000],
  "chunks": [100, 200, 300],
  "dtype": "<u2",
  "compressor": null,
  "fill_value": 42,
  "order": "C",
  "filters": null
}

the corresponding Schema is:

{
  "chunk_layout": {
    "grid_origin": [0, 0, 0],
    "inner_order": [0, 1, 2],
    "read_chunk": {"shape": [100, 200, 300]},
    "write_chunk": {"shape": [100, 200, 300]}
  },
  "codec": {"compressor": null, "driver": "zarr", "filters": null},
  "domain": {"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]},
  "dtype": "uint16",
  "fill_value": 42,
  "rank": 3
}

Example with structured data type

For the following zarr metadata:

{
  "zarr_format": 2,
  "shape": [1000, 2000, 3000],
  "chunks": [100, 200, 300],
  "dtype": [["x", "<u2", [2, 3]], ["y", "<f4", [5]]],
  "compressor": {"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1},
  "fill_value": "AQACAAMABAAFAAYAAAAgQQAAMEEAAEBBAABQQQAAYEE=",
  "order": "F",
  "filters": null
}

the corresponding Schema for the "x" field is:

{
  "chunk_layout": {
    "grid_origin": [0, 0, 0, 0, 0],
    "inner_order": [2, 1, 0, 3, 4],
    "read_chunk": {"shape": [100, 200, 300, 2, 3]},
    "write_chunk": {"shape": [100, 200, 300, 2, 3]}
  },
  "codec": {
    "compressor": {"blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1},
    "driver": "zarr",
    "filters": null
  },
  "domain": {
    "exclusive_max": [[1000], [2000], [3000], 2, 3],
    "inclusive_min": [0, 0, 0, 0, 0]
  },
  "dtype": "uint16",
  "fill_value": [[1, 2, 3], [4, 5, 6]],
  "rank": 5
}

and the corresponding Schema for the "y" field is:

{
  "chunk_layout": {
    "grid_origin": [0, 0, 0, 0],
    "inner_order": [2, 1, 0, 3],
    "read_chunk": {"shape": [100, 200, 300, 5]},
    "write_chunk": {"shape": [100, 200, 300, 5]}
  },
  "codec": {
    "compressor": {"blocksize": 0, "clevel": 5, "cname": "lz4", "id": "blosc", "shuffle": 1},
    "driver": "zarr",
    "filters": null
  },
  "domain": {"exclusive_max": [[1000], [2000], [3000], 5], "inclusive_min": [0, 0, 0, 0]},
  "dtype": "float32",
  "fill_value": [10.0, 11.0, 12.0, 13.0, 14.0],
  "rank": 4
}
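The relationship between the base64-encoded fill_value in the structured metadata and the per-field fill values in the two schemas can be checked with the standard library. The sketch below assumes the fields are packed in declaration order without padding, with each subarray stored in C order: six little-endian uint16 values for "x" followed by five little-endian float32 values for "y".

```python
import base64
import struct

encoded = "AQACAAMABAAFAAYAAAAgQQAAMEEAAEBBAABQQQAAYEE="
raw = base64.b64decode(encoded)

# "<" selects little-endian with no padding: 6 x uint16 ("H"), 5 x float32 ("f").
values = struct.unpack("<6H5f", raw)

# Field "x" has subarray shape [2, 3]; field "y" has subarray shape [5].
x_fill = [list(values[0:3]), list(values[3:6])]
y_fill = list(values[6:])

print(len(raw))
print(x_fill)
print(y_fill)
```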

Data type

Zarr scalar data types map to TensorStore data types as follows:

Supported data types

TensorStore data type   Zarr data type
                        Little endian   Big endian
bool                    "|b1"           "|b1"
uint8                   "|u1"           "|u1"
int8                    "|i1"           "|i1"
uint16                  "<u2"           ">u2"
int16                   "<i2"           ">i2"
uint32                  "<u4"           ">u4"
int32                   "<i4"           ">i4"
uint64                  "<u8"           ">u8"
int64                   "<i8"           ">i8"
float16                 "<f2"           ">f2"
bfloat16                "bfloat16"
float32                 "<f4"           ">f4"
float64                 "<f8"           ">f8"
complex64               "<c8"           ">c8"
complex128              "<c16"          ">c16"

Zarr structured data types are supported, but are represented in TensorStore as scalar arrays with additional dimensions.

When creating a new array, if a driver/zarr.metadata.dtype is not specified explicitly, a scalar Zarr data type with the native endianness is chosen based on the Schema.dtype. To create an array with non-native endianness or a structured data type, the driver/zarr.metadata.dtype must be specified explicitly.
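The selection of a native-endian scalar Zarr data type can be sketched as below. The helper zarr_dtype and its lookup table are hypothetical and only cover the scalar types listed in this section; the encoding follows the NumPy typestr convention of a byte-order character, a kind character, and the size in bytes.

```python
import sys

# (kind character, size in bytes) for each scalar TensorStore data type.
_SCALAR_CODES = {
    "bool": ("b", 1), "uint8": ("u", 1), "int8": ("i", 1),
    "uint16": ("u", 2), "int16": ("i", 2),
    "uint32": ("u", 4), "int32": ("i", 4),
    "uint64": ("u", 8), "int64": ("i", 8),
    "float16": ("f", 2), "float32": ("f", 4), "float64": ("f", 8),
    "complex64": ("c", 8), "complex128": ("c", 16),
}


def zarr_dtype(ts_dtype, byteorder=sys.byteorder):
    """Hypothetical helper: Zarr dtype string for a TensorStore dtype name."""
    if ts_dtype == "bfloat16":
        return "bfloat16"  # TensorStore extension, always little endian
    kind, size = _SCALAR_CODES[ts_dtype]
    if size == 1:
        prefix = "|"  # byte order does not apply to single-byte types
    else:
        prefix = "<" if byteorder == "little" else ">"
    return f"{prefix}{kind}{size}"


print(zarr_dtype("uint16"))           # native endianness of this machine
print(zarr_dtype("float32", "big"))   # explicit big endian
```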

Note

TensorStore supports the non-standard bfloat16 data type as an extension. On little endian platforms, the official Zarr Python library is capable of reading arrays created with the bfloat16 data type provided that a bfloat16 numpy data type has been registered. The TensorStore Python library registers such a data type, as do TensorFlow and JAX.

Warning

zarr datetime/timedelta data types are not currently supported.

Domain

The Schema.domain includes both the chunked dimensions and any subarray dimensions in the case of a structured data type.

Example with scalar data type

If the driver/zarr.metadata.dtype is "<u2" and the driver/zarr.metadata.shape is [100, 200], then the Schema.domain is {"shape": [[100], [200]]}.

Example with structured data type

If the driver/zarr.metadata.dtype is [["x", "<u2", [2, 3]]], and the driver/zarr.metadata.shape is [100, 200], then the Schema.domain is {"shape": [[100], [200], 2, 3]}.

As zarr does not natively support a non-zero origin, the underlying domain always has a zero origin (IndexDomain.inclusive_min is all zero), but it may be translated by the transform.

The upper bounds of the chunked dimensions are resizable (i.e. implicit), while the upper bounds of any subarray dimensions are not resizable.

The zarr metadata format does not support persisting dimension labels, but dimension labels may still be specified when opening using a transform.
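The two domain examples in this section follow one rule, sketched here with a hypothetical helper: chunked dimensions come first with resizable (implicit) upper bounds, written as single-element lists, and subarray dimensions follow with fixed upper bounds.

```python
def schema_domain_shape(shape, subarray_shape=()):
    """Hypothetical helper: the "shape" member of the Schema.domain."""
    # Chunked dimensions are resizable, so each bound is wrapped in a list.
    chunked = [[extent] for extent in shape]
    # Subarray dimensions of a structured field are not resizable.
    return chunked + list(subarray_shape)


print(schema_domain_shape([100, 200]))           # scalar dtype "<u2"
print(schema_domain_shape([100, 200], (2, 3)))   # field "x" with subarray [2, 3]
```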

Chunk layout

The zarr format supports a single driver/zarr.metadata.chunks property that corresponds to the ChunkLayout/Grid.shape constraint. As with the Schema.domain, the Schema.chunk_layout includes both the chunked dimensions and any subarray dimensions in the case of a structured data type. The chunk size for subarray dimensions is always the full extent.

Example with scalar data type

If the driver/zarr.metadata.dtype is "<u2" and driver/zarr.metadata.chunks is [100, 200], then the ChunkLayout/Grid.shape is [100, 200].

Example with structured data type

If the driver/zarr.metadata.dtype is [["x", "<u2", [2, 3]]], and driver/zarr.metadata.chunks is [100, 200], then the ChunkLayout/Grid.shape is [100, 200, 2, 3].

As the zarr format supports only a single level of chunking, the ChunkLayout.read_chunk and ChunkLayout.write_chunk constraints are combined, and hard constraints on ChunkLayout.codec_chunk must not be specified.

The ChunkLayout.grid_origin is always all-zero.

The ChunkLayout.inner_order corresponds to driver/zarr.metadata.order, but also includes the subarray dimensions, which are always the inner-most dimensions.

Example with scalar data type and C order

If the driver/zarr.metadata.dtype is "<u2", driver/zarr.metadata.order is "C", and there are 3 chunked dimensions, then the ChunkLayout.inner_order is [0, 1, 2].

Example with scalar data type and Fortran order

If the driver/zarr.metadata.dtype is "<u2", driver/zarr.metadata.order is "F", and there are 3 chunked dimensions, then the ChunkLayout.inner_order is [2, 1, 0].

Example with structured data type and C order

If the driver/zarr.metadata.dtype is [["x", "<u2", [2, 3]]], driver/zarr.metadata.order is "C", and there are 3 chunked dimensions, then the ChunkLayout.inner_order is [0, 1, 2, 3, 4].

Example with structured data type and Fortran order

If the driver/zarr.metadata.dtype is [["x", "<u2", [2, 3]]], driver/zarr.metadata.order is "F", and there are 3 chunked dimensions, then the ChunkLayout.inner_order is [2, 1, 0, 3, 4].
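The rules for the grid shape, grid origin, and inner order can be combined into a single function. This is a sketch derived from the examples in this section, not from the driver source, and the helper name chunk_layout_for is hypothetical.

```python
def chunk_layout_for(chunks, order, subarray_shape=()):
    """Hypothetical helper: derive the chunk layout from zarr metadata."""
    if order not in ("C", "F"):
        raise ValueError('order must be "C" or "F"')
    n = len(chunks)
    # C order keeps the chunked dimensions as-is; Fortran order reverses them.
    chunked = list(range(n)) if order == "C" else list(range(n - 1, -1, -1))
    # Subarray dimensions are always the inner-most dimensions.
    inner_order = chunked + list(range(n, n + len(subarray_shape)))
    # Subarray dimensions always use their full extent as the chunk size.
    grid_shape = list(chunks) + list(subarray_shape)
    return {
        "grid_origin": [0] * len(grid_shape),
        "inner_order": inner_order,
        "read_chunk": {"shape": grid_shape},
        "write_chunk": {"shape": grid_shape},
    }


# Field "x" with subarray shape [2, 3], Fortran order, 3 chunked dimensions.
print(chunk_layout_for([100, 200, 300], "F", (2, 3)))
```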

Selection of chunk layout when creating a new array

When creating a new array, the chunk sizes may be specified explicitly via ChunkLayout/Grid.shape or implicitly via ChunkLayout/Grid.aspect_ratio and ChunkLayout/Grid.elements. In the latter case, a suitable chunk shape is chosen automatically. If ChunkLayout/Grid.elements is not specified, the default is 1 million elements per chunk:

Example of unconstrained chunk layout

>>> ts.open({
...     'driver': 'zarr',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000]).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [101, 101, 101]},
  'write_chunk': {'shape': [101, 101, 101]},
})

Example of explicit chunk shape constraint

>>> ts.open({
...     'driver': 'zarr',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_shape=[100, 200, 300])).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [100, 200, 300]},
  'write_chunk': {'shape': [100, 200, 300]},
})

Example of chunk aspect ratio constraint

>>> ts.open({
...     'driver': 'zarr',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_aspect_ratio=[1, 2, 2])).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [64, 128, 128]},
  'write_chunk': {'shape': [64, 128, 128]},
})

Example of chunk aspect ratio and elements constraint

>>> ts.open({
...     'driver': 'zarr',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_aspect_ratio=[1, 2, 2],
...             chunk_elements=2000000)).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [0, 1, 2],
  'read_chunk': {'shape': [79, 159, 159]},
  'write_chunk': {'shape': [79, 159, 159]},
})

Codec

Within the Schema.codec, the compression parameters are represented in the same way as in the metadata:

json driver/zarr/Codec : object

Extends

Codec

Required members

driver : "zarr"

Optional members

compressor : driver/zarr/Compressor | null

Specifies the chunk compression method.

Specifying null disables compression. When creating a new array, if not specified, the default compressor of {"id": "blosc"} is used.

filters : null

Specifies the filters to apply to chunks.

When encoding a chunk, filters are applied before the compressor. Currently, filters are not supported.

It is an error to specify any other Codec.driver.

Fill value

For scalar zarr data types, the Schema.fill_value must be a scalar (rank 0). For structured data types, the Schema.fill_value must be broadcastable to the subarray shape.

The zarr format allows the fill value to be unspecified, indicated by a driver/zarr.metadata.fill_value of null. In that case, TensorStore always uses a fill value of 0.
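The shape requirement on Schema.fill_value can be sketched as a shape check. The helper below is hypothetical and assumes NumPy-style broadcasting rules: shapes are aligned at their trailing dimensions, and each fill value dimension must be 1 or equal to the corresponding subarray dimension. With an empty subarray shape (a scalar data type), only a rank-0 fill value passes.

```python
def fill_value_shape_ok(fill_shape, subarray_shape=()):
    """Hypothetical helper: can fill_shape be broadcast to subarray_shape?"""
    if len(fill_shape) > len(subarray_shape):
        return False  # broadcasting never removes dimensions
    # Compare trailing dimensions pairwise, NumPy style.
    for fill_dim, target_dim in zip(reversed(fill_shape), reversed(subarray_shape)):
        if fill_dim != 1 and fill_dim != target_dim:
            return False
    return True


print(fill_value_shape_ok((), ()))          # scalar dtype, scalar fill value
print(fill_value_shape_ok((3,), (2, 3)))    # one row broadcast across field "x"
print(fill_value_shape_ok((2,), (2, 3)))    # trailing dimensions do not match
```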

Limitations

Filters are not supported.