zarr_sharding_indexed Key-Value Store driver

The zarr_sharding_indexed driver implements support for the stored representation used by the Zarr v3 sharding_indexed codec on top of a base key-value store.

For a grid of rank n, each key must be exactly n * 4 bytes long and encodes the grid cell indices, where 0 <= grid_cell_indices[i] < grid_shape[i], as n consecutive uint32be (big-endian unsigned 32-bit) values.
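The key layout described above can be illustrated with a minimal sketch using only the Python standard library. The helper names encode_key and decode_key are hypothetical and are not part of any TensorStore API; they only show how grid cell indices map to key bytes.

```python
import struct


def encode_key(grid_cell_indices, grid_shape):
    """Encode grid cell indices as n consecutive big-endian uint32 values.

    Hypothetical helper for illustration; not a TensorStore function.
    """
    if len(grid_cell_indices) != len(grid_shape):
        raise ValueError("rank of indices does not match rank of grid_shape")
    for dim, (index, size) in enumerate(zip(grid_cell_indices, grid_shape)):
        if not 0 <= index < size:
            raise ValueError(f"index {index} out of range for dimension {dim}")
    # ">" selects big-endian byte order, "I" is an unsigned 32-bit integer.
    return struct.pack(f">{len(grid_shape)}I", *grid_cell_indices)


def decode_key(key, rank):
    """Decode an n * 4 byte key back into a list of grid cell indices."""
    if len(key) != rank * 4:
        raise ValueError(f"expected {rank * 4} bytes, got {len(key)}")
    return list(struct.unpack(f">{rank}I", key))


key = encode_key([3, 100], [32, 128])
print(key.hex())           # 0000000300000064
print(decode_key(key, 2))  # [3, 100]
```

For a rank-2 grid the key is 8 bytes long: the first 4 bytes hold the index along dimension 0 and the last 4 bytes hold the index along dimension 1.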

json kvstore/zarr_sharding_indexed : object

Read/write adapter for the Zarr v3 sharding_indexed format.

JSON specification of the key-value store.

Extends:
  • KvStore — Key-value store specification.

Required members:
driver : "zarr_sharding_indexed"
base : KvStore

Underlying key-value store with path to shard.

grid_shape : array of integer

Shape of the grid of entries in the shard.

index_codecs : driver/zarr3/CodecChain

Codec chain for encoding/decoding the shard index.

Optional members:
path : string

Key prefix within the key-value store.

If the prefix is intended to correspond to a Unix-style directory path, it should end with "/".

context : Context

Specifies context resources that augment/override the parent context.

index_location : "start" | "end" = "end"

Location of the shard index within the shard.

cache_pool : ContextResource = "cache_pool"

Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.

Important

It is very helpful to specify a cache pool with a non-zero total_bytes_limit value. Otherwise, every read operation will require an additional read to obtain the shard index.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.

Example JSON specifications

{
  "driver": "zarr_sharding_indexed",
  "base": "gs://my-bucket/path/to/sharded/data",
  "grid_shape": [32, 128],
  "index_codecs": [
    {"name": "bytes", "configuration": {"endian": "little"}},
    {"name": "crc32c"}
  ],
  "context": {
    "cache_pool": {"total_bytes_limit": 1000000000}
  }
}
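To give a sense of why caching the shard index matters, the following sketch estimates the encoded index size for the example above. It assumes the layout given in the Zarr v3 sharding specification: one (offset, nbytes) pair of uint64 values per grid entry, followed by a 4-byte checksum when the "crc32c" codec is in index_codecs. The function name shard_index_size is hypothetical, and the estimate does not apply if index_codecs changes the encoded size in some other way.

```python
import math

# Assumption: each index entry is an (offset, nbytes) pair of uint64 values.
INDEX_ENTRY_BYTES = 16
# Assumption: the "crc32c" codec appends a 4-byte checksum.
CRC32C_BYTES = 4


def shard_index_size(grid_shape, with_crc32c=True):
    """Estimate the encoded shard index size in bytes (illustrative only)."""
    num_entries = math.prod(grid_shape)
    checksum = CRC32C_BYTES if with_crc32c else 0
    return num_entries * INDEX_ENTRY_BYTES + checksum


print(shard_index_size([32, 128]))  # 65540
```

Under these assumptions, a read without a cached index fetches about 64 KiB of index data before the entry itself can be located.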

Limitations

It is strongly recommended to use a transaction when writing, and to group writes within it. Otherwise, each write may re-write the entire shard, which can cause significant write amplification.
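The cost of ungrouped writes can be estimated with a deliberately simple model. This is a sketch under stated assumptions, not a description of the driver's actual I/O behavior: the shard starts empty, every entry has the same encoded size, and each ungrouped write re-writes all entries written so far plus the shard index. The function names are hypothetical.

```python
def bytes_written_ungrouped(num_entries, entry_bytes, index_bytes):
    """Total bytes written when each entry is committed separately.

    Assumption: the i-th write re-writes a shard holding i entries plus the index.
    """
    return sum(i * entry_bytes + index_bytes for i in range(1, num_entries + 1))


def bytes_written_grouped(num_entries, entry_bytes, index_bytes):
    """Total bytes written when all entries are committed together."""
    return num_entries * entry_bytes + index_bytes


# Illustrative numbers: 100 entries of 1 MB each and a 65540-byte index.
ungrouped = bytes_written_ungrouped(100, 1_000_000, 65540)
grouped = bytes_written_grouped(100, 1_000_000, 65540)
print(f"{ungrouped / grouped:.1f}x")  # 50.5x
```

In this model the amplification grows roughly linearly with the number of entries written separately to the same shard, which is why grouping writes in one transaction helps.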