n5 Driver

The n5 driver provides access to N5 arrays backed by any supported Key-Value Storage Layer. It supports reading, writing, creating new datasets, and resizing datasets.
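For example, the following sketch (the file path and array parameters are hypothetical) creates a new N5 dataset backed by the file key-value store and writes a small region:

# Sketch only: illustrative path and metadata.
import tensorstore as ts

dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'file', 'path': '/tmp/my_dataset/'},
    'metadata': {
        'dimensions': [1000, 2000],
        'blockSize': [100, 100],
        'dataType': 'uint16',
        'compression': {'type': 'gzip'},
    },
    'create': True,
    'delete_existing': True,
}).result()

# Partial-chunk writes are supported; unwritten regions read back as the fill value 0.
dataset[80:82, 99:102] = [[1, 2, 3], [4, 5, 6]]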

json driver/n5 : object
Required members:
driver : "n5"
kvstore : KvStore | KvStoreUrl

Specifies the underlying storage mechanism.

Optional members:
context : Context

Specifies context resources that augment/override the parent context.

dtype : dtype

Specifies the data type.

rank : integer[0, 32]

Specifies the rank of the TensorStore.

If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.

transform : IndexTransform

Specifies a transform.

schema : Schema

Specifies constraints on the schema.

When opening an existing array, specifies constraints on the existing schema; opening will fail if the constraints do not match. Any soft constraints specified in the chunk_layout are ignored. When creating a new array, a suitable schema will be selected automatically based on the specified schema constraints in combination with any driver-specific constraints.

path : string = ""

Additional path within the KvStore specified by kvstore.

This is joined as an additional "/"-separated path component after any path member directly within kvstore. This is supported for backwards compatibility only; the KvStore.path member should be used instead.

Example

"path/to/data"
open : boolean

Open an existing TensorStore. If neither open nor create is specified, defaults to true.

create : boolean = false

Create a new TensorStore. Specify true for both open and create to permit either opening an existing TensorStore or creating a new TensorStore if it does not already exist.

delete_existing : boolean = false

Delete any existing data at the specified path before creating a new TensorStore. Requires that create is true, and that open is false.
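For example, the following sketch (using an illustrative in-memory kvstore) opens the dataset if it already exists and otherwise creates it:

import tensorstore as ts

dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'dimensions': [100, 100],
        'blockSize': [10, 10],
        'dataType': 'uint8',
        'compression': {'type': 'raw'},
    },
    # Setting both allows opening an existing dataset or creating it if absent.
    'open': True,
    'create': True,
}).result()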

assume_metadata : boolean = false

Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Requires that open is true and delete_existing is false. This option takes precedence over assume_cached_metadata if that option is also specified.

Warning

This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.

assume_cached_metadata : boolean = false

Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is true and delete_existing is false. The assume_metadata option takes precedence if also specified.

Note

Unlike the assume_metadata option, operations such as resizing that modify the stored metadata are supported (and access the stored metadata).

Warning

This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
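As an illustration only (spec values hypothetical), assume_metadata skips reading attributes.json entirely; the metadata in the spec must accurately describe the stored data:

import tensorstore as ts

# Sketch: the metadata below is assumed rather than read from storage.
# If it disagrees with the stored attributes.json, reads and writes may corrupt data.
dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'dimensions': [100, 100],
        'blockSize': [10, 10],
        'dataType': 'uint8',
        'compression': {'type': 'raw'},
    },
    'open': True,
    'assume_metadata': True,
}).result()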

cache_pool : ContextResource = "cache_pool"

Cache pool for data.

Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.

metadata_cache_pool : ContextResource

Cache pool for metadata only.

Specifies or references a previously defined Context.cache_pool. If not specified, defaults to the value of cache_pool.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.

recheck_cached_metadata : CacheRevalidationBound = "open"

Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.

Specifying true means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.

recheck_cached_data : CacheRevalidationBound = true

Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.

The default value of true means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify false, "open", or an explicit time bound for recheck_cached_data.
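For example, a spec along the following lines (cache size illustrative) enables a 100 MB in-memory cache and only revalidates cached data when the TensorStore is opened:

import tensorstore as ts

dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'memory'},
    # In-memory cache with a non-zero byte limit; without this, chunk data is not cached.
    'context': {'cache_pool': {'total_bytes_limit': 100_000_000}},
    # Cached chunk data is reused without revalidation after opening.
    'recheck_cached_data': 'open',
    'metadata': {
        'dimensions': [100, 100],
        'blockSize': [10, 10],
        'dataType': 'uint8',
        'compression': {'type': 'raw'},
    },
    'open': True,
    'create': True,
}).result()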

fill_missing_data_reads : boolean = true

Replace missing chunks with the fill value when reading.

If disabled, reading a missing chunk will result in an error. Note that the fill value may still be used when writing a partial chunk. Typically this should only be set to false in the case that store_data_equal_to_fill_value was enabled when writing.

store_data_equal_to_fill_value : boolean = false

Store all explicitly written data, even if it is equal to the fill value.

This ensures that explicitly written data, even if it is equal to the fill value, can be distinguished from missing data. If disabled, chunks equal to the fill value may be represented as missing chunks.
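For example, a sketch of a spec fragment that stores chunks even when they equal the fill value and reports an error when reading chunks that were never written:

{
  "driver": "n5",
  "kvstore": {"driver": "memory"},
  "store_data_equal_to_fill_value": true,
  "fill_missing_data_reads": false
}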

metadata : object

N5 array metadata.

Specifies constraints on the metadata of a dataset exactly as in the attributes.json file, except that all members are optional. When creating a new array, the new metadata is obtained by combining these metadata constraints with any Schema constraints.

Arbitrary additional members may also be specified in addition to the ones listed here. When creating a new array, they will be included in the attributes.json file as additional N5 attributes, but will not be validated in any way. When opening an existing array, all additional members that are specified must be present with identical values in the existing attributes.json file, or the open operation will fail.

Optional members:
dimensions : array of integer[0, +∞)

Dimensions of the dataset.

Required when creating a new array if the Schema.domain is not otherwise specified.

Example

[500, 500, 500]
blockSize : array of integer[1, +∞)

Chunk dimensions.

Specifies the chunk size for each dimension. Must have the same length as dimensions. If not specified when creating a new array, the chunk dimensions are chosen automatically according to the Schema.chunk_layout.

Example

[64, 64, 64]
dataType : "uint8" | "uint16" | "uint32" | "uint64" | "int8" | "int16" | "int32" | "int64" | "float32" | "float64"

Specifies the data type.

Required when creating a new array if Schema.dtype is not otherwise specified.

axes : array of string

Specifies a label for each dimension of the dataset.

Optional. If not specified when creating a new array (and also unspecified by the Schema.domain), all dimensions are unlabeled (equivalent to specifying an empty string for each dimension). Labels are specified in the same order as the dimensions and blockSize properties. Note that this specifies the stored dimension labels. As with any TensorStore driver, dimension labels may also be overridden by specifying a transform.

Example

["x", "y", "z"]
units : array of string

Specifies the base physical unit for each dimension.

Optional. Must have the same length as dimensions.

Example

["nm", "nm", "nm", "s"]
resolution : array of number

Specifies the multiplier for the physical units.

Optional. Must have the same length as dimensions. If resolution is not specified but units is specified, the multipliers are assumed to be all 1. Normally, resolution should only be specified if units is also specified; if resolution is specified but units is not specified, the Schema.dimension_units will be considered unspecified.

Example

[4, 4, 40, 0.5]
compression : driver/n5/Compression

Specifies the chunk compression method.
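Combining the members above, a sketch of creating an array with stored dimension labels, units, and resolution (all values illustrative):

import tensorstore as ts

dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'memory'},
    'metadata': {
        'dimensions': [1000, 2000, 3000],
        'blockSize': [100, 200, 300],
        'dataType': 'uint16',
        'axes': ['x', 'y', 'z'],
        'units': ['nm', 'nm', 'nm'],
        'resolution': [4, 4, 40],
        'compression': {'type': 'raw'},
    },
    'create': True,
}).result()

# The stored units/resolution are exposed as the schema's dimension units.
print(dataset.dimension_units)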

Compression

json driver/n5/Compression : object

The type member identifies the compression method. The remaining members are specific to the compression method.

Subtypes:
driver/n5/Compression/raw
driver/n5/Compression/gzip
driver/n5/Compression/bzip2
driver/n5/Compression/xz
driver/n5/Compression/blosc
Required members:
type : string

Identifies the compressor.

The following compression methods are supported:

json driver/n5/Compression/raw : object

Chunks are encoded directly as big endian values without compression.

Extends: driver/n5/Compression
Required members:
type : "raw"
json driver/n5/Compression/gzip : object

Specifies zlib compression with a gzip or zlib header.

Extends: driver/n5/Compression
Required members:
type : "gzip"
Optional members:
level : integer[-1, 9] = -1

Specifies the zlib compression level to use.

Level 0 indicates no compression (fastest), while level 9 indicates the best compression ratio (slowest). The default value of -1 uses the zlib default compression level (equal to 6).

useZlib : boolean = false

If true, use a zlib header. Otherwise, use a gzip header.
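For example, gzip compression at the maximum level could be specified as:

{"type": "gzip", "level": 9}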

json driver/n5/Compression/bzip2 : object

Specifies bzip2 compression.

Extends: driver/n5/Compression
Required members:
type : "bzip2"
Optional members:
blockSize : integer[1, 9] = 9

Specifies the bzip2 block size to use (in units of 100 KB), which also determines the compression level.

json driver/n5/Compression/xz : object

Specifies xz compression.

Extends: driver/n5/Compression
Required members:
type : "xz"
Optional members:
preset : integer[0, 9] = 6

Specifies the XZ preset level. Preset 0 corresponds to the fastest compression with the worst compression ratio, while preset 9 corresponds to the slowest compression with the best compression ratio.

json driver/n5/Compression/blosc : object

Specifies Blosc compression.

Extends: driver/n5/Compression
Required members:
type : "blosc"
cname : "blosclz" | "lz4" | "lz4hc" | "snappy" | "zlib" | "zstd"

Specifies the compression method used by Blosc.

clevel : integer[0, 9]

Specifies the Blosc compression level to use.

Higher values are slower but achieve a higher compression ratio.

shuffle : 0 | 1 | 2
One of:
0

No shuffle

1

Byte-wise shuffle

2

Bit-wise shuffle

Example

{"type": "blosc", "cname": "blosclz", "clevel": 9, "shuffle": 2}

Mapping to TensorStore Schema

Example

For the following N5 metadata:

{
  "dimensions": [1000, 2000, 3000],
  "blockSize": [100, 200, 300],
  "dataType": "uint16",
  "compression": {"type": "raw"}
}

the corresponding Schema is:

{
  "chunk_layout": {
    "grid_origin": [0, 0, 0],
    "inner_order": [2, 1, 0],
    "read_chunk": {"shape": [100, 200, 300]},
    "write_chunk": {"shape": [100, 200, 300]}
  },
  "codec": {"compression": {"type": "raw"}, "driver": "n5"},
  "domain": {"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]},
  "dtype": "uint16",
  "rank": 3
}

Data type

N5 data types map to TensorStore data types of the same name: uint8, uint16, uint32, uint64, int8, int16, int32, int64, float32, and float64.

Note that internally the N5 format always uses big endian encoding.

Domain

The shape of the Schema.domain corresponds to driver/n5.metadata.dimensions.

Dimension labels may be specified in the Schema.domain, and correspond to driver/n5.metadata.axes.

The upper bounds of the domain are resizable (i.e. implicit).

As N5 does not natively support a non-zero origin, the underlying domain always has a zero origin (IndexDomain.inclusive_min is all zero), but it may be translated by the transform.
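For example, a sketch of a spec fragment (bounds illustrative) that uses a transform to present a 1000-element zero-origin dimension with an origin of 10:

{
  "driver": "n5",
  "kvstore": {"driver": "memory"},
  "transform": {
    "input_inclusive_min": [10],
    "input_exclusive_max": [1010],
    "output": [{"input_dimension": 0, "offset": -10}]
  }
}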

Example

For the following N5 metadata:

{
  "dimensions": [1000, 2000, 3000],
  "blockSize": [100, 200, 300],
  "dataType": "uint16",
  "compression": {"type": "raw"}
}

the corresponding IndexDomain is:

{"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]}

Chunk layout

The N5 format supports a single driver/n5.metadata.blockSize property that corresponds to the ChunkLayout/Grid.shape constraint.

Example

For the following N5 metadata:

{
  "dimensions": [1000, 2000, 3000],
  "blockSize": [100, 200, 300],
  "dataType": "uint16",
  "compression": {"type": "raw"}
}

the corresponding ChunkLayout is:

{
  "grid_origin": [0, 0, 0],
  "inner_order": [2, 1, 0],
  "read_chunk": {"shape": [100, 200, 300]},
  "write_chunk": {"shape": [100, 200, 300]}
}

The ChunkLayout.grid_origin is always all-zero.

As the N5 format supports only a single level of chunking, the ChunkLayout.read_chunk and ChunkLayout.write_chunk constraints are combined, and hard constraints on ChunkLayout.codec_chunk must not be specified.

The N5 format always stores the data within chunks in colexicographic order (i.e. Fortran order).

Selection of chunk layout when creating a new array

When creating a new array, the chunk shape may be constrained explicitly via ChunkLayout/Grid.shape or implicitly via ChunkLayout/Grid.aspect_ratio and ChunkLayout/Grid.elements. A suitable chunk shape is chosen automatically based on these constraints. If ChunkLayout/Grid.elements is not specified, the default is 1 million elements per chunk:

Example of unconstrained chunk layout

>>> ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000]).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [2, 1, 0],
  'read_chunk': {'shape': [101, 101, 101]},
  'write_chunk': {'shape': [101, 101, 101]},
})

Example of explicit chunk shape constraint

>>> ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_shape=[100, 200, 300])).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [2, 1, 0],
  'read_chunk': {'shape': [100, 200, 300]},
  'write_chunk': {'shape': [100, 200, 300]},
})

Example of chunk aspect ratio constraint

>>> ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_aspect_ratio=[1, 2, 2])).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [2, 1, 0],
  'read_chunk': {'shape': [64, 128, 128]},
  'write_chunk': {'shape': [64, 128, 128]},
})

Example of chunk aspect ratio and elements constraint

>>> ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...         create=True,
...         dtype=ts.uint16,
...         shape=[1000, 2000, 3000],
...         chunk_layout=ts.ChunkLayout(
...             chunk_aspect_ratio=[1, 2, 2],
...             chunk_elements=2000000)).result().chunk_layout
ChunkLayout({
  'grid_origin': [0, 0, 0],
  'inner_order': [2, 1, 0],
  'read_chunk': {'shape': [79, 159, 159]},
  'write_chunk': {'shape': [79, 159, 159]},
})

Codec

Within the Schema.codec, the compression parameters are represented in the same way as in the metadata:

json driver/n5/Codec : object
Extends: Codec
Required members:
driver : "n5"
Optional members:
compression : driver/n5/Compression

Specifies the chunk compression method.

It is an error to specify any other Codec.driver.
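For example, a gzip codec constraint could be expressed as a schema fragment (sketch):

{"codec": {"driver": "n5", "compression": {"type": "gzip"}}}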

Fill value

The N5 metadata format does not support specifying a fill value. TensorStore always assumes a fill value of 0.

Dimension units

The Schema.dimension_units correspond to the units and resolution metadata properties. The base unit is used directly; it is not converted in any way.

The N5 format requires that dimension units are specified either for all dimensions, or for no dimensions; it is not possible to specify dimension units for some dimensions while leaving the dimension units of the remaining dimensions unspecified. When creating a new dataset, if dimension units are specified for at least one dimension, any dimensions for which the unit is unspecified are assigned a dimensionless unit of 1.
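As a sketch (assuming the dimension_units keyword argument of the Python API and unit strings of the form "4nm"), dimension units can also be assigned when creating a new dataset:

import tensorstore as ts

dataset = ts.open({
    'driver': 'n5',
    'kvstore': {'driver': 'memory'},
}, create=True, dtype=ts.uint16, shape=[100, 200, 300],
   dimension_units=['4nm', '4nm', '40nm']).result()

# Stored in the metadata as units = ["nm", "nm", "nm"] and resolution = [4, 4, 40].
print(dataset.dimension_units)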

Limitations

Datasets with variable-length ("varlength") chunks are not supported.