n5 Driver¶
The n5 driver provides access to N5 arrays backed by any supported Key-Value Storage Layer. It supports reading, writing, creating new datasets, and resizing datasets.
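For example, the following minimal sketch (mirroring the chunk layout examples later on this page and using the in-memory key-value store; a persistent store such as the file kvstore driver would normally be used instead) creates a new N5 dataset:
>>> import tensorstore as ts
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[100, 200]).result()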
- json driver/n5 : object¶
- Extends:¶
- Required members:¶
- driver : "n5"¶
- kvstore : KvStore | KvStoreUrl¶
Specifies the underlying storage mechanism.
- Optional members:¶
- rank : integer[0, 32]¶
Specifies the rank of the TensorStore.
If transform is also specified, the input rank must match. Otherwise, the rank constraint applies to the driver directly.
- transform : IndexTransform¶
Specifies a transform.
- schema : Schema¶
Specifies constraints on the schema.
When opening an existing array, specifies constraints on the existing schema; opening will fail if the constraints do not match. Any soft constraints specified in the chunk_layout are ignored. When creating a new array, a suitable schema will be selected automatically based on the specified schema constraints in combination with any driver-specific constraints.
- path : string = ""¶
This is joined as an additional "/"-separated path component after any path member directly within kvstore. This is supported for backwards compatibility only; the KvStore.path member should be used instead.
Example
"path/to/data"
- open : boolean¶
Open an existing TensorStore. If neither open nor create is specified, defaults to true.
- create : boolean = false¶
Create a new TensorStore. Specify true for both open and create to permit either opening an existing TensorStore or creating a new TensorStore if it does not already exist.
- delete_existing : boolean = false¶
Delete any existing data at the specified path before creating a new TensorStore. Requires that create is true, and that open is false.
- assume_metadata : boolean = false¶
Neither read nor write stored metadata. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata need not even exist. Operations such as resizing that modify the stored metadata are not supported. Requires that open is true and delete_existing is false. This option takes precedence over assume_cached_metadata if that option is also specified.
Warning
This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
- assume_cached_metadata : boolean = false¶
Skip reading the metadata when opening. Instead, just assume any necessary metadata based on constraints in the spec, using the same defaults for any unspecified metadata as when creating a new TensorStore. The stored metadata may still be accessed by subsequent operations that need to re-validate or modify the metadata. Requires that open is true and delete_existing is false. The assume_metadata option takes precedence if also specified.
Note
Unlike the assume_metadata option, operations such as resizing that modify the stored metadata are supported (and access the stored metadata).
Warning
This option can lead to data corruption if the assumed metadata does not match the stored metadata, or multiple concurrent writers use different assumed metadata.
- cache_pool : ContextResource = "cache_pool"¶
Cache pool for data.
Specifies or references a previously defined Context.cache_pool. It is normally more convenient to specify a default cache_pool in the context.
- metadata_cache_pool : ContextResource¶
Cache pool for metadata only.
Specifies or references a previously defined Context.cache_pool. If not specified, defaults to the value of cache_pool.
- data_copy_concurrency : ContextResource = "data_copy_concurrency"¶
Specifies or references a previously defined Context.data_copy_concurrency. It is normally more convenient to specify a default data_copy_concurrency in the context.
- recheck_cached_metadata : CacheRevalidationBound = "open"¶
Time after which cached metadata is assumed to be fresh. Cached metadata older than the specified time is revalidated prior to use. The metadata is used to check the bounds of every read or write operation.
Specifying true means that the metadata will be revalidated prior to every read or write operation. With the default value of "open", any cached metadata is revalidated when the TensorStore is opened but is not rechecked for each read or write operation.
- recheck_cached_data : CacheRevalidationBound = true¶
Time after which cached data is assumed to be fresh. Cached data older than the specified time is revalidated prior to being returned from a read operation. Partial chunk writes are always consistent regardless of the value of this option.
The default value of true means that cached data is revalidated on every read. To enable in-memory data caching, you must both specify a cache_pool with a non-zero total_bytes_limit and also specify false, "open", or an explicit time bound for recheck_cached_data (see the sketch following this options list).
- fill_missing_data_reads = true¶
Replace missing chunks with the fill value when reading.
If disabled, reading a missing chunk will result in an error. Note that the fill value may still be used when writing a partial chunk. Typically this should only be set to false in the case that store_data_equal_to_fill_value was enabled when writing.
- store_data_equal_to_fill_value = false¶
Store all explicitly written data, even if it is equal to the fill value.
This ensures that explicitly written data, even if it is equal to the fill value, can be distinguished from missing data. If disabled, chunks equal to the fill value may be represented as missing chunks.
- metadata : object¶
N5 array metadata.
Specifies constraints on the metadata of a dataset exactly as in the attributes.json file, except that all members are optional. When creating a new array, the new metadata is obtained by combining these metadata constraints with any Schema constraints.
Arbitrary additional members may also be specified in addition to the ones listed here. When creating a new array, they will be included in the attributes.json file as additional N5 attributes, but will not be validated in any way. When opening an existing array, all additional members that are specified must be present with identical values in the existing attributes.json file, or the open operation will fail.
- Optional members:¶
- dimensions : array of integer[0, +∞)¶
Dimensions of the dataset.
Required when creating a new array if the Schema.domain is not otherwise specified.
Example
[500, 500, 500]
- blockSize : array of integer[1, +∞)¶
Chunk dimensions.
Specifies the chunk size for each dimension. Must have the same length as dimensions. If not specified when creating a new array, the chunk dimensions are chosen automatically according to the Schema.chunk_layout.
Example
[64, 64, 64]
- dataType : "uint8" | "uint16" | "uint32" | "uint64" | "int8" | "int16" | "int32" | "int64" | "float32" | "float64"¶
Specifies the data type.
Required when creating a new array if Schema.dtype is not otherwise specified.
- axes : array of string¶
Specifies a label for each dimension of the dataset.
Optional. If not specified when creating a new array (and also unspecified by the Schema.domain), all dimensions are unlabeled (equivalent to specifying an empty string for each dimension). Labels are specified in the same order as the dimensions and blockSize properties. Note that this specifies the stored dimension labels. As with any TensorStore driver, dimension labels may also be overridden by specifying a transform.
Example
["x", "y", "z"]
- units : array of string¶
Specifies the base physical unit for each dimension.
Optional. Must have the same length as dimensions.
Example
["nm", "nm", "nm", "s"]
- resolution : array of number¶
Specifies the multiplier for the physical units.
Optional. Must have the same length as dimensions. If resolution is not specified but units is specified, the multipliers are assumed to be all 1. Normally, resolution should only be specified if units is also specified; if resolution is specified but units is not specified, the Schema.dimension_units will be considered unspecified.
Example
[4, 4, 40, 0.5]
- compression : driver/n5/Compression¶
Specifies the chunk compression method.
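As noted for recheck_cached_data above, in-memory data caching requires a cache_pool with a non-zero total_bytes_limit together with a relaxed revalidation bound. The following minimal sketch (in-memory key-value store; the total_bytes_limit value is arbitrary) combines these options so that cached chunks are only revalidated when the TensorStore is opened:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     },
...     'context': {
...         'cache_pool': {'total_bytes_limit': 100_000_000}
...     },
...     'recheck_cached_data': 'open',
... },
...                   open=True,
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[1000, 2000, 3000]).result()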
Compression¶
- json driver/n5/Compression : object¶
The type member identifies the compression method. The remaining members are specific to the compression method.
- Subtypes:¶
The following compression methods are supported:
- json driver/n5/Compression/raw : object¶
Chunks are encoded directly as big endian values without compression.
- Extends:¶
- json driver/n5/Compression/gzip : object¶
Specifies zlib compression with a gzip or zlib header.
- Extends:¶
- Optional members:¶
- level : integer[-1, 9] = -1¶
Specifies the zlib compression level to use.
Level 0 indicates no compression (fastest), while level 9 indicates the best compression ratio (slowest). The default value of -1 indicates to use the zlib default compression level (equal to 6).
- useZlib : boolean = false¶
If true, use a zlib header. Otherwise, use a gzip header.
- json driver/n5/Compression/blosc : object¶
Specifies Blosc compression.
- Extends:¶
- Required members:¶
- type : "blosc"¶
- cname : "blosclz" | "lz4" | "lz4hc" | "snappy" | "zlib" | "zstd"¶
Specifies the compression method used by Blosc.
- clevel : integer[0, 9]¶
Specifies the Blosc compression level to use.
Higher values are slower but achieve a higher compression ratio.
Example
{"type": "blosc", "cname": "blosclz", "clevel": 9, "shuffle": 2}
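As a concrete illustration, the following sketch (in-memory key-value store; compression settings chosen arbitrarily) creates a new dataset with gzip compression specified via the metadata member:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     },
...     'metadata': {
...         'compression': {'type': 'gzip', 'level': 6}
...     },
... },
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[1000, 2000, 3000]).result()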
Mapping to TensorStore Schema¶
Example
For the following N5 metadata:
{
"dimensions": [1000, 2000, 3000],
"blockSize": [100, 200, 300],
"dataType": "uint16",
"compression": {"type": "raw"}
}
the corresponding Schema is:
{
"chunk_layout": {
"grid_origin": [0, 0, 0],
"inner_order": [2, 1, 0],
"read_chunk": {"shape": [100, 200, 300]},
"write_chunk": {"shape": [100, 200, 300]}
},
"codec": {"compression": {"type": "raw"}, "driver": "n5"},
"domain": {"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]},
"dtype": "uint16",
"rank": 3
}
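The same mapping can also be observed programmatically. In the following sketch (in-memory key-value store), the dataset is created from the metadata shown above, and its schema property corresponds to the JSON schema above:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     },
...     'metadata': {
...         'dimensions': [1000, 2000, 3000],
...         'blockSize': [100, 200, 300],
...         'dataType': 'uint16',
...         'compression': {'type': 'raw'}
...     },
... },
...                   create=True).result()
>>> schema = dataset.schema  # corresponds to the JSON schema shown above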
Data type¶
N5 data types map to TensorStore data types of the same name (uint8, uint16, uint32, uint64, int8, int16, int32, int64, float32, and float64).
Note that internally the N5 format always uses big endian encoding.
Domain¶
The shape of the Schema.domain corresponds to driver/n5.metadata.dimensions.
Dimension labels may be specified in the Schema.domain, and correspond to driver/n5.metadata.axes.
The upper bounds of the domain are resizable (i.e. implicit).
As N5 does not natively support a non-zero origin, the underlying domain always has a zero origin (IndexDomain.inclusive_min is all zero), but it may be translated by the transform.
Example
For the following N5 metadata:
{
"dimensions": [1000, 2000, 3000],
"blockSize": [100, 200, 300],
"dataType": "uint16",
"compression": {"type": "raw"}
}
the corresponding IndexDomain is:
{"exclusive_max": [[1000], [2000], [3000]], "inclusive_min": [0, 0, 0]}
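Because the upper bounds are implicit, a dataset may be grown or shrunk after opening. A minimal sketch (in-memory key-value store; the new bound is arbitrary) using the resize method of the Python API:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     }
... },
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[1000, 2000, 3000]).result()
>>> # Grow the last dimension; only the implicit upper bounds may change.
>>> dataset = dataset.resize(exclusive_max=[1000, 2000, 4000]).result()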
Chunk layout¶
The N5 format supports a single driver/n5.metadata.blockSize property that corresponds to the ChunkLayout/Grid.shape constraint.
Example
For the following N5 metadata:
{
"dimensions": [1000, 2000, 3000],
"blockSize": [100, 200, 300],
"dataType": "uint16",
"compression": {"type": "raw"}
}
the corresponding ChunkLayout is:
{
"grid_origin": [0, 0, 0],
"inner_order": [2, 1, 0],
"read_chunk": {"shape": [100, 200, 300]},
"write_chunk": {"shape": [100, 200, 300]}
}
The ChunkLayout.grid_origin is always all-zero.
As the N5 format supports only a single level of chunking, the ChunkLayout.read_chunk and ChunkLayout.write_chunk constraints are combined, and hard constraints on ChunkLayout.codec_chunk must not be specified.
The N5 format always stores the data within chunks in colexicographic order (i.e. Fortran order).
Selection of chunk layout when creating a new array¶
When creating a new array, the chunk shape may be constrained explicitly via ChunkLayout/Grid.shape or implicitly via ChunkLayout/Grid.aspect_ratio and ChunkLayout/Grid.elements. A suitable chunk shape is chosen automatically based on these constraints. If ChunkLayout/Grid.elements is not specified, the default is 1 million elements per chunk:
Example of unconstrained chunk layout
>>> ts.open({
... 'driver': 'n5',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... shape=[1000, 2000, 3000]).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [2, 1, 0],
'read_chunk': {'shape': [101, 101, 101]},
'write_chunk': {'shape': [101, 101, 101]},
})
Example of explicit chunk shape constraint
>>> ts.open({
... 'driver': 'n5',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... shape=[1000, 2000, 3000],
... chunk_layout=ts.ChunkLayout(
... chunk_shape=[100, 200, 300])).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [2, 1, 0],
'read_chunk': {'shape': [100, 200, 300]},
'write_chunk': {'shape': [100, 200, 300]},
})
Example of chunk aspect ratio constraint
>>> ts.open({
... 'driver': 'n5',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... shape=[1000, 2000, 3000],
... chunk_layout=ts.ChunkLayout(
... chunk_aspect_ratio=[1, 2, 2])).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [2, 1, 0],
'read_chunk': {'shape': [64, 128, 128]},
'write_chunk': {'shape': [64, 128, 128]},
})
Example of chunk aspect ratio and elements constraint
>>> ts.open({
... 'driver': 'n5',
... 'kvstore': {
... 'driver': 'memory'
... }
... },
... create=True,
... dtype=ts.uint16,
... shape=[1000, 2000, 3000],
... chunk_layout=ts.ChunkLayout(
... chunk_aspect_ratio=[1, 2, 2],
... chunk_elements=2000000)).result().chunk_layout
ChunkLayout({
'grid_origin': [0, 0, 0],
'inner_order': [2, 1, 0],
'read_chunk': {'shape': [79, 159, 159]},
'write_chunk': {'shape': [79, 159, 159]},
})
Codec¶
Within the Schema.codec, the compression parameters are represented in the same way as in the metadata:
- json driver/n5/Codec : object¶
- Optional members:¶
- compression : driver/n5/Compression¶
Specifies the chunk compression method.
It is an error to specify any other Codec.driver.
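For illustration, the following sketch (in-memory key-value store; gzip chosen arbitrarily) specifies the compression through the schema codec rather than through the metadata member:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     },
...     'schema': {
...         'codec': {
...             'driver': 'n5',
...             'compression': {'type': 'gzip'}
...         }
...     },
... },
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[1000, 2000, 3000]).result()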
Fill value¶
The N5 metadata format does not support specifying a fill value. TensorStore
always assumes a fill value of 0.
Dimension units¶
The Schema.dimension_units correspond to the units and resolution metadata properties. The base unit is used directly; it is not converted in any way.
The N5 format requires that dimension units are specified either for all
dimensions, or for no dimensions; it is not possible to specify dimension units
for some dimensions while leaving the dimension units of the remaining
dimensions unspecified. When creating a new dataset, if dimension units are
specified for at least one dimension, any dimensions for which the unit is
unspecified are assigned a dimensionless unit of 1.
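For example, the following sketch (in-memory key-value store; unit values chosen arbitrarily) creates a dataset whose units and resolution metadata correspond to dimension units of 4 nm, 4 nm, and 40 nm:
>>> dataset = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'memory'
...     },
...     'metadata': {
...         'units': ['nm', 'nm', 'nm'],
...         'resolution': [4, 4, 40]
...     },
... },
...                   create=True,
...                   dtype=ts.uint16,
...                   shape=[1000, 2000, 3000]).result()
>>> # dataset.dimension_units now reflects 4 nm, 4 nm, and 40 nm.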
Limitations¶
Datasets with varlength chunks are not supported.