Schema

The schema of a TensorStore specifies key properties of the format in a uniform way that is independent of where and how the data is actually stored. When creating a TensorStore, schema constraints and preferences may be specified; the driver combines these constraints with any driver-specific constraints/defaults to choose a suitable schema automatically. When opening an existing TensorStore, its schema is validated against any constraints that are specified.

json Schema : object
Optional members:
rank : integer[0, 32]

Number of dimensions.

The rank is always a hard constraint.

dtype : dtype

Specifies the data type of the TensorStore.

The data type is always a hard constraint.

domain : IndexDomain

Domain of the TensorStore, including bounds and optional dimension labels.

The domain is always a hard constraint, except that a labeled dimension is allowed to match an unlabeled dimension, and an implicit, infinite bound is considered an unspecified bound and does not impose any constraints. When merging two schema constraint objects that both specify domains, any dimensions that are labeled in both domains must have the same label, and any explicit or finite bounds specified in both domains must be equal. If a dimension is labeled in one domain and unlabeled in the other, the label is retained. If a bound is implicit and infinite in one domain, the bound from the other domain is used.
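For example, a constraint that specifies only labels is compatible with one that specifies only bounds (an illustrative sketch; the IndexDomain members shown here are abbreviated): merging {"labels": ["x", "y", "z"]} with {"inclusive_min": [0, 0, 0], "shape": [100, 200, 300]} yields a domain with labels ["x", "y", "z"], origin [0, 0, 0], and shape [100, 200, 300].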

chunk_layout : ChunkLayout

Data storage layout constraints.

The rank of the chunk layout must match the rank of the schema. When merging schema constraint objects, the chunk layout constraints are merged recursively.

codec : Codec

Driver-specific compression and other parameters for encoding/decoding data. When merging schema constraint objects, the codec constraints are merged recursively.

fill_value

Fill value to use for missing data.

Must be broadcast-compatible with the domain.
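For example (an illustrative sketch), for a domain of shape [100, 3], a scalar fill value such as 0 broadcasts to every element, and a fill value of [1, 2, 3] broadcasts along the first dimension, while a fill value of shape [2] is not broadcast-compatible.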

dimension_units : array of Unit | null

Physical units of each dimension.

Specifies the physical quantity corresponding to an increment of 1 index along each dimension, i.e. the resolution. The length must match the rank of the schema. Specifying null for a dimension indicates that the unit is unknown.

Example

["4nm", "4nm", null] specifies that the voxel size is 4nm along the first two dimensions, and unknown along the third dimension.

Note

null is not equivalent to specifying "" (or equivalently, [1, ""]), which indicates a dimensionless unit of 1.
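Example

Putting these members together, a schema constraint object might look like the following (an illustrative sketch only; the domain members are abbreviated and the values do not reflect any particular driver's defaults):

{
  "rank": 3,
  "dtype": "uint16",
  "domain": {"labels": ["x", "y", "z"], "shape": [1000, 1000, 100]},
  "chunk_layout": {"inner_order": [2, 1, 0], "chunk": {"shape": [64, 64, 64]}},
  "fill_value": 0,
  "dimension_units": ["4nm", "4nm", "30nm"]
}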

Chunk layout

For chunked storage formats, the data storage layout can be represented in a driver-independent way as a chunk layout.

A chunk layout specifies a hierarchical regular grid with up to three levels:

  • The write level, the top-most level, specifies the grid to which writes should be aligned. Writes of individual chunks at this level may be performed without amplification. For the zarr Driver, n5 Driver, and the neuroglancer_precomputed Driver using the unsharded format, the write level is also the only level; each write chunk corresponds to a single key in the underlying Key-Value Storage Layer. For the neuroglancer_precomputed Driver using the sharded format, each write chunk corresponds to an entire shard.

  • The read level evenly subdivides write chunks by an additional regular grid. Reads of individual chunks at this level may be performed without amplification. Every write chunk boundary must be aligned to a read chunk boundary. If reads and writes may be performed at the same granularity, such as with the zarr Driver, n5 Driver, and the neuroglancer_precomputed Driver using the unsharded format, there is no additional read grid; a read chunk is the same size as a write chunk. For the neuroglancer_precomputed Driver using the sharded format, each read chunk corresponds to a base chunk as defined by the format.

  • The codec level further subdivides the read level into codec chunks. For formats that make use of it, the codec chunk shape may affect the compression rate. For the neuroglancer_precomputed Driver when using the compressed segmentation encoding, the codec chunk shape specifies the compressed segmentation block shape. The codec chunk shape does not necessarily evenly subdivide the read chunk shape. (The precise offset of the codec chunk grid relative to the read chunk grid is not specified by the chunk layout.)

When creating a new TensorStore, constraints on the data storage layout can be specified without specifying the precise layout explicitly.
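For example, the following constraints (a sketch) request Fortran storage order and roughly one million elements per chunk for a rank-3 TensorStore, leaving the driver free to choose the exact grid:

{
  "inner_order": [2, 1, 0],
  "chunk": {"elements_soft_constraint": 1000000}
}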

json ChunkLayout : object
Optional members:
rank : integer[0, 32]

Number of dimensions.

The rank is always a hard constraint. It is redundant to specify the rank if any other field that implicitly specifies the rank is included.

grid_origin : array of integer | null

Specifies hard constraints on the origin of the chunk grid.

The length must equal the rank of the index space. Each element constrains the grid origin for the corresponding dimension. A value of null (or, equivalently, -9223372036854775808) indicates no constraint.

grid_origin_soft_constraint : array of integer | null

Specifies preferred values for the origin of the chunk grid rather than hard constraints.

If a non-null value is specified for a given dimension in both grid_origin_soft_constraint and grid_origin, the value in grid_origin takes precedence.
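For example (a sketch), the following constrains the grid origin to 0 along dimension 0 and merely prefers an origin of 0 along dimensions 1 and 2:

{
  "grid_origin": [0, null, null],
  "grid_origin_soft_constraint": [null, 0, 0]
}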

inner_order : array of integer

Permutation specifying the element storage order within the innermost chunks.

This must be a permutation of [0, 1, ..., rank-1]. Lexicographic order (i.e. C order/row-major order) is specified as [0, 1, ..., rank-1], while colexicographic order (i.e. Fortran order/column-major order) is specified as [rank-1, ..., 1, 0].
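For example, for a rank-3 TensorStore, [0, 1, 2] specifies C order and [2, 1, 0] specifies Fortran order, in which dimension 0 varies fastest in memory.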

inner_order_soft_constraint : array of integer

Specifies a preferred value for inner_order rather than a hard constraint. If inner_order is also specified, it takes precedence.

write_chunk : ChunkLayout/Grid

Constraints on the chunk grid over which writes may be efficiently partitioned.

read_chunk : ChunkLayout/Grid

Constraints on the chunk grid over which reads may be efficiently partitioned.

codec_chunk : ChunkLayout/Grid

Constraints on the chunk grid used by the codec, if applicable.

chunk : ChunkLayout/Grid

Combined constraints on write/read/codec chunks.

If aspect_ratio is specified, it applies to write_chunk, read_chunk, and codec_chunk. If aspect_ratio_soft_constraint is specified, it also applies to write_chunk, read_chunk, and codec_chunk, but with lower precedence than any write/read/codec-specific value that is also specified.

If shape or elements is specified, it applies to write_chunk and read_chunk (but not codec_chunk). If shape_soft_constraint or elements_soft_constraint is specified, it also applies to write_chunk and read_chunk, but with lower precedence than any write/read-specific value that is also specified.
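For example (an illustrative sketch), in the following constraints the aspect ratio applies to the write, read, and codec grids, the element-count preference applies to the write and read grids, and the codec grid shape is pinned explicitly:

{
  "chunk": {"aspect_ratio": [1, 1, 1], "elements_soft_constraint": 2000000},
  "codec_chunk": {"shape": [8, 8, 8]}
}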

json ChunkLayout/Grid : object

Constraints on the write/read/codec chunk grids.

When creating a new TensorStore, the chunk shape can be specified directly using the shape and shape_soft_constraint members, or indirectly by specifying the aspect_ratio and target number of elements.

When opening an existing TensorStore, the preferences indicated by shape_soft_constraint, aspect_ratio, aspect_ratio_soft_constraint, elements, and elements_soft_constraint are ignored; only shape serves as a constraint.

Optional members:
shape : array of integer[0, +∞) | -1 | null

Hard constraints on the chunk size for each dimension.

The length must equal the rank of the index space. Each element constrains the chunk size for the corresponding dimension and must be a non-negative integer or -1. The special value of 0 (or, equivalently, null) for a given dimension indicates no constraint. The special value of -1 for a given dimension indicates that the chunk size should equal the full extent of the domain, and is always treated as a soft constraint.

shape_soft_constraint : array of integer[0, +∞) | -1 | null

Preferred chunk sizes for each dimension.

If a non-zero, non-null size for a given dimension is specified in both shape and shape_soft_constraint, shape takes precedence.
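For example (a sketch), "shape": [64, 64, -1] constrains chunks to 64×64 along dimensions 0 and 1 and prefers chunks that span the full extent of dimension 2.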

aspect_ratio : array of number[0, +∞) | null

Aspect ratio of the chunk shape.

Specifies the relative chunk size along each dimension. The special value of 0 (or, equivalently, null) indicates no preference (which results in the default aspect ratio of 1 if not otherwise specified). The aspect ratio preference is only taken into account if the chunk size along a given dimension is not specified by shape or shape_soft_constraint, or otherwise constrained. For example, an aspect_ratio of [1, 1.5, 1.5] indicates that the chunk size along dimensions 1 and 2 should be 1.5 times the chunk size along dimension 0. If the target number of elements is 486000, then the resultant chunk size will be [60, 90, 90] (assuming it is not otherwise constrained).
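In JSON form, the example above corresponds to (a sketch):

{
  "aspect_ratio": [1, 1.5, 1.5],
  "elements": 486000
}

which, absent other constraints, yields a chunk shape of [60, 90, 90].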

aspect_ratio_soft_constraint : array of number[0, +∞) | null

Soft constraint on aspect ratio, lower precedence than aspect_ratio.

elements : integer[1, +∞) | null

Preferred number of elements per chunk.

Used in conjunction with aspect_ratio to determine the chunk size for dimensions that are not otherwise constrained. The special value of null indicates no preference, in which case a driver-specific default may be used.

elements_soft_constraint : integer[1, +∞) | null

Preferred number of elements per chunk, lower precedence than elements.

Codec

json Codec : object

Codecs are specified by a required driver property that identifies the driver. All other properties are driver-specific. Refer to the driver documentation for the supported codec drivers and the driver-specific properties.

Required members:
driver : string

Driver identifier

Specifies the TensorStore driver to which this codec is applicable.

Example

{
  "driver": "zarr",
  "compressor": {"id": "blosc", "cname": "lz4", "clevel": null, "5": null, "shuffle": 1},
  "filters": null
}

Example

{
  "driver": "n5",
  "compression": {"type": "gzip", "level": "6", "useZlib": false}
}

Dimension units

json Unit : [number, string] | string | number

Specifies a physical quantity/unit.

The quantity is specified as the combination of:

  • A numerical multiplier, represented as a double-precision floating-point number. A multiplier of 1 may be used to indicate a quantity equal to a single base unit.

  • A base_unit, represented as a string. An empty string may be used to indicate a dimensionless quantity. In general, TensorStore does not interpret the base unit string; some drivers impose additional constraints on the base unit, while other drivers may store the specified unit directly. It is recommended to follow the udunits2 syntax unless there is a specific need to deviate.

Three JSON formats are supported:

  • The canonical format, as a two-element [multiplier, base_unit] array. This format is always used by TensorStore when returning the JSON representation of a unit.

  • A single string. If the string contains a leading number, it is parsed as the multiplier and the remaining portion, after stripping leading and trailing whitespace, is used as the base_unit. If there is no leading number, the multiplier is 1 and the entire string, after stripping leading and trailing whitespace, is used as the base_unit.

  • A single number, to indicate a dimensionless unit with the specified multiplier.

Example

  • "4.5e-9m", "4.5e-9 m", and [4.5e-9, "m"] are all equivalent.

  • "1nm", "nm", and [1, "nm"] are all equivalent.

  • 5, "5", and [5, ""] are all equivalent.