Index space

TensorStore defines an index space as an \(n\)-dimensional space, where \(n\) is the rank of the space and each index is an integer in the closed interval \([-(2^{62}-2), +(2^{62}-2)]\). The special values \(\pm (2^{62}-1)\) are not valid indices but represent bounds of \(\pm \infty\).

Note

The limits of \(\pm (2^{62}-2)\) permit \(\pm (2^{62}-1)\) to be reserved to represent \(\pm \infty\) and still allow the difference between any two bounds to be represented as a signed 64-bit integer.

TensorStore currently supports ranks \(n\) less than or equal to 32, which is the same constraint imposed by NumPy.

Each of the \(n\) dimensions may optionally be identified by a unique string label, such as "x", "y", or "z". Unlabeled dimensions are indicated by an empty string as the label.

Index domain

An index domain is a rectangular region of an index space.

  • The inclusive lower bound of each dimensions is specified as an integer in \([-(2^{62}-1), 2^{62}-2]\), where the special value of \(-2^{62}-1\) indicates the dimension is unbounded below.

  • The inclusive upper bound of each dimension is specified as an integer in \([-(2^{62}-2), 2^{62}-1]\), where the special value of \(2^{62}-1\) indicates the dimension is unbounded above.

  • The lower and upper bound of each dimension is marked either explicit or implicit (represented as two additional bits per dimension). Explicit bounds indicate hard constraints on the valid indices for that dimension. Implicit bounds may be used for resizable dimensions of arrays: the implicit bound indicates the bound as of a certain time, but the bound may change if a resizable operation is performed, and indexing operations are not constrained by implicit bounds. Implicit bounds of \(\pm \infty\) are also used to indicate unknown or unspecified bounds.

Note

Specifying infinite bounds \((-\infty, +\infty)\) for a dimension is similar to specifying the finite bounds \([-(2^{62}-2, +(2^{62}-2)]\), in that in both cases the domain contains the full range of possible indices. The difference is that translating a dimension with infinite bounds has no effect, while translating a dimension with bounds \([-(2^{62}-2, +(2^{62}-2)]\) by any non-zero offset results in an out-of-bounds error.

json IndexDomain : object

Index domains may be serialized to/from JSON using the following schema.

If neither inclusive_min nor shape is specified, all dimensions receive an implicit lower bound of \(-\infty\). If shape is specified but inclusive_min is not specified, all dimensions receive an explicit lower bound of 0.

At most one of exclusive_max, inclusive_max, and shape may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).

Optional members:
rank : integer[0, 32]

Number of dimensions.

The rank must be specified either directly, or implicitly by the number of dimensions specified for inclusive_min, inclusive_max, exclusive_max, shape, or labels.

inclusive_min : array of integer | [integer]

Inclusive lower bounds of the domain.

Length must equal the rank. Bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [1, [2]] specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.

exclusive_max : array of integer | [integer]

Exclusive upper bounds of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.

inclusive_max : array of integer | [integer]

Inclusive upper bounds of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.

shape : array of integer | [integer]

Extent of each dimension of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, assuming an inclusive_min of [1, 2], an shape of [5, [7]] specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.

labels : array of string

Dimension labels for each dimension.

Length must equal the rank. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.

Index transform

An index transform from rank \(m\) to rank \(n\) maps every \(m\)-dimensional index vector in a rank-\(m\) index domain to an \(n\)-dimensional index vector in the output space.

It is defined by its input index domain and \(n\) output index maps, one for each dimension \(j\) of the output space, each of one of the following three forms:

Output index methods

constant

\[\mathtt{output}[j] = \mathtt{offset},\]

where \(\mathtt{offset}\) is an arbitrary 64-bit integer.

single input dimension

\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{input}[\mathtt{input\_dimension}],\]

where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{input\_dimension}\) is in the range \([0, m)\).

index array

\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{index\_array}[\mathtt{input}],\]

where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{index\_array}\) is an \(n\)-dimensional array of 64-bit integers indexed by a subset of the dimensions of the input index domain with explicit lower and upper bounds, stored as a strided array in memory. (The dimensions by which it is indexed are indicated by non-zero strides.)

TensorStore uses this normalized index transform representation to represent any composition of indexing operations. This representation is complete, since any mapping can be represented as an index array, but in many cases just the more efficient constant and single input dimension maps suffice.

In most cases, two index transforms can be composed with low cost that is independent of the bounds specified in the domains: single input dimension maps may be composed with any other map without needing to copying any index arrays. Only when composing two index array maps is it necessary to write a new index array, and in some cases this can result in index array with a larger representation size.

The following table describes the relationship between the input and output spaces of a transform, and the source and target arrays/TensorStores for read and write operations:

Operation

Input domain

Output range

Read

Corresponds to the target array.

Indicates the positions of the underlying source TensorStore that are accessed.

Write

Corresponds to the source array.

Indicates the positions of the underlying target TensorStore that are modified.

json IndexTransform : object

Index transforms may be serialized to/from JSON using the following schema.

The input domain is specified by input_rank, input_inclusive_min, input_exclusive_max, input_inclusive_max, input_shape, and input_labels, while the output index maps are specified by output.

If neither input_inclusive_min nor input_shape is specified, all dimensions receive an implicit lower bound of \(-\infty\). If input_shape is specified but input_inclusive_min is not specified, all dimensions receive an explicit lower bound of 0.

At most one of input_exclusive_max, input_inclusive_max, and input_shape may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).

Optional members:
input_rank : integer[0, 32]

Number of input dimensions.

The input rank must be specified either directly, or implicitly by the number of dimensions specified for input_inclusive_min, input_inclusive_max, input_exclusive_max, input_shape, or input_labels.

input_inclusive_min : array of integer | [integer]

Inclusive lower bounds of the input domain.

Length must equal the input_rank. Bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [1, [2]] specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.

input_exclusive_max : array of integer | [integer]

Exclusive upper bounds of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.

input_inclusive_max : array of integer | [integer]

Inclusive upper bounds of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.

input_shape : array of integer | [integer]

Extent of each dimension of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, assuming an input_inclusive_min of [1, 2], an input_shape of [5, [7]] specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.

input_labels : array of string

Dimension labels for each input domain dimension.

Length must equal the input_rank. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.

output : array of OutputIndexMap

Specifies the output index map for each output dimension.

If not specified, defaults to an identity transform over the input domain. Otherwise, the length determines the output rank of the transform.

json OutputIndexMap : object

Specifies a transform from an input space to a single output index.

Logically, an output index map is a function that maps n-dimensional input index vectors to a single output index, using one of the supported output index methods. This is used to represent the IndexTransform.output mapping for each output dimension of an IndexTransform.

Optional members:
offset : integer

Specifies an offset for this output dimension. If neither input_dimension nor index_array is specified, this specifies the constant value to which this output dimension maps.

stride : integer

Multiplier for the input index specified by input_dimension or the index array value specified by index_array.

Only valid to specify in conjunction with input_dimension or index_array.

input_dimension : integer[0, +∞)

If present, indicates that this output dimension uses a single input dimension map with the specified input dimension. Must not be specified in conjunction with index_array.

index_array : array | integer

If present, indicates that this output dimension uses an index array map, with the index array specified as a nested list of rank equal to the input_rank.

If the input_rank is 0, must be a numeric value.

index_array_bounds : IndexInterval = ["-inf", "+inf"]

If present, specifies constraints on the values within index_array (which must also be specified). If index_array contains an out-of-bounds index, an error may not be returned immediately but will be returned if the corresponding position within the domain is accessed. If the indices in index_array have already been validated, this need not be specified. This allows transforms containing out-of-bounds index array indices to correctly round trip through JSON, but normally need not be specified manually.

Example

[5, 100]

Example

["-inf", 10]

Example

[-20, "+inf"]
json IndexInterval : [integer | "-inf", integer | "+inf"]

Specifies a closed interval of integer index values.

Alignment and broadcasting

Many operations in TensorStore involving two index domains, such as read, write, and copy operations, automatically align the source domain to the target domain.

The following alignment methods are supported (by default, all alignment methods are used):

permute

Source dimensions are permuted based on their labels in order to align the source domain to the target domain.

translate

Source dimensions are translated in order to align the source domain to the target.

broadcast

Source dimensions of size 1 do not have to match a target dimension, and not all target dimensions must match a source dimension.

Alignment is performed based on the following rules:

First, a subset of the source dimensions are matched to a subset of the target dimensions, according to one of two cases:

M1

At least one of source or target is entirely unlabeled (all dimension labels are empty). In this case, the last \(\mathtt{match\_rank} = \min(\mathtt{source\_rank}, \mathtt{target\_rank})\) dimensions of source match in order to the last \(\mathtt{match\_rank}\) dimensions of target, i.e. dimension \(\mathtt{source\_rank} - \mathtt{match\_rank} + i\) of source matches to dimension \(\mathtt{target\_rank} - \mathtt{match\_rank} + i\) of target, for \(0 \leq i < \mathtt{match\_rank}\). This case also applies if the permute alignment method is not permitted.

M2

Both source and target have at least one labeled dimension. In this case, dimensions of source and target with matching labels are matched. Any remaining labeled dimensions remain unmatched. The unlabeled dimensions of source are matched to the unlabeled dimensions of target using the same method as in case M1 (right to left).

The matching is then validated as follows:

V1

For each match between a dimension \(i\) of source and a dimension \(j\) of target, if \(\mathtt{source\_shape}[i] \neq \mathtt{target\_shape}[j]\), the match is dropped. Note that if \(\mathtt{source\_shape}[i] \neq 1\), this leads to an error in step V3.

V2

If the broadcast alignment method is not permitted, it is an error for any source or target dimension to be unmatched. (In this case, any matches dropped in step V1 result in an error.)

V3

For every unmatched dimension \(i\) of source, \(\mathtt{source\_shape}[i]\) must equal \(1\).

V4

If the translate alignment method is not permitted, for each match between a dimension \(i\) of source and a dimension \(j\) of target, it is an error if \(\mathtt{source\_origin}[i] \neq \mathtt{target\_origin}[j]\).

If matching succeeds, a new alignment transform with an (input) domain equal to target and an output rank equal to \(\mathtt{source\_rank}\) is computed as follows:

A1

For each dimension \(j\) of target with a matching dimension \(i\) of source, output dimension \(i\) of alignment has a single_input_dimension map to input dimension \(j\) with a stride of \(1\) and offset of \(\mathtt{source\_origin}[i] - \mathtt{target\_origin}[j]\).

A2

For every unmatched dimension \(i\) of source, output dimension \(i\) of alignment is a constant map with an offset of \(\mathtt{source\_origin}[i]\). (It must be the case that \(\mathtt{source\_shape}[i] = 1\).)

The alignment transform maps target positions to corresponding source positions; for example, when copying, each position of the target domain is assigned the value at the corresponding position of the source domain. If the broadcast alignment method is used, the transform may map the same source position to multiple target positions.

Examples:

  • All unlabeled dimensions

    • source: [3, 7), [5, 6), [4, 10)

    • target: [2, 6), [0, 4), [6, 12)

    • alignment: rank \(3 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[0] + 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[2] - 2\end{split}\]
  • All labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "z": [4, 10)

    • target: "z": [6, 12), "x": [4, 8), "y": [0, 4)

    • alignment: rank \(3 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[1] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[0] - 2\end{split}\]
  • Partially labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "": [4, 10)

    • target: "": [0, 10) "": [6, 12), "x": [4, 8), "y": [0, 4)

    • alignment: rank \(4 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[2] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[1] - 2\end{split}\]
  • Mismatched labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "z": [4, 10)

    • target: "z": [6, 12), "w": [4, 8), "y": [0, 4)

    • ERROR: Unmatched source dimension 0 {"x": [3, 7)} does not have a size of 1

Note

The alignment behavior supported by TensorStore is fully compatible with NumPy broadcasting but additionally is extended to support non-zero origins and labeled dimensions.