Index space

TensorStore defines an index space as an \(n\)-dimensional space, where \(n\) is the rank of the space and each index is an integer in the closed interval \([-(2^{62}-2), +(2^{62}-2)]\). The special values \(\pm (2^{62}-1)\) are not valid indices but represent bounds of \(\pm \infty\).

Note

The limits of \(\pm (2^{62}-2)\) permit \(\pm (2^{62}-1)\) to be reserved to represent \(\pm \infty\) and still allow the difference between any two bounds to be represented as a signed 64-bit integer.

TensorStore currently supports ranks \(n\) less than or equal to 32, which is the same constraint imposed by NumPy.

Each of the \(n\) dimensions may optionally be identified by a unique string label, such as "x", "y", or "z". Unlabeled dimensions are indicated by an empty string as the label.

Index domain

An index domain is a rectangular region of an index space.

  • The inclusive lower bound of each dimensions is specified as an integer in \([-(2^{62}-1), 2^{62}-2]\), where the special value of \(-2^{62}-1\) indicates the dimension is unbounded below.

  • The inclusive upper bound of each dimension is specified as an integer in \([-(2^{62}-2), 2^{62}-1]\), where the special value of \(2^{62}-1\) indicates the dimension is unbounded above.

  • The lower and upper bound of each dimension is marked either explicit or implicit (represented as two additional bits per dimension). Explicit bounds indicate hard constraints on the valid indices for that dimension. Implicit bounds may be used for resizable dimensions of arrays: the implicit bound indicates the bound as of a certain time, but the bound may change if a resizable operation is performed, and indexing operations are not constrained by implicit bounds. Implicit bounds of \(\pm \infty\) are also used to indicate unknown or unspecified bounds.

Note

Specifying infinite bounds \((-\infty, +\infty)\) for a dimension is similar to specifying the finite bounds \([-(2^{62}-2, +(2^{62}-2)]\), in that in both cases the domain contains the full range of possible indices. The difference is that translating a dimension with infinite bounds has no effect, while translating a dimension with bounds \([-(2^{62}-2, +(2^{62}-2)]\) by any non-zero offset results in an out-of-bounds error.

JSON Schema

Index domains may be serialized to/from JSON using the following schema.

If neither inclusive_min nor shape is specified, all dimensions receive an implicit lower bound of \(-\infty\). If shape is specified but inclusive_min is not specified, all dimensions receive an explicit lower bound of 0.

At most one of exclusive_max, inclusive_max, and shape may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).

object with members:

rank
integer [0, 32]
Number of dimensions.

The rank must be specified either directly, or implicitly by the number of dimensions specified for inclusive_min, inclusive_max, exclusive_max, shape, or labels.

inclusive_min
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Inclusive lower bounds of the domain.

Length must equal the rank. Bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [1, [2]] specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.

exclusive_max
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Exclusive upper bounds of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.

inclusive_max
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Inclusive upper bounds of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.

shape
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Extent of each dimension of the domain.

Length must equal the rank. As for inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, assuming an inclusive_min of [1, 2], an shape of [5, [7]] specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.

labels
array of string
Dimension labels for each dimension.

Length must equal the rank. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.

Index transform

An index transform from rank \(m\) to rank \(n\) maps every \(m\)-dimensional index vector in a rank-\(m\) index domain to an \(n\)-dimensional index vector in the output space.

It is defined by its input index domain and \(n\) output index maps, one for each dimension \(j\) of the output space, each of one of the following three forms:

constant

\[\mathtt{output}[j] = \mathtt{offset},\]

where \(\mathtt{offset}\) is an arbitrary 64-bit integer.

single input dimension

\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{input}[\mathtt{input\_dimension}],\]

where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{input\_dimension}\) is in the range \([0, m)\).

index array

\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{index\_array}[\mathtt{input}],\]

where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{index\_array}\) is an \(n\)-dimensional array of 64-bit integers indexed by a subset of the dimensions of the input index domain with explicit lower and upper bounds, stored as a strided array in memory. (The dimensions by which it is indexed are indicated by non-zero strides.)

TensorStore uses this normalized index transform representation to represent any composition of indexing operations. This representation is complete, since any mapping can be represented as an index array, but in many cases just the more efficient constant and single input dimension maps suffice.

In most cases, two index transforms can be composed with low cost that is independent of the bounds specified in the domains: single input dimension maps may be composed with any other map without needing to copying any index arrays. Only when composing two index array maps is it necessary to write a new index array, and in some cases this can result in index array with a larger representation size.

JSON Schema

Index transforms may be serialized to/from JSON using the following schema.

The input domain is specified by input_rank, input_inclusive_min, input_exclusive_max, input_inclusive_max, input_shape, and input_labels, while the output index maps are specified by output.

If neither input_inclusive_min nor input_shape is specified, all dimensions receive an implicit lower bound of \(-\infty\). If input_shape is specified but input_inclusive_min is not specified, all dimensions receive an explicit lower bound of 0.

At most one of input_exclusive_max, input_inclusive_max, and input_shape may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).

object with members:

input_rank
integer [0, 32]
Number of input dimensions.

The input rank must be specified either directly, or implicitly by the number of dimensions specified for input_inclusive_min, input_inclusive_max, input_exclusive_max, input_shape, or input_labels.

input_inclusive_min
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Inclusive lower bounds of the input domain.

Length must equal the input_rank. Bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [1, [2]] specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.

input_exclusive_max
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Exclusive upper bounds of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.

input_inclusive_max
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Inclusive upper bounds of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, [5, [7]] specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.

input_shape
array of integer (-∞, ∞) | [integer (-∞, ∞)]
Extent of each dimension of the input domain.

Length must equal the input_rank. As for input_inclusive_min, bounds specified as n indicate normal, explicit bounds, while bounds specified as [n] indicate implicit bounds. For example, assuming an input_inclusive_min of [1, 2], an input_shape of [5, [7]] specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.

input_labels
array of string
Dimension labels for each input domain dimension.

Length must equal the input_rank. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.

output Specifies the output index map for each output dimension.

If not specified, defaults to an identity transform over the input domain. Otherwise, the length determines the output rank of the transform.

array of object with members:

offset
integer (-∞, ∞)
Specifies an offset for this output dimension. If neither input_dimension nor index_array is specified, this specifies the constant value to which this output dimension maps.
stride
integer (-∞, ∞)
Multiplier for the input index specified by input_dimension or the index array value specified by index_array.

Only valid to specify in conjunction with input_dimension or index_array.

input_dimension
integer [0, ∞)
If present, indicates that this output dimension uses a single input dimension map with the specified input dimension. Must not be specified in conjunction with index_array.
index_array
array | integer (-∞, ∞)
If present, indicates that this output dimension uses an index array map, with the index array specified as a nested list of rank equal to input_rank.

If the input rank is 0, must be a numeric value.

index_array_bounds
Index interval (default is [“-inf”, “+inf”])

If present, specifies constraints on the values within index_array (which must also be specified). If index_array contains an out-of-bounds index, an error may not be returned immediately but will be returned if the corresponding position within the domain is accessed. If the indices in index_array have already been validated, this need not be specified. This allows transforms containing out-of-bounds index array indices to correctly round trip through JSON, but normally need not be specified manually.

Example
[5, 100]
Example
["-inf", 10]
Example
[-20, "+inf"]

Index interval JSON Schema

[integer (-∞, ∞) | “-inf”, integer (-∞, ∞) | “+inf”]

Specifies a closed interval of integer index values.

Alignment and broadcasting

Many operations in TensorStore involving two index domains, such as read, write, and copy operations, automatically align the source domain to the target domain.

The following alignment methods are supported (by default, all alignment methods are used):

permute

Source dimensions are permuted based on their labels in order to align the source domain to the target domain.

translate

Source dimensions are translated in order to align the source domain to the target.

broadcast

Source dimensions of size 1 do not have to match a target dimension, and not all target dimensions must match a source dimension.

Alignment is performed based on the following rules:

First, a subset of the source dimensions are matched to a subset of the target dimensions, according to one of two cases:

M1

At least one of source or target is entirely unlabeled (all dimension labels are empty). In this case, the last \(\mathtt{match\_rank} = \min(\mathtt{source\_rank}, \mathtt{target\_rank})\) dimensions of source match in order to the last \(\mathtt{match\_rank}\) dimensions of target, i.e. dimension \(\mathtt{source\_rank} - \mathtt{match\_rank} + i\) of source matches to dimension \(\mathtt{target\_rank} - \mathtt{match\_rank} + i\) of target, for \(0 \leq i < \mathtt{match\_rank}\). This case also applies if the permute alignment method is not permitted.

M2

Both source and target have at least one labeled dimension. In this case, dimensions of source and target with matching labels are matched. Any remaining labeled dimensions remain unmatched. The unlabeled dimensions of source are matched to the unlabeled dimensions of target using the same method as in case M1 (right to left).

The matching is then validated as follows:

V1

For each match between a dimension \(i\) of source and a dimension \(j\) of target, if \(\mathtt{source\_shape}[i] \neq \mathtt{target\_shape}[j]\), the match is dropped. Note that if \(\mathtt{source\_shape}[i] \neq 1\), this leads to an error in step V3.

V2

If the broadcast alignment method is not permitted, it is an error for any source or target dimension to be unmatched. (In this case, any matches dropped in step V1 result in an error.)

V3

For every unmatched dimension \(i\) of source, \(\mathtt{source\_shape}[i]\) must equal \(1\).

V4

If the translate alignment method is not permitted, for each match between a dimension \(i\) of source and a dimension \(j\) of target, it is an error if \(\mathtt{source\_origin}[i] \neq \mathtt{target\_origin}[j]\).

If matching succeeds, a new alignment transform with an (input) domain equal to target and an output rank equal to \(\mathtt{source\_rank}\) is computed as follows:

A1

For each dimension \(j\) of target with a matching dimension \(i\) of source, output dimension \(i\) of alignment has a single_input_dimension map to input dimension \(j\) with a stride of \(1\) and offset of \(\mathtt{source\_origin}[i] - \mathtt{target\_origin}[j]\).

A2

For every unmatched dimension \(i\) of source, output dimension \(i\) of alignment is a constant map with an offset of \(\mathtt{source\_origin}[i]\). (It must be the case that \(\mathtt{source\_shape}[i] = 1\).)

The alignment transform maps target positions to corresponding source positions; for example, when copying, each position of the target domain is assigned the value at the corresponding position of the source domain. If the broadcast alignment method is used, the transform may map the same source position to multiple target positions.

Examples:

  • All unlabeled dimensions

    • source: [3, 7), [5, 6), [4, 10)

    • target: [2, 6), [0, 4), [6, 12)

    • alignment: rank \(3 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[0] + 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[2] - 2\end{split}\]
  • All labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "z": [4, 10)

    • target: "z": [6, 12), "x": [4, 8), "y": [0, 4)

    • alignment: rank \(3 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[1] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[0] - 2\end{split}\]
  • Partially labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "": [4, 10)

    • target: "": [0, 10) "": [6, 12), "x": [4, 8), "y": [0, 4)

    • alignment: rank \(4 \rightarrow 3\), with:

      \[\begin{split}\mathrm{source}[0] &= \mathrm{target}[2] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[1] - 2\end{split}\]
  • Mismatched labeled dimensions

    • source: "x": [3, 7), "y": [5, 6), "z": [4, 10)

    • target: "z": [6, 12), "w": [4, 8), "y": [0, 4)

    • ERROR: Unmatched source dimension 0 {"x": [3, 7)} does not have a size of 1

Note

The alignment behavior supported by TensorStore is fully compatible with NumPy broadcasting but additionally is extended to support non-zero origins and labeled dimensions.