Index space¶
TensorStore defines an index space as an \(n\)-dimensional space, where \(n\) is the rank of the space and each index is an integer in the closed interval \([-(2^{62}-2), +(2^{62}-2)]\). The special values \(\pm (2^{62}-1)\) are not valid indices but represent bounds of \(\pm \infty\).
Note
The limits of \(\pm (2^{62}-2)\) permit \(\pm (2^{62}-1)\) to be reserved to represent \(\pm \infty\) and still allow the difference between any two bounds to be represented as a signed 64-bit integer.
TensorStore currently supports ranks \(n\) less than or equal to 32, which is the same constraint imposed by NumPy.
Each of the \(n\) dimensions may optionally be identified by a unique
string label, such as "x"
, "y"
, or "z"
. Unlabeled
dimensions are indicated by an empty string as the label.
Index domain¶
An index domain is a rectangular region of an index space.
The inclusive lower bound of each dimensions is specified as an integer in \([-(2^{62}-1), 2^{62}-2]\), where the special value of \(-2^{62}-1\) indicates the dimension is unbounded below.
The inclusive upper bound of each dimension is specified as an integer in \([-(2^{62}-2), 2^{62}-1]\), where the special value of \(2^{62}-1\) indicates the dimension is unbounded above.
The lower and upper bound of each dimension is marked either explicit or implicit (represented as two additional bits per dimension). Explicit bounds indicate hard constraints on the valid indices for that dimension. Implicit bounds may be used for resizable dimensions of arrays: the implicit bound indicates the bound as of a certain time, but the bound may change if a resizable operation is performed, and indexing operations are not constrained by implicit bounds. Implicit bounds of \(\pm \infty\) are also used to indicate unknown or unspecified bounds.
Note
Specifying infinite bounds \((-\infty, +\infty)\) for a dimension is similar to specifying the finite bounds \([-(2^{62}-2, +(2^{62}-2)]\), in that in both cases the domain contains the full range of possible indices. The difference is that translating a dimension with infinite bounds has no effect, while translating a dimension with bounds \([-(2^{62}-2, +(2^{62}-2)]\) by any non-zero offset results in an out-of-bounds error.
- json IndexDomain : object¶
Index domains may be serialized to/from JSON using the following schema.
If neither
inclusive_min
norshape
is specified, all dimensions receive an implicit lower bound of \(-\infty\). Ifshape
is specified butinclusive_min
is not specified, all dimensions receive an explicit lower bound of 0.At most one of
exclusive_max
,inclusive_max
, andshape
may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).- Optional members:¶
-
rank : integer[
0
,32
]¶ Number of dimensions.
The rank must be specified either directly, or implicitly by the number of dimensions specified for
inclusive_min
,inclusive_max
,exclusive_max
,shape
, orlabels
.
- inclusive_min : array of integer | [integer]¶
Inclusive lower bounds of the domain.
Length must equal the
rank
. Bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[1, [2]]
specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.
- exclusive_max : array of integer | [integer]¶
Exclusive upper bounds of the domain.
Length must equal the
rank
. As forinclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[5, [7]]
specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.
- inclusive_max : array of integer | [integer]¶
Inclusive upper bounds of the domain.
Length must equal the
rank
. As forinclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[5, [7]]
specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.
- shape : array of integer | [integer]¶
Extent of each dimension of the domain.
Length must equal the
rank
. As forinclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example, assuming aninclusive_min
of[1, 2]
, anshape
of[5, [7]]
specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.
- labels : array of string¶
Dimension labels for each dimension.
Length must equal the
rank
. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.
-
rank : integer[
Index transform¶
An index transform from rank \(m\) to rank \(n\) maps every \(m\)-dimensional index vector in a rank-\(m\) index domain to an \(n\)-dimensional index vector in the output space.
It is defined by its input index domain and \(n\) output index maps, one for each dimension \(j\) of the output space, each of one of the following three forms:
constant |
\[\mathtt{output}[j] = \mathtt{offset},\]
where \(\mathtt{offset}\) is an arbitrary 64-bit integer. |
single input dimension |
\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{input}[\mathtt{input\_dimension}],\]
where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{input\_dimension}\) is in the range \([0, m)\). |
index array |
\[\mathtt{output}[j] = \mathtt{offset} + \mathtt{stride} \cdot \mathtt{index\_array}[\mathtt{input}],\]
where \(\mathtt{offset}\) and \(\mathtt{stride}\) are arbitrary 64-bit integers and \(\mathtt{index\_array}\) is an \(n\)-dimensional array of 64-bit integers indexed by a subset of the dimensions of the input index domain with explicit lower and upper bounds, stored as a strided array in memory. (The dimensions by which it is indexed are indicated by non-zero strides.) |
TensorStore uses this normalized index transform representation to represent any composition of indexing operations. This representation is complete, since any mapping can be represented as an index array, but in many cases just the more efficient constant and single input dimension maps suffice.
In most cases, two index transforms can be composed with low cost that is independent of the bounds specified in the domains: single input dimension maps may be composed with any other map without needing to copying any index arrays. Only when composing two index array maps is it necessary to write a new index array, and in some cases this can result in index array with a larger representation size.
The following table describes the relationship between the input and output spaces of a transform, and the source and target arrays/TensorStores for read and write operations:
Operation |
Input domain |
Output range |
---|---|---|
Read |
Corresponds to the target array. |
Indicates the positions of the underlying source TensorStore that are accessed. |
Write |
Corresponds to the source array. |
Indicates the positions of the underlying target TensorStore that are modified. |
- json IndexTransform : object¶
Index transforms may be serialized to/from JSON using the following schema.
The input domain is specified by
input_rank
,input_inclusive_min
,input_exclusive_max
,input_inclusive_max
,input_shape
, andinput_labels
, while the output index maps are specified byoutput
.If neither
input_inclusive_min
norinput_shape
is specified, all dimensions receive an implicit lower bound of \(-\infty\). Ifinput_shape
is specified butinput_inclusive_min
is not specified, all dimensions receive an explicit lower bound of 0.At most one of
input_exclusive_max
,input_inclusive_max
, andinput_shape
may be specified. If none are specified, all dimensions receive an implicit upper bound of \(+\infty\).- Optional members:¶
-
input_rank : integer[
0
,32
]¶ Number of input dimensions.
The input rank must be specified either directly, or implicitly by the number of dimensions specified for
input_inclusive_min
,input_inclusive_max
,input_exclusive_max
,input_shape
, orinput_labels
.
- input_inclusive_min : array of integer | [integer]¶
Inclusive lower bounds of the input domain.
Length must equal the
input_rank
. Bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[1, [2]]
specifies an explicit bound of \(1 \leq x\) for the first dimension and an implicit bound of \(2 \leq x\) for the second dimension.
- input_exclusive_max : array of integer | [integer]¶
Exclusive upper bounds of the input domain.
Length must equal the
input_rank
. As forinput_inclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[5, [7]]
specifies an explicit bound of \(x < 5\) for the first dimension and an implicit bound of \(x < 7\) for the second dimension.
- input_inclusive_max : array of integer | [integer]¶
Inclusive upper bounds of the input domain.
Length must equal the
input_rank
. As forinput_inclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example,[5, [7]]
specifies an explicit bound of \(x \leq 5\) for the first dimension and an implicit bound of \(x \leq 7\) for the second dimension.
- input_shape : array of integer | [integer]¶
Extent of each dimension of the input domain.
Length must equal the
input_rank
. As forinput_inclusive_min
, bounds specified asn
indicate normal, explicit bounds, while bounds specified as[n]
indicate implicit bounds. For example, assuming aninput_inclusive_min
of[1, 2]
, aninput_shape
of[5, [7]]
specifies an explicit bound of \(x < 6\) for the first dimension and an implicit bound of \(x < 9\) for the second dimension.
- input_labels : array of string¶
Dimension labels for each input domain dimension.
Length must equal the
input_rank
. An empty string indicates an unlabeled dimension. Non-empty strings must not occur more than once. By default, all dimensions are unlabeled.
- output : array of OutputIndexMap¶
Specifies the output index map for each output dimension.
If not specified, defaults to an identity transform over the input domain. Otherwise, the length determines the output rank of the transform.
-
input_rank : integer[
- json OutputIndexMap : object¶
Specifies a transform from an input space to a single output index.
Logically, an output index map is a function that maps
n
-dimensional input index vectors to a single output index, using one of the supported output index methods. This is used to represent theIndexTransform.output
mapping for each output dimension of anIndexTransform
.- Optional members:¶
- offset : integer¶
Specifies an offset for this output dimension. If neither
input_dimension
norindex_array
is specified, this specifies the constant value to which this output dimension maps.
- stride : integer¶
Multiplier for the input index specified by
input_dimension
or the index array value specified byindex_array
.Only valid to specify in conjunction with
input_dimension
orindex_array
.
-
input_dimension : integer[
0
, +∞)¶ If present, indicates that this output dimension uses a single input dimension map with the specified input dimension. Must not be specified in conjunction with
index_array
.
- index_array : array | integer¶
If present, indicates that this output dimension uses an index array map, with the index array specified as a nested list of rank equal to the
input_rank
.If the
input_rank
is 0, must be a numeric value.
-
index_array_bounds : IndexInterval =
["-inf", "+inf"]
¶ If present, specifies constraints on the values within
index_array
(which must also be specified). Ifindex_array
contains an out-of-bounds index, an error may not be returned immediately but will be returned if the corresponding position within the domain is accessed. If the indices inindex_array
have already been validated, this need not be specified. This allows transforms containing out-of-bounds index array indices to correctly round trip through JSON, but normally need not be specified manually.Example
[5, 100]
Example
["-inf", 10]
Example
[-20, "+inf"]
-
json IndexInterval : [integer |
"-inf"
, integer |"+inf"
]¶ Specifies a closed interval of integer index values.
Alignment and broadcasting¶
Many operations in TensorStore involving two index domains,
such as read, write, and copy operations, automatically align the source
domain to the target
domain.
The following alignment methods are supported (by default, all alignment methods are used):
- permute
Source dimensions are permuted based on their labels in order to align the source domain to the target domain.
- translate
Source dimensions are translated in order to align the source domain to the target.
- broadcast
Source dimensions of size 1 do not have to match a target dimension, and not all target dimensions must match a source dimension.
Alignment is performed based on the following rules:
First, a subset of the source
dimensions are matched to a subset of the
target
dimensions, according to one of two cases:
M1 |
At least one of |
M2 |
Both |
The matching is then validated as follows:
V1 |
For each match between a dimension \(i\) of |
V2 |
If the broadcast alignment method is not permitted, it is an error for any source or target dimension to be unmatched. (In this case, any matches dropped in step V1 result in an error.) |
V3 |
For every unmatched dimension \(i\) of |
V4 |
If the translate alignment method is not permitted, for each match
between a dimension \(i\) of |
If matching succeeds, a new alignment
transform with an (input) domain equal
to target
and an output rank equal to \(\mathtt{source\_rank}\) is
computed as follows:
A1 |
For each dimension \(j\) of |
A2 |
For every unmatched dimension \(i\) of |
The alignment
transform maps target
positions to corresponding
source
positions; for example, when copying, each position of the target
domain is assigned the value at the corresponding position of the source
domain. If the broadcast
alignment method is used, the transform may map
the same source
position to multiple target
positions.
Examples:
All unlabeled dimensions
source:
[3, 7), [5, 6), [4, 10)
target:
[2, 6), [0, 4), [6, 12)
alignment: rank \(3 \rightarrow 3\), with:
\[\begin{split}\mathrm{source}[0] &= \mathrm{target}[0] + 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[2] - 2\end{split}\]
All labeled dimensions
source:
"x": [3, 7), "y": [5, 6), "z": [4, 10)
target:
"z": [6, 12), "x": [4, 8), "y": [0, 4)
alignment: rank \(3 \rightarrow 3\), with:
\[\begin{split}\mathrm{source}[0] &= \mathrm{target}[1] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[0] - 2\end{split}\]
Partially labeled dimensions
source:
"x": [3, 7), "y": [5, 6), "": [4, 10)
target:
"": [0, 10) "": [6, 12), "x": [4, 8), "y": [0, 4)
alignment: rank \(4 \rightarrow 3\), with:
\[\begin{split}\mathrm{source}[0] &= \mathrm{target}[2] - 1 \\ \mathrm{source}[1] &= 5 \\ \mathrm{source}[2] &= \mathrm{target}[1] - 2\end{split}\]
Mismatched labeled dimensions
source:
"x": [3, 7), "y": [5, 6), "z": [4, 10)
target:
"z": [6, 12), "w": [4, 8), "y": [0, 4)
ERROR: Unmatched source dimension 0
{"x": [3, 7)}
does not have a size of 1
Note
The alignment behavior supported by TensorStore is fully compatible with
NumPy broadcasting
but additionally is extended
to support non-zero origins and labeled dimensions.