tensorstore.virtual_chunked(
    read_function: Callable[[IndexDomain, ArrayLike, VirtualChunkedReadParameters], FutureLike[KvStore.TimestampedStorageGeneration | None]] | None = None,
    write_function: Callable[[IndexDomain, ArrayLike, VirtualChunkedWriteParameters], FutureLike[KvStore.TimestampedStorageGeneration | None]] | None = None,
    *,
    loop: asyncio.AbstractEventLoop | None = None,
    rank: int | None = None,
    dtype: dtype | None = None,
    domain: IndexDomain | None = None,
    shape: Sequence[int] | None = None,
    chunk_layout: ChunkLayout | None = None,
    dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None,
    schema: Schema | None = None,
    context: Context | None = None,
    transaction: Transaction | None = None,
) → TensorStore

Creates a TensorStore where the content is read/written chunk-wise by an arbitrary function.

Example (read-only):
>>> a = ts.array([[1, 2, 3], [4, 5, 6]], dtype=ts.uint32)
>>> async def do_read(domain: ts.IndexDomain, array: np.ndarray,
...                   read_params: ts.VirtualChunkedReadParameters):
...     print(f'Computing content for: {domain}')
...     array[...] = (await a[domain].read()) + 100
>>> t = ts.virtual_chunked(do_read, dtype=a.dtype, domain=a.domain)
>>> await t.read()
Computing content for: { [0, 2), [0, 3) }
array([[101, 102, 103],
       [104, 105, 106]], dtype=uint32)
Example (read/write):
>>> array = np.zeros(shape=[4, 5], dtype=np.uint32)
>>> array[1] = 50
>>> def do_read(domain, chunk, read_context):
...     chunk[...] = array[domain.index_exp]
>>> def do_write(domain, chunk, write_context):
...     array[domain.index_exp] = chunk
>>> t = ts.virtual_chunked(
...     do_read,
...     do_write,
...     dtype=array.dtype,
...     shape=array.shape,
...     chunk_layout=ts.ChunkLayout(read_chunk_shape=(2, 3)))
>>> await t.read()
array([[ 0,  0,  0,  0,  0],
       [50, 50, 50, 50, 50],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0]], dtype=uint32)
>>> t[1:3, 1:3] = 42
>>> array
array([[ 0,  0,  0,  0,  0],
       [50, 42, 42, 50, 50],
       [ 0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0]], dtype=uint32)
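A write-only view is created by supplying only write_function. The following is a hedged sketch, not part of the official examples:

import numpy as np
import tensorstore as ts

array = np.zeros(shape=[4, 5], dtype=np.uint32)

def do_write(domain, chunk, write_context):
    # Persist the chunk content; here it is simply copied into a NumPy array.
    array[domain.index_exp] = chunk

t = ts.virtual_chunked(write_function=do_write, dtype=array.dtype,
                       shape=array.shape)
# A full-domain write avoids any need to read back existing chunk content,
# which a write-only view cannot do.
t[...] = np.ones([4, 5], dtype=np.uint32)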
Parameters:

- read_function: Callable[[IndexDomain, ArrayLike, VirtualChunkedReadParameters], FutureLike[KvStore.TimestampedStorageGeneration | None]] | None = None
  Callback that handles chunk read requests. Must be specified to create a virtual view that supports reads. To create a write-only view, leave this unspecified (as None).
  This function should assign to the array the content for the specified IndexDomain.
  The returned TimestampedStorageGeneration identifies the version of the content, for caching purposes. If versioning is not applicable, None may be returned to indicate a value that may be cached indefinitely.
  If it returns a coroutine, the coroutine will be executed using the event loop indicated by loop.

- write_function: Callable[[IndexDomain, ArrayLike, VirtualChunkedWriteParameters], FutureLike[KvStore.TimestampedStorageGeneration | None]] | None = None
  Callback that handles chunk write requests. Must be specified to create a virtual view that supports writes. To create a read-only view, leave this unspecified (as None).
  This function stores the content of the array for the specified IndexDomain.
  The returned TimestampedStorageGeneration identifies the stored version of the content, for caching purposes. If versioning is not applicable, None may be returned to indicate a value that may be cached indefinitely.
  If it returns a coroutine, the coroutine will be executed using the event loop indicated by loop.

- loop: asyncio.AbstractEventLoop | None = None
  Event loop on which to execute read_function and/or write_function if they are async functions. If not specified (or None is specified), defaults to the loop returned by asyncio.get_running_loop (in the context of the call to virtual_chunked). If loop is not specified and there is no running event loop, it is an error for read_function or write_function to return a coroutine.

- rank: int | None = None
  Constrains the rank of the TensorStore. If there is an index transform, the rank constraint must match the rank of the input space.

- dtype: dtype | None = None
  Constrains the data type of the TensorStore. If a data type has already been set, it is an error to specify a different data type.

- domain: IndexDomain | None = None
  Constrains the domain of the TensorStore. If there is an existing domain, the specified domain is merged with it as follows:
  - The rank must match the existing rank.
  - All bounds must match, except that a finite or explicit bound is permitted to match an infinite and implicit bound, and takes precedence.
  - If both the new and existing domain specify non-empty labels for a dimension, the labels must be equal. If only one of the domains specifies a non-empty label for a dimension, the non-empty label takes precedence.
  Note that if there is an index transform, the domain must match the input space, not the output space.

- shape: Sequence[int] | None = None
  Constrains the shape and origin of the TensorStore. Equivalent to specifying a domain of ts.IndexDomain(shape=shape).
  Note
  This option also constrains the origin of all dimensions to be zero.

- chunk_layout: ChunkLayout | None = None
  Constrains the chunk layout. If there is an existing chunk layout constraint, the constraints are merged. If the constraints are incompatible, an error is raised.

- dimension_units: Sequence[Unit | str | Real | tuple[Real, str] | None] | None = None
  Specifies the physical units of each dimension of the domain. The physical unit for a dimension is the physical quantity corresponding to a single index increment along each dimension. A value of None indicates that the unit is unknown. A dimensionless quantity can be indicated by a unit of "". (See the sketch following this parameter list.)

- schema: Schema | None = None
  Additional schema constraints to merge with existing constraints.

- context: Context | None = None
  Shared resource context. Defaults to a new (unshared) context with default options, as returned by tensorstore.Context(). To share resources, such as cache pools, between multiple open TensorStores, you must specify a context.

- transaction: Transaction | None = None
  Transaction to use for opening/creating, and for subsequent operations. By default, the open is non-transactional.
  Note
  To perform transactional operations using a TensorStore that was previously opened without a transaction, use TensorStore.with_transaction.
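The following is a hedged sketch of combining several of the schema constraints above; the labels, units, and chunk shape are illustrative values, not requirements:

import tensorstore as ts

def do_read(domain, array, read_params):
    array[...] = 0  # placeholder content

t = ts.virtual_chunked(
    do_read,
    dtype=ts.float32,
    domain=ts.IndexDomain(shape=[100, 200], labels=['x', 'y']),
    dimension_units=['4nm', '4nm'],
    chunk_layout=ts.ChunkLayout(read_chunk_shape=[10, 20]),
)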
Warning
Neither read_function nor write_function should block synchronously while waiting for another TensorStore operation; blocking on another operation that uses the same Context.data_copy_concurrency resource may result in deadlock. Instead, it is better to specify a coroutine function for read_function and write_function and use await to wait for the result of other TensorStore operations.
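For example (a hedged sketch; other_store stands for some other, hypothetical TensorStore), prefer a coroutine that awaits over blocking on a Future inside the callback:

# Risky: Future.result() blocks inside the callback and can deadlock on the
# shared data_copy_concurrency resource.
def do_read_blocking(domain, array, read_params):
    array[...] = other_store[domain].read().result()

# Preferred: a coroutine callback that awaits the other operation.
async def do_read(domain, array, read_params):
    array[...] = await other_store[domain].read()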
Caching
By default, the computed content of chunks is not cached, and will be recomputed on every read. To enable caching:
- Specify a Context that contains a cache_pool with a non-zero size limit, e.g. {"cache_pool": {"total_bytes_limit": 100000000}} for 100MB.
- Additionally, if the data is not immutable, the read_function should return a unique generation and a timestamp that is not float('inf'). When a cached chunk is re-read, the read_function will be called with if_not_equal specified. If the generation specified by if_not_equal is still current, the read_function may leave the output array unmodified and return a TimestampedStorageGeneration with an appropriate time but generation left unspecified.
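A minimal sketch (not from the original docs, assuming the computed content is immutable so that the implicit None return permits indefinite caching):

import tensorstore as ts

def do_read(domain, array, read_params):
    print(f'Computing content for: {domain}')
    array[...] = 42  # immutable content; returning None allows indefinite caching

t = ts.virtual_chunked(
    do_read,
    dtype=ts.uint32,
    shape=[4, 5],
    context=ts.Context({'cache_pool': {'total_bytes_limit': 100000000}}))

t.read().result()  # first read invokes do_read
t.read().result()  # subsequent reads can be served from the cache pool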
Pickle support
The returned TensorStore supports pickling if, and only if, the read_function and write_function support pickling.
Note
The pickle module only supports global functions defined in named modules. For broader function support, you may wish to use cloudpickle.
Warning
The specified loop is not preserved when the returned TensorStore is pickled, since it is a property of the current thread. Instead, when unpickled, the resultant TensorStore will use the running event loop (as returned by asyncio.get_running_loop) of the thread used for unpickling, if there is one.
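A hedged sketch of the picklability requirement (a module-level callback pickles; a lambda or local closure does not):

import pickle
import tensorstore as ts

def fill_constant(domain, array, read_params):
    # Defined at module level, so the standard pickle module can serialize it.
    array[...] = 7

t = ts.virtual_chunked(fill_constant, dtype=ts.uint32, shape=[4, 5])
t2 = pickle.loads(pickle.dumps(t))  # round-trips the virtual view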
Transaction support
Transactional reads and writes are supported on virtual_chunked views. A transactional write simply serves to buffer the write in memory until it is committed. Transactional reads will observe prior writes made using the same transaction. However, when the transaction commit is initiated, the write_function is called in exactly the same way as for a non-transactional write, and if more than one chunk is affected, the commit will be non-atomic. If the transaction is atomic, it is an error to write to more than one chunk in the same transaction.

You are also free to use transactional operations, e.g. operations on a KvStore or another TensorStore, within the read_function or write_function.

For read-write views, you should not attempt to use the same transaction within the read_function or write_function that is also used for read or write operations on the virtual view directly, because both write_function and read_function may be called after the commit starts, and any attempt to perform new operations using the same transaction once it is already being committed will fail; instead, any transactional operations performed within the read_function or write_function should use a different transaction.

For read-only views, it is possible to use the same transaction within the read_function as is also used for read operations on the virtual view directly, though this may not be particularly useful.
Specifying a transaction directly when creating the virtual chunked view is no different than binding the transaction to an existing virtual chunked view.
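For illustration, a hedged sketch of a transactional write against the read/write example's t above:

txn = ts.Transaction()
tt = t.with_transaction(txn)
tt[1:3, 1:3] = 99   # buffered in the transaction; do_write is not yet called
txn.commit_sync()   # committing invokes do_write for each affected chunk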