tensorstore.TensorStore.write(self, source: TensorStore | ArrayLike, *, batch: Batch | None = None, can_reference_source_data_indefinitely: bool | None = None) → WriteFutures

Writes to the current domain.
Example
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await dataset[5:10, 6:8].write(42)
>>> await dataset[0:10, 0:10].read()
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0]], dtype=uint32)
>>> await dataset[5:10, 6:8].write([1, 2])
>>> await dataset[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
Parameters:

- source: TensorStore | ArrayLike
  Source array, broadcast-compatible with self.domain and with a data type convertible to self.dtype. May be an existing TensorStore or any ArrayLike, including a scalar.

- batch: Batch | None = None
  Batch to use for reading any metadata required for opening.

  Warning
  If specified, the returned Future will not, in general, become ready until the batch is submitted. Therefore, immediately awaiting the returned future will lead to deadlock.

- can_reference_source_data_indefinitely: bool | None = None
  References to the source data may be retained indefinitely, even after the write is committed. The source data must not be modified until all references are released.
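The following sketch illustrates both optional parameters. It is an assumption-laden illustration, not part of the documented examples: it presumes ts.Batch with a submit() method (as used elsewhere in tensorstore for batched reads) and numpy imported as np:

>>> batch = ts.Batch()  # assumed API: ts.Batch with submit()
>>> future = dataset[5:10, 6:8].write(42, batch=batch)
>>> batch.submit()  # per the warning above, submit before awaiting
>>> await future
>>> arr = np.full([5, 2], 7, dtype=np.uint32)  # hypothetical source array
>>> await dataset[5:10, 6:8].write(
...     arr, can_reference_source_data_indefinitely=True)
>>> # arr must not be modified while references may still be held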
Returns:

Future representing the asynchronous result of the write operation.
Logically there are two steps to the write operation:

1. reading/copying from the source, and
2. waiting for the write to be committed, such that it will be reflected in subsequent reads.

The completion of these two steps can be tracked separately using the returned WriteFutures.copy and WriteFutures.commit futures, respectively.

Waiting on the returned WriteFutures object itself waits for the entire write operation to complete, and is equivalent to waiting on the WriteFutures.commit future. The returned WriteFutures.copy future becomes ready once the data has been fully read from source. After this point, source may be safely modified without affecting the write operation.

Warning
You must either synchronously or asynchronously wait on the returned future in order to ensure the write actually completes. If all references to the future are dropped without waiting on it, the write may be cancelled.
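For example, the two stages can be awaited separately (a minimal sketch; source_array is a hypothetical in-memory array, assuming numpy imported as np):

>>> source_array = np.ones([5, 2], dtype=np.uint32)  # hypothetical source
>>> write_futures = dataset[5:10, 6:8].write(source_array)
>>> await write_futures.copy  # source_array may be safely modified from here on
>>> await write_futures.commit  # write is now committed and visible to reads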
Non-transactional semantics

When not using a Transaction, the returned WriteFutures.commit future becomes ready only once the data has been durably committed by the underlying storage layer. The precise durability guarantees depend on the driver, but for example:

- when using the file Key-Value Store driver, the data is only considered committed once the fsync system call completes, which should normally guarantee that it will survive a system crash;
- when using the gcs Key-Value Store driver, the data is only considered committed once the write is acknowledged and durability is guaranteed by Google Cloud Storage.
Because committing a write often has significant latency, it is advantageous to issue multiple writes concurrently and then wait on all of them jointly:
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await asyncio.wait([
...     asyncio.ensure_future(dataset[i * 5].write(i)) for i in range(10)
... ])
This can also be accomplished with synchronous blocking:
>>> dataset = ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True).result()
>>> futures = [dataset[i * 5].write(i) for i in range(10)]
>>> for f in futures:
...     f.result()
Note

When issuing writes asynchronously, keep in mind that uncommitted writes are never reflected in non-transactional reads.

For most drivers, data is written in fixed-size write chunks arranged in a regular grid. When concurrently issuing multiple writes that are not perfectly aligned to disjoint write chunks, specifying a Context.cache_pool enables writeback caching, which can improve efficiency by coalescing multiple writes to the same chunk.

Alternatively, for more explicit control over writeback behavior, you can use a Transaction.
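For example, writeback caching can be enabled by supplying a context spec with a cache_pool (a minimal sketch; the total_bytes_limit value is illustrative, not a recommendation):

>>> context = ts.Context({'cache_pool': {'total_bytes_limit': 100_000_000}})
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     context=context,
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)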
Transactional semantics
Transactions provide explicit control over writeback, and allow uncommitted writes to be read:
>>> txn = ts.Transaction()
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await dataset.with_transaction(txn)[5:10, 6:8].write([1, 2])
>>> # Transactional read reflects uncommitted write
>>> await dataset.with_transaction(txn)[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
>>> # Non-transactional read does not reflect uncommitted write
>>> await dataset[5:10, 6:8].read()
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]], dtype=uint32)
>>> await txn.commit_async()
>>> # Now, non-transactional read reflects committed write
>>> await dataset[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
Warning
When using a Transaction, the returned WriteFutures.commit future does not indicate that the data is durably committed by the underlying storage layer. Instead, it merely indicates that the write will be reflected in any subsequent reads using the same transaction. The write is only durably committed once the transaction is committed successfully.
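To make the distinction concrete, a sketch of the two levels of commit (using the same ts.Transaction API as the example above):

>>> txn = ts.Transaction()
>>> write_futures = dataset.with_transaction(txn)[5:10, 6:8].write(7)
>>> await write_futures.commit  # visible only to reads using txn
>>> await txn.commit_async()  # now durably committed by the storage layer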