tensorstore.TensorStore.write(self, source: TensorStore | ArrayLike, *, batch: Batch | None = None, can_reference_source_data_indefinitely: bool | None = None) → WriteFutures

Writes to the current domain.
Example
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await dataset[5:10, 6:8].write(42)
>>> await dataset[0:10, 0:10].read()
array([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 42, 42,  0,  0]], dtype=uint32)
>>> await dataset[5:10, 6:8].write([1, 2])
>>> await dataset[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
Parameters:

- source: TensorStore | ArrayLike
  Source array, broadcast-compatible with self.domain and with a data type convertible to self.dtype. May be an existing TensorStore or any ArrayLike, including a scalar.

- batch: Batch | None = None
  Batch to use for reading any metadata required for opening.

  Warning
  If specified, the returned Future will not, in general, become ready until the batch is submitted. Therefore, immediately awaiting the returned future will lead to deadlock.

- can_reference_source_data_indefinitely: bool | None = None
  References to the source data may be retained indefinitely, even after the write is committed. The source data must not be modified until all references are released.
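The following sketch illustrates both optional parameters. It is an assumption-laden illustration, not part of the documented examples: it presumes ts.Batch with a submit() method (as used elsewhere in tensorstore for batched reads) and numpy imported as np:

>>> batch = ts.Batch()  # assumed API: ts.Batch with submit()
>>> future = dataset[5:10, 6:8].write(42, batch=batch)
>>> batch.submit()  # per the warning above, submit before awaiting
>>> await future
>>> arr = np.full([5, 2], 7, dtype=np.uint32)  # hypothetical source array
>>> await dataset[5:10, 6:8].write(
...     arr, can_reference_source_data_indefinitely=True)
>>> # arr must not be modified while references may still be held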
Returns:

Future representing the asynchronous result of the write operation.
Logically there are two steps to the write operation:

1. reading/copying from the source, and
2. waiting for the write to be committed, such that it will be reflected in subsequent reads.

The completion of these two steps can be tracked separately using the returned WriteFutures.copy and WriteFutures.commit futures, respectively.

Waiting on the returned WriteFutures object itself waits for the entire write operation to complete, and is equivalent to waiting on the WriteFutures.commit future. The returned WriteFutures.copy future becomes ready once the data has been fully read from source. After this point, source may be safely modified without affecting the write operation.

Warning
You must either synchronously or asynchronously wait on the returned future in order to ensure the write actually completes. If all references to the future are dropped without waiting on it, the write may be cancelled.
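For example, the two stages can be awaited separately (a minimal sketch; source_array is a hypothetical in-memory array, assuming numpy imported as np):

>>> source_array = np.ones([5, 2], dtype=np.uint32)  # hypothetical source
>>> write_futures = dataset[5:10, 6:8].write(source_array)
>>> await write_futures.copy  # source_array may be safely modified from here on
>>> await write_futures.commit  # write is now committed and visible to reads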
Non-transactional semantics

When not using a Transaction, the returned WriteFutures.commit future becomes ready only once the data has been durably committed by the underlying storage layer. The precise durability guarantees depend on the driver, but for example:

- when using the file Key-Value Store driver, the data is only considered committed once the fsync system call completes, which should normally guarantee that it will survive a system crash;
- when using the gcs Key-Value Store driver, the data is only considered committed once the write is acknowledged and durability is guaranteed by Google Cloud Storage.
Because committing a write often has significant latency, it is advantageous to issue multiple writes concurrently and then wait on all of them jointly:
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await asyncio.wait([
...     asyncio.ensure_future(dataset[i * 5].write(i)) for i in range(10)
... ])
This can also be accomplished with synchronous blocking:
>>> dataset = ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True).result()
>>> futures = [dataset[i * 5].write(i) for i in range(10)]
>>> for f in futures:
...     f.result()
Note

When issuing writes asynchronously, keep in mind that uncommitted writes are never reflected in non-transactional reads.

For most drivers, data is written in fixed-size write chunks arranged in a regular grid. When concurrently issuing multiple writes that are not perfectly aligned to disjoint write chunks, specifying a Context.cache_pool enables writeback caching, which can improve efficiency by coalescing multiple writes to the same chunk.

Alternatively, for more explicit control over writeback behavior, you can use a Transaction.
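For example, writeback caching can be enabled by supplying a context spec with a cache_pool (a minimal sketch; the total_bytes_limit value is illustrative, not a recommendation):

>>> context = ts.Context({'cache_pool': {'total_bytes_limit': 100_000_000}})
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     context=context,
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)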
Transactional semantics
Transactions provide explicit control over writeback, and allow uncommitted writes to be read:
>>> txn = ts.Transaction()
>>> dataset = await ts.open(
...     {
...         'driver': 'zarr',
...         'kvstore': {
...             'driver': 'memory'
...         }
...     },
...     dtype=ts.uint32,
...     shape=[70, 80],
...     create=True)
>>> await dataset.with_transaction(txn)[5:10, 6:8].write([1, 2])
>>> # Transactional read reflects uncommitted write
>>> await dataset.with_transaction(txn)[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
>>> # Non-transactional read does not reflect uncommitted write
>>> await dataset[5:10, 6:8].read()
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]], dtype=uint32)
>>> await txn.commit_async()
>>> # Now, non-transactional read reflects committed write
>>> await dataset[5:10, 6:8].read()
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]], dtype=uint32)
Warning
When using a Transaction, the returned WriteFutures.commit future does not indicate that the data is durably committed by the underlying storage layer. Instead, it merely indicates that the write will be reflected in any subsequent reads using the same transaction. The write is only durably committed once the transaction is committed successfully.
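To make the distinction concrete, a sketch of the two levels of commit (using the same ts.Transaction API as the example above):

>>> txn = ts.Transaction()
>>> write_futures = dataset.with_transaction(txn)[5:10, 6:8].write(7)
>>> await write_futures.commit  # visible only to reads using txn
>>> await txn.commit_async()  # now durably committed by the storage layer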