file Key-Value Store driver
The file driver uses the filesystem directly as a key-value store. A key specifies a path under a given root directory; the value is stored as the file contents.
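To illustrate the key-to-path mapping, here is a toy Python sketch. `ToyFileKvStore` is a hypothetical class, not TensorStore's API or implementation: it omits locking and atomic writes entirely, and its rejection of keys that would escape the root directory is an assumption added for safety.

```python
import os


class ToyFileKvStore:
    """Toy store: each key is a relative path under `root`; the value is the file contents."""

    def __init__(self, root: str) -> None:
        self.root = os.path.abspath(root)

    def path_for(self, key: str) -> str:
        # The key is used directly as a path relative to the root directory.
        path = os.path.abspath(os.path.join(self.root, key))
        # Reject keys such as "../x" or "/abs" that would resolve outside the root.
        if os.path.commonpath([self.root, path]) != self.root:
            raise ValueError(f"key escapes root directory: {key!r}")
        return path

    def write(self, key: str, value: bytes) -> None:
        path = self.path_for(key)
        # A key like "a/b/c" needs the intermediate directories "a/b".
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:  # no locking and no atomic rename in this toy
            f.write(value)

    def read(self, key: str):
        try:
            with open(self.path_for(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None  # absent key
```

Storing `b"hello"` under the key `a/b/c` produces the file `<root>/a/b/c` containing exactly those bytes.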
Locking provided by the filesystem is used to safely allow concurrent access from multiple processes. (The locking protocol used does not block readers.) Provided that shared locking is supported, concurrent access from multiple machines to a network filesystem is also safe.
- json kvstore/file : object
Read/write access to the local filesystem.
JSON specification of the key-value store.
- Optional members:
- file_io_concurrency : ContextResource
Specifies or references a previously defined Context.file_io_concurrency.
- file_io_sync : ContextResource
Specifies or references a previously defined Context.file_io_sync.
- file_io_memmap : ContextResource
Specifies or references a previously defined Context.file_io_memmap.
- file_io_locking : ContextResource
Specifies or references a previously defined Context.file_io_locking.
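As a sketch, a kvstore/file spec that sets several of these context resources inline can be assembled in Python and serialized to JSON. The path is a placeholder, and the values are examples chosen only to match the types documented in this section (booleans for file_io_sync and file_io_memmap, an object with mode and acquire_timeout for file_io_locking).

```python
import json

# Example kvstore/file spec with context resources given inline.
spec = {
    "driver": "file",
    "path": "/tmp/dataset/",  # placeholder root directory
    "file_io_sync": False,  # Context.file_io_sync is a boolean
    "file_io_memmap": True,  # Context.file_io_memmap is a boolean
    "file_io_locking": {"mode": "lockfile", "acquire_timeout": "60s"},
}

# Serialize to the JSON text form of the spec.
print(json.dumps(spec, indent=2))
```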
- json KvStoreUrl/file : string
file:// KvStore URL scheme
File-based key-value stores may be specified using the widely-supported file://path URL syntax.
Examples:
| URL representation | JSON representation |
|---|---|
| "file:///tmp/dataset/" | {"driver": "file", "path": "/tmp/dataset/"} |
| "file://C:/Users/abc/dataset/" | {"driver": "file", "path": "C:/Users/abc/dataset/"} |
- Extends:
KvStoreUrl: URL representation of a key-value store.
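The correspondence in the examples table can be sketched as a simple prefix strip. `file_url_to_spec` is a hypothetical helper, not TensorStore's URL parser; it assumes the URL carries no percent-encoding, query string, or fragment, all of which a real parser would need to handle.

```python
FILE_SCHEME = "file://"


def file_url_to_spec(url: str) -> dict:
    """Map a file:// KvStore URL to its JSON spec (simplified sketch)."""
    if not url.startswith(FILE_SCHEME):
        raise ValueError(f"not a file:// URL: {url!r}")
    # Everything after "file://" is taken as the path: "/tmp/dataset/" for
    # "file:///tmp/dataset/", or "C:/Users/abc/dataset/" for a Windows path.
    return {"driver": "file", "path": url[len(FILE_SCHEME):]}
```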
- json Context.file_io_concurrency : object
Specifies a limit on the number of concurrent local filesystem I/O operations.
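The idea of a concurrency limit can be sketched with a semaphore that admits at most `limit` operations at a time. `IoConcurrencyLimit` and `fake_read` are hypothetical names; this illustrates the concept only and makes no claim about how TensorStore schedules its own I/O.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class IoConcurrencyLimit:
    """Lets at most `limit` I/O operations run at once (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # largest number of simultaneous operations observed

    def run(self, op, *args):
        with self._slots:  # waits while `limit` operations are in flight
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return op(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_read(i: int) -> int:
    time.sleep(0.01)  # stand-in for a blocking filesystem read
    return i


limiter = IoConcurrencyLimit(limit=4)
# 16 worker threads submit 32 operations, but only 4 may run concurrently.
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(lambda i: limiter.run(fake_read, i), range(32)))
print(limiter.peak)
```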
- json Context.file_io_sync : boolean = true
Specifies durability of writes.
If true, durability is ensured for local file writes (e.g. by calling fsync). If false, durability is not guaranteed, and data may be lost in the event of a crash. In cases where durability is not required, setting this to false may make write operations faster.
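A minimal sketch of what this setting trades off, assuming a write-to-temporary-file-then-rename strategy (the rename approach is the one noted under file_io_memmap): with `sync=True` the data is flushed with `os.fsync` before the rename, and on POSIX systems the parent directory is synced afterwards so that the rename itself is durable. `write_file` is a hypothetical helper, not TensorStore code.

```python
import os
import tempfile


def write_file(path: str, data: bytes, sync: bool = True) -> None:
    """Replace `path` with `data`; if `sync`, flush it to stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            if sync:
                os.fsync(f.fileno())  # force file contents to disk
        os.replace(tmp, path)  # atomically swap in the new contents
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    if sync and os.name == "posix":
        # Sync the directory entry too, so the rename survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Skipping both `fsync` calls is where the speedup of `false` would come from: the kernel may then defer writing the data, which is also why it can be lost in a crash.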
- json Context.file_io_locking : object
Specifies the locking strategy to use for file writes.
- Optional members:
- mode : "os" | "none" | "lockfile" = "os"
Selects the locking mode.
When set to "os", OS-level locking such as flock is used. Stale lock files with the suffix ".__lock" may remain if a failure occurs while a write is in progress, but these files will be cleaned up automatically by any subsequent write to the same key.
When set to "lockfile", lock files are used. Stale lock files with the suffix ".__lock" may remain if a failure occurs while a write is in progress, and these files must be deleted manually to unblock subsequent writes.
When set to "none", no locking is used. Conditional writes, as used for read-modify-write operations such as partial array chunk writes, are not atomic. In the case of concurrent writes to the same key, the conditions may not be respected and updates may be lost. Partial writes will not be observed; unconditional writes are still atomic. If a failure occurs while a write is in progress, stale temporary files with the suffix ".__lock" may remain. These files do not affect subsequent operations but must be cleaned up manually to reclaim space.
- acquire_timeout = "60s"
Timeout for acquiring a lock when using "lockfile" locking.
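The "lockfile" mode can be sketched as exclusive creation of a `<key>.__lock` file, retried until a timeout that plays the role of acquire_timeout. This is a hypothetical illustration (`acquire_lockfile`, `release_lockfile`, and the polling interval are assumptions, not TensorStore's protocol), but it shows why a stale lock file left behind by a crash blocks later writers until someone deletes it.

```python
import os
import time

LOCK_SUFFIX = ".__lock"


def acquire_lockfile(path: str, timeout: float = 60.0, poll: float = 0.05) -> str:
    """Create `path + ".__lock"` exclusively, retrying for up to `timeout` seconds."""
    lock_path = path + LOCK_SUFFIX
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return lock_path
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)


def release_lockfile(lock_path: str) -> None:
    # If the writer crashes before this runs, the lock file stays behind.
    os.remove(lock_path)
```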
- json Context.file_io_memmap : boolean = false
Specifies use of memory-mapped I/O for reads.
If true, the file driver uses memory-mapped I/O for reads. This may improve performance for large reads, with the following caveats:
- TensorStore may retain references to memory-mapped buffers even after the read completes, depending on the cache and codec configuration. Because of batching, these buffers may cover a larger region of the file than an individual read request, leading to higher-than-expected virtual memory usage.
- If a file is truncated or otherwise modified in place by a non-TensorStore process while it is memory-mapped, the TensorStore process may crash, or may observe some combination of the old and new data. Overwriting the entire file via rename, as TensorStore itself does, is safe.
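A minimal sketch of a memory-mapped read using Python's `mmap` module. `read_range_mmap` is hypothetical; it copies the requested bytes out and closes the mapping immediately, so it sidesteps the retained-buffer caveat at the cost of a copy. Note that `mmap` cannot map an empty file, so a zero-length file raises `ValueError` here.

```python
import mmap


def read_range_mmap(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies the bytes, so the mapping can be closed safely.
            return m[offset:offset + length]
```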
Durability of writes
By default, this driver ensures all writes are durable, meaning that committed data won't be lost if the process or machine crashes.
In cases where durability is not necessary, faster write performance may be achieved by setting Context.file_io_sync to false.
{"driver": "file",
"path": "/local/path/",
"file_io_sync": false}
Limitations
Note
On Windows, this driver is only supported on Windows 10 RS1 or later, because it relies on file operations with POSIX semantics.
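A hypothetical pre-flight check for this requirement. It assumes that Windows 10 RS1 corresponds to OS build 14393 (version 1607) and reads the build number with `sys.getwindowsversion()`, which exists only on Windows; the `platform` and `build` parameters exist only to make the check testable on any OS.

```python
import sys
from typing import Optional

# Windows 10 RS1 (version 1607, the "Anniversary Update") is OS build 14393.
WINDOWS_10_RS1_BUILD = 14393


def file_driver_platform_ok(platform: str = sys.platform, build: Optional[int] = None) -> bool:
    """Hypothetical check for the Windows version requirement of the file driver."""
    if platform != "win32":
        return True  # the limitation above only constrains Windows
    if build is None:
        build = sys.getwindowsversion().build  # only available on Windows
    return build >= WINDOWS_10_RS1_BUILD
```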