file Key-Value Store driver

The file driver uses the filesystem directly as a key-value store. Each key specifies a path under a given root directory, and the value is stored as the contents of the corresponding file.
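The key-to-file mapping can be illustrated with a minimal Python sketch. The helper names `key_to_path`, `kv_write` and `kv_read` are hypothetical and are not part of TensorStore's API. The check that rejects keys escaping the root directory, and returning `None` for a missing key, are assumptions of this sketch rather than documented driver behavior.

```python
import pathlib
import tempfile


def key_to_path(root: pathlib.Path, key: str) -> pathlib.Path:
    """Map a key to a file path under `root` (hypothetical helper).

    A '/' inside the key becomes a subdirectory separator.
    """
    path = (root / key).resolve()
    # Assumption of this sketch: reject keys such as "../x" that would
    # resolve to a location outside the root directory.
    if root.resolve() not in path.parents:
        raise ValueError(f"key escapes root directory: {key!r}")
    return path


def kv_write(root: pathlib.Path, key: str, value: bytes) -> None:
    path = key_to_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)  # create intermediate dirs
    path.write_bytes(value)  # the value is stored as the file contents


def kv_read(root: pathlib.Path, key: str):
    try:
        return key_to_path(root, key).read_bytes()
    except FileNotFoundError:
        return None  # a missing file is treated as an absent key


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    kv_write(root, "a/b/c", b"hello")  # creates <root>/a/b/c
    stored = (root / "a" / "b" / "c").read_bytes()
    missing = kv_read(root, "no/such/key")
    try:
        kv_write(root, "../outside", b"x")
        escaped_rejected = False
    except ValueError:
        escaped_rejected = True
```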

Locking provided by the filesystem is used to safely allow concurrent access from multiple processes. (The locking protocol used does not block readers.) Provided that shared locking is supported, concurrent access from multiple machines to a network filesystem is also safe.
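The following Python sketch shows one way OS-level advisory locking can coordinate writers without blocking readers. It is an illustration under stated assumptions, not TensorStore's actual protocol. The functions `locked_write` and `unlocked_read`, the separate `.__lock` and `.__tmp` side files, and the use of `fcntl.flock` (available only on Unix-like systems) are all choices made for this example. Writers hold an exclusive lock while they write a temporary file and rename it over the key's file. Readers take no lock and observe either the old or the new contents.

```python
import fcntl
import os
import pathlib
import tempfile


def locked_write(path: pathlib.Path, value: bytes) -> None:
    """Serialize writers with flock; hypothetical helper, POSIX only."""
    lock_path = path.with_name(path.name + ".__lock")  # assumed naming
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks other writers, never readers
        tmp = path.with_name(path.name + ".__tmp")  # assumed naming
        tmp.write_bytes(value)
        os.replace(tmp, path)  # atomic rename: readers never see partial data
    finally:
        # The kernel also releases the lock if the process dies, so a
        # leftover ".__lock" file does not block later writers.
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def unlocked_read(path: pathlib.Path) -> bytes:
    return path.read_bytes()  # no lock is taken on the read path


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "chunk"
    locked_write(target, b"v1")
    locked_write(target, b"v2")
    result = unlocked_read(target)
    lock_left_behind = (pathlib.Path(tmp) / "chunk.__lock").exists()
```

Because the lock lives in the kernel rather than in the file's existence, a stale lock file in this sketch is harmless and is simply reused by the next writer.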

json kvstore/file : object

Read/write access to the local filesystem.

JSON specification of the key-value store.

Extends:
  • KvStore — Key-value store specification.

Required members:
driver : "file"
path : string

Path to the root directory on the local filesystem.

Optional members:
context : Context

Specifies context resources that augment/override the parent context.

file_io_concurrency : ContextResource

Specifies or references a previously defined Context.file_io_concurrency.

file_io_sync : ContextResource

Specifies or references a previously defined Context.file_io_sync.

file_io_memmap : ContextResource

Specifies or references a previously defined Context.file_io_memmap.

file_io_locking : ContextResource

Specifies or references a previously defined Context.file_io_locking.
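The member names above can be combined into a complete spec. The following Python sketch builds one as a plain dict and serializes it to JSON. The helper `make_file_spec` is hypothetical. Giving each context resource inline as its JSON value, rather than as a reference to a previously defined resource, is one of the two forms the member descriptions allow. The concrete values (`8`, `"lockfile"`, `"30s"`) are arbitrary examples.

```python
import json

# Optional context-resource members listed in the specification above.
OPTIONAL_MEMBERS = {
    "file_io_concurrency",
    "file_io_sync",
    "file_io_memmap",
    "file_io_locking",
}


def make_file_spec(path: str, **resources) -> dict:
    """Assemble a "file" kvstore spec (hypothetical helper)."""
    unknown = set(resources) - OPTIONAL_MEMBERS
    if unknown:
        raise ValueError(f"unknown members: {sorted(unknown)}")
    spec = {"driver": "file", "path": path}  # required members
    spec.update(resources)  # optional members, specified inline by value
    return spec


spec = make_file_spec(
    "/tmp/dataset/",
    file_io_concurrency={"limit": 8},
    file_io_locking={"mode": "lockfile", "acquire_timeout": "30s"},
)
print(json.dumps(spec, indent=1))
```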

json KvStoreUrl/file : string

file:// KvStore URL scheme

File-based key-value stores may be specified using the widely-supported file://path URL syntax.

Examples

URL representation                JSON representation
"file:///tmp/dataset/"            {"driver": "file", "path": "/tmp/dataset/"}
"file://C:/Users/abc/dataset/"    {"driver": "file", "path": "C:/Users/abc/dataset/"}

Extends:
  • KvStoreUrl — URL representation of a key-value store.
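The correspondence in the URL examples can be expressed as a small conversion function. `file_url_to_spec` is a hypothetical helper that covers only the forms shown in those examples. It simply strips the `file://` prefix and does not attempt percent-decoding or any other handling a full URL parser might perform.

```python
def file_url_to_spec(url: str) -> dict:
    """Convert a file:// URL to a JSON spec dict (hypothetical, simplified)."""
    prefix = "file://"
    if not url.startswith(prefix):
        raise ValueError(f"not a file:// URL: {url!r}")
    # Everything after "file://" is used verbatim as the path.
    return {"driver": "file", "path": url[len(prefix):]}


unix_spec = file_url_to_spec("file:///tmp/dataset/")
windows_spec = file_url_to_spec("file://C:/Users/abc/dataset/")
```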

json Context.file_io_concurrency : object

Specifies a limit on the number of concurrent local filesystem I/O operations.

Optional members:
limit : integer[1, +∞) | "shared" = "shared"

The maximum number of concurrent operations. If the special value "shared" is specified, a shared global limit applies, equal to the number of available CPU cores/threads, or 4 if fewer than 4 cores/threads are available.
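The rule for the "shared" limit amounts to `max(cores, 4)`. The hypothetical helper below computes only the resulting number; it does not model the limit being shared globally. It assumes `os.cpu_count()` (logical CPUs) as the core/thread count, which may differ from how the library detects available threads.

```python
import os


def effective_concurrency_limit(limit="shared", cpu_count=None) -> int:
    """Return the concurrency limit implied by `limit` (hypothetical helper)."""
    if limit == "shared":
        # Use the detected core/thread count unless one is passed explicitly.
        cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        return max(cores, 4)  # never fewer than 4
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError('limit must be an integer >= 1 or "shared"')
    return limit  # an explicit integer limit is used as-is
```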

json Context.file_io_sync : boolean = true

Specifies durability of writes.

If true, durability is ensured for local file writes (e.g. by calling fsync). If false, durability is not guaranteed, and data may be lost in the event of a crash.

In cases where durability is not required, setting this to false may make write operations faster.
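The cost that `file_io_sync` trades away can be seen in a durable-write sketch for a POSIX system. The helper `write_value` and the `.__tmp` file name are hypothetical. The sketch writes a temporary file and renames it into place. When `sync` is true, it calls `os.fsync` on the file before the rename and on the parent directory afterwards, so that both the data and the rename survive a crash. When `sync` is false, it skips both calls, which is where the speedup comes from.

```python
import os
import pathlib
import tempfile


def write_value(path: pathlib.Path, value: bytes, sync: bool = True) -> None:
    """Replace `path` with `value`, optionally durably (POSIX sketch)."""
    tmp = path.with_name(path.name + ".__tmp")  # hypothetical temp name
    with open(tmp, "wb") as f:
        f.write(value)
        if sync:
            f.flush()  # move Python's buffer into the OS
            os.fsync(f.fileno())  # ask the OS to persist the file data
    os.replace(tmp, path)  # atomically swap in the new contents
    if sync:
        # Persist the directory entry so the rename itself is durable.
        dir_fd = os.open(path.parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "key"
    write_value(target, b"durable")  # fsync'd write
    write_value(target, b"fast", sync=False)  # may be lost on a crash
    final = target.read_bytes()
    leftovers = sorted(p.name for p in pathlib.Path(tmp).iterdir())
```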

json Context.file_io_locking : object

Specifies the locking strategy to use for file writes.

Optional members:
mode : "os" | "none" | "lockfile" = "os"

Selects the locking mode.

When set to "os", OS-level locking, such as flock, is used. Stale lock files with the suffix ".__lock" may remain if a failure occurs while a write is in progress, but these files will be cleaned up automatically by any subsequent write to the same key.

When set to "lockfile", lock files are used. Stale lock files with the suffix ".__lock" may remain if a failure occurs while a write is in progress, and these files will need to be deleted manually to unblock any subsequent writes.

When set to "none", no locking is used. Conditional writes, as used for read-modify-write operations, such as partial array chunk writes, are not atomic. In the case of concurrent writes to the same key, the conditions may not be respected and updates may be lost. Partial writes will not be observed; unconditional writes are still atomic. If a failure occurs while a write is in progress, stale temporary files with the suffix ".__lock" may remain. These files will not impact subsequent operations but will need to be cleaned up manually to reclaim space.

acquire_timeout = "60s"

Timeout for acquiring a lock when using "lockfile" locking.
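A lock-file scheme with an acquire timeout can be sketched as follows. The `.__lock` suffix mirrors the one described above. The protocol itself, an exclusive create with `O_EXCL` followed by polling until a deadline, is an assumption of this example rather than TensorStore's implementation, and `acquire_lockfile` and `release_lockfile` are hypothetical names. Because the lock is represented only by the file's existence, a lock file left behind by a crashed writer blocks every later attempt until it is deleted.

```python
import os
import pathlib
import tempfile
import time


def acquire_lockfile(path: pathlib.Path, timeout_s: float = 60.0,
                     poll_s: float = 0.01) -> pathlib.Path:
    """Create `<path>.__lock` exclusively, retrying until `timeout_s` elapses."""
    lock_path = path.with_name(path.name + ".__lock")
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # O_EXCL makes the create fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            return lock_path
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}") from None
            time.sleep(poll_s)  # another writer (or a stale file) holds it


def release_lockfile(lock_path: pathlib.Path) -> None:
    lock_path.unlink()  # deleting the file releases the lock


with tempfile.TemporaryDirectory() as tmp:
    key_path = pathlib.Path(tmp) / "chunk"
    lock = acquire_lockfile(key_path)
    try:
        acquire_lockfile(key_path, timeout_s=0.05)  # lock is still held
        second_timed_out = False
    except TimeoutError:
        second_timed_out = True
    release_lockfile(lock)
    reacquired = acquire_lockfile(key_path, timeout_s=0.05).exists()
```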

json Context.file_io_memmap : boolean = false

Specifies use of memory-mapped I/O for reads.

If true, the driver uses memory-mapped I/O for reads. This may improve performance for large reads, with the following caveats:

  • TensorStore may retain references to memory-mapped buffers even after a read completes, depending on the cache and codec configuration. Due to batching, these buffers may cover a larger region of the file than an individual read request, leading to higher-than-expected virtual memory usage.

  • If a file is truncated or otherwise modified in place by a non-TensorStore process while it is memory-mapped, the TensorStore process may crash, or it may observe some combination of the old and new data. Overwriting the entire file via rename, as TensorStore itself does, is safe.
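A memory-mapped range read can be sketched with Python's standard `mmap` module. The helper `mmap_read` is hypothetical. It differs from the behavior described in the first caveat in one respect: it copies the requested bytes out of the mapping and closes the mapping before returning. This gives up the zero-copy benefit in exchange for predictable virtual memory usage.

```python
import mmap
import os
import pathlib
import tempfile


def mmap_read(path, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies the bytes, so no reference to the mapping
            # outlives this call.
            return m[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    p = pathlib.Path(tmp) / "value"
    p.write_bytes(b"0123456789")
    chunk = mmap_read(p, 3, 4)
    tail = mmap_read(p, 8, 10)  # the slice is clamped at end of file
```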

Durability of writes

By default, this driver ensures all writes are durable, meaning that committed data won’t be lost in the event that the process or machine crashes.

In cases where durability is not necessary, faster write performance may be achieved by setting Context.file_io_sync to false.

{"driver": "file",
 "path": "/local/path/",
 "file_io_sync": false}

Limitations

Note

On Windows, this driver is only supported on Windows 10 RS1 or later, due to its reliance on file operations with POSIX semantics.