gcs Key-Value Store driver

The gcs driver provides access to Google Cloud Storage. Keys directly correspond to paths within a Google Cloud Storage bucket.

Conditional writes are used to safely allow concurrent access from multiple machines.
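As a rough illustration of why conditional writes make concurrent access safe, the sketch below models a store in which every key carries a generation number, and a write succeeds only if the generation the writer last saw is still current. The `ToyStore` class and its methods are hypothetical and exist only for this example; they are not the TensorStore or Google Cloud Storage API.

```python
from typing import Dict, Optional, Tuple


class ToyStore:
    """In-memory stand-in for a bucket; NOT the TensorStore or GCS API."""

    def __init__(self) -> None:
        # key -> (generation, value); a missing key has generation 0.
        self._data: Dict[str, Tuple[int, bytes]] = {}

    def read(self, key: str) -> Tuple[int, Optional[bytes]]:
        """Return the current (generation, value) for key."""
        return self._data.get(key, (0, None))

    def write_if(self, key: str, value: bytes, expected_generation: int) -> bool:
        """Write only if the stored generation equals expected_generation."""
        current, _ = self._data.get(key, (0, None))
        if current != expected_generation:
            return False  # Another writer got there first; re-read and retry.
        self._data[key] = (current + 1, value)
        return True


store = ToyStore()
gen_a, _ = store.read("counter")  # writer A observes generation 0
gen_b, _ = store.read("counter")  # writer B observes generation 0
first = store.write_if("counter", b"A", gen_a)   # succeeds
second = store.write_if("counter", b"B", gen_b)  # rejected: generation is stale
```

The rejected writer is expected to re-read the key and retry, so neither machine silently overwrites the other's update.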

json kvstore/gcs : object

Read/write access to Google Cloud Storage (GCS).

JSON specification of the key-value store.

Extends:
  • KvStore — Key-value store specification.

Required members:
driver : "gcs"
bucket : string

Google Cloud Storage bucket to use.

The Google Cloud account that is used must have appropriate permissions on the bucket. If the bucket has Requester Pays enabled, either additional permissions are required or a separate billing project must be specified using Context.gcs_user_project.

Optional members:
path : string

Key prefix within the key-value store.

If the prefix is intended to correspond to a Unix-style directory path, it should end with "/".

context : Context

Specifies context resources that augment/override the parent context.

gcs_request_concurrency : ContextResource

Specifies or references a previously defined Context.gcs_request_concurrency.

gcs_user_project : ContextResource

Specifies or references a previously defined Context.gcs_user_project.

gcs_request_retries : ContextResource

Specifies or references a previously defined Context.gcs_request_retries.
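A minimal sketch of a complete specification, built as a Python dictionary and serialized to JSON. The bucket name, path, and the particular limit values are placeholders chosen for illustration, not recommended settings.

```python
import json

# Hypothetical bucket and prefix names, used only for illustration.
spec = {
    "driver": "gcs",
    "bucket": "my-bucket",
    # Trailing "/" so that the prefix behaves like a directory path.
    "path": "path/to/dataset/",
    # Context resources that override the parent context for this store.
    "context": {
        "gcs_request_concurrency": {"limit": 8},
        "gcs_request_retries": {"max_retries": 5},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Assuming the TensorStore Python package is installed, a dictionary like this can be passed to `tensorstore.KvStore.open` to open the key-value store.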

json Context.gcs_user_project : object

Specifies a Google Cloud project to bill for Google Cloud Storage requests. If a project_id is not specified, requests are billed to the project that owns the bucket. For Requester Pays buckets, however, requests without a project_id will fail unless the Google Cloud account has additional permissions.

Optional members:
project_id : string

Google Cloud project id, e.g. "my-project". The Google Cloud account that is used must have appropriate permissions to bill to the specified project.
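The following sketch shows one way to attach a billing project to an existing specification. The helper `with_billing_project` and the bucket name are hypothetical; only the `gcs_user_project` and `project_id` member names come from the schema above.

```python
def with_billing_project(spec: dict, project_id: str) -> dict:
    """Return a copy of spec whose requests are billed to project_id."""
    updated = dict(spec)  # shallow copy; the caller's dict is left untouched
    updated["gcs_user_project"] = {"project_id": project_id}
    return updated


# Hypothetical bucket name; "my-project" is the example id from the text above.
base = {"driver": "gcs", "bucket": "my-requester-pays-bucket"}
billed = with_billing_project(base, "my-project")
```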

json Context.gcs_request_concurrency : object

Specifies a limit on the number of concurrent requests to Google Cloud Storage.

Optional members:
limit : integer[1, +∞) | "shared" = "shared"

The maximum number of concurrent requests. If the special value "shared" is specified, a shared global limit is used instead. That limit is set by the environment variable TENSORSTORE_GCS_REQUEST_CONCURRENCY and defaults to 32.
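The rule for resolving the limit can be modeled as follows. This is an illustrative sketch of the behavior described above, not TensorStore's implementation; the function name `effective_limit` is hypothetical, and the handling of an empty environment variable is an assumption.

```python
import os
from typing import Mapping, Union

DEFAULT_SHARED_LIMIT = 32  # default stated in the description above


def effective_limit(limit: Union[int, str] = "shared",
                    environ: Mapping[str, str] = os.environ) -> int:
    """Model how the `limit` member maps to a concrete request limit."""
    if limit == "shared":
        raw = environ.get("TENSORSTORE_GCS_REQUEST_CONCURRENCY")
        # Assumption: an unset or empty variable falls back to the default.
        return int(raw) if raw else DEFAULT_SHARED_LIMIT
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be an integer >= 1 or 'shared'")
    return limit
```

For example, `effective_limit("shared", {})` yields the default of 32, while an explicit integer limit is used unchanged.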

json Context.gcs_request_retries : object

Specifies retry parameters for handling transient network errors. An exponential delay is added between consecutive retry attempts. The default values are appropriate for GCS.

Optional members:
max_retries : integer[1, +∞) = 32

Maximum number of attempts in the case of transient errors.

initial_delay : string = "1s"

Initial backoff delay for transient errors.

max_delay : string = "32s"

Maximum backoff delay for transient errors.
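To make the interaction of initial_delay and max_delay concrete, the sketch below computes a capped exponential delay schedule. It assumes the delay doubles after each attempt and applies no random jitter; the text above only states that the delay is exponential, so the exact growth factor and any jitter used by TensorStore may differ. The function name is hypothetical.

```python
from typing import List


def backoff_delays(count: int,
                   initial_delay: float = 1.0,
                   max_delay: float = 32.0) -> List[float]:
    """Return `count` delays in seconds, doubling each time up to max_delay.

    Assumes a growth factor of 2 and no jitter (illustration only).
    """
    delays = []
    delay = initial_delay
    for _ in range(count):
        delays.append(min(delay, max_delay))
        delay = min(delay * 2, max_delay)
    return delays


delays = backoff_delays(8)
```

With the default values, the delay reaches the 32-second cap after five doublings and stays there for all later attempts.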

json Context.experimental_gcs_rate_limiter : object

Experimental rate limiter configuration for Google Cloud Storage reads and writes.

Optional members:
read_rate : number

The maximum rate of read and/or list calls issued per second. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>

write_rate : number

The maximum rate of write and/or delete calls issued per second. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>

doubling_time : string = "0"

The time interval over which the initial rates scale to 2x. Whether this setting is useful depends on details of the storage bucket. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>
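The effect of doubling_time can be pictured with a simple model in which the permitted rate doubles once per interval. This is a sketch under stated assumptions, not the limiter's actual algorithm: it assumes continuous exponential growth and that a doubling_time of "0" means the rate stays fixed. The function name and the numeric values are illustrative only.

```python
def allowed_rate(initial_rate: float,
                 doubling_time_s: float,
                 elapsed_s: float) -> float:
    """Permitted calls per second after elapsed_s seconds (simple model)."""
    if doubling_time_s <= 0:
        return initial_rate  # assumed: "0" disables the ramp-up
    return initial_rate * 2 ** (elapsed_s / doubling_time_s)


# Illustrative context resource; the rates and interval are example values.
rate_limiter = {
    "experimental_gcs_rate_limiter": {
        "read_rate": 5000,
        "write_rate": 1000,
        "doubling_time": "1200s",
    }
}

# Under this model, two doubling intervals quadruple the write rate.
write_rate_after_40_min = allowed_rate(1000, 1200, 2400)
```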

json KvStoreUrl/gs : object

gs:// KvStore URL scheme

Google Cloud Storage-based key-value stores may be specified using the gs://bucket/path URL syntax, as supported by gsutil.

Examples

URL representation: "gs://my-bucket"
JSON representation:

{"driver": "gcs",
 "bucket": "my-bucket"}

URL representation: "gs://my-bucket/path/to/dataset"
JSON representation:

{"driver": "gcs",
 "bucket": "my-bucket",
 "path": "path/to/dataset"}

Extends:
  • KvStoreUrl — URL representation of a key-value store.
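The correspondence between the URL and JSON forms can be sketched with the standard library. The helper `gs_url_to_spec` is hypothetical and deliberately simple: it does not handle percent-encoding or other details that a real URL parser for this scheme may need.

```python
from urllib.parse import urlsplit


def gs_url_to_spec(url: str) -> dict:
    """Convert a gs://bucket/path URL into the equivalent JSON spec."""
    parts = urlsplit(url)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    spec = {"driver": "gcs", "bucket": parts.netloc}
    # parts.path is either empty or starts with "/"; drop that separator.
    path = parts.path[1:]
    if path:
        spec["path"] = path
    return spec


bucket_only = gs_url_to_spec("gs://my-bucket")
with_path = gs_url_to_spec("gs://my-bucket/path/to/dataset")
```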

Authentication

Buckets that allow public access (i.e. access by allUsers) can be used with the gcs driver without credentials. To access non-public buckets, you must specify service account credentials, which can be done through one of the following methods:

  1. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the local path to a Google Cloud JSON credentials file.

  2. Set up Google Cloud SDK application default credentials. Install the Google Cloud SDK and run:

    gcloud auth application-default login
    

    This stores Google Cloud credentials in ~/.config/gcloud/application_default_credentials.json or $CLOUDSDK_CONFIG/application_default_credentials.json.

    This is often the most convenient method to use on a development machine.

  3. On Google Compute Engine (GCE), the default service account credentials are retrieved automatically from the metadata service if credentials are not otherwise specified.
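When debugging authentication problems, it can help to check which credentials file, if any, is present on the machine. The sketch below looks in the locations listed above. It is not TensorStore's credential resolution code: the function name is hypothetical, and the assumption that the environment variable takes precedence over the application default credentials file is made only for this example.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ADC_FILENAME = "application_default_credentials.json"


def find_credentials_file(environ: Mapping[str, str],
                          home: Path) -> Optional[Path]:
    """Return the credentials file suggested by the methods above, if any."""
    # Method 1: explicit path to a JSON credentials file.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    # Method 2: application default credentials written by gcloud.
    config_dir = environ.get("CLOUDSDK_CONFIG")
    base = Path(config_dir) if config_dir else home / ".config" / "gcloud"
    candidate = base / ADC_FILENAME
    if candidate.is_file():
        return candidate
    # Method 3: no file found; on GCE the metadata service would be used.
    return None


found = find_credentials_file(os.environ, Path.home())
```

A result of `None` on a machine outside GCE suggests that only public buckets will be accessible.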

TLS CA certificates

TensorStore connects to the Google Cloud Storage API using HTTP and depends on the system certificate authority (CA) store to secure connections. In many cases it will work by default without any additional configuration, but if you receive an error like:

CURL error[77] Problem with the SSL CA cert (path? access rights?):
error setting certificate verify locations:
  CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none

refer to the HTTP request-related environment variables section for information on how to specify the path to the system certificate store at runtime.

Testing

To test the gcs driver with a fake Google Cloud Storage server, such as fake-gcs-server, you can set the TENSORSTORE_GCS_HTTP_URL environment variable to e.g. http://localhost:4443.
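In a Python test, the variable can be set from within the process, as in the sketch below. It assumes the variable is set before the gcs driver is first used and that a fake server is already listening on the address shown; the bucket name is a placeholder.

```python
import os

# Point the gcs driver at a local fake server (address from the text above).
# Assumption: this runs before the driver is first used in the process.
os.environ["TENSORSTORE_GCS_HTTP_URL"] = "http://localhost:4443"

# Hypothetical bucket that the fake server is expected to provide.
test_spec = {"driver": "gcs", "bucket": "test-bucket"}
```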