gcs Key-Value Store driver
The gcs driver provides access to Google Cloud Storage. Keys correspond directly to paths within a Google Cloud Storage bucket.
Conditional writes are used to safely allow concurrent access from multiple machines.
- json kvstore/gcs : object
Read/write access to Google Cloud Storage (GCS).
JSON specification of the key-value store.
- Required members:
- driver : "gcs"
- bucket : string
Google Cloud Storage bucket to use.
The Google Cloud account that is used must have appropriate permissions on the bucket. If the bucket has Requestor Pays enabled, either additional permissions are required or a separate billing project must be specified using Context.gcs_user_project.
- Optional members:
- path : string
Key prefix within the key-value store.
If the prefix is intended to correspond to a Unix-style directory path, it should end with "/".
- gcs_request_concurrency : ContextResource
Specifies or references a previously defined Context.gcs_request_concurrency.
- gcs_user_project : ContextResource
Specifies or references a previously defined Context.gcs_user_project.
- gcs_request_retries : ContextResource
Specifies or references a previously defined Context.gcs_request_retries.
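As a rough illustration of the required and optional members, the sketch below assembles a JSON specification as a plain Python dictionary. The helper name make_gcs_kvstore_spec and the bucket, path, and project values are hypothetical; only the member names (driver, bucket, path, gcs_user_project, project_id) come from this page.

```python
import json


def make_gcs_kvstore_spec(bucket, path="", user_project=None):
    """Build a kvstore/gcs JSON spec as a dict (hypothetical helper)."""
    spec = {"driver": "gcs", "bucket": bucket}
    if path:
        # A prefix meant to act like a directory should end with "/".
        spec["path"] = path
    if user_project is not None:
        # Inline Context.gcs_user_project resource naming the billing project.
        spec["gcs_user_project"] = {"project_id": user_project}
    return spec


spec = make_gcs_kvstore_spec("my-bucket", "path/to/dataset/", "my-billing-project")
print(json.dumps(spec, indent=2))
```

With the TensorStore Python package installed, a dictionary like this would typically be passed to ts.KvStore.open(spec); that call is not exercised here.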
- json Context.gcs_user_project : object
Specifies a Google Cloud project to bill for Google Cloud Storage requests. If a project_id is not specified, requests are billed to the project that owns the bucket by default. For Requestor Pays buckets, however, requests without a project_id specified will fail unless the Google Cloud account has additional permissions.
- json Context.gcs_request_concurrency : object
Specifies a limit on the number of concurrent requests to Google Cloud Storage.
- Optional members:
- limit : integer[1, +∞) | "shared" = "shared"
The maximum number of concurrent requests. If the special value of "shared" is specified, a shared global limit is used. That limit is specified by the environment variable TENSORSTORE_GCS_REQUEST_CONCURRENCY and defaults to 32.
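The way the limit member resolves to an effective number can be sketched as follows. This is an illustrative model of the documented behavior, not TensorStore's implementation; the function name resolve_concurrency_limit and its environ parameter are hypothetical.

```python
import os

DEFAULT_SHARED_LIMIT = 32  # documented default for the shared global limit


def resolve_concurrency_limit(resource, environ=None):
    """Return the effective request limit for a gcs_request_concurrency dict."""
    environ = os.environ if environ is None else environ
    limit = resource.get("limit", "shared")  # "shared" is the default value
    if limit == "shared":
        # The shared limit comes from the environment variable, if it is set.
        return int(environ.get("TENSORSTORE_GCS_REQUEST_CONCURRENCY",
                               DEFAULT_SHARED_LIMIT))
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError('limit must be an integer >= 1 or "shared"')
    return limit


print(resolve_concurrency_limit({"limit": 4}, {}))
print(resolve_concurrency_limit({}, {"TENSORSTORE_GCS_REQUEST_CONCURRENCY": "8"}))
```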
- json Context.gcs_request_retries : object
Specifies retry parameters for handling transient network errors. An exponential delay is added between consecutive retry attempts. The default values are appropriate for GCS.
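To make the idea of an exponential delay between retries concrete, here is a generic sketch of a capped exponential backoff schedule. The parameter names max_retries, initial_delay, and max_delay and their values are assumptions chosen for illustration; this page does not list the actual members or defaults of the resource, and real implementations often add random jitter.

```python
def backoff_delays(max_retries, initial_delay=1.0, max_delay=32.0):
    """Return the delay in seconds to wait before each retry attempt.

    The delay doubles after every attempt and is capped at max_delay.
    All parameters are hypothetical, not TensorStore's actual settings.
    """
    return [min(initial_delay * 2 ** attempt, max_delay)
            for attempt in range(max_retries)]


print(backoff_delays(7))
```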
- json Context.experimental_gcs_rate_limiter : object
Experimental rate limiter configuration for Google Cloud Storage reads and writes.
- Optional members:
- read_rate : number
The maximum rate of read and/or list calls issued per second. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>
- write_rate : number
The maximum rate of write and/or delete calls issued per second. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>
- doubling_time : string = "0"
The time interval over which the initial rates scale to 2x. Whether this setting is useful depends on details of the storage bucket. See <https://cloud.google.com/storage/docs/request-rate#ramp-up>
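A context configuration using these members might look like the sketch below. The rates mirror the initial figures in the linked Google Cloud ramp-up guidance (roughly 5000 reads and 1000 writes per second, doubling no faster than every 20 minutes). The duration string format "20m" is an assumption, since this page only shows the default "0".

```python
import json

# Hypothetical context configuration; the values are illustrative only.
context_spec = {
    "experimental_gcs_rate_limiter": {
        "read_rate": 5000,       # read and/or list calls per second
        "write_rate": 1000,      # write and/or delete calls per second
        "doubling_time": "20m",  # assumed duration syntax
    }
}

print(json.dumps(context_spec, indent=2))
```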
- json KvStoreUrl/gs : object
gs:// KvStore URL scheme
Google Cloud Storage-based key-value stores may be specified using the gs://bucket/path URL syntax, as supported by gsutil.
Examples
URL representation                  JSON representation
"gs://my-bucket"                    {"driver": "gcs", "bucket": "my-bucket"}
"gs://my-bucket/path/to/dataset"    {"driver": "gcs", "bucket": "my-bucket", "path": "path/to/dataset"}
- Extends:
KvStoreUrl: URL representation of a key-value store.
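The correspondence between the URL form and the JSON form can be sketched with the standard library. The function gs_url_to_spec is a hypothetical helper, not part of TensorStore, and it ignores details such as percent-encoding that a real parser would need to handle.

```python
from urllib.parse import urlsplit


def gs_url_to_spec(url):
    """Convert a gs://bucket/path URL to a kvstore/gcs JSON spec dict."""
    parts = urlsplit(url)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    spec = {"driver": "gcs", "bucket": parts.netloc}
    path = parts.path[1:]  # drop the "/" that separates bucket and path
    if path:
        spec["path"] = path
    return spec


print(gs_url_to_spec("gs://my-bucket"))
print(gs_url_to_spec("gs://my-bucket/path/to/dataset"))
```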
Authentication
The gcs driver can access buckets that allow public access (i.e. access by allUsers) without credentials. To access non-public buckets, you must specify service account credentials, which can be done through one of the following methods:
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the local path to a Google Cloud JSON credentials file.
- Set up Google Cloud SDK application default credentials. Install the Google Cloud SDK and run:
gcloud auth application-default login
This stores Google Cloud credentials in ~/.config/gcloud/application_default_credentials.json or $CLOUDSDK_CONFIG/application_default_credentials.json. This is often the most convenient method to use on a development machine.
- On Google Compute Engine (GCE), the default service account credentials are retrieved automatically from the metadata service if credentials are not otherwise specified.
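When debugging authentication, it can help to see which credentials files the methods above point to. The sketch below lists the candidate paths from a given environment. The helper name candidate_credentials_files is hypothetical, and the ordering shown (explicit variable first, then application default credentials) is an assumption rather than something this page states.

```python
from pathlib import Path

ADC_FILENAME = "application_default_credentials.json"


def candidate_credentials_files(environ, home):
    """List credentials file paths implied by the environment (a sketch)."""
    candidates = []
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    config_dir = environ.get("CLOUDSDK_CONFIG")
    if config_dir:
        # gcloud stores application default credentials under $CLOUDSDK_CONFIG.
        candidates.append(Path(config_dir) / ADC_FILENAME)
    else:
        candidates.append(Path(home) / ".config" / "gcloud" / ADC_FILENAME)
    return candidates


for path in candidate_credentials_files({"CLOUDSDK_CONFIG": "/tmp/cfg"}, "/home/user"):
    print(path)
```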
TLS CA certificates
TensorStore connects to the Google Cloud Storage API using HTTP and depends on the system certificate authority (CA) store to secure connections. In many cases it will work by default without any additional configuration, but if you receive an error like:
CURL error[77] Problem with the SSL CA cert (path? access rights?):
error setting certificate verify locations:
CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: none
refer to the HTTP request-related environment variables section for information on how to specify the path to the system certificate store at runtime.
Testing
To test the gcs driver with a fake Google Cloud Storage server, such as fake-gcs-server, you can set the TENSORSTORE_GCS_HTTP_URL environment variable to e.g. http://localhost:4443.
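From a Python test harness, the variable can be set in the process environment before the driver is first used. This is a minimal sketch: the helper name use_fake_gcs is hypothetical, and it assumes a fake server is already listening on the given address.

```python
import os


def use_fake_gcs(url="http://localhost:4443"):
    """Point the gcs driver at a fake GCS server (hypothetical helper).

    This should run before any GCS requests are issued, so that the
    driver sees the variable when it reads its configuration.
    """
    os.environ["TENSORSTORE_GCS_HTTP_URL"] = url
    return url


use_fake_gcs()
print(os.environ["TENSORSTORE_GCS_HTTP_URL"])
```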