s3 Key-Value Store driver

The s3 driver provides access to Amazon S3 and S3-compatible object stores. Keys directly correspond to paths within an S3 bucket.

Warning

The s3 key-value store driver does not provide all the atomicity guarantees required by TensorStore. On AWS specifically, DELETE is not atomic, which leads to race conditions. On other S3-compatible object stores, even PUT may not be atomic.

This non-atomicity can lead to unexpected behavior when writing to an S3-backed TensorStore. For example, writing to a zarr array can in some cases result in a DELETE rather than a PUT (when the data written to a chunk matches the fill value). A write operation that is atomic and safe with other key-value store implementations may therefore be unsafe when using s3.
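A hypothetical model of the fill-value case: a chunk-writing routine that stores a chunk consisting entirely of the fill value by deleting its key instead of writing it. The function name `plan_chunk_write`, the key format, and the toy encoding are illustrative only; this is not TensorStore's zarr driver.

```python
from typing import Optional, Sequence, Tuple


def plan_chunk_write(
    key: str, chunk: Sequence[float], fill_value: float
) -> Tuple[str, str, Optional[bytes]]:
    """Hypothetical model of a chunk write; not TensorStore's actual code.

    Returns ("delete", key, None) when every element equals the fill value,
    since such a chunk need not be stored, and ("put", key, data) otherwise.
    """
    if all(x == fill_value for x in chunk):
        # A chunk of only fill values becomes a DELETE, which is not atomic on AWS S3.
        return ("delete", key, None)
    # Otherwise serialize the chunk (toy comma-separated encoding) and issue a PUT.
    data = ",".join(repr(x) for x in chunk).encode()
    return ("put", key, data)


print(plan_chunk_write("arr/c/0/0", [0.0, 0.0, 0.0], fill_value=0.0)[0])  # delete
print(plan_chunk_write("arr/c/0/0", [0.0, 1.5, 0.0], fill_value=0.0)[0])  # put
```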

json kvstore/s3 : object

Read/write access to Amazon S3-compatible object stores.

JSON specification of the key-value store.

Extends:
  • KvStore — Key-value store specification.

Required members:
driver : "s3"
bucket : string

Name of the AWS S3 storage bucket.

Optional members:
path : string

Key prefix within the key-value store.

If the prefix is intended to correspond to a Unix-style directory path, it should end with "/".
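The trailing "/" matters because, as the sentence above implies, the prefix is joined to each key by plain string concatenation. A minimal Python sketch, assuming exactly that behavior (the helper name `full_object_key` is hypothetical):

```python
def full_object_key(path: str, key: str) -> str:
    """Return the S3 object key for `key` under the kvstore `path` prefix.

    Assumes plain string concatenation, which is why a directory-like
    prefix needs a trailing "/".
    """
    return path + key


print(full_object_key("path/to/dataset/", "zarr.json"))  # path/to/dataset/zarr.json
print(full_object_key("path/to/dataset", "zarr.json"))   # path/to/datasetzarr.json
```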

context : Context

Specifies context resources that augment/override the parent context.

requester_pays : boolean = false

Permit requester-pays requests.

This option must be enabled in order for any operations to succeed if the bucket has Requester Pays enabled and the supplied credentials are not for an owner of the bucket.

aws_region : string

AWS region identifier to use in signatures.

If endpoint is not specified, the region of the bucket is determined automatically.
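A sketch of a spec combining the members described so far, written as a Python dict. The bucket name, prefix, and region are placeholders, not values from this page. With the TensorStore Python package installed, a dict like this should be accepted by `tensorstore.KvStore.open(spec)`, though that call is not exercised here.

```python
import json

# Placeholder bucket, prefix, and region; substitute your own values.
spec = {
    "driver": "s3",
    "bucket": "my-bucket",
    "path": "path/to/dataset/",
    "requester_pays": True,     # required if the bucket has Requester Pays enabled
    "aws_region": "us-west-2",  # optional; determined automatically when no endpoint is set
}

print(json.dumps(spec, indent=2))
```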

endpoint : string

S3 server endpoint to use in place of the public Amazon S3 endpoints.

Must be an http or https URL.

Example

"http://localhost:1234"
host_header : string

Overrides the HTTP Host header sent in requests.

May only be specified in conjunction with endpoint, to send a Host header that differs from the host in endpoint. This may be useful for testing with LocalStack.

Example

"mybucket.s3.af-south-1.localstack.localhost.com"
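
The constraints on endpoint and host_header can be summarized as a client-side check. This is a hypothetical helper written for illustration, not part of TensorStore (which performs its own validation when a spec is parsed); the LocalStack values are the examples shown above.

```python
from urllib.parse import urlparse


def check_endpoint_options(spec: dict) -> None:
    """Hypothetical check of the endpoint/host_header rules described above.

    Raises ValueError if `endpoint` is not an http(s) URL, or if
    `host_header` is given without `endpoint`.
    """
    endpoint = spec.get("endpoint")
    if endpoint is not None and urlparse(endpoint).scheme not in ("http", "https"):
        raise ValueError(f"endpoint must be an http or https URL: {endpoint!r}")
    if "host_header" in spec and endpoint is None:
        raise ValueError("host_header may only be specified together with endpoint")


# A spec for a local S3-compatible server such as LocalStack.
localstack_spec = {
    "driver": "s3",
    "bucket": "mybucket",
    "endpoint": "http://localhost:1234",
    "host_header": "mybucket.s3.af-south-1.localstack.localhost.com",
}
check_endpoint_options(localstack_spec)  # passes silently
```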
aws_credentials : ContextResource

Specifies or references a previously defined Context.aws_credentials.

s3_request_concurrency : ContextResource

Specifies or references a previously defined Context.s3_request_concurrency.

s3_request_retries : ContextResource

Specifies or references a previously defined Context.s3_request_retries.

experimental_s3_rate_limiter : ContextResource

Specifies or references a previously defined Context.experimental_s3_rate_limiter.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency.
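Context resources can be defined once in a context and referenced by name from the kvstore spec, as with the "data_copy_concurrency" default above. The sketch below builds such a pair as Python dicts; the particular limits, retry values, and the profile name "research" are illustrative assumptions. Each resource could instead be specified inline as a JSON object in the kvstore spec.

```python
import json

# Named context resources (illustrative values).
context = {
    "s3_request_concurrency": {"limit": 64},
    "s3_request_retries": {"max_retries": 10, "initial_delay": "500ms", "max_delay": "16s"},
    "aws_credentials": {"type": "profile", "profile": "research"},
}

# The kvstore spec references those resources by name.
spec = {
    "driver": "s3",
    "bucket": "my-bucket",
    "s3_request_concurrency": "s3_request_concurrency",
    "s3_request_retries": "s3_request_retries",
    "aws_credentials": "aws_credentials",
}

print(json.dumps({"kvstore": spec, "context": context}, indent=2))
```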

json Context.s3_request_concurrency : object

Specifies a limit on the number of concurrent requests to S3.

Optional members:
limit : integer[1, +∞) | "shared" = "shared"

The maximum number of concurrent requests. If the special value "shared" is specified, a shared global limit is used, set by the environment variable TENSORSTORE_S3_REQUEST_CONCURRENCY, which defaults to 32.
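How limit resolves to a number can be modeled as follows. This is a hypothetical sketch of the documented behavior, not TensorStore's code; in particular, it assumes the environment variable holds a plain integer.

```python
import os
from typing import Mapping, Union


def effective_concurrency_limit(
    limit: Union[int, str] = "shared", env: Mapping[str, str] = os.environ
) -> int:
    """Hypothetical resolution of `limit` to a number of concurrent requests."""
    if limit == "shared":
        # The shared limit comes from the environment, defaulting to 32.
        return int(env.get("TENSORSTORE_S3_REQUEST_CONCURRENCY", "32"))
    if not isinstance(limit, int) or limit < 1:
        raise ValueError('limit must be an integer >= 1 or "shared"')
    return limit


print(effective_concurrency_limit("shared", env={}))  # 32
print(effective_concurrency_limit("shared", env={"TENSORSTORE_S3_REQUEST_CONCURRENCY": "8"}))  # 8
print(effective_concurrency_limit(64))  # 64
```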

json Context.s3_request_retries : object

Specifies retry parameters for handling transient network errors. An exponential delay is added between consecutive retry attempts. The default values are appropriate for S3.

Optional members:
max_retries : integer[1, +∞) = 32

Maximum number of attempts in the case of transient errors.

initial_delay : string = "1s"

Initial backoff delay for transient errors.

max_delay : string = "32s"

Maximum backoff delay for transient errors.
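Assuming the delay doubles after each retry, the defaults give the schedule sketched below. The page says only that the delay is exponential, so the factor of 2 is an assumption, as is the absence of random jitter; `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(num_retries: int, initial_delay: float = 1.0, max_delay: float = 32.0) -> List[float]:
    """Delay in seconds before each of the first `num_retries` retries.

    Starts at initial_delay, doubles each time (assumed factor), and is
    capped at max_delay. No jitter is modeled.
    """
    return [min(initial_delay * 2 ** i, max_delay) for i in range(num_retries)]


print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0]
```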

json Context.experimental_s3_rate_limiter : object

Experimental rate limiter configuration for S3 reads and writes.

Optional members:
read_rate : number

The maximum rate of read and/or list calls issued per second.

write_rate : number

The maximum rate of write and/or delete calls issued per second.

doubling_time : string = "0"

The time interval over which the initial rates scale to 2x. Whether this setting is useful depends on the characteristics of the storage bucket.
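One reading of doubling_time is a continuous exponential ramp, in which the allowed rate after elapsed time t is r0 · 2^(t / doubling_time). The sketch below implements that reading and treats the default "0" as disabling the ramp; both points are assumptions, since the page does not spell out the formula. The context entry shows how the rates might be configured, with placeholder values.

```python
def allowed_rate(initial_rate: float, elapsed_s: float, doubling_time_s: float) -> float:
    """Calls per second permitted after elapsed_s seconds, assuming an exponential ramp.

    doubling_time_s <= 0 is treated as "no ramp" (an assumption about the "0" default).
    """
    if doubling_time_s <= 0:
        return initial_rate
    return initial_rate * 2 ** (elapsed_s / doubling_time_s)


# Placeholder rates; "60s" corresponds to doubling_time_s=60 below.
rate_limiter_context = {
    "experimental_s3_rate_limiter": {"read_rate": 100, "write_rate": 50, "doubling_time": "60s"}
}

print(allowed_rate(100.0, 0.0, 60.0))    # 100.0
print(allowed_rate(100.0, 60.0, 60.0))   # 200.0
print(allowed_rate(100.0, 120.0, 60.0))  # 400.0
print(allowed_rate(100.0, 120.0, 0.0))   # 100.0
```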

json Context.aws_credentials : object

The type member identifies the credentials provider. The remaining members are specific to the credentials provider.

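Subtypes:
  • Context.aws_credentials/anonymous
  • Context.aws_credentials/default
  • Context.aws_credentials/profile
  • Context.aws_credentials/ecs
  • Context.aws_credentials/environment
  • Context.aws_credentials/imds
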
Required members:
type : string

Specifies the credentials provider.
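Each provider documented below is selected by its type member. The dicts below are illustrative Context.aws_credentials values, one per provider; the profile name and file path are placeholders.

```python
import json

# One illustrative Context.aws_credentials value per provider documented below.
# "my-profile" and "/path/to/credentials" are placeholders.
credential_examples = [
    {"type": "anonymous"},
    {"type": "default"},
    {"type": "profile", "profile": "my-profile", "credentials_file": "/path/to/credentials"},
    {"type": "ecs"},
    {"type": "environment"},
    {"type": "imds"},
]

for creds in credential_examples:
    # Each entry would be placed under the "aws_credentials" key of a context.
    print(json.dumps({"aws_credentials": creds}))
```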

json Context.aws_credentials/anonymous : object

Uses anonymous credentials.

Extends:
  • Context.aws_credentials

Required members:
type : "anonymous"

json Context.aws_credentials/default : object

Sources credentials using the default AWS credentials chain.

Extends:
  • Context.aws_credentials

Required members:
type : "default"
Optional members:
profile : string = "default"

The profile name in the ~/.aws/credentials file.

When unset, the AWS_PROFILE environment variable is consulted before falling back to "default".

json Context.aws_credentials/profile : object

Sources credentials from the AWS config and credentials files.

Extends:
  • Context.aws_credentials

Required members:
type : "profile"
Optional members:
profile : string = "default"

The profile name in the ~/.aws/credentials file.

When unset, the AWS_PROFILE environment variable is consulted before falling back to "default".

config_file : string = "${HOME}/.aws/config"

The path to the AWS config file.

When unset, the AWS_CONFIG_FILE environment variable is consulted before falling back to the default path.

credentials_file : string = "${HOME}/.aws/credentials"

The path to the AWS credentials file.

When unset, the AWS_SHARED_CREDENTIALS_FILE environment variable is consulted before falling back to the default path.
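The "when unset" rules for the profile provider suggest a simple precedence: an explicit spec member wins, then the corresponding environment variable, then the documented default. The sketch below encodes that reading; the function name `resolve_profile_settings` is hypothetical, and the exact precedence inside TensorStore is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_profile_settings(
    profile: Optional[str] = None,
    config_file: Optional[str] = None,
    credentials_file: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, str, str]:
    """Resolve the profile provider's members: spec value, then env var, then default."""
    home = env.get("HOME", os.path.expanduser("~"))
    resolved_profile = profile or env.get("AWS_PROFILE", "default")
    resolved_config = config_file or env.get("AWS_CONFIG_FILE", f"{home}/.aws/config")
    resolved_credentials = credentials_file or env.get(
        "AWS_SHARED_CREDENTIALS_FILE", f"{home}/.aws/credentials"
    )
    return resolved_profile, resolved_config, resolved_credentials


print(resolve_profile_settings(env={"HOME": "/home/alice", "AWS_PROFILE": "research"}))
# ('research', '/home/alice/.aws/config', '/home/alice/.aws/credentials')
print(resolve_profile_settings(profile="prod", env={"HOME": "/home/alice"}))
# ('prod', '/home/alice/.aws/config', '/home/alice/.aws/credentials')
```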

json Context.aws_credentials/ecs : object

Sources credentials from ECS container metadata.

Extends:
  • Context.aws_credentials

Required members:
type : "ecs"
Optional members:
endpoint : string

URL used to request credentials from the ECS container metadata service.

When unset, the endpoint is determined from the environment.

auth_token_file : string

Path to a file containing the authorization token to include in ECS credentials queries.

The file is read each time credentials are requested.
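Because the token file is re-read on every request, a token rotated on disk is picked up without restarting the process. A hypothetical sketch of that behavior; the helper name `ecs_request_headers`, the whitespace stripping, and the header layout are assumptions.

```python
import os
import tempfile
from typing import Dict


def ecs_request_headers(auth_token_file: str) -> Dict[str, str]:
    """Build headers for one ECS credentials request (hypothetical helper).

    The token file is read on every call, so a rotated token is picked up.
    """
    with open(auth_token_file, encoding="utf-8") as f:
        token = f.read().strip()
    return {"Authorization": token}


# Show that a token rotation between two requests is observed.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "token")
    with open(path, "w", encoding="utf-8") as f:
        f.write("token-v1\n")
    print(ecs_request_headers(path))  # {'Authorization': 'token-v1'}
    with open(path, "w", encoding="utf-8") as f:
        f.write("token-v2\n")
    print(ecs_request_headers(path))  # {'Authorization': 'token-v2'}
```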

json Context.aws_credentials/environment : object

Sources credentials from the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.

Extends:
  • Context.aws_credentials

Required members:
type : "environment"

json Context.aws_credentials/imds : object

Sources credentials from the EC2 instance metadata service (IMDS).

Extends:
  • Context.aws_credentials

Required members:
type : "imds"

json KvStoreUrl/s3 : object

s3:// KvStore URL scheme

AWS S3 key-value stores may be specified using the s3://bucket/path URL syntax, as supported by the aws s3 command-line tool.

Examples

URL representation                  JSON representation
"s3://my-bucket"                    {"driver": "s3", "bucket": "my-bucket"}
"s3://my-bucket/path/to/dataset"    {"driver": "s3", "bucket": "my-bucket", "path": "path/to/dataset"}

Extends:
  • KvStoreUrl — URL representation of a key-value store.
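
The URL-to-JSON mapping in the table can be sketched with Python's standard urllib. The helper `s3_url_to_spec` is hypothetical, not TensorStore's parser, and ignores edge cases such as percent-encoding, query strings, and fragments.

```python
from urllib.parse import urlparse


def s3_url_to_spec(url: str) -> dict:
    """Map an s3://bucket/path URL to the equivalent JSON spec (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3://bucket[/path] URL: {url!r}")
    spec = {"driver": "s3", "bucket": parsed.netloc}
    path = parsed.path.lstrip("/")
    if path:
        # Only include "path" when the URL names a prefix within the bucket.
        spec["path"] = path
    return spec


print(s3_url_to_spec("s3://my-bucket"))
# {'driver': 's3', 'bucket': 'my-bucket'}
print(s3_url_to_spec("s3://my-bucket/path/to/dataset"))
# {'driver': 's3', 'bucket': 'my-bucket', 'path': 'path/to/dataset'}
```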

Authentication

Buckets that allow public access can be used with the s3 driver without credentials. Otherwise, AWS credentials are required and may be obtained in the following ways:

  1. Credentials may be obtained from the environment. Set the AWS_ACCESS_KEY_ID environment variable, optionally along with the AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables, as they would be used by the aws CLI.

  2. Credentials may be obtained from the shared credentials file, ~/.aws/credentials by default or the file specified by the AWS_SHARED_CREDENTIALS_FILE environment variable. The profile is taken from the JSON spec or, if none is specified there, from the AWS_PROFILE environment variable.

  3. Credentials may be retrieved from the EC2 Instance Metadata Service (IMDS) when it is available.
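Assuming the sources above are tried in the order listed, with the first source that yields credentials winning, the lookup can be modeled as a chain. The provider functions below are hypothetical stand-ins (only the environment-variable source is implemented), and treating "no credentials found" as anonymous access is an assumption.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence

Credentials = Dict[str, str]
Provider = Callable[[], Optional[Credentials]]


def from_environment(env: Mapping[str, str]) -> Provider:
    """Source 1: AWS_ACCESS_KEY_ID plus the optional secret key and session token."""

    def provider() -> Optional[Credentials]:
        if "AWS_ACCESS_KEY_ID" not in env:
            return None
        creds = {"access_key": env["AWS_ACCESS_KEY_ID"]}
        if "AWS_SECRET_ACCESS_KEY" in env:
            creds["secret_key"] = env["AWS_SECRET_ACCESS_KEY"]
        if "AWS_SESSION_TOKEN" in env:
            creds["session_token"] = env["AWS_SESSION_TOKEN"]
        return creds

    return provider


def from_credentials_file() -> Optional[Credentials]:
    """Stand-in for source 2; a real provider would parse ~/.aws/credentials."""
    return None


def from_imds() -> Optional[Credentials]:
    """Stand-in for source 3; a real provider would query the EC2 metadata service."""
    return None


def first_available(providers: Sequence[Provider]) -> Optional[Credentials]:
    """Return credentials from the first provider that yields any; None means anonymous."""
    for provider in providers:
        creds = provider()
        if creds is not None:
            return creds
    return None


env = {"AWS_ACCESS_KEY_ID": "AKIDEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
print(first_available([from_environment(env), from_credentials_file, from_imds]))
# {'access_key': 'AKIDEXAMPLE', 'secret_key': 'example-secret'}
print(first_available([from_environment({}), from_credentials_file, from_imds]))  # None
```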

AWS_ACCESS_KEY_ID

Specifies an AWS access key associated with an IAM account. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SECRET_ACCESS_KEY

Specifies the secret key associated with the access key. This is essentially the “password” for the access key. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SESSION_TOKEN

Specifies the session token value that is required if you are using temporary security credentials that you retrieved directly from AWS STS operations. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_PROFILE

Specifies the name of the AWS CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file, or the value "default" to use the default profile.

If defined, this environment variable overrides the behavior of using the profile named [default] in the credentials file. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SHARED_CREDENTIALS_FILE

Specifies the location of the file that the AWS CLI uses to store access keys. The default path is ~/.aws/credentials. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_CONFIG_FILE

Specifies the location of the file that the AWS CLI uses to store config. The default path is ~/.aws/config. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_EC2_METADATA_SERVICE_ENDPOINT

Overrides the default EC2 Instance Metadata Service (IMDS) endpoint of http://169.254.169.254. This must be a valid URI that responds to the AWS IMDS API endpoints. See <https://docs.aws.amazon.com/sdkref/latest/guide/feature-imds-credentials.html>

TENSORSTORE_S3_REQUEST_CONCURRENCY

Specifies the concurrency level used by the shared Context.s3_request_concurrency resource. Defaults to 32.

TENSORSTORE_S3_USE_CONDITIONAL_WRITE

Enables conditional writes for the S3 driver. This is experimental and may be changed or removed in the future.