s3 Key-Value Store driver

The s3 driver provides access to Amazon S3 and S3-compatible object stores. Keys directly correspond to paths within an S3 bucket.

json kvstore/s3 : object

Read/write access to Amazon S3-compatible object stores.

JSON specification of the key-value store.

Extends:
  • KvStore — Key-value store specification.

Required members:
driver : "s3"
bucket : string

AWS S3 Storage bucket.

Optional members:
path : string

Key prefix within the key-value store.

If the prefix is intended to correspond to a Unix-style directory path, it should end with "/".

context : Context

Specifies context resources that augment/override the parent context.

requester_pays : boolean = false

Permit requester-pays requests.

This option must be enabled in order for any operations to succeed if the bucket has Requester Pays enabled and the supplied credentials are not for an owner of the bucket.

aws_region : string

AWS region identifier to use in signatures.

If endpoint is not specified, the region of the bucket is determined automatically.

endpoint : string

S3 server endpoint to use in place of the public Amazon S3 endpoints.

Must be an http or https URL.

Example

"http://localhost:1234"
host_header : string

Override HTTP host header to send in requests.

May only be specified in conjunction with endpoint, to send a different host than specified in endpoint. This may be useful for testing with localstack.

Example

"mybucket.s3.af-south-1.localstack.localhost.com"
aws_credentials : ContextResource

Specifies or references a previously defined Context.aws_credentials.

s3_request_concurrency : ContextResource

Specifies or references a previously defined Context.s3_request_concurrency.

s3_request_retries : ContextResource

Specifies or references a previously defined Context.s3_request_retries.

experimental_s3_rate_limiter : ContextResource

Specifies or references a previously defined Context.experimental_s3_rate_limiter.

data_copy_concurrency : ContextResource = "data_copy_concurrency"

Specifies or references a previously defined Context.data_copy_concurrency.
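As a rough illustration of how the members above fit together, the sketch below builds a spec as a plain Python dict and enforces the two constraints stated in the member descriptions (endpoint must be an http or https URL, and host_header requires endpoint). The helper name make_s3_spec and the bucket, path, and region values are hypothetical and are not part of the driver. The resulting dict could then be passed to the TensorStore Python API (for example tensorstore.KvStore.open), which is assumed here rather than shown.

```python
import json
from urllib.parse import urlsplit


def make_s3_spec(bucket, path="", endpoint=None, host_header=None,
                 aws_region=None, requester_pays=False):
    """Build a kvstore/s3 JSON spec as a dict (hypothetical helper)."""
    # host_header is documented as valid only in conjunction with endpoint.
    if host_header is not None and endpoint is None:
        raise ValueError("host_header may only be specified with endpoint")
    # endpoint is documented as an http or https URL.
    if endpoint is not None and urlsplit(endpoint).scheme not in ("http", "https"):
        raise ValueError("endpoint must be an http or https URL")

    spec = {"driver": "s3", "bucket": bucket}
    if path:
        spec["path"] = path
    if requester_pays:
        spec["requester_pays"] = True
    optional = (("aws_region", aws_region),
                ("endpoint", endpoint),
                ("host_header", host_header))
    for name, value in optional:
        if value is not None:
            spec[name] = value
    return spec


# Example values only: a local S3-compatible test server.
local_spec = make_s3_spec(
    "mybucket",
    path="datasets/",  # trailing "/" because the prefix acts as a directory
    endpoint="http://localhost:1234",
    host_header="mybucket.s3.af-south-1.localstack.localhost.com",
    aws_region="af-south-1",
)
print(json.dumps(local_spec, indent=1))
```

Members left at their defaults (such as requester_pays) are simply omitted from the spec.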

json Context.s3_request_concurrency : object

Specifies a limit on the number of concurrent requests to S3.

Optional members:
limit : integer[1, +∞) | "shared" = "shared"

The maximum number of concurrent requests. If the special value "shared" is specified, a shared global limit is used, which is set by the environment variable TENSORSTORE_S3_REQUEST_CONCURRENCY and defaults to 32.
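The resolution of the limit described above can be sketched as follows. This is an illustration of the documented behavior, not the driver's implementation: the function name resolve_concurrency_limit is hypothetical, and how the library treats an unset or malformed environment value is an assumption.

```python
import os

DEFAULT_SHARED_LIMIT = 32  # documented default for the shared limit


def resolve_concurrency_limit(limit="shared", environ=None):
    """Return the effective request limit (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if limit == "shared":
        # Assumption: an unset or empty variable falls back to the default.
        raw = environ.get("TENSORSTORE_S3_REQUEST_CONCURRENCY")
        return int(raw) if raw else DEFAULT_SHARED_LIMIT
    # Explicit limits must be integers in [1, +inf).
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be an integer >= 1 or 'shared'")
    return limit


# A context that pins the limit explicitly instead of using "shared".
context = {"s3_request_concurrency": {"limit": 8}}
print(resolve_concurrency_limit(context["s3_request_concurrency"]["limit"]))
```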

json Context.s3_request_retries : object

Specifies retry parameters for handling transient network errors. An exponential delay is added between consecutive retry attempts. The default values are appropriate for S3.

Optional members:
max_retries : integer[1, +∞) = 32

Maximum number of attempts in the case of transient errors.

initial_delay : string = "1s"

Initial backoff delay for transient errors.

max_delay : string = "32s"

Maximum backoff delay for transient errors.
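To give a feel for how initial_delay and max_delay interact, the sketch below computes a capped exponential backoff schedule. The doubling factor and the absence of jitter are assumptions made for illustration; the text above only states that the delay is exponential. Delays are given in seconds as floats rather than as duration strings such as "1s".

```python
def backoff_delays(count, initial_delay=1.0, max_delay=32.0):
    """Return the first `count` backoff delays in seconds.

    Assumes the delay doubles after each attempt and is capped at
    max_delay; the real driver may use a different factor or add jitter.
    """
    return [min(initial_delay * 2 ** attempt, max_delay)
            for attempt in range(count)]


# With the documented defaults the cap is reached on the sixth delay.
print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0]
```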

json Context.experimental_s3_rate_limiter : object

Experimental rate limiter configuration for S3 reads and writes.

Optional members:
read_rate : number

The maximum rate of read and/or list calls issued per second.

write_rate : number

The maximum rate of write and/or delete calls issued per second.

doubling_time : string = "0"

The time interval over which the initial rates scale to 2x. The cases where this setting is useful depend on details of the storage buckets.
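One possible reading of doubling_time is sketched below: the permitted rate starts at the configured value and doubles once per doubling_time interval, while a value of "0" keeps the rate constant. Both the continuous exponential growth and the treatment of zero are assumptions; the function rate_at and the example rates are hypothetical, and times are in seconds rather than duration strings.

```python
def rate_at(initial_rate, doubling_time_s, elapsed_s):
    """Estimate the permitted calls per second after elapsed_s seconds.

    Assumption: the rate grows as initial_rate * 2 ** (elapsed / doubling_time),
    and a doubling time of 0 means the rate never changes.
    """
    if doubling_time_s <= 0:
        return float(initial_rate)
    return initial_rate * 2 ** (elapsed_s / doubling_time_s)


# Example context fragment with illustrative values.
context = {
    "experimental_s3_rate_limiter": {
        "read_rate": 100,
        "write_rate": 20,
        "doubling_time": "60s",
    }
}
print(rate_at(100, 60, 120))  # 400.0 under the assumptions above
```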

json Context.aws_credentials : object

Specifies parameters to provide AWS credentials.

Optional members:
profile : string

The profile name in the ~/.aws/credentials file, when used. Overrides the AWS_PROFILE environment variable.

filename : string

The filename containing credentials. Overrides the AWS_SHARED_CREDENTIALS_FILE environment variable.

metadata_endpoint : string

The endpoint of the metadata server. Overrides the AWS_EC2_METADATA_SERVICE_ENDPOINT environment variable.
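As an illustration, the aws_credentials resource can be given inline in a key-value store spec, since the aws_credentials member "specifies or references" the resource. The profile name, file path, and bucket below are placeholder values, not defaults.

```python
import json

# Spec with an inline aws_credentials resource (placeholder values).
spec = {
    "driver": "s3",
    "bucket": "my-bucket",
    "aws_credentials": {
        "profile": "research",                       # instead of AWS_PROFILE
        "filename": "/home/user/.aws/credentials",  # instead of AWS_SHARED_CREDENTIALS_FILE
    },
}

# The spec is plain JSON, so it survives a serialization round trip.
encoded = json.dumps(spec, indent=2)
print(encoded)
```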

json KvStoreUrl/s3 : object

s3:// KvStore URL scheme

AWS S3 key-value stores may be specified using the s3://bucket/path URL syntax, as supported by aws s3.

Examples

URL representation

JSON representation

"s3://my-bucket"

{"driver": "s3",
 "bucket": "my-bucket"}

"s3://my-bucket/path/to/dataset"

{"driver": "s3",
 "bucket": "my-bucket",
 "path": "path/to/dataset"}
Extends:
  • KvStoreUrl — URL representation of a key-value store.
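The correspondence between the URL form and the JSON form can be sketched with the standard library. The helper s3_url_to_spec is hypothetical and handles only the simple s3://bucket/path shape; percent-encoding and any other URL features the library may support are deliberately ignored.

```python
from urllib.parse import urlsplit


def s3_url_to_spec(url):
    """Convert an s3://bucket/path URL to a JSON spec dict (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    spec = {"driver": "s3", "bucket": parts.netloc}
    # Drop the single "/" that separates the bucket from the key prefix.
    path = parts.path[1:]
    if path:
        spec["path"] = path
    return spec


print(s3_url_to_spec("s3://my-bucket"))
print(s3_url_to_spec("s3://my-bucket/path/to/dataset"))
```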

Authentication

The s3 driver can access buckets that allow public access without credentials. Otherwise, AWS credentials are required and may be supplied in one of the following ways:

  1. Credentials may be obtained from the environment. Set the AWS_ACCESS_KEY_ID environment variable, optionally along with the AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN environment variables, as they would be used by the AWS CLI.

  2. Credentials may be obtained from the default user credentials file, when found at ~/.aws/credentials, or the file specified by the environment variable AWS_SHARED_CREDENTIALS_FILE, along with a profile from the schema, or as indicated by the AWS_PROFILE environment variable.

  3. Credentials may be retrieved from the EC2 Instance Metadata Service (IMDS) when it is available.
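The three sources above can be read as a lookup chain, sketched below. Treating the numbered list as a strict precedence order, falling back to anonymous access when nothing is found, and the function name describe_credential_source are all assumptions made for illustration; the file_exists and imds_available parameters exist only so the sketch can run without touching the filesystem or network.

```python
import os


def describe_credential_source(environ, spec_profile=None, spec_filename=None,
                               file_exists=os.path.exists, imds_available=False):
    """Return (source, details) for the first usable credential source."""
    # 1. Environment variables.
    if environ.get("AWS_ACCESS_KEY_ID"):
        return ("environment", None)

    # 2. Shared credentials file; spec values override environment variables.
    filename = (spec_filename
                or environ.get("AWS_SHARED_CREDENTIALS_FILE")
                or "~/.aws/credentials")
    profile = spec_profile or environ.get("AWS_PROFILE") or "default"
    if file_exists(os.path.expanduser(filename)):
        return ("file", {"filename": filename, "profile": profile})

    # 3. EC2 Instance Metadata Service, when reachable.
    if imds_available:
        return ("imds", None)

    # Assumption: with no credentials, only public buckets are accessible.
    return ("anonymous", None)


print(describe_credential_source({"AWS_PROFILE": "dev"},
                                 file_exists=lambda path: True))
```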

AWS_ACCESS_KEY_ID

Specifies an AWS access key associated with an IAM account. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SECRET_ACCESS_KEY

Specifies the secret key associated with the access key. This is essentially the “password” for the access key. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SESSION_TOKEN

Specifies the session token value that is required if you are using temporary security credentials that you retrieved directly from AWS STS operations. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_SHARED_CREDENTIALS_FILE

Specifies the location of the file that the AWS CLI uses to store access keys. The default path is ~/.aws/credentials. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_PROFILE

Specifies the name of the AWS CLI profile with the credentials and options to use. This can be the name of a profile stored in a credentials or config file, or the value default to use the default profile.

If defined, this environment variable overrides the behavior of using the profile named [default] in the credentials file. See <https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html>

AWS_EC2_METADATA_SERVICE_ENDPOINT

Overrides the default EC2 Instance Metadata Service (IMDS) endpoint of http://169.254.169.254. This must be a valid URI, and should respond to the AWS IMDS API endpoints. See <https://docs.aws.amazon.com/sdkref/latest/guide/feature-imds-credentials.html>

TENSORSTORE_S3_REQUEST_CONCURRENCY

Specifies the concurrency level used by the shared Context.s3_request_concurrency resource. Defaults to 32.