Tutorial
Reading and writing a local N5 dataset
Create a new N5 dataset on the local filesystem using the file Key-Value Store driver:
>>> import tensorstore as ts
>>> import numpy as np
>>> dataset = ts.open({
... 'driver': 'n5',
... 'kvstore': {
... 'driver': 'file',
... 'path': 'tmp/dataset/',
... },
... 'metadata': {
... 'compression': {
... 'type': 'gzip'
... },
... 'dataType': 'uint32',
... 'dimensions': [1000, 20000],
... 'blockSize': [100, 100],
... },
... 'create': True,
... 'delete_existing': True,
... }).result()
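Before writing, the returned TensorStore can be inspected; for example (a minimal check using the rank, shape, and dtype properties of the Python API; asserts are used here rather than showing exact repr output):
>>> assert dataset.rank == 2
>>> assert list(dataset.shape) == [1000, 20000]
>>> assert dataset.dtype == ts.uint32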
Asynchronously write to a sub-region:
>>> write_future = dataset[80:82, 99:102].write([[1, 2, 3], [4, 5, 6]])
Wait for the write to complete using tensorstore.Future.result:
>>> write_future.result()
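The object returned by write tracks two stages of completion (a sketch, assuming the copy and commit futures of the tensorstore.WriteFutures interface: the source array may be reused once the copy stage finishes, and the data has been written to the kvstore once the commit stage finishes):
>>> write_future = dataset[80:82, 99:102].write([[1, 2, 3], [4, 5, 6]])
>>> write_future.copy.result()  # source data has been read; safe to modify it
>>> write_future.commit.result()  # write has been committed to the kvstore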
In an async function (or with top-level await support), await can also be used for interoperability with asyncio:
>>> await write_future
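For example, the write can be driven from an ordinary async function in a standalone script without top-level await support (a minimal sketch; it assumes the dataset opened above and uses asyncio.run to run the coroutine outside any existing event loop):

import asyncio

async def write_region():
    # Awaiting the tensorstore Future suspends until the write completes.
    await dataset[80:82, 99:102].write([[1, 2, 3], [4, 5, 6]])

asyncio.run(write_region())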
Subscript assignment can also be used to write synchronously:
>>> dataset[80:82, 99:102] = [[1, 2, 3], [4, 5, 6]]
Read back a larger region that contains the region that was written (positions not written have the fill value of 0):
>>> dataset[80:83, 99:102].read().result()
array([[1, 2, 3],
[4, 5, 6],
[0, 0, 0]], dtype=uint32)
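The dataset persists on disk, so it can be reopened later without the creation options (a sketch; with neither create nor delete_existing specified, the existing metadata is read from the store):
>>> reopened = ts.open({
...     'driver': 'n5',
...     'kvstore': {
...         'driver': 'file',
...         'path': 'tmp/dataset/',
...     },
... }).result()
>>> assert reopened.dtype == ts.uint32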
Reading the Janelia FlyEM Hemibrain dataset
This example demonstrates accessing the Janelia FlyEM Hemibrain 1.1 segmentation using the neuroglancer_precomputed Driver.
While this dataset is public, the gcs Key-Value Store driver currently requires that you supply Google Cloud credentials.
Open the dataset asynchronously to obtain a tensorstore.Future:
>>> import tensorstore as ts
>>> import numpy as np
>>> dataset_future = ts.open({
... 'driver':
... 'neuroglancer_precomputed',
... 'kvstore':
... 'gs://neuroglancer-janelia-flyem-hemibrain/v1.1/segmentation/',
... # Use 100MB in-memory cache.
... 'context': {
... 'cache_pool': {
... 'total_bytes_limit': 100_000_000
... }
... },
... 'recheck_cached_data':
... 'open',
... })
>>> dataset_future
<tensorstore.Future object at 0x...>
Wait for the open to complete:
>>> dataset = dataset_future.result()
>>> dataset
TensorStore({
'context': {
'cache_pool': {'total_bytes_limit': 100000000},
'data_copy_concurrency': {},
'gcs_request_concurrency': {},
'gcs_request_retries': {},
'gcs_user_project': {},
},
'driver': 'neuroglancer_precomputed',
'dtype': 'uint64',
'kvstore': {
'bucket': 'neuroglancer-janelia-flyem-hemibrain',
'driver': 'gcs',
'path': 'v1.1/segmentation/',
},
'multiscale_metadata': {'num_channels': 1, 'type': 'segmentation'},
'recheck_cached_data': 'open',
'scale_index': 0,
'scale_metadata': {
'chunk_size': [64, 64, 64],
'compressed_segmentation_block_size': [8, 8, 8],
'encoding': 'compressed_segmentation',
'key': '8.0x8.0x8.0',
'resolution': [8.0, 8.0, 8.0],
'sharding': {
'@type': 'neuroglancer_uint64_sharded_v1',
'data_encoding': 'gzip',
'hash': 'identity',
'minishard_bits': 6,
'minishard_index_encoding': 'gzip',
'preshift_bits': 9,
'shard_bits': 15,
},
'size': [34432, 39552, 41408],
'voxel_offset': [0, 0, 0],
},
'transform': {
'input_exclusive_max': [34432, 39552, 41408, 1],
'input_inclusive_min': [0, 0, 0, 0],
'input_labels': ['x', 'y', 'z', 'channel'],
},
})
In an async function, a tensorstore.Future is also compatible with await:
>>> dataset = await dataset_future
>>> dataset.domain
{ "x": [0, 34432), "y": [0, 39552), "z": [0, 41408), "channel": [0, 1) }
There is only a single channel, so create a 3-d view without the 'channel' dimension:
>>> dataset_3d = dataset[ts.d['channel'][0]]
>>> dataset_3d.domain
{ "x": [0, 34432), "y": [0, 39552), "z": [0, 41408) }
Create a view of a 100x100x1 slice from the middle, without performing any I/O:
>>> x = dataset_3d[15000:15100, 15000:15100, 20000]
>>> x
TensorStore({
'context': {
'cache_pool': {'total_bytes_limit': 100000000},
'data_copy_concurrency': {},
'gcs_request_concurrency': {},
'gcs_request_retries': {},
'gcs_user_project': {},
},
'driver': 'neuroglancer_precomputed',
'dtype': 'uint64',
'kvstore': {
'bucket': 'neuroglancer-janelia-flyem-hemibrain',
'driver': 'gcs',
'path': 'v1.1/segmentation/',
},
'multiscale_metadata': {'num_channels': 1, 'type': 'segmentation'},
'recheck_cached_data': 'open',
'scale_index': 0,
'scale_metadata': {
'chunk_size': [64, 64, 64],
'compressed_segmentation_block_size': [8, 8, 8],
'encoding': 'compressed_segmentation',
'key': '8.0x8.0x8.0',
'resolution': [8.0, 8.0, 8.0],
'sharding': {
'@type': 'neuroglancer_uint64_sharded_v1',
'data_encoding': 'gzip',
'hash': 'identity',
'minishard_bits': 6,
'minishard_index_encoding': 'gzip',
'preshift_bits': 9,
'shard_bits': 15,
},
'size': [34432, 39552, 41408],
'voxel_offset': [0, 0, 0],
},
'transform': {
'input_exclusive_max': [15100, 15100],
'input_inclusive_min': [15000, 15000],
'input_labels': ['x', 'y'],
'output': [
{'input_dimension': 0},
{'input_dimension': 1},
{'offset': 20000},
{},
],
},
})
>>> x.domain
{ "x": [15000, 15100), "y": [15000, 15100) }
Read the slice asynchronously using the tensorstore.TensorStore.read method to obtain a tensorstore.Future:
>>> read_future = x.read()
Wait for the read to complete:
>>> read_future.result()
array([[1194100437, 1194100437, 1194100437, ..., 1408314276, 1408314276,
1408314276],
[1194100437, 1194100437, 1194100437, ..., 1408314276, 1408314276,
1408314276],
[1194100437, 1194100437, 1194100437, ..., 1161117856, 1161117856,
1161117856],
...,
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053],
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053],
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053]], dtype=uint64)
Conversion to a numpy.ndarray also implicitly performs a synchronous read (which hits the in-memory cache since the same region was just retrieved):
>>> np.array(dataset_3d[15000:15100, 15000:15100, 20000])
array([[1194100437, 1194100437, 1194100437, ..., 1408314276, 1408314276,
1408314276],
[1194100437, 1194100437, 1194100437, ..., 1408314276, 1408314276,
1408314276],
[1194100437, 1194100437, 1194100437, ..., 1161117856, 1161117856,
1161117856],
...,
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053],
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053],
[1132030694, 1132030694, 1132030694, ..., 5813054053, 5813054053,
5813054053]], dtype=uint64)
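Since the same region was read both times, the two results are identical (a minimal consistency check; tensorstore.Future.result may be called repeatedly to retrieve the cached result):
>>> assert np.array_equal(read_future.result(),
...                       np.array(dataset_3d[15000:15100, 15000:15100, 20000]))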