koladata

View the Project on GitHub google/koladata

kd.ids API

Operators that work with ItemIds.

kd.ids.agg_uuid(x, ndim=unspecified)

Aliases:

Computes the aggregated uuid of the elements over the last `ndim` dimensions.

Args:
  x: A DataSlice.
  ndim: The number of dimensions to aggregate over. Requires 0 <= ndim <=
    get_ndim(x).

Returns:
  A DataSlice with `rank = rank - ndim` and shape
  `shape = shape[:-ndim]`.

kd.ids.decode_itemid(ds)

Aliases:

Returns ItemIds decoded from the base62 strings.

kd.ids.deep_uuid(x, /, schema=unspecified, *, seed='')

Aliases:

Recursively computes a uuid for `x`.

Args:
  x: The slice to compute the uuid of.
  schema: The schema to use to resolve '*' and '**' tokens. If not specified,
    will use the schema of the 'x' DataSlice.
  seed: The seed to use for uuid computation.

Returns:
  The result of applying uuid recursively to `x`.

kd.ids.encode_itemid(ds)

Aliases:

Returns the base62 encoded ItemIds in `ds` as strings.

kd.ids.has_uuid(x)

Returns present for each item in `x` that has a UUID.

Also see `kd.ids.is_uuid` for checking if `x` is a UUID DataSlice. But note
that `kd.all(kd.ids.has_uuid(x))` is not always equivalent to
`kd.ids.is_uuid(x)`. For example,

  kd.ids.is_uuid(kd.item(None, kd.OBJECT)) -> kd.present
  kd.all(kd.ids.has_uuid(kd.item(None, kd.OBJECT))) -> invalid for kd.all
  kd.ids.is_uuid(kd.slice([None], kd.OBJECT)) -> kd.present
  kd.all(kd.ids.has_uuid(kd.slice([None], kd.OBJECT))) -> kd.missing

Args:
  x: DataSlice to check.

Returns:
  A MASK DataSlice with the same shape as `x`.

kd.ids.hash_itemid(x)

Aliases:

Returns an INT64 DataSlice of hash values of `x`.

The hash values are in the range of [0, 2**63-1].

The hash algorithm is subject to change. It is not guaranteed to be stable in
future releases.

Args:
  x: DataSlice of ItemIds.

Returns:
  A DataSlice of INT64 hash values.

kd.ids.is_uuid(x)

Returns whether `x` is a UUID DataSlice.

Note that the operator returns `kd.present` even for missing values, as long
as their schema does not prevent containing UUIDs.

Also see `kd.ids.has_uuid` for a pointwise version. But note that
`kd.all(kd.ids.has_uuid(x))` is not always equivalent to `kd.ids.is_uuid(x)`.
For example,

  kd.ids.is_uuid(kd.item(None, kd.OBJECT)) -> kd.present
  kd.all(kd.ids.has_uuid(kd.item(None, kd.OBJECT))) -> invalid for kd.all
  kd.ids.is_uuid(kd.slice([None], kd.OBJECT)) -> kd.present
  kd.all(kd.ids.has_uuid(kd.slice([None], kd.OBJECT))) -> kd.missing

Args:
  x: DataSlice to check.

Returns:
  A MASK DataItem.

kd.ids.uuid(seed='', **kwargs)

Aliases:

Creates a DataSlice whose items are Fingerprints identifying arguments.

Args:
  seed: text seed for the uuid computation.
  **kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
    values must be alignable.

Returns:
  DataSlice of Uuids. The i-th uuid is computed by taking the i-th (aligned)
  item from each kwarg value.

kd.ids.uuid_for_dict(seed='', **kwargs)

Aliases:

Creates a DataSlice whose items are Fingerprints identifying arguments.

To be used for keying dict items.

e.g.

kd.dict(['a', 'b'], [1, 2], itemid=kd.uuid_for_dict(seed='seed', a=kd.item(1)))

Args:
  seed: text seed for the uuid computation.
  **kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
    values must be alignable.

Returns:
  DataSlice of Uuids. The i-th uuid is computed by taking the i-th (aligned)
  item from each kwarg value.

kd.ids.uuid_for_list(seed='', **kwargs)

Aliases:

Creates a DataSlice whose items are Fingerprints identifying arguments.

To be used for keying list items.

e.g.

kd.list([1, 2, 3], itemid=kd.uuid_for_list(seed='seed', a=kd.item(1)))

Args:
  seed: text seed for the uuid computation.
  **kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
    values must be alignable.

Returns:
  DataSlice of Uuids. The i-th uuid is computed by taking the i-th (aligned)
  item from each kwarg value.

kd.ids.uuids_with_allocation_size(seed='', *, size)

Aliases:

Creates a DataSlice whose items are uuids.

The uuids are allocated in a single allocation. They are all distinct.
You can think of the result as a DataSlice created with:
[fingerprint(seed, size, i) for i in range(size)]

Args:
  seed: text seed for the uuid computation.
  size: the size of the allocation. It will also be used for the uuid
    computation.

Returns:
  A 1-dimensional DataSlice with `size` distinct uuids.