Base class of all Arolla values in Python.
QValue is immutable. It provides only basic functionality.
Subclasses of this class might have further specialization.
DataSlice.L
ListSlicing helper for DataSlice.
x.L on DataSlice returns a ListSlicingHelper, which treats the first dimension
of DataSlice x as a list.
DataSlice.S
Slicing helper for DataSlice.
It is syntactic sugar for kd.subslice. That is, kd.subslice(ds, *slices)
is equivalent to ds.S[*slices]. For example,
kd.subslice(x, 0) == x.S[0]
kd.subslice(x, 0, 1, kd.item(0)) == x.S[0, 1, kd.item(0)]
kd.subslice(x, slice(0, -1)) == x.S[0:-1]
kd.subslice(x, slice(0, -1), slice(0, 1), slice(1, None))
== x.S[0:-1, 0:1, 1:]
kd.subslice(x, ..., slice(1, None)) == x.S[..., 1:]
kd.subslice(x, slice(1, None)) == x.S[1:]
Please see kd.subslice for more detailed explanations and examples.
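The correspondence between `x.S[...]` and the `*slices` arguments follows from how Python hands subscript expressions to `__getitem__`. The sketch below uses a hypothetical `SubscriptRecorder` class (not part of Koda) that returns the key it receives, to show which Python objects each subscript form produces.

```python
class SubscriptRecorder:
    """Hypothetical helper (not a Koda class): returns the subscript key as a tuple."""

    def __getitem__(self, key):
        # A single index arrives as-is; comma-separated indices arrive as a tuple.
        return key if isinstance(key, tuple) else (key,)


rec = SubscriptRecorder()

single = rec[0]               # (0,)
ranged = rec[0:-1, 0:1, 1:]   # (slice(0, -1), slice(0, 1), slice(1, None))
with_ellipsis = rec[..., 1:]  # (Ellipsis, slice(1, None))
```

Each recorded tuple matches the positional arguments shown on the `kd.subslice` side of the equivalences above.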
DataSlice.append(value, /)
Append a value to each list in this DataSlice.
DataSlice.clear()
Clears all dicts or lists in this DataSlice.
DataSlice.clone(self, *, itemid: Any = unspecified, schema: Any = unspecified, **overrides: Any) -> DataSlice
Creates a DataSlice with clones of provided entities in a new DataBag.
The entities themselves are cloned (with new ItemIds) and their attributes are
extracted (with the same ItemIds).
Also see kd.shallow_clone and kd.deep_clone.
Note that unlike kd.deep_clone, if there are multiple references to the same
entity, the returned DataSlice will have multiple clones of it rather than
references to the same clone.
Args:
x: The DataSlice to copy.
itemid: The ItemId to assign to cloned entities. If not specified, new
ItemIds will be allocated.
schema: The schema to resolve attributes, and also to assign the schema to
the resulting DataSlice. If not specified, will use the schema of `x`.
**overrides: attribute overrides.
Returns:
A copy of the entities where entities themselves are cloned (new ItemIds)
and all of the rest extracted.
DataSlice.clone_as_full(self, *, itemid: Any = unspecified, **overrides: Any) -> DataSlice
Clones the DataSlice, filling missing items with new empty entities.
Equivalent to:
x = x | kd.new_shaped(x.get_shape(), schema=x.get_schema())
return kd.clone(x, itemid=itemid, **overrides)
This operator can be used to fill missing items in a DataSlice attribute. For
example, consider the following two snippets:
x.updated(kd.attrs(x.maybe_missing_attr,
must_be_present_attr=...
))
x.updated(kd.attrs(x,
maybe_missing_attr=kd.clone_as_full(
x.maybe_missing_attr,
must_be_present_attr=...
),
))
In the first snippet the values of `must_be_present_attr` will be skipped when
`maybe_missing_attr` is missing. In the second snippet the whole
`maybe_missing_attr` will be overwritten as full (keeping all the pre-existing
attributes) and so all the values of `must_be_present_attr` will be preserved.
Args:
x: The DataSlice to copy. It must have an entity schema.
itemid: The ItemId to assign to cloned entities. If not specified, new
ItemIds will be allocated.
**overrides: attribute overrides.
Returns:
A copy of the entities where entities themselves are cloned (new ItemIds)
and all of the rest extracted. Missing items in `x` are replaced by new
empty entities.
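The difference between the two snippets can be modeled in plain Python, with dicts standing in for entities and `None` for a missing item. This is only an analogy under those assumptions: the function names and sample data are hypothetical and no Koda API is called.

```python
import copy

# Each record stands in for an item of `x`; None means the attribute is missing.
records = [
    {"maybe_missing_attr": {"old": 1}},
    {"maybe_missing_attr": None},
]
new_values = ["a", "b"]  # desired `must_be_present_attr` value per item


def update_existing_only(items, values):
    """Models the first snippet: the update is skipped where the attr is missing."""
    out = copy.deepcopy(items)
    for item, value in zip(out, values):
        if item["maybe_missing_attr"] is not None:
            item["maybe_missing_attr"]["must_be_present_attr"] = value
    return out


def update_with_fill(items, values):
    """Models the second snippet: missing attrs are first filled with empty entities."""
    out = copy.deepcopy(items)
    for item, value in zip(out, values):
        if item["maybe_missing_attr"] is None:
            item["maybe_missing_attr"] = {}
        item["maybe_missing_attr"]["must_be_present_attr"] = value
    return out


skipped = update_existing_only(records, new_values)
filled = update_with_fill(records, new_values)
```

In `skipped`, the second record still has no `maybe_missing_attr`, so the value `"b"` is lost; in `filled`, both values are kept and the pre-existing `old` attribute survives.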
DataSlice.deep_clone(self, schema: Any = unspecified, **overrides: Any) -> DataSlice
Creates a slice with a (deep) copy of the given slice.
The entities themselves and all their attributes including both top-level and
non-top-level attributes are cloned (with new ItemIds).
Also see kd.shallow_clone and kd.clone.
Note that unlike kd.clone, if there are multiple references to the same entity
in `x`, or multiple ways to reach one entity through attributes, there will be
exactly one clone made per entity.
Args:
x: The slice to copy.
schema: The schema to use to find attributes to clone, and also to assign
the schema to the resulting DataSlice. If not specified, will use the
schema of 'x'.
**overrides: attribute overrides.
Returns:
A (deep) copy of the given DataSlice.
All referenced entities will be copied with newly allocated ItemIds. Note
that UUIDs will be copied as ItemIds.
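As a loose analogy in plain Python (no Koda API involved), copying each reference separately yields one copy per reference, while `copy.deepcopy` remembers already-copied objects and yields one copy per distinct object. Dicts stand in for entities and the variable names are illustrative.

```python
import copy

shared = {"name": "child"}
refs = [shared, shared]  # two references to the same "entity"

# One clone per reference, in the spirit of what is described for kd.clone.
per_reference = [copy.copy(entity) for entity in refs]

# One clone per distinct object, in the spirit of what is described for kd.deep_clone.
per_entity = copy.deepcopy(refs)

clones_are_distinct = per_reference[0] is not per_reference[1]
clone_is_shared = per_entity[0] is per_entity[1]
```

Both flags come out true: the per-reference copies are two separate objects, while the deep copy preserves the sharing.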
DataSlice.deep_uuid(self, schema: Any = unspecified, *, seed: str | DataSlice = '') -> DataSlice
Recursively computes uuid for x.
Args:
x: The slice to take uuid on.
schema: The schema to use to resolve '*' and '**' tokens. If not specified,
will use the schema of the 'x' DataSlice.
seed: The seed to use for uuid computation.
Returns:
Result of recursive uuid application to `x`.
DataSlice.dict_size(self) -> DataSlice
Returns size of a Dict.
DataSlice.display(self: DataSlice, options: Any | None = None) -> None
Visualizes a DataSlice as an HTML widget.
Args:
self: The DataSlice to visualize.
options: This should be a `koladata.ext.vis.DataSliceVisOptions`.
DataSlice.embed_schema()
Returns a DataSlice with OBJECT schema.
* For primitives no data change is done.
* For Entities schema is stored as '__schema__' attribute.
* Embedding Entities requires a DataSlice to be associated with a DataBag.
DataSlice.enriched(self, *bag: DataBag) -> DataSlice
Returns a copy of a DataSlice with additional fallback DataBag(s).
Values in the original DataBag of `ds` take precedence over the ones in
`*bag`. The original DataBag (if present) must be immutable. If the original
DataBag is mutable, either freeze `ds` first, or add updates in place using
the mutable API.
The DataBag attached to the result is a new immutable DataBag that falls back
to the DataBag of `ds` if present and then to `*bag`.
`enriched(x, a, b)` is equivalent to `enriched(enriched(x, a), b)`, and so on
for additional DataBag args.
Args:
ds: DataSlice.
*bag: additional fallback DataBag(s).
Returns:
DataSlice with additional fallbacks.
DataSlice.expand_to(self, target: Any, ndim: Any = unspecified) -> DataSlice
Expands `x` based on the shape of `target`.
When `ndim` is not set, expands `x` to the shape of
`target`. The dimensions of `x` must be the same as the first N
dimensions of `target` where N is the number of dimensions of `x`. For
example,
Example 1:
x: kd.slice([[1, 2], [3]])
target: kd.slice([[[0], [0, 0]], [[0, 0, 0]]])
result: kd.slice([[[1], [2, 2]], [[3, 3, 3]]])
Example 2:
x: kd.slice([[1, 2], [3]])
target: kd.slice([[[0]], [[0, 0, 0]]])
result: incompatible shapes
Example 3:
x: kd.slice([[1, 2], [3]])
target: kd.slice([0, 0])
result: incompatible shapes
When `ndim` is set, the expansion is performed in 3 steps:
1) the last `ndim` dimensions of `x` are first imploded into lists
2) the expansion operation is performed on the DataSlice of lists
3) the lists in the expanded DataSlice are exploded
The result will have M + ndim dimensions where M is the number
of dimensions of `target`.
For example,
Example 4:
x: kd.slice([[1, 2], [3]])
target: kd.slice([[1], [2, 3]])
ndim: 1
result: kd.slice([[[1, 2]], [[3], [3]]])
Example 5:
x: kd.slice([[1, 2], [3]])
target: kd.slice([[1], [2, 3]])
ndim: 2
result: kd.slice([[[[1, 2], [3]]], [[[1, 2], [3]], [[1, 2], [3]]]])
Args:
x: DataSlice to expand.
target: target DataSlice.
ndim: the number of dimensions to implode during expansion.
Returns:
Expanded DataSlice
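The expansion rules, including the three-step `ndim` procedure, can be sketched in plain Python. This is a model under stated assumptions, not the Koda implementation: Python lists stand in for DataSlice dimensions, tuples stand in for Koda Lists, and all helper names are hypothetical.

```python
def _implode(x):
    """Folds the innermost list dimension into a tuple (stand-in for a List)."""
    if all(not isinstance(v, list) for v in x):
        return tuple(x)
    return [_implode(v) for v in x]


def _explode(x):
    """Unpacks the tuples at the leaves into a new innermost dimension."""
    if isinstance(x, list):
        return [_explode(v) for v in x]
    return list(x)


def _expand(x, target):
    """Broadcasts `x` to the nested-list shape of `target`."""
    if isinstance(x, list):
        if not isinstance(target, list) or len(x) != len(target):
            raise ValueError("incompatible shapes")
        return [_expand(xi, ti) for xi, ti in zip(x, target)]
    if isinstance(target, list):
        # `x` is a single item: repeat it over the remaining dimensions.
        return [_expand(x, ti) for ti in target]
    return x


def expand_to(x, target, ndim=0):
    for _ in range(ndim):
        x = _implode(x)       # step 1: implode the last `ndim` dimensions
    x = _expand(x, target)    # step 2: expand the slice of lists
    for _ in range(ndim):
        x = _explode(x)       # step 3: explode the lists again
    return x


x = [[1, 2], [3]]
example_1 = expand_to(x, [[[0], [0, 0]], [[0, 0, 0]]])
# [[[1], [2, 2]], [[3, 3, 3]]]
example_4 = expand_to(x, [[1], [2, 3]], ndim=1)
# [[[1, 2]], [[3], [3]]]
```

With the targets from Examples 2 and 3, `_expand` raises `ValueError`, mirroring the "incompatible shapes" outcome.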
DataSlice.explode(self, ndim: int | DataSlice = 1) -> DataSlice
Explodes a List DataSlice `x` a specified number of times.
A single list "explosion" converts a rank-K DataSlice of LIST[T] to a
rank-(K+1) DataSlice of T, by unpacking the items in the Lists in the original
DataSlice as a new DataSlice dimension in the result. Missing values in the
original DataSlice are treated as empty lists.
A single list explosion can also be done with `x[:]`.
If `ndim` is set to a non-negative integer, explodes recursively `ndim` times.
An `ndim` of zero is a no-op.
If `ndim` is set to a negative integer, explodes as many times as possible,
until at least one of the items of the resulting DataSlice is not a List.
Args:
x: DataSlice of Lists to explode
ndim: the number of explosion operations to perform, defaults to 1
Returns:
DataSlice
DataSlice.extract(self, schema: Any = unspecified) -> DataSlice
Creates a DataSlice with a new DataBag containing only reachable attrs.
Args:
ds: DataSlice to extract.
schema: schema of the extracted DataSlice.
Returns:
A DataSlice with a new immutable DataBag attached.
DataSlice.extract_update(self, schema: Any = unspecified) -> DataBag
Creates a new DataBag containing only reachable attrs from 'ds'.
Args:
ds: DataSlice to extract.
schema: schema of the extracted DataSlice.
Returns:
A new immutable DataBag with only the reachable attrs from 'ds'.
DataSlice.flatten(self, from_dim: int | DataSlice = 0, to_dim: Any = unspecified) -> DataSlice
Returns `x` with dimensions `[from_dim:to_dim]` flattened.
Indexing works as in python:
* If `to_dim` is unspecified, `to_dim = rank()` is used.
* If `to_dim < from_dim`, `to_dim = from_dim` is used.
* If `to_dim < 0`, `max(0, to_dim + rank())` is used. The same goes for
`from_dim`.
* If `to_dim > rank()`, `rank()` is used. The same goes for `from_dim`.
The above-mentioned adjustments place both `from_dim` and `to_dim` in the
range `[0, rank()]`. After adjustments, the new DataSlice has `rank() ==
old_rank - (to_dim - from_dim) + 1`. Note that if `from_dim == to_dim`, a
"unit" dimension is inserted at `from_dim`.
Example:
# Flatten the last two dimensions into a single dimension, producing a
# DataSlice with `rank = old_rank - 1`.
kd.get_shape(x) # -> JaggedShape(..., [2, 1], [7, 5, 3])
flat_x = kd.flatten(x, -2)
kd.get_shape(flat_x) # -> JaggedShape(..., [12, 3])
# Flatten all dimensions except the last, producing a DataSlice with
# `rank = 2`.
kd.get_shape(x) # -> JaggedShape(..., [7, 5, 3])
flat_x = kd.flatten(x, 0, -1)
kd.get_shape(flat_x) # -> JaggedShape([3], [7, 5, 3])
# Flatten all dimensions.
kd.get_shape(x) # -> JaggedShape([3], [7, 5, 3])
flat_x = kd.flatten(x)
kd.get_shape(flat_x) # -> JaggedShape([15])
Args:
x: a DataSlice.
from_dim: start of dimensions to flatten. Defaults to `0` if unspecified.
to_dim: end of dimensions to flatten. Defaults to `rank()` if unspecified.
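The index adjustments and the resulting rank can be sketched in plain Python. The helper name `normalize_flatten_dims` is hypothetical, and the order in which the rules are applied (negative indices and clamping first, then `to_dim >= from_dim`) is an assumption chosen to match Python slice semantics.

```python
def normalize_flatten_dims(rank, from_dim=0, to_dim=None):
    """Returns the adjusted (from_dim, to_dim) and the rank after flattening."""
    if to_dim is None:
        to_dim = rank  # unspecified `to_dim` means "up to the last dimension"

    def clamp(dim):
        if dim < 0:
            dim = max(0, dim + rank)  # negative indices count from the end
        return min(dim, rank)         # indices past the end are cut to `rank`

    from_dim, to_dim = clamp(from_dim), clamp(to_dim)
    to_dim = max(to_dim, from_dim)    # an empty range collapses to `from_dim`
    new_rank = rank - (to_dim - from_dim) + 1
    return from_dim, to_dim, new_rank


last_two = normalize_flatten_dims(3, -2)         # (1, 3, 2): rank drops by one
all_but_last = normalize_flatten_dims(3, 0, -1)  # (0, 2, 2)
everything = normalize_flatten_dims(2)           # (0, 2, 1)
unit_dim = normalize_flatten_dims(2, 1, 1)       # (1, 1, 3): a unit dimension is inserted
```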
DataSlice.flatten_end(self, n_times: int | DataSlice = 1) -> DataSlice
Returns `x` with a shape flattened `n_times` from the end.
The new shape has x.get_ndim() - n_times dimensions.
Given that flattening happens from the end, only non-negative integers are
allowed. For more control over flattening, please use `kd.flatten` instead.
Args:
x: a DataSlice.
n_times: number of dimensions to flatten from the end
(0 <= n_times <= rank).
DataSlice.follow(self) -> DataSlice
Returns the original DataSlice from a NoFollow DataSlice.
When a DataSlice is wrapped into a NoFollow DataSlice, its attributes
are not further traversed during extract, clone, deep_clone, etc.
The `kd.follow` operator converts the DataSlice back to a traversable DataSlice.
Inverse of `nofollow`.
Args:
x: DataSlice to unwrap, if nofollowed.
DataSlice.fork_bag(self) -> DataSlice
Returns a copy of the DataSlice with a forked mutable DataBag.
DataSlice.freeze_bag()
Returns a frozen DataSlice equivalent to `self`.
DataSlice.from_vals(x, /, schema=None)
Returns a DataSlice created from `x`.
If `schema` is set, that schema is used, otherwise the schema is inferred from
`x`.
Args:
x: a Python value or a DataSlice. If it is a (nested) Python list or tuple,
a multidimensional DataSlice is created.
schema: schema DataItem to set. If `x` is already a DataSlice, this will
cast it to the given schema.
DataSlice.get_attr(attr_name, /, default=None)
Gets attribute `attr_name` where missing items are filled from `default`.
Args:
attr_name: name of the attribute to get.
default: optional default value to fill missing items.
Note that this value can be fully omitted.
DataSlice.get_attr_names(self)
Returns a DataSlice with sorted attribute names for each item in `x`.
The result has a new dimension with the attribute names.
In case of OBJECT schema, attribute names are fetched from the `__schema__`
attribute. In case of Entity schema, the attribute names are fetched from the
schema. In case of primitives, an empty slice is returned.
Args:
x: A DataSlice.
DataSlice.get_bag()
Returns the attached DataBag.
DataSlice.get_dtype(self) -> DataSlice
Returns a primitive schema representing the underlying items' dtype.
If `ds` has a primitive schema, this returns that primitive schema, even if
all items in `ds` are missing. If `ds` has an OBJECT schema but contains
primitive values of a single dtype, it returns the schema for that primitive
dtype.
If items in `ds` have non-primitive types or mixed dtypes, returns
a missing schema (i.e. `kd.item(None, kd.SCHEMA)`).
Examples:
kd.get_primitive_schema(kd.slice([1, 2, 3])) -> kd.INT32
kd.get_primitive_schema(kd.slice([None, None, None], kd.INT32)) -> kd.INT32
kd.get_primitive_schema(kd.slice([1, 2, 3], kd.OBJECT)) -> kd.INT32
kd.get_primitive_schema(kd.slice([1, 'a', 3], kd.OBJECT)) -> missing schema
kd.get_primitive_schema(kd.obj()) -> missing schema
Args:
ds: DataSlice to get dtype from.
Returns:
a primitive schema DataSlice.
DataSlice.get_itemid(self) -> DataSlice
Casts `x` to ITEMID using explicit (permissive) casting rules.
DataSlice.get_keys()
Returns keys of all dicts in this DataSlice.
DataSlice.get_ndim(self) -> DataSlice
Returns the number of dimensions of DataSlice `x`.
DataSlice.get_obj_schema(self) -> DataSlice
Returns a DataSlice of schemas for Objects and primitives in `x`.
DataSlice `x` must have OBJECT schema.
Examples:
db = kd.bag()
s = db.new_schema(a=kd.INT32)
obj = s(a=1).embed_schema()
kd.get_obj_schema(kd.slice([1, None, 2.0, obj]))
-> kd.slice([kd.INT32, NONE, kd.FLOAT32, s])
Args:
x: OBJECT DataSlice
Returns:
A DataSlice of schemas.
DataSlice.get_present_count(self) -> DataSlice
Returns the count of present items over all dimensions.
The result is a zero-dimensional DataItem.
Args:
x: A DataSlice of numbers.
DataSlice.get_schema()
Returns a schema DataItem with type information about this DataSlice.
DataSlice.get_shape()
Returns the shape of the DataSlice.
DataSlice.get_size(self) -> DataSlice
Returns the number of items in `x`, including missing items.
Args:
x: A DataSlice.
Returns:
The size of `x`.
DataSlice.get_sizes(self) -> DataSlice
Returns a DataSlice of sizes of the DataSlice's shape.
DataSlice.get_values(self, key_ds: Any = unspecified) -> DataSlice
Returns values corresponding to `key_ds` for dicts in `dict_ds`.
When `key_ds` is specified, it is equivalent to dict_ds[key_ds].
When `key_ds` is unspecified, it returns all values in `dict_ds`. The result
DataSlice has one more dimension used to represent values in each dict than
`dict_ds`. While the order of values within a dict is arbitrary, it is the
same as get_keys().
Args:
dict_ds: DataSlice of Dicts.
key_ds: DataSlice of keys or unspecified.
Returns:
A DataSlice of values.
DataSlice.has_attr(self, attr_name: str) -> DataSlice
Indicates whether the items in `x` DataSlice have the given attribute.
This function checks for attributes based on data rather than "schema" and may
be slow in some cases.
Args:
x: DataSlice
attr_name: Name of the attribute to check.
Returns:
A MASK DataSlice with the same shape as `x` that contains present if the
attribute exists for the corresponding item.
DataSlice.has_bag()
Returns `present` if DataSlice `ds` has a DataBag attached.
DataSlice.implode(self, ndim: int | DataSlice = 1, itemid: Any = unspecified) -> DataSlice
Implodes a DataSlice `x` a specified number of times.
A single list "implosion" converts a rank-(K+1) DataSlice of T to a rank-K
DataSlice of LIST[T], by folding the items in the last dimension of the
original DataSlice into newly-created Lists.
If `ndim` is set to a non-negative integer, implodes recursively `ndim` times.
If `ndim` is set to a negative integer, implodes as many times as possible,
until the result is a DataItem (i.e. a rank-0 DataSlice) containing a single
nested List.
Args:
x: the DataSlice to implode
ndim: the number of implosion operations to perform
itemid: optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
DataSlice of nested Lists
DataSlice.internal_as_arolla_value()
Converts primitive DataSlice / DataItem into an equivalent Arolla value.
DataSlice.internal_as_dense_array()
Converts primitive DataSlice to an Arolla DenseArray with appropriate qtype.
DataSlice.internal_as_py()
Returns a Python object equivalent to this DataSlice.
If the values in this DataSlice represent objects, then the returned python
structure will contain DataItems.
DataSlice.internal_is_itemid_schema()
Returns present iff this DataSlice is ITEMID Schema.
DataSlice.is_dict()
Returns present iff this DataSlice has Dict schema or contains only dicts.
DataSlice.is_dict_schema()
Returns present iff this DataSlice is a Dict Schema.
DataSlice.is_empty()
Returns present iff this DataSlice is empty.
DataSlice.is_entity()
Returns present iff this DataSlice has Entity schema or contains only entities.
DataSlice.is_entity_schema()
Returns present iff this DataSlice represents an Entity Schema.
DataSlice.is_list()
Returns present iff this DataSlice has List schema or contains only lists.
DataSlice.is_list_schema()
Returns present iff this DataSlice is a List Schema.
DataSlice.is_mutable()
Returns present iff the attached DataBag is mutable.
DataSlice.is_primitive(self) -> DataSlice
Returns whether x is a primitive DataSlice.
`x` is a primitive DataSlice if it meets one of the following conditions:
1) it has a primitive schema
2) it has OBJECT/SCHEMA/NONE schema and only has primitives
Also see `kd.has_primitive` for a pointwise version. But note that
`kd.all(kd.has_primitive(x))` is not always equivalent to
`kd.is_primitive(x)`. For example,
kd.is_primitive(kd.int32(None)) -> kd.present
kd.all(kd.has_primitive(kd.int32(None))) -> invalid for kd.all
kd.is_primitive(kd.int32([None])) -> kd.present
kd.all(kd.has_primitive(kd.int32([None]))) -> kd.missing
Args:
x: DataSlice to check.
Returns:
A MASK DataItem.
DataSlice.is_primitive_schema()
Returns present iff this DataSlice is a primitive (scalar) Schema.
DataSlice.is_struct_schema()
Returns present iff this DataSlice represents a Struct Schema.
DataSlice.list_size(self) -> DataSlice
Returns size of a List.
DataSlice.maybe(self, attr_name: str) -> DataSlice
A shortcut for kd.get_attr(x, attr_name, default=None).
DataSlice.new(self, **attrs)
Returns a new Entity with this Schema.
DataSlice.no_bag()
Returns a copy of DataSlice without DataBag.
DataSlice.pop(index, /)
Pop a value from each list in this DataSlice.
DataSlice.ref(self) -> DataSlice
Returns `ds` with the DataBag removed.
Unlike `no_bag`, `ds` is required to hold ItemIds and no primitives are
allowed.
The result DataSlice still has the original schema. If the schema is an Entity
schema (including List/Dict schema), it is treated as an ItemId after the DataBag
is removed.
Args:
ds: DataSlice of ItemIds.
DataSlice.repeat(self, sizes: Any) -> DataSlice
Returns `x` with values repeated according to `sizes`.
The resulting DataSlice has `rank = rank + 1`. The input `sizes` are
broadcasted to `x`, and each value is repeated the given number of times.
Example:
ds = kd.slice([[1, None], [3]])
sizes = kd.slice([[1, 2], [3]])
kd.repeat(ds, sizes) # -> kd.slice([[[1], [None, None]], [[3, 3, 3]]])
ds = kd.slice([[1, None], [3]])
sizes = kd.slice([2, 3])
kd.repeat(ds, sizes) # -> kd.slice([[[1, 1], [None, None]], [[3, 3, 3]]])
ds = kd.slice([[1, None], [3]])
size = kd.item(2)
kd.repeat(ds, size) # -> kd.slice([[[1, 1], [None, None]], [[3, 3]]])
Args:
x: A DataSlice of data.
sizes: A DataSlice of sizes that each value in `x` should be repeated for.
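The broadcasting of `sizes` can be modeled on nested Python lists. This is a sketch of the documented behavior only: lists stand in for DataSlice dimensions, a plain `int` stands in for a DataItem, and the function is not the Koda operator.

```python
def repeat(x, sizes):
    """Repeats every item of nested-list `x`, adding a new innermost dimension."""
    if isinstance(x, list):
        if isinstance(sizes, list):
            if len(x) != len(sizes):
                raise ValueError("sizes cannot be broadcast to x")
            return [repeat(xi, si) for xi, si in zip(x, sizes)]
        # `sizes` has fewer dimensions than `x`: reuse it for every child.
        return [repeat(xi, sizes) for xi in x]
    if isinstance(sizes, list):
        raise ValueError("sizes has more dimensions than x")
    return [x] * sizes


ds = [[1, None], [3]]
per_item = repeat(ds, [[1, 2], [3]])  # [[[1], [None, None]], [[3, 3, 3]]]
per_row = repeat(ds, [2, 3])          # [[[1, 1], [None, None]], [[3, 3, 3]]]
uniform = repeat(ds, 2)               # [[[1, 1], [None, None]], [[3, 3]]]
```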
DataSlice.reshape(self, shape: JaggedShape) -> DataSlice
Returns a DataSlice with the provided shape.
Examples:
x = kd.slice([1, 2, 3, 4])
# Using a shape.
kd.reshape(x, kd.shapes.new(2, 2)) # -> kd.slice([[1, 2], [3, 4]])
# Using a tuple of sizes.
kd.reshape(x, kd.tuple(2, 2)) # -> kd.slice([[1, 2], [3, 4]])
# Using a tuple of sizes and a placeholder dimension.
kd.reshape(x, kd.tuple(-1, 2)) # -> kd.slice([[1, 2], [3, 4]])
# Using a tuple of slices and a placeholder dimension.
kd.reshape(x, kd.tuple(-1, kd.slice([3, 1])))
# -> kd.slice([[1, 2, 3], [4]])
# Reshaping a scalar.
kd.reshape(1, kd.tuple(1, 1)) # -> kd.slice([[1]])
# Reshaping an empty slice.
kd.reshape(kd.slice([]), kd.tuple(2, 0)) # -> kd.slice([[], []])
Args:
x: a DataSlice.
shape: a JaggedShape or a tuple of dimensions that forms a shape through
`kd.shapes.new`, with additional support for a `-1` placeholder dimension.
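How a tuple of dimensions groups a flat sequence can be sketched in plain Python. The sketch makes several simplifying assumptions: a Python tuple stands in for `kd.tuple`, a list of ints stands in for a slice of per-group sizes, the `-1` placeholder is supported only in the first position, and the scalar and empty-slice cases are not handled.

```python
def reshape(flat, sizes):
    """Groups `flat` from the innermost dimension outwards according to `sizes`."""
    groups = list(flat)
    for dim in reversed(sizes[1:]):
        if isinstance(dim, int):
            # A uniform size: every group in this dimension has `dim` children.
            if dim <= 0 or len(groups) % dim:
                raise ValueError("incompatible shape")
            dim = [dim] * (len(groups) // dim)
        if sum(dim) != len(groups):
            raise ValueError("incompatible shape")
        grouped, pos = [], 0
        for n in dim:
            grouped.append(groups[pos:pos + n])
            pos += n
        groups = grouped
    # The first dimension is either the placeholder or the number of groups left.
    if sizes[0] != -1 and sizes[0] != len(groups):
        raise ValueError("incompatible shape")
    return groups


x = [1, 2, 3, 4]
square = reshape(x, (2, 2))        # [[1, 2], [3, 4]]
inferred = reshape(x, (-1, 2))     # [[1, 2], [3, 4]]
jagged = reshape(x, (-1, [3, 1]))  # [[1, 2, 3], [4]]
```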
DataSlice.reshape_as(self, shape_from: DataSlice) -> DataSlice
Returns a DataSlice x reshaped to the shape of DataSlice shape_from.
DataSlice.select(self, fltr: Any, expand_filter: bool | DataSlice = True) -> DataSlice
Creates a new DataSlice by filtering out missing items in fltr.
It is not supported for DataItems because their sizes are always 1.
The dimensions of `fltr` need to be compatible with the dimensions of `ds`.
By default, `fltr` is expanded to `ds` and items in `ds` corresponding to
missing items in `fltr` are removed. The last dimension of the resulting
DataSlice is changed while the first N-1 dimensions are the same as those in
`ds`.
Example:
val = kd.slice([[1, None, 4], [None], [2, 8]])
kd.select(val, val > 3) -> [[4], [], [8]]
fltr = kd.slice(
[[None, kd.present, kd.present], [kd.present], [kd.present, None]])
kd.select(val, fltr) -> [[None, 4], [None], [2]]
fltr = kd.slice([kd.present, kd.present, None])
kd.select(val, fltr) -> [[1, None, 4], [None], []]
kd.select(val, fltr, expand_filter=False) -> [[1, None, 4], [None]]
Args:
ds: DataSlice with ndim > 0 to be filtered.
fltr: filter DataSlice with dtype as kd.MASK. It can also be a Koda Functor
or a Python function which can be evaluated to such DataSlice. A Python
function will be traced for evaluation, so it cannot have Python control
flow operations such as `if` or `while`.
expand_filter: flag indicating if the 'filter' should be expanded to 'ds'
Returns:
Filtered DataSlice.
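The effect of `expand_filter` can be modeled on nested Python lists. In this sketch (not the Koda implementation) `True` stands in for `kd.present`, `None` for a missing mask item, and the helper names are hypothetical.

```python
PRESENT = True  # stand-in for kd.present; None stands in for a missing item


def _expand_mask(mask, ds):
    """Broadcasts a lower-dimensional mask to the nested shape of `ds`."""
    if isinstance(ds, list):
        if isinstance(mask, list):
            if len(mask) != len(ds):
                raise ValueError("incompatible shapes")
            return [_expand_mask(m, d) for m, d in zip(mask, ds)]
        return [_expand_mask(mask, d) for d in ds]
    return mask


def _filter(ds, mask):
    """Drops items of `ds` at the dimension where the mask ends."""
    if all(not isinstance(m, list) for m in mask):
        return [d for d, m in zip(ds, mask) if m is not None]
    return [_filter(d, m) for d, m in zip(ds, mask)]


def select(ds, fltr, expand_filter=True):
    if expand_filter:
        fltr = _expand_mask(fltr, ds)
    return _filter(ds, fltr)


val = [[1, None, 4], [None], [2, 8]]
greater_than_3 = [[PRESENT if v is not None and v > 3 else None for v in row]
                  for row in val]
row_mask = [PRESENT, PRESENT, None]

by_value = select(val, greater_than_3)                    # [[4], [], [8]]
rows_emptied = select(val, row_mask)                      # [[1, None, 4], [None], []]
rows_dropped = select(val, row_mask, expand_filter=False) # [[1, None, 4], [None]]
```

With expansion, the third row is kept but emptied; without it, the row itself is removed.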
DataSlice.select_items(self, fltr: Any) -> DataSlice
Selects List items by filtering out missing items in fltr.
Also see kd.select.
Args:
ds: List DataSlice to be filtered
fltr: filter can be a DataSlice with dtype as kd.MASK. It can also be a Koda
Functor or a Python function which can be evaluated to such DataSlice. A
Python function will be traced for evaluation, so it cannot have Python
control flow operations such as `if` or `while`.
Returns:
Filtered DataSlice.
DataSlice.select_keys(self, fltr: Any) -> DataSlice
Selects Dict keys by filtering out missing items in `fltr`.
Also see kd.select.
Args:
ds: Dict DataSlice to be filtered
fltr: filter DataSlice with dtype as kd.MASK or a Koda Functor or a Python
function which can be evaluated to such DataSlice. A Python function will
be traced for evaluation, so it cannot have Python control flow operations
such as `if` or `while`.
Returns:
Filtered DataSlice.
DataSlice.select_present(self) -> DataSlice
Creates a new DataSlice by removing missing items.
It is not supported for DataItems because their sizes are always 1.
Example:
val = kd.slice([[1, None, 4], [None], [2, 8]])
kd.select_present(val) -> [[1, 4], [], [2, 8]]
Args:
ds: DataSlice with ndim > 0 to be filtered.
Returns:
Filtered DataSlice.
DataSlice.select_values(self, fltr: Any) -> DataSlice
Selects Dict values by filtering out missing items in `fltr`.
Also see kd.select.
Args:
ds: Dict DataSlice to be filtered
fltr: filter DataSlice with dtype as kd.MASK or a Koda Functor or a Python
function which can be evaluated to such DataSlice. A Python function will
be traced for evaluation, so it cannot have Python control flow operations
such as `if` or `while`.
Returns:
Filtered DataSlice.
DataSlice.set_attr(attr_name, value, /, overwrite_schema=False)
Sets an attribute `attr_name` to `value`.
Requires DataSlice to have a mutable DataBag attached. Compared to
`__setattr__`, it allows overwriting the schema for attribute `attr_name` when
`overwrite_schema` is True. Additionally, it allows `attr_name` to be a
non-Python-identifier (e.g. "123-f", "5", "%#$", etc.). `attr_name` still has to
be a valid UTF-8 unicode.
Args:
attr_name: UTF-8 unicode representing the attribute name.
value: new value for attribute `attr_name`.
overwrite_schema: if True, schema for attribute is always updated.
DataSlice.set_attrs(*, overwrite_schema=False, **attrs)
Sets multiple attributes on an object / entity.
Args:
overwrite_schema: (bool) overwrite schema if attribute schema is missing or
incompatible.
**attrs: attribute values that are converted to DataSlices with DataBag
adoption.
DataSlice.set_schema(schema, /)
Returns a copy of DataSlice with the provided `schema`.
If `schema` has a different DataBag than the DataSlice, `schema` is merged into
the DataBag of the DataSlice. See kd.set_schema for more details.
Args:
schema: schema DataSlice to set.
Returns:
DataSlice with the provided `schema`.
DataSlice.shallow_clone(self, *, itemid: Any = unspecified, schema: Any = unspecified, **overrides: Any) -> DataSlice
Creates a DataSlice with shallow clones of immediate attributes.
The entities themselves get new ItemIds and their top-level attributes are
copied by reference.
Also see kd.clone and kd.deep_clone.
Note that unlike kd.deep_clone, if there are multiple references to the same
entity, the returned DataSlice will have multiple clones of it rather than
references to the same clone.
Args:
x: The DataSlice to copy.
itemid: The ItemId to assign to cloned entities. If not specified, will
allocate new ItemIds.
schema: The schema to resolve attributes, and also to assign the schema to
the resulting DataSlice. If not specified, will use the schema of 'x'.
**overrides: attribute overrides.
Returns:
A copy of the entities with new ItemIds where all top-level attributes are
copied by reference.
DataSlice.strict_with_attrs(self, **attrs) -> DataSlice
Returns a DataSlice with a new DataBag containing updated attrs in `x`.
Strict version of kd.attrs disallowing adding new attributes.
Args:
x: Entity for which the attributes update is being created.
**attrs: attrs to set in the update.
DataSlice.stub(self, attrs: DataSlice = []) -> DataSlice
Copies a DataSlice's schema stub to a new DataBag.
The "schema stub" of a DataSlice is a subset of its schema (including embedded
schemas) that contains just enough information to support direct updates to
that DataSlice.
Optionally copies `attrs` schema attributes to the new DataBag as well.
This method works for items, objects, and for lists and dicts stored as items
or objects. The intended usage is to add new attributes to the object in the
new bag, or new items to the dict in the new bag, and then to be able
to merge the bags to obtain a union of attributes/values. For lists, we
extract the list with stubs for list items, which also works recursively so
nested lists are deep-extracted. Note that if you modify the list afterwards
by appending or removing items, you will no longer be able to merge the result
with the original bag.
Args:
x: DataSlice to extract the schema stub from.
attrs: Optional list of additional schema attribute names to copy. The
schemas for those attributes will be copied recursively (so including
attributes of those attributes etc).
Returns:
DataSlice with the same schema stub in the new DataBag.
DataSlice.take(self, indices: Any) -> DataSlice
Returns a new DataSlice with items at provided indices.
`indices` must have INT32 or INT64 dtype or OBJECT schema holding INT32 or
INT64 items.
Indices in the DataSlice `indices` are based on the last dimension of the
DataSlice `x`. Negative indices are supported and out-of-bound indices result
in missing items.
If ndim(x) - 1 > ndim(indices), indices are broadcasted to shape(x)[:-1].
If ndim(x) <= ndim(indices), indices are unchanged but shape(x)[:-1] must be
broadcastable to shape(indices).
Example:
x = kd.slice([[1, None, 2], [3, 4]])
kd.take(x, kd.item(1)) # -> kd.slice([None, 4])
kd.take(x, kd.slice([0, 1])) # -> kd.slice([1, 4])
kd.take(x, kd.slice([[0, 1], [1]])) # -> kd.slice([[1, None], [4]])
kd.take(x, kd.slice([[[0, 1], []], [[1], [0]]]))
# -> kd.slice([[[1, None], []], [[4], [3]]])
kd.take(x, kd.slice([3, -3])) # -> kd.slice([None, None])
kd.take(x, kd.slice([-1, -2])) # -> kd.slice([2, 3])
kd.take(x, kd.slice('1')) # -> dtype mismatch error
kd.take(x, kd.slice([1, 2, 3])) # -> incompatible shape
Args:
x: DataSlice to be indexed
indices: indices used to select items
Returns:
A new DataSlice with items selected by indices.
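The indexing rules (indices apply to the last dimension, negative indices count from the end, out-of-bound indices give missing items) can be sketched on nested Python lists. This is a model of the documented behavior, not the Koda operator; lists stand in for dimensions and `None` for missing items.

```python
def _pick(row, idx):
    """Looks up `idx` (an int or nested list of ints) in one last-dimension row."""
    if isinstance(idx, list):
        return [_pick(row, i) for i in idx]
    if idx is None:
        return None
    if idx < 0:
        idx += len(row)  # negative indices count from the end of the row
    return row[idx] if 0 <= idx < len(row) else None


def take(x, indices):
    if all(not isinstance(v, list) for v in x):
        return _pick(x, indices)  # `x` is a row of the last dimension
    if isinstance(indices, list):
        if len(indices) != len(x):
            raise ValueError("incompatible shape")
        return [take(xi, ii) for xi, ii in zip(x, indices)]
    # `indices` has fewer dimensions: reuse it for every row.
    return [take(xi, indices) for xi in x]


x = [[1, None, 2], [3, 4]]
one_per_row = take(x, [0, 1])             # [1, 4]
several_per_row = take(x, [[0, 1], [1]])  # [[1, None], [4]]
out_of_bounds = take(x, [3, -3])          # [None, None]
from_the_end = take(x, [-1, -2])          # [2, 3]
```

Passing `[1, 2, 3]` raises `ValueError` in this sketch because three index groups cannot be matched with two rows.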
DataSlice.to_py(ds: DataSlice, *, max_depth: int = 2, obj_as_dict: bool = False, include_missing_attrs: bool = True, output_class: Any | None = None) -> Any
Returns a readable python object from a DataSlice.
Attributes, lists, and dicts are recursively converted to Python objects.
Args:
ds: A DataSlice
max_depth: Maximum depth for recursive printing. Each attribute, list, and
dict increments the depth by 1. Use -1 for unlimited depth. If
output_class is set, this is ignored and the depth is determined by the
output_class.
obj_as_dict: Whether to convert objects to python dicts. By default objects
are converted to automatically constructed 'Obj' dataclass instances.
include_missing_attrs: whether to include attributes with None value in
objects.
output_class: If not None, will be used recursively as the output type.
DataSlice.to_pytree(ds: DataSlice, max_depth: int = 2, include_missing_attrs: bool = True) -> Any
Returns a readable python object from a DataSlice.
Attributes, lists, and dicts are recursively converted to Python objects.
Objects are converted to Python dicts.
Same as kd.to_py(..., obj_as_dict=True)
Args:
ds: A DataSlice
max_depth: Maximum depth for recursive printing. Each attribute, list, and
dict increments the depth by 1. Use -1 for unlimited depth.
include_missing_attrs: whether to include attributes with None value in
objects.
DataSlice.updated(self, *bag: DataBag) -> DataSlice
Returns a copy of a DataSlice with DataBag(s) of updates applied.
Values in `*bag` take precedence over the ones in the original DataBag of
`ds`. The original DataBag (if present) must be immutable. If the original
DataBag is mutable, either freeze `ds` first, or add updates in place using
the mutable API.
The DataBag attached to the result is a new immutable DataBag in which the
values from `*bag` take precedence, falling back to the DataBag of `ds` if
present.
`updated(x, a, b)` is equivalent to `updated(updated(x, b), a)`, and so on
for additional DataBag args.
Args:
ds: DataSlice.
*bag: DataBag(s) of updates.
Returns:
DataSlice with the updates applied.
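The precedence difference between `enriched` and `updated` can be illustrated with Python's `collections.ChainMap`, which looks keys up in its first mapping and falls back to later ones. Plain dicts stand in for DataBags here; this is an analogy for the lookup order with a single extra bag, not a description of how DataBags are implemented.

```python
from collections import ChainMap

original = {"a": 1, "b": 2}  # stand-in for the DataBag of `ds`
extra = {"b": 20, "c": 30}   # stand-in for the additional DataBag

# enriched: the original wins on conflicts, `extra` only fills the gaps.
enriched_view = dict(ChainMap(original, extra))  # {'a': 1, 'b': 2, 'c': 30}

# updated: `extra` wins on conflicts, the original fills the gaps.
updated_view = dict(ChainMap(extra, original))   # {'a': 1, 'b': 20, 'c': 30}
```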
DataSlice.with_attr(self, attr_name: str | DataSlice, value: Any, overwrite_schema: bool | DataSlice = False) -> DataSlice
Returns a DataSlice with a new DataBag containing a single updated attribute.
This operator is useful if attr_name cannot be used as a key in keyword
arguments. E.g.: "123-f", "5", "%#$", etc. It still has to be a valid utf-8
unicode.
See kd.with_attrs docstring for more details on the rules and regarding
the `overwrite_schema` argument.
Args:
x: Entity / Object for which the attribute update is being created.
attr_name: utf-8 unicode representing the attribute name.
value: new value for attribute `attr_name`.
overwrite_schema: if True, schema for attribute is always updated.
DataSlice.with_attrs(self, *, overwrite_schema: bool | DataSlice = False, **attrs) -> DataSlice
Returns a DataSlice with a new DataBag containing updated attrs in `x`.
This is a shorter version of `x.updated(kd.attrs(x, ...))`.
Example:
x = x.with_attrs(foo=..., bar=...)
# Or equivalent:
# x = kd.with_attrs(x, foo=..., bar=...)
In case some attribute "foo" already exists and the update contains "foo",
either:
1) the schema of "foo" in the update must be implicitly castable to
`x.foo.get_schema()`; or
2) `x` is an OBJECT, in which case schema for "foo" will be overwritten.
An exception to (2) is when `x` was an Entity that was cast to an OBJECT using
kd.obj; in that case the update for "foo" must also be castable to
`x.foo.get_schema()`. If this is not the case, an Error is raised.
This behavior can be overridden by passing `overwrite_schema=True`, which will
cause the schema for attributes to always be updated.
Args:
x: Entity / Object for which the attributes update is being created.
overwrite_schema: if True, schema for attributes is always updated.
**attrs: attrs to set in the update.
DataSlice.with_bag(bag, /)
Returns a copy of DataSlice with DataBag `bag`.
DataSlice.with_dict_update(self, keys: Any, values: Any = unspecified) -> DataSlice
Returns a DataSlice with a new DataBag containing updated dicts.
This operator has two forms:
kd.with_dict_update(x, keys, values) where keys and values are slices
kd.with_dict_update(x, dict_updates) where dict_updates is a DataSlice of
dicts
If both keys and values are specified, they must both be broadcastable to the
shape of `x`. If only keys is specified (as dict_updates), it must be
broadcastable to 'x'.
Args:
x: DataSlice of dicts to update.
keys: A DataSlice of keys, or a DataSlice of dicts of updates.
values: A DataSlice of values, or unspecified if `keys` contains dicts.
DataSlice.with_list_append_update(self, append: Any) -> DataSlice
Returns a DataSlice with a new DataBag containing the updated lists.
The updated lists are the lists in `x` with the specified items appended at
the end.
`x` and `append` must have compatible shapes.
The resulting lists maintain the same ItemIds. Also see kd.appended_list()
which works similarly but resulting lists have new ItemIds.
Args:
x: DataSlice of lists.
append: DataSlice of values to append to each list in `x`.
Returns:
A DataSlice of lists in a new immutable DataBag.
DataSlice.with_merged_bag(self) -> DataSlice
Returns a DataSlice with the DataBag of `ds` merged with its fallbacks.
Note that a DataBag can have multiple fallback DataBags and fallback DataBags can
have fallbacks as well. This operator merges all of them into a new immutable
DataBag.
If `ds` has no attached DataBag, it raises an exception. If the DataBag of
`ds` does not have fallback DataBags, it is equivalent to `ds.freeze_bag()`.
Args:
ds: DataSlice to merge fallback DataBags of.
Returns:
A new DataSlice with an immutable DataBag.
DataSlice.with_name(obj: Any, name: str | Text) -> Any
Alias for kd.types.DataBag.with_name
DataSlice.with_schema(schema, /)
Returns a copy of DataSlice with the provided `schema`.
`schema` must have no DataBag or the same DataBag as the DataSlice. If `schema`
has a different DataBag, use `set_schema` instead. See kd.with_schema for more
details.
Args:
schema: schema DataSlice to set.
Returns:
DataSlice with the provided `schema`.
DataSlice.with_schema_from_obj(self) -> DataSlice
Returns `x` with its embedded common schema set as the schema.
* `x` must have OBJECT schema.
* All items in `x` must have a common schema.
* If `x` is empty, the schema is set to NONE.
* If `x` contains mixed primitives without a common primitive type, the output
will have OBJECT schema.
Args:
x: An OBJECT DataSlice.