Base class of all Arolla values in Python.
QValue is immutable. It provides only basic functionality.
Subclasses of this class might have further specialization.
DataBag.adopt(slice, /)
Adopts all data reachable from the given slice into this DataBag.
Args:
slice: DataSlice to adopt data from.
Returns:
The DataSlice with this DataBag (including adopted data) attached.
DataBag.adopt_stub(slice, /)
Copies the given DataSlice's schema stub into this DataBag.
The "schema stub" of a DataSlice is a subset of its schema (including embedded
schemas) that contains just enough information to support direct updates to
that DataSlice. See kd.stub() for more details.
Args:
slice: DataSlice to extract the schema stub from.
Returns:
The "stub" with this DataBag attached.
DataBag.concat_lists(self: DataBag, /, *lists: _DataSlice) -> _DataSlice
Returns a DataSlice of Lists concatenated from the List items of `lists`.
Each input DataSlice must contain only present List items, and the item
schemas of each input must be compatible. Input DataSlices are aligned (see
`kd.align`) automatically before concatenation.
If `lists` is empty, this returns a single empty list.
This DataBag (`self`) is used to create the new concatenated lists, and is the
DataBag used by the result DataSlice.
Args:
*lists: the DataSlices of Lists to concatenate
Returns:
DataSlice of concatenated Lists
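The position-by-position concatenation described above can be pictured with a plain-Python model. This is an illustrative sketch only, not Koda code: a DataSlice of Lists is modelled as a Python list of lists, the inputs are assumed to already have the same length (no alignment or broadcasting is performed), and `concat_lists_model` is a hypothetical helper name.

```python
from itertools import chain

def concat_lists_model(*lists):
    """Plain-Python model: concatenate corresponding Lists position by position."""
    if not lists:
        # Mirrors "If `lists` is empty, this returns a single empty list."
        return []
    length = len(lists[0])
    if any(len(s) != length for s in lists):
        raise ValueError("this model assumes inputs are already aligned")
    # zip(*lists) groups the Lists found at the same position in every input.
    return [list(chain.from_iterable(group)) for group in zip(*lists)]

a = [[1, 2], [3]]      # models a DataSlice holding two Lists
b = [[4], [5, 6]]      # second input with the same shape
result = concat_lists_model(a, b)
print(result)  # [[1, 2, 4], [3, 5, 6]]
```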
DataBag.contents_repr(self: DataBag, /, *, triple_limit: int = 1000) -> ContentsReprWrapper
Returns a representation of the DataBag contents.
DataBag.data_triples_repr(self: DataBag, *, triple_limit: int = 1000) -> ContentsReprWrapper
Returns a representation of the DataBag contents, omitting schema triples.
DataBag.dict(self: DataBag, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates a Koda dict.
Acceptable arguments are:
1) no argument: a single empty dict
2) a Python dict whose keys are either primitives or DataItems and values
are primitives, DataItems, Python list/dict which can be converted to a
List/Dict DataItem, or a DataSlice which can be folded into a List DataItem:
a single dict
3) two DataSlices/DataItems as keys and values: a DataSlice of dicts whose
shape is the last N-1 dimensions of keys/values DataSlice
Examples:
dict() -> returns a single new dict
dict({1: 2, 3: 4}) -> returns a single new dict
dict({1: [1, 2]}) -> returns a single dict, mapping 1->List[1, 2]
dict({1: kd.slice([1, 2])}) -> returns a single dict, mapping 1->List[1, 2]
dict({db.uuobj(x=1, y=2): 3}) -> returns a single dict, mapping uuid->3
dict(kd.slice([1, 2]), kd.slice([3, 4])) -> returns a dict, mapping 1->3 and 2->4
dict(kd.slice([[1], [2]]), kd.slice([3, 4])) -> returns two dicts, one mapping 1->3 and another mapping 2->4
dict('key', 12) -> returns a single dict mapping 'key'->12
Args:
items_or_keys: a Python dict in case of items and a DataSlice in case of
keys.
values: a DataSlice. If provided, `items_or_keys` must be a DataSlice as
keys.
key_schema: the schema of the dict keys. If not specified, it will be
deduced from keys or defaulted to OBJECT.
value_schema: the schema of the dict values. If not specified, it will be
deduced from values or defaulted to OBJECT.
schema: The schema to use for the newly created Dict. If specified, then
key_schema and value_schema must not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.
Returns:
A DataSlice with the dict.
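The accepted call shapes listed above can be modelled with ordinary Python containers. The sketch below is not Koda code and makes simplifying assumptions: a DataSlice is stood in for by a (possibly nested) Python list, only rank-1 and rank-2 keys are handled, and for rank-2 keys the values are assumed to be rank-1 (one value per row). `dict_model` is a hypothetical helper name.

```python
def dict_model(items_or_keys=None, values=None):
    """Plain-Python model of the documented call shapes (not Koda code)."""
    if items_or_keys is None:
        return {}                          # 1) no argument: a single empty dict
    if isinstance(items_or_keys, dict):
        return dict(items_or_keys)         # 2) a Python dict: a single dict
    keys = items_or_keys
    if not isinstance(keys, list):
        return {keys: values}              # scalar key and value: a single dict
    if keys and isinstance(keys[0], list):
        # Rank-2 keys: one dict per row; each row takes its own value.
        return [{k: v for k in row} for row, v in zip(keys, values)]
    # Rank-1 keys and values: a single dict pairing them up.
    return dict(zip(keys, values))

print(dict_model([1, 2], [3, 4]))        # {1: 3, 2: 4}
print(dict_model([[1], [2]], [3, 4]))    # [{1: 3}, {2: 4}]
print(dict_model('key', 12))             # {'key': 12}
```

In each case the result has one dimension fewer than the keys, which matches the "last N-1 dimensions" rule stated in case 3.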
DataBag.dict_like(self: DataBag, shape_and_mask_from: _DataSlice, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates new Koda dicts with shape and sparsity of `shape_and_mask_from`.
If items_or_keys and values are not provided, creates empty dicts. Otherwise,
the function assigns the given keys and values to the newly created dicts. So
the keys and values must be either broadcastable to shape_and_mask_from
shape, or one dimension higher.
Args:
self: the DataBag.
shape_and_mask_from: a DataSlice with the shape and sparsity for the desired
dicts.
items_or_keys: either a Python dict (if `values` is None) or a DataSlice
with keys. The Python dict case is supported only for scalar
shape_and_mask_from.
values: a DataSlice of values, when `items_or_keys` represents keys.
key_schema: the schema of the dict keys. If not specified, it will be
deduced from keys or defaulted to OBJECT.
value_schema: the schema of the dict values. If not specified, it will be
deduced from values or defaulted to OBJECT.
schema: The schema to use for the newly created Dict. If specified, then
key_schema and value_schema must not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.
Returns:
A DataSlice with the dicts.
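"Shape and sparsity" here means that a dict is created at every present position of `shape_and_mask_from` and nothing is created at missing positions. A minimal plain-Python picture of the empty-dict case follows. It is a model, not Koda code: nested Python lists stand for a jagged DataSlice, `None` stands for a missing item, and `empty_dicts_like` is a hypothetical helper name.

```python
def empty_dicts_like(shape_and_mask_from):
    """Model: one new empty dict per present item, None where the item is missing."""
    if isinstance(shape_and_mask_from, list):
        # Recurse through dimensions so the output keeps the input's jagged shape.
        return [empty_dicts_like(x) for x in shape_and_mask_from]
    return None if shape_and_mask_from is None else {}

template = [[1, None], [3]]   # two rows; the second item of row 0 is missing
dicts = empty_dicts_like(template)
print(dicts)  # [[{}, None], [{}]]
```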
DataBag.dict_schema(key_schema, value_schema)
Returns a dict schema from the schemas of the keys and values
DataBag.dict_shaped(self: DataBag, shape: _jagged_shape.JaggedShape, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates new Koda dicts with the given shape.
If items_or_keys and values are not provided, creates empty dicts. Otherwise,
the function assigns the given keys and values to the newly created dicts. So
the keys and values must be either broadcastable to `shape` or one dimension
higher.
Args:
self: the DataBag.
shape: the desired shape.
items_or_keys: either a Python dict (if `values` is None) or a DataSlice
with keys. The Python dict case is supported only for scalar shape.
values: a DataSlice of values, when `items_or_keys` represents keys.
key_schema: the schema of the dict keys. If not specified, it will be
deduced from keys or defaulted to OBJECT.
value_schema: the schema of the dict values. If not specified, it will be
deduced from values or defaulted to OBJECT.
schema: The schema to use for the newly created Dict. If specified, then
key_schema and value_schema must not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.
Returns:
A DataSlice with the dicts.
DataBag.empty()
Returns an empty immutable DataBag.
DataBag.empty_mutable()
Returns an empty mutable DataBag. Only works in eager mode.
DataBag.fork(mutable=True)
Returns a newly created DataBag with the same content as self.
Changes to either DataBag will not be reflected in the other.
Args:
mutable: If true (default), returns a mutable DataBag. If false, the DataBag
will be immutable.
Returns:
data_bag.DataBag
DataBag.freeze(self: DataBag) -> DataBag
Returns a frozen DataBag equivalent to `self`.
DataBag.get_approx_byte_size()
Returns approximate size of the DataBag in bytes.
DataBag.get_approx_size()
Returns approximate size of the DataBag in triples.
DataBag.implode(self: DataBag, x: _DataSlice, /, ndim: int | _DataSlice = 1, itemid: _DataSlice | None = None) -> _DataSlice
Implodes a DataSlice `x` a specified number of times.
A single list "implosion" converts a rank-(K+1) DataSlice of T to a rank-K
DataSlice of LIST[T], by folding the items in the last dimension of the
original DataSlice into newly-created Lists.
If `ndim` is set to a non-negative integer, implodes recursively `ndim` times.
If `ndim` is set to a negative integer, implodes as many times as possible,
until the result is a DataItem (i.e. a rank-0 DataSlice) containing a single
nested List.
This DataBag (`self`) is used to create any new Lists, and is the DataBag of the
result DataSlice.
Args:
x: the DataSlice to implode
ndim: the number of implosion operations to perform
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
DataSlice of nested Lists
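The rank arithmetic of implosion can be traced with a small plain-Python model. This is a sketch for intuition, not Koda code: a DataSlice is modelled as nested Python lists together with an explicit rank, a Koda List is modelled as a tuple so that it can be told apart from a slice dimension, and `implode_once` and `implode_model` are hypothetical helper names.

```python
def implode_once(data, rank):
    """Fold the last dimension of a rank-`rank` nested list into tuples (model Lists)."""
    if rank < 1:
        raise ValueError("cannot implode a rank-0 value")
    if rank == 1:
        return tuple(data)
    return [implode_once(child, rank - 1) for child in data]

def implode_model(data, rank, ndim=1):
    """Implode `ndim` times; a negative `ndim` implodes until rank 0."""
    steps = rank if ndim < 0 else ndim
    for _ in range(steps):
        data = implode_once(data, rank)
        rank -= 1
    return data, rank

x = [[1, 2], [3]]                      # models a rank-2 DataSlice of integers
print(implode_model(x, 2))             # ([(1, 2), (3,)], 1)
print(implode_model(x, 2, ndim=-1))    # (((1, 2), (3,)), 0)
```

Each step lowers the rank by one, so `ndim=-1` on a rank-2 input performs two implosions and ends with a single nested List at rank 0.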
DataBag.is_empty()
Returns True if the DataBag is empty.
DataBag.is_mutable()
Returns present iff this DataBag is mutable.
DataBag.list(self: DataBag, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates list(s) by collapsing `items`.
If there is no argument, returns an empty Koda List.
If the argument is a Python list, creates a nested Koda List.
Examples:
list() -> a single empty Koda List
list([1, 2, 3]) -> Koda List with items 1, 2, 3
list([[1, 2, 3], [4, 5]]) -> nested Koda List [[1, 2, 3], [4, 5]]
# items are Koda lists.
Args:
items: The items to use. If not specified, an empty list of OBJECTs will be
created.
item_schema: the schema of the list items. If not specified, it will be
deduced from `items` or defaulted to OBJECT.
schema: The schema to use for the list. If specified, then item_schema must
not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
A DataSlice with the list/lists.
DataBag.list_like(self: DataBag, shape_and_mask_from: _DataSlice, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates new Koda lists with shape and sparsity of `shape_and_mask_from`.
Args:
shape_and_mask_from: a DataSlice with the shape and sparsity for the desired
lists.
items: optional items to assign to the newly created lists. If not given,
the function returns empty lists.
item_schema: the schema of the list items. If not specified, it will be
deduced from `items` or defaulted to OBJECT.
schema: The schema to use for the list. If specified, then item_schema must
not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
A DataSlice with the lists.
DataBag.list_schema(item_schema)
Returns a list schema from the schema of the items
DataBag.list_shaped(self: DataBag, shape: _jagged_shape.JaggedShape, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice
Creates new Koda lists with the given shape.
Args:
shape: the desired shape.
items: optional items to assign to the newly created lists. If not given,
the function returns empty lists.
item_schema: the schema of the list items. If not specified, it will be
deduced from `items` or defaulted to OBJECT.
schema: The schema to use for the list. If specified, then item_schema must
not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
A DataSlice with the lists.
DataBag.merge_fallbacks()
Returns a new DataBag with all the fallbacks merged.
DataBag.merge_inplace(self: DataBag, other_bags: DataBag | Iterable[DataBag], /, *, overwrite: bool = True, allow_data_conflicts: bool = True, allow_schema_conflicts: bool = False) -> DataBag
Copies all data from `other_bags` to this DataBag.
Args:
other_bags: Either a DataBag or a list of DataBags to merge into the current
DataBag.
overwrite: In case of conflicts, whether the new value (or the rightmost of
the new values, if multiple) should be used instead of the old value. Note
that this flag has no effect when allow_data_conflicts=False and
allow_schema_conflicts=False. Note that db1.fork().merge_inplace(db2,
overwrite=False) and db2.fork().merge_inplace(db1, overwrite=True) produce
the same result.
allow_data_conflicts: Whether we allow the same attribute to have different
values in the bags being merged. When True, the overwrite= flag controls
the behavior in case of a conflict. By default, both this flag and
overwrite= are True, so we overwrite with the new values in case of a
conflict.
allow_schema_conflicts: Whether we allow the same attribute to have
different types in an explicit schema. Note that setting this flag to True
can be dangerous, as there might be some objects with the old schema that
are not overwritten, and therefore will end up in an inconsistent state
with their schema after the overwrite. When True, overwrite= flag controls
the behavior in case of a conflict.
Returns:
self, so that multiple DataBag modifications can be chained.
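The interaction between `overwrite` and `allow_data_conflicts`, and the equivalence noted under `overwrite`, can be checked on a toy model. The sketch below is not Koda code: a DataBag is modelled as a Python dict of (item, attribute) -> value entries, schema conflicts are not modelled, and `merge_model` (a hypothetical helper) returns a new dict rather than mutating and returning `self`.

```python
def merge_model(base, others, *, overwrite=True, allow_data_conflicts=True):
    """Model of the documented flags on a dict of (item, attr) -> value entries."""
    merged = dict(base)  # plays the role of db.fork(): `base` is left untouched
    for other in others:
        for key, value in other.items():
            if key in merged and merged[key] != value:
                if not allow_data_conflicts:
                    raise ValueError(f"conflicting values for {key}")
                if not overwrite:
                    continue  # keep the old value
            merged[key] = value
    return merged

db1 = {("obj", "x"): 1, ("obj", "y"): 2}
db2 = {("obj", "x"): 10, ("obj", "z"): 3}

keep_old = merge_model(db1, [db2], overwrite=False)  # db1's value of x wins
take_new = merge_model(db2, [db1], overwrite=True)   # db1's value of x wins again
print(keep_old == take_new)  # True
```

With `allow_data_conflicts=False` the same call raises instead of picking a side, which is why `overwrite` has no effect in that configuration.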
DataBag.named_schema(name, /, **attrs)
Creates a named schema with ItemId derived only from its name.
DataBag.new(*, schema=None, overwrite_schema=False, itemid=None, **attrs)
Creates Entities with given attrs.
Args:
schema: optional DataSlice schema. If not specified, a new explicit schema
will be automatically created based on the schemas of the passed **attrs.
overwrite_schema: if schema attribute is missing and the attribute is being
set through `attrs`, schema is successfully updated.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
itemid is only set when args, if present, is not a primitive or a primitive
slice.
**attrs: attrs to set in the returned Entity.
Returns:
data_slice.DataSlice with the given attrs.
DataBag.new_like(shape_and_mask_from, *, schema=None, overwrite_schema=False, itemid=None, **attrs)
Creates new Entities with the shape and sparsity from shape_and_mask_from.
Args:
shape_and_mask_from: DataSlice, whose shape and sparsity the returned
DataSlice will have.
schema: optional DataSlice schema. If not specified, a new explicit schema
will be automatically created based on the schemas of the passed **attrs.
overwrite_schema: if schema attribute is missing and the attribute is being
set through `attrs`, schema is successfully updated.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
**attrs: attrs to set in the returned Entity.
Returns:
data_slice.DataSlice with the given attrs.
DataBag.new_schema(**attrs)
Creates new schema object with given types of attrs.
DataBag.new_shaped(shape, *, schema=None, overwrite_schema=False, itemid=None, **attrs)
Creates new Entities with the given shape.
Args:
shape: JaggedShape that the returned DataSlice will have.
schema: optional DataSlice schema. If not specified, a new explicit schema
will be automatically created based on the schemas of the passed **attrs.
overwrite_schema: if schema attribute is missing and the attribute is being
set through `attrs`, schema is successfully updated.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
**attrs: attrs to set in the returned Entity.
Returns:
data_slice.DataSlice with the given attrs.
DataBag.obj(arg, *, itemid=None, **attrs)
Creates new Objects with an implicit stored schema.
Returned DataSlice has OBJECT schema.
Args:
arg: optional Koda object or Python primitive to be converted to an Object.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
ItemIds will only be set when the arg is not provided, otherwise an error
will be raised.
**attrs: attrs to set on the returned object.
Returns:
data_slice.DataSlice with the given attrs and kd.OBJECT schema.
DataBag.obj_like(shape_and_mask_from, *, itemid=None, **attrs)
Creates Objects with shape and sparsity from shape_and_mask_from.
Returned DataSlice has OBJECT schema.
Args:
shape_and_mask_from: DataSlice, whose shape and sparsity the returned
DataSlice will have.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
**attrs: attrs to set in the returned Entity.
Returns:
data_slice.DataSlice with the given attrs.
DataBag.obj_shaped(shape, *, itemid=None, **attrs)
Creates Objects with the given shape.
Returned DataSlice has OBJECT schema.
Args:
shape: JaggedShape that the returned DataSlice will have.
itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
**attrs: attrs to set in the returned Entity.
Returns:
data_slice.DataSlice with the given attrs.
DataBag.overwriting_merge_update(other_db)
Returns a new DataBag with the update from other_db.
db.overwriting_merge_update(other_db) returns a DataBag that can be passed
instead of `other_db` to `merge_inplace` to get the same result as
db.merge_inplace(other_db, allow_schema_conflicts=True).
Notes about the returned DataBag:
1. It may still contain data that is present in "self".
2. It may lack schema information and so it couldn't be used directly.
Args:
other_db: DataBag to overwrite data and schema from.
Returns:
DataBag with the update from other_db.
DataBag.schema_triples_repr(self: DataBag, *, triple_limit: int = 1000) -> ContentsReprWrapper
Returns a representation of schema triples in the DataBag.
DataBag.uu(seed='', *, schema=None, overwrite_schema=False, **kwargs)
Creates item(s) whose ids are uuid(s) with the set attributes.
In order to create a different "Type" from the same arguments, use
`seed` key with the desired value, e.g.
kd.uu(seed='type_1', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))
and
kd.uu(seed='type_2', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))
have different ids.
If 'schema' is provided, the resulting DataSlice has the provided schema.
Otherwise, uses the corresponding uuschema instead.
Args:
seed: (str) Allows different item(s) to have different ids when created
from the same inputs.
schema: schema for the resulting DataSlice
overwrite_schema: if true, will overwrite schema attributes in the schema's
corresponding db from the argument values.
**kwargs: key-value pairs of object attributes where values are DataSlices
or can be converted to DataSlices using kd.new.
Returns:
data_slice.DataSlice
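The role of `seed` can be illustrated with a deterministic-id model built on the Python standard library. This is only an analogy and makes no claim about how Koda actually computes its ids: `uuid.uuid5` is used as a stand-in hash, attribute values are plain Python scalars rather than DataSlices, and `uu_id_model` is a hypothetical helper name.

```python
import uuid

def uu_id_model(seed="", **attrs):
    """Derive a deterministic id from `seed` and the attribute names and values."""
    # Sorting by attribute name makes the id independent of keyword order.
    payload = repr((seed, sorted(attrs.items())))
    return uuid.uuid5(uuid.NAMESPACE_OID, payload)

a = uu_id_model(seed="type_1", x=1, y=4)
b = uu_id_model(seed="type_1", y=4, x=1)   # same inputs, different keyword order
c = uu_id_model(seed="type_2", x=1, y=4)   # same attributes, different seed
print(a == b, a == c)  # True False
```

The same attributes always map to the same id, and changing only the seed yields a different id, which is the property the `seed` argument exists to provide.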
DataBag.uu_schema(seed='', **attrs)
Creates new uuschema from given types of attrs.
DataBag.uuobj(seed='', **kwargs)
Creates object(s) whose ids are uuid(s) with the provided attributes.
In order to create a different "Type" from the same arguments, use
`seed` key with the desired value, e.g.
kd.uuobj(seed='type_1', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))
and
kd.uuobj(seed='type_2', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))
have different ids.
Args:
seed: (str) Allows different uuobj(s) to have different ids when created
from the same inputs.
**kwargs: key-value pairs of object attributes where values are DataSlices
or can be converted to DataSlices using kd.new.
Returns:
data_slice.DataSlice
DataBag.with_name(obj: Any, name: str | Text) -> Any
Checks that the `name` is a string and returns `obj` unchanged.
This method is useful in tracing workflows: when tracing, we will assign
the given name to the subexpression computing `obj`. In eager mode, this
method is effectively a no-op.
Args:
obj: Any object.
name: The name to be used for this sub-expression when tracing this code.
Must be a string.
Returns:
obj unchanged.