koladata

kd.types.DataBag API

Base class of all Arolla values in Python.

QValue is immutable. It provides only basic functionality.
Subclasses of this class might have further specialization.

DataBag.adopt(slice, /)

Adopts all data reachable from the given slice into this DataBag.

Args:
  slice: DataSlice to adopt data from.

Returns:
  The DataSlice with this DataBag (including adopted data) attached.

DataBag.adopt_stub(slice, /)

Copies the given DataSlice's schema stub into this DataBag.

The "schema stub" of a DataSlice is a subset of its schema (including embedded
schemas) that contains just enough information to support direct updates to
that DataSlice. See kd.stub() for more details.

Args:
  slice: DataSlice to extract the schema stub from.

Returns:
  The "stub" with this DataBag attached.

DataBag.concat_lists(self: DataBag, /, *lists: _DataSlice) -> _DataSlice

Returns a DataSlice of Lists concatenated from the List items of `lists`.

Each input DataSlice must contain only present List items, and the item
schemas of each input must be compatible. Input DataSlices are aligned (see
`kd.align`) automatically before concatenation.

If `lists` is empty, this returns a single empty list.

The specified `db` is used to create the new concatenated lists, and is the
DataBag used by the result DataSlice. If `db` is not specified, a new DataBag
is created for this purpose.

Args:
  *lists: the DataSlices of Lists to concatenate
  db: optional DataBag to populate with the result

Returns:
  DataSlice of concatenated Lists
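
The alignment and concatenation rules above can be sketched in plain Python. This is a toy model and not Koda code: a tuple stands for one Koda List, a Python list of tuples stands for a rank-1 DataSlice of Lists, and `concat_lists_model` is a hypothetical helper name.

```python
def concat_lists_model(*lists):
    """Toy model of list concatenation with automatic alignment.

    Assumptions (not Koda's implementation): a tuple stands for one Koda List,
    and a Python list of tuples stands for a rank-1 DataSlice of Lists.
    """
    if not lists:
        return ()  # no inputs: a single empty list

    # Sizes of all rank-1 inputs; they must agree for alignment to succeed.
    sizes = {len(x) for x in lists if isinstance(x, list)}
    if len(sizes) > 1:
        raise ValueError("inputs cannot be aligned")

    if not sizes:
        # Every input is a single List: concatenate them directly.
        return tuple(v for lst in lists for v in lst)

    n = sizes.pop()
    # Broadcast single Lists so that every input has n entries.
    aligned = [x if isinstance(x, list) else [x] * n for x in lists]
    return [tuple(v for lst in row for v in lst) for row in zip(*aligned)]


single = concat_lists_model((1, 2), (3,))
broadcast = concat_lists_model([(1,), (2, 3)], (9,))
empty = concat_lists_model()
```

In this model the single List `(9,)` is repeated for every entry of the rank-1 input before the per-entry concatenation happens.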

DataBag.contents_repr(self: DataBag, /, *, triple_limit: int = 1000) -> ContentsReprWrapper

Returns a representation of the DataBag contents.

DataBag.data_triples_repr(self: DataBag, *, triple_limit: int = 1000) -> ContentsReprWrapper

Returns a representation of the DataBag contents, omitting schema triples.

DataBag.dict(self: DataBag, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates a Koda dict.

Acceptable arguments are:
  1) no argument: a single empty dict
  2) a Python dict whose keys are either primitives or DataItems and values
     are primitives, DataItems, Python list/dict which can be converted to a
     List/Dict DataItem, or a DataSlice which can be folded into a List DataItem:
     a single dict
  3) two DataSlices/DataItems as keys and values: a DataSlice of dicts whose
     shape is the last N-1 dimensions of keys/values DataSlice

Examples:
dict() -> returns a single new dict
dict({1: 2, 3: 4}) -> returns a single new dict
dict({1: [1, 2]}) -> returns a single dict, mapping 1->List[1, 2]
dict({1: kd.slice([1, 2])}) -> returns a single dict, mapping 1->List[1, 2]
dict({db.uuobj(x=1, y=2): 3}) -> returns a single dict, mapping uuid->3
dict(kd.slice([1, 2]), kd.slice([3, 4])) -> returns a dict, mapping 1->3 and
  2->4
dict(kd.slice([[1], [2]]), kd.slice([3, 4])) -> returns two dicts, one
  mapping 1->3 and another mapping 2->4
dict('key', 12) -> returns a single dict mapping 'key'->12

Args:
  items_or_keys: a Python dict in case of items and a DataSlice in case of
    keys.
  values: a DataSlice. If provided, `items_or_keys` must be a DataSlice as
    keys.
  key_schema: the schema of the dict keys. If not specified, it will be
    deduced from keys or defaulted to OBJECT.
  value_schema: the schema of the dict values. If not specified, it will be
    deduced from values or defaulted to OBJECT.
  schema: The schema to use for the newly created Dict. If specified, then
    key_schema and value_schema must not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.

Returns:
  A DataSlice with the dict.
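
The "last N-1 dimensions" rule for the keys/values form can be illustrated with a plain-Python sketch. It is a toy model under stated assumptions, not Koda code: nested Python lists stand for DataSlices, a Python dict stands for a Koda dict, and `dict_model` is a hypothetical helper name.

```python
def dict_model(keys, values):
    """Toy model: fold the last dimension of `keys` into dicts.

    Assumptions (not Koda's implementation): nested Python lists stand for
    DataSlices, and `values` has either the same nesting as `keys` or one
    level less (in which case each value is shared by a whole row of keys).
    """
    if not isinstance(keys, list):
        # Scalar key and scalar value: a single one-entry dict.
        return {keys: values}

    if keys and isinstance(keys[0], list):
        # Rank >= 2: keep the outer dimension, recurse into each row.
        if isinstance(values, list):
            return [dict_model(k, v) for k, v in zip(keys, values)]
        return [dict_model(k, values) for k in keys]

    # Rank 1: the last dimension becomes the entries of one dict.
    if isinstance(values, list):
        return dict(zip(keys, values))
    return {k: values for k in keys}


one_dict = dict_model([1, 2], [3, 4])
two_dicts = dict_model([[1], [2]], [3, 4])
scalar_dict = dict_model("key", 12)
```

The three calls mirror the last three examples in the list above: rank-1 keys give one dict, rank-2 keys give a rank-1 result of dicts, and scalar inputs give a single dict.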

DataBag.dict_like(self: DataBag, shape_and_mask_from: _DataSlice, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates new Koda dicts with shape and sparsity of `shape_and_mask_from`.

If items_or_keys and values are not provided, creates empty dicts. Otherwise,
the function assigns the given keys and values to the newly created dicts. So
the keys and values must be either broadcastable to shape_and_mask_from
shape, or one dimension higher.

Args:
  self: the DataBag.
  shape_and_mask_from: a DataSlice with the shape and sparsity for the desired
    dicts.
  items_or_keys: either a Python dict (if `values` is None) or a DataSlice
    with keys. The Python dict case is supported only for scalar
    shape_and_mask_from.
  values: a DataSlice of values, when `items_or_keys` represents keys.
  key_schema: the schema of the dict keys. If not specified, it will be
    deduced from keys or defaulted to OBJECT.
  value_schema: the schema of the dict values. If not specified, it will be
    deduced from values or defaulted to OBJECT.
  schema: The schema to use for the newly created Dict. If specified, then
    key_schema and value_schema must not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.

Returns:
  A DataSlice with the dicts.
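
What "shape and sparsity of `shape_and_mask_from`" means can be sketched in plain Python. This is a toy model and not Koda code: nested Python lists stand for a DataSlice, `None` stands for a missing item, and `dict_like_model` is a hypothetical helper name. The same idea applies to the other `*_like` methods on this page.

```python
def dict_like_model(shape_and_mask_from):
    """Toy model: one new empty dict per present item, None per missing item.

    Assumptions (not Koda's implementation): nested Python lists stand for a
    DataSlice and None marks a missing item.
    """
    if isinstance(shape_and_mask_from, list):
        return [dict_like_model(v) for v in shape_and_mask_from]
    return None if shape_and_mask_from is None else {}


template = [[1, None], [3]]
result = dict_like_model(template)
```

Each present position gets its own fresh dict, so writing to one of them does not affect the others.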

DataBag.dict_schema(key_schema, value_schema)

Returns a dict schema from the schemas of the keys and values.

DataBag.dict_shaped(self: DataBag, shape: _jagged_shape.JaggedShape, /, items_or_keys: dict[Any, Any] | _DataSlice | None = None, values: _DataSlice | None = None, *, key_schema: _DataSlice | None = None, value_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates new Koda dicts with the given shape.

If items_or_keys and values are not provided, creates empty dicts. Otherwise,
the function assigns the given keys and values to the newly created dicts. So
the keys and values must be either broadcastable to `shape` or one dimension
higher.

Args:
  self: the DataBag.
  shape: the desired shape.
  items_or_keys: either a Python dict (if `values` is None) or a DataSlice
    with keys. The Python dict case is supported only for scalar shape.
  values: a DataSlice of values, when `items_or_keys` represents keys.
  key_schema: the schema of the dict keys. If not specified, it will be
    deduced from keys or defaulted to OBJECT.
  value_schema: the schema of the dict values. If not specified, it will be
    deduced from values or defaulted to OBJECT.
  schema: The schema to use for the newly created Dict. If specified, then
    key_schema and value_schema must not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting dicts.

Returns:
  A DataSlice with the dicts.

DataBag.empty()

Returns an empty immutable DataBag.

DataBag.empty_mutable()

Returns an empty mutable DataBag. Only works in eager mode.

DataBag.fork(mutable=True)

Returns a newly created DataBag with the same content as self.

Changes to either DataBag will not be reflected in the other.

Args:
  mutable: If true (default), returns a mutable DataBag. If false, the DataBag
    will be immutable.

Returns:
  data_bag.DataBag
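
The independence of a fork from its source can be sketched with a small plain-Python stand-in. This is a toy model and not Koda code: `ToyBag` is a hypothetical class that stores (item, attribute) -> value triples in a dict.

```python
import copy


class ToyBag:
    """Toy stand-in for a DataBag: a dict of (item, attr) -> value triples."""

    def __init__(self, triples=None, mutable=True):
        self._triples = dict(triples or {})
        self._mutable = mutable

    def set(self, item, attr, value):
        if not self._mutable:
            raise ValueError("this ToyBag is immutable")
        self._triples[(item, attr)] = value

    def get(self, item, attr):
        return self._triples.get((item, attr))

    def fork(self, mutable=True):
        # Copy the content so later writes on either side stay separate.
        return ToyBag(copy.deepcopy(self._triples), mutable=mutable)


original = ToyBag()
original.set("a", "x", 1)

forked = original.fork()
forked.set("a", "x", 2)    # visible only in the fork
original.set("a", "y", 3)  # visible only in the original

frozen = original.fork(mutable=False)
```

The toy copies eagerly for simplicity; how the real DataBag shares or copies storage is not described here.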

DataBag.freeze(self: DataBag) -> DataBag

Returns a frozen DataBag equivalent to `self`.

DataBag.get_approx_byte_size()

Returns approximate size of the DataBag in bytes.

DataBag.get_approx_size()

Returns approximate size of the DataBag in triples.

DataBag.implode(self: DataBag, x: _DataSlice, /, ndim: int | _DataSlice = 1, itemid: _DataSlice | None = None) -> _DataSlice

Implodes a DataSlice `x` a specified number of times.

A single list "implosion" converts a rank-(K+1) DataSlice of T to a rank-K
DataSlice of LIST[T], by folding the items in the last dimension of the
original DataSlice into newly-created Lists.

If `ndim` is set to a non-negative integer, implodes recursively `ndim` times.

If `ndim` is set to a negative integer, implodes as many times as possible,
until the result is a DataItem (i.e. a rank-0 DataSlice) containing a single
nested List.

The specified `db` is used to create any new Lists, and is the DataBag of the
result DataSlice. If `db` is not specified, a new, empty DataBag is created
for this purpose.

Args:
  x: the DataSlice to implode
  ndim: the number of implosion operations to perform
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
  db: optional DataBag where Lists are created from

Returns:
  DataSlice of nested Lists
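
The effect of `ndim` can be sketched in plain Python. This is a toy model and not Koda code: nested Python lists stand for the dimensions of a DataSlice, tuples stand for Koda Lists, the rank is passed explicitly, and `implode_model` is a hypothetical helper name.

```python
def _implode_once(x, rank):
    """Fold the last dimension of a rank-`rank` nested list into tuples."""
    if rank == 1:
        return tuple(x)
    return [_implode_once(row, rank - 1) for row in x]


def implode_model(x, rank, ndim=1):
    """Toy model of repeated implosion.

    Assumptions (not Koda's implementation): `x` is a nested Python list with
    `rank` dimensions, and a tuple stands for one Koda List.
    """
    times = rank if ndim < 0 else ndim  # negative ndim: implode fully
    if times > rank:
        raise ValueError("cannot implode more times than there are dimensions")
    for _ in range(times):
        x = _implode_once(x, rank)
        rank -= 1
    return x


x = [[1, 2], [3]]                        # a rank-2 slice
once = implode_model(x, rank=2)          # rank-1 slice of Lists
fully = implode_model(x, rank=2, ndim=-1)  # single nested List
unchanged = implode_model(x, rank=2, ndim=0)
```

Each implosion lowers the rank by one, so a negative `ndim` on a rank-2 input behaves like `ndim=2` in this model.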

DataBag.is_empty()

Returns True if the DataBag is empty.

DataBag.is_mutable()

Returns present iff this DataBag is mutable.

DataBag.list(self: DataBag, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates list(s) by collapsing `items`.

If there is no argument, returns an empty Koda List.
If the argument is a Python list, creates a nested Koda List.

Examples:
list() -> a single empty Koda List
list([1, 2, 3]) -> Koda List with items 1, 2, 3
list([[1, 2, 3], [4, 5]]) -> nested Koda List [[1, 2, 3], [4, 5]]
  # items are Koda lists.

Args:
  items: The items to use. If not specified, an empty list of OBJECTs will be
    created.
  item_schema: the schema of the list items. If not specified, it will be
    deduced from `items` or defaulted to OBJECT.
  schema: The schema to use for the list. If specified, then item_schema must
    not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.

Returns:
  A DataSlice with the list/lists.

DataBag.list_like(self: DataBag, shape_and_mask_from: _DataSlice, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates new Koda lists with shape and sparsity of `shape_and_mask_from`.

Args:
  shape_and_mask_from: a DataSlice with the shape and sparsity for the desired
    lists.
  items: optional items to assign to the newly created lists. If not given,
    the function returns empty lists.
  item_schema: the schema of the list items. If not specified, it will be
    deduced from `items` or defaulted to OBJECT.
  schema: The schema to use for the list. If specified, then item_schema must
    not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.

Returns:
  A DataSlice with the lists.

DataBag.list_schema(item_schema)

Returns a list schema from the schema of the items.

DataBag.list_shaped(self: DataBag, shape: _jagged_shape.JaggedShape, /, items: list[Any] | _DataSlice | None = None, *, item_schema: _DataSlice | None = None, schema: _DataSlice | None = None, itemid: _DataSlice | None = None) -> _DataSlice

Creates new Koda lists with the given shape.

Args:
  shape: the desired shape.
  items: optional items to assign to the newly created lists. If not given,
    the function returns empty lists.
  item_schema: the schema of the list items. If not specified, it will be
    deduced from `items` or defaulted to OBJECT.
  schema: The schema to use for the list. If specified, then item_schema must
    not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.

Returns:
  A DataSlice with the lists.

DataBag.merge_fallbacks()

Returns a new DataBag with all the fallbacks merged.

DataBag.merge_inplace(self: DataBag, other_bags: DataBag | Iterable[DataBag], /, *, overwrite: bool = True, allow_data_conflicts: bool = True, allow_schema_conflicts: bool = False) -> DataBag

Copies all data from `other_bags` to this DataBag.

Args:
  other_bags: Either a DataBag or a list of DataBags to merge into the current
    DataBag.
  overwrite: In case of conflicts, whether the new value (or the rightmost of
    the new values, if multiple) should be used instead of the old value. Note
    that this flag has no effect when allow_data_conflicts=False and
    allow_schema_conflicts=False. Also, db1.fork().merge_inplace(db2,
    overwrite=False) and db2.fork().merge_inplace(db1, overwrite=True) produce
    the same result.
  allow_data_conflicts: Whether we allow the same attribute to have different
    values in the bags being merged. When True, the overwrite= flag controls
    the behavior in case of a conflict. By default, both this flag and
    overwrite= are True, so we overwrite with the new values in case of a
    conflict.
  allow_schema_conflicts: Whether we allow the same attribute to have
    different types in an explicit schema. Note that setting this flag to True
    can be dangerous, as there might be some objects with the old schema that
    are not overwritten, and therefore will end up in an inconsistent state
    with their schema after the overwrite. When True, overwrite= flag controls
    the behavior in case of a conflict.

Returns:
  self, so that multiple DataBag modifications can be chained.
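
How `overwrite` and `allow_data_conflicts` interact can be sketched in plain Python. This is a toy model and not Koda code: a bag is a dict of (item, attribute) -> value triples, schema conflicts are not modelled, and `merge_model` is a hypothetical helper that returns a new dict (comparable to forking first and then merging in place).

```python
def merge_model(base, other, *, overwrite=True, allow_data_conflicts=True):
    """Toy model of merging `other` into a copy of `base`.

    Assumptions (not Koda's implementation): bags are dicts of
    (item, attr) -> value, and only data conflicts are modelled.
    """
    result = dict(base)
    for key, value in other.items():
        if key in result and result[key] != value:
            if not allow_data_conflicts:
                raise ValueError(f"conflicting values for {key}")
            if not overwrite:
                continue  # keep the old value
        result[key] = value
    return result


db1 = {("a", "x"): 1, ("a", "y"): 2}
db2 = {("a", "x"): 10, ("b", "x"): 3}

keep_old = merge_model(db1, db2, overwrite=False)
take_new = merge_model(db2, db1, overwrite=True)
```

Both calls end with the value from `db1` for the conflicting triple, which is the symmetry described under `overwrite` above.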

DataBag.named_schema(name, /, **attrs)

Creates a named schema with ItemId derived only from its name.

DataBag.new(*, schema=None, overwrite_schema=False, itemid=None, **attrs)

Creates Entities with given attrs.

Args:
  schema: optional DataSlice schema. If not specified, a new explicit schema
    will be automatically created based on the schemas of the passed **attrs.
  overwrite_schema: if schema attribute is missing and the attribute is being
    set through `attrs`, schema is successfully updated.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
    If args are present, itemid is only set when they are not primitives or
    primitive slices.
  **attrs: attrs to set in the returned Entity.

Returns:
  data_slice.DataSlice with the given attrs.

DataBag.new_like(shape_and_mask_from, *, schema=None, overwrite_schema=False, itemid=None, **attrs)

Creates new Entities with the shape and sparsity from shape_and_mask_from.

Args:
  shape_and_mask_from: DataSlice, whose shape and sparsity the returned
    DataSlice will have.
  schema: optional DataSlice schema. If not specified, a new explicit schema
    will be automatically created based on the schemas of the passed **attrs.
  overwrite_schema: if schema attribute is missing and the attribute is being
    set through `attrs`, schema is successfully updated.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
  **attrs: attrs to set in the returned Entity.

Returns:
  data_slice.DataSlice with the given attrs.

DataBag.new_schema(**attrs)

Creates new schema object with given types of attrs.

DataBag.new_shaped(shape, *, schema=None, overwrite_schema=False, itemid=None, **attrs)

Creates new Entities with the given shape.

Args:
  shape: JaggedShape that the returned DataSlice will have.
  schema: optional DataSlice schema. If not specified, a new explicit schema
    will be automatically created based on the schemas of the passed **attrs.
  overwrite_schema: if schema attribute is missing and the attribute is being
    set through `attrs`, schema is successfully updated.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting entities.
  **attrs: attrs to set in the returned Entity.

Returns:
  data_slice.DataSlice with the given attrs.

DataBag.obj(arg, *, itemid=None, **attrs)

Creates new Objects with an implicit stored schema.

Returned DataSlice has OBJECT schema.

Args:
  arg: optional Koda object or Python primitive to be converted to an Object.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
    ItemIds will only be set when the arg is not provided, otherwise an error
    will be raised.
  **attrs: attrs to set on the returned object.

Returns:
  data_slice.DataSlice with the given attrs and kd.OBJECT schema.

DataBag.obj_like(shape_and_mask_from, *, itemid=None, **attrs)

Creates Objects with shape and sparsity from shape_and_mask_from.

Returned DataSlice has OBJECT schema.

Args:
  shape_and_mask_from: DataSlice, whose shape and sparsity the returned
    DataSlice will have.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
  db: optional DataBag where entities are created.
  **attrs: attrs to set in the returned Entity.

Returns:
  data_slice.DataSlice with the given attrs.

DataBag.obj_shaped(shape, *, itemid=None, **attrs)

Creates Objects with the given shape.

Returned DataSlice has OBJECT schema.

Args:
  shape: JaggedShape that the returned DataSlice will have.
  itemid: optional ITEMID DataSlice used as ItemIds of the resulting obj(s).
  **attrs: attrs to set in the returned Entity.

Returns:
  data_slice.DataSlice with the given attrs.

DataBag.overwriting_merge_update(other_db)

Returns a new DataBag with the update from other_db.

db.overwriting_merge_update(other_db) returns a DataBag that can be passed to
`merge_inplace` instead of `other_db` to get the same result as
db.merge_inplace(other_db, allow_schema_conflicts=True).

Notes about the returned DataBag:
1. It may still contain data that is present in "self".
2. It may lack schema information, so it cannot be used directly.

Args:
  other_db: DataBag to overwrite data and schema from.

Returns:
  DataBag with the update from other_db.
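
The property described above (merging the update gives the same result as merging `other_db`) can be sketched in plain Python. This is a toy model and not Koda code: bags are dicts of (item, attribute) -> value triples, and `update_model` and `merge_overwrite` are hypothetical helpers. The toy drops every triple that is already identical in the base bag, which is stricter than the real method needs to be, since the returned DataBag may still contain data present in self.

```python
_MISSING = object()


def merge_overwrite(base, other):
    """Toy overwriting merge: values from `other` win on conflict."""
    result = dict(base)
    result.update(other)
    return result


def update_model(base, other):
    """Toy update: keep only the triples of `other` that change `base`."""
    return {k: v for k, v in other.items() if base.get(k, _MISSING) != v}


db = {("a", "x"): 1, ("a", "y"): 2}
other_db = {("a", "x"): 1, ("a", "y"): 20, ("b", "x"): 3}

update = update_model(db, other_db)
via_update = merge_overwrite(db, update)
via_other = merge_overwrite(db, other_db)
```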

DataBag.schema_triples_repr(self: DataBag, *, triple_limit: int = 1000) -> ContentsReprWrapper

Returns a representation of schema triples in the DataBag.

DataBag.uu(seed='', *, schema=None, overwrite_schema=False, **kwargs)

Creates item(s) whose ids are uuid(s), with the given attributes set.

In order to create a different "Type" from the same arguments, use
`seed` key with the desired value, e.g.

kd.uu(seed='type_1', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))

and

kd.uu(seed='type_2', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))

have different ids.

If 'schema' is provided, the resulting DataSlice has the provided schema.
Otherwise, uses the corresponding uuschema instead.

Args:
  seed: (str) Allows different item(s) to have different ids when created
    from the same inputs.
  schema: schema for the resulting DataSlice
  overwrite_schema: if true, will overwrite schema attributes in the schema's
    corresponding db from the argument values.
  **kwargs: key-value pairs of object attributes where values are DataSlices
    or can be converted to DataSlices using kd.new.

Returns:
  data_slice.DataSlice
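
The role of `seed` can be sketched with deterministic ids in plain Python. This is a toy model and not Koda code: `uu_id_model` is a hypothetical helper, it handles scalar attributes only, and it uses Python's `uuid.uuid5`, which is not claimed to be the hashing Koda uses.

```python
import uuid


def uu_id_model(seed="", **attrs):
    """Toy model: derive a deterministic id from the seed and the attributes.

    Assumptions (not Koda's implementation): attributes are scalars, and they
    are sorted by name so keyword order does not matter in this toy.
    """
    payload = repr((seed, sorted(attrs.items())))
    return uuid.uuid5(uuid.NAMESPACE_OID, payload)


id_a = uu_id_model(seed="type_1", x=1, y=4)
id_b = uu_id_model(seed="type_1", x=1, y=4)
id_c = uu_id_model(seed="type_2", x=1, y=4)
```

The same inputs always give the same id, while changing only the seed gives a different id.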
    

DataBag.uu_schema(seed='', **attrs)

Creates new uuschema from given types of attrs.

DataBag.uuobj(seed='', **kwargs)

Creates object(s) whose ids are uuid(s) with the provided attributes.

In order to create a different "Type" from the same arguments, use
`seed` key with the desired value, e.g.

kd.uuobj(seed='type_1', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))

and

kd.uuobj(seed='type_2', x=kd.slice([1, 2, 3]), y=kd.slice([4, 5, 6]))

have different ids.

Args:
  seed: (str) Allows different uuobj(s) to have different ids when created
    from the same inputs.
  **kwargs: key-value pairs of object attributes where values are DataSlices
    or can be converted to DataSlices using kd.new.

Returns:
  data_slice.DataSlice
    

DataBag.with_name(obj: Any, name: str | Text) -> Any

Checks that the `name` is a string and returns `obj` unchanged.

This method is useful in tracing workflows: when tracing, we will assign
the given name to the subexpression computing `obj`. In eager mode, this
method is effectively a no-op.

Args:
  obj: Any object.
  name: The name to be used for this sub-expression when tracing this code.
    Must be a string.

Returns:
  obj unchanged.