koladata

Home
Overview
Fundamentals
Glossary
Cheatsheet
API Reference
Quick Recipes
Deep Dive
Common Pitfalls and Gotchas
Persistent Storage

View the Project on GitHub google/koladata

kd API

kd module is the common container for all operators whereas kd.eager and kd.lazy modules are containers for explicitly eager and lazy operators respectively.

While most of operators below have both eager and lazy versions (e.g. kd.eager.agg_sum vs kd.lazy.agg_sum), some functions (e.g. kd.sub(expr, *subs)) only have eager version. Such functions often take Exprs or Functors as inputs and does not make sense to have a lazy version.

Note that operators from extension modules (e.g. kd_ext.npkd) are not included in the kd.* namespace.

Subcategory Description
allocation Operators that allocate new ItemIds.
annotation Annotation operators.
assertion Operators that assert properties of DataSlices.
bags Operators that work on DataBags.
bitwise Bitwise operators.
comparison Operators that compare DataSlices.
core Core operators that are not part of other categories.
curves Operators working with curves.
dicts Operators working with dictionaries.
entities Operators that work solely with entities.
expr Expr utilities.
extension_types Extension type operators.
file_io Utilities for interacting with the file system.
functor Operators to create and call functors.
ids Operators that work with ItemIds.
iterables Operators that work with iterables.
json JSON serialization and parsing operators.
json_stream JSON stream processing operators.
lists Operators working with lists.
masking Masking operators.
math Arithmetic operators.
objs Operators that work solely with objects.
optools Operator definition and registration tooling.
parallel Tools to evaluate functors in parallel.
proto Protocol buffer serialization and parsing operators.
py Operators that call Python functions.
qtypes Koda QTypes.
random Random and sampling operators.
s11n Serialization and deserialization utilities.
schema Schema-related operators.
schema_filters Schema filter operators and constants.
shapes Operators that work on shapes.
slices Operators that perform DataSlice transformations.
streams Operators that work with streams of items.
strings Operators that work with strings data.
testing A front-end module for kd.testing.*.
tuples Operators to create tuples.
type_checking Utilities to annotatate functions with type checking.
types Types used as type annotations in users's code.

kd.BOOLEAN

SchemaItem representing booleans.

kd.BYTES

SchemaItem representing byte strings.

kd.EXPR

SchemaItem representing expressions.

kd.FLOAT32

SchemaItem representing 32-bit floats.

kd.FLOAT64

SchemaItem representing 64-bit floats.

kd.INT32

SchemaItem representing 32-bit integers.

kd.INT64

SchemaItem representing 64-bit integers.

kd.ITEMID

SchemaItem representing ItemIds.

kd.MASK

SchemaItem representing masks.

kd.NONE

SchemaItem representing the None schema.

kd.OBJECT

SchemaItem representing generic objects.

kd.SCHEMA

SchemaItem representing schemas.

kd.STRING

SchemaItem representing Unicode strings.

kd.SWITCH_DEFAULT

Marks the default case in a kd.switch() construct.

kd.agg_all(x, ndim=unspecified)

Alias for kd.masking.agg_all

kd.agg_any(x, ndim=unspecified)

Alias for kd.masking.agg_any

kd.agg_count(x, ndim=unspecified)

Alias for kd.slices.agg_count

kd.agg_has(x, ndim=unspecified)

Alias for kd.masking.agg_has

kd.agg_max(x, ndim=unspecified)

Alias for kd.math.agg_max

kd.agg_min(x, ndim=unspecified)

Alias for kd.math.agg_min

kd.agg_size(x, ndim=unspecified)

Alias for kd.slices.agg_size

kd.agg_sum(x, ndim=unspecified)

Alias for kd.math.agg_sum

kd.agg_uuid(x, ndim=unspecified)

Alias for kd.ids.agg_uuid

kd.align(*args)

Alias for kd.slices.align

kd.all(x)

Alias for kd.masking.all

kd.any(x)

Alias for kd.masking.any

kd.appended_list(x, append)

Alias for kd.lists.appended_list

kd.apply_mask(x, y)

Alias for kd.masking.apply_mask

kd.apply_py(fn, *args, return_type_as=unspecified, **kwargs)

Alias for kd.py.apply_py

kd.argmax(x, ndim=unspecified)

Alias for kd.math.argmax

kd.argmin(x, ndim=unspecified)

Alias for kd.math.argmin

kd.at(x, indices)

Alias for kd.slices.at

kd.attr(x, attr_name, value, overwrite_schema=False)

Alias for kd.core.attr

kd.attrs(x, /, *, overwrite_schema=False, **attrs)

Alias for kd.core.attrs

kd.bag()

Alias for kd.types.DataBag.empty

kd.bind(fn_def: DataItem, /, *args: Any, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, **kwargs: Any) -> DataItem

Alias for kd.functor.bind

kd.bitwise_and(x, y)

Alias for kd.bitwise.bitwise_and

kd.bitwise_count(x)

Alias for kd.bitwise.count

kd.bitwise_invert(x)

Alias for kd.bitwise.invert

kd.bitwise_or(x, y)

Alias for kd.bitwise.bitwise_or

kd.bitwise_xor(x, y)

Alias for kd.bitwise.bitwise_xor

kd.bool(x: Any) -> DataSlice

Alias for kd.slices.bool

kd.bytes(x: Any) -> DataSlice

Alias for kd.slices.bytes

kd.call(fn, *args, return_type_as=None, **kwargs)

Alias for kd.functor.call

kd.cast_to(x, schema)

Alias for kd.schema.cast_to

kd.check_inputs(**kw_constraints: TypeConstraint)

Alias for kd.type_checking.check_inputs

kd.check_output(constraint: TypeConstraint)

Alias for kd.type_checking.check_output

kd.cityhash(x, seed)

Alias for kd.random.cityhash

kd.clone(x, /, *, itemid=unspecified, schema=unspecified, **overrides)

Alias for kd.core.clone

kd.clone_as_full(x, /, *, itemid=unspecified, **overrides)

Alias for kd.core.clone_as_full

kd.coalesce(x, y)

Alias for kd.masking.coalesce

kd.collapse(x, ndim=unspecified)

Alias for kd.slices.collapse

kd.concat(*args, ndim=1)

Alias for kd.slices.concat

kd.concat_lists(*lists: DataSlice) -> DataSlice

Alias for kd.lists.concat

kd.cond(condition, yes, no=None)

Alias for kd.masking.cond

kd.count(x)

Alias for kd.slices.count

kd.cum_count(x, ndim=unspecified)

Alias for kd.slices.cum_count

kd.cum_max(x, ndim=unspecified)

Alias for kd.math.cum_max

kd.decode_itemid(ds)

Alias for kd.ids.decode_itemid

kd.deep_clone(x, /, schema=unspecified, **overrides)

Alias for kd.core.deep_clone

kd.deep_uuid(x, /, schema=unspecified, *, seed='')

Alias for kd.ids.deep_uuid

kd.del_attr(x: DataSlice, attr_name: str)

Deletes an attribute `attr_name` from `x`.

kd.dense_rank(x, descending=False, ndim=unspecified)

Alias for kd.slices.dense_rank

kd.dict(items_or_keys: Any | None = None, values: Any | None = None, *, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.dicts.new

kd.dict_like(shape_and_mask_from: DataSlice, /, items_or_keys: Any | None = None, values: Any | None = None, *, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.dicts.like

kd.dict_schema(key_schema, value_schema)

Alias for kd.schema.dict_schema

kd.dict_shaped(shape: JaggedShape, /, items_or_keys: Any | None = None, values: Any | None = None, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.dicts.shaped

kd.dict_shaped_as(shape_from: DataSlice, /, items_or_keys: Any | None = None, values: Any | None = None, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.dicts.shaped_as

kd.dict_size(dict_slice)

Alias for kd.dicts.size

kd.dict_update(x, keys, values=unspecified)

Alias for kd.dicts.dict_update

kd.dir(x: DataSlice, *, intersection: bool | None = None) -> list[str]

Returns a sorted list of unique attribute names of the given DataSlice.

In case of OBJECT schema, attribute names are fetched from the `__schema__`
attribute. In case of Entity schema, the attribute names are fetched from the
schema. In case of primitives, an empty list is returned.

Args:
  x: A DataSlice.
  intersection: If True, the intersection of all object attributes is
    returned. If False, the union is returned. If not specified, raises an
    error if objects have different attributes.

Returns:
  A list of unique attributes sorted by alphabetical order.

kd.disjoint_coalesce(x, y)

Alias for kd.masking.disjoint_coalesce

kd.duck_dict(key_constraint: TypeConstraint, value_constraint: TypeConstraint)

Alias for kd.type_checking.duck_dict

kd.duck_list(item_constraint: TypeConstraint)

Alias for kd.type_checking.duck_list

kd.duck_type(**kwargs: TypeConstraint)

Alias for kd.type_checking.duck_type

kd.dump(x: DataSlice | DataBag, path: str, /, *, overwrite: bool = False, riegeli_options: str | None = None, fs: Any | None = None) -> None

Alias for kd.s11n.dump

kd.dumps(x: DataSlice | DataBag, /, *, riegeli_options: str | None = None) -> bytes

Alias for kd.s11n.dumps

kd.embed_schema(x: DataSlice) -> DataSlice

Returns a DataSlice with OBJECT schema.

* For primitives no data change is done.
* For Entities schema is stored as '__schema__' attribute.
* Embedding Entities requires a DataSlice to be associated with a DataBag.

Args:
  x: (DataSlice) whose schema is embedded.

kd.empty_shaped(shape, schema=MASK)

Alias for kd.slices.empty_shaped

kd.empty_shaped_as(shape_from, schema=MASK)

Alias for kd.slices.empty_shaped_as

kd.encode_itemid(ds)

Alias for kd.ids.encode_itemid

kd.enriched(ds, *bag)

Alias for kd.core.enriched

kd.enriched_bag(*bags)

Alias for kd.bags.enriched

kd.equal(x, y)

Alias for kd.comparison.equal

kd.eval(expr: Any, self_input: Any = UNSPECIFIED_SELF_INPUT, /, **input_values: Any) -> Any

Returns the expr evaluated on the given `input_values`.

Only Koda Inputs from container `I` (e.g. `I.x`) can be evaluated. Other
input types must be substituted before calling this function.

Args:
  expr: Koda expression with inputs from container `I`.
  self_input: The value for I.self input. When not provided, it will still
    have a default value that can be passed to a subroutine.
  **input_values: Values to evaluate `expr` with. Note that all inputs in
    `expr` must be present in the input values. All input values should either
    be DataSlices or convertible to DataSlices.

kd.expand_to(x, target, ndim=unspecified)

Alias for kd.slices.expand_to

kd.expand_to_shape(x, shape, ndim=unspecified)

Alias for kd.shapes.expand_to_shape

kd.experimental_safer_loads(x: bytes) -> Any

Alias for kd.s11n.experimental_safer_loads

kd.explode(x, ndim=1)

Alias for kd.lists.explode

kd.expr_quote(x: Any) -> DataSlice

Alias for kd.slices.expr_quote

kd.extension_type(unsafe_override=False) -> Callable[[type[Any]], type[Any]]

Alias for kd.extension_types.extension_type

kd.extract(ds, schema=unspecified)

Alias for kd.core.extract

kd.extract_update(ds, schema=unspecified)

Alias for kd.core.extract_update

kd.flat_map_chain(iterable, fn, value_type_as=None)

Alias for kd.functor.flat_map_chain

kd.flat_map_interleaved(iterable, fn, value_type_as=None)

Alias for kd.functor.flat_map_interleaved

kd.flatten(x, from_dim=0, to_dim=unspecified)

Alias for kd.shapes.flatten

kd.flatten_end(x, n_times=1)

Alias for kd.shapes.flatten_end

kd.float32(x: Any) -> DataSlice

Alias for kd.slices.float32

kd.float64(x: Any) -> DataSlice

Alias for kd.slices.float64

kd.fn(f: Any, *, use_tracing: bool = True, **kwargs: Any) -> DataItem

Alias for kd.functor.fn

kd.follow(x)

Alias for kd.core.follow

kd.for_(iterable, body_fn, *, finalize_fn=unspecified, condition_fn=unspecified, returns=unspecified, yields=unspecified, yields_interleaved=unspecified, **initial_state)

Alias for kd.functor.for_

kd.format(fmt, /, **kwargs)

Alias for kd.strings.format

kd.freeze(x)

Alias for kd.core.freeze

kd.freeze_bag(x)

Alias for kd.core.freeze_bag

kd.from_json(x, /, schema=OBJECT, default_number_schema=OBJECT, *, on_invalid=[], keys_attr='json_object_keys', values_attr='json_object_values')

Alias for kd.json.from_json

kd.from_proto(messages: Message | list[_NestedMessageContainer] | tuple[_NestedMessageContainer, ...] | None, /, *, extensions: list[str] | None = None, itemid: DataSlice | None = None, schema: DataSlice | None = None) -> DataSlice

Returns a DataSlice representing proto data.

Messages, primitive fields, repeated fields, and maps are converted to
equivalent Koda structures: objects/entities, primitives, lists, and dicts,
respectively. Enums are converted to INT32. The attribute names on the Koda
objects match the field names in the proto definition. See below for methods
to convert proto extensions to attributes alongside regular fields.

If schema is not specified or schema is kd.OBJECT, only present fields in
`messages` are loaded and included in the converted schema. To get a schema
that contains all fields independent of the data, use `kd.schema_from_proto`.

Proto extensions are ignored by default unless `extensions` is specified, or
if an explicit entity schema with parenthesized attr names is specified. If
both are specified, we use the union of the two extension sets.

The format of each extension specified in `extensions` is a dot-separated
sequence of field names and/or extension names, where extension names are
fully-qualified extension paths surrounded by parentheses. This sequence of
fields and extensions is traversed during conversion, in addition to the
default behavior of traversing all fields. For example:

  "path.to.field.(package_name.some_extension)"
  "path.to.repeated_field.(package_name.some_extension)"
  "path.to.map_field.values.(package_name.some_extension)"
  "path.(package_name.some_extension).(package_name2.nested_extension)"

If an explicit entity schema attr name starts with "(" and ends with ")" it is
also interpreted as an extension name.

Extensions are looked up using the C++ generated descriptor pool, using
`DescriptorPool::FindExtensionByName`, which requires that all extensions are
compiled in as C++ protos. The Koda attribute names for the extension fields
are parenthesized fully-qualified extension paths (e.g.
"(package_name.some_extension)" or
"(package_name.SomeMessage.some_extension)".) As the names contain '()' and
'.' characters, they cannot be directly accessed using '.name' syntax but can
be accessed using `.get_attr(name)'. For example,

  ds.get_attr('(package_name.AbcExtension.abc_extension)')
  ds.optional_field.get_attr('(package_name.DefExtension.def_extension)')

If `messages` is a single proto Message, the result is a DataItem. If it is a
nested list of proto Messages, the result is a DataSlice with the same number
of dimensions as the nesting level.

Args:
  messages: Message or nested list/tuple of Message of the same type. Any of
    the messages may be None, which will produce missing items in the result.
  extensions: List of proto extension paths.
  itemid: The ItemId(s) to use for the root object(s). If not specified, will
    allocate new id(s). If specified, will also infer the ItemIds for all
    child items such as List items from this id, so that repeated calls to
    this method on the same input will produce the same id(s) for everything.
    Use this with care to avoid unexpected collisions.
  schema: The schema to use for the return value. Can be set to kd.OBJECT to
    (recursively) create an object schema. Can be set to None (default) to
    create an uuschema based on the proto descriptor. When set to an entity
    schema, some fields may be set to kd.OBJECT to create objects from that
    point.

Returns:
  A DataSlice representing the proto data.

kd.from_proto_any(messages: Any | list[_NestedAnyMessageContainer] | tuple[_NestedAnyMessageContainer, ...] | None, /, *, extensions: list[str] | None = None, itemid: DataSlice | None = None, schema: DataSlice | None = None, message_type: type[Message] | None = None, descriptor_pool: DescriptorPool | None = None) -> DataSlice

Returns a DataSlice converted from a nested list of proto Any messages.

This function is similar to `from_proto`, but it first unpacks the Any
messages before converting them. Only the top-level Any message is unpacked:
if there are Any fields inside of the unpacked message, they are treated the
same as in `from_proto`.

If `message_type` is provided, all Any messages are unpacked into this type.
Otherwise, the type is inferred from the type URL in each Any message, and
the messages are looked up in the `descriptor_pool`. If `descriptor_pool` is
not provided, the default descriptor pool is used.

If `schema` is not explicitly provided, the resulting DataSlice will have an
OBJECT schema so that inputs with differing message types can be represented.

Args:
  messages: google.protobuf.Any message or nested list/tuple of
    google.protobuf.Any messages. Any of the messages may be None, which will
    produce missing items in the result.
  extensions: See `from_proto` for more details.
  itemid: See `from_proto` for more details.
  schema: See `from_proto` for more details.
  message_type: The type to unpack the Any messages into. If None, the type is
    inferred from the Any messages.
  descriptor_pool: The descriptor pool to use for looking up message types. If
    None, the default descriptor pool is used.

Returns:
  A DataSlice representing the unpacked and converted proto data.

kd.from_proto_bytes(x, proto_path, /, *, extensions=unspecified, itemids=unspecified, schema=unspecified, on_invalid=unspecified)

Alias for kd.proto.from_proto_bytes

kd.from_proto_json(x, proto_path, /, *, extensions=unspecified, itemids=unspecified, schema=unspecified, on_invalid=unspecified)

Alias for kd.proto.from_proto_json

kd.from_py(py_obj: Any, *, dict_as_obj: bool = False, itemid: DataSlice | None = None, schema: DataSlice | None = OBJECT, from_dim: int = 0) -> DataSlice

Aliases:

Converts Python object into DataSlice.

Can convert nested lists/dicts into Koda objects recursively as well.

Args:
  py_obj: Python object to convert.
  dict_as_obj: If True, will convert dicts with string keys into Koda objects
    instead of Koda dicts.
  itemid: The ItemId to use for the root object. If not specified, will
    allocate a new id. If specified, will also infer the ItemIds for all child
    items such as list items from this id, so that repeated calls to this
    method on the same input will produce the same id for everything. Use this
    with care to avoid unexpected collisions.
  schema: The schema to use for the return value. When this schema or one of
    its attributes is OBJECT (which is also the default), recursively creates
    objects from that point on.
  from_dim: The dimension to start creating Koda objects/lists/dicts from.
    `py_obj` must be a nested list of at least from_dim depth, and the outer
    from_dim dimensions will become the returned DataSlice dimensions. When
    from_dim is 0, the return value is therefore a DataItem.

Returns:
  A DataItem with the converted data.

kd.from_pytree(py_obj: Any, *, dict_as_obj: bool = False, itemid: DataSlice | None = None, schema: DataSlice | None = OBJECT, from_dim: int = 0) -> DataSlice

Alias for kd.from_py

kd.fstr(x)

Alias for kd.strings.fstr

kd.full_equal(x, y)

Alias for kd.comparison.full_equal

kd.get_attr(x, attr_name, default=unspecified)

Alias for kd.core.get_attr

kd.get_attr_names(x)

Alias for kd.core.get_attr_names

kd.get_bag(ds)

Alias for kd.core.get_bag

kd.get_dtype(ds)

Alias for kd.schema.get_dtype

kd.get_item(x, key_or_index)

Alias for kd.core.get_item

kd.get_item_schema(list_schema)

Alias for kd.schema.get_item_schema

kd.get_itemid(x)

Alias for kd.schema.get_itemid

kd.get_key_schema(dict_schema)

Alias for kd.schema.get_key_schema

kd.get_keys(dict_ds)

Alias for kd.dicts.get_keys

kd.get_metadata(x)

Alias for kd.core.get_metadata

kd.get_ndim(x)

Alias for kd.slices.get_ndim

kd.get_nofollowed_schema(schema)

Alias for kd.schema.get_nofollowed_schema

kd.get_obj_schema(x)

Alias for kd.schema.get_obj_schema

kd.get_primitive_schema(ds)

Alias for kd.schema.get_dtype

kd.get_proto_attr(x, field_name)

Alias for kd.proto.get_proto_attr

kd.get_repr(x, /, *, depth=25, item_limit=200, item_limit_per_dimension=25, format_html=False, max_str_len=100, max_expr_quote_len=10000, show_attributes=True, show_databag_id=False, show_shape=False, show_schema=False, show_item_id=False, show_present_count=False)

Alias for kd.slices.get_repr

kd.get_schema(x)

Alias for kd.schema.get_schema

kd.get_shape(x)

Alias for kd.shapes.get_shape

kd.get_value_schema(dict_schema)

Alias for kd.schema.get_value_schema

kd.get_values(dict_ds, key_ds=unspecified)

Alias for kd.dicts.get_values

kd.greater(x, y)

Alias for kd.comparison.greater

kd.greater_equal(x, y)

Alias for kd.comparison.greater_equal

kd.group_by(x, *keys, sort=False)

Alias for kd.slices.group_by

kd.group_by_indices(*keys, sort=False)

Alias for kd.slices.group_by_indices

kd.has(x)

Alias for kd.masking.has

kd.has_attr(x, attr_name)

Alias for kd.core.has_attr

kd.has_bag(ds)

Alias for kd.core.has_bag

kd.has_dict(x)

Alias for kd.dicts.has_dict

kd.has_entity(x)

Alias for kd.core.has_entity

kd.has_fn(x)

Alias for kd.functor.has_fn

kd.has_list(x)

Alias for kd.lists.has_list

kd.has_not(x)

Alias for kd.masking.has_not

kd.has_primitive(x)

Alias for kd.core.has_primitive

kd.hash_itemid(x)

Alias for kd.ids.hash_itemid

kd.if_(cond, yes_fn, no_fn, *args, return_type_as=None, **kwargs)

Alias for kd.functor.if_

kd.implode(x: DataSlice, /, ndim: int | DataSlice = 1, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.lists.implode

kd.index(x, dim=-1)

Alias for kd.slices.index

kd.int32(x: Any) -> DataSlice

Alias for kd.slices.int32

kd.int64(x: Any) -> DataSlice

Alias for kd.slices.int64

kd.inverse_mapping(x, ndim=unspecified)

Alias for kd.slices.inverse_mapping

kd.inverse_select(ds, fltr)

Alias for kd.slices.inverse_select

kd.is_dict(x)

Alias for kd.dicts.is_dict

kd.is_empty(x)

Alias for kd.slices.is_empty

kd.is_entity(x)

Alias for kd.core.is_entity

kd.is_expandable_to(x, target, ndim=unspecified)

Alias for kd.slices.is_expandable_to

kd.is_expr(obj: Any) -> DataSlice

Returns kd.present if the given object is an Expr and kd.missing otherwise.

kd.is_fn(obj: Any) -> DataItem

Alias for kd.functor.is_fn

kd.is_item(obj: Any) -> DataSlice

Returns kd.present if the given object is a scalar DataItem and kd.missing otherwise.

kd.is_list(x)

Alias for kd.lists.is_list

kd.is_nan(x)

Alias for kd.math.is_nan

kd.is_null_bag(bag)

Alias for kd.bags.is_null_bag

kd.is_primitive(x)

Alias for kd.core.is_primitive

kd.is_shape_compatible(x, y)

Alias for kd.slices.is_shape_compatible

kd.is_slice(obj: Any) -> DataSlice

Returns kd.present if the given object is a DataSlice and kd.missing otherwise.

kd.isin(x, y)

Alias for kd.slices.isin

kd.item(x, /, schema=None)

Alias for kd.types.DataItem.from_vals

kd.less(x, y)

Alias for kd.comparison.less

kd.less_equal(x, y)

Alias for kd.comparison.less_equal

kd.list(items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Creates list(s) by collapsing `items` into an immutable list.

If there is no argument, returns an empty Koda List.
If the argument is a Python list, creates a nested Koda List.

Examples:
list() -> a single empty Koda List
list([1, 2, 3]) -> Koda List with items 1, 2, 3
list([[1, 2, 3], [4, 5]]) -> nested Koda List [[1, 2, 3], [4, 5]]
  # items are Koda lists.

Args:
  items: The items to use. If not specified, an empty list of OBJECTs will be
    created.
  item_schema: the schema of the list items. If not specified, it will be
    deduced from `items` or defaulted to OBJECT.
  schema: The schema to use for the list. If specified, then item_schema must
    not be specified.
  itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.

Returns:
  The slice with list/lists.

kd.list_append_update(x, append)

Alias for kd.lists.list_append_update

kd.list_like(shape_and_mask_from: DataSlice, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.lists.like

kd.list_schema(item_schema)

Alias for kd.schema.list_schema

kd.list_shaped(shape: JaggedShape, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.lists.shaped

kd.list_shaped_as(shape_from: DataSlice, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice

Alias for kd.lists.shaped_as

kd.list_size(list_slice)

Alias for kd.lists.size

kd.load(path: str, /, *, fs: Any | None = None) -> Any

Alias for kd.s11n.load

kd.loads(x: bytes) -> Any

Alias for kd.s11n.loads

kd.map(fn, *args, include_missing=False, **kwargs)

Alias for kd.functor.map

kd.map_py(fn, *args, schema=None, max_threads=1, ndim=0, include_missing=None, dict_as_obj=False, item_completed_callback=None, **kwargs)

Alias for kd.py.map_py

kd.map_py_on_cond(true_fn, false_fn, cond, *args, schema=None, max_threads=1, dict_as_obj=False, item_completed_callback=None, **kwargs)

Alias for kd.py.map_py_on_cond

kd.map_py_on_selected(fn, cond, *args, schema=None, max_threads=1, dict_as_obj=False, item_completed_callback=None, **kwargs)

Alias for kd.py.map_py_on_selected

kd.mask(x: Any) -> DataSlice

Alias for kd.slices.mask

kd.mask_and(x, y)

Alias for kd.masking.mask_and

kd.mask_equal(x, y)

Alias for kd.masking.mask_equal

kd.mask_not_equal(x, y)

Alias for kd.masking.mask_not_equal

kd.mask_or(x, y)

Alias for kd.masking.mask_or

kd.max(x)

Alias for kd.math.max

kd.maximum(x, y)

Alias for kd.math.maximum

kd.maybe(x, attr_name)

Alias for kd.core.maybe

kd.metadata(x, /, **attrs)

Alias for kd.core.metadata

kd.min(x)

Alias for kd.math.min

kd.minimum(x, y)

Alias for kd.math.minimum

kd.missing

A mask value representing absence.

kd.mutable_bag()

Alias for kd.types.DataBag.empty_mutable

kd.named_container()

A container that automatically names expressions.

In eager mode, non-expression inputs are stored as-is. In tracing mode,
they are converted to expressions (functions and lambdas are automatically
traced).

Example:
  c = kd.named_container()

  # 1. Non-tracing mode
  # Storing a value:
  c.foo = 5
  c.foo       # Returns 5

  # Storing an expression:
  c.x_plus_y = I.x + I.y
  c.x_plus_y  # Returns (I.x + I.y).with_name('x_plus_y')

  # Listing stored items:
  vars(c)  # Returns {'foo': 5, 'x_plus_y': (I.x + I.y).with_name('x_plus_y')}

  # 2. Tracing mode
  def my_fn(x):
    c = kd.named_container()
    c.a = 2
    c.b = 1
    return c.a * x + c.b

  fn = kd.fn(my_fn)
  fn.a  # Returns 2 (accessible because it was named by the container)
  fn(x=5)  # Returns 11

kd.named_schema(name, /, **kwargs)

Alias for kd.schema.named_schema

kd.namedtuple(**kwargs)

Alias for kd.tuples.namedtuple

kd.new(*, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.entities.new

kd.new_dictid()

Alias for kd.allocation.new_dictid

kd.new_dictid_like(shape_and_mask_from)

Alias for kd.allocation.new_dictid_like

kd.new_dictid_shaped(shape)

Alias for kd.allocation.new_dictid_shaped

kd.new_dictid_shaped_as(shape_from)

Alias for kd.allocation.new_dictid_shaped_as

kd.new_itemid()

Alias for kd.allocation.new_itemid

kd.new_itemid_like(shape_and_mask_from)

Alias for kd.allocation.new_itemid_like

kd.new_itemid_shaped(shape)

Alias for kd.allocation.new_itemid_shaped

kd.new_itemid_shaped_as(shape_from)

Alias for kd.allocation.new_itemid_shaped_as

kd.new_like(shape_and_mask_from: DataSlice, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.entities.like

kd.new_listid()

Alias for kd.allocation.new_listid

kd.new_listid_like(shape_and_mask_from)

Alias for kd.allocation.new_listid_like

kd.new_listid_shaped(shape)

Alias for kd.allocation.new_listid_shaped

kd.new_listid_shaped_as(shape_from)

Alias for kd.allocation.new_listid_shaped_as

kd.new_shaped(shape: JaggedShape, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.entities.shaped

kd.new_shaped_as(shape_from: DataSlice, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.entities.shaped_as

kd.no_bag(ds)

Alias for kd.core.no_bag

kd.nofollow(x)

Alias for kd.core.nofollow

kd.nofollow_schema(schema)

Alias for kd.schema.nofollow_schema

kd.not_equal(x, y)

Alias for kd.comparison.not_equal

kd.obj(arg: Any = unspecified, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.objs.new

kd.obj_like(shape_and_mask_from: DataSlice, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.objs.like

kd.obj_shaped(shape: JaggedShape, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.objs.shaped

kd.obj_shaped_as(shape_from: DataSlice, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice

Alias for kd.objs.shaped_as

kd.ordinal_rank(x, tie_breaker=unspecified, descending=False, ndim=unspecified)

Alias for kd.slices.ordinal_rank

kd.present

A mask value representing presence.

kd.present_like(x)

Alias for kd.masking.present_like

kd.present_shaped(shape)

Alias for kd.masking.present_shaped

kd.present_shaped_as(x)

Alias for kd.masking.present_shaped_as

kd.pwl_curve(p, adjustments)

Alias for kd.curves.pwl_curve

kd.py_fn(f: Callable[..., Any], *, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, **defaults: Any) -> DataItem

Alias for kd.functor.py_fn

kd.py_reference(obj: Any) -> PyObject

Wraps into a Arolla QValue using reference for serialization.

py_reference can be used to pass arbitrary python objects through
kd.apply_py/kd.py_fn.

Note that using reference for serialization means that the resulting
QValue (and Exprs created using it) will only be valid within the
same process. Trying to deserialize it in a different process
will result in an exception.

Args:
  obj: the python object to wrap.
Returns:
  The wrapped python object as Arolla QValue.

kd.randint_like(x, low=unspecified, high=unspecified, seed=unspecified)

Alias for kd.random.randint_like

kd.randint_shaped(shape, low=unspecified, high=unspecified, seed=unspecified)

Alias for kd.random.randint_shaped

kd.randint_shaped_as(x, low=unspecified, high=unspecified, seed=unspecified)

Alias for kd.random.randint_shaped_as

kd.range(start, end=unspecified)

Alias for kd.slices.range

kd.ref(ds)

Alias for kd.core.ref

kd.register_py_fn(f: Callable[..., Any], *, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, unsafe_override: bool = False, **defaults: Any) -> DataItem

Alias for kd.functor.register_py_fn

kd.reify(ds, source)

Alias for kd.core.reify

kd.repeat(x, sizes)

Alias for kd.slices.repeat

kd.repeat_present(x, sizes)

Alias for kd.slices.repeat_present

kd.reshape(x, shape)

Alias for kd.shapes.reshape

kd.reshape_as(x, shape_from)

Alias for kd.shapes.reshape_as

kd.reverse(ds)

Alias for kd.slices.reverse

kd.reverse_select(ds, fltr)

Alias for kd.slices.inverse_select

kd.sample(x, ratio, seed, key=unspecified)

Alias for kd.random.sample

kd.sample_n(x, n, seed, key=unspecified)

Alias for kd.random.sample_n

kd.schema_from_proto(message_class: type[Message], /, *, extensions: list[str] | None = None) -> SchemaItem

Returns a Koda schema representing a proto message class.

This is similar to `from_proto(x).get_schema()` when `x` is an instance of
`message_class`, except that it eagerly adds all non-extension fields to the
schema instead of only adding fields that have data populated in `x`.

The returned schema is a uuschema whose itemid is a function of the proto
message class' fully qualified name, and any child message classes' schemas
are also uuschemas derived in the same way. The returned schema has the same
itemid as `from_proto(message_class()).get_schema()`.

The format of each extension specified in `extensions` is a dot-separated
sequence of field names and/or extension names, where extension names are
fully-qualified extension paths surrounded by parentheses. For example:

  "path.to.field.(package_name.some_extension)"
  "path.to.repeated_field.(package_name.some_extension)"
  "path.to.map_field.values.(package_name.some_extension)"
  "path.(package_name.some_extension).(package_name2.nested_extension)"

Args:
  message_class: A proto message class to convert.
  extensions: List of proto extension paths.

Returns:
  A SchemaItem containing the converted schema.

kd.schema_from_proto_path(proto_path, /, *, extensions=Entity:#5ikYYvXepp19g47QDLnJR2)

Alias for kd.proto.schema_from_proto_path

kd.schema_from_py(tpe: type[Any]) -> SchemaItem

Alias for kd.schema.schema_from_py

kd.select(ds, fltr, expand_filter=True)

Alias for kd.slices.select

kd.select_items(ds, fltr)

Alias for kd.lists.select_items

kd.select_keys(ds, fltr)

Alias for kd.dicts.select_keys

kd.select_present(ds)

Alias for kd.slices.select_present

kd.select_values(ds, fltr)

Alias for kd.dicts.select_values

kd.set_attr(x: DataSlice, attr_name: str, value: Any, overwrite_schema: bool = False)

Sets an attribute `attr_name` to `value`.

If `overwrite_schema` is True and `x` is either an Entity with explicit schema
or an Object where some items are entities with explicit schema, it will get
updated with `value`'s schema first.

Args:
  x: a DataSlice on which to set the attribute. Must have DataBag attached.
  attr_name: attribute name
  value: a DataSlice or convertible to a DataSlice that will be assigned as an
    attribute.
  overwrite_schema: whether to overwrite the schema before setting an
    attribute.

kd.set_attrs(x: DataSlice, *, overwrite_schema: bool = False, **attrs: Any)

Sets multiple attributes on an object / entity.

Args:
  x: a DataSlice on which attributes are set. Must have DataBag attached.
  overwrite_schema: whether to overwrite the schema before setting an
    attribute.
  **attrs: attribute values that are converted to DataSlices with DataBag
    adoption.

kd.set_schema(x: DataSlice, schema: DataSlice) -> DataSlice

Returns a copy of `x` with the provided `schema`.

If `schema` is an Entity schema and has a different DataBag than `x`, it is
merged into the DataBag of `x`.

It only changes the schemas of `x` and does not change the items in `x`. To
change the items in `x`, use `kd.cast_to` instead. For example,

  kd.set_schema(kd.ds([1, 2, 3]), kd.FLOAT32) -> fails because the items in
      `x` are not compatible with FLOAT32.
  kd.cast_to(kd.ds([1, 2, 3]), kd.FLOAT32) -> kd.ds([1.0, 2.0, 3.0])

When items in `x` are primitives or `schemas` is a primitive schema, it checks
items and schema are compatible. When items are ItemIds and `schema` is a
non-primitive schema, it does not check the underlying data matches the
schema. For example,

  kd.set_schema(kd.ds([1, 2, 3], schema=kd.OBJECT), kd.INT32)
    -> kd.ds([1, 2, 3])
  kd.set_schema(kd.ds([1, 2, 3]), kd.INT64) -> fail
  kd.set_schema(kd.ds(1).with_bag(kd.bag()), kd.schema.new_schema(x=kd.INT32))
  ->
  fail
  kd.set_schema(kd.new(x=1), kd.INT32) -> fail
  kd.set_schema(kd.new(x=1), kd.schema.new_schema(x=kd.INT64)) -> work

Args:
  x: DataSlice to change the schema of.
  schema: DataSlice containing the new schema.

Returns:
  DataSlice with the new schema.

kd.shallow_clone(x, /, *, itemid=unspecified, schema=unspecified, **overrides)

Alias for kd.core.shallow_clone

kd.shuffle(x, /, ndim=unspecified, seed=unspecified)

Alias for kd.random.shuffle

kd.size(x)

Alias for kd.slices.size

kd.slice(x, /, schema=None)

Alias for kd.types.DataSlice.from_vals

kd.sort(x, sort_by=unspecified, descending=False)

Alias for kd.slices.sort

kd.stack(*args, ndim=0)

Alias for kd.slices.stack

kd.static_when_tracing(base_type: TypeConstraint | None = None) -> _StaticWhenTraced

Alias for kd.type_checking.static_when_tracing

kd.str(x: Any) -> DataSlice

Alias for kd.slices.str

kd.strict_attrs(x, /, **attrs)

Alias for kd.core.strict_attrs

kd.strict_new(*, schema, overwrite_schema=False, itemid=unspecified, **attrs)

Alias for kd.entities.strict_new

kd.strict_with_attrs(x, /, **attrs)

Alias for kd.core.strict_with_attrs

kd.stub(x, attrs=[])

Alias for kd.core.stub

kd.subslice(x, *slices)

Alias for kd.slices.subslice

kd.sum(x)

Alias for kd.math.sum

kd.switch(key, cases, *args, return_type_as=None, **kwargs)

Alias for kd.functor.switch

kd.take(x, indices)

Alias for kd.slices.at

kd.tile(x, shape)

Alias for kd.slices.tile

kd.to_expr(x)

Alias for kd.schema.to_expr

kd.to_itemid(x)

Alias for kd.schema.get_itemid

kd.to_json(x, /, *, indent=None, ensure_ascii=True, keys_attr='json_object_keys', values_attr='json_object_values', include_missing_values=True)

Alias for kd.json.to_json

kd.to_none(x)

Alias for kd.schema.to_none

kd.to_object(x)

Alias for kd.schema.to_object

kd.to_proto(x: DataSlice, /, message_class: type[Message]) -> Message | list[_NestedMessageList] | None

Converts a DataSlice or DataItem to one or more proto messages.

If `x` is a DataItem, this returns a single proto message object. Otherwise,
this returns a nested list of proto message objects with the same size and
shape as the input. Missing items in the input are returned as python None in
place of a message.

Koda data structures are converted to equivalent proto messages, primitive
fields, repeated fields, maps, and enums, based on the proto schema. Koda
entity attributes are converted to message fields with the same name, if
those fields exist, otherwise they are ignored.

Koda slices with mixed underlying dtypes are tolerated wherever the proto
conversion is defined for all dtypes, regardless of schema.

Koda entity attributes that are parenthesized fully-qualified extension
paths (e.g. "(package_name.some_extension)") are converted to extensions,
if those extensions exist in the descriptor pool of the messages' common
descriptor, otherwise they are ignored.

Args:
  x: DataSlice to convert.
  message_class: A proto message class.

Returns:
  A converted proto message or list of converted proto messages.

kd.to_proto_any(x: DataSlice, *, descriptor_pool: DescriptorPool | None = None, deterministic: bool = False) -> Any | list[_NestedAnyMessageList] | None

Converts a DataSlice or DataItem to proto Any messages.

The schemas of all present values in `x` must have been derived from a proto
schema using `from_proto` or `schema_from_proto`, so that the original names
of the message types are embedded in the schema. Otherwise, this will fail.

Args:
  x: DataSlice to convert.
  descriptor_pool: Overrides the descriptor pool used to look up python proto
    message classes based on proto message type full name. If None, the
    default descriptor pool is used.
  deterministic: Passed to Any.Pack.

Returns:
  A proto Any message or nested list of proto Any messages with the same
  shape as the input. Missing elements in the input are None in the output.

kd.to_proto_bytes(x, proto_path, /)

Alias for kd.proto.to_proto_bytes

kd.to_proto_json(x, proto_path, /)

Alias for kd.proto.to_proto_json

kd.to_py(ds: DataSlice, *, max_depth: int = 2, obj_as_dict: bool = False, include_missing_attrs: bool = True, output_class: type[Any] | None = None) -> Any

Returns a readable python object from a DataSlice.

Attributes, lists, and dicts are recursively converted to Python objects.

Args:
  ds: A DataSlice
  max_depth: Maximum depth for recursive conversion. Each attribute, list item
    and dict keys / values access represent 1 depth increment. Use -1 for
    unlimited depth. If output_class is set, this is ignored and the depth is
    determined by the output_class.
  obj_as_dict: Whether to convert objects to python dicts. By default objects
    are converted to automatically constructed 'Obj' dataclass instances.
  include_missing_attrs: whether to include attributes with None value in
    objects.
  output_class: If not None, will be used recursively as the output type.

kd.to_pylist(x: DataSlice) -> list[Any]

Expands the outermost DataSlice dimension into a list of DataSlices.

kd.to_pytree(ds: DataSlice, max_depth: int = 2, include_missing_attrs: bool = True) -> Any

No description

kd.to_schema(x)

Alias for kd.schema.to_schema

kd.trace_as_fn(*, name: str | None = None, return_type_as: Any = None, functor_factory: FunctorFactory | None = None)

Alias for kd.functor.trace_as_fn

kd.trace_py_fn(f: Callable[..., Any], *, auto_variables: bool = True, **defaults: Any) -> DataItem

Alias for kd.functor.trace_py_fn

kd.translate(keys_to, keys_from, values_from)

Alias for kd.slices.translate

kd.translate_group(keys_to, keys_from, values_from)

Alias for kd.slices.translate_group

kd.tuple(*args)

Alias for kd.tuples.tuple

kd.unique(x, sort=False)

Alias for kd.slices.unique

kd.update_schema(obj: DataSlice, **attr_schemas: Any) -> DataSlice

Updates the schema of `obj` DataSlice using given schemas for attrs.

kd.updated(ds, *bag)

Alias for kd.core.updated

kd.updated_bag(*bags)

Alias for kd.bags.updated

kd.uu(seed: str | None = None, *, schema: DataSlice | None = None, overwrite_schema: bool = False, **attrs: Any) -> DataSlice

Alias for kd.entities.uu

kd.uu_schema(seed='', **kwargs)

Alias for kd.schema.uu_schema

kd.uudict(items_or_keys, values=unspecified, *, key_schema=unspecified, value_schema=unspecified, schema=unspecified, seed='')

Alias for kd.dicts.uu

kd.uuid(seed='', **kwargs)

Alias for kd.ids.uuid

kd.uuid_for_dict(seed='', **kwargs)

Alias for kd.ids.uuid_for_dict

kd.uuid_for_list(seed='', **kwargs)

Alias for kd.ids.uuid_for_list

kd.uuids_with_allocation_size(seed='', *, size)

Alias for kd.ids.uuids_with_allocation_size

kd.uulist(items=unspecified, *, item_schema=unspecified, schema=unspecified, seed='')

Alias for kd.lists.uu

kd.uuobj(seed: str | None = None, **attrs: Any) -> DataSlice

Alias for kd.objs.uu

kd.val_like(x, val)

Alias for kd.slices.val_like

kd.val_shaped(shape, val)

Alias for kd.slices.val_shaped

kd.val_shaped_as(x, val)

Alias for kd.slices.val_shaped_as

kd.while_(condition_fn, body_fn, *, returns=unspecified, yields=unspecified, yields_interleaved=unspecified, **initial_state)

Alias for kd.functor.while_

kd.with_attr(x, attr_name, value, overwrite_schema=False)

Alias for kd.core.with_attr

kd.with_attrs(x, /, *, overwrite_schema=False, **attrs)

Alias for kd.core.with_attrs

kd.with_bag(ds, bag)

Alias for kd.core.with_bag

kd.with_dict_update(x, keys, values=unspecified)

Alias for kd.dicts.with_dict_update

kd.with_list_append_update(x, append)

Alias for kd.lists.with_list_append_update

kd.with_merged_bag(ds)

Alias for kd.core.with_merged_bag

kd.with_metadata(x, /, **attrs)

Alias for kd.core.with_metadata

kd.with_name(obj: Any, name: str | Text) -> Any

Alias for kd.types.DataBag.with_name

kd.with_print(x, *args, sep=' ', end='\n')

Alias for kd.core.with_print

kd.with_schema(x, schema)

Alias for kd.schema.with_schema

kd.with_schema_from_obj(x)

Alias for kd.schema.with_schema_from_obj

kd.with_timestamp(x)

Alias for kd.core.with_timestamp

kd.xor(x, y)

Alias for kd.masking.xor

kd.zip(*args)

Alias for kd.slices.zip