The kd module is the common container for all operators, whereas the kd.eager and
kd.lazy modules are containers for explicitly eager and lazy operators,
respectively.
While most of the operators below have both eager and lazy versions (e.g.
kd.eager.agg_sum vs kd.lazy.agg_sum), some functions
(e.g. kd.sub(expr, *subs)) only have an eager version. Such functions often take
Exprs or Functors as inputs, so a lazy version would not make sense for them.
Note that operators from extension modules (e.g. kd_ext.npkd) are not
included in the kd.* namespace.
| Subcategory | Description |
|---|---|
| allocation | Operators that allocate new ItemIds. |
| annotation | Annotation operators. |
| assertion | Operators that assert properties of DataSlices. |
| bags | Operators that work on DataBags. |
| bitwise | Bitwise operators. |
| comparison | Operators that compare DataSlices. |
| core | Core operators that are not part of other categories. |
| curves | Operators working with curves. |
| dicts | Operators working with dictionaries. |
| entities | Operators that work solely with entities. |
| expr | Expr utilities. |
| extension_types | Extension type operators. |
| file_io | Utilities for interacting with the file system. |
| functor | Operators to create and call functors. |
| ids | Operators that work with ItemIds. |
| iterables | Operators that work with iterables. |
| json | JSON serialization and parsing operators. |
| json_stream | JSON stream processing operators. |
| lists | Operators working with lists. |
| masking | Masking operators. |
| math | Arithmetic operators. |
| objs | Operators that work solely with objects. |
| optools | Operator definition and registration tooling. |
| parallel | Tools to evaluate functors in parallel. |
| proto | Protocol buffer serialization and parsing operators. |
| py | Operators that call Python functions. |
| qtypes | Koda QTypes. |
| random | Random and sampling operators. |
| s11n | Serialization and deserialization utilities. |
| schema | Schema-related operators. |
| schema_filters | Schema filter operators and constants. |
| shapes | Operators that work on shapes. |
| slices | Operators that perform DataSlice transformations. |
| streams | Operators that work with streams of items. |
| strings | Operators that work with strings data. |
| testing | A front-end module for kd.testing.*. |
| tuples | Operators to create tuples. |
| type_checking | Utilities to annotate functions with type checking. |
| types | Types used as type annotations in users' code. |
kd.BOOLEAN
SchemaItem representing booleans.
kd.BYTES
SchemaItem representing byte strings.
kd.EXPR
SchemaItem representing expressions.
kd.FLOAT32
SchemaItem representing 32-bit floats.
kd.FLOAT64
SchemaItem representing 64-bit floats.
kd.INT32
SchemaItem representing 32-bit integers.
kd.INT64
SchemaItem representing 64-bit integers.
kd.ITEMID
SchemaItem representing ItemIds.
kd.MASK
SchemaItem representing masks.
kd.NONE
SchemaItem representing the None schema.
kd.OBJECT
SchemaItem representing generic objects.
kd.SCHEMA
SchemaItem representing schemas.
kd.STRING
SchemaItem representing Unicode strings.
kd.SWITCH_DEFAULT
Marks the default case in a kd.switch() construct.
kd.agg_all(x, ndim=unspecified)
Alias for kd.masking.agg_all
kd.agg_any(x, ndim=unspecified)
Alias for kd.masking.agg_any
kd.agg_count(x, ndim=unspecified)
Alias for kd.slices.agg_count
kd.agg_has(x, ndim=unspecified)
Alias for kd.masking.agg_has
kd.agg_max(x, ndim=unspecified)
Alias for kd.math.agg_max
kd.agg_min(x, ndim=unspecified)
Alias for kd.math.agg_min
kd.agg_size(x, ndim=unspecified)
Alias for kd.slices.agg_size
kd.agg_sum(x, ndim=unspecified)
Alias for kd.math.agg_sum
kd.agg_uuid(x, ndim=unspecified)
Alias for kd.ids.agg_uuid
kd.align(*args)
Alias for kd.slices.align
kd.all(x)
Alias for kd.masking.all
kd.any(x)
Alias for kd.masking.any
kd.appended_list(x, append)
Alias for kd.lists.appended_list
kd.apply_mask(x, y)
Alias for kd.masking.apply_mask
kd.apply_py(fn, *args, return_type_as=unspecified, **kwargs)
Alias for kd.py.apply_py
kd.argmax(x, ndim=unspecified)
Alias for kd.math.argmax
kd.argmin(x, ndim=unspecified)
Alias for kd.math.argmin
kd.at(x, indices)
Alias for kd.slices.at
kd.attr(x, attr_name, value, overwrite_schema=False)
Alias for kd.core.attr
kd.attrs(x, /, *, overwrite_schema=False, **attrs)
Alias for kd.core.attrs
kd.bag()
Alias for kd.types.DataBag.empty
kd.bind(fn_def: DataItem, /, *args: Any, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, **kwargs: Any) -> DataItem
Alias for kd.functor.bind
kd.bitwise_and(x, y)
Alias for kd.bitwise.bitwise_and
kd.bitwise_count(x)
Alias for kd.bitwise.count
kd.bitwise_invert(x)
Alias for kd.bitwise.invert
kd.bitwise_or(x, y)
Alias for kd.bitwise.bitwise_or
kd.bitwise_xor(x, y)
Alias for kd.bitwise.bitwise_xor
kd.bool(x: Any) -> DataSlice
Alias for kd.slices.bool
kd.bytes(x: Any) -> DataSlice
Alias for kd.slices.bytes
kd.call(fn, *args, return_type_as=None, **kwargs)
Alias for kd.functor.call
kd.cast_to(x, schema)
Alias for kd.schema.cast_to
kd.check_inputs(**kw_constraints: TypeConstraint)
Alias for kd.type_checking.check_inputs
kd.check_output(constraint: TypeConstraint)
Alias for kd.type_checking.check_output
kd.cityhash(x, seed)
Alias for kd.random.cityhash
kd.clone(x, /, *, itemid=unspecified, schema=unspecified, **overrides)
Alias for kd.core.clone
kd.clone_as_full(x, /, *, itemid=unspecified, **overrides)
Alias for kd.core.clone_as_full
kd.coalesce(x, y)
Alias for kd.masking.coalesce
kd.collapse(x, ndim=unspecified)
Alias for kd.slices.collapse
kd.concat(*args, ndim=1)
Alias for kd.slices.concat
kd.concat_lists(*lists: DataSlice) -> DataSlice
Alias for kd.lists.concat
kd.cond(condition, yes, no=None)
Alias for kd.masking.cond
kd.count(x)
Alias for kd.slices.count
kd.cum_count(x, ndim=unspecified)
Alias for kd.slices.cum_count
kd.cum_max(x, ndim=unspecified)
Alias for kd.math.cum_max
kd.decode_itemid(ds)
Alias for kd.ids.decode_itemid
kd.deep_clone(x, /, schema=unspecified, **overrides)
Alias for kd.core.deep_clone
kd.deep_uuid(x, /, schema=unspecified, *, seed='')
Alias for kd.ids.deep_uuid
kd.del_attr(x: DataSlice, attr_name: str)
Deletes an attribute `attr_name` from `x`.
kd.dense_rank(x, descending=False, ndim=unspecified)
Alias for kd.slices.dense_rank
kd.dict(items_or_keys: Any | None = None, values: Any | None = None, *, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.dicts.new
kd.dict_like(shape_and_mask_from: DataSlice, /, items_or_keys: Any | None = None, values: Any | None = None, *, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.dicts.like
kd.dict_schema(key_schema, value_schema)
Alias for kd.schema.dict_schema
kd.dict_shaped(shape: JaggedShape, /, items_or_keys: Any | None = None, values: Any | None = None, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.dicts.shaped
kd.dict_shaped_as(shape_from: DataSlice, /, items_or_keys: Any | None = None, values: Any | None = None, key_schema: DataSlice | None = None, value_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.dicts.shaped_as
kd.dict_size(dict_slice)
Alias for kd.dicts.size
kd.dict_update(x, keys, values=unspecified)
Alias for kd.dicts.dict_update
kd.dir(x: DataSlice, *, intersection: bool | None = None) -> list[str]
Returns a sorted list of unique attribute names of the given DataSlice.
In case of OBJECT schema, attribute names are fetched from the `__schema__`
attribute. In case of Entity schema, the attribute names are fetched from the
schema. In case of primitives, an empty list is returned.
Args:
x: A DataSlice.
intersection: If True, the intersection of all object attributes is
returned. If False, the union is returned. If not specified, raises an
error if objects have different attributes.
Returns:
A list of unique attributes sorted by alphabetical order.
kd.disjoint_coalesce(x, y)
Alias for kd.masking.disjoint_coalesce
kd.duck_dict(key_constraint: TypeConstraint, value_constraint: TypeConstraint)
Alias for kd.type_checking.duck_dict
kd.duck_list(item_constraint: TypeConstraint)
Alias for kd.type_checking.duck_list
kd.duck_type(**kwargs: TypeConstraint)
Alias for kd.type_checking.duck_type
kd.dump(x: DataSlice | DataBag, path: str, /, *, overwrite: bool = False, riegeli_options: str | None = None, fs: Any | None = None) -> None
Alias for kd.s11n.dump
kd.dumps(x: DataSlice | DataBag, /, *, riegeli_options: str | None = None) -> bytes
Alias for kd.s11n.dumps
kd.embed_schema(x: DataSlice) -> DataSlice
Returns a DataSlice with OBJECT schema.
* For primitives, the data is not changed.
* For entities, the schema is stored as the '__schema__' attribute.
* Embedding entities requires the DataSlice to be associated with a DataBag.
Args:
x: The DataSlice whose schema is embedded.
kd.empty_shaped(shape, schema=MASK)
Alias for kd.slices.empty_shaped
kd.empty_shaped_as(shape_from, schema=MASK)
Alias for kd.slices.empty_shaped_as
kd.encode_itemid(ds)
Alias for kd.ids.encode_itemid
kd.enriched(ds, *bag)
Alias for kd.core.enriched
kd.enriched_bag(*bags)
Alias for kd.bags.enriched
kd.equal(x, y)
Alias for kd.comparison.equal
kd.eval(expr: Any, self_input: Any = UNSPECIFIED_SELF_INPUT, /, **input_values: Any) -> Any
Returns the expr evaluated on the given `input_values`.
Only Koda Inputs from container `I` (e.g. `I.x`) can be evaluated. Other
input types must be substituted before calling this function.
Args:
expr: Koda expression with inputs from container `I`.
self_input: The value for I.self input. When not provided, it will still
have a default value that can be passed to a subroutine.
**input_values: Values to evaluate `expr` with. Note that all inputs in
`expr` must be present in the input values. All input values should either
be DataSlices or convertible to DataSlices.
kd.expand_to(x, target, ndim=unspecified)
Alias for kd.slices.expand_to
kd.expand_to_shape(x, shape, ndim=unspecified)
Alias for kd.shapes.expand_to_shape
kd.experimental_safer_loads(x: bytes) -> Any
Alias for kd.s11n.experimental_safer_loads
kd.explode(x, ndim=1)
Alias for kd.lists.explode
kd.expr_quote(x: Any) -> DataSlice
Alias for kd.slices.expr_quote
kd.extension_type(unsafe_override=False) -> Callable[[type[Any]], type[Any]]
Alias for kd.extension_types.extension_type
kd.extract(ds, schema=unspecified)
Alias for kd.core.extract
kd.extract_update(ds, schema=unspecified)
Alias for kd.core.extract_update
kd.flat_map_chain(iterable, fn, value_type_as=None)
Alias for kd.functor.flat_map_chain
kd.flat_map_interleaved(iterable, fn, value_type_as=None)
Alias for kd.functor.flat_map_interleaved
kd.flatten(x, from_dim=0, to_dim=unspecified)
Alias for kd.shapes.flatten
kd.flatten_end(x, n_times=1)
Alias for kd.shapes.flatten_end
kd.float32(x: Any) -> DataSlice
Alias for kd.slices.float32
kd.float64(x: Any) -> DataSlice
Alias for kd.slices.float64
kd.fn(f: Any, *, use_tracing: bool = True, **kwargs: Any) -> DataItem
Alias for kd.functor.fn
kd.follow(x)
Alias for kd.core.follow
kd.for_(iterable, body_fn, *, finalize_fn=unspecified, condition_fn=unspecified, returns=unspecified, yields=unspecified, yields_interleaved=unspecified, **initial_state)
Alias for kd.functor.for_
kd.format(fmt, /, **kwargs)
Alias for kd.strings.format
kd.freeze(x)
Alias for kd.core.freeze
kd.freeze_bag(x)
Alias for kd.core.freeze_bag
kd.from_json(x, /, schema=OBJECT, default_number_schema=OBJECT, *, on_invalid=[], keys_attr='json_object_keys', values_attr='json_object_values')
Alias for kd.json.from_json
kd.from_proto(messages: Message | list[_NestedMessageContainer] | tuple[_NestedMessageContainer, ...] | None, /, *, extensions: list[str] | None = None, itemid: DataSlice | None = None, schema: DataSlice | None = None) -> DataSlice
Returns a DataSlice representing proto data.
Messages, primitive fields, repeated fields, and maps are converted to
equivalent Koda structures: objects/entities, primitives, lists, and dicts,
respectively. Enums are converted to INT32. The attribute names on the Koda
objects match the field names in the proto definition. See below for methods
to convert proto extensions to attributes alongside regular fields.
If schema is not specified or schema is kd.OBJECT, only present fields in
`messages` are loaded and included in the converted schema. To get a schema
that contains all fields independent of the data, use `kd.schema_from_proto`.
Proto extensions are ignored by default unless `extensions` is specified, or
if an explicit entity schema with parenthesized attr names is specified. If
both are specified, we use the union of the two extension sets.
The format of each extension specified in `extensions` is a dot-separated
sequence of field names and/or extension names, where extension names are
fully-qualified extension paths surrounded by parentheses. This sequence of
fields and extensions is traversed during conversion, in addition to the
default behavior of traversing all fields. For example:
"path.to.field.(package_name.some_extension)"
"path.to.repeated_field.(package_name.some_extension)"
"path.to.map_field.values.(package_name.some_extension)"
"path.(package_name.some_extension).(package_name2.nested_extension)"
If an explicit entity schema attr name starts with "(" and ends with ")" it is
also interpreted as an extension name.
Extensions are looked up using the C++ generated descriptor pool, using
`DescriptorPool::FindExtensionByName`, which requires that all extensions are
compiled in as C++ protos. The Koda attribute names for the extension fields
are parenthesized fully-qualified extension paths (e.g.
"(package_name.some_extension)" or
"(package_name.SomeMessage.some_extension)"). As the names contain '()' and
'.' characters, they cannot be directly accessed using '.name' syntax but can
be accessed using `.get_attr(name)`. For example,
ds.get_attr('(package_name.AbcExtension.abc_extension)')
ds.optional_field.get_attr('(package_name.DefExtension.def_extension)')
If `messages` is a single proto Message, the result is a DataItem. If it is a
nested list of proto Messages, the result is a DataSlice with the same number
of dimensions as the nesting level.
Args:
messages: Message or nested list/tuple of Message of the same type. Any of
the messages may be None, which will produce missing items in the result.
extensions: List of proto extension paths.
itemid: The ItemId(s) to use for the root object(s). If not specified, will
allocate new id(s). If specified, will also infer the ItemIds for all
child items such as List items from this id, so that repeated calls to
this method on the same input will produce the same id(s) for everything.
Use this with care to avoid unexpected collisions.
schema: The schema to use for the return value. Can be set to kd.OBJECT to
(recursively) create an object schema. Can be set to None (default) to
create a uuschema based on the proto descriptor. When set to an entity
schema, some fields may be set to kd.OBJECT to create objects from that
point.
Returns:
A DataSlice representing the proto data.
kd.from_proto_any(messages: Any | list[_NestedAnyMessageContainer] | tuple[_NestedAnyMessageContainer, ...] | None, /, *, extensions: list[str] | None = None, itemid: DataSlice | None = None, schema: DataSlice | None = None, message_type: type[Message] | None = None, descriptor_pool: DescriptorPool | None = None) -> DataSlice
Returns a DataSlice converted from a nested list of proto Any messages.
This function is similar to `from_proto`, but it first unpacks the Any
messages before converting them. Only the top-level Any message is unpacked:
if there are Any fields inside of the unpacked message, they are treated the
same as in `from_proto`.
If `message_type` is provided, all Any messages are unpacked into this type.
Otherwise, the type is inferred from the type URL in each Any message, and
the messages are looked up in the `descriptor_pool`. If `descriptor_pool` is
not provided, the default descriptor pool is used.
If `schema` is not explicitly provided, the resulting DataSlice will have an
OBJECT schema so that inputs with differing message types can be represented.
Args:
messages: google.protobuf.Any message or nested list/tuple of
google.protobuf.Any messages. Any of the messages may be None, which will
produce missing items in the result.
extensions: See `from_proto` for more details.
itemid: See `from_proto` for more details.
schema: See `from_proto` for more details.
message_type: The type to unpack the Any messages into. If None, the type is
inferred from the Any messages.
descriptor_pool: The descriptor pool to use for looking up message types. If
None, the default descriptor pool is used.
Returns:
A DataSlice representing the unpacked and converted proto data.
kd.from_proto_bytes(x, proto_path, /, *, extensions=unspecified, itemids=unspecified, schema=unspecified, on_invalid=unspecified)
Alias for kd.proto.from_proto_bytes
kd.from_proto_json(x, proto_path, /, *, extensions=unspecified, itemids=unspecified, schema=unspecified, on_invalid=unspecified)
Alias for kd.proto.from_proto_json
kd.from_py(py_obj: Any, *, dict_as_obj: bool = False, itemid: DataSlice | None = None, schema: DataSlice | None = OBJECT, from_dim: int = 0) -> DataSlice
Aliases: kd.from_pytree
Converts a Python object into a DataSlice.
Can convert nested lists/dicts into Koda objects recursively as well.
Args:
py_obj: Python object to convert.
dict_as_obj: If True, will convert dicts with string keys into Koda objects
instead of Koda dicts.
itemid: The ItemId to use for the root object. If not specified, will
allocate a new id. If specified, will also infer the ItemIds for all child
items such as list items from this id, so that repeated calls to this
method on the same input will produce the same id for everything. Use this
with care to avoid unexpected collisions.
schema: The schema to use for the return value. When this schema or one of
its attributes is OBJECT (which is also the default), recursively creates
objects from that point on.
from_dim: The dimension to start creating Koda objects/lists/dicts from.
`py_obj` must be a nested list of at least from_dim depth, and the outer
from_dim dimensions will become the returned DataSlice dimensions. When
from_dim is 0, the return value is therefore a DataItem.
Returns:
A DataSlice with the converted data (a DataItem when from_dim is 0).
kd.from_pytree(py_obj: Any, *, dict_as_obj: bool = False, itemid: DataSlice | None = None, schema: DataSlice | None = OBJECT, from_dim: int = 0) -> DataSlice
Alias for kd.from_py
kd.fstr(x)
Alias for kd.strings.fstr
kd.full_equal(x, y)
Alias for kd.comparison.full_equal
kd.get_attr(x, attr_name, default=unspecified)
Alias for kd.core.get_attr
kd.get_attr_names(x)
Alias for kd.core.get_attr_names
kd.get_bag(ds)
Alias for kd.core.get_bag
kd.get_dtype(ds)
Alias for kd.schema.get_dtype
kd.get_item(x, key_or_index)
Alias for kd.core.get_item
kd.get_item_schema(list_schema)
Alias for kd.schema.get_item_schema
kd.get_itemid(x)
Alias for kd.schema.get_itemid
kd.get_key_schema(dict_schema)
Alias for kd.schema.get_key_schema
kd.get_keys(dict_ds)
Alias for kd.dicts.get_keys
kd.get_metadata(x)
Alias for kd.core.get_metadata
kd.get_ndim(x)
Alias for kd.slices.get_ndim
kd.get_nofollowed_schema(schema)
Alias for kd.schema.get_nofollowed_schema
kd.get_obj_schema(x)
Alias for kd.schema.get_obj_schema
kd.get_primitive_schema(ds)
Alias for kd.schema.get_dtype
kd.get_proto_attr(x, field_name)
Alias for kd.proto.get_proto_attr
kd.get_repr(x, /, *, depth=25, item_limit=200, item_limit_per_dimension=25, format_html=False, max_str_len=100, max_expr_quote_len=10000, show_attributes=True, show_databag_id=False, show_shape=False, show_schema=False, show_item_id=False, show_present_count=False)
Alias for kd.slices.get_repr
kd.get_schema(x)
Alias for kd.schema.get_schema
kd.get_shape(x)
Alias for kd.shapes.get_shape
kd.get_value_schema(dict_schema)
Alias for kd.schema.get_value_schema
kd.get_values(dict_ds, key_ds=unspecified)
Alias for kd.dicts.get_values
kd.greater(x, y)
Alias for kd.comparison.greater
kd.greater_equal(x, y)
Alias for kd.comparison.greater_equal
kd.group_by(x, *keys, sort=False)
Alias for kd.slices.group_by
kd.group_by_indices(*keys, sort=False)
Alias for kd.slices.group_by_indices
kd.has(x)
Alias for kd.masking.has
kd.has_attr(x, attr_name)
Alias for kd.core.has_attr
kd.has_bag(ds)
Alias for kd.core.has_bag
kd.has_dict(x)
Alias for kd.dicts.has_dict
kd.has_entity(x)
Alias for kd.core.has_entity
kd.has_fn(x)
Alias for kd.functor.has_fn
kd.has_list(x)
Alias for kd.lists.has_list
kd.has_not(x)
Alias for kd.masking.has_not
kd.has_primitive(x)
Alias for kd.core.has_primitive
kd.hash_itemid(x)
Alias for kd.ids.hash_itemid
kd.if_(cond, yes_fn, no_fn, *args, return_type_as=None, **kwargs)
Alias for kd.functor.if_
kd.implode(x: DataSlice, /, ndim: int | DataSlice = 1, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.lists.implode
kd.index(x, dim=-1)
Alias for kd.slices.index
kd.int32(x: Any) -> DataSlice
Alias for kd.slices.int32
kd.int64(x: Any) -> DataSlice
Alias for kd.slices.int64
kd.inverse_mapping(x, ndim=unspecified)
Alias for kd.slices.inverse_mapping
kd.inverse_select(ds, fltr)
Alias for kd.slices.inverse_select
kd.is_dict(x)
Alias for kd.dicts.is_dict
kd.is_empty(x)
Alias for kd.slices.is_empty
kd.is_entity(x)
Alias for kd.core.is_entity
kd.is_expandable_to(x, target, ndim=unspecified)
Alias for kd.slices.is_expandable_to
kd.is_expr(obj: Any) -> DataSlice
Returns kd.present if the given object is an Expr and kd.missing otherwise.
kd.is_fn(obj: Any) -> DataItem
Alias for kd.functor.is_fn
kd.is_item(obj: Any) -> DataSlice
Returns kd.present if the given object is a scalar DataItem and kd.missing otherwise.
kd.is_list(x)
Alias for kd.lists.is_list
kd.is_nan(x)
Alias for kd.math.is_nan
kd.is_null_bag(bag)
Alias for kd.bags.is_null_bag
kd.is_primitive(x)
Alias for kd.core.is_primitive
kd.is_shape_compatible(x, y)
Alias for kd.slices.is_shape_compatible
kd.is_slice(obj: Any) -> DataSlice
Returns kd.present if the given object is a DataSlice and kd.missing otherwise.
kd.isin(x, y)
Alias for kd.slices.isin
kd.item(x, /, schema=None)
Alias for kd.types.DataItem.from_vals
kd.less(x, y)
Alias for kd.comparison.less
kd.less_equal(x, y)
Alias for kd.comparison.less_equal
kd.list(items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Creates list(s) by collapsing `items` into an immutable list.
If there is no argument, returns an empty Koda List.
If the argument is a Python list, creates a nested Koda List.
Examples:
list() -> a single empty Koda List
list([1, 2, 3]) -> Koda List with items 1, 2, 3
list([[1, 2, 3], [4, 5]]) -> nested Koda List [[1, 2, 3], [4, 5]] whose items are Koda Lists
Args:
items: The items to use. If not specified, an empty list of OBJECTs will be
created.
item_schema: the schema of the list items. If not specified, it will be
deduced from `items` or defaulted to OBJECT.
schema: The schema to use for the list. If specified, then item_schema must
not be specified.
itemid: Optional ITEMID DataSlice used as ItemIds of the resulting lists.
Returns:
The slice with list/lists.
kd.list_append_update(x, append)
Alias for kd.lists.list_append_update
kd.list_like(shape_and_mask_from: DataSlice, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.lists.like
kd.list_schema(item_schema)
Alias for kd.schema.list_schema
kd.list_shaped(shape: JaggedShape, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.lists.shaped
kd.list_shaped_as(shape_from: DataSlice, /, items: Any | None = None, *, item_schema: DataSlice | None = None, schema: DataSlice | None = None, itemid: DataSlice | None = None) -> DataSlice
Alias for kd.lists.shaped_as
kd.list_size(list_slice)
Alias for kd.lists.size
kd.load(path: str, /, *, fs: Any | None = None) -> Any
Alias for kd.s11n.load
kd.loads(x: bytes) -> Any
Alias for kd.s11n.loads
kd.map(fn, *args, include_missing=False, **kwargs)
Alias for kd.functor.map
kd.map_py(fn, *args, schema=None, max_threads=1, ndim=0, include_missing=None, dict_as_obj=False, item_completed_callback=None, **kwargs)
Alias for kd.py.map_py
kd.map_py_on_cond(true_fn, false_fn, cond, *args, schema=None, max_threads=1, dict_as_obj=False, item_completed_callback=None, **kwargs)
Alias for kd.py.map_py_on_cond
kd.map_py_on_selected(fn, cond, *args, schema=None, max_threads=1, dict_as_obj=False, item_completed_callback=None, **kwargs)
Alias for kd.py.map_py_on_selected
kd.mask(x: Any) -> DataSlice
Alias for kd.slices.mask
kd.mask_and(x, y)
Alias for kd.masking.mask_and
kd.mask_equal(x, y)
Alias for kd.masking.mask_equal
kd.mask_not_equal(x, y)
Alias for kd.masking.mask_not_equal
kd.mask_or(x, y)
Alias for kd.masking.mask_or
kd.max(x)
Alias for kd.math.max
kd.maximum(x, y)
Alias for kd.math.maximum
kd.maybe(x, attr_name)
Alias for kd.core.maybe
kd.metadata(x, /, **attrs)
Alias for kd.core.metadata
kd.min(x)
Alias for kd.math.min
kd.minimum(x, y)
Alias for kd.math.minimum
kd.missing
A mask value representing absence.
kd.mutable_bag()
Alias for kd.types.DataBag.empty_mutable
kd.named_container()
A container that automatically names expressions.
In eager mode, non-expression inputs are stored as-is. In tracing mode,
they are converted to expressions (functions and lambdas are automatically
traced).
Example:
c = kd.named_container()
# 1. Non-tracing mode
# Storing a value:
c.foo = 5
c.foo # Returns 5
# Storing an expression:
c.x_plus_y = I.x + I.y
c.x_plus_y # Returns (I.x + I.y).with_name('x_plus_y')
# Listing stored items:
vars(c) # Returns {'foo': 5, 'x_plus_y': (I.x + I.y).with_name('x_plus_y')}
# 2. Tracing mode
def my_fn(x):
  c = kd.named_container()
  c.a = 2
  c.b = 1
  return c.a * x + c.b
fn = kd.fn(my_fn)
fn.a # Returns 2 (accessible because it was named by the container)
fn(x=5) # Returns 11
kd.named_schema(name, /, **kwargs)
Alias for kd.schema.named_schema
kd.namedtuple(**kwargs)
Alias for kd.tuples.namedtuple
kd.new(*, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.entities.new
kd.new_dictid()
Alias for kd.allocation.new_dictid
kd.new_dictid_like(shape_and_mask_from)
Alias for kd.allocation.new_dictid_like
kd.new_dictid_shaped(shape)
Alias for kd.allocation.new_dictid_shaped
kd.new_dictid_shaped_as(shape_from)
Alias for kd.allocation.new_dictid_shaped_as
kd.new_itemid()
Alias for kd.allocation.new_itemid
kd.new_itemid_like(shape_and_mask_from)
Alias for kd.allocation.new_itemid_like
kd.new_itemid_shaped(shape)
Alias for kd.allocation.new_itemid_shaped
kd.new_itemid_shaped_as(shape_from)
Alias for kd.allocation.new_itemid_shaped_as
kd.new_like(shape_and_mask_from: DataSlice, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.entities.like
kd.new_listid()
Alias for kd.allocation.new_listid
kd.new_listid_like(shape_and_mask_from)
Alias for kd.allocation.new_listid_like
kd.new_listid_shaped(shape)
Alias for kd.allocation.new_listid_shaped
kd.new_listid_shaped_as(shape_from)
Alias for kd.allocation.new_listid_shaped_as
kd.new_shaped(shape: JaggedShape, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.entities.shaped
kd.new_shaped_as(shape_from: DataSlice, /, *, schema: DataSlice | str | None = None, overwrite_schema: bool = False, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.entities.shaped_as
kd.no_bag(ds)
Alias for kd.core.no_bag
kd.nofollow(x)
Alias for kd.core.nofollow
kd.nofollow_schema(schema)
Alias for kd.schema.nofollow_schema
kd.not_equal(x, y)
Alias for kd.comparison.not_equal
kd.obj(arg: Any = unspecified, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.objs.new
kd.obj_like(shape_and_mask_from: DataSlice, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.objs.like
kd.obj_shaped(shape: JaggedShape, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.objs.shaped
kd.obj_shaped_as(shape_from: DataSlice, /, *, itemid: DataSlice | None = None, **attrs: Any) -> DataSlice
Alias for kd.objs.shaped_as
kd.ordinal_rank(x, tie_breaker=unspecified, descending=False, ndim=unspecified)
Alias for kd.slices.ordinal_rank
kd.present
A mask value representing presence.
kd.present_like(x)
Alias for kd.masking.present_like
kd.present_shaped(shape)
Alias for kd.masking.present_shaped
kd.present_shaped_as(x)
Alias for kd.masking.present_shaped_as
kd.pwl_curve(p, adjustments)
Alias for kd.curves.pwl_curve
kd.py_fn(f: Callable[..., Any], *, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, **defaults: Any) -> DataItem
Alias for kd.functor.py_fn
kd.py_reference(obj: Any) -> PyObject
Wraps `obj` into an Arolla QValue that uses a reference for serialization.
py_reference can be used to pass arbitrary python objects through
kd.apply_py/kd.py_fn.
Note that using reference for serialization means that the resulting
QValue (and Exprs created using it) will only be valid within the
same process. Trying to deserialize it in a different process
will result in an exception.
Args:
obj: the python object to wrap.
Returns:
The wrapped python object as Arolla QValue.
kd.randint_like(x, low=unspecified, high=unspecified, seed=unspecified)
Alias for kd.random.randint_like
kd.randint_shaped(shape, low=unspecified, high=unspecified, seed=unspecified)
Alias for kd.random.randint_shaped
kd.randint_shaped_as(x, low=unspecified, high=unspecified, seed=unspecified)
Alias for kd.random.randint_shaped_as
kd.range(start, end=unspecified)
Alias for kd.slices.range
kd.ref(ds)
Alias for kd.core.ref
kd.register_py_fn(f: Callable[..., Any], *, return_type_as: Any = <class 'koladata.types.data_slice.DataSlice'>, unsafe_override: bool = False, **defaults: Any) -> DataItem
Alias for kd.functor.register_py_fn
kd.reify(ds, source)
Alias for kd.core.reify
kd.repeat(x, sizes)
Alias for kd.slices.repeat
kd.repeat_present(x, sizes)
Alias for kd.slices.repeat_present
kd.reshape(x, shape)
Alias for kd.shapes.reshape
kd.reshape_as(x, shape_from)
Alias for kd.shapes.reshape_as
kd.reverse(ds)
Alias for kd.slices.reverse
kd.reverse_select(ds, fltr)
Alias for kd.slices.inverse_select
kd.sample(x, ratio, seed, key=unspecified)
Alias for kd.random.sample
kd.sample_n(x, n, seed, key=unspecified)
Alias for kd.random.sample_n
kd.schema_from_proto(message_class: type[Message], /, *, extensions: list[str] | None = None) -> SchemaItem
Returns a Koda schema representing a proto message class.
This is similar to `from_proto(x).get_schema()` when `x` is an instance of
`message_class`, except that it eagerly adds all non-extension fields to the
schema instead of only adding fields that have data populated in `x`.
The returned schema is a uuschema whose itemid is a function of the proto
message class' fully qualified name, and any child message classes' schemas
are also uuschemas derived in the same way. The returned schema has the same
itemid as `from_proto(message_class()).get_schema()`.
The format of each extension specified in `extensions` is a dot-separated
sequence of field names and/or extension names, where extension names are
fully-qualified extension paths surrounded by parentheses. For example:
"path.to.field.(package_name.some_extension)"
"path.to.repeated_field.(package_name.some_extension)"
"path.to.map_field.values.(package_name.some_extension)"
"path.(package_name.some_extension).(package_name2.nested_extension)"
Args:
message_class: A proto message class to convert.
extensions: List of proto extension paths.
Returns:
A SchemaItem containing the converted schema.
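The extension path format accepted by `extensions` in `kd.schema_from_proto` can be illustrated with a small pure-Python parser. This is a sketch for understanding the format only: `split_extension_path` is a hypothetical helper, not part of Koda, and it assumes that dots inside parentheses always belong to an extension's fully-qualified name.

```python
def _component(chars: list[str]) -> tuple[str, bool]:
    """Turns the characters of one path component into (name, is_extension)."""
    text = "".join(chars)
    if text.startswith("(") and text.endswith(")"):
        # Parenthesized component: a fully-qualified extension name.
        return text[1:-1], True
    # Plain component: a regular field name (e.g. "values" for map fields).
    return text, False


def split_extension_path(path: str) -> list[tuple[str, bool]]:
    """Splits an extension path into (name, is_extension) components.

    Hypothetical helper: only dots outside parentheses separate components.
    """
    components: list[tuple[str, bool]] = []
    current: list[str] = []
    depth = 0
    for ch in path:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' in {path!r}")
        if ch == "." and depth == 0:
            components.append(_component(current))
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError(f"unbalanced '(' in {path!r}")
    components.append(_component(current))
    return components


print(split_extension_path("path.to.repeated_field.(package_name.some_extension)"))
# [('path', False), ('to', False), ('repeated_field', False),
#  ('package_name.some_extension', True)]
```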
kd.schema_from_proto_path(proto_path, /, *, extensions=Entity:#5ikYYvXepp19g47QDLnJR2)
Alias for kd.proto.schema_from_proto_path
kd.schema_from_py(tpe: type[Any]) -> SchemaItem
Alias for kd.schema.schema_from_py
kd.select(ds, fltr, expand_filter=True)
Alias for kd.slices.select
kd.select_items(ds, fltr)
Alias for kd.lists.select_items
kd.select_keys(ds, fltr)
Alias for kd.dicts.select_keys
kd.select_present(ds)
Alias for kd.slices.select_present
kd.select_values(ds, fltr)
Alias for kd.dicts.select_values
kd.set_attr(x: DataSlice, attr_name: str, value: Any, overwrite_schema: bool = False)
Sets an attribute `attr_name` to `value`.
If `overwrite_schema` is True and `x` is either an Entity with an explicit
schema or an Object where some items are entities with explicit schemas, the
schema of `x` (or of those items) is first updated with `value`'s schema.
Args:
x: a DataSlice on which to set the attribute. Must have DataBag attached.
attr_name: attribute name
value: a DataSlice or convertible to a DataSlice that will be assigned as an
attribute.
overwrite_schema: whether to overwrite the schema before setting an
attribute.
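As a rough mental model of `overwrite_schema`, the pure-Python toy below represents an entity's explicit schema as a dict of attribute types. It is an analogy only: `toy_set_attr` and `SchemaConflictError` are hypothetical, schema compatibility is simplified to exact type equality (Koda's real rules are not modeled), and adding unknown attributes to the schema is an assumption of the toy.

```python
from typing import Any


class SchemaConflictError(TypeError):
    """Raised when a value's type conflicts with the toy explicit schema."""


def toy_set_attr(schema: dict[str, type], attrs: dict[str, Any],
                 attr_name: str, value: Any,
                 overwrite_schema: bool = False) -> None:
    """Toy model of setting an attribute on an entity with an explicit schema.

    `schema` maps attribute names to Python types and `attrs` holds the
    attribute values; both are hypothetical stand-ins, not Koda objects.
    """
    if overwrite_schema:
        # Mirror the documented behavior: update the schema first.
        schema[attr_name] = type(value)
    elif attr_name in schema and schema[attr_name] is not type(value):
        # Simplification: compatibility means exact type equality here.
        raise SchemaConflictError(
            f"{attr_name!r} is {schema[attr_name].__name__}, "
            f"got {type(value).__name__}")
    else:
        # Toy assumption: unknown attributes are added to the schema.
        schema.setdefault(attr_name, type(value))
    attrs[attr_name] = value


schema, attrs = {"x": int}, {"x": 1}
toy_set_attr(schema, attrs, "x", 2)  # same type: allowed
try:
    toy_set_attr(schema, attrs, "x", "two")  # conflicts with the schema
except SchemaConflictError as e:
    print("rejected:", e)
toy_set_attr(schema, attrs, "x", "two", overwrite_schema=True)  # schema updated first
```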
kd.set_attrs(x: DataSlice, *, overwrite_schema: bool = False, **attrs: Any)
Sets multiple attributes on an object / entity.
Args:
x: a DataSlice on which attributes are set. Must have DataBag attached.
overwrite_schema: whether to overwrite the schema before setting an
attribute.
**attrs: attribute values that are converted to DataSlices with DataBag
adoption.
kd.set_schema(x: DataSlice, schema: DataSlice) -> DataSlice
Returns a copy of `x` with the provided `schema`.
If `schema` is an Entity schema and has a different DataBag than `x`, it is
merged into the DataBag of `x`.
It only changes the schema of `x` and does not change the items in `x`. To
change the items in `x`, use `kd.cast_to` instead. For example:
kd.set_schema(kd.ds([1, 2, 3]), kd.FLOAT32) -> fails because the items in
`x` are not compatible with FLOAT32.
kd.cast_to(kd.ds([1, 2, 3]), kd.FLOAT32) -> kd.ds([1.0, 2.0, 3.0])
When the items in `x` are primitives or `schema` is a primitive schema, it
checks that the items and the schema are compatible. When the items are ItemIds
and `schema` is a non-primitive schema, it does not check that the underlying
data matches the schema. For example:
kd.set_schema(kd.ds([1, 2, 3], schema=kd.OBJECT), kd.INT32) -> kd.ds([1, 2, 3])
kd.set_schema(kd.ds([1, 2, 3]), kd.INT64) -> fails
kd.set_schema(kd.ds(1).with_bag(kd.bag()), kd.schema.new_schema(x=kd.INT32)) -> fails
kd.set_schema(kd.new(x=1), kd.INT32) -> fails
kd.set_schema(kd.new(x=1), kd.schema.new_schema(x=kd.INT64)) -> works
Args:
x: DataSlice to change the schema of.
schema: DataSlice containing the new schema.
Returns:
DataSlice with the new schema.
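The difference between `kd.set_schema` (relabel items, failing if they are incompatible) and `kd.cast_to` (convert items) can be mirrored with a plain-Python analogy over lists of primitives. `toy_set_schema` and `toy_cast_to` are hypothetical, cover only the primitive case with exact type matching, and do not model Koda's handling of ItemIds.

```python
from typing import Any, Optional


def toy_set_schema(values: list[Optional[Any]], target: type) -> list[Optional[Any]]:
    """Relabels items without touching them; rejects incompatible items."""
    for v in values:
        # Missing items (None) are compatible with any schema in this toy.
        if v is not None and type(v) is not target:
            raise TypeError(f"{v!r} is not compatible with {target.__name__}")
    return list(values)


def toy_cast_to(values: list[Optional[Any]], target: type) -> list[Optional[Any]]:
    """Converts every present item to `target`; missing items stay None."""
    return [None if v is None else target(v) for v in values]


print(toy_cast_to([1, 2, 3], float))  # [1.0, 2.0, 3.0]
try:
    toy_set_schema([1, 2, 3], float)
except TypeError as e:
    print("set_schema-like relabeling failed:", e)
```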
kd.shallow_clone(x, /, *, itemid=unspecified, schema=unspecified, **overrides)
Alias for kd.core.shallow_clone
kd.shuffle(x, /, ndim=unspecified, seed=unspecified)
Alias for kd.random.shuffle
kd.size(x)
Alias for kd.slices.size
kd.slice(x, /, schema=None)
Alias for kd.types.DataSlice.from_vals
kd.sort(x, sort_by=unspecified, descending=False)
Alias for kd.slices.sort
kd.stack(*args, ndim=0)
Alias for kd.slices.stack
kd.static_when_tracing(base_type: TypeConstraint | None = None) -> _StaticWhenTraced
Alias for kd.type_checking.static_when_tracing
kd.str(x: Any) -> DataSlice
Alias for kd.slices.str
kd.strict_attrs(x, /, **attrs)
Alias for kd.core.strict_attrs
kd.strict_new(*, schema, overwrite_schema=False, itemid=unspecified, **attrs)
Alias for kd.entities.strict_new
kd.strict_with_attrs(x, /, **attrs)
Alias for kd.core.strict_with_attrs
kd.stub(x, attrs=[])
Alias for kd.core.stub
kd.subslice(x, *slices)
Alias for kd.slices.subslice
kd.sum(x)
Alias for kd.math.sum
kd.switch(key, cases, *args, return_type_as=None, **kwargs)
Alias for kd.functor.switch
kd.take(x, indices)
Alias for kd.slices.at
kd.tile(x, shape)
Alias for kd.slices.tile
kd.to_expr(x)
Alias for kd.schema.to_expr
kd.to_itemid(x)
Alias for kd.schema.get_itemid
kd.to_json(x, /, *, indent=None, ensure_ascii=True, keys_attr='json_object_keys', values_attr='json_object_values', include_missing_values=True)
Alias for kd.json.to_json
kd.to_none(x)
Alias for kd.schema.to_none
kd.to_object(x)
Alias for kd.schema.to_object
kd.to_proto(x: DataSlice, /, message_class: type[Message]) -> Message | list[_NestedMessageList] | None
Converts a DataSlice or DataItem to one or more proto messages.
If `x` is a DataItem, this returns a single proto message object. Otherwise,
this returns a nested list of proto message objects with the same size and
shape as the input. Missing items in the input are returned as python None in
place of a message.
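The output-shape rule of `kd.to_proto` (a single message for a DataItem, otherwise a nested list with missing items as None) can be sketched in pure Python. In this analogy a nested Python list stands in for a multi-dimensional DataSlice, a non-list value for a DataItem, and `convert` for the per-item message conversion; `toy_to_messages` is hypothetical and not a Koda API.

```python
from typing import Any, Callable


def toy_to_messages(x: Any, convert: Callable[[Any], Any]) -> Any:
    """Applies `convert` item-wise, preserving nesting and missing items.

    A non-list `x` plays the role of a DataItem and yields a single result;
    a (possibly nested) list plays the role of a DataSlice.
    """
    if isinstance(x, list):
        return [toy_to_messages(item, convert) for item in x]
    # Missing items are returned as None in place of a message.
    return None if x is None else convert(x)


print(toy_to_messages([[1, None], [2]], lambda v: {"id": v}))
# [[{'id': 1}, None], [{'id': 2}]]
```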
Koda data structures are converted to equivalent proto messages, primitive
fields, repeated fields, maps, and enums, based on the proto schema. Koda
entity attributes are converted to message fields with the same name, if
those fields exist, otherwise they are ignored.
Koda slices with mixed underlying dtypes are tolerated wherever the proto
conversion is defined for all dtypes, regardless of schema.
Koda entity attributes that are parenthesized fully-qualified extension
paths (e.g. "(package_name.some_extension)") are converted to extensions,
if those extensions exist in the descriptor pool of the messages' common
descriptor, otherwise they are ignored.
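The attribute-to-field rules in the last two paragraphs can be sketched as a filter over attribute names. Here `field_names` and `known_extensions` are hypothetical stand-ins for the message's fields and for the extensions found in its descriptor pool; the real conversion also handles types, repeated fields, maps, and enums, which this sketch ignores.

```python
from typing import Any


def toy_split_attrs(
    attrs: dict[str, Any],
    field_names: set[str],
    known_extensions: set[str],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Partitions entity attributes into proto fields and extensions.

    Attributes that match neither a field nor a known extension are dropped,
    mirroring the "otherwise they are ignored" rules above.
    """
    fields: dict[str, Any] = {}
    extensions: dict[str, Any] = {}
    for name, value in attrs.items():
        if name.startswith("(") and name.endswith(")"):
            ext_name = name[1:-1]  # strip the surrounding parentheses
            if ext_name in known_extensions:
                extensions[ext_name] = value
        elif name in field_names:
            fields[name] = value
    return fields, extensions


attrs = {"id": 1, "extra": 2, "(pkg.ext)": 3, "(pkg.unknown)": 4}
print(toy_split_attrs(attrs, {"id"}, {"pkg.ext"}))
# ({'id': 1}, {'pkg.ext': 3})
```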
Args:
x: DataSlice to convert.
message_class: A proto message class.
Returns:
A converted proto message or list of converted proto messages.
kd.to_proto_any(x: DataSlice, *, descriptor_pool: DescriptorPool | None = None, deterministic: bool = False) -> Any | list[_NestedAnyMessageList] | None
Converts a DataSlice or DataItem to proto Any messages.
The schemas of all present values in `x` must have been derived from a proto
schema using `from_proto` or `schema_from_proto`, so that the original names
of the message types are embedded in the schema. Otherwise, this will fail.
Args:
x: DataSlice to convert.
descriptor_pool: Overrides the descriptor pool used to look up python proto
message classes based on proto message type full name. If None, the
default descriptor pool is used.
deterministic: Passed to Any.Pack.
Returns:
A proto Any message or nested list of proto Any messages with the same
shape as the input. Missing elements in the input are None in the output.
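The requirement that schemas carry the original message type name exists because packing into `Any` needs that name to find the message class and build the type URL. A minimal pure-Python sketch of that lookup step follows; `toy_pack_any` and `registry` are hypothetical (the dict stands in for a descriptor pool), the returned dict is not a real Any message, and `type.googleapis.com/` is the default type URL prefix used by protobuf's `Any.Pack`.

```python
from typing import Any, Optional

# Default type URL prefix used by protobuf's Any.Pack.
TYPE_URL_PREFIX = "type.googleapis.com/"


def toy_pack_any(full_name: Optional[str], payload: dict[str, Any],
                 registry: dict[str, type]) -> dict[str, Any]:
    """Toy Any packing keyed by the message full name embedded in a schema."""
    if full_name is None:
        # The schema was not derived from a proto, so there is no type name.
        raise ValueError("schema does not carry a proto message type name")
    message_class = registry.get(full_name)
    if message_class is None:
        raise KeyError(f"{full_name!r} is not in the registry")
    return {"type_url": TYPE_URL_PREFIX + full_name,
            "message": message_class(**payload)}


print(toy_pack_any("pkg.Point", {"x": 1}, {"pkg.Point": dict})["type_url"])
# type.googleapis.com/pkg.Point
```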
kd.to_proto_bytes(x, proto_path, /)
Alias for kd.proto.to_proto_bytes
kd.to_proto_json(x, proto_path, /)
Alias for kd.proto.to_proto_json
kd.to_py(ds: DataSlice, *, max_depth: int = 2, obj_as_dict: bool = False, include_missing_attrs: bool = True, output_class: type[Any] | None = None) -> Any
Returns a readable Python object from a DataSlice.
Attributes, lists, and dicts are recursively converted to Python objects.
Args:
ds: A DataSlice
max_depth: Maximum depth for recursive conversion. Each access to an
attribute, a list item, or a dict key / value counts as one depth increment.
Use -1 for unlimited depth. If output_class is set, this is ignored and the
depth is determined by the output_class.
obj_as_dict: Whether to convert objects to python dicts. By default objects
are converted to automatically constructed 'Obj' dataclass instances.
include_missing_attrs: whether to include attributes with None value in
objects.
output_class: If not None, will be used recursively as the output type.
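The `max_depth` and `obj_as_dict` arguments of `kd.to_py` can be illustrated with a pure-Python toy over nested dicts, lists, and a hypothetical `ToyObj` class. This is an analogy under stated assumptions: the root counts as depth 0, values deeper than `max_depth` are returned unconverted (the sketch does not claim this is exactly what `kd.to_py` returns for them), and `include_missing_attrs` and `output_class` are not modeled.

```python
import dataclasses
from typing import Any


class ToyObj:
    """Hypothetical stand-in for a Koda object; attributes live in a dict."""

    def __init__(self, **attrs: Any) -> None:
        self.attrs = attrs


def toy_to_py(x: Any, max_depth: int = 2, obj_as_dict: bool = False,
              _depth: int = 0) -> Any:
    """Depth-limited conversion; the root is at depth 0 (toy assumption)."""
    if max_depth != -1 and _depth > max_depth:
        # Too deep: return the value unconverted (toy assumption).
        return x

    def child(v: Any) -> Any:
        # Each attribute, list item, or dict key/value access adds one level.
        return toy_to_py(v, max_depth, obj_as_dict, _depth + 1)

    if isinstance(x, ToyObj):
        converted = {name: child(v) for name, v in x.attrs.items()}
        if obj_as_dict:
            return converted
        # Default: an ad-hoc dataclass named 'Obj', as the docstring describes.
        obj_class = dataclasses.make_dataclass("Obj", list(converted))
        return obj_class(**converted)
    if isinstance(x, list):
        return [child(v) for v in x]
    if isinstance(x, dict):
        return {child(k): child(v) for k, v in x.items()}
    return x


root = ToyObj(a=ToyObj(b=ToyObj(c=1)))
# With max_depth=1, the value of 'b' sits at depth 2 and stays a ToyObj.
print(toy_to_py(root, max_depth=1, obj_as_dict=True))
```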
kd.to_pylist(x: DataSlice) -> list[Any]
Expands the outermost DataSlice dimension into a list of DataSlices.
kd.to_pytree(ds: DataSlice, max_depth: int = 2, include_missing_attrs: bool = True) -> Any
No description
kd.to_schema(x)
Alias for kd.schema.to_schema
kd.trace_as_fn(*, name: str | None = None, return_type_as: Any = None, functor_factory: FunctorFactory | None = None)
Alias for kd.functor.trace_as_fn
kd.trace_py_fn(f: Callable[..., Any], *, auto_variables: bool = True, **defaults: Any) -> DataItem
Alias for kd.functor.trace_py_fn
kd.translate(keys_to, keys_from, values_from)
Alias for kd.slices.translate
kd.translate_group(keys_to, keys_from, values_from)
Alias for kd.slices.translate_group
kd.tuple(*args)
Alias for kd.tuples.tuple
kd.unique(x, sort=False)
Alias for kd.slices.unique
kd.update_schema(obj: DataSlice, **attr_schemas: Any) -> DataSlice
Updates the schema of `obj` DataSlice using given schemas for attrs.
kd.updated(ds, *bag)
Alias for kd.core.updated
kd.updated_bag(*bags)
Alias for kd.bags.updated
kd.uu(seed: str | None = None, *, schema: DataSlice | None = None, overwrite_schema: bool = False, **attrs: Any) -> DataSlice
Alias for kd.entities.uu
kd.uu_schema(seed='', **kwargs)
Alias for kd.schema.uu_schema
kd.uudict(items_or_keys, values=unspecified, *, key_schema=unspecified, value_schema=unspecified, schema=unspecified, seed='')
Alias for kd.dicts.uu
kd.uuid(seed='', **kwargs)
Alias for kd.ids.uuid
kd.uuid_for_dict(seed='', **kwargs)
Alias for kd.ids.uuid_for_dict
kd.uuid_for_list(seed='', **kwargs)
Alias for kd.ids.uuid_for_list
kd.uuids_with_allocation_size(seed='', *, size)
Alias for kd.ids.uuids_with_allocation_size
kd.uulist(items=unspecified, *, item_schema=unspecified, schema=unspecified, seed='')
Alias for kd.lists.uu
kd.uuobj(seed: str | None = None, **attrs: Any) -> DataSlice
Alias for kd.objs.uu
kd.val_like(x, val)
Alias for kd.slices.val_like
kd.val_shaped(shape, val)
Alias for kd.slices.val_shaped
kd.val_shaped_as(x, val)
Alias for kd.slices.val_shaped_as
kd.while_(condition_fn, body_fn, *, returns=unspecified, yields=unspecified, yields_interleaved=unspecified, **initial_state)
Alias for kd.functor.while_
kd.with_attr(x, attr_name, value, overwrite_schema=False)
Alias for kd.core.with_attr
kd.with_attrs(x, /, *, overwrite_schema=False, **attrs)
Alias for kd.core.with_attrs
kd.with_bag(ds, bag)
Alias for kd.core.with_bag
kd.with_dict_update(x, keys, values=unspecified)
Alias for kd.dicts.with_dict_update
kd.with_list_append_update(x, append)
Alias for kd.lists.with_list_append_update
kd.with_merged_bag(ds)
Alias for kd.core.with_merged_bag
kd.with_metadata(x, /, **attrs)
Alias for kd.core.with_metadata
kd.with_name(obj: Any, name: str | Text) -> Any
Alias for kd.types.DataBag.with_name
kd.with_print(x, *args, sep=' ', end='\n')
Alias for kd.core.with_print
kd.with_schema(x, schema)
Alias for kd.schema.with_schema
kd.with_schema_from_obj(x)
Alias for kd.schema.with_schema_from_obj
kd.with_timestamp(x)
Alias for kd.core.with_timestamp
kd.xor(x, y)
Alias for kd.masking.xor
kd.zip(*args)
Alias for kd.slices.zip