Schema-related operators.
kd.schema.agg_common_schema(x, ndim=unspecified)
Returns the common schema of `x` along the last `ndim` dimensions.
The "common schema" is defined according to go/koda-type-promotion.
Examples:
kd.agg_common_schema(kd.slice([kd.INT32, None, kd.FLOAT32]))
# -> kd.FLOAT32
kd.agg_common_schema(kd.slice([[kd.INT32, None], [kd.FLOAT32, kd.FLOAT64]]))
# -> kd.slice([kd.INT32, kd.FLOAT64])
kd.agg_common_schema(
kd.slice([[kd.INT32, None], [kd.FLOAT32, kd.FLOAT64]]), ndim=2)
# -> kd.FLOAT64
Args:
x: DataSlice of schemas.
ndim: The number of last dimensions to aggregate over.
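The effect of `ndim` can be sketched with a toy model in plain Python. This is not Koda code: schemas are represented as strings, slices as nested lists, and the linear promotion order below is an assumption chosen only to agree with the examples above (the real rules live in go/koda-type-promotion). The default of `ndim=1` is likewise an assumption inferred from the second example.

```python
# Toy model only: schemas are plain strings, not Koda schema objects.
# ASSUMPTION: a linear promotion order that agrees with the examples above.
PROMOTION_ORDER = ["INT32", "INT64", "FLOAT32", "FLOAT64"]


def common_schema(schemas):
    """Returns the highest-ranked schema in a flat list, skipping None."""
    present = [s for s in schemas if s is not None]
    if not present:
        return None
    return max(present, key=PROMOTION_ORDER.index)


def flatten(nested):
    """Flattens arbitrarily nested lists into a flat list of leaves."""
    if not isinstance(nested, list):
        return [nested]
    return [leaf for item in nested for leaf in flatten(item)]


def depth(nested):
    """Number of list dimensions, assuming non-empty, uniformly nested lists."""
    return 1 + depth(nested[0]) if isinstance(nested, list) else 0


def agg_common_schema(nested, ndim=1):
    """Collapses the last `ndim` dimensions into one common schema each."""
    if not isinstance(nested, list):
        raise ValueError("ndim exceeds the number of dimensions")
    if depth(nested) == ndim:
        return common_schema(flatten(nested))
    return [agg_common_schema(item, ndim) for item in nested]


two_dim = [["INT32", None], ["FLOAT32", "FLOAT64"]]
print(agg_common_schema(two_dim))          # one result per row
print(agg_common_schema(two_dim, ndim=2))  # a single result for the whole input
```

With `ndim=1` each innermost row is reduced on its own, so the outer dimension survives; with `ndim=2` both dimensions are collapsed into a single schema.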
kd.schema.cast_to(x, schema)Aliases:
Returns `x` cast to the provided `schema` using explicit casting rules.
Dispatches to the relevant `kd.to_...` operator. Performs permissive casting,
e.g. allowing FLOAT32 -> INT32 casting through `kd.cast_to(slice, INT32)`.
Note that `x` must be correctly typed with its schema. Thus, if the provided
`schema` is equal to `x.get_schema()`, the operator does nothing.
Args:
x: DataSlice to cast.
schema: Schema to cast to.
kd.schema.cast_to_implicit(x, schema)
Returns `x` cast to the provided `schema` using implicit casting rules.
Note that `schema` must be the common schema of `schema` and `x.get_schema()`
according to go/koda-type-promotion.
Note that `x` must be correctly typed with its schema. Thus, if the provided
`schema` is equal to `x.get_schema()`, the operator does nothing.
Args:
x: DataSlice to cast.
schema: Schema to cast to. Must be a scalar.
kd.schema.cast_to_narrow(x, schema)
Returns `x` cast to the provided `schema`.
Allows for schema narrowing, where OBJECT types can be cast to primitive
schemas as long as the data is implicitly castable to the schema. Follows the
casting rules of `kd.cast_to_implicit` for the narrowed schema.
Note that `x` must be correctly typed with its schema. Thus, if the provided
`schema` is equal to `x.get_schema()`, the operator does nothing.
Args:
x: DataSlice to cast.
schema: Schema to cast to. Must be a scalar.
kd.schema.common_schema(x)
Returns the common schema of `x` as a scalar DataItem.
The "common schema" is defined according to go/koda-type-promotion.
Args:
x: DataSlice of schemas.
kd.schema.deep_cast_to(x, schema, allow_removing_attrs=False, allow_new_attrs=False)
Returns `x` cast to the provided `schema` using explicit casting rules.
In contrast to `kd.cast_to`, this operator always performs deep casting - even
when x.get_schema().get_itemid() == schema.get_itemid().
Args:
x: DataSlice to cast.
schema: Schema to cast to.
allow_removing_attrs: If True, the `schema` may omit attributes that are
present in `x.get_schema()`. The values of such attributes would be
omitted from the result.
allow_new_attrs: If True, the `schema` may have additional attributes that
are not present in `x.get_schema()`. Additional attributes are set to
missing values.
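The two flags can be illustrated with a toy model in plain Python. This is not Koda code: a record is a dict, the target "schema" is a hypothetical mapping from attribute name to a Python conversion function, and the helper name `cast_record` is invented for this sketch.

```python
def cast_record(record, target_attrs,
                allow_removing_attrs=False, allow_new_attrs=False):
    """Toy model: casts a dict to the attribute set given by `target_attrs`."""
    removed = set(record) - set(target_attrs)
    added = set(target_attrs) - set(record)
    if removed and not allow_removing_attrs:
        raise ValueError(f"target schema omits attributes: {sorted(removed)}")
    if added and not allow_new_attrs:
        raise ValueError(f"target schema adds attributes: {sorted(added)}")
    result = {}
    for name, to_type in target_attrs.items():
        value = record.get(name)
        # Attributes new to the target are filled with a missing value (None).
        result[name] = None if value is None else to_type(value)
    return result


# Dropping `b` requires allow_removing_attrs=True.
print(cast_record({"a": 1, "b": 2.5}, {"a": float}, allow_removing_attrs=True))
# Adding `c` requires allow_new_attrs=True; it comes back as missing.
print(cast_record({"a": 1}, {"a": int, "c": str}, allow_new_attrs=True))
```

Both flags default to False in the sketch, mirroring the signature above, so any mismatch between the source and target attribute sets raises unless it is explicitly allowed.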
kd.schema.dict_schema(key_schema, value_schema)Aliases:
Returns a Dict schema with the provided `key_schema` and `value_schema`.
kd.schema.get_dtype(ds)Aliases:
Returns a primitive schema representing the underlying items' dtype.
If `ds` has a primitive schema, this returns that primitive schema, even if
all items in `ds` are missing. If `ds` has an OBJECT schema but contains
primitive values of a single dtype, it returns the schema for that primitive
dtype.
If items in `ds` have non-primitive types or mixed dtypes, returns
a missing schema (i.e. `kd.item(None, kd.SCHEMA)`).
Examples:
kd.get_primitive_schema(kd.slice([1, 2, 3])) -> kd.INT32
kd.get_primitive_schema(kd.slice([None, None, None], kd.INT32)) -> kd.INT32
kd.get_primitive_schema(kd.slice([1, 2, 3], kd.OBJECT)) -> kd.INT32
kd.get_primitive_schema(kd.slice([1, 'a', 3], kd.OBJECT)) -> missing schema
kd.get_primitive_schema(kd.obj()) -> missing schema
Args:
ds: DataSlice to get dtype from.
Returns:
a primitive schema DataSlice.
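The decision rule described above can be sketched as a toy model in plain Python. This is not Koda code: schemas are strings, items are plain Python values, and the mapping from Python types to dtype names is an assumption made for illustration only.

```python
# Toy model: schemas are strings and items are plain Python values.
# ASSUMPTION: this Python-type-to-dtype mapping is for illustration only.
PY_TO_DTYPE = {int: "INT32", float: "FLOAT32", str: "STRING", bool: "BOOLEAN"}
PRIMITIVE_SCHEMAS = set(PY_TO_DTYPE.values())


def get_dtype(items, schema):
    """Returns a dtype name, or None to stand in for a missing schema."""
    if schema in PRIMITIVE_SCHEMAS:
        # A primitive schema wins even when every item is missing.
        return schema
    if schema != "OBJECT":
        return None
    # Collect the dtype of each present item; non-primitives map to None.
    dtypes = {PY_TO_DTYPE.get(type(v)) for v in items if v is not None}
    if len(dtypes) == 1:
        return dtypes.pop()
    # Mixed dtypes. The text above does not say what happens for an
    # all-missing OBJECT slice; this toy returns None in that case too.
    return None


print(get_dtype([1, 2, 3], "INT32"))
print(get_dtype([None, None, None], "INT32"))
print(get_dtype([1, 2, 3], "OBJECT"))
print(get_dtype([1, "a", 3], "OBJECT"))
```

The four calls mirror the first four examples above: the first three yield the INT32 name and the mixed case yields None.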
kd.schema.get_item_schema(list_schema)Aliases:
Returns the item schema of a List schema.
kd.schema.get_itemid(x)Aliases:
Casts `x` to ITEMID using explicit (permissive) casting rules.
kd.schema.get_key_schema(dict_schema)Aliases:
Returns the key schema of a Dict schema.
kd.schema.get_nofollowed_schema(schema)Aliases:
Returns the original schema wrapped by a nofollow schema.
Requires `schema` to be a nofollow schema, i.e. that it wraps some
other schema.
Args:
schema: nofollow schema DataSlice.
kd.schema.get_obj_schema(x)Aliases:
Returns a DataSlice of schemas for Objects and primitives in `x`.
DataSlice `x` must have OBJECT schema.
Examples:
db = kd.bag()
s = db.new_schema(a=kd.INT32)
obj = s(a=1).embed_schema()
kd.get_obj_schema(kd.slice([1, None, 2.0, obj]))
-> kd.slice([kd.INT32, NONE, kd.FLOAT32, s])
Args:
x: OBJECT DataSlice
Returns:
A DataSlice of schemas.
kd.schema.get_primitive_schema(ds)
Alias for kd.schema.get_dtype
kd.schema.get_repr(schema)
Returns a string representation of the schema.
Named schemas are only represented by their name. Other schemas are
represented by their content.
Args:
schema: A scalar schema DataSlice.
Returns:
A scalar string DataSlice. A repr of the given schema.
kd.schema.get_schema(x)Aliases:
Returns the schema of `x`.
kd.schema.get_value_schema(dict_schema)Aliases:
Returns the value schema of a Dict schema.
kd.schema.internal_maybe_named_schema(name_or_schema)
Converts a string to a named schema; passes a schema through unchanged.
The operator also passes through arolla.unspecified, and raises when
it receives anything other than unspecified, a string, or a schema DataItem.
This operator exists to support kd.core.new* family of operators.
Args:
name_or_schema: The input name or schema.
Returns:
The schema unchanged, or a named schema with the given name.
kd.schema.is_dict_schema(x)
Returns true iff `x` is a Dict schema DataItem.
kd.schema.is_entity_schema(x)
Returns true iff `x` is an Entity schema DataItem.
kd.schema.is_list_schema(x)
Returns true iff `x` is a List schema DataItem.
kd.schema.is_primitive_schema(x)
Returns true iff `x` is a primitive schema DataItem.
kd.schema.is_struct_schema(x)
Returns true iff `x` is a Struct schema DataItem.
kd.schema.list_schema(item_schema)Aliases:
Returns a List schema with the provided `item_schema`.
kd.schema.named_schema(name, /, **kwargs)Aliases:
Creates a named entity schema.
A named schema will have its item id derived only from its name, which means
that two named schemas with the same name will have the same item id, even in
different DataBags, or with different kwargs passed to this method.
Args:
name: The name to use to derive the item id of the schema.
**kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
values must be schemas themselves.
Returns:
data_slice.DataSlice with the item id of the required schema and kd.SCHEMA
schema, with a new immutable DataBag attached containing the provided
kwargs.
kd.schema.new_schema(**kwargs)
Creates a new allocated schema.
Args:
**kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
values must be schemas themselves.
Returns:
(DataSlice) containing the schema id.
kd.schema.nofollow_schema(schema)Aliases:
Returns a NoFollow schema of the provided schema.
`nofollow_schema` is reversible with `get_nofollowed_schema`.
`nofollow_schema` can only be called on implicit and explicit schemas and
OBJECT. It raises an Error if called on primitive schemas, ITEMID, etc.
Args:
schema: Schema DataSlice to wrap.
kd.schema.schema_from_py(tpe: type[Any]) -> SchemaItem Aliases:
Creates a Koda entity schema corresponding to the given Python type.
This method supports the following Python types / type annotations
recursively:
- Primitive types: int, float, bool, str, bytes.
- Collections: list[...], dict[...], Sequence[...], Mapping[...], etc.
- Unions: only "smth | None" or "Optional[smth]" is supported.
- Dataclasses.
This can be used in conjunction with kd.from_py to convert lists of Python
objects to efficient Koda DataSlices. Because of the 'efficient' goal, we
create an entity schema and do not use kd.OBJECT inside, which also results
in strict type checking. If you do not care about efficiency or type safety,
you can use kd.from_py(..., schema=kd.OBJECT) directly.
Args:
tpe: The Python type to create a schema for.
Returns:
A Koda entity schema corresponding to the given Python type. The returned
schema is a uu-schema, in other words we always return the same output for
the same input. For dataclasses, we use the module name and the class name
to derive the itemid for the uu-schema.
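The recursive support for primitives, collections, optional unions and dataclasses can be pictured with a small standard-library sketch. This is not how `kd.schema.schema_from_py` is implemented and it produces no Koda schema: the hypothetical helper `describe` only walks a type annotation and returns a string outlining the shape. The output notation (`LIST[...]`, `DICT{...}`, `ENTITY(...)`) and the example dataclass `Point` are invented for illustration.

```python
import dataclasses
import types
import typing
from collections.abc import Mapping, Sequence

# typing.Union covers Optional[...]; types.UnionType covers `X | None`.
_UNION_ORIGINS = (typing.Union, getattr(types, "UnionType", typing.Union))


def describe(tpe):
    """Returns a string sketch of the shape implied by a type annotation."""
    if tpe in (int, float, bool, str, bytes):
        return tpe.__name__
    origin, args = typing.get_origin(tpe), typing.get_args(tpe)
    if origin in _UNION_ORIGINS:
        non_none = [a for a in args if a is not type(None)]
        # Only "smth | None" is accepted, as in the list above.
        if len(non_none) != 1 or len(non_none) == len(args):
            raise TypeError(f"unsupported union: {tpe!r}")
        return describe(non_none[0])
    if origin in (list, Sequence):
        (item,) = args
        return f"LIST[{describe(item)}]"
    if origin in (dict, Mapping):
        key, value = args
        return f"DICT{{{describe(key)}, {describe(value)}}}"
    if dataclasses.is_dataclass(tpe):
        hints = typing.get_type_hints(tpe)
        fields = ", ".join(
            f"{f.name}={describe(hints[f.name])}" for f in dataclasses.fields(tpe)
        )
        return f"ENTITY({fields})"
    raise TypeError(f"unsupported type: {tpe!r}")


@dataclasses.dataclass
class Point:  # hypothetical example type
    x: int
    tags: list[str]
    label: typing.Optional[str] = None


print(describe(Point))                        # ENTITY(x=int, tags=LIST[str], label=str)
print(describe(dict[str, list[float]]))       # DICT{str, LIST[float]}
```

Unions other than "smth | None" raise `TypeError` in the sketch, matching the restriction stated in the list above.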
kd.schema.to_bool(x)
Casts `x` to BOOLEAN using explicit (permissive) casting rules.
kd.schema.to_bytes(x)
Casts `x` to BYTES using explicit (permissive) casting rules.
kd.schema.to_expr(x)Aliases:
Casts `x` to EXPR using explicit (permissive) casting rules.
kd.schema.to_float32(x)
Casts `x` to FLOAT32 using explicit (permissive) casting rules.
kd.schema.to_float64(x)
Casts `x` to FLOAT64 using explicit (permissive) casting rules.
kd.schema.to_int32(x)
Casts `x` to INT32 using explicit (permissive) casting rules.
kd.schema.to_int64(x)
Casts `x` to INT64 using explicit (permissive) casting rules.
kd.schema.to_itemid(x)
Alias for kd.schema.get_itemid
kd.schema.to_mask(x)
Casts `x` to MASK using explicit (permissive) casting rules.
kd.schema.to_none(x)Aliases:
Casts `x` to NONE using explicit (permissive) casting rules.
kd.schema.to_object(x)Aliases:
Casts `x` to OBJECT using explicit (permissive) casting rules.
kd.schema.to_schema(x)Aliases:
Casts `x` to SCHEMA using explicit (permissive) casting rules.
kd.schema.to_str(x)
Casts `x` to STRING using explicit (permissive) casting rules.
kd.schema.uu_schema(seed='', **kwargs)Aliases:
Creates a UUSchema, i.e. a schema keyed by a uuid.
In order to create a different id from the same arguments, use
`seed` argument with the desired value, e.g.
kd.uu_schema(seed='type_1', x=kd.INT32, y=kd.FLOAT32)
and
kd.uu_schema(seed='type_2', x=kd.INT32, y=kd.FLOAT32)
have different ids.
Args:
seed: string seed for the uuid computation.
**kwargs: a named tuple mapping attribute names to DataSlices. The DataSlice
values must be schemas themselves.
Returns:
(DataSlice) containing the schema uuid.
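The idea of an id that is a pure function of the seed and the attributes can be sketched with the standard library. This is not Koda's fingerprinting: `uuid.uuid5` over a canonical string is a stand-in, the helper name `toy_uu_id` is hypothetical, schemas are plain strings, and sorting the attributes so that keyword order does not matter is an assumption of the sketch.

```python
import uuid


def toy_uu_id(seed="", **attrs):
    """Derives a deterministic id from the seed and attribute schemas."""
    # ASSUMPTION: attributes are sorted so keyword order does not matter.
    canonical = seed + "|" + ",".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return uuid.uuid5(uuid.NAMESPACE_OID, canonical)


a = toy_uu_id(seed="type_1", x="INT32", y="FLOAT32")
b = toy_uu_id(seed="type_1", y="FLOAT32", x="INT32")
c = toy_uu_id(seed="type_2", x="INT32", y="FLOAT32")

print(a == b)  # same seed and attributes give the same id
print(a == c)  # a different seed gives a different id
```

Because the id depends only on its inputs, two independent callers that pass the same seed and attributes arrive at the same id without coordinating.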
kd.schema.with_schema(x, schema)Aliases:
Returns a copy of `x` with the provided `schema`.
If `schema` is an Entity schema, it must have no DataBag or the same DataBag
as `x`. To set schema with a different DataBag, use `kd.set_schema` instead.
It only changes the schemas of `x` and does not change the items in `x`. To
change the items in `x`, use `kd.cast_to` instead. For example,
kd.with_schema(kd.ds([1, 2, 3]), kd.FLOAT32) -> fails because the items in
`x` are not compatible with FLOAT32.
kd.cast_to(kd.ds([1, 2, 3]), kd.FLOAT32) -> kd.ds([1.0, 2.0, 3.0])
When items in `x` are primitives or `schema` is a primitive schema, it checks
that the items and schema are compatible. When items are ItemIds and `schema`
is a non-primitive schema, it does not check that the underlying data matches
the schema. For example,
kd.with_schema(kd.ds([1, 2, 3], schema=kd.OBJECT), kd.INT32) ->
kd.ds([1, 2, 3])
kd.with_schema(kd.ds([1, 2, 3]), kd.INT64) -> fails
db = kd.bag()
kd.with_schema(kd.ds(1).with_bag(db), db.new_schema(x=kd.INT32)) -> fails due
to incompatible schema
kd.with_schema(db.new(x=1), kd.INT32) -> fails due to incompatible schema
kd.with_schema(db.new(x=1), kd.schema.new_schema(x=kd.INT32)) -> fails due to
different DataBag
kd.with_schema(db.new(x=1), kd.schema.new_schema(x=kd.INT32).no_bag()) ->
works
kd.with_schema(db.new(x=1), db.new_schema(x=kd.INT64)) -> works
Args:
x: DataSlice to change the schema of.
schema: DataSlice containing the new schema.
Returns:
DataSlice with the new schema.
kd.schema.with_schema_from_obj(x)Aliases:
Returns `x` with its embedded common schema set as the schema.
* `x` must have OBJECT schema.
* All items in `x` must have a common schema.
* If `x` is empty, the schema is set to NONE.
* If `x` contains mixed primitives without a common primitive type, the output
will have OBJECT schema.
Args:
x: An OBJECT DataSlice.