koladata

kd.math API

Arithmetic operators.

kd.math.abs(x)

Computes pointwise absolute value of the input.

kd.math.add(x, y)

Computes pointwise x + y.

kd.math.agg_inverse_cdf(x, cdf_arg, ndim=unspecified)

Returns the value with CDF (in [0, 1]) approximately equal to the input.

The value is computed along the last ndim dimensions.

The return value will have an offset of floor((cdf - 1e-6) * size()) in the
(ascendingly) sorted array.

Args:
  x: a DataSlice of numbers.
  cdf_arg: (float) CDF value.
  ndim: The number of dimensions to compute inverse CDF over. Requires 0 <=
    ndim <= get_ndim(x).
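
The offset rule above can be illustrated with a plain-Python sketch over a single flat group. This is not the Koladata implementation: the helper name `inverse_cdf_sketch`, the skipping of missing items, and the clamping of negative offsets to 0 are all assumptions made for illustration.

```python
import math


def inverse_cdf_sketch(values, cdf_arg):
    """Returns the item at offset floor((cdf_arg - 1e-6) * size) in sorted order."""
    # Assumption: missing items (None) are ignored.
    present = sorted(v for v in values if v is not None)
    if not present:
        return None
    offset = math.floor((cdf_arg - 1e-6) * len(present))
    # Assumption: clamp so that cdf_arg == 0 selects the smallest item.
    offset = max(0, offset)
    return present[offset]


print(inverse_cdf_sketch([3, 1, 2, 4], 0.5))  # 2
print(inverse_cdf_sketch([3, 1, 2, 4], 1.0))  # 4
```

The small 1e-6 correction keeps cdf_arg == 1.0 inside the array: without it the offset would equal size().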

kd.math.agg_max(x, ndim=unspecified)

Returns the maximum of items along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([[2, None, 1], [3, 4], [None, None]])
  kd.agg_max(ds)  # -> kd.slice([2, 4, None])
  kd.agg_max(ds, ndim=1)  # -> kd.slice([2, 4, None])
  kd.agg_max(ds, ndim=2)  # -> kd.slice(4)

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute the maximum over. Requires 0 <=
    ndim <= get_ndim(x).
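
The `rank = rank - ndim` rule shared by the agg_* operators can be mimicked with nested Python lists. The sketch below is only an illustration of the documented shape behavior, not how Koladata computes it; `flatten` and `agg_last` are hypothetical helpers, and the caller passes the rank explicitly.

```python
def flatten(node, depth):
    """Flattens `depth` levels of nesting into one flat list of items."""
    if depth == 0:
        return [node]
    items = []
    for child in node:
        items.extend(flatten(child, depth - 1))
    return items


def agg_last(node, rank, ndim, fn):
    """Applies `fn` over the last `ndim` of `rank` dimensions, skipping None."""
    if rank == ndim:
        present = [v for v in flatten(node, ndim) if v is not None]
        # A group with no present items aggregates to a missing value.
        return fn(present) if present else None
    return [agg_last(child, rank - 1, ndim, fn) for child in node]


ds = [[2, None, 1], [3, 4], [None, None]]
print(agg_last(ds, rank=2, ndim=1, fn=max))  # [2, 4, None]
print(agg_last(ds, rank=2, ndim=2, fn=max))  # 4
```

With ndim=1 the innermost dimension is collapsed, leaving one value per row; with ndim=2 both dimensions are collapsed into a single item.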

kd.math.agg_mean(x, ndim=unspecified)

Returns the means along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([[1, None, None], [3, 4], [None, None]])
  kd.agg_mean(ds)  # -> kd.slice([1, 3.5, None])
  kd.agg_mean(ds, ndim=1)  # -> kd.slice([1, 3.5, None])
  kd.agg_mean(ds, ndim=2)  # -> kd.slice(2.6666666666666)  # (1 + 3 + 4) / 3

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute the mean over. Requires 0 <= ndim
    <= get_ndim(x).

kd.math.agg_median(x, ndim=unspecified)

Returns the medians along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Note that for an even number of elements, the median is the next value
down from the middle, e.g. median([1, 2]) == 1.
This is by design, so that the following properties hold:
1. type of median(x) == type of elements of x;
2. median(x) ∈ x.

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute the median over. Requires 0 <=
    ndim <= get_ndim(x).
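
The "lower median" convention described above matches `statistics.median_low` in the Python standard library, which makes a convenient point of comparison. The snippet uses only the standard library and does not call Koladata.

```python
import statistics

even_sample = [1, 2]
odd_sample = [5, 1, 3]

# The lower median is always one of the inputs and keeps their type.
low_even = statistics.median_low(even_sample)
# The conventional median averages the two middle values instead.
avg_even = statistics.median(even_sample)
low_odd = statistics.median_low(odd_sample)

print(low_even)  # 1
print(avg_even)  # 1.5
print(low_odd)   # 3
```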

kd.math.agg_min(x, ndim=unspecified)

Returns the minimum of items along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([[2, None, 1], [3, 4], [None, None]])
  kd.agg_min(ds)  # -> kd.slice([1, 3, None])
  kd.agg_min(ds, ndim=1)  # -> kd.slice([1, 3, None])
  kd.agg_min(ds, ndim=2)  # -> kd.slice(1)

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute the minimum over. Requires 0 <=
    ndim <= get_ndim(x).

kd.math.agg_std(x, unbiased=True, ndim=unspecified)

Returns the standard deviation along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([10, 9, 11])
  kd.agg_std(ds)  # -> kd.slice(1.0)
  kd.agg_std(ds, unbiased=False)  # -> kd.slice(0.8164966)

Args:
  x: A DataSlice of numbers.
  unbiased: A boolean flag indicating whether to subtract 1 from the number
    of elements in the denominator.
  ndim: The number of dimensions to compute the standard deviation over.
    Requires 0 <= ndim <= get_ndim(x).
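
The effect of the unbiased flag on the denominator can be reproduced in plain Python. `std_sketch` is a hypothetical helper that handles one flat group with no missing items; it reproduces the numbers in the example above but is not the Koladata implementation.

```python
import math


def std_sketch(values, unbiased=True):
    """Standard deviation with an n - 1 (unbiased) or n denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    denominator = n - 1 if unbiased else n
    return math.sqrt(squared_deviations / denominator)


print(std_sketch([10, 9, 11]))                            # 1.0
print(round(std_sketch([10, 9, 11], unbiased=False), 7))  # 0.8164966
```

The variance is the same quantity without the final square root, which is why the agg_var example for the same input yields 1.0 and roughly 0.6666667.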

kd.math.agg_sum(x, ndim=unspecified)

Returns the sums along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([[1, None, 1], [3, 4], [None, None]])
  kd.agg_sum(ds)  # -> kd.slice([2, 7, None])
  kd.agg_sum(ds, ndim=1)  # -> kd.slice([2, 7, None])
  kd.agg_sum(ds, ndim=2)  # -> kd.slice(9)

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute the sum over. Requires 0 <= ndim
    <= get_ndim(x).

kd.math.agg_var(x, unbiased=True, ndim=unspecified)

Returns the variance along the last ndim dimensions.

The resulting slice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Example:
  ds = kd.slice([10, 9, 11])
  kd.agg_var(ds)  # -> kd.slice(1.0)
  kd.agg_var(ds, unbiased=False)  # -> kd.slice(0.6666667)

Args:
  x: A DataSlice of numbers.
  unbiased: A boolean flag indicating whether to subtract 1 from the number
    of elements in the denominator.
  ndim: The number of dimensions to compute the variance over. Requires 0 <=
    ndim <= get_ndim(x).

kd.math.argmax(x, ndim=unspecified)

Returns indices of the maximum of items along the last ndim dimensions.

The resulting DataSlice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Returns the index of NaN in case there is a NaN present.

Example:
  ds = kd.slice([[2, None, 1], [3, 4], [None, None], [2, float('nan'), 1]])
  kd.argmax(ds)  # -> kd.slice([0, 1, None, 1])
  kd.argmax(ds, ndim=1)  # -> kd.slice([0, 1, None, 1])
  kd.argmax(ds, ndim=2)  # -> kd.slice(8) # index of NaN

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute indices over. Requires 0 <= ndim
    <= get_ndim(x).
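
The NaN and missing-value behavior described above can be sketched for a single flat group in plain Python. `argmax_sketch` is a hypothetical helper, not the Koladata implementation; returning the first NaN and breaking ties in favor of the earliest index are assumptions.

```python
import math


def argmax_sketch(items):
    """Index of the maximum; None if all items are missing; NaN wins if present."""
    best_index = None
    for index, value in enumerate(items):
        if value is None:
            continue  # missing items never win
        if math.isnan(value):
            return index  # assumption: the first NaN is reported
        if best_index is None or value > items[best_index]:
            best_index = index
    return best_index


nan = float('nan')
print(argmax_sketch([2, None, 1]))  # 0
print(argmax_sketch([None, None]))  # None
print(argmax_sketch([2, nan, 1]))   # 1
# Flattening all rows of the example corresponds to ndim=2.
print(argmax_sketch([2, None, 1, 3, 4, None, None, 2, nan, 1]))  # 8
```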

kd.math.argmin(x, ndim=unspecified)

Returns indices of the minimum of items along the last ndim dimensions.

The resulting DataSlice has `rank = rank - ndim` and shape: `shape =
shape[:-ndim]`.

Returns the index of NaN in case there is a NaN present.

Example:
  ds = kd.slice([[2, None, 1], [3, 4], [None, None], [2, float('nan'), 1]])
  kd.argmin(ds)  # -> kd.slice([2, 0, None, 1])
  kd.argmin(ds, ndim=1)  # -> kd.slice([2, 0, None, 1])
  kd.argmin(ds, ndim=2)  # -> kd.slice(8) # index of NaN

Args:
  x: A DataSlice of numbers.
  ndim: The number of dimensions to compute indices over. Requires 0 <= ndim
    <= get_ndim(x).

kd.math.cdf(x, weights=unspecified, ndim=unspecified)

Returns the CDF of x in the last ndim dimensions of x element-wise.

The CDF is an array of floating-point values of the same shape as x and
weights, where each element is the fraction of values in the corresponding
group that are smaller than or equal to the matching element of x, i.e. the
percentile at which that element sits in its sorted group.

Args:
  x: a DataSlice of numbers.
  weights: if provided, will compute weighted CDF: each output value will
    correspond to the weight percentage of values smaller than or equal to x.
  ndim: The number of dimensions to compute CDF over.
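
The definition above, both unweighted and weighted, can be written out directly for one flat group. `cdf_sketch` is a hypothetical plain-Python helper that assumes no missing items; it follows the wording of the description rather than the Koladata source.

```python
def cdf_sketch(values, weights=None):
    """Fraction of (weighted) values that are <= each element."""
    if weights is None:
        weights = [1.0] * len(values)  # unweighted: every item counts once
    total = sum(weights)
    return [
        sum(w for v, w in zip(values, weights) if v <= x) / total
        for x in values
    ]


print(cdf_sketch([1, 2, 2, 4]))                  # [0.25, 0.75, 0.75, 1.0]
print(cdf_sketch([1, 2, 3], weights=[1, 1, 2]))  # [0.25, 0.5, 1.0]
```

The quadratic loop keeps the sketch close to the definition; a sort followed by a running sum would give the same result in O(n log n).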

kd.math.ceil(x)

Computes pointwise ceiling of the input, i.e. rounds up: returns the smallest
integer value that is not less than the input.

kd.math.cum_max(x, ndim=unspecified)

Returns the cumulative max of items along the last ndim dimensions.

kd.math.cum_min(x, ndim=unspecified)

Returns the cumulative minimum of items along the last ndim dimensions.

kd.math.cum_sum(x, ndim=unspecified)

Returns the cumulative sum of items along the last ndim dimensions.
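
For a single flat group with no missing items, the three cumulative operators behave like `itertools.accumulate` from the Python standard library. The snippet below is a standard-library analogy only; how Koladata treats missing items in cumulative operators is not covered here.

```python
from itertools import accumulate

row = [2, 1, 3, 0]

cum_sum_row = list(accumulate(row))       # running total
cum_max_row = list(accumulate(row, max))  # largest item seen so far
cum_min_row = list(accumulate(row, min))  # smallest item seen so far

print(cum_sum_row)  # [2, 3, 6, 6]
print(cum_max_row)  # [2, 2, 3, 3]
print(cum_min_row)  # [2, 1, 1, 0]
```

Unlike the agg_* operators, the output keeps the length of the input: each position holds the aggregate of all items up to and including it.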

kd.math.divide(x, y)

Computes pointwise x / y.

kd.math.exp(x)

Computes pointwise exponential of the input.

kd.math.floor(x)

Computes pointwise floor of the input, i.e. rounds down: returns the largest
integer value that is not greater than the input.

kd.math.floordiv(x, y)

Computes pointwise x // y.

kd.math.inverse_cdf(x, cdf_arg)

Returns the value with CDF (in [0, 1]) approximately equal to the input.

The return value is computed over all dimensions. It will have an offset of
floor((cdf - 1e-6) * size()) in the (ascendingly) sorted array.

Args:
  x: a DataSlice of numbers.
  cdf_arg: (float) CDF value.

kd.math.is_nan(x)

Returns pointwise `kd.present` if the input is NaN and `kd.missing` otherwise.

kd.math.log(x)

Computes pointwise natural logarithm of the input.

kd.math.log10(x)

Computes pointwise logarithm in base 10 of the input.

kd.math.max(x)

Returns the maximum of items over all dimensions.

The result is a zero-dimensional DataItem.

Args:
  x: A DataSlice of numbers.

kd.math.maximum(x, y)

Computes pointwise max(x, y).

kd.math.mean(x)

Returns the mean of elements over all dimensions.

The result is a zero-dimensional DataItem.

Args:
  x: A DataSlice of numbers.

kd.math.median(x)

Returns the median of elements over all dimensions.

The result is a zero-dimensional DataItem.

Note that for an even number of elements, the median is the next value
down from the middle, e.g. median([1, 2]) == 1.
This is by design, so that the following properties hold:
1. type of median(x) == type of elements of x;
2. median(x) ∈ x.

Args:
  x: A DataSlice of numbers.

kd.math.min(x)

Returns the minimum of items over all dimensions.

The result is a zero-dimensional DataItem.

Args:
  x: A DataSlice of numbers.

kd.math.minimum(x, y)

Computes pointwise min(x, y).

kd.math.mod(x, y)

Computes pointwise x % y.

kd.math.multiply(x, y)

Computes pointwise x * y.

kd.math.neg(x)

Computes pointwise negation of the input, i.e. -x.

kd.math.normal_distribution_inverse_cdf(x)

Inverse CDF of the standard normal distribution.

Args:
  x: A DataSlice of numbers.

Returns:
  The quantiles corresponding to the probabilities in `x`.
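
For scalar inputs, the same quantity is available in the Python standard library as `statistics.NormalDist.inv_cdf`, which can serve as a reference point. The snippet uses only the standard library and makes no claim about how Koladata computes the result.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability 0.5 maps to the median of the distribution.
median_quantile = standard_normal.inv_cdf(0.5)
# Probability 0.975 maps to the familiar two-sided 95% critical value.
upper_quantile = standard_normal.inv_cdf(0.975)

print(median_quantile)
print(round(upper_quantile, 2))  # 1.96
```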

kd.math.pos(x)

Computes pointwise positive of the input, i.e. +x.

kd.math.pow(x, y)

Computes pointwise x ** y.

kd.math.round(x)

Computes pointwise rounding of the input.

Please note that this is NOT banker's rounding, unlike the Python built-in or
TensorFlow round(). If the fractional part is exactly 0.5, the result is
rounded to the number with the higher absolute value:
round(1.4) == 1.0
round(1.5) == 2.0
round(1.6) == 2.0
round(2.5) == 3.0 # not 2.0
round(-1.4) == -1.0
round(-1.5) == -2.0
round(-1.6) == -2.0
round(-2.5) == -3.0 # not -2.0
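
The contrast with Python's built-in round() can be made concrete with a short scalar sketch. `round_half_away_from_zero` is a hypothetical helper that reproduces the table above for ordinary inputs; it is not the Koladata implementation.

```python
import math


def round_half_away_from_zero(x):
    """Rounds to the nearest integer value; exact halves go away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


print(round_half_away_from_zero(2.5))   # 3.0
print(round_half_away_from_zero(-2.5))  # -3.0
print(round(2.5))                       # 2 (built-in banker's rounding)
```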

kd.math.sigmoid(x, half=0.0, slope=1.0)

Computes sigmoid of the input.

sigmoid(x) = 1 / (1 + exp(-slope * (x - half)))

Args:
  x: A DataSlice of numbers.
  half: A DataSlice of numbers.
  slope: A DataSlice of numbers.

Returns:
  sigmoid(x) computed with the formula above.
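
A scalar transcription of the formula above shows the role of the two parameters: half is the input at which the output equals 0.5, and slope controls how sharply the curve rises around it. `sigmoid_sketch` is a hypothetical plain-Python helper, not the Koladata operator.

```python
import math


def sigmoid_sketch(x, half=0.0, slope=1.0):
    """1 / (1 + exp(-slope * (x - half))) for scalar inputs."""
    return 1.0 / (1.0 + math.exp(-slope * (x - half)))


print(sigmoid_sketch(0.0))                       # 0.5
print(sigmoid_sketch(3.0, half=3.0, slope=5.0))  # 0.5
# A larger slope moves the output closer to 1 for the same x > half.
print(sigmoid_sketch(1.0, slope=4.0) > sigmoid_sketch(1.0, slope=1.0))  # True
```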

kd.math.sign(x)

Computes the sign of the input.

Args:
  x: A DataSlice of numbers.

Returns:
  A DataSlice of values in {-1, 0, 1} with the same shape and type as the
  input.

kd.math.softmax(x, beta=1.0, ndim=unspecified)

Returns the softmax of x along the last ndim dimensions.

The softmax represents Exp(x * beta) / Sum(Exp(x * beta)) over last ndim
dimensions of x.

Args:
  x: An array of numbers.
  beta: A floating point scalar number that controls the smoothness of the
    softmax.
  ndim: The number of last dimensions to compute softmax over.
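
The formula Exp(x * beta) / Sum(Exp(x * beta)) can be sketched for one flat group in plain Python. `softmax_sketch` is a hypothetical helper that assumes no missing items; subtracting the maximum before exponentiating is a common numerical safeguard added here and is not something the description above states.

```python
import math


def softmax_sketch(values, beta=1.0):
    """exp(beta * x) / sum(exp(beta * x)) over one flat group."""
    # Shifting by the max cancels in the ratio, so the result is unchanged,
    # but it keeps exp() from overflowing for large inputs when beta > 0.
    shift = max(values)
    exps = [math.exp(beta * (v - shift)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax_sketch([0.0, 0.0]))  # [0.5, 0.5]
# A larger beta concentrates more of the mass on the largest input.
print(softmax_sketch([1.0, 2.0], beta=5.0)[1] > softmax_sketch([1.0, 2.0])[1])  # True
```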

kd.math.sqrt(x)

Computes pointwise sqrt of the input.

kd.math.subtract(x, y)

Computes pointwise x - y.

kd.math.sum(x)

Returns the sum of elements over all dimensions.

The result is a zero-dimensional DataItem.

Args:
  x: A DataSlice of numbers.

kd.math.t_distribution_inverse_cdf(x, degrees_of_freedom)

Inverse CDF of the Student's t-distribution.

Args:
  x: A DataSlice of numbers.
  degrees_of_freedom: A DataSlice of numbers.

Returns:
  The quantiles corresponding to the probabilities in `x` for the Student's
  t-distribution with the given degrees of freedom.