Expectation Reference

This page gives an overview of all available datapact expectations.

class datapact.Asserter(parent: SeriesTest, critical: bool)
be_between(minimum: float, maximum: float)

checks that all values lie between minimum and maximum.

Parameters:
  • minimum – if there’s a value lower than this, it will fail.

  • maximum – if there’s a value higher than this, it will fail.

Examples

>>> dp.age.should.be_between(0, 150)
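To make the pass/fail rule concrete, here is a minimal plain-Python sketch of a range check. It is not datapact's implementation: the helper name `out_of_range` is hypothetical, and treating both bounds as inclusive is an assumption based on the parameter descriptions above.

```python
def out_of_range(values, minimum, maximum):
    """Return the values that fall outside [minimum, maximum]."""
    # Assumption: both bounds are inclusive, so a value equal to a bound passes.
    return [v for v in values if v < minimum or v > maximum]


ages = [0, 34, 150, 151, -2]
violations = out_of_range(ages, 0, 150)
print(violations)  # [151, -2]
```

An empty result corresponds to a passing expectation; any entry in the list would make it fail.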
be_positive()

checks if all values are 0 or higher.

Examples

>>> dp.age.should.be_positive()
be_negative()

checks if all values are 0 or lower.

Examples

>>> dp.debt.should.be_negative()
not_be_na()

checks if all values are non-NA.

Examples

>>> dp.user_id.must.not_be_na()
be_one_of(*allowed_values)

checks if all values are among the allowed values.

Examples

>>> dp.state.must.be_one_of("active", "sleeping", "inactive")
be_date()

checks if all values are ISO 8601-compliant dates; datetimes will be rejected.

Examples

>>> dp.day.must.be_date()
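The distinction between a date and a datetime can be sketched with the standard library alone. This is only an illustration of the rule described above, not the parser datapact uses; `is_iso_date` is a hypothetical helper name.

```python
from datetime import date


def is_iso_date(value: str) -> bool:
    """True if value parses as a calendar date without a time part."""
    try:
        date.fromisoformat(value)
    except (TypeError, ValueError):
        # Non-strings, datetimes and non-ISO layouts all end up here.
        return False
    return True


print(is_iso_date("2024-01-31"))           # True
print(is_iso_date("2024-01-31T10:00:00"))  # False
print(is_iso_date("31.01.2024"))           # False
```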
be_datetime()

checks if all values are ISO 8601-compliant datetimes.

Examples

>>> dp.timestamp.must.be_datetime()
be_unix_epoch()

checks if all values are Unix epoch-compliant timestamps.

Examples

>>> dp.timestamp.must.be_unix_epoch()
be_normal_distributed(alpha: float = 0.05)

performs a normality test.

uses scipy.stats.normaltest under the hood.

Parameters:

alpha – sensitivity of the test. low value = more sensitive.

Examples

>>> dp.salary.should.be_normal_distributed(alpha=0.1)
match_sample(sample, alpha=0.05)

checks if series is from the same distribution as sample using a Kolmogorov-Smirnov test.

Parameters:

sample – list-like sample to compare to

Examples

>>> dp.age.should.match_sample(reference_sample)
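A two-sample Kolmogorov-Smirnov test compares the empirical cumulative distribution functions of both samples; its statistic D is the largest vertical gap between them. The sketch below computes only D in plain Python to show the idea. It is not datapact's code, `ks_statistic` is a hypothetical name, and turning D into a p-value to compare against alpha is left out.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_statistic([1, 2, 3], [1, 2, 3]))     # 0.0
print(ks_statistic([1, 2, 3], [10, 11, 12]))  # 1.0
```

A D near 0 means the two samples look alike; a D near 1 means they barely overlap.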
match_cdf(cdf: Callable, args, N=20, alpha=0.05)

checks if series is drawn from the distribution given by cdf using a Kolmogorov-Smirnov test.

Parameters:
  • cdf – callable used to calculate the CDF.

  • args – Distribution parameters, given to cdf.

Examples

>>> dp.wins.should.match_cdf(scipy.stats.binom.cdf, (10, 0.5))
be_binomial_distributed(n, p=0.5, N=20, alpha=0.05)

checks if series is binomially distributed using a Kolmogorov-Smirnov test.

Parameters:
  • n – number of draws

  • p – probability of success

Examples

>>> dp.heads.should.be_binomial_distributed(n=10)
be_poisson_distributed(l, N=20, alpha=0.05)

checks if series is Poisson distributed using a Kolmogorov-Smirnov test.

Parameters:

l – lambda (rate parameter) of the Poisson distribution

Examples

>>> dp.new_covid_cases.should.be_poisson_distributed(10)
have_average_between(minimum: float, maximum: float)

checks if the average is between minimum and maximum.

Examples

>>> dp.size.should.have_average_between(4, 5)
have_variance_between(minimum: float, maximum: float)

checks if the variance is in the given range.

Examples

>>> dp.size.should.have_variance_between(150, 170)
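This page does not state whether the variance is the sample variance (dividing by n - 1) or the population variance (dividing by n). For short series the two can fall on different sides of a bound, so it is worth checking before choosing a range. The standard-library sketch below only shows the difference; the data and bounds are made up for illustration.

```python
import statistics

sizes = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(sizes)       # divides by n - 1
population_var = statistics.pvariance(sizes)  # divides by n

print(round(sample_var, 3))   # 4.571
print(float(population_var))  # 4.0

# The same bounds accept one definition and reject the other.
minimum, maximum = 4.2, 5.0
print(minimum <= sample_var <= maximum)      # True
print(minimum <= population_var <= maximum)  # False
```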
have_median_between(minimum: float, maximum: float)

checks if the median is in the given range.

Examples

>>> dp.size.should.have_median_between(150, 170)
have_percentile_between(p: float, minimum: float, maximum: float)

checks if the percentile / quantile p is in the given range.

Examples

>>> dp.size.should.have_percentile_between(.95, 10, 20)
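Judging by the example above, p is given as a fraction between 0 and 1 rather than as a percentage. The sketch below shows one common way to compute such a quantile, linear interpolation between neighbouring sorted values, and then applies the range check. Both function names are hypothetical, the interpolation method and the inclusive bounds are assumptions, and datapact may compute the quantile differently.

```python
def quantile(values, p):
    """Quantile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_between(values, p, minimum, maximum):
    # Assumption: both bounds are inclusive.
    return minimum <= quantile(values, p) <= maximum


sizes = [10, 20, 30, 40, 50]
print(quantile(sizes, 0.5))                      # 30.0
print(percentile_between(sizes, 0.5, 25, 35))    # True
print(percentile_between(sizes, 0.95, 10, 20))   # False
```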
have_no_outliers(alpha=0.05)

verifies that series has no outliers using Grubbs' test.

Examples

>>> dp.size.should.have_no_outliers()
fulfill(custom_assertion: Callable[[Series], Optional[str]])

checks if series passes your custom validator. The validator receives the series and returns an error message if it fails, or None if it passes.

Examples

>>> def custom_assertion(series: pandas.Series):
...     if series.max() > 100:
...         return "too high"
>>> dp.user_id.must.fulfill(custom_assertion)
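A validator can bundle several checks and report the first one that fails. The sketch below is a hypothetical example of that contract: it returns a message string on failure and None on success, matching the signature above. It only iterates over its input, so it can be tried on a plain list as well as on a pandas Series.

```python
from typing import Optional


def ids_are_positive_and_unique(series) -> Optional[str]:
    """Return an error message, or None if every check passes."""
    values = list(series)
    if any(v <= 0 for v in values):
        return "contains non-positive ids"
    if len(set(values)) != len(values):
        return "contains duplicate ids"
    return None


print(ids_are_positive_and_unique([1, 2, 3]))  # None
print(ids_are_positive_and_unique([1, 2, 2]))  # contains duplicate ids
print(ids_are_positive_and_unique([0, 1]))     # contains non-positive ids
```

Passing it to the expectation would then look like `dp.user_id.must.fulfill(ids_are_positive_and_unique)`.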