Check functions
Documentation for the pandas_contract.checks module.
Pandas Check functions.
Check functions for DataFrames and Series that are not handled by Pandera checks.
These are checks that involve multiple arguments, e.g., ensuring that the lengths or indices of two arguments match, or that an argument is changed in-place (or that a copy is created instead).
- class pandas_contract.checks.extends(arg: str | None, /, modified: BaseSchema | None)
Bases: Check
Ensures that the resulting DataFrame extends another DataFrame.
Check that the resulting DataFrame extends another DataFrame, which is identified by argument name. It ensures that (a) the only columns added are those specified in modified and (b) all other columns are left unmodified.
Example Simple example, output must set a column “x”
>>> import pandas as pd
>>> import pandera.pandas as pa
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.extends("df", pa.DataFrameSchema({"x": pa.Column(int)})))
... def my_fn(df: pd.DataFrame) -> pd.DataFrame:
...     return df.assign(x=1)
Example Define a function that requires a column “a” and adds a column “x” to the data frame
>>> import pandas_contract as pc
>>> @pc.argument("df", pa.DataFrameSchema({"a": pa.Column()}))
... @pc.result(pc.checks.extends("df", pa.DataFrameSchema({"x": pa.Column(int)})))
... def my_fn(df):
...     return df.assign(x=df["a"] + 1)
>>> my_fn(pd.DataFrame({"a": [1]}))
   a  x
0  1  2
>>> my_fn(pd.DataFrame({"y": [1]}))
Traceback (most recent call last):
ValueError: my_fn: Argument df: ...
Example Define a function that adds a column, whose name is taken from the argument col, to the DataFrame.
>>> import pandas_contract as pc
>>> @pc.result(
...     pc.checks.extends(
...         "df",
...         pa.DataFrameSchema({pc.from_arg("col"): pa.Column(int)}),
...     )
... )
... def my_fn(df, col="x"):
...     return df.assign(**{col: 1})
>>> my_fn(pd.DataFrame(index=[0]))
   x
0  1
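The two conditions that extends enforces can be sketched in plain pandas. This is only an illustration of the idea, not the library's implementation: the helper name extension_errors and its error messages are hypothetical, and treating a dropped original column as an error is an assumption.

```python
from __future__ import annotations

import pandas as pd


def extension_errors(
    result: pd.DataFrame, original: pd.DataFrame, allowed_new: set[str]
) -> list[str]:
    """Hypothetical helper: list the ways `result` fails to extend `original`."""
    errors = []
    # (a) Every newly added column must be one of the allowed columns.
    added = [c for c in result.columns if c not in original.columns]
    unexpected = [c for c in added if c not in allowed_new]
    if unexpected:
        errors.append(f"unexpected new columns: {unexpected}")
    # Assumption: dropping an original column also counts as a violation.
    missing = [c for c in original.columns if c not in result.columns]
    if missing:
        errors.append(f"missing original columns: {missing}")
    # (b) Columns present in both frames must be unchanged.
    kept = [c for c in original.columns if c in result.columns]
    if not result[kept].equals(original[kept]):
        errors.append("existing columns were modified")
    return errors


original = pd.DataFrame({"a": [1, 2]})
extended = original.assign(x=1)
print(extension_errors(extended, original, {"x"}))
```

An empty list means the result only adds allowed columns and leaves the existing ones untouched.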
- pandas_contract.checks.is_(arg: str, /) -> Check | None
Ensure that the result is identical (is operator) to another dataframe.
This check is most useful for the @result decorator, as it ensures that the output is changed in-place. It is the opposite of the is_not() check.
Example Ensure that the result is the same object as the input argument df, i.e. the function operates in-place.
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.is_("df"))
... def fn(df):
...     df["x"] = 1  # change df in-place
...     return df
- pandas_contract.checks.is_not(args: Sequence[str] | str, /) -> Check | None
Ensure that the result is not identical (is not operator) to others.
This check is most useful for the @result decorator, as it ensures that the output is a new object rather than the input changed in-place. It is the opposite of the is_() check.
- Parameters:
args – Argument that the result should not be identical to. It can be either a string or an iterable of strings. If it is a string, it will be split by commas.
Example Simple example, ensure that a copy is created.
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.is_not("df"))
... def fn(df):
...     return df.assign(x=1)  # .assign creates a copy
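The object-identity distinction behind is_() and is_not() can be seen with plain pandas and Python's is operator. The sketch below does not use pandas_contract at all; the function names add_x_in_place and add_x_copy are hypothetical and exist only for illustration.

```python
import pandas as pd


def add_x_in_place(df: pd.DataFrame) -> pd.DataFrame:
    # Item assignment mutates the caller's DataFrame and returns the same object.
    df["x"] = 1
    return df


def add_x_copy(df: pd.DataFrame) -> pd.DataFrame:
    # DataFrame.assign returns a new object and leaves `df` itself alone.
    return df.assign(x=1)


df = pd.DataFrame({"a": [1, 2]})
in_place_result = add_x_in_place(df)
copy_result = add_x_copy(df)

print(in_place_result is df)  # True: the case is_("df") is meant to accept
print(copy_result is df)      # False: the case is_not("df") is meant to accept
```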
- pandas_contract.checks.removed(columns: list[Any]) -> Check | None
Ensure given columns are removed.
- Parameters:
columns – List of columns that must not exist in the DataFrame. They can also be dynamically created via from_arg().
Example Mark drop_x as dropping column x
>>> import pandas as pd
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.removed(["x"]))
... def drop_x(df: pd.DataFrame):
...     return df.drop(columns=["x"])
Example Mark drop_cols as dropping the columns named in the function argument cols.
>>> @pc.result(pc.checks.removed([pc.from_arg("cols")]))
... def drop_cols(df: pd.DataFrame, cols: list[str]):
...     return df.drop(columns=cols)
>>> df = pd.DataFrame([[0, 1, 2]], columns=["a", "b", "c"])
>>> drop_cols(df, cols=["a", "b"])
   c
0  2
- pandas_contract.checks.same_index_as(args_: str | Iterable[str] | None, /) -> Check | None
Check that the DataFrame index is the same as another DataFrame.
This check ensures that the index of the DataFrame is identical to the index of the DataFrame passed as another argument (or of each argument in a list). This can be useful for both arguments and results.
The argument can be either a single argument name, a comma-separated list of argument names or an iterable of argument names.
- Parameters:
args – Argument that the result should have the same index as. It can be either a string or an iterable of strings. If it is a string, it will be split by commas.
Example Simple example, checking that the result has the same index as both df and df2
>>> import pandas as pd
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.same_index_as("df, df2"))
... def my_fn(df: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
...     return df.join(df2)
The following will check the same, but by first ensuring that the indices of the inputs are the same, and then that the resulting index is the same as the input index of df.
>>> @pc.argument("df", pc.checks.same_index_as("df2"))
... @pc.result(pc.checks.same_index_as("df"))
... def my_fn(df: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
...     return df.join(df2)
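To see why an index check is stricter than a length check, the following plain-pandas sketch compares two frames of equal length but different indices. It uses only pandas' own Index.equals and DataFrame.join; the variable names are illustrative, and nothing here claims to mirror how the library performs the comparison internally.

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"b": [3, 4]}, index=[1, 2])

# Equal lengths do not imply equal indices.
same_length = len(left) == len(right)
same_index = left.index.equals(right.index)

# The default left join keeps the index of `left` (here `right` has unique labels).
joined = left.join(right)
result_keeps_left_index = joined.index.equals(left.index)

print(same_length, same_index, result_keeps_left_index)
```

In this setup a length comparison between left and right would pass, while an index comparison between them would not.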
- pandas_contract.checks.same_length_as(args_: str | Iterable[str] | None, /) -> Check | None
Check that the DataFrame length is the same as another DataFrame.
This check ensures that the lengths of the DataFrames are identical. This can be useful for both arguments and results.
The argument can be either a single argument name, a comma-separated list of argument names or an iterable of argument names.
- Parameters:
args – Argument that the result should have the same length as. It can be either a string or an iterable of strings. If it is a string, it will be split by commas.
Example Simple check that the result length is the same as both df and df2.
>>> import pandas as pd
>>> import pandas_contract as pc
>>> @pc.result(pc.checks.same_length_as("df, df2"))
... def my_fn(df: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
...     return df.join(df2)
The following will check the same, but by first ensuring that the lengths of the inputs are the same, and then that the resulting length is the same as the input length of df.
>>> @pc.argument("df", pc.checks.same_length_as("df2"))
... @pc.result(pc.checks.same_length_as("df"))
... def my_fn(df: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
...     return df.join(df2)
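Several checks above accept argument names either as a comma-separated string or as an iterable of strings. One way such input could be normalized is sketched below. The helper split_arg_names is hypothetical and not part of pandas_contract; stripping whitespace around names is an assumption inferred from the "df, df2" examples, and mapping None to an empty list is likewise an assumption.

```python
from __future__ import annotations

from collections.abc import Iterable


def split_arg_names(args: str | Iterable[str] | None) -> list[str]:
    """Hypothetical helper: normalize argument names into a list of strings."""
    if args is None:
        # Assumption: no names given means there is nothing to compare against.
        return []
    if isinstance(args, str):
        # A single string is treated as a comma-separated list of names.
        args = args.split(",")
    # Assumption: surrounding whitespace is ignored and empty entries are dropped.
    return [name.strip() for name in args if name.strip()]


print(split_arg_names("df, df2"))
print(split_arg_names(["df", "df2"]))
```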
Check Protocol
- class pandas_contract._private_checks.Check(*args, **kwargs)
Protocol for a DataFrame or Series check class.
A check is a callable that returns a check function.
The check receives the wrapped function fn together with its positional and keyword arguments as input. It returns a function that takes a single argument, the value to check (the DataFrame or Series object), and yields the errors as strings.
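One possible reading of this protocol is sketched below with the standard library only. It is not the library's definition: the names CheckSketch, SameLengthSketch and combine are hypothetical, the exact call signature is an assumption based on the description above, and plain lists stand in for DataFrames so that the example needs no third-party packages.

```python
import inspect
from typing import Any, Callable, Iterable, Protocol

# A check function takes the value under test and yields error strings.
CheckFunction = Callable[[Any], Iterable[str]]


class CheckSketch(Protocol):
    """Hypothetical stand-in for the Check protocol described above."""

    def __call__(self, fn: Callable, args: tuple, kwargs: dict) -> CheckFunction: ...


class SameLengthSketch:
    """Illustrative check: the value must be as long as a named argument."""

    def __init__(self, arg_name: str) -> None:
        self.arg_name = arg_name

    def __call__(self, fn: Callable, args: tuple, kwargs: dict) -> CheckFunction:
        # Resolve the named argument from the wrapped function's call.
        bound = inspect.signature(fn).bind(*args, **kwargs)
        bound.apply_defaults()
        other = bound.arguments[self.arg_name]

        def check(value: Any) -> Iterable[str]:
            if len(value) != len(other):
                yield (
                    f"length {len(value)} differs from length {len(other)} "
                    f"of argument {self.arg_name!r}"
                )

        return check


def combine(left, right):
    return left


check_fn = SameLengthSketch("right")(combine, ([1, 2],), {"right": [3, 4, 5]})
errors = list(check_fn([1, 2]))
print(errors)
```

The two-step shape, first binding to the call and then testing the value, is what lets a single check object be reused across calls with different arguments.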