evaluate_contingency_table()

datasafari.evaluator.evaluate_contingency_table(contingency_table: DataFrame, min_sample_size_yates: int = 40, pipeline: bool = False, quiet: bool = False) → dict | tuple

Evaluate which statistical tests are suitable for a given contingency table, based on the table's characteristics.

This function assesses the contingency table’s suitability for chi-square tests, exact tests (Barnard’s, Boschloo’s, and Fisher’s), and the application of Yates’ correction within the chi-square test. It examines expected and observed frequencies, sample size, and table shape to guide the choice of appropriate statistical tests for hypothesis testing.
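The characteristics named above (observed and expected frequencies, sample size, and table shape) can be computed directly from a crosstab. The following is a minimal sketch, not the library's implementation: `describe_contingency_table` is a hypothetical helper, and the counts in `table` are made up for illustration. It shows only how the inputs to such a decision are derived, not the thresholds datasafari applies.

```python
import numpy as np
import pandas as pd


def describe_contingency_table(table: pd.DataFrame) -> dict:
    """Hypothetical helper: summarise the quantities a test-selection rule inspects."""
    observed = table.to_numpy(dtype=float)
    total = observed.sum()
    # Expected count under independence: row total * column total / grand total.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / total
    return {
        "shape": observed.shape,
        "sample_size": int(total),
        "min_observed": float(observed.min()),
        "min_expected": float(expected.min()),
    }


# Made-up counts, for illustration only.
table = pd.DataFrame(
    [[12, 8], [5, 15]],
    index=["Male", "Female"],
    columns=["Coffee", "Tea"],
)
summary = describe_contingency_table(table)
print(summary)
```

Exact tests such as Fisher's are conventionally applied to 2x2 tables, which is why the shape is reported alongside the frequencies.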

Parameters:

contingency_table : pd.DataFrame

A contingency table generated from two categorical variables.

min_sample_size_yates : int, optional, default: 40

The minimum sample size below which Yates’ correction should be considered.

pipeline : bool, optional, default: False

Determines the format of the output.

True: Outputs a tuple of boolean values representing the viability of each test.
False: Outputs a dictionary with the test names as keys and their viabilities as boolean values.

quiet : bool, optional, default: False

Determines whether output is printed to the console.

True: Output is not printed.
False: Output is printed.

Returns:

dict or tuple

Depending on the `pipeline` parameter:

dict: If pipeline=False, returns a dictionary whose keys are the test names ('chi2_contingency', 'yates_correction', 'barnard_exact', 'boschloo_exact', 'fisher_exact') and whose values are booleans indicating viability.
tuple: If pipeline=True, returns a tuple of boolean values in the order (chi2_viability, yates_correction_viability, barnard_viability, boschloo_viability, fisher_viability).
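The two return formats carry the same information: the tuple positions line up with the dictionary keys in the documented order. As a small sketch of that correspondence, the hypothetical helper `viability_tuple_to_dict` below (not part of datasafari) pairs a pipeline-style tuple with the documented key names. The example tuple is invented for illustration and is not real output.

```python
# Key names in the order documented for the pipeline=True tuple.
TEST_NAMES = (
    "chi2_contingency",
    "yates_correction",
    "barnard_exact",
    "boschloo_exact",
    "fisher_exact",
)


def viability_tuple_to_dict(viability: tuple) -> dict:
    """Hypothetical helper: label a pipeline-style tuple with the documented keys."""
    if len(viability) != len(TEST_NAMES):
        raise ValueError(f"Expected {len(TEST_NAMES)} values, got {len(viability)}.")
    return dict(zip(TEST_NAMES, viability))


# Invented values, for illustration only.
example = (True, False, False, False, False)
labelled = viability_tuple_to_dict(example)
print(labelled["chi2_contingency"])
```

Unpacking the tuple by position is convenient inside a pipeline, while the dictionary form is easier to read when inspecting results interactively.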

Raises:

TypeError:

If contingency_table is not a pandas DataFrame.
If min_sample_size_yates is not an integer.
If pipeline or quiet is not a boolean.

ValueError:

If contingency_table is empty.
If min_sample_size_yates is not a positive integer.
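The documented error conditions can be pictured as a sequence of guard clauses. The sketch below is an assumption about how such checks might be ordered, not the library's actual code: `validate_inputs` is a hypothetical function, and the explicit rejection of `bool` for `min_sample_size_yates` is a choice made here (because `bool` is a subclass of `int` in Python), which datasafari may or may not share.

```python
import pandas as pd


def validate_inputs(contingency_table, min_sample_size_yates=40, pipeline=False, quiet=False):
    """Hypothetical validator mirroring the documented TypeError/ValueError cases."""
    # Type checks come first, so value checks can rely on the types.
    if not isinstance(contingency_table, pd.DataFrame):
        raise TypeError("contingency_table must be a pandas DataFrame.")
    # Assumption: booleans are rejected even though bool subclasses int.
    if isinstance(min_sample_size_yates, bool) or not isinstance(min_sample_size_yates, int):
        raise TypeError("min_sample_size_yates must be an integer.")
    if not isinstance(pipeline, bool) or not isinstance(quiet, bool):
        raise TypeError("pipeline and quiet must be booleans.")
    # Value checks.
    if contingency_table.empty:
        raise ValueError("contingency_table must not be empty.")
    if min_sample_size_yates <= 0:
        raise ValueError("min_sample_size_yates must be a positive integer.")


# A well-formed call passes silently; a malformed one raises.
validate_inputs(pd.DataFrame([[3, 2], [1, 4]]))
try:
    validate_inputs(pd.DataFrame())
except ValueError as error:
    print(error)
```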

Examples:

Creating a contingency table from a small dataset and evaluating it:

>>> from datasafari.evaluator import evaluate_contingency_table
>>> import pandas as pd
>>> data = {
...     'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
...     'Preference': ['Tea', 'Coffee', 'Coffee', 'Tea', 'Tea']
... }
>>> df_small = pd.DataFrame(data)
>>> contingency_small = pd.crosstab(df_small['Gender'], df_small['Preference'])
>>> viability_dict_small = evaluate_contingency_table(contingency_small)

Using a larger dataset to demonstrate the effect of sample size on test viability:

>>> from datasafari.evaluator import evaluate_contingency_table
>>> import pandas as pd
>>> import numpy as np
>>> data_large = {
...     'Gender': np.random.choice(['Male', 'Female'], 200),
...     'Preference': np.random.choice(['Tea', 'Coffee'], 200)
... }
>>> df_large = pd.DataFrame(data_large)
>>> contingency_large = pd.crosstab(df_large['Gender'], df_large['Preference'])
>>> viability_dict_large = evaluate_contingency_table(contingency_large)
>>> # Applying the function in a pipeline to make further decisions:
>>> contingency_pipeline = pd.crosstab(df_large['Gender'], df_large['Preference'])
>>> chi2, yates, barnard, boschloo, fisher = evaluate_contingency_table(contingency_pipeline, pipeline=True)
>>> if chi2:
...     print("Chi-square test is viable for this dataset.")
... else:
...     print("Consider alternative tests such as Fisher's exact test.")