utils.preprocess

Functions

utils.preprocess.preprocess_data(data, ref, threshold, depth, sections=[], exclude_cols=[], nan_equivalents={}, verbosity: int = 0)

Runs the entire preprocessing pipeline for the given data.

Parameters:

data (dataframe) – The perovskite data.
ref (dataframe) – The reference data for features.
threshold (float) – Threshold (%) for the feature density. Used to remove sparse data.
depth (float) – Threshold (%) for the feature layer density. Determines how many feature layers are extracted.
sections (list of str, optional) – List of sections to be included. Defaults to [].
exclude_cols (list of str, optional) – List of columns to be excluded. Defaults to [].
nan_equivalents (dict, optional) – Equivalent values for NaN in the dataset. Defaults to {}.
verbosity (int, optional) – Verbosity level. Defaults to 0.

Returns:

The preprocessed data.

Return type:

dataframe
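
Example:

The sketch below illustrates one plausible reading of the threshold parameter: a feature's density is the percentage of rows with a usable value, and features below the threshold are dropped. It is not the library's implementation. The helpers feature_density and drop_sparse_columns are hypothetical, plain lists of dicts stand in for a dataframe so the snippet runs without dependencies, and the shape assumed for nan_equivalents (column name mapped to a list of values treated as NaN) is a guess, since the documentation only says it is a dict.

```python
import math


def feature_density(rows, column, nan_equivalents=None):
    """Return the percentage of rows in which `column` holds a usable value.

    Hypothetical helper: `nan_equivalents` is ASSUMED to map a column name
    to a list of values that should count as missing.
    """
    nan_values = (nan_equivalents or {}).get(column, [])

    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return value in nan_values

    if not rows:
        return 0.0
    filled = sum(not is_missing(row.get(column)) for row in rows)
    return 100.0 * filled / len(rows)


def drop_sparse_columns(rows, threshold, nan_equivalents=None):
    """Keep only columns whose density (%) is at least `threshold`.

    Hypothetical helper illustrating the assumed meaning of `threshold`.
    """
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    kept = [
        col for col in columns
        if feature_density(rows, col, nan_equivalents) >= threshold
    ]
    return [{col: row.get(col) for col in kept} for row in rows]


# Made-up sample records, for illustration only.
rows = [
    {"bandgap": 1.55, "notes": "Unknown", "ref_id": 1},
    {"bandgap": 1.60, "notes": None, "ref_id": 2},
    {"bandgap": float("nan"), "notes": "annealed", "ref_id": 3},
    {"bandgap": 1.58, "notes": "Unknown", "ref_id": 4},
]

# "Unknown" in the notes column is treated as missing.
cleaned = drop_sparse_columns(rows, threshold=50.0,
                              nan_equivalents={"notes": ["Unknown"]})
print(list(cleaned[0].keys()))
```

With a threshold of 50, the notes column is dropped because only one of its four values is usable, while bandgap (three of four) and ref_id (four of four) are kept. How preprocess_data itself applies depth and sections is not covered by this sketch.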