utils.preprocess

Functions

preprocess_data(data, ref, threshold, depth)

Runs the entire preprocessing pipeline for the given data.

utils.preprocess.preprocess_data(data, ref, threshold, depth, sections=[], exclude_cols=[], nan_equivalents={}, verbosity: int = 0)

Runs the entire preprocessing pipeline for the given data.

Parameters:
  • data (dataframe) – The perovskite data.

  • ref (dataframe) – The reference data for features.

  • threshold (float) – Threshold (%) for the feature density, used to remove sparse data.

  • depth (float) – Threshold (%) for the feature layer density. Determines how many feature layers are extracted.

  • sections (list of str, optional) – List of sections to be included. Defaults to [].

  • exclude_cols (list of str, optional) – List of columns to be excluded. Defaults to [].

  • nan_equivalents (dict, optional) – Values in the dataset to be treated as equivalent to NaN. Defaults to {}.

  • verbosity (int, optional) – Verbosity level. Defaults to 0.
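The parameter list does not say exactly how `threshold` is applied. The sketch below shows one plausible reading, and it is an assumption, not the module's code. Here "feature density" is the percentage of non-missing entries in a column, and any column below `threshold` percent is dropped. Values listed as NaN equivalents count as missing. The helpers `is_missing` and `drop_sparse_columns`, the dict-of-lists table, the sample column names, and the flat tuple of sentinel values are all hypothetical. In particular, the real `nan_equivalents` is a dict whose layout is not documented here.

```python
import math
from typing import Any, Dict, List


def is_missing(value: Any, nan_like: tuple = ()) -> bool:
    """Return True for None, float NaN, or any hypothetical NaN-equivalent sentinel."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in nan_like


def drop_sparse_columns(
    table: Dict[str, List[Any]],
    threshold: float,
    nan_like: tuple = (),
) -> Dict[str, List[Any]]:
    """Hypothetical density filter: keep columns whose non-missing share is >= threshold (%)."""
    kept: Dict[str, List[Any]] = {}
    for name, values in table.items():
        if not values:
            continue  # an empty column has no defined density, so it is dropped
        present = sum(not is_missing(v, nan_like) for v in values)
        density = 100.0 * present / len(values)
        if density >= threshold:
            kept[name] = values
    return kept


# Illustrative table: each key is a column, each list holds that column's values.
table = {
    "bandgap": [1.6, 1.55, None, 1.62],                    # 3 of 4 present -> 75%
    "additive": ["Li", "unknown", "unknown", "unknown"],   # "unknown" treated as NaN -> 25%
    "thickness": [float("nan"), None, 500, float("nan")],  # 1 of 4 present -> 25%
}

filtered = drop_sparse_columns(table, threshold=50, nan_like=("unknown",))
print(sorted(filtered))  # → ['bandgap']
```

Treating NaN equivalents as missing before computing density means that placeholder strings such as "unknown" cannot make a sparse column look dense.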

Returns:

The preprocessed data.

Return type:

dataframe