utils.database

Module Attributes

SECTION_KEYS

Dictionary of section names and their corresponding shorthand.

NAN_EQUIVALENTS

Strings in the dataset that are treated as equivalent to missing (NaN) values.

DATABASE

An instance of the Perovskite Dataset.

Classes

PerovskiteData([ref, data, ref_file, ...])

Stores the unprocessed perovskite data.

utils.database.DATABASE = <utils.database.PerovskiteData object>

An instance of the Perovskite Dataset.

utils.database.NAN_EQUIVALENTS = {'NAN': None, 'NAN; NAN': None, 'NULL': None, 'NaN': None, 'NaN; NaN': None, 'Nan': None, 'Nan; Nan': None, 'None': None, 'Null': None, 'Unknown': None, 'nan': None, 'nan; nan': None, 'none': None, 'null': None, 'unknown': None}

Strings in the dataset that are treated as equivalent to missing (NaN) values. Each key maps to None.

Type:

dict
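The lookup is presumably an exact string match. The separate 'NaN', 'Nan' and 'nan' keys suggest this. Under that assumption, here is a minimal sketch of normalizing raw cell values. It uses a copied subset of the dictionary and a hypothetical helper, normalize_cell, which is not part of utils.database:

```python
from typing import Any, Optional

# Subset of utils.database.NAN_EQUIVALENTS, copied so the example runs standalone.
NAN_EQUIVALENTS = {
    "NaN": None,
    "nan": None,
    "NaN; NaN": None,
    "Unknown": None,
    "null": None,
}


def normalize_cell(value: Any) -> Optional[Any]:
    """Hypothetical helper: return None for NaN-equivalent strings, else the value."""
    # Only strings can be NaN-equivalents; skipping other types also avoids
    # hashing unhashable values such as lists.
    if isinstance(value, str):
        # Every key maps to None, so .get() yields None for a NaN-equivalent
        # and falls back to the original value for anything else.
        return NAN_EQUIVALENTS.get(value, value)
    return value


row = ["1.05", "Unknown", "NaN; NaN", "Spin-coating"]
print([normalize_cell(v) for v in row])  # ['1.05', None, None, 'Spin-coating']
```

Exact matching is case-sensitive. A value such as "UNKNOWN", which is not a key in the module's dictionary, would therefore pass through unchanged.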

class utils.database.PerovskiteData(ref=None, data=None, ref_file=None, database_file=None, nan_equivalents={}, section_keys={})

Stores the unprocessed perovskite data.

data

The unprocessed data.

Type:

dataframe

ref

The reference data for features, with the columns Field, Type, Default, Unit, Pattern, Implemented, Description and Concerns.

Type:

dataframe

database_file

The name of the database file.

Type:

str

ref_file

The name of the reference data file.

Type:

str

nan_equivalents

Equivalent values for NaN in the dataset.

Type:

dict

section_keys

Dictionary of section names and their corresponding shorthand.

Type:

dict

X

The preprocessed and masked data. Stored after preprocess() is called.

Type:

dataframe

y

The masked target. Stored after preprocess() is called.

Type:

series

Raises:
  • ValueError – If both ref and ref_file are None.

  • ValueError – If both data and database_file are None.
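The two error conditions amount to one requirement per input: it must come either from an in-memory object or from a file name. The function below is a hypothetical sketch of that validation, named check_sources here. It is not the constructor's actual code, and the order in which the two checks run is an assumption:

```python
from typing import Any, Optional


def check_sources(ref: Any = None, data: Any = None,
                  ref_file: Optional[str] = None,
                  database_file: Optional[str] = None) -> None:
    """Hypothetical check mirroring the documented ValueError conditions."""
    # The reference data must come from a dataframe or from a file.
    if ref is None and ref_file is None:
        raise ValueError("Either ref or ref_file must be provided.")
    # Likewise for the perovskite data itself.
    if data is None and database_file is None:
        raise ValueError("Either data or database_file must be provided.")


# Passes: both inputs are given as (hypothetical) file names.
check_sources(ref_file="ref.csv", database_file="db.csv")
```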

get_Xy(data, target)

Returns a masked version of the data and target series.

Masks the data against the target series, excluding rows whose target value is NaN.

Parameters:
  • data (dataframe) – The perovskite data.

  • target (str) – The name of the target feature.

Returns:

A tuple (X, y), where X is the masked data and y is the masked target.

Return type:

tuple of (dataframe, series)
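The masking step can be sketched without pandas by representing the data as a list of row dictionaries. This is a simplified stand-in, not the method's implementation:

- get_Xy below is a plain function.
- The feature names in the usage are hypothetical.
- The target column is left in X, because the description does not say whether the real method drops it.

In pandas, the equivalent boolean mask would typically be written as data[target].notna().

```python
import math
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def get_Xy(rows: List[Row], target: str) -> Tuple[List[Row], List[Any]]:
    """Keep only rows whose target value is present; return (X, y)."""
    # Boolean mask: True where the row has a usable target value.
    mask = [not is_missing(row.get(target)) for row in rows]
    X = [row for row, keep in zip(rows, mask) if keep]
    # The mask guarantees the target key exists in every kept row.
    y = [row[target] for row in X]
    return X, y


# Hypothetical feature names, for illustration only.
rows = [
    {"JV_default_PCE": 18.2, "Cell_area_total": 0.1},
    {"JV_default_PCE": float("nan"), "Cell_area_total": 0.09},
    {"Cell_area_total": 0.12},
]
X, y = get_Xy(rows, "JV_default_PCE")
print(y)  # [18.2]
```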

preprocess(target, threshold, depth, exclude_sections=[], exclude_cols=[], save: bool = True, verbosity: int = 0)

Generates a preprocessed version of the dataset.

If a combination of hyperparameters has not been used before, the preprocessed dataset is generated and, when save is True, saved for future use. Otherwise, the previously generated file is loaded and returned instead.
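One way to implement this caching is to hash the hyperparameters into a file name. The sketch below assumes that mechanism; it is not the module's code. cache_key and load_or_compute are hypothetical helpers, and JSON stands in for whatever file format the real method writes:

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict


def cache_key(params: Dict[str, Any]) -> str:
    """Stable hash of the hyperparameters; dict key order does not matter."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:16]


def load_or_compute(params: Dict[str, Any], compute: Callable[[], Any],
                    cache_dir: Path, save: bool = True) -> Any:
    """Return the cached result for params, computing (and saving) it if unseen."""
    path = cache_dir / f"preprocessed_{cache_key(params)}.json"
    if path.exists():
        # These hyperparameters were seen before: reuse the saved file.
        return json.loads(path.read_text())
    result = compute()
    if save:
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result))
    return result
```

Because sort_keys=True orders the dictionary keys, the key does not depend on argument order. List order inside exclude_sections or exclude_cols still changes it, so sorting those lists first would avoid duplicate cache files.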

Parameters:
  • target (str) – Name of the target feature.

  • threshold (float) – Threshold (%) for the feature density. Used to remove sparse data.

  • depth (float) – Threshold (%) for the feature layer density. Determines how many feature layers are extracted.

  • exclude_sections (list of str, optional) – List of sections to be excluded. Defaults to [].

  • exclude_cols (list of str, optional) – List of columns to be excluded. Defaults to [].

  • save (bool, optional) – Whether to save the preprocessed data. Defaults to True.

  • verbosity (int, optional) – Verbosity level. Defaults to 0.

Returns:

A tuple (X, y), where X is the preprocessed data and y is the target data.

Return type:

tuple of (dataframe, series)
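The threshold parameter can be read as a minimum percentage of non-missing values per feature. The sketch below implements that reading with two hypothetical helpers, column_density and dense_columns. Two details are assumptions: whether the real method compares with >= or >, and whether it measures density before or after masking.

```python
import math
from typing import Any, Dict, List


def column_density(rows: List[Dict[str, Any]], column: str) -> float:
    """Percentage of rows in which column holds a non-missing value."""
    if not rows:
        return 0.0
    present = sum(
        1 for row in rows
        # row.get() short-circuits for absent keys, so row[column] is safe.
        if row.get(column) is not None
        and not (isinstance(row[column], float) and math.isnan(row[column]))
    )
    return 100.0 * present / len(rows)


def dense_columns(rows: List[Dict[str, Any]], threshold: float) -> List[str]:
    """Columns whose density is at least threshold percent, in first-seen order."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    return [c for c in columns if column_density(rows, c) >= threshold]


rows = [{"a": 1, "b": None}, {"a": 2, "b": None}, {"a": 3, "b": 5}, {"a": None, "c": 1}]
print(dense_columns(rows, 50))  # ['a']  (densities: a 75%, b 25%, c 25%)
```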

set_Xy(data, target)

Calls get_Xy(data, target) and stores the output.

utils.database.SECTION_KEYS = {'Additional layers': 'Add_lay', 'Back contact': 'Backcontact', 'Cell definition': 'Cell', 'Electron transport layer': 'ETL', 'Encapsulation': 'Encapsulation', 'Hole transport layer': 'HTL', 'JV data': 'JV', 'Module definition': 'Module', 'Outdoor testing': 'Outdoor', 'Perovskite deposition': 'Perovskite', 'Quantum efficiency': 'EQE', 'Reference information': 'Ref', 'Stabilised efficiency': 'Stabilised', 'Stability': 'Stability', 'Substrate': 'Substrate', 'The perovskite': 'Perovskite'}

Dictionary of section names and their corresponding shorthand.

The shorthand is used as a prefix for feature names.

Type:

dict
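Because the shorthand is a prefix, excluding a section can be pictured as dropping every column that starts with it. The sketch below makes three assumptions:

- Feature names take the form <shorthand>_<feature>.
- It uses a hypothetical helper, drop_sections.
- It works from a copied subset of the dictionary.

It is not necessarily how preprocess handles exclude_sections.

```python
from typing import Dict, Iterable, List

# Subset of utils.database.SECTION_KEYS, copied so the example runs on its own.
SECTION_KEYS: Dict[str, str] = {
    "Cell definition": "Cell",
    "Electron transport layer": "ETL",
    "JV data": "JV",
    "Perovskite deposition": "Perovskite",
    "The perovskite": "Perovskite",
}


def drop_sections(columns: Iterable[str], exclude_sections: Iterable[str]) -> List[str]:
    """Hypothetical helper: remove columns that start with an excluded section's prefix."""
    # Assumed naming scheme <shorthand>_<feature>; the trailing "_" keeps
    # "Cell" from matching an unrelated name such as "Cellular".
    prefixes = tuple(SECTION_KEYS[name] + "_" for name in exclude_sections)
    # str.startswith(()) is False, so an empty exclusion list keeps everything.
    return [c for c in columns if not c.startswith(prefixes)]


# Hypothetical column names, for illustration only.
cols = ["Cell_area_total", "ETL_stack_sequence",
        "Perovskite_composition_a_ions", "JV_default_PCE"]
print(drop_sections(cols, ["Electron transport layer", "The perovskite"]))
# ['Cell_area_total', 'JV_default_PCE']
```

Under this scheme, 'Perovskite deposition' and 'The perovskite' share the shorthand Perovskite. Excluding either section would therefore remove the columns of both.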