utils.database
Module Attributes
- SECTION_KEYS: Dictionary of section names and their corresponding shorthand.
- NAN_EQUIVALENTS: Keys are equivalent to missing or NaN values in the dataset.
- DATABASE: An instance of the Perovskite Dataset.
Classes
- PerovskiteData: Stores the unprocessed perovskite data.
- utils.database.DATABASE = <utils.database.PerovskiteData object>
An instance of the Perovskite Dataset.
- utils.database.NAN_EQUIVALENTS = {'NAN': None, 'NAN; NAN': None, 'NULL': None, 'NaN': None, 'NaN; NaN': None, 'Nan': None, 'Nan; Nan': None, 'None': None, 'Null': None, 'Unknown': None, 'nan': None, 'nan; nan': None, 'none': None, 'null': None, 'unknown': None}
Keys are strings treated as equivalent to missing or NaN values in the dataset; each maps to None.
- Type:
dict
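As a minimal sketch of how such a mapping might be applied, each raw value can be looked up in the dictionary and replaced with None when it matches a key. The helper `normalise` and the sample values are hypothetical and not part of utils.database; only a subset of the documented keys is reproduced.

```python
# Subset of NAN_EQUIVALENTS, copied from the documented value above.
NAN_EQUIVALENTS = {"NaN": None, "nan; nan": None, "Unknown": None, "none": None}


def normalise(value):
    """Return None for placeholder strings, otherwise the value unchanged.

    Hypothetical helper for illustration; not part of utils.database.
    """
    if isinstance(value, str) and value in NAN_EQUIVALENTS:
        return NAN_EQUIVALENTS[value]
    return value


raw = ["TiO2-c", "Unknown", "nan; nan", 0.16]   # made-up sample values
cleaned = [normalise(v) for v in raw]
print(cleaned)  # ['TiO2-c', None, None, 0.16]
```

The lookup is an exact string match, which would explain why the documented dictionary lists several capitalisations of the same placeholder.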
- class utils.database.PerovskiteData(ref=None, data=None, ref_file=None, database_file=None, nan_equivalents={}, section_keys={})
Stores the unprocessed perovskite data.
- data
The unprocessed data.
- Type:
dataframe
- ref
The reference data for features: Field, Type, Default, Unit, Pattern, Implemented, Description, Concerns.
- Type:
dataframe
- database_file
The name of the database file.
- Type:
str
- ref_file
The name of the reference data file.
- Type:
str
- nan_equivalents
Equivalent values for NaN in the dataset.
- Type:
dict
- section_keys
Dictionary of section names and their corresponding shorthand.
- Type:
dict
- X
The preprocessed and masked data. Stored after preprocess() is called.
- Type:
dataframe
- y
The masked target. Stored after preprocess() is called.
- Type:
series
- Raises:
ValueError – If both ref and ref_file are None.
ValueError – If both data and database_file are None.
- get_Xy(data, target)
Returns a masked version of the data and target series.
Rows whose target value is NaN are excluded from both the data and the target series.
- Parameters:
data (dataframe) – The perovskite data.
target (str) – The name of the target feature.
- Returns:
The masked data (dataframe) and the masked target (series).
- Return type:
dataframe, series
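The masking described for get_Xy can be sketched with pandas as follows. This is an illustration under assumptions, not the module's implementation: the function name `get_xy_sketch` and the column names are hypothetical, and whether the real method removes the target column from the returned data is not stated in this reference.

```python
import pandas as pd


def get_xy_sketch(data: pd.DataFrame, target: str):
    """Hypothetical stand-in for get_Xy: keep only rows with a non-NaN target."""
    mask = data[target].notna()                  # True where the target is present
    X = data.loc[mask].drop(columns=[target])    # assumption: target column is removed
    y = data.loc[mask, target]
    return X, y


# Made-up example data; the column names are illustrative only.
df = pd.DataFrame({
    "Cell_area": [0.1, 0.2, 0.3],
    "JV_pce": [18.5, None, 20.1],
})
X, y = get_xy_sketch(df, "JV_pce")
print(list(X.index), list(y))  # [0, 2] [18.5, 20.1]
```

Because the same mask is applied to both outputs, the rows of the data and the target stay aligned by index.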
- preprocess(target, threshold, depth, exclude_sections=[], exclude_cols=[], save: bool = True, verbosity: int = 0)
Generates a preprocessed version of the dataset.
If an unseen set of hyperparameters is used to generate the preprocessed dataset, it is saved for future use. Otherwise, the previously generated file is loaded and returned instead.
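The save-or-load behaviour can be pictured as a cache keyed by the hyperparameters. The sketch below is one way to do that and is not taken from the module: the file naming scheme, the JSON storage format, and the names `cache_path`, `load_or_build`, and `toy_build` are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, **hyperparameters) -> Path:
    """Map a set of hyperparameters to a stable file name (hypothetical scheme)."""
    blob = json.dumps(hyperparameters, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()[:12]
    return cache_dir / f"preprocessed_{digest}.json"


def load_or_build(cache_dir: Path, build, **hyperparameters):
    """Return (result, was_cached) for the given hyperparameters."""
    path = cache_path(cache_dir, **hyperparameters)
    if path.exists():
        return json.loads(path.read_text()), True   # seen before: reuse the file
    result = build(**hyperparameters)
    path.write_text(json.dumps(result))             # unseen: generate and save
    return result, False


def toy_build(target, threshold, depth):
    # Placeholder for the real preprocessing work.
    return {"target": target, "threshold": threshold, "depth": depth}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(Path(tmp), toy_build, target="JV_pce", threshold=50, depth=50)
    second = load_or_build(Path(tmp), toy_build, target="JV_pce", threshold=50, depth=50)
    print(first[1], second[1])  # False True
```

Sorting the keys before hashing means the same hyperparameters always map to the same file, regardless of the order in which they are passed.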
- Parameters:
target (str) – Name of the target feature
threshold (float) – Threshold (%) for the feature density. Used to remove sparse data.
depth (float) – Threshold (%) for the feature layer density. Determines how many feature layers are extracted.
exclude_sections (list of str, optional) – List of sections to be excluded. Defaults to [].
exclude_cols (list of str, optional) – List of columns to be excluded. Defaults to [].
save (bool, optional) – Whether to save the preprocessed data. Defaults to True.
verbosity (int, optional) – Verbosity level. Defaults to 0.
- Returns:
The preprocessed data (dataframe) and the target data (series).
- Return type:
dataframe, series
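To illustrate what a density threshold of this kind could mean, the sketch below drops columns whose share of non-NaN values falls below a percentage. It assumes that "feature density" is the fraction of non-missing entries per column and that the comparison is inclusive; neither is confirmed by this reference, and `drop_sparse_columns` is a hypothetical name.

```python
import pandas as pd


def drop_sparse_columns(data: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Keep columns whose share of non-NaN values is at least `threshold` percent."""
    density = data.notna().mean() * 100      # per-column density in percent
    return data.loc[:, density >= threshold]


# Made-up data: densities are 100 %, 50 % and 25 % respectively.
df = pd.DataFrame({
    "dense": [1, 2, 3, 4],
    "half": [1, None, 3, None],
    "sparse": [None, None, None, 4],
})
kept = drop_sparse_columns(df, threshold=50)
print(list(kept.columns))  # ['dense', 'half']
```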
- set_Xy(data, target)
Calls get_Xy(data, target) and stores the output.
- utils.database.SECTION_KEYS = {'Additional layers': 'Add_lay', 'Back contact': 'Backcontact', 'Cell definition': 'Cell', 'Electron transport layer': 'ETL', 'Encapsulation': 'Encapsulation', 'Hole transport layer': 'HTL', 'JV data': 'JV', 'Module definition': 'Module', 'Outdoor testing': 'Outdoor', 'Perovskite deposition': 'Perovskite', 'Quantum efficiency': 'EQE', 'Reference information': 'Ref', 'Stabilised efficiency': 'Stabilised', 'Stability': 'Stability', 'Substrate': 'Substrate', 'The perovskite': 'Perovskite'}
Dictionary of section names and their corresponding shorthand.
The shorthand is used as a prefix for feature names.
- Type:
dict
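A short sketch of how a shorthand might be used as a prefix is given below. The underscore separator, the helper `prefixed_name`, and the feature name `thickness` are assumptions for illustration; only a subset of the documented mapping is reproduced.

```python
# Subset of SECTION_KEYS, copied from the documented value above.
SECTION_KEYS = {
    "Electron transport layer": "ETL",
    "Hole transport layer": "HTL",
    "JV data": "JV",
}


def prefixed_name(section: str, feature: str, sep: str = "_") -> str:
    """Build a feature name from a section's shorthand (separator is an assumption)."""
    return f"{SECTION_KEYS[section]}{sep}{feature}"


print(prefixed_name("Electron transport layer", "thickness"))  # ETL_thickness
```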