utils.dataset
Functions
| generate_dataset | Generates an expanded dataset for a given target feature. |
| generate_groups | Generates a list of groups for a given target feature and group feature. |
| generate_reference | Generates a database reference. |
| has_dataset | Checks if a dataset exists for a given target feature. |
| has_reference | Checks if the database reference exists. |
| load_dataset | Loads the data and metadata for a given target feature. |
| load_reference | Loads the database reference. |
| save_dataset | Saves the data and metadata for a given target feature. |
| save_reference | Saves a database reference. |
Classes
| DataSet | Stores the expanded perovskite data and metadata for a given target feature. |
- class utils.dataset.DataSet(target, group_by=None)
Stores the expanded perovskite data and metadata for a given target feature.
- target
Name of the target feature.
- Type:
str
- data
The expanded data.
- Type:
dataframe
- reference
The database reference.
- Type:
dict
- all_features
The full set of features for the expanded data.
- Type:
dict
- features
A reduced set of features generated during preprocessing.
- Type:
dict
- groups
A list of groups for the target feature.
- Type:
list
- collect_features()
Collects the features from the reduced set of features.
- Returns:
The list of features.
- Return type:
list
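As a rough illustration of what collecting features could look like, the sketch below flattens a dictionary of features into a single list. It assumes that the features dictionary maps section names to lists of feature names; that layout, the standalone `collect_features` function, and the section and feature names are hypothetical and not taken from the library.

```python
def collect_features(features):
    """Flatten a {section: [feature names]} mapping into one list, keeping order."""
    collected = []
    for names in features.values():
        collected.extend(names)
    return collected


# Hypothetical section and feature names, for illustration only.
features = {
    "section_a": ["a_1", "a_2"],
    "section_b": ["b_1"],
}
feature_list = collect_features(features)
```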
- get_Xy()
Returns the reduced data and the target series.
- Returns:
A tuple of the reduced data (dataframe) and the target series (series).
- Return type:
tuple
- get_dataset(database=<utils.database.PerovskiteData object>, save: bool = True)
Loads the data and metadata for a given target feature.
Stores the data and metadata in the class attributes.
- Parameters:
database (PerovskiteData, optional) – An instance of the perovskite database. Defaults to DATABASE.
save (bool, optional) – Whether to save the dataset. Defaults to True.
- Returns:
None
- preprocess(threshold=None, exclude_sections=[], exclude_cols=[])
Preprocesses the data.
If no preprocessed dataset has been saved for the target yet, one is generated and saved for future use. Otherwise, the previously saved file is loaded and returned.
- Parameters:
threshold (float, optional) – The sparsity threshold. Defaults to None.
exclude_sections (list, optional) – The list of sections to exclude. Defaults to [].
exclude_cols (list, optional) – The list of columns to exclude. Defaults to [].
- Returns:
A tuple of the preprocessed data (dataframe) and the target series (series).
- Return type:
tuple
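The generate-once, load-afterwards behaviour described for preprocess can be sketched as a small cache helper. This is only an illustration under stated assumptions: the `load_or_generate` helper, the one-JSON-file-per-target layout, and the `fake_generate` function are all hypothetical, and the library's actual file structure is described in the README.md in ./data.

```python
import json
import tempfile
from pathlib import Path


def load_or_generate(target, cache_dir, generate):
    """Return (result, was_cached). Generates and saves on the first call only."""
    path = Path(cache_dir) / f"{target}.json"  # hypothetical file layout
    if path.exists():
        with path.open() as f:
            return json.load(f), True
    result = generate(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(result, f)
    return result, False


calls = []


def fake_generate(target):
    """Stand-in for an expensive generation step; records each invocation."""
    calls.append(target)
    return {"target": target, "rows": [1, 2, 3]}


with tempfile.TemporaryDirectory() as tmp:
    first, first_cached = load_or_generate("example_target", tmp, fake_generate)
    second, second_cached = load_or_generate("example_target", tmp, fake_generate)
```

In this sketch the second call reads the saved file instead of invoking the generator again.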
- prune_by_sparsity(threshold)
Prunes the reduced set of features by sparsity.
- Parameters:
threshold (float) – The sparsity threshold.
- Returns:
None
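One plausible reading of pruning by sparsity is sketched below. It assumes that sparsity means the fraction of missing values in a column and that columns whose sparsity exceeds the threshold are dropped; both the definition and the direction of the comparison are assumptions, and the standalone functions here are hypothetical stand-ins rather than the method itself.

```python
def sparsity(values):
    """Fraction of entries that are missing (None); an empty column counts as fully sparse."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def prune_by_sparsity(columns, threshold):
    """Keep only the columns whose sparsity does not exceed the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if sparsity(values) <= threshold
    }


# Hypothetical columns, for illustration only.
columns = {
    "dense": [1.0, 2.0, 3.0, 4.0],
    "half_empty": [1.0, None, 3.0, None],
    "mostly_empty": [None, None, None, 4.0],
}
kept = prune_by_sparsity(columns, threshold=0.5)
```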
- remove(sections=[], features=[])
Removes both sections and features from the reduced set of features.
- Parameters:
sections (list) – The list of sections to remove. Defaults to [].
features (list) – The list of features to remove. Defaults to [].
- Returns:
None
- remove_features(features)
Removes features from the reduced set of features.
- Parameters:
features (list) – The list of features to remove.
- Returns:
None
- remove_sections(sections)
Removes an entire section of features from the reduced set of features.
- Parameters:
sections (list) – The list of sections to remove.
- Returns:
None
- reset_features()
Resets the reduced set of features to the full set of features.
- Returns:
None
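The relationship between the full and reduced feature sets can be sketched with a small stand-in class. It assumes that both sets are dictionaries mapping section names to lists of feature names; the `FeatureSet` class and the example names are hypothetical and only mirror the method names documented above.

```python
import copy


class FeatureSet:
    """Hypothetical stand-in that keeps a full feature set and a reduced copy."""

    def __init__(self, all_features):
        self.all_features = all_features
        self.features = copy.deepcopy(all_features)

    def remove_sections(self, sections):
        # Drop whole sections; unknown section names are ignored.
        for section in sections:
            self.features.pop(section, None)

    def remove_features(self, features):
        # Drop individual features from every remaining section.
        to_drop = set(features)
        for section, names in self.features.items():
            self.features[section] = [n for n in names if n not in to_drop]

    def remove(self, sections=(), features=()):
        self.remove_sections(sections)
        self.remove_features(features)

    def reset_features(self):
        # Deep copy so later removals never touch the full set.
        self.features = copy.deepcopy(self.all_features)


fs = FeatureSet({"section_a": ["a_1", "a_2"], "section_b": ["b_1"]})
fs.remove(sections=["section_b"], features=["a_2"])
after_removal = copy.deepcopy(fs.features)
fs.reset_features()
```

The sketch uses tuples as default arguments because a mutable default such as a list is shared across calls in Python.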
- utils.dataset.generate_dataset(target, database=<utils.database.PerovskiteData object>, save=True)
Generates an expanded dataset for a given target feature.
- Parameters:
target (str) – Name of the target feature.
database (PerovskiteData, optional) – An instance of the perovskite database. Defaults to DATABASE.
save (bool, optional) – Whether to save the dataset. Defaults to True.
- Returns:
A tuple of the data (dataframe), the data features (dict), and the database reference (dict).
- Return type:
tuple
- utils.dataset.generate_groups(target, group, database=<utils.database.PerovskiteData object>)
Generates a list of groups for a given target feature and group feature.
- Parameters:
target (str) – Name of the target feature.
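group (str) – Name of the feature to group by.
database (PerovskiteData, optional) – An instance of the perovskite database. Defaults to DATABASE.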
- Returns:
The list of groups.
- Return type:
list
- utils.dataset.generate_reference(refs)
Generates a database reference.
- Parameters:
refs (dataframe) – An expanded representation of the database reference.
- Returns:
The reference.
- Return type:
dict
- utils.dataset.has_dataset(target, in_path=EXPAND_DIR)
Checks if a dataset exists for a given target feature.
- Parameters:
target (str) – Name of the target feature.
in_path (str, optional) – Path to the directory containing the dataset. Defaults to EXPAND_DIR.
- Returns:
True if the dataset exists, False otherwise.
- Return type:
bool
- utils.dataset.has_reference(path=EXPAND_DIR)
Checks if the database reference exists.
- Parameters:
path (str, optional) – Path to the directory containing the reference. Defaults to EXPAND_DIR.
- Returns:
True if the reference exists, False otherwise.
- Return type:
bool
- utils.dataset.load_dataset(target: str, in_path=EXPAND_DIR)
Loads the data and metadata for a given target feature.
- Parameters:
target (str) – Name of the target feature.
in_path (str, optional) – Path to the directory from which to load the dataset. Defaults to EXPAND_DIR.
- Returns:
A tuple of the data (dataframe) and the features (dict).
- Return type:
tuple
- utils.dataset.load_reference(in_path=EXPAND_DIR)
Loads the database reference.
- Parameters:
in_path (str, optional) – Path to the directory from which to load the reference. Defaults to EXPAND_DIR.
- Returns:
The reference.
- Return type:
dict
- utils.dataset.save_dataset(data, features: dict, target: str, out_path=EXPAND_DIR)
Saves the data and metadata for a given target feature.
See the README.md in ./data for more information about the file structure.
- Parameters:
data (dataframe) – The data.
features (dict) – The features.
target (str) – Name of the target feature.
out_path (str, optional) – Path to the directory in which to save the dataset. Defaults to EXPAND_DIR.
- Returns:
None
- utils.dataset.save_reference(ref, out_path=EXPAND_DIR)
Saves a database reference.
See the README.md in ./data for more information about the file structure.
- Parameters:
ref (dict) – The reference.
out_path (str, optional) – Path to the directory in which to save the reference. Defaults to EXPAND_DIR.
- Returns:
None
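How the has, save, and load functions might fit together is sketched below as a round trip through a temporary directory. Everything here is an assumption made for illustration: the `_sketch` functions are stand-ins rather than the library's implementation, the data is a plain list of row dictionaries instead of a dataframe, and the `data.csv` and `features.json` file names are hypothetical. The real file structure is described in the README.md in ./data.

```python
import csv
import json
import tempfile
from pathlib import Path


def _paths(target, directory):
    """Hypothetical layout: one sub-directory per target holding two files."""
    base = Path(directory) / target
    return base / "data.csv", base / "features.json"


def has_dataset_sketch(target, directory):
    data_path, features_path = _paths(target, directory)
    return data_path.is_file() and features_path.is_file()


def save_dataset_sketch(rows, features, target, directory):
    data_path, features_path = _paths(target, directory)
    data_path.parent.mkdir(parents=True, exist_ok=True)
    with data_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    with features_path.open("w") as f:
        json.dump(features, f)


def load_dataset_sketch(target, directory):
    data_path, features_path = _paths(target, directory)
    with data_path.open(newline="") as f:
        rows = list(csv.DictReader(f))  # CSV values are read back as strings
    with features_path.open() as f:
        features = json.load(f)
    return rows, features


# Hypothetical data; values are strings so they survive the CSV round trip unchanged.
rows = [{"a_1": "1", "b_1": "x"}, {"a_1": "2", "b_1": "y"}]
features = {"section_a": ["a_1"], "section_b": ["b_1"]}

with tempfile.TemporaryDirectory() as tmp:
    existed_before = has_dataset_sketch("example_target", tmp)
    save_dataset_sketch(rows, features, "example_target", tmp)
    exists_after = has_dataset_sketch("example_target", tmp)
    loaded_rows, loaded_features = load_dataset_sketch("example_target", tmp)
```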