utils.reduction

Functions

`collect_features`(features)	Collects the expanded features from a dictionary of features.
`find_sparsity`(column)	Finds the sparsity of a column of data.
`get_valid_patterns`(ref[, return_invalid])	Finds the features which encode layer data with patterns.
`has_concentrations`(string)	Returns True if the given string has concentrations in it.
`is_pattern`(string)	Returns True if the given string is a pattern.
`is_valid_pattern`(ref)	Returns a mask of the features which encode layer data with patterns.
`matches_regex`(rgx, string)	Returns True if the given string matches the given regex string.
`partition_by_pattern`(refs, keys)	Given a subset of features, partition the features into a patterned and non-patterned set.
`passes_sparsity`(column[, percent])	Checks if a column of data passes the sparsity threshold.
`prune_by_sparsity`(features, threshold)	Prunes a dictionary of features by their sparsity.
`reduce_data`(data[, percent])	Given a dataset return the columns which pass the given sparsity threshold.
`remove_features`(features, remove)	Removes features from a dictionary of features.
`section_features`(sections, ref)	Gets the features for a given section from a reference dictionary.
`sort_by_sparsity`(features)	Sorts a dictionary of features by their sparsity.

utils.reduction.collect_features(features)

Collects the expanded features from a dictionary of features.

Parameters: features (dict) – Dictionary of features.
Returns: List of expanded features.
Return type: list
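The shape of the features dictionary is not documented here. As a minimal sketch, assuming it maps each feature name to a list of its expanded names, collecting them amounts to flattening the values. The function name `collect_features_sketch` and the feature names below are hypothetical and are not taken from the module.

```python
from typing import Dict, List


def collect_features_sketch(features: Dict[str, List[str]]) -> List[str]:
    """Flatten a {feature: [expanded names]} mapping into a single list.

    Assumption: each value is a list of expanded feature names.
    """
    collected: List[str] = []
    for expanded in features.values():
        collected.extend(expanded)
    return collected


# Hypothetical feature names, for illustration only.
example_features = {
    "feature_a": ["feature_a_1", "feature_a_2"],
    "feature_b": ["feature_b_1"],
}
collected_names = collect_features_sketch(example_features)
```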

utils.reduction.find_sparsity(column)

Finds the sparsity of a column of data.

Parameters: column (series) – Column to be checked.
Returns: The sparsity of the data.
Return type: float
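The page does not define how sparsity is measured. One common reading, assumed here, is the fraction of entries that are missing. The sketch below uses a plain Python list in place of a pandas Series so that it runs without dependencies; `find_sparsity_sketch` is a hypothetical name.

```python
import math
from typing import Optional, Sequence


def find_sparsity_sketch(column: Sequence[Optional[float]]) -> float:
    """Return the fraction of entries that are None or NaN.

    Assumption: "sparsity" means the share of missing values.
    """
    if len(column) == 0:
        return 0.0
    missing = sum(
        1
        for value in column
        if value is None or (isinstance(value, float) and math.isnan(value))
    )
    return missing / len(column)


# Two of the four entries are missing.
example_column = [1.2, None, float("nan"), 0.7]
example_sparsity = find_sparsity_sketch(example_column)
```

Under the same assumption, the equivalent for a pandas Series would be `column.isna().mean()`.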

utils.reduction.get_valid_patterns(ref, return_invalid: bool = True)

Finds the features which encode layer data with patterns. Returns the feature names.

Parameters:

ref (dataframe) – The reference data for features.
return_invalid (bool) – Option to return a second set containing the names of the features which don’t contain patterns. Default True.

Returns:

List of patterned feature names. If return_invalid is True, a second list with the non-patterned feature names is also returned.

Return type:

list

utils.reduction.has_concentrations(string)

Returns True if the given string has concentrations in it.

Used to check which features have concentrations encoded in them.

utils.reduction.is_pattern(string)

Returns True if the given string is a pattern.

Used to check which features have layer data encoded in them.

Examples of valid patterns:

“[Mat.1; Mat.2; … | Mat.3; … | Mat.4 | …]”
“[Gas1; Gas2 >> Gas3; … >> … | Gas4 >> … | Gas5 | … ]”
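The examples above suggest a nested delimiter scheme. As an illustration only, assuming that `|` separates layers, `>>` separates sequential steps within a layer, and `;` separates entries within a step, a value written in this style could be split as follows. These delimiter meanings are inferred from the examples rather than stated on this page, and `split_pattern` is a hypothetical helper, not part of `utils.reduction`.

```python
from typing import List


def split_pattern(value: str) -> List[List[List[str]]]:
    """Split a patterned value into layers -> steps -> entries.

    Assumed delimiters: '|' between layers, '>>' between steps,
    ';' between entries.
    """
    layers = []
    for layer in value.split("|"):
        steps = []
        for step in layer.split(">>"):
            entries = [entry.strip() for entry in step.split(";")]
            steps.append(entries)
        layers.append(steps)
    return layers


# Hypothetical value with three layers; the first layer has two steps.
parsed = split_pattern("N2; Ar >> Air | N2 | Vacuum")
```

Here `parsed[0]` holds the two steps of the first layer, and `parsed[1]` and `parsed[2]` each hold a single one-entry step.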

utils.reduction.is_valid_pattern(ref)

Returns a mask of the features which encode layer data with patterns.

Parameters: ref (dataframe) – The reference data for features.
Returns: A mask of the features which encode layer data with patterns.
Return type: series

utils.reduction.matches_regex(rgx, string)

Returns True if the given string matches the given regex string.
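Whether the match must cover the whole string or may occur anywhere in it is not specified here. A minimal sketch, assuming a match anywhere in the string counts, could use `re.search`; `matches_regex_sketch` is a hypothetical name.

```python
import re


def matches_regex_sketch(rgx: str, string: str) -> bool:
    """Return True if the pattern is found anywhere in the string.

    Assumption: a partial match is enough. Use re.fullmatch instead
    if the whole string must match.
    """
    return re.search(rgx, string) is not None


found_anywhere = matches_regex_sketch(r"\d+ nm", "thickness: 40 nm")
anchored_miss = matches_regex_sketch(r"^\d+$", "40 nm")
```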

utils.reduction.partition_by_pattern(refs, keys)

Given a subset of features, partition the features into a patterned and non-patterned set.

Parameters:

refs (dict of dataframe) – A set of references for each section of features. Sections include “Hole transport layer”, “The perovskite”, etc.
keys (list) – List of section names to be included in the partitioning.

Returns:

List of patterned features, followed by a list of non-patterned features.

Return type:

list

utils.reduction.passes_sparsity(column, percent=0.0)

Checks if a column of data passes the sparsity threshold.

Parameters:

column (series) – Column to be checked.
percent (float) – Percentile threshold for the sparsity of the data. Defaults to 0.0.

Returns:

True if the data sparsity passes the threshold. False otherwise.

Return type:

bool

utils.reduction.prune_by_sparsity(features, threshold)

Prunes a dictionary of features by their sparsity.

Parameters:

features (dict) – Dictionary of features to be pruned.
threshold (float) – Threshold for the sparsity of the data.

Returns:

Pruned dictionary of features.

Return type:

dict

utils.reduction.reduce_data(data, percent=0.0)

Given a dataset return the columns which pass the given sparsity threshold.

Parameters:

data (dataframe) – Data to be reduced.
percent (float) – Percentile threshold for the sparsity of the data.

Returns:

The reduced data.

Return type:

dataframe
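The direction of the threshold comparison is not documented on this page. The sketch below assumes that `percent` is the largest tolerated fraction of missing values, so a column is kept when its missing fraction is at or below `percent`. A dictionary of lists stands in for the dataframe, and the names `missing_fraction` and `reduce_data_sketch` are hypothetical.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_fraction(column: Column) -> float:
    """Fraction of None entries in a column (0.0 for an empty column)."""
    if not column:
        return 0.0
    return sum(value is None for value in column) / len(column)


def reduce_data_sketch(data: Dict[str, Column], percent: float) -> Dict[str, Column]:
    """Keep the columns whose missing fraction is at most `percent`.

    Assumption: lower sparsity is better and `percent` is an upper bound.
    """
    return {
        name: column
        for name, column in data.items()
        if missing_fraction(column) <= percent
    }


# Hypothetical data: missing fractions are 0.0, 0.5 and 0.75.
example_data = {
    "dense": [1.0, 2.0, 3.0, 4.0],
    "half_empty": [1.0, None, None, 4.0],
    "mostly_empty": [None, None, None, 4.0],
}
reduced = reduce_data_sketch(example_data, percent=0.5)
```

With `percent=0.5`, the `dense` and `half_empty` columns are kept and `mostly_empty` is dropped. If the real function compares in the opposite direction, the inequality would flip.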

utils.reduction.remove_features(features, remove)

Removes features from a dictionary of features.

Parameters:

features (dict) – Dictionary of features.
remove (list) – List of features to be removed.

Returns:

Dictionary of features without the removed features.

Return type:

dict

utils.reduction.section_features(sections, ref)

Gets the features for a given section from a reference dictionary.

Parameters:

sections (list) – List of sections to be included.
ref (dict) – The reference data for features.

Returns:

List of features for the given section.

Return type:

list

utils.reduction.sort_by_sparsity(features)

Sorts a dictionary of features by their sparsity.

Parameters: features (dict) – Dictionary of features to be sorted.
Returns: Sorted dictionary of features.
Return type: dict
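Because the function takes only the dictionary, the sparsity values presumably travel with it. As a hedged sketch, assuming the dictionary maps each feature name to a precomputed sparsity and that the order is ascending, the sort could rely on dictionaries preserving insertion order (Python 3.7+). The name `sort_by_sparsity_sketch` and the feature names are hypothetical.

```python
from typing import Dict


def sort_by_sparsity_sketch(features: Dict[str, float]) -> Dict[str, float]:
    """Return a new dict ordered from least to most sparse.

    Assumptions: values are sparsity fractions and order is ascending.
    """
    return dict(sorted(features.items(), key=lambda item: item[1]))


# Hypothetical sparsity values, for illustration only.
example_sparsities = {"feature_a": 0.40, "feature_b": 0.05, "feature_c": 0.90}
ordered = sort_by_sparsity_sketch(example_sparsities)
```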