utils.reduction

Functions

collect_features(features)

Collects the expanded features from a dictionary of features.

find_sparsity(column)

Finds the sparsity of a column of data.

get_valid_patterns(ref[, return_invalid])

Finds the features which encode layer data with patterns.

has_concentrations(string)

Returns True if the given string has concentrations in it.

is_pattern(string)

Returns True if the given string is a pattern.

is_valid_pattern(ref)

Returns a mask of the features which encode layer data with patterns.

matches_regex(rgx, string)

Returns True if the given string matches the given regex string.

partition_by_pattern(refs, keys)

Given a subset of features, partition the features into a patterned and non-patterned set.

passes_sparsity(column[, percent])

Checks if a column of data passes the sparsity threshold.

prune_by_sparsity(features, threshold)

Prunes a dictionary of features by their sparsity.

reduce_data(data[, percent])

Given a dataset, returns the columns which pass the given sparsity threshold.

remove_features(features, remove)

Removes features from a dictionary of features.

section_features(sections, ref)

Gets the features for a given section from a reference dictionary.

sort_by_sparsity(features)

Sorts a dictionary of features by their sparsity.

utils.reduction.collect_features(features)

Collects the expanded features from a dictionary of features.

Parameters:

features (dict) – Dictionary of features.

Returns:

List of expanded features.

Return type:

list

utils.reduction.find_sparsity(column)

Finds the sparsity of a column of data.

Parameters:

column (series) – Column to be checked.

Returns:

The sparsity of the data.

Return type:

float
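The entry above does not say how sparsity is computed. A minimal sketch, assuming sparsity means the fraction of missing (NaN/None) entries in a pandas Series; the name find_sparsity_sketch is hypothetical and the real function may use a different definition:

```python
import pandas as pd


def find_sparsity_sketch(column: pd.Series) -> float:
    """Return the fraction of missing entries in a column.

    Assumption: "sparsity" is the share of NaN/None values. This is an
    illustrative definition, not necessarily the one used by find_sparsity.
    """
    if len(column) == 0:
        # An empty column has no entries to be missing.
        return 0.0
    return float(column.isna().mean())


# Two of the four entries are missing.
column = pd.Series([1.0, None, 3.0, None])
sparsity = find_sparsity_sketch(column)
```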

utils.reduction.get_valid_patterns(ref, return_invalid: bool = True)

Finds the features which encode layer data with patterns. Returns the feature names.

Parameters:
  • ref (dataframe) – The reference data for features.

  • return_invalid (bool) – Option to return a second set containing the names of the features which don’t contain patterns. Default True.

Returns:

List of patterned feature names. If return_invalid is True, a second list containing the non-patterned feature names is also returned.

Return type:

list

utils.reduction.has_concentrations(string)

Returns True if the given string has concentrations in it.

Used to check which features have concentrations encoded in them.

utils.reduction.is_pattern(string)

Returns True if the given string is a pattern.

Used to check which features have layer data encoded in them.

Examples of valid patterns:

“[Mat.1; Mat.2; … | Mat.3; … | Mat.4 | …]”

“[Gas1; Gas2 >> Gas3; … >> … | Gas4 >> … | Gas5 | … ]”
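The exact rule is_pattern applies is not documented here. As a rough sketch, the examples suggest a bracketed string in which "|" separates layers and ";" separates entries within a layer. The helper names looks_like_pattern and split_pattern are hypothetical, the delimiters are read off the examples only, and ">>" steps are left unsplit:

```python
import re

# Assumption: a pattern is wrapped in square brackets, as in the examples.
BRACKETED = re.compile(r"^\[.*\]$", re.DOTALL)


def looks_like_pattern(string: str) -> bool:
    """Heuristic check: bracketed text containing at least one delimiter."""
    text = string.strip()
    has_delimiter = any(sep in text for sep in ("|", ";", ">>"))
    return bool(BRACKETED.match(text)) and has_delimiter


def split_pattern(string: str) -> list:
    """Split a bracketed pattern into layers, then into entries per layer."""
    inner = string.strip()[1:-1]  # drop the surrounding brackets
    return [
        [entry.strip() for entry in layer.split(";")]
        for layer in inner.split("|")
    ]


layers = split_pattern("[Mat.1; Mat.2 | Mat.3]")
```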

utils.reduction.is_valid_pattern(ref)

Returns a mask of the features which encode layer data with patterns.

Parameters:

ref (dataframe) – The reference data for features.

Returns:

A mask of the features which encode layer data with patterns.

Return type:

series

utils.reduction.matches_regex(rgx, string)

Returns True if the given string matches the given regex string.
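Whether "matches" means a match anywhere in the string, at its start, or over the whole string is not stated. A minimal sketch, assuming a match anywhere counts (re.search); the name matches_regex_sketch is hypothetical, and anchoring the pattern with ^ and $ gives whole-string behaviour under this assumption:

```python
import re


def matches_regex_sketch(rgx: str, string: str) -> bool:
    """Return True if the regex matches anywhere in the string.

    Assumption: search semantics. The documented function may instead use
    re.match (start of string) or re.fullmatch (whole string).
    """
    return re.search(rgx, string) is not None


found_anywhere = matches_regex_sketch(r"\d+", "abc123")
whole_string = matches_regex_sketch(r"^\d+$", "abc123")
```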

utils.reduction.partition_by_pattern(refs, keys)

Given a subset of features, partition the features into a patterned and non-patterned set.

Parameters:
  • refs (dict of dataframe) – A set of references for each section of features. Sections include “Hole transport layer”, “The perovskite”, etc.

  • keys (list) – List of section names to be included in the partitioning.

Returns:

List of patterned features, followed by a second list of non-patterned features.

Return type:

list

utils.reduction.passes_sparsity(column, percent=0.0)

Checks if a column of data passes the sparsity threshold.

Parameters:
  • column (series) – Column to be checked.

  • percent (float) – Percentile threshold for the sparsity of the data. Defaults to 0.0.

Returns:

True if the data sparsity passes the threshold. False otherwise.

Return type:

bool

utils.reduction.prune_by_sparsity(features, threshold)

Prunes a dictionary of features by their sparsity.

Parameters:
  • features (dict) – Dictionary of features to be pruned.

  • threshold (float) – Threshold for the sparsity of the data.

Returns:

Pruned dictionary of features.

Return type:

dict

utils.reduction.reduce_data(data, percent=0.0)

Given a dataset, returns the columns which pass the given sparsity threshold.

Parameters:
  • data (dataframe) – Data to be reduced.

  • percent (float) – Percentile threshold for the sparsity of the data.

Returns:

The reduced data.

Return type:

dataframe
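The direction of the threshold (what counts as "passing") is not spelled out above. The following sketch assumes a column passes when its fraction of missing values is at most a cutoff. Both the name reduce_data_sketch and the parameter max_missing are hypothetical and do not claim to reproduce how percent is interpreted:

```python
import pandas as pd


def reduce_data_sketch(data: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep the columns whose share of missing values is <= max_missing.

    Assumption: sparsity is the per-column fraction of NaN/None entries,
    and lower sparsity is what "passes".
    """
    missing_fraction = data.isna().mean()  # one value per column
    keep = missing_fraction <= max_missing  # boolean mask over columns
    return data.loc[:, keep]


data = pd.DataFrame(
    {
        "dense": [1.0, 2.0, 3.0, 4.0],      # no missing values
        "sparse": [None, None, None, 4.0],  # three of four missing
    }
)
reduced = reduce_data_sketch(data, max_missing=0.5)
```

Under this assumption, only the "dense" column survives the cutoff of 0.5.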

utils.reduction.remove_features(features, remove)

Removes features from a dictionary of features.

Parameters:
  • features (dict) – Dictionary of features.

  • remove (list) – List of features to be removed.

Returns:

Dictionary of features without the removed features.

Return type:

dict
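The layout of the feature dictionary is not described here. A minimal sketch, assuming it maps section names to lists of feature names; the name remove_features_sketch and the feature names in the example are hypothetical:

```python
def remove_features_sketch(features: dict, remove: list) -> dict:
    """Return a new dictionary with the listed features dropped.

    Assumption: `features` maps a section name to a list of feature names.
    The input dictionary is left unmodified.
    """
    to_remove = set(remove)  # set membership is O(1) per lookup
    return {
        section: [name for name in names if name not in to_remove]
        for section, names in features.items()
    }


features = {
    "Hole transport layer": ["htl_stack", "htl_thickness"],
    "The perovskite": ["composition", "band_gap"],
}
kept = remove_features_sketch(features, ["htl_thickness", "band_gap"])
```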

utils.reduction.section_features(sections, ref)

Gets the features for a given section from a reference dictionary.

Parameters:
  • sections (list) – List of sections to be included.

  • ref (dict) – The reference data for features.

Returns:

List of features for the given section.

Return type:

list

utils.reduction.sort_by_sparsity(features)

Sorts a dictionary of features by their sparsity.

Parameters:

features (dict) – Dictionary of features to be sorted.

Returns:

Sorted dictionary of features.

Return type:

dict