Transform functions

Helper functions to process profile data.

cytominer_eval.transform.transform

cytominer_eval.transform.transform.get_pairwise_metric(df: pandas.core.frame.DataFrame, similarity_metric: str) → pandas.core.frame.DataFrame
cytominer_eval.transform.transform.metric_melt(df: pandas.core.frame.DataFrame, features: List[str], metadata_features: List[str], eval_metric: str = 'replicate_reproducibility', similarity_metric: str = 'pearson') → pandas.core.frame.DataFrame
cytominer_eval.transform.transform.process_melt(df: pandas.core.frame.DataFrame, meta_df: pandas.core.frame.DataFrame, eval_metric: str = 'replicate_reproducibility') → pandas.core.frame.DataFrame
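
The signatures above carry no descriptions. As a rough illustration of the general idea they suggest (compute a pairwise similarity matrix between profiles, then melt it into long form), here is a minimal pandas-only sketch. It does not call cytominer_eval, the data values are made up, and the column names pair_a_index, pair_b_index, and similarity_metric are assumptions chosen for illustration.

```python
import pandas as pd

# Toy profiles: rows are profiles, columns are features (values are made up)
profiles = pd.DataFrame(
    {
        "feat_1": [1.0, 2.0, 3.0],
        "feat_2": [2.0, 4.0, 1.0],
        "feat_3": [3.0, 6.5, 2.0],
    },
    index=["profile_0", "profile_1", "profile_2"],
)

# Step 1: pairwise Pearson correlation between profiles.
# DataFrame.corr correlates columns, so transpose to put profiles in columns.
pairwise = profiles.T.corr(method="pearson")

# Step 2: melt the square matrix into one row per (profile, profile) pair.
# All three output column names are hypothetical.
melted = (
    pairwise.rename_axis(index="pair_a_index")
    .reset_index()
    .melt(
        id_vars="pair_a_index",
        var_name="pair_b_index",
        value_name="similarity_metric",
    )
)

print(melted)
```

The melted frame has one row for every ordered pair of profiles, including self-pairs; how those rows are filtered afterwards is presumably what depends on the evaluation metric.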

cytominer_eval.transform.util

cytominer_eval.transform.util.assert_eval_metric(eval_metric: str) → None

Helper function to ensure that we support the input eval metric

Parameters

eval_metric (str) – The user input eval metric

Returns

Assertion will fail if we don’t support the input eval metric

Return type

None
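
A minimal sketch of this kind of guard, assuming a simple membership check. The function name check_eval_metric and the two-item list below are hypothetical; in the library, the supported names would come from get_available_eval_metrics() and may differ.

```python
from typing import Sequence

# Hypothetical list for illustration only; the real list may differ.
SUPPORTED_EVAL_METRICS = ("replicate_reproducibility", "grit")


def check_eval_metric(
    eval_metric: str, supported: Sequence[str] = SUPPORTED_EVAL_METRICS
) -> None:
    """Raise AssertionError if eval_metric is not in the supported list."""
    assert eval_metric in supported, (
        f"{eval_metric} not supported. Select one of {list(supported)}"
    )


# A supported name passes silently and returns None
check_eval_metric("replicate_reproducibility")

# An unsupported name raises AssertionError
try:
    check_eval_metric("not_a_metric")
except AssertionError as err:
    print(f"Rejected: {err}")
```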

cytominer_eval.transform.util.assert_melt(df: pandas.core.frame.DataFrame, eval_metric: str = 'replicate_reproducibility') → None

Helper function to ensure that we properly melted the pairwise correlation matrix

Downstream functions depend on how we process the pairwise correlation matrix. The processing is different depending on the evaluation metric.

Parameters
  • df (pandas.DataFrame) – A melted pairwise correlation matrix

  • eval_metric (str) – The user input eval metric

Returns

Assertion will fail if we incorrectly melted the matrix

Return type

None

cytominer_eval.transform.util.assert_pandas_dtypes(df: pandas.core.frame.DataFrame, col_fix: type = <class 'numpy.float64'>) → pandas.core.frame.DataFrame

Helper function to ensure pandas columns have compatible dtypes

Parameters
  • df (pandas.DataFrame) – A pandas dataframe whose columns will be converted

  • col_fix ({np.float64, np.str}, optional) – The column type to convert the input dataframe to.

Returns

A dataframe with converted columns

Return type

pd.DataFrame

cytominer_eval.transform.util.check_grit_replicate_summary_method(replicate_summary_method: str) → None

Helper function to ensure that we support the user input replicate summary

Parameters

replicate_summary_method (str) – The user input replicate summary method

Returns

Assertion will fail if the user inputs an incorrect replicate summary method

Return type

None

cytominer_eval.transform.util.check_replicate_groups(eval_metric: str, replicate_groups: Union[List[str], dict]) → None

Helper function checking that the user correctly constructed the input replicate groups argument

The package will not calculate evaluation metrics with incorrectly constructed replicate_groups. See cytominer_eval.evaluate.evaluate().

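Parameters
  • eval_metric (str) – The user input eval metric

  • replicate_groups (Union[List[str], dict]) – The user input replicate groups

Returns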

Assertion will fail for improperly constructed replicate_groups

Return type

None
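
The check can be pictured along these lines. This is a sketch under stated assumptions, not the library's code: it assumes grit expects a dict keyed by profile_col and replicate_group_col (the names used by set_grit_column_info), and that every other metric expects a list of metadata column names. The function name is hypothetical.

```python
from typing import List, Union


def check_replicate_groups_sketch(
    eval_metric: str, replicate_groups: Union[List[str], dict]
) -> None:
    """Assert that replicate_groups has the shape assumed for eval_metric."""
    if eval_metric == "grit":
        # Assumption: grit needs a dict naming the two metadata columns
        assert isinstance(replicate_groups, dict), (
            "For grit, replicate_groups must be a dict"
        )
        expected_keys = {"profile_col", "replicate_group_col"}
        assert set(replicate_groups) == expected_keys, (
            f"replicate_groups must have exactly the keys {sorted(expected_keys)}"
        )
    else:
        # Assumption: other metrics take a list of metadata column names
        assert isinstance(replicate_groups, list), (
            "replicate_groups must be a list of metadata column names"
        )


# Both calls pass silently (column names are made up)
check_replicate_groups_sketch("replicate_reproducibility", ["Metadata_gene"])
check_replicate_groups_sketch(
    "grit",
    {"profile_col": "Metadata_guide", "replicate_group_col": "Metadata_gene"},
)
```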

cytominer_eval.transform.util.convert_pandas_dtypes(df: pandas.core.frame.DataFrame, col_fix: type = <class 'numpy.float64'>) → pandas.core.frame.DataFrame

Helper function to convert pandas column dtypes

Parameters
  • df (pandas.DataFrame) – A pandas dataframe whose columns will be converted

  • col_fix ({np.float64, np.str}, optional) – The column type to convert the input dataframe to.

Returns

A dataframe with converted columns

Return type

pd.DataFrame
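
As an illustration of what such a conversion looks like in plain pandas, the sketch below casts every column to one dtype with DataFrame.astype. The function name convert_dtypes_sketch is hypothetical, and whether the library uses astype internally is an assumption.

```python
import numpy as np
import pandas as pd


def convert_dtypes_sketch(df: pd.DataFrame, col_fix: type = np.float64) -> pd.DataFrame:
    """Return a copy of df with every column cast to col_fix."""
    return df.astype(col_fix)


# Mixed input: numeric strings in one column, integers in the other
raw = pd.DataFrame({"feat_1": ["0.5", "1.5"], "feat_2": [1, 2]})
converted = convert_dtypes_sketch(raw)

print(converted.dtypes)
```

Casting to a single float dtype up front avoids surprises when a feature column was read in as strings or integers.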

cytominer_eval.transform.util.get_available_eval_metrics()

Output the available eval metrics in the cytominer_eval library

cytominer_eval.transform.util.get_available_grit_summary_methods()

Output the available replicate summary methods for calculating grit in the cytominer_eval library

cytominer_eval.transform.util.get_available_similarity_metrics()

Output the available metrics for calculating pairwise similarity in the cytominer_eval library

cytominer_eval.transform.util.get_upper_matrix(df: pandas.core.frame.DataFrame) → numpy.array

Helper function to return an upper triangle matrix with the same shape as the input

Parameters

df (pandas.DataFrame) – Any dataframe with a shape

Returns

An upper triangle matrix the same shape as the input dataframe

Return type

np.array
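
One common way to build such a matrix is a boolean mask from numpy.triu, sketched below. The function name is hypothetical, and two details are assumptions rather than facts from this reference: that the result is boolean, and that the diagonal is excluded (k=1).

```python
import numpy as np
import pandas as pd


def upper_matrix_sketch(df: pd.DataFrame) -> np.ndarray:
    """Boolean mask, same shape as df, True strictly above the diagonal."""
    # k=1 starts one step above the main diagonal (assumption: diagonal excluded)
    return np.triu(np.ones(df.shape), k=1).astype(bool)


square = pd.DataFrame(np.arange(9).reshape(3, 3))
mask = upper_matrix_sketch(square)

# Select each unordered pair exactly once from a symmetric matrix
upper_values = square.to_numpy()[mask]
print(mask)
print(upper_values)
```

Applied to a symmetric pairwise similarity matrix, a mask like this keeps each pair once and drops self-comparisons.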

cytominer_eval.transform.util.set_grit_column_info(profile_col: str, replicate_group_col: str) → dict

Transform column names to be used in calculating grit

In calculating grit, the data must have a metadata feature describing the core replicate perturbation (profile_col) and a separate metadata feature(s) describing the larger group (replicate_group_col) that the perturbation belongs to (e.g. gene, MOA).

Parameters
  • profile_col (str) – the metadata column storing profile ids. The column can have unique or replicate identifiers.

  • replicate_group_col (str) – the metadata column indicating a higher order structure (group) than the profile column. E.g. target gene vs. guide in a CRISPR experiment.

Returns

A nested dictionary of renamed columns indicating how to determine replicates

Return type

dict

cytominer_eval.transform.util.set_pair_ids()

Helper function to ensure consistent melted pairwise column names

Returns

A length two dictionary of suffixes and indices of two pairs.

Return type

collections.OrderedDict
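
To make the described return value concrete, here is a sketch of a length-two OrderedDict that holds an index name and a suffix for each member of a pair. The function name, the labels pair_a and pair_b, and the inner keys index and suffix are all assumptions for illustration; the library's actual names may differ.

```python
from collections import OrderedDict


def pair_ids_sketch() -> OrderedDict:
    """Map each pair label to the index name and suffix used for its columns."""
    pair_ids = OrderedDict()
    for pair in ("pair_a", "pair_b"):  # hypothetical labels
        pair_ids[pair] = {
            "index": f"{pair}_index",  # hypothetical index column name
            "suffix": f"_{pair}",  # hypothetical suffix for metadata columns
        }
    return pair_ids


pair_ids = pair_ids_sketch()
for label, info in pair_ids.items():
    print(label, info)
```

Keeping these names in one place means every function that melts or reads a pairwise matrix agrees on the same column names.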
