Transform functions
Helper functions to process profile data.
cytominer_eval.transform.transform
-
cytominer_eval.transform.transform.get_pairwise_metric(df: pandas.core.frame.DataFrame, similarity_metric: str) → pandas.core.frame.DataFrame
-
cytominer_eval.transform.transform.metric_melt(df: pandas.core.frame.DataFrame, features: List[str], metadata_features: List[str], eval_metric: str = 'replicate_reproducibility', similarity_metric: str = 'pearson') → pandas.core.frame.DataFrame
-
cytominer_eval.transform.transform.process_melt(df: pandas.core.frame.DataFrame, meta_df: pandas.core.frame.DataFrame, eval_metric: str = 'replicate_reproducibility') → pandas.core.frame.DataFrame
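The three functions above are listed without descriptions. As a rough illustration of what "pairwise metric" and "melt" mean for profile data, the sketch below uses plain pandas and NumPy only. It is not the library's implementation; the sample ids, feature names, and output column names (pair_a_index, pair_b_index, similarity_metric) are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy profiles: three samples (rows) by four hypothetical features (columns).
profiles = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.5],
     [4.0, 3.0, 2.0, 1.0]],
    index=["s1", "s2", "s3"],
    columns=["f1", "f2", "f3", "f4"],
)

# Pairwise Pearson correlation between samples.
# DataFrame.corr compares columns, so transpose to put samples in columns.
pairwise = profiles.transpose().corr(method="pearson")

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(pairwise.shape), k=1).astype(bool)
upper = pairwise.where(mask).rename_axis(index="pair_a_index")

# Melt to long format: one row per sample pair, dropping the masked entries.
melted = (
    upper.reset_index()
    .melt(id_vars="pair_a_index", var_name="pair_b_index", value_name="similarity_metric")
    .dropna()
    .reset_index(drop=True)
)
print(melted)
```

With three samples the melted frame holds three rows, one per unordered pair.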
cytominer_eval.transform.util
-
cytominer_eval.transform.util.assert_eval_metric(eval_metric: str) → None
Helper function to ensure that we support the input eval metric.
- Parameters
eval_metric (str) – The user input eval metric
- Returns
Assertion will fail if we don’t support the input eval metric
- Return type
None
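A minimal sketch of this kind of assertion-based check follows. The function name check_eval_metric and the contents of AVAILABLE_EVAL_METRICS are hypothetical; the actual list of supported metrics comes from get_available_eval_metrics() and may differ.

```python
# Hypothetical list of supported metrics, for illustration only.
AVAILABLE_EVAL_METRICS = ["replicate_reproducibility", "grit"]


def check_eval_metric(eval_metric: str) -> None:
    """Raise AssertionError if eval_metric is not in the supported list."""
    assert eval_metric in AVAILABLE_EVAL_METRICS, (
        f"{eval_metric} not supported. Select one of {AVAILABLE_EVAL_METRICS}"
    )


check_eval_metric("grit")  # passes silently and returns None
```

Because the check returns None on success, callers only notice it when the assertion fails.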
-
cytominer_eval.transform.util.assert_melt(df: pandas.core.frame.DataFrame, eval_metric: str = 'replicate_reproducibility') → None
Helper function to ensure that we properly melted the pairwise correlation matrix.
Downstream functions depend on how we process the pairwise correlation matrix. The processing is different depending on the evaluation metric.
- Parameters
df (pandas.DataFrame) – A melted pairwise correlation matrix
eval_metric (str) – The user input eval metric
- Returns
Assertion will fail if we incorrectly melted the matrix
- Return type
None
-
cytominer_eval.transform.util.assert_pandas_dtypes(df: pandas.core.frame.DataFrame, col_fix: type = <class 'numpy.float64'>) → pandas.core.frame.DataFrame
Helper function to ensure that pandas columns have compatible dtypes.
- Parameters
df (pandas.DataFrame) – A pandas dataframe whose columns will be converted
col_fix ({np.float64, np.str}, optional) – The column type to convert the input dataframe to.
- Returns
A dataframe with converted columns
- Return type
pd.DataFrame
-
cytominer_eval.transform.util.check_grit_replicate_summary_method(replicate_summary_method: str) → None
Helper function to ensure that we support the user input replicate summary method.
- Parameters
replicate_summary_method (str) – The user input replicate summary method
- Returns
Assertion will fail if the user inputs an incorrect replicate summary method
- Return type
None
-
cytominer_eval.transform.util.check_replicate_groups(eval_metric: str, replicate_groups: Union[List[str], dict]) → None
Helper function to check that the user correctly constructed the input replicate_groups argument.
The package will not calculate evaluation metrics with incorrectly constructed replicate_groups. See cytominer_eval.evaluate.evaluate().
- Parameters
eval_metric (str) – Which evaluation metric to calculate. See cytominer_eval.transform.util.get_available_eval_metrics().
replicate_groups ({list, dict}) – The tentative data structure listing replicate groups
- Returns
Assertion will fail for improperly constructed replicate_groups
- Return type
None
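The entry above does not spell out which structure goes with which metric. The sketch below assumes, purely for illustration, that grit takes a dict keyed by profile_col and replicate_group_col (the argument names of set_grit_column_info) and that other metrics take a list of column names. The function name check_groups and the Metadata_* column names are hypothetical.

```python
from typing import List, Union


def check_groups(eval_metric: str, replicate_groups: Union[List[str], dict]) -> None:
    """Assert that replicate_groups has the structure assumed for eval_metric."""
    if eval_metric == "grit":
        # Assumption: grit needs a dict naming both metadata columns.
        assert isinstance(replicate_groups, dict), "grit expects a dict"
        expected = {"profile_col", "replicate_group_col"}
        assert set(replicate_groups) == expected, f"dict keys must be {sorted(expected)}"
    else:
        # Assumption: other metrics need a list of metadata column names.
        assert isinstance(replicate_groups, list), "expected a list of column names"
        assert all(isinstance(col, str) for col in replicate_groups)


check_groups("replicate_reproducibility", ["Metadata_gene"])
check_groups("grit", {"profile_col": "Metadata_guide", "replicate_group_col": "Metadata_gene"})
```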
-
cytominer_eval.transform.util.convert_pandas_dtypes(df: pandas.core.frame.DataFrame, col_fix: type = <class 'numpy.float64'>) → pandas.core.frame.DataFrame
Helper function to convert pandas column dtypes.
- Parameters
df (pandas.DataFrame) – A pandas dataframe whose columns will be converted
col_fix ({np.float64, np.str}, optional) – The column type to convert the input dataframe to.
- Returns
A dataframe with converted columns
- Return type
pd.DataFrame
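A dtype conversion of this kind can be sketched with DataFrame.astype. The function name to_dtype and the toy data are hypothetical, and the library's own function may handle errors or individual columns differently. Note that the np.str alias was removed in NumPy 1.24, so the builtin str would be the equivalent argument on recent NumPy versions.

```python
import numpy as np
import pandas as pd


def to_dtype(df: pd.DataFrame, col_fix: type = np.float64) -> pd.DataFrame:
    """Return a copy of df with every column cast to col_fix."""
    return df.astype(col_fix)


# Mixed input: numeric strings in one column, integers in the other.
raw = pd.DataFrame({"f1": ["1.5", "2.0"], "f2": [3, 4]})
converted = to_dtype(raw)
print(converted.dtypes)
```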
-
cytominer_eval.transform.util.get_available_eval_metrics()
Output the available eval metrics in the cytominer_eval library.
-
cytominer_eval.transform.util.get_available_grit_summary_methods()
Output the available replicate summary methods for calculating grit in the cytominer_eval library.
-
cytominer_eval.transform.util.get_available_similarity_metrics()
Output the available metrics for calculating pairwise similarity in the cytominer_eval library.
-
cytominer_eval.transform.util.get_upper_matrix(df: pandas.core.frame.DataFrame) → numpy.array
Helper function to return an upper triangle matrix with the same shape as the input.
- Parameters
df (pandas.DataFrame) – Any dataframe with a shape
- Returns
An upper triangle matrix the same shape as the input dataframe
- Return type
np.array
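One common way to build such a matrix is numpy.triu. The sketch below assumes the result is a boolean mask that excludes the diagonal (k=1); the entry above does not state either detail, and the function name upper_mask is hypothetical.

```python
import numpy as np
import pandas as pd


def upper_mask(df: pd.DataFrame) -> np.ndarray:
    """Boolean mask, True strictly above the diagonal, same shape as df."""
    return np.triu(np.ones(df.shape), k=1).astype(bool)


square = pd.DataFrame(np.arange(9).reshape(3, 3))
mask = upper_mask(square)
print(mask)
```

Passing the mask to DataFrame.where keeps each off-diagonal pair of a symmetric matrix exactly once.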
-
cytominer_eval.transform.util.set_grit_column_info(profile_col: str, replicate_group_col: str) → dict
Transform column names to be used in calculating grit.
In calculating grit, the data must have a metadata feature describing the core replicate perturbation (profile_col) and one or more separate metadata features describing the larger group (replicate_group_col) that the perturbation belongs to (e.g. gene, MOA).
- Parameters
profile_col (str) – The metadata column storing profile ids. The column can have unique or replicate identifiers.
replicate_group_col (str) – The metadata column indicating a higher order structure (group) than the profile column, e.g. target gene vs. guide in a CRISPR experiment.
- Returns
A nested dictionary of renamed columns indicating how to determine replicates
- Return type
dict
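The exact layout of the returned dictionary is not documented here. As a hedged illustration of what "a nested dictionary of renamed columns" could look like, the sketch below appends hypothetical pair suffixes to each column name. The function name grit_column_info, the suffixes, and the key names (profile, group, id, comparison) are all assumptions.

```python
def grit_column_info(profile_col: str, replicate_group_col: str) -> dict:
    """Build a nested dict of suffixed column names (illustrative layout only)."""
    suffix_a, suffix_b = "_pair_a", "_pair_b"  # hypothetical suffixes
    return {
        "profile": {
            "id": profile_col + suffix_a,
            "comparison": profile_col + suffix_b,
        },
        "group": {
            "id": replicate_group_col + suffix_a,
            "comparison": replicate_group_col + suffix_b,
        },
    }


info = grit_column_info("Metadata_guide", "Metadata_gene")
print(info["profile"]["id"])
```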
-
cytominer_eval.transform.util.set_pair_ids()
Helper function to ensure consistent melted pairwise column names.
- Returns
A dictionary of length two holding the suffixes and indices of the two pairs.
- Return type
collections.OrderedDict
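A length-two ordered dictionary of suffixes and indices might be laid out as in the sketch below. The function name make_pair_ids and the key and value strings (pair_a, pair_b, index, suffix) are assumptions for illustration, not confirmed by the entry above.

```python
from collections import OrderedDict


def make_pair_ids() -> OrderedDict:
    """Return one entry per member of a pair, each with an index and a suffix name."""
    pair_ids = OrderedDict()
    for name in ("pair_a", "pair_b"):  # hypothetical pair labels
        pair_ids[name] = {"index": f"{name}_index", "suffix": f"_{name}"}
    return pair_ids


pair_ids = make_pair_ids()
print(list(pair_ids))
```

Keeping these names in one place means every function that melts or merges pairwise data can refer to the same column labels.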