cytominer_eval

The primary way to use cytominer-eval is through evaluate.py. The operation argument controls which metric to calculate.

evaluate.py

Calculate evaluation metrics from profiling experiments.

The primary entrypoint into quickly evaluating profile quality.

cytominer_eval.evaluate.evaluate(profiles: pandas.core.frame.DataFrame, features: List[str], meta_features: List[str], replicate_groups: Union[List[str], dict], operation: str = 'replicate_reproducibility', groupby_columns: List[str] = ['Metadata_broad_sample'], similarity_metric: str = 'pearson', replicate_reproducibility_quantile: float = 0.95, replicate_reproducibility_return_median_cor: bool = False, precision_recall_k: Union[int, List[int]] = 10, grit_control_perts: List[str] = ['None'], grit_replicate_summary_method: str = 'mean', mp_value_params: dict = {}, enrichment_percentile: Union[float, List[float]] = 0.99, hitk_percent_list=[2, 5, 10])

Evaluate profile quality and strength.

For a given profile dataframe containing both metadata and feature measurement columns, use this function to calculate profile quality metrics. The function contains all the necessary arguments for specific evaluation operations.

Parameters

profiles (pandas.DataFrame) – profiles must be a pandas DataFrame with profile samples as rows and profile features as columns. The columns should contain both metadata and feature measurements.
features (list) – A list of strings corresponding to feature measurement column names in the profiles DataFrame. All features listed must be found in profiles.
meta_features (list) – A list of strings corresponding to metadata column names in the profiles DataFrame. All metadata columns listed must be found in profiles.
replicate_groups ({str, list, dict}) – Indicates which metadata columns denote replicate information. All metric operations require replicate profiles. For most operations, replicate_groups is a str or list of column names. For operation="grit", replicate_groups is a dict with two keys: "profile_col" and "replicate_group_col". "profile_col" is the column name that stores identifiers for each profile (these can be unique), while "replicate_group_col" is the column name indicating higher-order replicate information. For example, "replicate_group_col" can be a gene column in a CRISPR experiment with multiple guides targeting the same gene. See also cytominer_eval.operations.grit() and cytominer_eval.transform.util.check_replicate_groups().
operation ({'replicate_reproducibility', 'precision_recall', 'grit', 'mp_value', 'enrichment', 'hitk'}, optional) – The specific evaluation metric to calculate. The default is "replicate_reproducibility".
groupby_columns (List of str) – Only used when operation='precision_recall' or operation='hitk'. Columns by which the similarity matrix is grouped and by which the operation is calculated. For example, if groupby_columns = ["Metadata_broad_sample"], then precision/recall is calculated for each sample. These columns should be unique, or should jointly span a unique space, because precision and hitk are otherwise not well defined.
similarity_metric ({'pearson', 'spearman', 'kendall'}, optional) – How to calculate pairwise similarity. The value is passed to pandas.DataFrame.corr(). The default is "pearson".
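As a rough illustration of how these arguments fit together, the sketch below builds a small toy profiles DataFrame and prepares the inputs described above. The column names, the values, and the "Metadata_" prefix convention used to split the columns are assumptions made for this example only. The helper run_replicate_reproducibility is hypothetical; it simply forwards to evaluate() using the keyword names from the signature above and imports the package lazily so the data preparation can be run on its own.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): 3 perturbations x 2 replicates, 4 features.
rng = np.random.default_rng(0)
profiles = pd.DataFrame(
    rng.normal(size=(6, 4)),
    columns=["feature_a", "feature_b", "feature_c", "feature_d"],
)
profiles.insert(
    0, "Metadata_broad_sample",
    ["drug_1", "drug_1", "drug_2", "drug_2", "DMSO", "DMSO"],
)
profiles.insert(
    1, "Metadata_gene",
    ["gene_x", "gene_x", "gene_x", "gene_x", "control", "control"],
)

# Assumption: metadata columns share the "Metadata_" prefix; everything else
# is treated as a feature measurement.
meta_features = [c for c in profiles.columns if c.startswith("Metadata_")]
features = [c for c in profiles.columns if c not in meta_features]

# List form of replicate_groups, used by most operations.
replicate_groups = ["Metadata_broad_sample"]

# Dict form of replicate_groups, as described for operation="grit".
grit_replicate_groups = {
    "profile_col": "Metadata_broad_sample",
    "replicate_group_col": "Metadata_gene",
}


def run_replicate_reproducibility(profiles, features, meta_features, replicate_groups):
    """Hypothetical wrapper: forwards the prepared inputs to evaluate()."""
    # Imported lazily so the data preparation above runs without the package.
    from cytominer_eval.evaluate import evaluate

    return evaluate(
        profiles=profiles,
        features=features,
        meta_features=meta_features,
        replicate_groups=replicate_groups,
        operation="replicate_reproducibility",
        similarity_metric="pearson",
        replicate_reproducibility_quantile=0.95,
    )
```

Keeping the column split explicit makes it easy to check that every entry in features and meta_features is actually present in profiles before calling evaluate().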

Returns

The resulting evaluation metric. The return is either a single value or a pandas DataFrame summarizing the metric as specified in operation.

Return type

float, pd.DataFrame

Other Parameters

replicate_reproducibility_quantile ({0.95, …}, optional) – Only used when operation='replicate_reproducibility'. The quantile of the non-replicate pairwise similarity distribution used as the threshold for calling a phenotype reproducible. Defaults to 0.95.
replicate_reproducibility_return_median_cor (bool, optional) – Only used when operation='replicate_reproducibility'. If True, then also return pairwise correlations as defined by replicate_groups and similarity_metric.
precision_recall_k (int or list of ints {10, …}, optional) – Only used when operation='precision_recall'. Used to calculate precision and recall considering the top k profiles according to pairwise similarity.
grit_control_perts ({None, …}, optional) – Only used when operation='grit'. Specific profile identifiers used as a reference when calculating grit. The list entries must be found in the replicate_groups[replicate_id] column.
grit_replicate_summary_method ({"mean", "median"}, optional) – Only used when operation='grit'. Defines how the replicate z-scores are summarized. See cytominer_eval.operations.util.calculate_grit().
mp_value_params ({{}, …}, optional) – Only used when operation='mp_value'. A dict of optional parameters (key-value pairs) for calculating mp value. See also cytominer_eval.operations.util.default_mp_value_parameters().
enrichment_percentile (float or list of floats, optional) – Only used when operation='enrichment'. Determines the percentage of top connections used for the enrichment calculation.
hitk_percent_list (list or "all") – Only used when operation='hitk'. A list of percentages at which to calculate the percent scores, i.e. the number of indexes below each percentage. Percentages are given as integers, e.g. 50 means 50%. If set to "all", a full dict with the length of classes will be created. Defaults to [2, 5, 10].
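To make the role of replicate_reproducibility_quantile more concrete, here is a minimal sketch of the idea using only pandas and NumPy. It is not the library's implementation: the function name replicate_reproducibility_sketch, the toy data, and the choice to summarize each replicate group by its median correlation are all assumptions made for illustration. It computes pairwise Pearson correlations between profiles, takes the chosen quantile of the non-replicate correlations as a threshold, and reports the fraction of replicate groups whose median replicate correlation exceeds it.

```python
import numpy as np
import pandas as pd


def replicate_reproducibility_sketch(profiles, features, replicate_col, quantile=0.95):
    """Illustrative only: fraction of replicate groups whose median replicate
    correlation exceeds the given quantile of non-replicate correlations."""
    # Correlate profiles (rows) with each other, hence the transpose.
    corr = profiles[features].T.corr(method="pearson").to_numpy()
    labels = profiles[replicate_col].to_numpy()

    # Upper triangle only: each pair counted once, self-pairs excluded.
    i, j = np.triu_indices(len(labels), k=1)
    pair_corr = corr[i, j]
    same_group = labels[i] == labels[j]

    # Threshold taken from the non-replicate (null) distribution.
    threshold = np.quantile(pair_corr[~same_group], quantile)

    replicate_pairs = pd.DataFrame(
        {"group": labels[i][same_group], "cor": pair_corr[same_group]}
    )
    median_cor = replicate_pairs.groupby("group")["cor"].median()
    return float((median_cor > threshold).mean()), median_cor


# Made-up profiles: three groups, two noisy replicates each.
toy = pd.DataFrame(
    [
        ["A", 1.0, 2.0, 3.0, 4.0],
        ["A", 1.1, 1.9, 3.2, 3.9],
        ["B", 4.0, 3.0, 2.0, 1.0],
        ["B", 3.9, 3.1, 2.1, 0.8],
        ["C", 1.0, -1.0, 1.0, -1.0],
        ["C", 0.9, -1.1, 1.2, -0.8],
    ],
    columns=["Metadata_broad_sample", "f1", "f2", "f3", "f4"],
)
score, median_cor = replicate_reproducibility_sketch(
    toy, ["f1", "f2", "f3", "f4"], "Metadata_broad_sample", quantile=0.95
)
```

In this toy data the replicates within each group are nearly identical while the groups differ from each other, so every group's median replicate correlation lies above the non-replicate threshold.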

cytominer_eval.operations

Metrics