CountsQC (opennano.qc)
The CountsQC class performs quality control (QC) checks on NanoString GeoMx data.
- class opennano.qc.CountsQC(adata=None, dcc_directory=None, pkc_file=None, metadata_file=None, minSegmentReads=1000, percentTrimmed=80, percentStitched=80, percentAligned=75, percentSaturation=50, minNegativeCount=10, maxNTCCount=9000, minNuclei=20, minArea=1000, negative_probe_cutoff=1.1)
Bases:
object
A class for performing quality control (QC) checks on an AnnData object.
- __init__(adata=None, dcc_directory=None, pkc_file=None, metadata_file=None, minSegmentReads=1000, percentTrimmed=80, percentStitched=80, percentAligned=75, percentSaturation=50, minNegativeCount=10, maxNTCCount=9000, minNuclei=20, minArea=1000, negative_probe_cutoff=1.1)
Initialize the CountsQC class for performing quality control on GeoMx data.
This constructor initializes a CountsQC object either from an existing AnnData object or by processing .dcc, .pkc, and metadata files to generate an AnnData object. It also sets quality control thresholds for various metrics.
- Parameters:
adata (AnnData, optional) – An existing AnnData object to initialize the QC process. If not provided, the dcc_directory, pkc_file, and metadata_file parameters must be specified to create the AnnData object.
dcc_directory (str, optional) – Path to the directory containing .dcc files. Required if adata is not provided.
pkc_file (str, optional) – Path to the .pkc file. Required if adata is not provided.
metadata_file (str, optional) – Path to the GEO SOFT metadata file. Required if adata is not provided.
minSegmentReads (int, default=1000) – Minimum number of reads required for a segment to pass QC.
percentTrimmed (int, default=80) – Minimum percentage of trimmed reads required for a segment to pass QC.
percentStitched (int, default=80) – Minimum percentage of stitched reads required for a segment to pass QC.
percentAligned (int, default=75) – Minimum percentage of aligned reads required for a segment to pass QC.
percentSaturation (int, default=50) – Minimum sequencing saturation percentage required for a segment to pass QC.
minNegativeCount (int, default=10) – Minimum count of negative probes required for a segment to pass QC.
maxNTCCount (int, default=9000) – Maximum count for no-template control (NTC) probes allowed for a segment to pass QC.
minNuclei (int, default=20) – Minimum number of nuclei required for a segment to pass QC.
minArea (int, default=1000) – Minimum area (in pixels or other units) required for a segment to pass QC.
negative_probe_cutoff (float, default=1.1) – Cutoff for filtering genes based on their background ratios compared to negative probes.
- Raises:
ValueError – If adata is not provided and any of dcc_directory, pkc_file, or metadata_file is missing.
Notes
If adata is not provided, the class processes the GeoMx data from the provided files (dcc_directory, pkc_file, and metadata_file) using the GeoMxProcessor class.
The initialized object contains metadata and quality control thresholds that can be used for running QC checks and generating filtered datasets.
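The two initialization paths described above can be pictured with a small stand-alone sketch. The helper `resolve_inputs` is hypothetical and not part of opennano; it only mirrors the documented rule that either adata or all three file paths must be supplied, and it assumes that a provided adata takes precedence over any file arguments.

```python
from typing import Any, Optional


def resolve_inputs(
    adata: Optional[Any] = None,
    dcc_directory: Optional[str] = None,
    pkc_file: Optional[str] = None,
    metadata_file: Optional[str] = None,
) -> str:
    """Return which initialization path applies (hypothetical helper)."""
    if adata is not None:
        # Assumption: an existing AnnData object makes the file arguments unnecessary.
        return "from_adata"
    required = {
        "dcc_directory": dcc_directory,
        "pkc_file": pkc_file,
        "metadata_file": metadata_file,
    }
    # Collect the names of any file arguments that were left unset.
    missing = [name for name, value in required.items() if value is None]
    if missing:
        raise ValueError(
            "adata was not provided, so these arguments are required: "
            + ", ".join(missing)
        )
    return "from_files"
```

Calling `resolve_inputs(dcc_directory="dcc/")` alone raises ValueError because pkc_file and metadata_file are missing, which matches the Raises entry above.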
- adata
The AnnData object containing the GeoMx data, either provided or created during initialization.
- Type:
AnnData
- df
DataFrame representation of the expression matrix from the AnnData object.
- Type:
pandas.DataFrame
- roi_metadata
List of regions of interest (ROIs) from the unstructured metadata in the AnnData object.
- Type:
list
- neg_probe_indices
Indices of negative probes in the AnnData object.
- Type:
pandas.Index
- passed_rois
List of ROIs that pass all QC checks, initialized as empty.
- Type:
list
- calc_min_area(adata, idx)
Retrieve the minimum area for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The minimum area for the ROI.
- Return type:
int
- calc_min_nuclei(adata, idx)
Retrieve the minimum nuclei count for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The minimum nuclei count for the ROI.
- Return type:
int
- calc_percent_aligned(adata, idx)
Calculate the percentage of aligned reads for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The percentage of reads that were aligned.
- Return type:
float
- calc_percent_saturation(adata, idx)
Calculate the sequencing saturation for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The sequencing saturation value for the ROI.
- Return type:
float
- calc_percent_stitched(adata, idx)
Calculate the percentage of stitched reads for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The percentage of reads that were stitched.
- Return type:
float
- calc_percent_trimmed(adata, idx)
Calculate the percentage of trimmed reads for a specific ROI.
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The percentage of reads that were trimmed.
- Return type:
float
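This page does not state the formulas behind the percentage metrics. A convention often used for GeoMx sequencing QC expresses trimmed, stitched, and aligned reads as a percentage of raw reads, and saturation as the share of aligned reads that are duplicates. The sketch below assumes that convention; the dictionary keys (`raw`, `trimmed`, `stitched`, `aligned`, `deduplicated`) are illustrative names, not the fields opennano actually reads.

```python
def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def sequencing_percentages(counts: dict) -> dict:
    """Compute percentage QC metrics from per-ROI read counts (assumed keys)."""
    raw = counts["raw"]
    aligned = counts["aligned"]
    return {
        "percent_trimmed": percent_of(counts["trimmed"], raw),
        "percent_stitched": percent_of(counts["stitched"], raw),
        "percent_aligned": percent_of(aligned, raw),
        # Assumed definition: share of aligned reads that are duplicates.
        "percent_saturation": percent_of(aligned - counts["deduplicated"], aligned),
    }


# Toy read counts for a single ROI.
roi_counts = {
    "raw": 1000,
    "trimmed": 950,
    "stitched": 900,
    "aligned": 800,
    "deduplicated": 200,
}
metrics = sequencing_percentages(roi_counts)
```

With the default thresholds listed for the constructor, this toy ROI would pass the trimmed (95 ≥ 80), stitched (90 ≥ 80), aligned (80 ≥ 75), and saturation (75 ≥ 50) checks.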
- calc_total_reads(adata, idx)
Calculate the total number of reads for a specific region of interest (ROI).
- Parameters:
adata (AnnData) – The AnnData object containing expression and metadata.
idx (str) – The key of the ROI in the unstructured data (uns) of the AnnData object.
- Returns:
The sum of all reads for the specified ROI.
- Return type:
int
- check_metric(metric_name, threshold, calc_function, unit='%')
Evaluates a specific quality control (QC) metric for each segment and identifies segments that fail.
This method applies a calculation function (calc_function) to compute a QC metric for each region of interest (ROI) in the dataset. It compares the computed values against a specified threshold to determine which segments pass or fail the QC check.
- Parameters:
metric_name (str) – Name of the metric being checked (e.g., “Percent Trimmed”, “Total Reads”).
threshold (float or int) – Minimum acceptable value for the metric. Segments with values below this threshold fail the QC check.
calc_function (callable) – A function that calculates the metric for a given segment. It should take the adata object and a segment identifier (idx) as inputs and return the computed value.
unit (str, optional) – Unit of the metric for display purposes (default is “%”).
- Returns:
A set of segment identifiers (ROIs) that pass the QC check.
- Return type:
set
- Raises:
Exception – If the calc_function encounters an error during computation.
Notes
This method iterates over all segment identifiers (roi_metadata) in the dataset.
Segments that fail the QC check are printed with a warning message, displaying the metric value and the threshold.
The progress and percentage of passing segments are displayed using _print_progress.
Examples
Define a metric calculation function:
def calc_total_reads(adata, idx):
    return adata[idx].sum()
Check the “Total Reads” metric:
passed_segments = qc.check_metric(
    metric_name="Total Reads",
    threshold=1000,
    calc_function=calc_total_reads,
    unit="reads",
)
print("Segments passing QC:", passed_segments)
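The pass/fail loop described for check_metric can be approximated without opennano. The function below is a stand-alone sketch, not the library's implementation: `check_metric_sketch` and `total_reads` are hypothetical names, and a plain dictionary of counts stands in for the AnnData object.

```python
from typing import Callable, Dict, Iterable, List, Set


def check_metric_sketch(
    data: Dict[str, List[float]],
    roi_ids: Iterable[str],
    metric_name: str,
    threshold: float,
    calc_function: Callable[[Dict[str, List[float]], str], float],
    unit: str = "%",
) -> Set[str]:
    """Return the ROI identifiers whose metric is at or above the threshold."""
    passed = set()
    for idx in roi_ids:
        value = calc_function(data, idx)
        if value < threshold:
            # Failing segments are reported with the value and the threshold.
            print(f"Warning: {idx} failed {metric_name}: {value}{unit} < {threshold}{unit}")
        else:
            passed.add(idx)
    return passed


def total_reads(data: Dict[str, List[float]], idx: str) -> float:
    # Stand-in for calc_total_reads: sum the counts stored for one ROI.
    return sum(data[idx])


toy = {"ROI_1": [600, 700], "ROI_2": [100, 200], "ROI_3": [500, 500]}
passed_segments = check_metric_sketch(
    toy, list(toy), "Total Reads", 1000, total_reads, unit=" reads"
)
```

Here ROI_2 (300 reads) fails, while ROI_3 passes because its value equals the threshold and only values below the threshold fail.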
- static filter_by_negativeProbes(adata, negative_probe_cutoff=1.1, save_negatives=False)
Filters genes based on their background ratios compared to negative probes.
- Parameters:
adata (AnnData) – The AnnData object containing gene expression data.
negative_probe_cutoff (float, optional) – The threshold for filtering genes based on their background ratios. Genes with ratios below this threshold are removed (default is 1.1).
save_negatives (bool, optional) – If True, returns a second AnnData object containing only the negative probes (default is False).
- Returns:
If save_negatives is False, returns the filtered AnnData object.
If save_negatives is True, returns a tuple: (filtered AnnData object, AnnData object of negative probes).
- Return type:
AnnData or tuple of AnnData
- Raises:
ValueError – If the adata object does not contain the required “SystematicName” column.
Examples
Filter genes by negative probes and save the negative probes:
filtered_adata, negatives = CountsQC.filter_by_negativeProbes(adata, negative_probe_cutoff=1.5, save_negatives=True)
Filter genes by negative probes without saving negatives:
filtered_adata = CountsQC.filter_by_negativeProbes(adata, negative_probe_cutoff=1.2)
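How the background ratio is computed is not spelled out on this page. As a rough illustration, the sketch below assumes the ratio is a gene's mean count divided by the mean count of all negative probes. The function `filter_by_background_ratio` and the dictionary layout are hypothetical, and the real method may use a different statistic.

```python
from statistics import mean
from typing import Dict, List, Tuple

Counts = Dict[str, List[float]]


def filter_by_background_ratio(
    counts: Counts,
    negative_names: List[str],
    negative_probe_cutoff: float = 1.1,
) -> Tuple[Counts, Counts]:
    """Split counts into (kept genes, negative probes) using a mean-based ratio."""
    negatives = {name: counts[name] for name in negative_names}
    # Assumed background level: mean count over all negative probes and segments.
    background = mean([v for values in negatives.values() for v in values])
    if background == 0:
        raise ValueError("negative probe background is zero; ratio is undefined")
    kept = {
        gene: values
        for gene, values in counts.items()
        if gene not in negatives
        and mean(values) / background >= negative_probe_cutoff
    }
    return kept, negatives


toy_counts = {
    "NegProbe-1": [10, 10],
    "GeneA": [50, 70],  # ratio 6.0, kept
    "GeneB": [10, 11],  # ratio 1.05, removed at the default cutoff
}
kept, negatives = filter_by_background_ratio(toy_counts, ["NegProbe-1"])
```

Returning the negative probes alongside the filtered genes corresponds to calling the documented method with save_negatives=True.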
- plot_before_after_filtering()
Generate visualizations to compare data before and after QC filtering.
This method creates visualizations to show the differences in expression data before and after applying QC filters. The plots generated include:
Scatter plots of total expression sums per sample (before and after filtering).
Histograms of expression sum distributions (before and after filtering).
- Parameters:
None
- Returns:
The method generates and displays plots but does not return any data.
- Return type:
None
Notes
This method uses the run_all_checks method to filter the data based on QC metrics.
The raw and filtered AnnData objects are compared to highlight the impact of QC filtering.
The expression sums are computed across all samples for visualization.
Examples
qc = CountsQC(adata=my_adata)
qc.plot_before_after_filtering()
- plot_qc_results()
Generate visualizations for Quality Control (QC) metrics.
This method generates a series of plots to visualize QC metrics across all regions of interest (ROIs) in the dataset. The visualizations include:
Bar plot showing the percentage of segments passing QC thresholds.
Histograms for each metric with optional threshold overlays.
A heatmap of QC failures across metrics.
A scatter plot comparing “Percent Trimmed” and “Percent Stitched” reads.
Thresholds for each metric are defined in the class attributes (e.g., minSegmentReads, percentTrimmed). Metrics without defined thresholds are visualized without overlays.
- Parameters:
None
- Returns:
This method generates and displays the plots but does not return any data.
- Return type:
None
Notes
- The metrics visualized include:
Total Reads
Percent Trimmed
Percent Stitched
Percent Aligned
Percent Saturation
Min Nuclei
Min Area
Metrics without valid data or missing thresholds are handled gracefully.
If thresholds are defined, they are indicated on the plots as dashed lines.
Example
qc = CountsQC(adata=my_adata)
qc.plot_qc_results()
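The numbers behind the pass-rate bar plot can be illustrated without any plotting library. The sketch below is an assumption about how such a summary could be derived, not the library's code: `pass_rates` is a hypothetical helper, and metrics without a threshold are skipped, in line with the note that they are drawn without overlays.

```python
from typing import Dict, Optional


def pass_rates(
    values: Dict[str, Dict[str, float]],
    thresholds: Dict[str, Optional[float]],
) -> Dict[str, float]:
    """Percentage of ROIs at or above each metric's threshold."""
    rates = {}
    for metric, per_roi in values.items():
        threshold = thresholds.get(metric)
        if threshold is None or not per_roi:
            # No threshold or no data: nothing to summarize for this metric.
            continue
        n_pass = sum(1 for v in per_roi.values() if v >= threshold)
        rates[metric] = 100.0 * n_pass / len(per_roi)
    return rates


toy_values = {
    "Percent Trimmed": {"ROI_1": 95, "ROI_2": 70, "ROI_3": 85, "ROI_4": 90},
    "Min Nuclei": {"ROI_1": 30, "ROI_2": 10, "ROI_3": 15, "ROI_4": 40},
    "Unthresholded": {"ROI_1": 1, "ROI_2": 2, "ROI_3": 3, "ROI_4": 4},
}
toy_thresholds = {"Percent Trimmed": 80, "Min Nuclei": 20}
rates = pass_rates(toy_values, toy_thresholds)
```

A dictionary like `rates` could then be handed to any bar-plot routine, with one bar per metric.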
- run_all_checks(return_negative_probes=False, negative_probe_cutoff=None)
Run all QC checks and return filtered AnnData objects.
- Parameters:
return_negative_probes (bool, optional) – Whether to return an AnnData object containing only the negative probes (default is False).
negative_probe_cutoff (float, optional) – The cutoff for filtering genes based on their background ratios compared to negative probes. The signature default is None; 1.1, the constructor's negative_probe_cutoff default, is the documented effective default.
- Returns:
filtered_adata (AnnData) – AnnData object with only ROIs passing all QC checks.
negative_probes_adata (AnnData, optional) – AnnData object with only negative probes (if requested).
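Since check_metric returns a set of passing ROIs per metric, one natural way to obtain the ROIs passing all checks is a set intersection. The sketch below shows that idea only; `combine_checks` is a hypothetical helper, and the page does not confirm that run_all_checks is implemented this way.

```python
from typing import Dict, Set


def combine_checks(results: Dict[str, Set[str]]) -> Set[str]:
    """Return the ROIs present in every per-metric pass set."""
    if not results:
        return set()
    # An ROI passes overall only if it appears in every metric's pass set.
    return set.intersection(*results.values())


per_metric = {
    "Total Reads": {"ROI_1", "ROI_2", "ROI_3"},
    "Percent Aligned": {"ROI_1", "ROI_3"},
    "Min Nuclei": {"ROI_1", "ROI_2"},
}
passed_rois = sorted(combine_checks(per_metric))
```

In this toy input only ROI_1 appears in all three sets, so it is the only ROI that would be kept in the filtered data.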
- write(filename: str = None, compression: Literal['gzip', 'lzf'] = None)
Write the AnnData object to disk.
- Parameters:
filename (str) – Path of the file to write the AnnData object to.
compression (str, optional) – Compression strategy to use (‘gzip’ or ‘lzf’).
- Raises:
ValueError – If the ‘filename’ is not provided or is invalid.
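The argument checks implied by the signature can be sketched as a small validator. This is a hypothetical stand-alone helper, not opennano code. It assumes that "invalid" means an empty or whitespace-only filename, and that an unsupported compression value is also rejected with ValueError, which the page does not state.

```python
from pathlib import Path
from typing import Optional

# Values accepted by the documented signature, plus None for no compression.
ALLOWED_COMPRESSION = (None, "gzip", "lzf")


def validate_write_args(filename: Optional[str], compression: Optional[str] = None) -> Path:
    """Validate write() arguments and return the target path (hypothetical)."""
    if filename is None or not str(filename).strip():
        raise ValueError("filename must be a non-empty path")
    if compression not in ALLOWED_COMPRESSION:
        # Assumption: unsupported compression values are rejected up front.
        raise ValueError(f"compression must be one of {ALLOWED_COMPRESSION}")
    return Path(filename)


target = validate_write_args("filtered_counts.h5ad", compression="gzip")
```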