Normalization

A class for performing normalization on data using various methods and plotting PCA.

class opennano.normalization.Normalization(adata)

Bases: object

A class for performing normalization on data using various methods and plotting PCA.

norm_CPM(new_layer=True):

Performs Counts Per Million normalization.

norm_log1p(new_layer=True):

Applies log(1 + x) transformation to the data.

norm_MedianRatio(new_layer="MedianRatio_norm"):

Normalizes data using the median ratio method.

norm_VST(size_factors=None, dispersion=None, new_layer="VST_norm", layer=None):

Performs Variance Stabilizing Transformation (VST) normalization.

norm_Quantile(layer=None, new_layer="Quantile_norm"):

Normalizes data using quantile normalization.

variance(adata=None, layer=None, n_genes=500):

Computes variance and identifies highly variable genes.

apply_vst_with_base_norm(choice="median", new_layer="VST_norm"):

Applies VST normalization with a specified base normalization method.

plot_density(layer=None, title=None, figsize=(10, 6), log_scale=False):

Plots expression density distributions across samples.

plot_pca(layer="VST_norm", n_genes=500):

Plots PCA on the top variable genes.

__init__(adata)

Initialize the Normalization class with the provided AnnData object.

Parameters:

adata (AnnData) – The input AnnData object to normalize and analyze.

apply_vst_with_base_norm(choice='median', new_layer='VST_norm')

Apply Variance Stabilizing Transformation (VST) on data with a specified base normalization.

This method allows the user to apply a base normalization method (e.g., CPM, log1p, quantile, or median ratio) before performing VST. The results are saved as a new layer in the AnnData object.

Parameters:
  • choice (str, optional) – The base normalization method to apply before VST. Options are:

      - "cpm": Counts Per Million normalization.
      - "log1p": Log(1 + x) transformation.
      - "quantile": Quantile normalization.
      - "median": Median ratio normalization (default).

  • new_layer (str, optional) – Name of the new layer to store the VST-normalized data. The default is "VST_norm".

Returns:

The AnnData object with the VST-normalized data added as a new layer.

Return type:

AnnData

Raises:

ValueError – If an invalid choice is provided.

Examples

Apply VST normalization with median ratio as the base normalization:

normalization = Normalization(adata)
normalization.apply_vst_with_base_norm(choice="median", new_layer="VST_norm")
# Output: VST normalization applied on base normalization 'median' and saved as layer 'VST_norm'

Apply VST normalization with quantile normalization as the base:

normalization = Normalization(adata)
normalization.apply_vst_with_base_norm(choice="quantile", new_layer="Quantile_VST_norm")

norm_CPM(new_layer=True)

Perform Counts Per Million (CPM) normalization on the AnnData object.

This method normalizes raw counts data to CPM values, which represent counts per million reads for each feature, adjusted by the total library size.
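The arithmetic behind CPM can be sketched with plain NumPy. This is an illustrative sketch, not opennano's implementation; the helper name cpm and the samples-as-rows layout (matching AnnData's obs × var convention) are assumptions.

```python
import numpy as np

def cpm(counts):
    """Scale each sample (row) so that its counts sum to one million.

    Hypothetical helper; assumes every sample has at least one count.
    """
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)  # total counts per sample
    return counts / library_size * 1e6

# Toy data: two samples (rows) by three features (columns).
raw = np.array([[10, 30, 60],
                [5, 5, 10]])
cpm_values = cpm(raw)
print(cpm_values)
```

After scaling, every row sums to 1,000,000 regardless of its original library size, which is what makes samples of different sequencing depth comparable.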

Parameters:

new_layer (bool, optional) – If True, saves the CPM-normalized data as a new layer named "CPM_norm" in the AnnData object. If False, returns the CPM-normalized data as a pandas DataFrame (default is True).

Returns:

  • If new_layer is False, returns a pandas DataFrame containing the CPM-normalized data.

  • If new_layer is True, returns None and stores the normalized data in the AnnData object’s layers as "CPM_norm".

Return type:

pandas.DataFrame or None

Raises:

ValueError – If the specified AnnData object does not contain raw counts data.

Examples

Normalize data and save as a new layer:

normalization = Normalization(adata)
normalization.norm_CPM(new_layer=True)
# Output: Added layer 'CPM_norm' to:
#         AnnData object with n_obs × n_vars = 100 × 200

Return normalized data as a DataFrame:

normalization = Normalization(adata)
cpm_data = normalization.norm_CPM(new_layer=False)
print(cpm_data.head())

norm_MedianRatio(new_layer='MedianRatio_norm')

Normalize data using the median ratio method.

This method computes the geometric mean for each feature (gene) and normalizes the data by dividing each value by its respective median across samples.
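The description above is brief and does not spell out every step. For orientation, the widely used median-of-ratios scheme (the one popularized by DESeq) is sketched below. Whether opennano follows it step for step is an assumption, and the helper name median_ratio_normalize is hypothetical.

```python
import numpy as np

def median_ratio_normalize(counts):
    """Median-of-ratios normalization; rows are samples, columns are genes.

    Hypothetical helper illustrating the standard scheme, not opennano code.
    """
    counts = np.asarray(counts, dtype=float)
    # Use only genes with non-zero counts in every sample so the log is finite.
    expressed = (counts > 0).all(axis=0)
    log_counts = np.log(counts[:, expressed])
    # Per-gene log geometric mean across samples.
    log_geo_mean = log_counts.mean(axis=0)
    # Per-sample size factor: median ratio of counts to the gene's geometric mean.
    size_factors = np.exp(np.median(log_counts - log_geo_mean, axis=1))
    return counts / size_factors[:, None], size_factors

# Toy data: the second sample is sequenced exactly twice as deeply as the first.
raw = np.array([[10, 20, 30],
                [20, 40, 60]])
normalized, size_factors = median_ratio_normalize(raw)
print(size_factors)
print(normalized)
```

In this toy case the two samples differ only by depth, so they become identical after dividing by their size factors.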

Parameters:

new_layer (str or None, optional) – Name of the new layer to store the median ratio-normalized data. If None, the normalized data is returned as a pandas DataFrame without modifying the AnnData object (default is "MedianRatio_norm").

Returns:

  • If new_layer is None, returns a pandas DataFrame containing the median ratio-normalized data.

  • If new_layer is specified, returns None and stores the normalized data in the AnnData object’s layers.

Return type:

pandas.DataFrame or None

Raises:

ValueError – If the input data is not a valid AnnData object or does not contain raw counts data.

Examples

Normalize data using the median ratio method and save as a new layer:

normalization = Normalization(adata)
normalization.norm_MedianRatio(new_layer="MedianRatio_norm")
# Output: Added layer 'MedianRatio_norm' to:
#         AnnData object with n_obs × n_vars = 100 × 200

Return the normalized data as a DataFrame:

normalization = Normalization(adata)
median_ratio_data = normalization.norm_MedianRatio(new_layer=None)
print(median_ratio_data.head())

norm_Quantile(layer=None, new_layer='Quantile_norm')

Perform quantile normalization.

This method normalizes the data by ranking the values and replacing each value with the mean value of its rank across all samples.
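The rank-and-replace step can be sketched in NumPy as follows. This is a minimal illustration, assuming values are ranked within each sample (row) and that ties are broken by position; opennano's orientation and tie handling may differ, and the helper name quantile_normalize is hypothetical.

```python
import numpy as np

def quantile_normalize(data):
    """Quantile-normalize samples (rows) so they share one distribution."""
    data = np.asarray(data, dtype=float)
    # Rank of each value within its sample (0 = smallest); stable sort breaks ties by position.
    order = data.argsort(axis=1, kind="stable")
    ranks = order.argsort(axis=1, kind="stable")
    # Mean of the k-th smallest value across samples, for every rank k.
    rank_means = np.sort(data, axis=1).mean(axis=0)
    # Replace each value with the mean belonging to its rank.
    return rank_means[ranks]

raw = np.array([[5.0, 2.0, 3.0],
                [4.0, 1.0, 6.0]])
quantile_values = quantile_normalize(raw)
print(quantile_values)
```

After normalization every sample contains the same set of values, only in a different order.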

Parameters:
  • layer (str or None, optional) – Name of the existing layer in AnnData to apply quantile normalization to. If None, the raw counts (adata.X) are used (default is None).

  • new_layer (str or None, optional) – Name of the new layer to store the quantile-normalized data. If None, the normalized data replaces the raw counts in adata.X (default is "Quantile_norm").

Returns:

Normalized data is stored in the specified new_layer or replaces the raw counts in adata.X.

Return type:

None

Raises:

ValueError – If the specified layer is not found in the AnnData object.

Examples

Normalize data using quantile normalization and save as a new layer:

normalization = Normalization(adata)
normalization.norm_Quantile(new_layer="Quantile_norm")
# Output: Added layer 'Quantile_norm' to:
#         AnnData object with n_obs × n_vars = 100 × 200

Replace raw counts with quantile-normalized data:

normalization = Normalization(adata)
normalization.norm_Quantile(new_layer=None)
print(adata.X)

norm_VST(size_factors=None, dispersion=None, new_layer='VST_norm', layer=None)

Perform Variance Stabilizing Transformation (VST) normalization.

This method stabilizes the variance of normalized count data. It first normalizes raw counts using size factors, optionally computes dispersion estimates, and then applies the variance stabilizing transformation.

Parameters:
  • size_factors (array-like, optional) – Pre-computed size factors for normalization. If None, size factors are computed as the sum of counts divided by the median library size (default is None).

  • dispersion (array-like, optional) – Dispersion estimates for variance stabilization. If None, dispersion is estimated as the inverse square root of the mean counts (default is None).

  • new_layer (str or None, optional) – Name of the new layer to store the VST-normalized data. If None, the normalized data is returned as a pandas DataFrame without modifying the AnnData object (default is "VST_norm").

  • layer (str or None, optional) – Name of the existing layer in AnnData to apply VST normalization to. If None, the default raw counts from the AnnData object are used (default is None).
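The two documented defaults can be sketched as below. This is one reading of the parameter descriptions, not opennano's code: it assumes samples are rows, that "sum of counts" means each sample's library size, and that the mean counts used for dispersion are taken after size-factor normalization. Both helper names are hypothetical. The transformation itself is not reproduced here because its formula is not given above.

```python
import numpy as np

def default_size_factors(counts):
    """Library size of each sample divided by the median library size."""
    library_size = np.asarray(counts, dtype=float).sum(axis=1)
    return library_size / np.median(library_size)

def default_dispersion(normalized_counts):
    """Inverse square root of each feature's mean count.

    Features with a mean of zero would give an infinite value.
    """
    mean_counts = np.asarray(normalized_counts, dtype=float).mean(axis=0)
    return 1.0 / np.sqrt(mean_counts)

# Toy data: three samples (rows) with library sizes 100, 200 and 400.
raw = np.array([[10, 30, 60],
                [20, 60, 120],
                [40, 120, 240]])
size_factors = default_size_factors(raw)
normalized = raw / size_factors[:, None]
dispersion = default_dispersion(normalized)
print(size_factors)
print(dispersion)
```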

Returns:

  • If new_layer is None, returns a pandas DataFrame containing the VST-normalized data.

  • If new_layer is specified, returns None and stores the normalized data in the AnnData object’s layers.

Return type:

pandas.DataFrame or None

Raises:

ValueError – If layer is not found in the AnnData object or the input data is invalid.

Examples

Normalize data using VST with default parameters and save as a new layer:

normalization = Normalization(adata)
normalization.norm_VST(new_layer="VST_norm")
# Output: Added layer 'VST_norm' to:
#         AnnData object with n_obs × n_vars = 100 × 200

Return the VST-normalized data as a DataFrame:

normalization = Normalization(adata)
vst_data = normalization.norm_VST(new_layer=None)
print(vst_data.head())

norm_log1p(new_layer=True)

Apply log(1 + x) transformation to CPM-normalized data.

This method first performs CPM normalization on the data (if not already done), then applies a log(1 + x) transformation.
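The two-step transformation can be sketched as follows. The sketch assumes the natural logarithm (as computed by numpy.log1p) and samples as rows; the base of the logarithm used by opennano is not stated above.

```python
import numpy as np

# Toy data: two samples (rows) by three features; one count is zero.
raw = np.array([[0, 40, 60],
                [5, 5, 10]], dtype=float)

# Step 1: CPM, scaling each sample to one million total counts.
cpm_values = raw / raw.sum(axis=1, keepdims=True) * 1e6

# Step 2: log(1 + x); adding 1 keeps zero counts finite (they map to 0).
log1p_values = np.log1p(cpm_values)
print(log1p_values)
```

The transformation is invertible with numpy.expm1, which recovers the CPM values.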

Parameters:

new_layer (bool, optional) – If True, saves the log-transformed data as a new layer named "log1p_norm" in the AnnData object. If False, returns the log-transformed data as a pandas DataFrame (default is True).

Returns:

  • If new_layer is False, returns a pandas DataFrame containing the log-transformed data.

  • If new_layer is True, returns None and stores the transformed data in the AnnData object’s layers as "log1p_norm".

Return type:

pandas.DataFrame or None

Raises:

ValueError – If the input data is not a valid AnnData object or does not contain raw counts data.

Examples

Apply log(1 + x) transformation and save as a new layer:

normalization = Normalization(adata)
normalization.norm_log1p(new_layer=True)
# Output: Added layer 'log1p_norm' to:
#         AnnData object with n_obs × n_vars = 100 × 200

Return log-transformed data as a DataFrame:

normalization = Normalization(adata)
log1p_data = normalization.norm_log1p(new_layer=False)
print(log1p_data.head())

plot_density(layer=None, title=None, figsize=(10, 6), log_scale=False)

Generate density plots to visualize expression distributions across samples for a given normalization.

Parameters:
  • adata (AnnData) – The AnnData object containing gene expression data.

  • layer (str, optional) – Name of the layer in AnnData to plot. If None, uses adata.X (default is None).

  • title (str, optional) – Title of the plot (default is None).

  • figsize (tuple, optional) – Size of the plot (default is (10, 6)).

  • log_scale (bool, optional) – If True, applies a log10 transformation to the expression data before plotting (default is False).

Returns:

Displays the density plot.

Return type:

None

plot_pca(layer='VST_norm', n_genes=500)

Perform PCA on the most variable genes and generate a PCA plot.

This method computes PCA on the specified layer of the AnnData object using the top n_genes most variable features. The layer (for example, a VST-normalized layer) must already exist in the AnnData object.
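The computation behind the plot can be sketched with NumPy alone. This is an assumption-laden illustration: the PCA backend opennano uses is not stated here, the helper name pca_top_genes is hypothetical, and the sketch computes principal components through a singular value decomposition of the centered data.

```python
import numpy as np

def pca_top_genes(data, n_genes, n_components=2):
    """PCA scores from the n_genes most variable columns (rows are samples)."""
    data = np.asarray(data, dtype=float)
    # Keep the columns with the largest variance across samples.
    top = np.argsort(data.var(axis=0))[::-1][:n_genes]
    selected = data[:, top]
    # Center each feature, then decompose.
    centered = selected - selected.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates
    explained = s ** 2 / np.sum(s ** 2)                # variance ratio per component
    return scores, explained[:n_components]

# Synthetic data: 6 samples by 20 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 20))
scores, explained = pca_top_genes(data, n_genes=10)
print(scores.shape)
```

The two columns of scores are what a PCA scatter plot would place on its x and y axes, one point per sample.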

Parameters:
  • layer (str, optional) – Name of the layer in AnnData to perform PCA on. The default is "VST_norm". The layer must contain pre-normalized data (e.g., via VST).

  • n_genes (int, optional) – Number of top variable features to use for PCA computation. The default is 500.

Returns:

The PCA plot is displayed directly.

Return type:

None

Raises:

ValueError – If the specified layer is not found in the AnnData object.

Examples

Generate a PCA plot using the VST-normalized layer:

normalization = Normalization(adata)
normalization.norm_VST(new_layer="VST_norm")
normalization.plot_pca(layer="VST_norm", n_genes=500)

Generate a PCA plot using a custom normalized layer:

normalization = Normalization(adata)
normalization.norm_Quantile(new_layer="Quantile_norm")
normalization.plot_pca(layer="Quantile_norm", n_genes=300)

variance(adata=None, layer=None, n_genes=500)

Compute variance for each feature and identify highly variable genes.

This method calculates the variance across all samples for each feature (gene) in the specified layer and flags the top n_genes as highly variable.
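The selection logic can be sketched as below. The helper name top_variable is hypothetical, samples are assumed to be rows, and the sketch uses the population variance (NumPy's default); the estimator opennano uses is not stated, although the ranking of features is the same for either choice.

```python
import numpy as np

def top_variable(data, n_genes):
    """Return per-feature variance and a boolean mask of the n_genes most variable."""
    data = np.asarray(data, dtype=float)
    variances = data.var(axis=0)                  # variance across samples (rows)
    top = np.argsort(variances)[::-1][:n_genes]   # indices of the largest variances
    mask = np.zeros(variances.shape, dtype=bool)
    mask[top] = True
    return variances, mask

# Toy data: three samples (rows) by four features (columns).
values = np.array([[1.0, 5.0, 2.0, 0.0],
                   [1.0, 9.0, 4.0, 3.0],
                   [1.0, 1.0, 6.0, 6.0]])
variances, highly_variable = top_variable(values, n_genes=2)
print(variances)
print(highly_variable)
```

The two arrays correspond to what the method stores in adata.var["vars"] and adata.var["highly_variable"].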

Parameters:
  • layer (str or None, optional) – Name of the existing layer in AnnData to compute variance from. If None, the default raw counts (adata.X) are used (default is None).

  • n_genes (int, optional) – Number of top variable features to mark as highly variable (default is 500).

Returns:

The variance is added to adata.var["vars"], and highly variable genes are flagged in adata.var["highly_variable"].

Return type:

None

Raises:

ValueError – If the specified layer is not found in the AnnData object.

Examples

Compute variance and mark highly variable genes:

normalization = Normalization(adata)
normalization.variance(layer="Quantile_norm", n_genes=300)
print(adata.var.head())