Visualize how strongly each region in a region set is associated with each PC. For each PC, the average absolute loading is calculated for each region in the region set. Then, for a given PC, each region's average is converted to a percentile/quantile based on the distribution of absolute loadings for that PC. These values are plotted in a heatmap.

regionQuantileByPC(loadingMat, signalCoord, regionSet, rsName = "",
  PCsToAnnotate = paste0("PC", 1:5), maxRegionsToPlot = 8000,
  cluster_rows = TRUE, row_title = "Region", column_title = rsName,
  column_title_side = "top", cluster_columns = FALSE,
  name = "Percentile of Loading Scores in PC", col = c("skyblue",
  "yellow"), ...)

Arguments

loadingMat

matrix of loadings (the coefficients of the linear combination that defines each PC). One named column for each PC. One row for each original dimension/variable (should be same order as original data/signalCoord). The x$rotation output of prcomp().
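As a minimal sketch of where such a matrix comes from, the following runs prcomp() on a small made-up data set and extracts the rotation. The object names (toyData, pca) and the feature names are illustrative assumptions, not part of the package.

```r
# Toy data: 6 samples (rows) by 4 features (columns); names are made up.
set.seed(1)
toyData <- matrix(rnorm(24), nrow = 6,
                  dimnames = list(NULL, paste0("cpg", 1:4)))

# prcomp() expects samples in rows and features in columns.
pca <- prcomp(toyData, center = TRUE, scale. = FALSE)

# The rotation has one row per original feature and one named column per PC.
loadingMat <- pca$rotation
dim(loadingMat)
colnames(loadingMat)
```

The rows of loadingMat follow the column order of the input data, which is why signalCoord must be kept in that same order.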

signalCoord

a GRanges object or data frame with coordinates for the genomic signal/original data (e.g. DNA methylation) included in the PCA. Coordinates should be in the same order as the original data and the loadings (each item/row in signalCoord corresponds to a row in loadingMat). If a data.frame, it must have chr and start columns. If end is included, start and end should be the same. The start coordinate will be used for calculations.
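A data frame version of signalCoord might look like the sketch below. The coordinates and the stand-in loading matrix are invented for illustration; only the chr and start column names come from the description above.

```r
# Hypothetical coordinates for four measured sites (values are made up).
signalCoord <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1000L, 1500L, 9000L, 300L),
  stringsAsFactors = FALSE
)

# Stand-in loading matrix with one row per site, in the same order.
loadingMat <- matrix(0, nrow = 4, ncol = 2,
                     dimnames = list(NULL, c("PC1", "PC2")))

# Checks mirroring the requirements stated above.
hasCols <- all(c("chr", "start") %in% colnames(signalCoord))
sameLength <- nrow(signalCoord) == nrow(loadingMat)
stopifnot(hasCols, sameLength)
```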

regionSet

A genomic ranges object with regions corresponding to the same biological annotation. These are the regions that will be visualized. Must be from the same reference genome as the coordinates for the actual data (signalCoord).

rsName

character object. Name of the region set. Used by default as the heatmap title (column_title).

PCsToAnnotate

A character vector with principal components to include, e.g. c("PC1", "PC2"). These should be column names of loadingMat.

maxRegionsToPlot

maximum number of regions from the region set to include in the heatmap. Including too many may slow down computation and increase memory use. If regionSet has more regions than maxRegionsToPlot, maxRegionsToPlot regions will be randomly sampled from the region set and only these regions will be plotted. Clustering rows is a major factor in how long it takes to plot the regions, so if you want to plot many regions, you can also set cluster_rows to FALSE.
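The random sampling described for maxRegionsToPlot can be pictured with the base R sketch below. This is an illustration under assumptions, not the package's internal code: nRegions and keepIdx are hypothetical names, and sorting the sampled indices is a choice made here for readability.

```r
set.seed(42)
nRegions <- 20000         # hypothetical number of regions in regionSet
maxRegionsToPlot <- 8000

if (nRegions > maxRegionsToPlot) {
  # Sample without replacement, then restore the original region order.
  keepIdx <- sort(sample(seq_len(nRegions), maxRegionsToPlot))
} else {
  # Small region sets are plotted in full.
  keepIdx <- seq_len(nRegions)
}
length(keepIdx)
```

Because the subset is random, calling set.seed() before plotting is one way to get the same regions in repeated runs.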

cluster_rows

"logical" object, whether to cluster rows or not (may increase computation time significantly for large number of rows)

row_title

character object, row title

column_title

character object, column title

column_title_side

character object, where to put the column title: "top" or "bottom"

cluster_columns

"logical" object, whether to cluster columns. It is recommended to keep this as FALSE so it will be easier to compare PCs (with cluster_columns = FALSE, they will be in the same specified order in different heatmaps)

name

character object, legend title

col

a vector of colors or a color mapping function which will be passed to the ComplexHeatmap::Heatmap() function. See ?Heatmap (the "col" parameter) for more details.

...

optional parameters for ComplexHeatmap::Heatmap()

Value

a heatmap. Columns are PCs, rows are regions. This heatmap allows you to see if some regions are associated with certain PCs but not others. You can also see if one subset of regions in the region set is associated with PCs while another subset is not associated with any PCs. To color each region, first the absolute loading values within that region are averaged. Then this average is compared to the distribution of absolute loading values for all individual genomic signal values to get a quantile/percentile for that region. Colors are based on this quantile/percentile. The output is a Heatmap object (ComplexHeatmap package).
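The coloring rule above can be sketched in base R for a single PC. This is a simplified illustration, not the package's implementation: the loadings are simulated, the site-to-region assignment (regionID) is made up instead of coming from a genomic overlap, and using ecdf() for the quantile is an assumption about how the comparison could be done.

```r
set.seed(7)
# Simulated loadings for one PC at 1000 genomic sites.
loadings <- rnorm(1000)
absLoad <- abs(loadings)

# Made-up assignment of sites to three regions (NA = site is in no region).
regionID <- rep(NA_integer_, 1000)
regionID[1:10]  <- 1L
regionID[11:30] <- 2L
regionID[31:35] <- 3L

# Step 1: average the absolute loadings within each region.
regionAvg <- tapply(absLoad, regionID, mean)

# Step 2: locate each average in the distribution of all absolute loadings.
toQuantile <- ecdf(absLoad)
regionQuantile <- toQuantile(as.numeric(regionAvg))

# Quantiles lie in [0, 1]; multiply by 100 for percentiles.
regionPercentile <- 100 * regionQuantile
```

Repeating this for each PC in PCsToAnnotate would give one column of the heatmap per PC, with one row per region.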

Examples

data("brcaLoadings1")
data("brcaMCoord1")
data("esr1_chr1")
data("brcaPCScores")
regionByPCHM <- regionQuantileByPC(loadingMat = brcaLoadings1,
                                   signalCoord = brcaMCoord1,
                                   regionSet = esr1_chr1,
                                   rsName = "Estrogen Receptor Chr1",
                                   PCsToAnnotate = paste0("PC", 1:2),
                                   maxRegionsToPlot = 8000,
                                   cluster_rows = TRUE,
                                   cluster_columns = FALSE,
                                   column_title = rsName,
                                   name = "Percentile of Loading Scores in PC")