First bins the regions and averages the methylation proportion of all methylation sites within each bin (i.e., all sites within region 1, bin 1 are averaged, then all sites within region 1, bin 2 are averaged, etc.). Then aggregates methylation across all regions by bin, averaging the methylation proportion over each set of corresponding bins (i.e., all bin 1s together, all bin 2s together, etc.).
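The two-step averaging described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy `sites` and `regions` tables, the helper `binOneRegion`, and the equal-width binning via `findInterval` are all hypothetical, and coverage and `minReads` filtering are left out.

```r
# Hypothetical toy inputs; column names follow the argument descriptions.
sites <- data.frame(
  start      = c(105, 150, 195, 310, 350, 390),
  methylProp = c(0.2, 0.4, 0.9, 0.1, 0.5, 0.7)
)
regions <- data.frame(start = c(100, 300), end = c(200, 400))
binNum <- 2

# Step 1: assign each site to a (region, bin) pair using equal-width bins.
# binOneRegion is an illustrative helper, not a package function.
binOneRegion <- function(i) {
  r <- regions[i, ]
  inRegion <- sites$start >= r$start & sites$start <= r$end
  s <- sites[inRegion, ]
  breaks <- seq(r$start, r$end, length.out = binNum + 1)
  s$bin <- findInterval(s$start, breaks, rightmost.closed = TRUE)
  s$region <- rep(i, nrow(s))
  s
}
binned <- do.call(rbind, lapply(seq_len(nrow(regions)), binOneRegion))

# Step 2: average methylProp over the sites within each (region, bin).
perRegionBin <- aggregate(methylProp ~ region + bin, data = binned, FUN = mean)

# Step 3: average corresponding bins across regions
# (all bin 1s together, all bin 2s together).
perBin <- aggregate(methylProp ~ bin, data = perRegionBin, FUN = mean)
print(perBin)
# bin 1: 0.15, bin 2: 0.625
```

Averaging within each region first means every region contributes equally to a bin, regardless of how many sites it contains.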
BSBinAggregate(BSDT, rangeDT, binNum, minReads = 500, byRegionGroup = TRUE, splitFactor = NULL, hasCoverage = TRUE)
Argument | Description |
---|---|
BSDT | A single data table that has DNA methylation data on individual sites, including a "chr" column with chromosome, a "start" column with the coordinate number for the cytosine, a "methylProp" column with proportion of methylation (0 to 1), optionally a "methylCount" column with number of methylated reads for each site, and optionally a "coverage" column with total number of reads for each site (see the hasCoverage param). |
rangeDT | A data table with the sets of regions to be binned, with columns named "start" and "end". Strand may also be given and will affect the output; see the description of the return value. |
binNum | Number of bins across the region. |
minReads | Filter out bins with fewer than minReads reads before returning. |
byRegionGroup | Default TRUE will aggregate methylation over corresponding bins for each region (all bin1's aggregated, all bin2's, etc). byRegionGroup = FALSE is deprecated. |
splitFactor | With default NULL, aggregation will be done separately/individually for each sample. |
hasCoverage | Default TRUE. Whether BSDT has a "coverage" column. |
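
As a rough illustration of the input shape described in the table, the sketch below builds a toy site table in base R. It assumes that "methylProp" is simply "methylCount" divided by "coverage"; the object name `BSDTsketch` and its values are hypothetical, and a plain data.frame stands in for the data.table the function expects.

```r
# Hypothetical toy site table with the columns described above.
BSDTsketch <- data.frame(
  chr         = c("chr1", "chr1", "chr2"),
  start       = c(1000, 1050, 2000),  # cytosine coordinate
  methylCount = c(8, 0, 15),          # methylated reads per site
  coverage    = c(10, 4, 20)          # total reads per site
)

# Assumption: proportion of methylation = methylated reads / total reads.
BSDTsketch$methylProp <- BSDTsketch$methylCount / BSDTsketch$coverage
print(BSDTsketch)
```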
With splitFactor = NULL, returns a data.table with binNum rows containing methylation data aggregated over the regions in the region set "rangeDT". Each region is split into bins and methylation sites are assigned to those bins; corresponding bins are then aggregated across all regions in the region set (all first bins together, all second bins together, etc.). The output columns should be "bin", "methylProp", and "coverage".

How the strand of rangeDT affects the output: if no strand information is given for the regions, the MIRA signature will be symmetrical (produced by averaging the signature with its reverse), because the orientation of the regions is arbitrary with respect to biological features (a promoter, for instance) that could be oriented directionally (e.g. 5' to 3'). If strand information is given, regions on the minus strand are flipped before being aggregated with plus-strand regions, so the MIRA signature will be in 5' to 3' orientation.
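
The two strand cases can be sketched in base R as follows. This is only an illustration of the idea under stated assumptions, not the package's code: the `profiles` matrix (one row per region, one column per bin) and the `strand` vector are hypothetical toy data.

```r
# Hypothetical per-region binned profiles: one row per region, one column per bin.
profiles <- rbind(
  c(0.9, 0.5, 0.1),  # region on the plus strand
  c(0.2, 0.6, 0.8)   # region on the minus strand
)
strand <- c("+", "-")

# Case 1: strand information given. Reverse the bin order of minus-strand
# regions so every profile runs 5' to 3', then average corresponding bins.
oriented <- profiles
minus <- strand == "-"
oriented[minus, ] <- oriented[minus, ncol(oriented):1, drop = FALSE]
strandedSignature <- colMeans(oriented)
print(strandedSignature)
# 0.85 0.55 0.15

# Case 2: no strand information. Average corresponding bins, then average
# the signature with its reverse so the result is symmetrical.
unstranded <- colMeans(profiles)
symmetricSignature <- (unstranded + rev(unstranded)) / 2
print(symmetricSignature)
# 0.50 0.55 0.50
```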
data("exampleBSDT") # loads exampleBSDT
data("exampleRegionSet") # loads exampleRegionSet
# add the "methylProp" column required by BSBinAggregate
exampleBSDT = addMethPropCol(exampleBSDT)
aggregateBins = BSBinAggregate(BSDT = exampleBSDT,
                               rangeDT = exampleRegionSet,
                               binNum = 11,
                               splitFactor = NULL)