Given a start coordinate, an end coordinate, and a number of bins, this function splits the region into that many bins. Because of rounding, the bins are only approximately the same size (they should not differ by more than 1).
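To make the rounding behaviour concrete, here is a minimal sketch of one way to split a range into integer bins whose widths differ by at most 1. The helper `sketchBins` is hypothetical and written only for illustration; it is not MIRA's implementation, and `binRegion` may place its bin boundaries differently.

```r
# Hypothetical helper (NOT part of MIRA): split the inclusive range
# [start, end] into `bins` contiguous integer bins.
sketchBins <- function(start, end, bins) {
    # bins + 1 evenly spaced breakpoints, rounded to whole coordinates
    breaks <- round(seq(from = start, to = end + 1, length.out = bins + 1))
    data.frame(binID = seq_len(bins),
               start = breaks[-(bins + 1)],  # every breakpoint but the last
               end   = breaks[-1] - 1)       # one before the next breakpoint
}

binsDF <- sketchBins(100, 500, 15)
# Width of each bin, counting both endpoints
widths <- binsDF$end - binsDF$start + 1
```

In this sketch the range 100 to 500 covers 401 positions, which cannot be divided evenly by 15, so `widths` is a mix of 26 and 27.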
binRegion(start, end, bins, idDF = NULL, strand = "*")
start | Coordinate for beginning of range/region. |
---|---|
end | Coordinate for end of range/region. |
bins | Number of bins to divide this range/region into. |
idDF | A string or vector of strings giving the chromosome (e.g. "chr1") for the given start and end values. Default is NULL. |
strand | "strand" column of the data.table (or single strand value if binRegion is only used on one region). Default is "*". |
A data.table, expanded to nrow = number of bins, with these ID columns: id (region ID), binID (repeating ID; this is the value to aggregate across), and ubinID (unique bin IDs).
Use case: take a set of regions, like CG islands, and bin them; now you can aggregate signal scores across the bins, giving you an aggregate signal in bins across many regions of the same type.
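The aggregation step in this use case can be sketched as follows. The table below is a hand-built toy shaped like the documented output (id, binID, ubinID); the `score` column, its values, and the names `toyBinnedDT` and `aggregateDT` are made up for illustration and are not produced by `binRegion`.

```r
library(data.table)

# Toy table: two regions, each split into three bins.
# `score` is a hypothetical per-bin signal, not a binRegion output column.
toyBinnedDT <- data.table(
    id     = rep(1:2, each = 3),   # region ID
    binID  = rep(1:3, times = 2),  # bin position within each region
    ubinID = 1:6,                  # unique ID for every bin
    score  = c(0.9, 0.5, 0.8, 0.7, 0.3, 0.6)
)

# Average the signal at each bin position across all regions
aggregateDT <- toyBinnedDT[, .(meanScore = mean(score)), by = binID]
```

Grouping by binID rather than ubinID is what pools the same relative position from every region into one aggregate value.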
In theory, this just runs on 3 values, but you can run it inside a data.table j expression to divide a bunch of regions in the same way.
library(data.table)
start = c(100, 1000, 3000)
end = c(500, 1400, 3400)
chr = c("chr1", "chr1", "chr2")
strand = c("*", "*", "*")
# strand not included in object
# since MIRA assumes "*" already unless given something else
regionsToBinDT = data.table(chr, start, end)
numberOfBins = 15
# data.table "j command" using column names and numberOfBins variable
binnedRegionDT = regionsToBinDT[, binRegion(start, end, numberOfBins, chr)]