Given a start, an end, and a number of bins, this function splits the region into bins. Because of rounding, the bins are only approximately the same size (their sizes should differ by no more than 1).

binRegion(start, end, bins, idDF = NULL, strand = "*")

Arguments

start

Coordinate for beginning of range/region.

end

Coordinate for end of range/region.

bins

Number of bins to divide this range/region into.

idDF

A string/vector of strings giving the chromosome (e.g. "chr1") for the given start and end values. Default is NULL.

strand

"strand" column of the data.table (or single strand value if binRegion is only used on one region). Default is "*".

Value

A data.table, expanded to nrow = number of bins, with these ID columns:

id: region ID

binID: repeating ID (this is the value to aggregate across)

ubinID: unique bin IDs

Details

Use case: take a set of regions, like CpG islands, and bin them; you can then aggregate signal scores across the bins, giving an aggregate signal profile across many regions of the same type.
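To make the aggregation step concrete, here is a minimal base R sketch. It assumes a binned table with the id, binID and ubinID columns described under Value, plus a hypothetical signal column holding made-up scores; it does not call binRegion itself.

```r
# Toy binned table: 2 regions x 3 bins each.
# The 'signal' column is hypothetical and holds made-up values.
binned <- data.frame(
  id     = rep(1:2, each = 3),   # region ID
  binID  = rep(1:3, times = 2),  # repeats within every region
  ubinID = 1:6,                  # unique across all regions
  signal = c(0.2, 0.8, 0.4, 0.4, 0.6, 0.2)
)

# Average the signal across regions for each bin position.
profile <- tapply(binned$signal, binned$binID, mean)
print(profile)
```

Grouping by binID rather than ubinID is what pools the same relative position across regions; grouping by ubinID would keep each bin separate.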

The function itself needs only three values (start, end, and bins), but you can call it inside a data.table j expression to divide many regions in the same way.
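The rounding behaviour on a single region can be illustrated with a small base R sketch. The helper name splitIntoBins is hypothetical, and this is not necessarily how binRegion is implemented; it only assumes that breakpoints are evenly spaced and then rounded, with adjacent bins sharing a breakpoint.

```r
# Hypothetical helper for illustration only (not the MIRA implementation).
splitIntoBins <- function(start, end, bins) {
  # bins + 1 evenly spaced breakpoints, rounded to whole coordinates
  breaks <- round(seq(from = start, to = end, length.out = bins + 1))
  data.frame(
    binID = seq_len(bins),
    start = breaks[-(bins + 1)],  # every breakpoint except the last
    end   = breaks[-1]            # every breakpoint except the first
  )
}

binsDF <- splitIntoBins(100, 500, 15)
widths <- binsDF$end - binsDF$start

# Bin widths differ by at most 1 because each breakpoint is
# rounded by no more than 0.5.
print(table(widths))
```

With a region of length 400 and 15 bins, the exact bin size is about 26.67, so the rounded bins come out as a mix of widths 26 and 27.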

Examples

library(data.table)
start = c(100, 1000, 3000)
end = c(500, 1400, 3400)
chr = c("chr1", "chr1", "chr2")
strand = c("*", "*", "*")
# strand not included in object
# since MIRA assumes "*" already unless given something else
regionsToBinDT = data.table(chr, start, end)
numberOfBins = 15
# data.table "j command" using column names and numberOfBins variable
binnedRegionDT = regionsToBinDT[, binRegion(start, end, numberOfBins, chr)]