Given a start coordinate, an end coordinate, and a number of bins, this function splits the region into that many bins. Because of rounding, the bins are only approximately the same size (they should not differ in size by more than 1).

binRegion(start, end, binSize = NULL, binCount = NULL, indicator = NULL)



start: The starting coordinate.


end: The ending coordinate.


binSize: The size of bin to divide the genome into. You must supply either binSize (which takes priority) or binCount.


binCount: The number of bins to divide the region into. If you do not supply binSize, you must supply binCount, which is then used to calculate the binSize.


indicator: A vector of identifiers to keep with your bins, in case you are doing this on a long table with multiple segments concatenated.


A data.table, expanded to nrow = number of bins, with these id columns:
id: region ID
binID: repeating ID (this is the value to aggregate across)
ubinID: unique bin IDs


Use case: take a set of regions, such as CpG islands, and bin them. You can then aggregate signal scores across the bins, which gives an aggregate signal profile across many regions of the same type.
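
The aggregation step in this use case might look roughly like the sketch below. The table is built by hand rather than returned by binRegion, and the `score` column and its values are invented for illustration; only the `id` and `binID` column names come from the return value described above.

```r
library(data.table)

# Hypothetical binned table: two regions (id 1 and 2), three bins each.
# `score` is an invented per-bin signal column, not part of binRegion's output.
binned <- data.table(
  id    = c(1, 1, 1, 2, 2, 2),
  binID = c(1, 2, 3, 1, 2, 3),
  score = c(0.2, 0.8, 0.4, 0.4, 0.6, 0.2)
)

# Aggregate across regions: one mean score per bin position (binID).
profile <- binned[, .(meanScore = mean(score)), by = binID]
print(profile)
```

Grouping by binID rather than ubinID is what pools the first bin of every region together, the second bin of every region together, and so on.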

In principle, this function operates on just three values, but you can call it inside a data.table j expression to divide many regions in the same way.
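
As a rough illustration of the j-expression pattern, the sketch below uses a stand-in function so that it runs without the package. `sketchBinRegion` is hypothetical and is not the implementation of binRegion; it only mimics the idea of rounding evenly spaced breakpoints, and the region table is made up.

```r
library(data.table)

# Hypothetical stand-in for binRegion (NOT the package's code): split one
# region into binCount bins by rounding evenly spaced breakpoints.
sketchBinRegion <- function(start, end, binCount) {
  breaks <- round(seq(0, end - start + 1, length.out = binCount + 1))
  list(
    binID = seq_len(binCount),
    start = start + breaks[-length(breaks)],
    end   = start + breaks[-1] - 1
  )
}

# Made-up regions of different widths (100 and 200).
regions <- data.table(id = c("a", "b"), start = c(1, 501), end = c(100, 700))

# Call the function in j, once per region, so every region gets 3 bins.
binned <- regions[, sketchBinRegion(start, end, binCount = 3), by = id]
binned[, width := end - start + 1]
print(binned)
```

Within each region the bin widths differ by at most 1 (33, 34, 33 for region a; 67, 66, 67 for region b). With the documented function, the analogous call would presumably pass the identifier column through the indicator argument instead of grouping with `by`, but that is an assumption based on the argument description above.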


# Both binSize and binCount are supplied here; binSize (100) takes priority.
Rbins = binRegion(1, 3000, binSize = 100, binCount = 1000)