Given a start coordinate, an end coordinate, and a number of bins, this function splits the region into that many bins. Because of rounding, bins are only approximately the same size (they should not differ by more than 1).

binRegion(start, end, binSize = NULL, binCount = NULL, indicator = NULL)

Arguments

start

The starting coordinate

end

The ending coordinate

binSize

The size of bin to divide the genome into. You must supply either binSize or binCount; if both are supplied, binSize takes priority.

binCount

The number of bins to divide the region into. If you do not supply binSize, you must supply binCount, which will be used to calculate the binSize.

indicator

A vector of identifiers to keep with your bins, in case you are running this on a long table with multiple segments concatenated.

Value

A data.table, expanded to nrow = number of bins, with these id columns:

id: region ID

binID: repeating ID (this is the value to aggregate across)

ubinID: unique bin IDs

Details

Use case: take a set of regions, like CpG islands, and bin them. You can then aggregate signal scores across the bins, giving you an aggregate signal in bins across many regions of the same type.
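The aggregation step in this use case can be sketched in base R. This is only an illustration: the table below is built by hand rather than returned by binRegion, the score column is an assumption (a signal value already attached to each bin), and the numbers are made up.

```r
# Toy long table: two regions (id), three bins each (binID).
# The score values are invented for illustration only.
binned <- data.frame(
  id    = rep(1:2, each = 3),
  binID = rep(1:3, times = 2),
  score = c(0.25, 1.0, 0.5, 0.75, 0.5, 0.0)
)

# Average the signal at each relative bin position across all regions.
# binID repeats within every region, so it is the column to group on.
profile <- aggregate(score ~ binID, data = binned, FUN = mean)
print(profile)
```

Grouping on binID rather than ubinID is what collapses many regions into one profile: ubinID is unique per bin, so grouping on it would leave every row in its own group.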

In theory, this just runs on 3 values, but you can run it inside a data.table j expression to divide a bunch of regions in the same way.
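To make the rounding behaviour and the per-region application concrete, here is a minimal base R sketch. sketchBinRegion is a hypothetical helper written for this example, not the package's implementation, and its output columns only loosely mirror the ones described under Value. The lapply call stands in for running the function once per region; the toy coordinates are arbitrary.

```r
# Hypothetical helper (an assumption, not the real binRegion):
# split [start, end] into binCount contiguous bins using rounded breaks.
sketchBinRegion <- function(start, end, binCount, id = 1L) {
  width <- end - start + 1
  # binCount + 1 evenly spaced offsets, rounded to whole coordinates
  breaks <- round(seq(0, width, length.out = binCount + 1))
  data.frame(
    id    = id,
    binID = seq_len(binCount),
    start = start + breaks[-length(breaks)],
    end   = start + breaks[-1] - 1
  )
}

# Toy regions with illustrative coordinates
regions <- data.frame(id = 1:2, start = c(1, 5001), end = c(10, 5103))

# Bin every region the same way, one region at a time
binsPerRegion <- lapply(seq_len(nrow(regions)), function(i) {
  sketchBinRegion(regions$start[i], regions$end[i],
                  binCount = 4, id = regions$id[i])
})
bins <- do.call(rbind, binsPerRegion)

# ubinID is unique across all regions; binID repeats within each region
bins$ubinID <- seq_len(nrow(bins))

# Bin widths are not identical because the breaks were rounded
bins$width <- bins$end - bins$start + 1
print(bins)
```

Neither toy region's width is divisible by 4, so the rounded breaks produce bins of slightly different widths within each region while still covering the region exactly.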

Examples

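# Both binSize and binCount are supplied here; binSize takes priority
Rbins = binRegion(start = 1, end = 3000, binSize = 100, binCount = 1000)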