Parses the x/y format of methylation calls, splitting each call into two columns: "methylCount", the number of methylated reads at a site, and "coverage", the total number of reads covering that site. Input files should have the following columns: "chr", "start", "end", "meth", "rate", "strand".
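As a rough illustration of the split described above, here is a base-R sketch. It is not MIRA's actual implementation: it assumes each x/y string holds methylated reads over total reads, and the example values are made up.

```r
# Hypothetical x/y values: methylated reads / total reads at three sites
meth <- c("3/10", "0/5", "7/7")

# Split each value on the literal "/" character
parts <- strsplit(meth, "/", fixed = TRUE)

# Take the first and second piece of every split and convert to integers
methylCount <- as.integer(vapply(parts, `[`, character(1), 1))
coverage <- as.integer(vapply(parts, `[`, character(1), 2))

# Collect the two new columns in a table
parsed <- data.frame(methylCount = methylCount, coverage = coverage)
print(parsed)
```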

BSreadBiSeq(files, contrastList = NULL,
  sampleNames = tools::file_path_sans_ext(basename(files)), cores = 4,
  returnAsList = FALSE)

Arguments

files

a list of filenames (use parseInputArg if necessary)

contrastList

Generally not needed for MIRA. A list of named character vectors, each with length equal to the number of items in files. These will translate into column names in the final table.

sampleNames

A vector of length length(files) giving a name for each file.

cores

Number of processors to use.

returnAsList

Whether to return the output as a list or as one big data.table.
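To see what the default for sampleNames produces, the expression from the usage line can be run on its own. The paths below are hypothetical and do not need to exist on disk.

```r
# Hypothetical input paths (assumption: not real files)
files <- c("/data/run1/sampleA.bed", "results/sampleB.txt")

# Same expression as the sampleNames default: drop the directory, then the extension
sampleNames <- tools::file_path_sans_ext(basename(files))
print(sampleNames)
```

Each name is the file's base name without its directory or final extension, so files with the same base name in different directories would get identical default names.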

Value

Data from each input file joined together into one big data.table. If returnAsList = TRUE, then input from each file will be in its own data.table in a list.

Details

This can run into memory problems if there are too many files, because the parallel package lacks long vector support. The solution is either to use a single core or to pass mc.preschedule = FALSE, so that each file is processed as a separate job, which works much better.
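The effect of mc.preschedule = FALSE can be sketched with parallel::mclapply directly. This is only an illustration of the option, not a description of how BSreadBiSeq calls it internally; the file names and the readOne helper are hypothetical stand-ins.

```r
library(parallel)

# Hypothetical file names; nothing is read from disk in this sketch
files <- c("sample1.bed", "sample2.bed", "sample3.bed")

# Stand-in for a per-file reader (assumption: the real work would happen here)
readOne <- function(f) toupper(f)

# Forking is unavailable on Windows, where mclapply requires mc.cores = 1
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# With mc.preschedule = FALSE, each element of files becomes its own job
# instead of being grouped into one batch per core up front
results <- mclapply(files, readOne, mc.cores = nCores, mc.preschedule = FALSE)
names(results) <- files
print(results)
```

Running one job per file means each worker returns a single file's result, at the cost of some extra forking overhead when there are many small files.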

Examples

shortBSDTFile = system.file("extdata", "shortRRBS.bed", package = "MIRA")
shortBSDT = BSreadBiSeq(shortBSDTFile)
#> Reading 1 files..
#> File reading finished (1 files). Parsing Biseq format...
#> .
#> Parsing complete, building final tables and cleaning up...