Parses the x/y format of methylation calls, splitting each call into two columns: a "methylCount" column for the number of methylated reads at a site and a "coverage" column for the total number of reads covering that site. Input files should have the following columns: "chr", "start", "end", "meth", "rate", "strand".
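The splitting step described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frame `calls` and its values are made up for illustration, and the "meth" column is assumed to hold plain "methylated/total" strings.

```r
# Hypothetical toy input with a "meth" column in x/y form
# (methylated reads / total reads); values are invented for illustration.
calls <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 40L),
  meth  = c("3/10", "0/4", "7/7"),
  stringsAsFactors = FALSE
)

# Split each "x/y" string on the literal slash
parts <- strsplit(calls$meth, "/", fixed = TRUE)

# First piece: methylated read count; second piece: total coverage
calls$methylCount <- as.integer(vapply(parts, `[`, character(1), 1))
calls$coverage    <- as.integer(vapply(parts, `[`, character(1), 2))

# Drop the original combined column
calls$meth <- NULL
```

After this, `calls` has one integer column per quantity, which is the shape the description above refers to.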
BSreadBiSeq(files, contrastList = NULL, sampleNames = tools::file_path_sans_ext(basename(files)), cores = 4, returnAsList = FALSE)
Argument | Description |
---|---|
files | A list of filenames (use parseInputArg if necessary). |
contrastList | Generally not needed for MIRA. A list of named character vectors, each with length equal to the number of items in files. These will translate into column names in the final table. |
sampleNames | A vector with the same length as files, giving a name for each file. |
cores | Number of processors to use. |
returnAsList | Whether to return the output as a list or as one big data.table. |
Data from each input file joined together into one big data.table. If returnAsList = TRUE, then input from each file will be in its own data.table in a list.
This can run into memory problems if there are too many files, because the parallel package lacks long vector support. The solution is either to use a single core or to pass mc.preschedule = FALSE, which makes each file be processed as a separate job. The second option is generally preferable.
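As a rough sketch of what mc.preschedule = FALSE does, the snippet below calls parallel::mclapply directly on a few temporary files. It does not show how BSreadBiSeq forwards the option; the reader function `readOne`, the temporary files, and the core count are hypothetical stand-ins.

```r
library(parallel)

# Hypothetical per-file reader standing in for the real parsing step
readOne <- function(path) {
  read.table(path, header = TRUE, stringsAsFactors = FALSE)
}

# Create three small temporary files so the sketch is self-contained
paths <- vapply(1:3, function(i) {
  f <- tempfile(fileext = ".bed")
  write.table(data.frame(chr = "chr1", start = i), f,
              row.names = FALSE, quote = FALSE)
  f
}, character(1))

# Forking is unavailable on Windows, so fall back to a single core there
nCores <- if (.Platform$OS.type == "windows") 1L else 2L

# With mc.preschedule = FALSE each file is dispatched as its own job,
# rather than the inputs being divided among the workers up front
perFile <- mclapply(paths, readOne,
                    mc.cores = nCores, mc.preschedule = FALSE)
```

Each element of `perFile` holds the result for one file, so a large result from one file is returned on its own rather than bundled with the results of other files.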
shortBSDTFile = system.file("extdata", "shortRRBS.bed", package = "MIRA")
shortBSDT = BSreadBiSeq(shortBSDTFile)