Before you start, see the Getting started with BiocProject vignette for basic information and installation instructions.
Get paths to the files used in this vignette
library(BiocProject)
ProjectConfigArgs = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject",
  "project_config_resize.yaml",
  package = "BiocProject"
)
readBedFiles_resize = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject",
  "readBedFiles_resize.R",
  package = "BiocProject"
)
What if your custom data processing function requires more arguments than just a PEP?
For reference, consider the readBedFiles_resize.R function and its interface. Besides the PEP, this function requires the resize.width argument.
function (project, resize.width)
{
    cwd = getwd()
    paths = pepr::sampleTable(project)$file_path
    sampleNames = pepr::sampleTable(project)$sample_name
    setwd(dirname(project@file))
    result = lapply(paths, function(x) {
        df = read.table(x)
        colnames(df) = c("chr", "start", "end")
        gr = GenomicRanges::resize(GenomicRanges::GRanges(df),
            width = resize.width)
    })
    setwd(cwd)
    names(result) = sampleNames
    return(GenomicRanges::GRangesList(result))
}
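To make the effect of resize.width concrete, here is a minimal base-R sketch of the coordinate arithmetic. It is not the GenomicRanges implementation: it assumes unstranded ranges with inclusive coordinates that are anchored at their start, and the helper name resize_from_start is hypothetical.

```r
# Base-R sketch of fixed-start resizing (not the GenomicRanges implementation).
# Assumes unstranded ranges with inclusive coordinates, anchored at the start.
resize_from_start <- function(start, width) {
  stopifnot(all(width >= 1))
  # With inclusive coordinates, width = end - start + 1
  data.frame(start = start, end = start + width - 1L)
}

# Two example start coordinates
starts <- c(11401198L, 14877629L)
resized_100 <- resize_from_start(starts, 100L)
resized_200 <- resize_from_start(starts, 200L)
print(resized_100)
print(resized_200)
```

Only the end coordinate moves when the width changes; the start stays fixed.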
There are a few ways to pass multiple arguments to your function, beyond the PEP (pepr::Project) object, which is the basic scenario.
The options:
- add a funcArgs section to the config file
- use the funcArgs argument of the BiocProject function
- use an anonymous function in the func argument of the BiocProject function
The easiest way to provide additional arguments to your data reading/processing function is to add an additional section to the config file. See the config file below for reference:
pep_version: 2.0.0
sample_table: sample_table.csv
bioconductor:
  readFunName: readBedFiles_resize
  readFunPath: readBedFiles_resize.R
  funcArgs:
    resize.width: 100
The funcArgs section was added within the bioconductor section.
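One way to picture how such a section could reach the function is sketched below. This is not BiocProject's internal code: parsed_config is a hand-written stand-in for the parsed YAML (no YAML parsing happens here), and toy_reader is a hypothetical reader that only reports what it received.

```r
# Hand-written stand-in for the parsed config (hypothetical)
parsed_config <- list(
  bioconductor = list(
    readFunName = "toy_reader",
    funcArgs = list(resize.width = 100)
  )
)

# Hypothetical reader that just reports the arguments it was given
toy_reader <- function(project, resize.width) {
  sprintf("project=%s, resize.width=%d", project, as.integer(resize.width))
}

bioc <- parsed_config$bioconductor
fun <- match.fun(bioc$readFunName)

# The project goes first; the named funcArgs entries are matched by name
result <- do.call(fun, c(list("my_pep"), bioc$funcArgs))
print(result)
```

Because the entries are matched by name, each key under funcArgs has to be spelled exactly like the corresponding argument of the function.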
bp = BiocProject(ProjectConfigArgs)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function 'readBedFiles_resize' from the environment
bp
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401297      *
#>      [2]     chr1   14877629-14877728      *
#>      [3]     chr1   18229570-18229669      *
#>      [4]     chr1   29618442-29618541      *
#>      [5]     chr1   33943885-33943984      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066771      *
#>   [1299]     chrY     2880166-2880265      *
#>   [1300]     chrY   15047033-15047132      *
#>   [1301]     chrY   15603977-15604076      *
#>   [1302]     chrY   16966225-16966324      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#>
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190680      *
#>      [2]     chr1     8130439-8130538      *
#>      [3]     chr1   10593123-10593222      *
#>      [4]     chr1   10732070-10732169      *
#>      [5]     chr1   10757664-10757763      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381015      *
#>   [1336]     chrX 139593502-139593601      *
#>   [1337]     chrX 139674499-139674598      *
#>   [1338]     chrX 147829016-147829115      *
#>   [1339]     chrX 150407692-150407791      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
The funcArgs argument
Provide an additional funcArgs argument to the BiocProject function. This argument has to be a named list whose names correspond to the argument names of your function. The PEP is passed to your function by default. For example, read the function into the R environment and run the BiocProject function with the funcArgs argument:
source(readBedFiles_resize)
bpArgs = BiocProject(file=ProjectConfigArgs, funcArgs=list(resize.width=200))
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function 'readBedFiles_resize' from the environment
bpArgs
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401397      *
#>      [2]     chr1   14877629-14877828      *
#>      [3]     chr1   18229570-18229769      *
#>      [4]     chr1   29618442-29618641      *
#>      [5]     chr1   33943885-33944084      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066871      *
#>   [1299]     chrY     2880166-2880365      *
#>   [1300]     chrY   15047033-15047232      *
#>   [1301]     chrY   15603977-15604176      *
#>   [1302]     chrY   16966225-16966424      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#>
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190780      *
#>      [2]     chr1     8130439-8130638      *
#>      [3]     chr1   10593123-10593322      *
#>      [4]     chr1   10732070-10732269      *
#>      [5]     chr1   10757664-10757863      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381115      *
#>   [1336]     chrX 139593502-139593701      *
#>   [1337]     chrX 139674499-139674698      *
#>   [1338]     chrX 147829016-147829215      *
#>   [1339]     chrX 150407692-150407891      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
Here the funcArgs argument receives a one-element list and passes resize.width to your custom data processing function. Arguments supplied this way override arguments of the same name in the config file (the width of the ranges changed from 100 to 200 in the example above).
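The override rule can be pictured as merging two named lists, with call-time values winning. The sketch below uses utils::modifyList for that; whether BiocProject merges the lists this way internally is an assumption, and other.option is a hypothetical extra entry added only to show that untouched values survive.

```r
# Arguments as they might come from the config file
# (other.option is hypothetical, for illustration only)
config_args <- list(resize.width = 100, other.option = TRUE)

# Arguments supplied in the function call
call_args <- list(resize.width = 200)

# Entries in call_args replace same-named entries in config_args;
# entries that appear only in config_args are kept
merged <- utils::modifyList(config_args, call_args)
str(merged)
```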
You can also use an anonymous function (defined directly in the BiocProject function call) to provide additional arguments to your function of interest. For example:
bpAnonymous = BiocProject(file=ProjectConfigArgs, func=function(x){
    readBedFiles_resize(project=x, resize.width=100)
  }
)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function from the 'func' argument
#Inspect it
bpAnonymous
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401297      *
#>      [2]     chr1   14877629-14877728      *
#>      [3]     chr1   18229570-18229669      *
#>      [4]     chr1   29618442-29618541      *
#>      [5]     chr1   33943885-33943984      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066771      *
#>   [1299]     chrY     2880166-2880265      *
#>   [1300]     chrY   15047033-15047132      *
#>   [1301]     chrY   15603977-15604076      *
#>   [1302]     chrY   16966225-16966324      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#>
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190680      *
#>      [2]     chr1     8130439-8130538      *
#>      [3]     chr1   10593123-10593222      *
#>      [4]     chr1   10732070-10732169      *
#>      [5]     chr1   10757664-10757763      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381015      *
#>   [1336]     chrX 139593502-139593601      *
#>   [1337]     chrX 139674499-139674598      *
#>   [1338]     chrX 147829016-147829115      *
#>   [1339]     chrX 150407692-150407791      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
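If the same wrapper is needed for several widths, the anonymous function can be produced by a small factory. The sketch below is plain base R and does not call BiocProject: toy_reader is a hypothetical stand-in for a reader such as readBedFiles_resize, and with_width is a hypothetical helper name.

```r
# Hypothetical stand-in for a reader that needs an extra argument
toy_reader <- function(project, resize.width) {
  list(project = project, width = resize.width)
}

# Factory returning a one-argument function with resize.width fixed
with_width <- function(width) {
  force(width)  # evaluate now, so the value is captured when the wrapper is built
  function(x) toy_reader(project = x, resize.width = width)
}

read_100 <- with_width(100)
out <- read_100("my_pep")
str(out)
```

The returned function takes a single argument, which is the shape the anonymous function in the func argument has in the call shown above.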