Introduction

Before you start, see the "Getting started with BiocProject" vignette for basic information and installation instructions.

Get paths to the files used in this vignette

library(BiocProject)
ProjectConfigArgs = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject",
  "project_config_resize.yaml",
  package = "BiocProject"
)

readBedFiles_resize = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject",
  "readBedFiles_resize.R",
  package = "BiocProject"
)

Ways to provide additional arguments

What if your custom data processing function requires more arguments than just a PEP?

For reference, consider the readBedFiles_resize function (defined in readBedFiles_resize.R) and its interface. In addition to the PEP, this function requires the resize.width argument.

function (project, resize.width) 
{
    cwd = getwd()
    paths = pepr::sampleTable(project)$file_path
    sampleNames = pepr::sampleTable(project)$sample_name
    setwd(dirname(project@file))
    result = lapply(paths, function(x) {
        df = read.table(x)
        colnames(df) = c("chr", "start", "end")
        gr = GenomicRanges::resize(GenomicRanges::GRanges(df), 
            width = resize.width)
    })
    setwd(cwd)
    names(result) = sampleNames
    return(GenomicRanges::GRangesList(result))
}
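To see which extra arguments a reader function expects, you can list its formal arguments with base R. The sketch below uses a stand-in function, reader_sketch (a hypothetical name, with the same interface as readBedFiles_resize but an empty body), so it runs without any PEP files.

```r
# Stand-in with the same interface as readBedFiles_resize
# (hypothetical; the file-reading body is omitted)
reader_sketch = function(project, resize.width) {
    NULL
}

# All formal argument names, in declaration order
argNames = names(formals(reader_sketch))
print(argNames)

# Everything except the PEP argument has to be supplied separately
extraArgs = setdiff(argNames, "project")
print(extraArgs)
```

Any name left in extraArgs is an argument that one of the approaches described in this vignette needs to provide.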

In the basic scenario your function receives only a PEP (a pepr::Project object). There are a few ways to pass it additional arguments.

The options:

  • adding an additional section to the config file
  • using the funcArgs argument of the BiocProject function
  • using an anonymous function in the func argument of the BiocProject function

How to provide an additional section in the config file

The easiest way to provide additional arguments to your data reading/processing function is to add an additional section to the config file. See the config file below for reference:

   pep_version: 2.0.0
   sample_table: sample_table.csv
   bioconductor:
      readFunName: readBedFiles_resize
      readFunPath: readBedFiles_resize.R
      funcArgs:
          resize.width: 100

The section funcArgs was added within the bioconductor section.
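Conceptually, the funcArgs section is a set of name/value pairs that end up as named arguments of your function. The base R sketch below illustrates that idea only; it is not BiocProject's internal code. The reader_sketch function and the "myPEP" placeholder are hypothetical stand-ins for the real reader and the pepr::Project object.

```r
# Named list mirroring the funcArgs section of the config (resize.width: 100)
configArgs = list(resize.width = 100)

# Stand-in reader (hypothetical): reports the arguments it received
reader_sketch = function(project, resize.width) {
    paste0("project=", project, ", width=", resize.width)
}

# The PEP goes in as 'project'; the config-derived arguments are matched by name
callArgs = c(list(project = "myPEP"), configArgs)
result = do.call(reader_sketch, callArgs)
print(result)
```

Because the arguments are matched by name, the keys under funcArgs need to be spelled exactly like the argument names of the function.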

bp = BiocProject(ProjectConfigArgs)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function 'readBedFiles_resize' from the environment
bp
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401297      *
#>      [2]     chr1   14877629-14877728      *
#>      [3]     chr1   18229570-18229669      *
#>      [4]     chr1   29618442-29618541      *
#>      [5]     chr1   33943885-33943984      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066771      *
#>   [1299]     chrY     2880166-2880265      *
#>   [1300]     chrY   15047033-15047132      *
#>   [1301]     chrY   15603977-15604076      *
#>   [1302]     chrY   16966225-16966324      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#> 
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190680      *
#>      [2]     chr1     8130439-8130538      *
#>      [3]     chr1   10593123-10593222      *
#>      [4]     chr1   10732070-10732169      *
#>      [5]     chr1   10757664-10757763      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381015      *
#>   [1336]     chrX 139593502-139593601      *
#>   [1337]     chrX 139674499-139674598      *
#>   [1338]     chrX 147829016-147829115      *
#>   [1339]     chrX 150407692-150407791      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths

How to use the funcArgs argument

Pass an additional funcArgs argument to the BiocProject function. This argument has to be a named list, and its names have to correspond to the argument names of your function. The PEP itself is passed to your function by default.

For example, read the function into the R environment and run the BiocProject function with the funcArgs argument:

source(readBedFiles_resize)
bpArgs = BiocProject(file = ProjectConfigArgs, funcArgs = list(resize.width = 200))
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function 'readBedFiles_resize' from the environment
bpArgs
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401397      *
#>      [2]     chr1   14877629-14877828      *
#>      [3]     chr1   18229570-18229769      *
#>      [4]     chr1   29618442-29618641      *
#>      [5]     chr1   33943885-33944084      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066871      *
#>   [1299]     chrY     2880166-2880365      *
#>   [1300]     chrY   15047033-15047232      *
#>   [1301]     chrY   15603977-15604176      *
#>   [1302]     chrY   16966225-16966424      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#> 
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190780      *
#>      [2]     chr1     8130439-8130638      *
#>      [3]     chr1   10593123-10593322      *
#>      [4]     chr1   10732070-10732269      *
#>      [5]     chr1   10757664-10757863      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381115      *
#>   [1336]     chrX 139593502-139593701      *
#>   [1337]     chrX 139674499-139674698      *
#>   [1338]     chrX 147829016-147829215      *
#>   [1339]     chrX 150407692-150407891      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths

Here the funcArgs argument receives a one-element list and passes resize.width to your custom data processing function. Arguments supplied this way override the same arguments in the config file: the width of the ranges changed from 100 to 200 in the example above.
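The override can be pictured as merging two named lists, with the values from the function call winning over the values from the config file. The sketch below uses utils::modifyList to show that precedence. It is an assumption-laden illustration, not a description of how BiocProject merges arguments internally.

```r
# Value taken from the config file (funcArgs section)
configArgs = list(resize.width = 100)

# Value supplied in the BiocProject() call via funcArgs
callArgs = list(resize.width = 200)

# Entries in callArgs replace same-named entries in configArgs
merged = utils::modifyList(configArgs, callArgs)
print(merged$resize.width)
```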

How to use an anonymous function

You can also define an anonymous function directly in the BiocProject call and use it to pass additional arguments to your function of interest. For example:

bpAnonymous = BiocProject(
  file = ProjectConfigArgs,
  func = function(x) {
    readBedFiles_resize(project = x, resize.width = 100)
  }
)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config_resize.yaml
#> Used function from the 'func' argument
# Inspect it
bpAnonymous
#> GRangesList object of length 2:
#> $laminB1Lads
#> GRanges object with 1302 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1   11401198-11401297      *
#>      [2]     chr1   14877629-14877728      *
#>      [3]     chr1   18229570-18229669      *
#>      [4]     chr1   29618442-29618541      *
#>      [5]     chr1   33943885-33943984      *
#>      ...      ...                 ...    ...
#>   [1298]     chrX 154066672-154066771      *
#>   [1299]     chrY     2880166-2880265      *
#>   [1300]     chrY   15047033-15047132      *
#>   [1301]     chrY   15603977-15604076      *
#>   [1302]     chrY   16966225-16966324      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
#> 
#> $vistaEnhancers
#> GRanges object with 1339 ranges and 0 metadata columns:
#>          seqnames              ranges strand
#>             <Rle>           <IRanges>  <Rle>
#>      [1]     chr1     3190581-3190680      *
#>      [2]     chr1     8130439-8130538      *
#>      [3]     chr1   10593123-10593222      *
#>      [4]     chr1   10732070-10732169      *
#>      [5]     chr1   10757664-10757763      *
#>      ...      ...                 ...    ...
#>   [1335]     chrX 139380916-139381015      *
#>   [1336]     chrX 139593502-139593601      *
#>   [1337]     chrX 139674499-139674598      *
#>   [1338]     chrX 147829016-147829115      *
#>   [1339]     chrX 150407692-150407791      *
#>   -------
#>   seqinfo: 24 sequences from an unspecified genome; no seqlengths
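If you need several such wrappers that differ only in the fixed argument, a small function factory can build them. The sketch below is one possible pattern in base R; withWidth and reader_sketch are hypothetical names, and the "myPEP" string stands in for a real pepr::Project object.

```r
# Stand-in reader (hypothetical): reports the arguments it received
reader_sketch = function(project, resize.width) {
    paste0("project=", project, ", width=", resize.width)
}

# Factory returning a one-argument function with resize.width fixed
withWidth = function(width) {
    force(width)  # evaluate now so each wrapper keeps its own value
    function(x) reader_sketch(project = x, resize.width = width)
}

read100 = withWidth(100)
read200 = withWidth(200)
print(read100("myPEP"))
print(read200("myPEP"))
```

Each function built this way takes a single argument, so it has the same shape as the anonymous function passed to func above.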