Introduction

Before you start, see the Getting started with BiocProject vignette for basic information and installation instructions.

How to download the data with your function

The data processing function can be as complex as you need. For example, it can retrieve data from a remote source and then process them.

For reference, consider the readRemoteData function, defined in readRemoteData.R:

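readRemoteData = function(project) {
    # URL of the remote file, taken from the first row of the sample table
    url = pepr::sampleTable(project)$remote_url[[1]]
    # Fetch the file through a cache kept in the session's temporary directory
    bfc = BiocFileCache::BiocFileCache(cache = tempdir(), ask = FALSE)
    path = BiocFileCache::bfcrpath(bfc, url)
    # Read the table and name its four columns
    df = read.table(path)
    colnames(df) = c("chr", "start", "end", "name")
    # Convert the data frame to a GRanges object
    GenomicRanges::GRanges(df)
}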

and the PEP that it uses:

  • sample annotation sheet
sample_name remote_url
encodeRegions http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions
  • project config file
 pep_version: 2.0.0
 sample_table: sample_table.csv
 bioconductor:
    readFunName: readRemoteData
    readFunPath: readRemoteData.R
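
When writing a function like this, it can help to check the parsing steps offline before involving the download. The following is a minimal sketch in base R, not part of the package: the temporary file is a hypothetical stand-in for the remote resource, and its two rows are copied from the result printed later in this vignette.

```r
# Hypothetical local stand-in for the remote file: two tab-separated rows
# taken from the ranges that this vignette prints for the real data.
path = tempfile(fileext = ".txt")
writeLines(
  c("chr1\t151158060\t151658060\tENr231",
    "chr10\t55483812\t55983812\tENr114"),
  path
)

# Same parsing steps as in readRemoteData, minus the download
df = read.table(path, stringsAsFactors = FALSE)
colnames(df) = c("chr", "start", "end", "name")

# Inspect the column types before converting to genomic ranges
str(df)
```

If GenomicRanges is installed, passing `df` to `GenomicRanges::GRanges()` completes the conversion, as in the function above.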

Execute the BiocProject function

Get the path to the config file:

library(BiocProject)
ProjectConfigRemote = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject_remote",
  "project_config.yaml",
  package = "BiocProject"
)
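
Note that system.file() returns an empty string, rather than raising an error, when no matching file is found. A small guard can make that failure easier to spot. This is only a sketch: `requireFile` is a hypothetical helper, not a BiocProject function, and the file name `no_such_config.yaml` is made up for the demonstration.

```r
# system.file() returns "" when nothing matches, instead of raising an error
missingPath = system.file("extdata", "no_such_config.yaml", package = "base")
print(nzchar(missingPath))

# Hypothetical helper: stop early with a readable message,
# otherwise return the path unchanged
requireFile = function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("Config file not found; is the package installed?")
  }
  path
}
```

Calling `requireFile(ProjectConfigRemote)` before the next step would then fail early if the example PEP were missing from the installation.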

Run the BiocProject function. It creates the object returned by the data processing function, with the PEP stored in its metadata slot:

bpRemote = BiocProject(file=ProjectConfigRemote)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/project_config.yaml
#> Function 'readRemoteData' read from file '/tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/readRemoteData.R'
#> adding rname 'http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions'

With this simple line of code:

  1. the project metadata were read
  2. the data processing function was read into the R environment
  3. data were downloaded from the remote source and processed
  4. everything was conveniently stored in the created object
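
Steps 2 and 3 can be illustrated with a rough base R sketch. It is not BiocProject's actual implementation: `readDemo`, the temporary file, and the `biocSection` list are hypothetical stand-ins for the function file and for the bioconductor section of a config.

```r
# Hypothetical function file, written to the session's temporary directory
funFile = file.path(tempdir(), "readDemo.R")
writeLines("readDemo = function(project) { toupper(project$name) }", funFile)

# Stand-in for the 'bioconductor' section of a project config
# (a real config is parsed from YAML)
biocSection = list(readFunName = "readDemo", readFunPath = funFile)

# Step 2: read the function definition into a dedicated environment
funEnv = new.env()
source(biocSection$readFunPath, local = funEnv)
readFun = get(biocSection$readFunName, envir = funEnv, mode = "function")

# Step 3: call the function with a project-like object (here a plain list)
result = readFun(list(name = "example"))
print(result)
```

Sourcing into a separate environment keeps the user-supplied function from overwriting objects in the global workspace.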

Let’s inspect the results:

bpRemote
#> GRanges object with 44 ranges and 1 metadata column:
#>        seqnames              ranges strand |        name
#>           <Rle>           <IRanges>  <Rle> | <character>
#>    [1]     chr1 151158060-151658060      * |      ENr231
#>    [2]    chr10   55483812-55983812      * |      ENr114
#>    [3]    chr11     1743415-2349463      * |      ENm011
#>    [4]    chr11     4774419-5776011      * |      ENm009
#>    [5]    chr11   64184312-64684312      * |      ENr332
#>    ...      ...                 ...    ... .         ...
#>   [40]     chr7 126078655-127241852      * |      ENm014
#>   [41]     chr8 118813039-119313039      * |      ENr321
#>   [42]     chr9 131685301-132185301      * |      ENr232
#>   [43]     chrX 122782314-123282314      * |      ENr324
#>   [44]     chrX 153114297-154409887      * |      ENm006
#>   -------
#>   seqinfo: 21 sequences from an unspecified genome; no seqlengths

And the metadata:

metadata(bpRemote)
#> $PEP
#> PEP project object. Class:  Project
#>   file:  
#> /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/project_config.yaml
#>   samples:  1
sampleTable(bpRemote)
#>      sample_name
#> 1: encodeRegions
#>                                                               remote_url
#> 1: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions
config(bpRemote)
#> Config object. Class: Config
#>  pep_version: 2.0.0
#>  sample_table: 
#> /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/sample_table.csv
#>  bioconductor:
#>     readFunName: readRemoteData
#>     readFunPath: 
#> /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/readRemoteData.R
#>  name: example_BiocProject_remote