Before you start, see the Getting started with BiocProject vignette for basic information and installation instructions.
The data processing function can be arbitrarily complex. For example, it can retrieve the data from a remote source and then process it.
For reference, consider the function defined in readRemoteData.R:
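function (project)
{
    # URL of the remote file, taken from the first sample in the PEP
    url = pepr::sampleTable(project)$remote_url[[1]]
    # download the file into a cache in the session's temporary directory
    bfc = BiocFileCache::BiocFileCache(cache = tempdir(), ask = FALSE)
    path = BiocFileCache::bfcrpath(bfc, url)
    # read the cached file and name its four columns
    df = read.table(path)
    colnames(df) = c("chr", "start", "end", "name")
    # convert the data frame to a GRanges object
    GenomicRanges::GRanges(df)
}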
and the PEP that it uses:
sample_name | remote_url |
---|---|
encodeRegions | http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions |
pep_version: 2.0.0
sample_table: sample_table.csv
bioconductor:
  readFunName: readRemoteData
  readFunPath: readRemoteData.R
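The processing half of readRemoteData can be tried without any network access. The sketch below is illustrative only: it replaces the downloaded file with two rows copied from the output shown in this vignette, and assumes the real file has the same four-column layout. The names `txt`, `df` and `gr` are made up for this example.

```r
library(GenomicRanges)

# Two rows taken from the GRanges output shown in this vignette, written as
# whitespace-separated text to stand in for the downloaded file
# (assumption: the remote file uses this four-column layout).
txt <- "chr1 151158060 151658060 ENr231
chr10 55483812 55983812 ENr114"

# Same steps as in readRemoteData, but reading from text instead of a path
df <- read.table(text = txt, stringsAsFactors = FALSE)
colnames(df) <- c("chr", "start", "end", "name")

# Columns named chr/start/end are recognised as the genomic coordinates;
# the remaining column (name) is kept as a metadata column.
gr <- GRanges(df)
gr
```

Because the column names are set before the conversion, no further arguments are needed to tell GRanges which columns hold the coordinates.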
The BiocProject function
Get the path to the config file:
library(BiocProject)
ProjectConfigRemote = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject_remote",
  "project_config.yaml",
  package = "BiocProject"
)
Run the BiocProject function. It creates the object returned by the data processing function, with the PEP stored in its metadata slot:
bpRemote = BiocProject(file=ProjectConfigRemote)
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/project_config.yaml
#> Function 'readRemoteData' read from file '/tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/readRemoteData.R'
#> adding rname 'http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions'
With this simple line of code, the remote data were downloaded, processed, and loaded into the R environment.
Let's inspect the results:
bpRemote
#> GRanges object with 44 ranges and 1 metadata column:
#>        seqnames              ranges strand |        name
#>           <Rle>           <IRanges>  <Rle> | <character>
#>    [1]     chr1 151158060-151658060      * |      ENr231
#>    [2]    chr10   55483812-55983812      * |      ENr114
#>    [3]    chr11     1743415-2349463      * |      ENm011
#>    [4]    chr11     4774419-5776011      * |      ENm009
#>    [5]    chr11   64184312-64684312      * |      ENr332
#>    ...      ...                 ...    ... .         ...
#>   [40]     chr7 126078655-127241852      * |      ENm014
#>   [41]     chr8 118813039-119313039      * |      ENr321
#>   [42]     chr9 131685301-132185301      * |      ENr232
#>   [43]     chrX 122782314-123282314      * |      ENr324
#>   [44]     chrX 153114297-154409887      * |      ENm006
#>   -------
#>   seqinfo: 21 sequences from an unspecified genome; no seqlengths
And the metadata:
metadata(bpRemote)
#> $PEP
#> PEP project object. Class: Project
#>   file:
#> /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/project_config.yaml
#>   samples: 1
sampleTable(bpRemote)
#>      sample_name
#> 1: encodeRegions
#>                                                               remote_url
#> 1: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/encodeRegions
config(bpRemote)
#> Config object. Class: Config
#>  pep_version: 2.0.0
#>  sample_table:
#>   /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/sample_table.csv
#>  bioconductor:
#>     readFunName: readRemoteData
#>     readFunPath:
#>      /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject_remote/readRemoteData.R
#>  name: example_BiocProject_remote
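The `$PEP` entry in the metadata output indicates that the project description travels with the data as a named element of the object's metadata list. The sketch below shows that general mechanism on a plain GRanges object. It is not BiocProject's implementation: `pep_stub` is a hypothetical plain list standing in for the pepr Project object, and the range is made up.

```r
library(GenomicRanges)

# A one-range object to attach metadata to (made-up coordinates)
gr <- GRanges(seqnames = "chr1", ranges = IRanges(start = 1, end = 10))

# Hypothetical stand-in for the pepr Project object; a real PEP is an S4
# object, a plain list is used here only to keep the sketch self-contained.
pep_stub <- list(name = "example_project", samples = 1L)

# The metadata slot is a list, so the project description can be stored
# under a name and retrieved later alongside the ranges.
metadata(gr) <- list(PEP = pep_stub)

names(metadata(gr))
metadata(gr)$PEP$name
```

Keeping the description in the metadata slot leaves the ranges themselves untouched, so functions that operate on GRanges objects continue to work on the result.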