This vignette assumes you’re familiar with the “Getting started with BiocProject” vignette for the basic BiocProject information and “An introduction to simpleCache” for the basic simpleCache information.
simpleCache with BiocProject
For a large project, running the initial data loading function that reads your data into R can take substantial computational effort. We’d like to cache that result so that it doesn’t have to be reprocessed every time we load our project metadata and data. Pairing simpleCache with BiocProject allows us to do just that: if your custom data processing function loads or processes large data sets and takes a long time to run, the R object will not be recalculated, but simply reloaded.
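Briefly, this is the simpleCache logic: if the object already exists in the R environment, nothing is recomputed; otherwise, if a cache file for it exists in the cache directory, the object is loaded from that file; otherwise, the code is run and its result is saved to the cache.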
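As a rough illustration of this logic, the following base-R sketch mimics the three cases shown later in this vignette (object already in memory, cache file on disk, nothing cached yet). It is not the simpleCache implementation: the helper `cacheSketch()`, its arguments, the returned status strings, and the choice to store the result under the name `ret` are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; NOT part of simpleCache.
# `instruction` is evaluated lazily, so it only runs when no cached copy exists.
cacheSketch <- function(cacheName, instruction, cacheDir = tempdir(),
                        envir = globalenv()) {
  cacheFile <- file.path(cacheDir, paste0(cacheName, ".RData"))
  if (exists(cacheName, envir = envir, inherits = FALSE)) {
    # Case 1: the object is already in memory, so do nothing
    status <- "exists"
  } else if (file.exists(cacheFile)) {
    # Case 2: a cache file exists, so load it instead of recomputing
    cached <- new.env()
    load(cacheFile, envir = cached)
    assign(cacheName, get("ret", envir = cached), envir = envir)
    status <- "loaded"
  } else {
    # Case 3: nothing cached yet, so run the code and save the result
    ret <- instruction
    save(ret, file = cacheFile)
    assign(cacheName, ret, envir = envir)
    status <- "created"
  }
  invisible(status)
}

# Demo in a scratch environment and a scratch directory
demoEnv <- new.env()
demoDir <- file.path(tempdir(), "cacheSketchDemo")
dir.create(demoDir, showWarnings = FALSE)
unlink(file.path(demoDir, "demoData.RData"))  # start from a clean state

s1 <- cacheSketch("demoData", sum(1:10), demoDir, demoEnv)            # computes and saves
s2 <- cacheSketch("demoData", stop("not evaluated"), demoDir, demoEnv) # object in memory
rm(list = "demoData", envir = demoEnv)                                 # simulate a new session
s3 <- cacheSketch("demoData", stop("not evaluated"), demoDir, demoEnv) # reloads from disk
```

Because R evaluates function arguments lazily, the `stop()` calls in the second and third invocations are never triggered: the expensive code is simply skipped whenever a cached copy is available.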
simpleCache with BiocProject
Load the libraries and set the example project config file path:
library(simpleCache)
library(BiocProject)
projectConfig = system.file(
  "extdata",
  "example_peps-master",
  "example_BiocProject",
  "project_config.yaml",
  package = "BiocProject"
)
Set the cache directory and read the data in:
setCacheDir(tempdir())
simpleCache("dataSet1", {
  BiocProject(file = projectConfig)
})
#> ::Creating cache:: /tmp/RtmpaXwULV/dataSet1.RData
#> Loading config file: /tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/project_config.yaml
#> Function 'readBedFiles' read from file '/tmp/Rtmpp7Kvae/temp_libpath658d0e8e5/BiocProject/extdata/example_peps-master/example_BiocProject/readBedFiles.R'
This loads your PEP and its data with BiocProject, and then caches the result with simpleCache. Say you rerun this line of code. simpleCache prevents the calculations from rerunning, since the dataSet1 object is already present in memory:
simpleCache("dataSet1", {
  BiocProject(file = projectConfig)
})
#> ::Object exists (in .GlobalEnv):: dataSet1
Say you come back to your analysis after a while and the dataSet1 object is no longer in memory (simulated here by removing it with the rm() function). simpleCache then loads the object from the directory you specified with setCacheDir():
rm(dataSet1)
simpleCache("dataSet1", {
  BiocProject(file = projectConfig)
})
#> ::Loading cache:: /tmp/RtmpaXwULV/dataSet1.RData
And that’s it! In the simplest case, this is all you need to organize, read, and process your data while avoiding costly recalculation of results every time you come back to your project.