vignettes/feature3_derivedAttributes.Rmd
This vignette shows how and why to use the derived attributes functionality of the `pepr` package. Basic information about the PEP concept is available on the project website, and a broader theoretical description can be found in the derived attributes documentation section.
The example below demonstrates how to use derived attributes to flexibly define sample attributes, here the `file_path` column of the `sample_table.csv` file, so that they match the file names in your project. Consider the following sample table for reference:
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | RRBS | pig | 0 | data/lab/project/pig_0h.fastq |
pig_1h | RRBS | pig | 1 | data/lab/project/pig_1h.fastq |
frog_0h | RRBS | frog | 0 | data/lab/project/frog_0h.fastq |
frog_1h | RRBS | frog | 1 | data/lab/project/frog_1h.fastq |
As the name suggests, the values of a derived attribute (here: `file_path`) can be constructed from other attributes. How this is done is stated explicitly in the `project_config.yaml` file (presented below). The name of the derived column is set under `sample_modifiers.derive.attributes`, and the templates used to construct its values are set under `sample_modifiers.derive.sources`. Note that the second-level keys under `sources` (here: `source1` and `source2`) are what the values in the `file_path` column of the modified `sample_table.csv` (presented below) refer to, so the two have to match exactly.

    pep_version: 2.0.0
    sample_table: sample_table.csv
    output_dir: $HOME/hello_looper_results
    sample_modifiers:
      derive:
        attributes: file_path
        sources:
          source1: $HOME/data/lab/project/{organism}_{time}h.fastq
          source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq

Now modify the original `sample_table.csv` file so that each value in the derived column, `file_path`, names one of the data sources defined in `project_config.yaml`:
sample_name | protocol | organism | time | file_path |
---|---|---|---|---|
pig_0h | RRBS | pig | 0 | source1 |
pig_1h | RRBS | pig | 1 | source1 |
frog_0h | RRBS | frog | 0 | source1 |
frog_1h | RRBS | frog | 1 | source1 |
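
The lookup and substitution described above can be sketched in base R. This only illustrates the idea and is not how `pepr` implements it. The helper names `fill_template` and `derive_column` are hypothetical, and leaving values that match no source key untouched is an assumption of the sketch.

```r
# Illustrative only: a base R sketch of the derivation idea, not pepr's code.

# Sample table with source keys in the derived column (mirrors the table above).
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  protocol    = "RRBS",
  organism    = c("pig", "pig", "frog", "frog"),
  time        = c(0, 1, 0, 1),
  file_path   = "source1",
  stringsAsFactors = FALSE
)

# Templates keyed by source name (mirrors sample_modifiers.derive.sources).
sources <- list(
  source1 = "$HOME/data/lab/project/{organism}_{time}h.fastq",
  source2 = "/path/from/collaborator/weirdNamingScheme_{external_id}.fastq"
)

# Hypothetical helper: fill every {attribute} placeholder in `template`
# with the matching value from a one-row data frame `row`.
fill_template <- function(template, row) {
  placeholders <- regmatches(template, gregexpr("\\{[^{}]+\\}", template))[[1]]
  for (ph in unique(placeholders)) {
    attr_name <- substr(ph, 2, nchar(ph) - 1)
    if (!attr_name %in% names(row)) {
      stop("sample has no attribute named '", attr_name, "'")
    }
    template <- gsub(ph, as.character(row[[attr_name]]), template, fixed = TRUE)
  }
  template
}

# Hypothetical helper: replace each source key found in `column` with the
# filled-in template; values that match no key are left as they are.
derive_column <- function(samples, column, sources) {
  for (i in seq_len(nrow(samples))) {
    key <- samples[[column]][i]
    if (key %in% names(sources)) {
      samples[[column]][i] <- fill_template(sources[[key]], samples[i, ])
    }
  }
  samples
}

derived <- derive_column(samples, "file_path", sources)
print(derived$file_path)
```

The sketch does not expand `$HOME`, so the printed paths still begin with it. Pointing a sample at `source2` would stop with an error here, because this table has no `external_id` column.
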
Load `pepr` and read in the project metadata by specifying the path to `project_config.yaml`:

    library(pepr)

    # Branch of the bundled example PEPs; "master" matches the path printed below.
    branch = "master"
    projectConfig = system.file(
      "extdata",
      paste0("example_peps-", branch),
      "example_derive",
      "project_config.yaml",
      package = "pepr"
    )
    p = Project(projectConfig)
    #> Loading config file: /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpEYsaEm/temp_libpath43b174cbd72/pepr/extdata/example_peps-master/example_derive/project_config.yaml

And inspect it:

    sampleTable(p)
    #>    sample_name protocol organism time
    #> 1:      pig_0h     RRBS      pig    0
    #> 2:      pig_1h     RRBS      pig    1
    #> 3:     frog_0h     RRBS     frog    0
    #> 4:     frog_1h     RRBS     frog    1
    #>                                            file_path
    #> 1:  /Users/mstolarczyk/data/lab/project/pig_0h.fastq
    #> 2:  /Users/mstolarczyk/data/lab/project/pig_1h.fastq
    #> 3: /Users/mstolarczyk/data/lab/project/frog_0h.fastq
    #> 4: /Users/mstolarczyk/data/lab/project/frog_1h.fastq

As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy, annotations file.
What is more, the `p` object contains all the information from the project config file (`project_config.yaml`). Run the following line to explore it:

    config(p)
    #> Config object. Class: Config
    #>  pep_version: 2.0.0
    #>  sample_table:
    #>     /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpEYsaEm/temp_libpath43b174cbd72/pepr/extdata/example_peps-master/example_derive/sample_table.csv
    #>  output_dir: /Users/mstolarczyk/hello_looper_results
    #>  sample_modifiers:
    #>     derive:
    #>         attributes: file_path
    #>         sources:
    #>             source1:
    #>                 /Users/mstolarczyk/data/lab/project/{organism}_{time}h.fastq
    #>             source2:
    #>                 /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
    #>  name: example_derive

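
In the printed config, the `$HOME` written in `project_config.yaml` appears as an absolute path, while the `{organism}` and `{time}` placeholders stay in place until the sample table is built. A minimal base R sketch of that kind of environment-variable expansion follows. It is not the code `pepr` uses internally; the function name `expand_env_vars` and the variable `PEP_DEMO_HOME` are made up for the example.

```r
# Illustrative only: a base R sketch of environment-variable expansion,
# not the code pepr uses internally.

# Hypothetical helper: replace each $NAME or ${NAME} in `path` with the value
# of that environment variable. References to unset variables are left as is.
expand_env_vars <- function(path) {
  pattern <- "\\$\\{[A-Za-z_][A-Za-z0-9_]*\\}|\\$[A-Za-z_][A-Za-z0-9_]*"
  m <- gregexpr(pattern, path)
  refs <- regmatches(path, m)[[1]]
  values <- vapply(refs, function(ref) {
    value <- Sys.getenv(gsub("[${}]", "", ref), unset = NA)
    if (is.na(value)) ref else value
  }, character(1), USE.NAMES = FALSE)
  regmatches(path, m) <- list(values)
  path
}

# PEP_DEMO_HOME is a made-up variable, set here so the example is reproducible.
Sys.setenv(PEP_DEMO_HOME = "/Users/demo")
print(expand_env_vars("$PEP_DEMO_HOME/hello_looper_results"))
print(expand_env_vars("$PEP_DEMO_HOME/data/lab/project/{organism}_{time}h.fastq"))
```

Curly-brace placeholders such as `{organism}` pass through unchanged, because the pattern only matches references that start with `$`.
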