The PEP that this example is based on is available in the example_peps repository, in the example_subprojects1 folder.
The example below demonstrates how and why to use the subprojects functionality to define numerous similar projects in a single project config file. This functionality is convenient when you need to define projects that differ only in a few settings, such as attribute values in the sample annotation sheet: for example, library ABCD or EFGH instead of the original RRBS.
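Conceptually, activating a subproject overlays that subproject's section on top of the main configuration, so only the settings that differ need to be written down. The sketch below illustrates the idea with plain dictionaries. The overlay function is a hypothetical helper written for this illustration; it is not part of peppy, whose actual implementation may differ.

```python
import copy


def overlay(base, overrides):
    """Return a copy of `base` with `overrides` merged in recursively.

    Hypothetical helper for illustration only; not a peppy function.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = overlay(merged[key], value)
        else:
            # Otherwise the subproject's value replaces the main one.
            merged[key] = copy.deepcopy(value)
    return merged


# Main settings and one subproject section, taken from this example's config.
main_config = {
    "metadata": {
        "sample_annotation": "sample_annotation.csv",
        "output_dir": "$HOME/hello_looper_results",
    }
}
new_lib = {"metadata": {"sample_annotation": "sample_annotation_newLib.csv"}}

effective = overlay(main_config, new_lib)
print(effective["metadata"]["sample_annotation"])
print(effective["metadata"]["output_dir"])
```

Only the annotation sheet is swapped; the output directory is inherited from the main section, and the main configuration itself is left unmodified.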
Import peppy:
import peppy
Read in the project metadata by specifying the path to project_config.yaml:
p_subproj = peppy.Project("../examples/example_peps-master/example_subprojects1/project_config.yaml")
No local config file was provided
Found global config file in DIVCFG: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Loading divvy config file: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Use 'compute_packages' instead of 'compute'
Available packages: set(['singularity_local', 'default', 'largemem', 'singularity_slurm', 'sigterm', 'local', 'parallel'])
Activating compute package 'default'
Let's inspect the sample annotation sheet:
p_subproj.sheet
| | sample_name | library | organism | time | file_path |
|---|---|---|---|---|---|
| 0 | pig_0h | RRBS | pig | 0 | source1 |
| 1 | pig_1h | RRBS | pig | 1 | source1 |
| 2 | frog_0h | RRBS | frog | 0 | source1 |
| 3 | frog_1h | RRBS | frog | 1 | source1 |
To see whether any subprojects are defined in project_config.yaml, run the following command:
p_subproj.subprojects
{'newLib2': {'metadata': {'sample_annotation': 'sample_annotation_newLib2.csv'}}, 'newLib': {'metadata': {'sample_annotation': 'sample_annotation_newLib.csv'}}}
As you can see, there are two subprojects available: newLib and newLib2. Nonetheless, only the main one is "active".
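Because the subprojects value printed above is a nested mapping, the annotation sheet that each subproject points to can be listed with ordinary dictionary access. The snippet below is a sketch that copies the printed value into a literal dict so that it runs without peppy; with a loaded project one would presumably iterate over p_subproj.subprojects instead.

```python
# Literal copy of the mapping printed by p_subproj.subprojects above.
subprojects = {
    "newLib2": {"metadata": {"sample_annotation": "sample_annotation_newLib2.csv"}},
    "newLib": {"metadata": {"sample_annotation": "sample_annotation_newLib.csv"}},
}

# Map each subproject name to the annotation sheet it substitutes.
annotation_by_subproject = {
    name: settings["metadata"]["sample_annotation"]
    for name, settings in sorted(subprojects.items())
}

for name, sheet in annotation_by_subproject.items():
    print("{}: {}".format(name, sheet))
```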
Each subproject can be activated with the following command:
sp = p_subproj.activate_subproject("newLib")
sp2 = p_subproj.activate_subproject("newLib2")
No local config file was provided
Found global config file in DIVCFG: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Loading divvy config file: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Use 'compute_packages' instead of 'compute'
Available packages: set(['singularity_local', 'default', 'largemem', 'singularity_slurm', 'sigterm', 'local', 'parallel'])
Activating compute package 'default'
No local config file was provided
Found global config file in DIVCFG: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Loading divvy config file: /Users/mstolarczyk/Uczelnia/UVA/code/pepenv/uva_rivanna.yaml
Use 'compute_packages' instead of 'compute'
Available packages: set(['singularity_local', 'default', 'largemem', 'singularity_slurm', 'sigterm', 'local', 'parallel'])
Activating compute package 'default'
Let's inspect the sample annotation sheet when the newLib2 subproject is active:
sp2.sheet
| | sample_name | library | organism | time | file_path |
|---|---|---|---|---|---|
| 0 | pig_0h | EFGH | pig | 0 | source1 |
| 1 | pig_1h | EFGH | pig | 1 | source1 |
| 2 | frog_0h | EFGH | frog | 0 | source1 |
| 3 | frog_1h | EFGH | frog | 1 | source1 |
The library attribute of every sample has changed from RRBS to EFGH. This behavior is specified in project_config.yaml: the newLib2 subproject points to a different annotation file, sample_annotation_newLib2.csv, in which the library attribute is changed.
with open("../examples/example_peps-master/example_subprojects1/project_config.yaml") as f:
    print(f.read())
metadata:
  sample_annotation: sample_annotation.csv
  output_dir: $HOME/hello_looper_results
derived_attributes: [file_path]
data_sources:
  source1: /data/lab/project/{organism}_{time}h.fastq
  source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
subprojects:
  newLib:
    metadata:
      sample_annotation: sample_annotation_newLib.csv
  newLib2:
    metadata:
      sample_annotation: sample_annotation_newLib2.csv
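The derived_attributes and data_sources sections explain why every sample lists source1 under file_path: that value is a key into data_sources, and the matching template is filled in with the sample's own attributes. Below is a minimal sketch of that substitution using str.format. The derive_path function is a hypothetical helper for illustration, not peppy's API, and peppy's own resolution may handle more cases.

```python
# Templates copied from the data_sources section of the config above.
data_sources = {
    "source1": "/data/lab/project/{organism}_{time}h.fastq",
    "source2": "/path/from/collaborator/weirdNamingScheme_{external_id}.fastq",
}


def derive_path(sample, attribute="file_path"):
    """Resolve a derived attribute (hypothetical helper, not a peppy function)."""
    # The attribute value names a data source; look up its template.
    template = data_sources[sample[attribute]]
    # Fill the template placeholders with the sample's attributes.
    return template.format(**sample)


# First row of the annotation sheet, as a plain dict.
sample = {
    "sample_name": "pig_0h",
    "library": "RRBS",
    "organism": "pig",
    "time": "0",
    "file_path": "source1",
}

resolved = derive_path(sample)
print(resolved)  # /data/lab/project/pig_0h.fastq
```

Since a subproject in this example changes only the annotation sheet, the same templates apply whichever subproject is active.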
with open("../examples/example_peps-master/example_subprojects1/sample_annotation_newLib2.csv") as f:
    print(f.read())
sample_name,library,organism,time,file_path
pig_0h,EFGH,pig,0,source1
pig_1h,EFGH,pig,1,source1
frog_0h,EFGH,frog,0,source1
frog_1h,EFGH,frog,1,source1
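To confirm that the subproject's sheet differs from the main one only in the library column, the two tables can be compared column by column. The sketch below uses the standard csv module on in-memory text (Python 3). The main sheet's CSV text is an assumption reconstructed from the table displayed earlier, not read from the repository, and changed_columns is a hypothetical helper.

```python
import csv
import io

# Assumed content of sample_annotation.csv, reconstructed from the sheet shown earlier.
base_csv = """sample_name,library,organism,time,file_path
pig_0h,RRBS,pig,0,source1
pig_1h,RRBS,pig,1,source1
frog_0h,RRBS,frog,0,source1
frog_1h,RRBS,frog,1,source1
"""

# Content of sample_annotation_newLib2.csv as printed above.
newlib2_csv = """sample_name,library,organism,time,file_path
pig_0h,EFGH,pig,0,source1
pig_1h,EFGH,pig,1,source1
frog_0h,EFGH,frog,0,source1
frog_1h,EFGH,frog,1,source1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def changed_columns(old_rows, new_rows):
    """Return the sorted names of columns whose value differs in any row."""
    changed = set()
    for old, new in zip(old_rows, new_rows):
        changed.update(col for col in old if old[col] != new.get(col))
    return sorted(changed)


diff = changed_columns(read_rows(base_csv), read_rows(newlib2_csv))
print(diff)  # ['library']
```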