How does PEPATAC handle technical or biological replicates?

Currently, PEPATAC intentionally does not incorporate replicate information, because there is no universally accepted way to handle replicates; the right approach depends on the biology of the particular samples. Instead, we recommend a two-stage approach. First, run each replicate through the pipeline individually and evaluate each one separately for quality control. Then, either merge the replicate scores at the peak level, or merge the raw FASTQ files for the replicates you wish to keep and re-run the pipeline on the merged sample. For an example of this, see the Supplemental Methods section "Constructing a counts matrix and normalization" in Corces et al., The chromatin accessibility landscape of primary human cancers. Science. 2018;362.
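
If you choose to merge raw FASTQ files, a minimal sketch of one common way to do it is shown here. The merge_fastq helper and all file names are hypothetical and are not part of PEPATAC. The sketch assumes the inputs are either all gzipped or all uncompressed; it relies on the fact that concatenated gzip files form a valid gzip stream.

```shell
# Sketch: merge replicate FASTQ files by simple concatenation.
# merge_fastq is a hypothetical helper, not a PEPATAC command.
# usage: merge_fastq OUTPUT INPUT1 INPUT2 [...]
merge_fastq() {
    out="$1"
    shift
    # Concatenate all remaining arguments, in order, into the output file.
    cat "$@" > "$out"
}

# Example with hypothetical file names:
#   merge_fastq merged_R1.fastq.gz rep1_R1.fastq.gz rep2_R1.fastq.gz
#   merge_fastq merged_R2.fastq.gz rep1_R2.fastq.gz rep2_R2.fastq.gz
```

For paired-end data, merge the read 1 files and the read 2 files separately, listing the replicates in the same order both times so that read pairs stay in sync.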

When deciding whether to merge technical replicates, first follow the basic QC procedures you would perform on any sample (see the FAQ question below). In addition, you can use a cross-replicate comparison to confirm that the replicates agree with one another. There are several ways to do this. For example, calculate the correlation of ATAC-seq log2(CPM*) values between each pair of replicates.

*CPM = (counts + a scaled prior count, as computed by edgeR) per million mapped reads (see Corces et al. (2018), Supplemental Methods).
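
As a rough illustration of the cross-replicate comparison, the sketch below computes the Pearson correlation of log2(CPM) values for two replicates using awk. It is not edgeR and is not part of PEPATAC: the log2cpm_cor helper, the input layout (a headerless, tab-separated file with columns peak ID, replicate 1 count, replicate 2 count), and the prior count of 1 are all assumptions. The prior is scaled in proportion to each library's size, which is our approximation of how edgeR applies it; for real analyses, use edgeR itself.

```shell
# Sketch: Pearson correlation of log2(CPM) between two replicates.
# log2cpm_cor is a hypothetical helper; the prior-count scaling only
# approximates edgeR and is an assumption of this example.
# usage: log2cpm_cor COUNTS.tsv   (columns: peak_id, rep1_count, rep2_count)
log2cpm_cor() {
    awk -F'\t' -v prior=1 '
        # Store counts and accumulate each library total.
        { a[NR] = $2; b[NR] = $3; ta += $2; tb += $3 }
        END {
            if (NR < 2 || ta == 0 || tb == 0) { print "NA"; exit }
            # Scale the prior count by library size relative to the mean.
            mean_lib = (ta + tb) / 2
            pa = prior * ta / mean_lib
            pb = prior * tb / mean_lib
            for (i = 1; i <= NR; i++) {
                # log2 of prior-adjusted counts per million.
                x = log((a[i] + pa) / (ta + 2 * pa) * 1e6) / log(2)
                y = log((b[i] + pb) / (tb + 2 * pb) * 1e6) / log(2)
                sx += x; sy += y; sxx += x * x; syy += y * y; sxy += x * y
            }
            vx = NR * sxx - sx * sx
            vy = NR * syy - sy * sy
            # Correlation is undefined if either replicate has no variance.
            if (vx <= 0 || vy <= 0) { print "NA"; exit }
            printf "%.4f\n", (NR * sxy - sx * sy) / sqrt(vx * vy)
        }
    ' "$1"
}

# Example with a hypothetical file name:
#   log2cpm_cor peak_counts.tsv
```

Values close to 1 suggest the replicates correspond well; what counts as acceptable depends on your samples.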

How do I know if my samples or replicates are high quality?

What is the $GENOME variable?

The $GENOME environment variable holds the path to the directory where you have stored refgenie-compatible genome builds. For example, if you have placed the hg38 genome build in a genomes/ directory in your HOME directory, you would execute the following command:

export GENOME="$HOME/genomes"