Run multiple samples through looper
This guide walks you through customizing configuration files for your own project. The pipeline can be run directly from the command line for a single sample (see Install and run). If you need to run it on many samples, you could write your own sample handling code, but we have pre-configured everything to work nicely with looper, our sample handling engine. This section explains how to use looper to run the pipeline across all the samples in a project.
1: Install looper
This pipeline is pre-configured to work with looper. Looper is a pipeline submission engine that makes it easy to deploy any pipeline across samples. It lets you run the jobs locally, in containers, using any cluster resource manager, or in containers on a cluster.
If you've already gone through the installation process for PEPATAC, you will already have looper installed. Otherwise, install it with pip:
pip install --user https://github.com/pepkit/looper/zipball/master
2: Configure project files
To configure your project to use looper, you must describe it in a project format called PEP format. There are multiple examples you can adapt in the examples/ folder. The details of how to construct a PEP are the same for every pipeline that reads PEPs, including PEPATAC, so follow the detailed instructions on how to create a PEP. We have included an example test PEP to get you started. In short, you need two files for your project:
- project config file -- describes output locations, pointers to data, etc.
- sample annotation file -- comma-separated value (CSV) list of your samples.
The sample annotation file must specify these columns:
- library ('ATAC' or 'ATACSEQ' or 'ATAC-seq')
- organism (may be 'human' or 'mouse')
- any additional columns you want
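Before handing a project to looper, it can help to check the sample annotation file against the column requirements listed above. The following is a minimal sketch, not part of PEPATAC or looper: the function name check_annotation is hypothetical, and the allowed values are copied from the list above, so the organism check may be stricter than the pipeline itself if other organisms are supported.

```python
import csv

# Values taken from the column requirements listed in this guide.
ALLOWED_LIBRARY = {"ATAC", "ATACSEQ", "ATAC-seq"}
ALLOWED_ORGANISM = {"human", "mouse"}  # assumption: treated as exhaustive here


def check_annotation(lines):
    """Return a list of problems found in a sample annotation CSV.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    problems = [
        f"missing column: {name}"
        for name in ("library", "organism")
        if name not in columns
    ]
    if problems:
        return problems
    for row in reader:
        # reader.line_num is the number of the input line just read.
        if row["library"] not in ALLOWED_LIBRARY:
            problems.append(
                f"line {reader.line_num}: unexpected library {row['library']!r}"
            )
        if row["organism"] not in ALLOWED_ORGANISM:
            problems.append(
                f"line {reader.line_num}: unexpected organism {row['organism']!r}"
            )
    return problems
```

To check a file on disk, open it with open(path, newline="") and pass the handle to check_annotation. An empty list means the required columns are present and every row uses one of the listed values.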
Then, run your project by passing your project config file to looper: looper run project_config.yaml.
3: Run the pipeline through looper
Start by running the example project in the examples/test_project/ folder. Let's use the -d argument to do a dry run, which will create job scripts for every sample in the project, but will not execute them:
looper run -d examples/test_project/test_config.yaml
If the looper executable is not in your $PATH, add the directory where pip installed it to your $PATH. With pip install --user on Linux, this is typically ~/.local/bin.
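If you are unsure whether looper is discoverable, a short script can tell you where it was found, or whether it sits in the per-user install directory without being on your $PATH. This is only a sketch under some assumptions: the helper name find_looper is hypothetical, and the per-user directory is derived from the Python interpreter running the script, which may differ from the one pip used.

```python
import os
import shutil
import site


def find_looper(extra_dirs=()):
    """Return the path to the looper executable, or None if it is not found.

    Searches the directories on PATH, then any directories in `extra_dirs`.
    """
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which("looper", path=os.pathsep.join(dirs))


if __name__ == "__main__":
    # On POSIX systems, `pip install --user` places scripts in <user base>/bin.
    user_bin = os.path.join(site.getuserbase(), "bin")
    found = find_looper()
    if found:
        print(f"looper found at {found}")
    elif find_looper([user_bin]):
        print(f"looper is installed in {user_bin}, which is not on your PATH")
    else:
        print("looper was not found; it may not be installed")
```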
If the dry run worked, let's actually run the example by taking out the -d flag:
looper run examples/test_project/test_config.yaml
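If you drive looper from a script, the dry-run-then-run sequence above can be wrapped in a few lines. This is a sketch rather than an official interface: the function names are hypothetical, only the looper run command and the -d flag shown above are used, and it assumes looper exits with a non-zero status when something goes wrong.

```python
import subprocess


def looper_run_command(config_path, dry_run=False):
    """Build the argument list for `looper run`, optionally as a dry run."""
    command = ["looper", "run"]
    if dry_run:
        command.append("-d")  # the dry-run flag used earlier in this guide
    command.append(config_path)
    return command


def run_project(config_path):
    """Do a dry run first; submit for real only if the dry run exits cleanly."""
    dry = subprocess.run(looper_run_command(config_path, dry_run=True))
    if dry.returncode != 0:
        return dry.returncode
    return subprocess.run(looper_run_command(config_path)).returncode
```

Calling run_project("examples/test_project/test_config.yaml") would then run the same two commands in order and return the exit status of the last one executed.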
There are lots of other cool things you can do with looper, like summarizing results, checking on pipeline run status, cleaning intermediate files to save disk space, lumping multiple samples into one job, and more. For details, consult the looper docs.