What kind of pipelines can looper run?
Looper can run samples through any pipeline that runs on the command line. The flexible pipeline interface file allows looper to execute arbitrary shell commands. A pipeline may consist of scripts in languages like Perl, Python, or bash, or it may be built with a particular framework. Typically, we use Python pipelines built using the pypiper package, which provides some additional power to looper, but that's optional.
Why isn't the looper executable available on PATH?
With a user-level install (pip install --user), Python package executables are placed in ~/.local/bin, which may not be on your PATH. You can add that location to your path by appending it (export PATH=$PATH:~/.local/bin).
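
A minimal sketch of the same fix written so it can be run repeatedly without adding duplicate entries, assuming a POSIX-compatible shell. To make it persist, you could place it in your shell startup file (for example ~/.bashrc).

```shell
# Append ~/.local/bin to PATH only if it is not already listed
case ":$PATH:" in
  *":$HOME/.local/bin:"*) ;;                      # already present, nothing to do
  *) export PATH="$PATH:$HOME/.local/bin" ;;
esac

# Report where the shell now finds looper, if anywhere
command -v looper || echo "looper not found on PATH"
```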
Looper uses the external package divvy for cluster computing, making it flexible enough to use with any cluster resource environment. Please see the tutorial on cluster computing with looper and divvy.
What's the difference between looper and pypiper?
pypiper is a more traditional workflow-building framework; it helps you build pipelines to process individual samples. looper is completely pipeline-agnostic and has nothing to do with individual processing steps; it operates on groups of samples (as in a project), submitting the appropriate pipeline(s) to a cluster or server (or running them locally). The two projects are independent and can be used separately, but they are most powerful when combined, together constituting a comprehensive pipeline management system.
Why isn't a sample being processed by a pipeline (Not submitting, flag found: ['*_<status>.flag'])?
When using the run subcommand, for each sample being processed looper first checks the sample's designated output folder for "flag" files (which can be _completed.flag, _running.flag, or _failed.flag). Typically, we don't want to resubmit a job that's already running or already finished, so by default, looper will not submit a job when it finds a flag file. This is what the message above is indicating.
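
To see which samples would be skipped for this reason, you can search the output folder for flag files yourself. The sketch below is not part of looper; list_flags is a hypothetical helper, and the folder you pass it is whatever directory your project writes sample results to.

```shell
# Hypothetical helper: print every flag file found below the given folder
list_flags() {
  find "$1" -type f \( -name '*_completed.flag' \
                    -o -name '*_running.flag' \
                    -o -name '*_failed.flag' \)
}

# Example call (the path is a placeholder):
# list_flags path/to/output_dir
```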
If you do in fact want to re-run a sample (maybe you've updated the pipeline, or you want to restart a failed attempt), you can do so by passing the --ignore-flags option to looper at startup; this will skip the flag check for all samples. If you only want to re-run or restart a few samples, it's best to just delete the flag files for the samples you want to restart, then use looper run as normal.
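
Deleting flags for a handful of samples can be scripted. This is a minimal sketch, not a looper feature: clear_flags is a hypothetical helper that removes only the three flag types named above from a single sample's output folder and leaves every other file alone. The example paths and the config file name are placeholders.

```shell
# Hypothetical helper: remove looper flag files from one sample's output folder
clear_flags() {
  for status in completed running failed; do
    for flag in "$1"/*_"$status".flag; do
      # An unmatched glob stays literal, so confirm the file exists first
      if [ -e "$flag" ]; then
        rm -- "$flag"
        echo "removed $flag"
      fi
    done
  done
}

# Example (placeholder paths): clear two samples, then submit as usual
# clear_flags path/to/output_dir/sampleA
# clear_flags path/to/output_dir/sampleB
# looper run project_config.yaml
```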
You may be interested in the usage docs for the looper rerun command, which runs any failed samples.
As of version 0.11, you can use looper rerun to submit only jobs with a failed flag. By default, looper will not submit a job that has already run. If you want to restart a sample (maybe you've updated the pipeline, or you want to restart a failed attempt), you can either use looper rerun to restart only failed jobs, or pass --ignore-flags, which will resubmit all samples. If you want more specificity, you can manually delete the "flag" files for the samples you want to restart, then use looper run as normal.
Why are computing resources defined in the pipeline interface file instead of in the divvy computing configuration file?
You may notice that the compute config file does not specify resources to request (like memory, CPUs, or time). Yet, these are required in order to submit a job to a cluster. Resources are not handled by the divcfg file because they are not specific to a particular computing environment; instead they vary by pipeline and sample. As such, these items should be defined at other stages.
Resources are defined in the pipeline_interface.yaml file (pipelines section) that connects looper to a pipeline. The reason for this is that the pipeline developer is the most likely to know what sort of resources her pipeline requires, so she is in the best position to define the resources requested. For more information on how to adjust resources, see the pipelines section of the pipeline interface page. If all the different configuration files seem confusing, now is a good time to review who's who in configuration files. There's a list on the config files page.
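
As a rough illustration of where resources sit, the sketch below writes an example pipelines section to a scratch file and prints it. Everything in it is an assumption made for illustration: the pipeline key, the script path, the resource package name, and the individual keys and values under resources. Consult the pipeline interface page for the fields your looper version actually accepts.

```shell
# Write an illustrative pipelines section to a scratch file (no real file is touched)
sketch=$(mktemp)
cat > "$sketch" <<'EOF'
pipelines:
  my_pipeline:                    # hypothetical pipeline key
    name: my_pipeline
    path: path/to/my_pipeline.py  # placeholder script path
    resources:
      default:                    # assumed: one named resource package
        file_size: "0"
        cores: "2"
        mem: "4000"
        time: "0-02:00:00"
EOF

cat "$sketch"
```

Keeping these values next to the pipeline definition means the request travels with the pipeline, while the divvy config only has to describe how jobs are submitted in a given environment.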