Your final pypiper pipeline will be a python script that a pipeline user will invoke on the command-line. You will likely need to allow the user to change some parameters on the command line, and to take full advantage of Pypiper (make your pipeline recoverable, etc.), you wil need to add command-line options to your pipeline that change pypiper's settings as well. Pypiper uses the typical Python argparse module to define command-line arguments to your pipeline, and offers a series of built-in functions to help you populate your pipeline's
ArgumentParser with pypiper-specific options.
You can use an ArgumentParser as usual, adding whatever arguments you like. Then, you add Pypiper args to your parser with the function
add_pypiper_args(), and pass command-line options and arguments to your
PipelineManager, like this:
import pypiper, os, argparse parser = ArgumentParser(description='Write a short description here') # add any custom args here # e.g. parser.add_argument('--foo', help='foo help') # once you've established all your custom arguments, we can add the default # pypiper arguments to your parser like this: parser = pypiper.add_pypiper_args(parser) # Then, pass the args parsed along to the PipelineManger args = parser.parse_args() pipeline = pypiper.PipelineManager(name="my_pipeline", outfolder="out", \ args=args)
Once you've added pypiper arguments, your pipeline will then enable a few built-in arguments:
--dirty, for example. As a side bonus, all arguments (including any of your custom arguments) will be recorded in the log outputs.
That's the basics. But you can customize things for more efficiency using a simple set of pre-built args and groups of args in pypiper:
With that said, there are a few universal (Pypiper-added) options that are frequently (but not necessarily always) honored by pypiper pipelines. These default pypiper arguments are detailed below:
Recover mode, overwrite locks. This argument will tell pypiper to recover from a failed previous run. Pypiper will execute commands until it encounters a locked file, at which point it will re-execute the failed command and continue from there.
Force run follow-functions. By default, follow-functions are only run if their corresponding
run command was run; with this option you can force all follow functions to run. This is useful for regenerating QC data on existing output. For more details, see :ref:
the follow argument <the_follow_argument>.
Make all cleanups manual. By default, pypiper pipelines will delete any intermediate files. For debugging, you may want to turn this option off -- you can do that by specifying dirty mode.
New start mode. This flag will tell pypiper to start over, and run every command, even if its target output already exists.
There are two ways to modulate the arguments added by
add_pypiper_args() function: the
groups argument, which lets you add argument groups; or the
args argument, which lets you add arguments indvidually. By default,
add_pypiper_args() add all arguments listed in the
pypiper group. You may instead pass a list of one or more of these groups of arguments (to
groups) or individual arguments (to
args) to customize exactly the set of built-in options your pipeline implements.
parser.add_pypiper_args(parser, groups=['pypiper', 'common']) will add all arguments listed under
Individual arguments that are understood and used by pypiper:
-R, --recover: for a failed pipeline run, start off at the last successful step.
-N, --new-start: Just recreate everything, even if it exists.
-D, --dirty: Disables automatic cleaning of temporary files, so all intermediate files will still exist after a pipeline run (either sucessful or failed). Useful for debugging a pipeline even if it succeeds.
-F, --follow: Runs all
follow-functions, regardless of whether the accompanying command is run.
-C, --config: Pypiper pipeline config yaml file.
Individual arguments just provided for convenience and standardization:
-S, --sample-name: name of the sample
-I, --input: primary input file (e.g. read1)
-I2, --input2: secondary input file (e.g. read2)
-O, --output-parent: parent folder for pipeline results (the pipeline will use this as the parent directory for a folder named
-P, --cores: Number of cores to use
-M, --mem: Amount of memory in megabytes
-G, --genome: Reference genome assembly (e.g.
-Q, --simple-or-paired: For sequencing data, is input single-end or paired-end?
If you're using the built-in arguments, you may want to module which are required and which are not. That way, you can piggyback on how
ArgumentParser handles required arguments very nicely -- if the user does not specify a required argument, the pipeline will automatically prompt with usage instructions.
By default, built-in arguments are not flagged as required, but you can pass a list of required built-ins to the
required parameter, like
add_pypiper_args(parser, args=["sample-name"], required=["sample-name"]).
import pypiper, os, argparse parser = ArgumentParser(description='Write a short description here') # add just arguments from group `pypiper` parser = pypiper.add_pypiper_args(parser, groups=["pypiper"]) # add just arguments from group `common` parser = pypiper.add_pypiper_args(parser, groups=["common"]) # add arguments from two groups parser = pypiper.add_pypiper_args(parser, groups=["common", "resources"], required=["sample-name", "output-parent"]) # add individual argument parser = pypiper.add_pypiper_args(parser, args=["genome"]) # add some groups and some individual arguments parser = pypiper.add_pypiper_args(parser, args=["genome"], groups=["looper", "ngs"])