Package pararead Documentation

Class ParaReadProcessor

Base class for parallel processing of sequencing reads.

Implement __call__ to define the work for each chunk of reads (e.g., a chromosome). Unaligned reads are permitted, but the work then cannot rely on any sort of biologically meaningful chunking of the reads unless a partition() function is implemented. If unaligned reads are used and no partition() is implemented, reads will be arbitrarily split into chunks.
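The per-chunk pattern can be illustrated with a small standalone analogue. This is a toy sketch, not the pararead API: the function names and the (chromosome, read) tuple representation are invented for illustration.

```python
from collections import defaultdict

def process_by_chunk(reads, work):
    """Toy analogue of the ParaReadProcessor pattern: group aligned
    reads by chromosome, then apply the per-chunk work function."""
    chunks = defaultdict(list)
    for chrom, read in reads:
        chunks[chrom].append(read)
    # One result per chunk, keyed by the chunk identifier (chromosome).
    return {chrom: work(rs) for chrom, rs in chunks.items()}

reads = [("chr1", "ACGT"), ("chr2", "GGCC"), ("chr1", "TTAA")]
counts = process_by_chunk(reads, work=len)
print(counts)  # {'chr1': 2, 'chr2': 1}
```

In the real class, the equivalent of `work` is the subclass's __call__, and the grouping comes from the alignment index (or from partition() for unaligned input).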


def chunk_reads(*args, **kwargs):


def combine(self, good_chromosomes, strict=False, chrom_sep=None):

    Aggregate output from independent read chunks into a single output file.

    Parameters
    ----------
    good_chromosomes : Iterable of str
        Identifier (e.g., chromosome) for each chunk of reads processed.
    strict : bool
        Whether to throw an exception upon encountering a missing file.
        If False, simply log a warning and continue the aggregation with
        what is available.
    chrom_sep : str
        Delimiter between output from each chromosome.

    Returns
    -------
    Iterable of str
        Path to each file successfully combined.

    Raises
    ------
    MissingOutputFileException
        If executing in strict mode and there's a reads chunk key for which
        the derived filepath does not exist.
    IllegalChunkException
        If a chunk of reads outside of those declared to be of interest is
        requested to participate in the combination.
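The aggregation logic can be sketched in isolation. This is a toy re-creation, not the pararead implementation: the per-chunk file naming scheme (`<key>.txt`) and the builtin exceptions standing in for MissingOutputFileException are assumptions for illustration.

```python
import os
import tempfile

def combine_chunks(outdir, chunk_keys, combined_path, sep="\n", strict=False):
    """Toy sketch of chunk-output aggregation: concatenate one file per
    chunk key, failing (strict) or skipping (non-strict) missing files."""
    combined = []
    for key in chunk_keys:
        path = os.path.join(outdir, f"{key}.txt")  # hypothetical naming scheme
        if not os.path.isfile(path):
            if strict:
                raise FileNotFoundError(path)  # stands in for MissingOutputFileException
            continue  # non-strict: warn-and-skip behavior
        with open(path) as fh:
            combined.append(fh.read())
    with open(combined_path, "w") as out:
        out.write(sep.join(combined))
    return combined_path

outdir = tempfile.mkdtemp()
for chrom, text in [("chr1", "aln1"), ("chr2", "aln2")]:
    with open(os.path.join(outdir, f"{chrom}.txt"), "w") as fh:
        fh.write(text)

# chrM produced no output file; non-strict mode skips it.
result = combine_chunks(outdir, ["chr1", "chr2", "chrM"],
                        os.path.join(outdir, "all.txt"))
print(open(result).read())  # aln1\naln2
```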


def fetch_chunk(self, chromosome):

    Pull a chunk of sequencing reads from a file.

    Parameters
    ----------
    chromosome : str
        Identifier for the chunk of reads to select.

    Returns
    -------
    Iterable of pysam.AlignedSegment


def fetch_file(self, file_key):

    Retrieve one of the files registered with pararead.

    Parameters
    ----------
    file_key : str
        Which file to fetch.

    Returns
    -------
    object, likely pysam.AlignmentFile
        File ADT instance associated with the requested key.

    Raises
    ------
    CommandOrderException
        If the indicated file hasn't been registered.


def files:

    Refer to the pararead files mapping.

    Returns
    -------
    dict[str, object]
        Pararead files mapping.


def get_chrom_size(self, chrom):

    Determine the size of the given chromosome.

    Parameters
    ----------
    chrom : str
        Name of the chromosome of interest.

    Returns
    -------
    int
        Size of the chromosome of interest.

    Raises
    ------
    CommandOrderException
        If there's no chromosome sizes map yet.
    UnknownChromosomeException
        If the requested chromosome is not in the sizes map.
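The two failure modes can be sketched with a plain dictionary lookup. This is a toy version using builtin exceptions in place of CommandOrderException and UnknownChromosomeException; the real method consults a sizes map built beforehand, which is where the ordering requirement comes from.

```python
def get_chrom_size(sizes, chrom):
    """Toy sketch of the sizes-map lookup and its two failure modes."""
    if sizes is None:
        # ~ CommandOrderException: the sizes map hasn't been built yet.
        raise RuntimeError("no chromosome sizes map yet")
    try:
        return sizes[chrom]
    except KeyError:
        # ~ UnknownChromosomeException: chromosome absent from the map.
        raise KeyError(f"unknown chromosome: {chrom}")

sizes = {"chr1": 248_956_422, "chrM": 16_569}
print(get_chrom_size(sizes, "chrM"))  # 16569
```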


def readsfile:

    Returns
    -------
    pysam.AlignmentFile | pysam.VariantFile
        Instance of the reads file abstraction appropriate for the given
        type of input data (e.g., BAM or VCF).

    Raises
    ------
    CommandOrderException
        If a command prerequisite for a parallel reads processor operation
        has not yet been performed.


def register_files(self, **file_builder_kwargs):

    Add to the module map any large/unpicklable variables required by __call__.

    Parameters
    ----------
    **file_builder_kwargs
        Arbitrary keyword arguments for the pysam file constructor.

    Warnings
    --------
    A subclass overriding this method should be sure to register the file
    passed to the constructor, or call this method from the overriding
    implementation.

    Raises
    ------
    FileTypeException
        If the given path to the reads file doesn't appear to match one of
        the supported file types.


def run(self, chunksize=None, interleave_chunk_sizes=False):

    Do the defined processing, partitioned across each unit (chromosome).

    Parameters
    ----------
    chunksize : int, optional
        Number of reads per processing chunk; if unspecified, the default
        heuristic picks a size such that each core gets roughly 4 chunks.
    interleave_chunk_sizes : bool, default False
        Whether to interleave reads chunk sizes. If off (the default), just
        use the distribution that Python determines.

    Returns
    -------
    collections.Iterable of str
        Names of chromosomes for which the result is non-null.

    Raises
    ------
    MissingHeaderException
        If attempting to run with an unaligned reads file in the context of
        an aligned file requirement.
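The stated default heuristic ("each core gets ~4 chunks") can be sketched as follows. The exact formula is an assumption inferred from that description, not taken from the pararead source.

```python
import math

def default_chunksize(n_items, cores, chunks_per_core=4):
    """Sketch of the heuristic: choose a chunk size so that the items
    split into roughly cores * chunks_per_core chunks (formula assumed)."""
    return max(1, math.ceil(n_items / (cores * chunks_per_core)))

# 8 cores * ~4 chunks each = ~32 chunks, so ~32 reads per chunk.
print(default_chunksize(1000, cores=8))  # 32
```

Many small chunks per core helps load balancing: a core that finishes an easy chunk early can pick up another, instead of idling while one core grinds through a single oversized chunk.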


def setup_logger(stream=None, logfile=None, make_root=True, propagate=False, silent=False, devmode=False, level='INFO', verbosity=None, fmt=None, datefmt=None):

    Establish the package-level logger.

    This is intended to be called just once per "session," with a "session"
    defined as an invocation of the main workflow, a testing session, or an
    import of the primary abstractions, e.g. in an interactive IPython
    session.

    Parameters
    ----------
    stream : str or None, optional
        Standard stream to use as the log destination. The default behavior
        is to write logs to stdout, even if null is passed here. This allows
        a CLI argument to feed the stream parameter directly, where it may
        be undesirable to require the client application to specify a
        default value just to prevent passing None when no CLI option value
        is given. To disable standard-stream logging, set 'silent' to True
        or pass a path to a file to which to write logs; a file path takes
        priority over a standard stream as the destination for log messages.
    logfile : str or FileIO[str], optional
        Path to the filesystem location to use as the logs destination. If
        provided, this mutes logging to a standard output stream.
    make_root : bool, default True
        Whether to use the returned logger as the root logger. This means
        that its name will be 'root' and that messages will not propagate.
    propagate : bool, default False
        Whether to allow messages from this logger to reach parent
        logger(s).
    silent : bool
        Whether to silence logging. This is only guaranteed for messages
        from this logger and from loggers beneath it in the runtime
        hierarchy that have no separate handling. If this is not the root
        logger, propagation must also be turned off separately to ensure
        that messages are not handled and emitted by a parent of the logger
        built here.
    devmode : bool, default False
        Whether to log in development mode. Possibly among other behavioral
        changes to log handling, use a more information-rich message format
        template.
    level : int or str
        Minimum severity threshold for a logging message to be handled.
    verbosity : int
        Alternate way of expressing the logging level that better accords
        with intuition: it is positively associated with message volume,
        whereas logging level is negatively associated. This takes
        precedence over 'level' if both are provided.
    fmt : str
        Message format/template.
    datefmt : str
        Format/template for the time component of a log record.

    Returns
    -------
    logging.Logger
        Configured logger instance.
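The verbosity-vs-level inversion can be sketched with the standard library. The specific mapping values below are an assumption for illustration, not taken from pararead; the point is only the direction: higher verbosity lowers the severity threshold.

```python
import logging

# Assumed mapping: index = verbosity, value = logging level threshold.
_LEVEL_BY_VERBOSITY = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]

def resolve_level(level="INFO", verbosity=None):
    """Sketch: 'verbosity' takes precedence over 'level' when both given."""
    if verbosity is not None:
        verbosity = max(0, min(verbosity, len(_LEVEL_BY_VERBOSITY) - 1))
        return _LEVEL_BY_VERBOSITY[verbosity]
    # logging.getLevelName maps "INFO" -> 20, etc., for string input.
    return logging.getLevelName(level) if isinstance(level, str) else level

logger = logging.getLogger("pararead_demo")
logger.setLevel(resolve_level(verbosity=3))
print(logger.level == logging.DEBUG)  # True
```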


def add_logging_options(parser):

    Augment a CLI argument parser with this package's logging options.

    Parameters
    ----------
    parser : argparse.ArgumentParser
        CLI options and argument parser to augment with logging options.

    Returns
    -------
    argparse.ArgumentParser
        The input parser, supplemented with this package's logging options.
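The augmentation pattern looks roughly like this toy re-creation. The option names (`--logfile`, `--verbosity`, `--silent`) are hypothetical here, chosen to mirror the setup_logger parameters; they are not necessarily the flags pararead registers.

```python
import argparse

def add_logging_options(parser):
    """Toy sketch: attach logging-related options so client applications
    need no knowledge of the logger configuration internals."""
    parser.add_argument("--logfile", default=None,
                        help="Write logs to this file instead of a stream.")
    parser.add_argument("--verbosity", type=int, default=None,
                        help="Message volume; higher means more output.")
    parser.add_argument("--silent", action="store_true",
                        help="Mute logging output.")
    return parser

parser = add_logging_options(argparse.ArgumentParser())
opts = parser.parse_args(["--verbosity", "2"])
print(opts.verbosity, opts.silent)  # 2 False
```

Returning the parser (rather than mutating silently) lets callers chain the augmentation into their own parser construction.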


def logger_via_cli(opts, **kwargs):

    Convenience function for creating a logger from parsed CLI options.

    This module provides the ability to augment a CLI parser with
    logging-related options/arguments so that client applications do not
    need intimate knowledge of the implementation. This function completes
    that relief of burden by parsing values for the options supplied herein.

    Parameters
    ----------
    opts : argparse.Namespace
        Command-line options/arguments parsed from the command line.
    **kwargs : dict
        Additional keyword arguments for the logger configuration function.

    Returns
    -------
    logging.Logger
        Configured logger instance.

    Raises
    ------
    AbsentOptionException
        If one of the expected options isn't available in the given
        Namespace. Such a case suggests that a client application didn't
        use this module to add the expected logging options to a parser.