Package pararead Documentation

Class ParaReadProcessor

Base class for parallel processing of sequencing reads.

Implement __call__ to define the work for each chunk of reads (e.g., a chromosome). Unaligned reads are permitted, but then the work cannot rely on any biologically meaningful chunking of the reads unless a partition() function is implemented. If unaligned reads are used and no partition() is implemented, reads are split into arbitrary chunks.
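As an illustration of the chunking behavior described above, here is a stdlib sketch using plain tuples in place of pysam records; the function names are illustrative, not pararead's API. Aligned reads can be grouped by chromosome, while unaligned reads fall back to fixed-size arbitrary chunks.

```python
from itertools import groupby, islice

def chunk_by_chromosome(reads):
    # Aligned case: group (chromosome, position) tuples by chromosome.
    # Assumes reads are sorted by chromosome, as in a coordinate-sorted BAM.
    return {chrom: list(group) for chrom, group in groupby(reads, key=lambda r: r[0])}

def chunk_arbitrarily(reads, size):
    # Unaligned case: no biologically meaningful partition exists,
    # so just split the stream into fixed-size pieces.
    it = iter(reads)
    while True:
        piece = list(islice(it, size))
        if not piece:
            return
        yield piece

aligned = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
by_chrom = chunk_by_chromosome(aligned)
# {'chr1': [('chr1', 100), ('chr1', 250)], 'chr2': [('chr2', 40)]}
pieces = list(chunk_arbitrarily(range(5), 2))
# [[0, 1], [2, 3], [4]]
```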

chunk_reads

def chunk_reads(*args, **kwargs):

combine

Aggregate output from independent read chunks into single output file.

Parameters

good_chromosomes : Iterable of str
    Identifier (e.g., chromosome) for each chunk of reads processed.
strict : bool
    Whether to throw an exception upon encountering a missing file. If not, simply log a warning message and continue the aggregation, working with what is available.
chrom_sep : str
    Delimiter between output from each chromosome.

Returns

Iterable of str
    Path to each file successfully combined.

Raises

MissingOutputFileException
    If executing in strict mode and there's a reads chunk key for which the derived filepath does not exist.
IllegalChunkException
    If a chunk of reads outside of those declared to be of interest is requested to participate in the combination.

def combine(self, good_chromosomes, strict=False, chrom_sep=None):
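The combination step can be sketched with the stdlib alone. The per-chunk file layout (one text file per chromosome) and helper names below are assumptions for illustration, not pararead internals:

```python
import os

class MissingOutputFileException(Exception):
    # Stand-in for pararead's exception of the same name.
    pass

def combine_chunks(outdir, good_chromosomes, strict=False, chrom_sep="\n"):
    # Assumes each chunk's result was written to <outdir>/<chrom>.txt.
    parts, combined_paths = [], []
    for chrom in good_chromosomes:
        path = os.path.join(outdir, chrom + ".txt")
        if not os.path.exists(path):
            if strict:
                raise MissingOutputFileException(path)
            print("WARNING: missing output for {}".format(chrom))
            continue
        with open(path) as f:
            parts.append(f.read())
        combined_paths.append(path)
    # Join the available chunks with the requested delimiter.
    with open(os.path.join(outdir, "combined.txt"), "w") as f:
        f.write(chrom_sep.join(parts))
    return combined_paths
```

In non-strict mode a missing chunk only shrinks the returned list; in strict mode the first missing chunk aborts the aggregation.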

fetch_chunk

Pull a chunk of sequencing reads from a file.

Parameters

chromosome : str
    Identifier for chunk of reads to select.

Returns

Iterable of pysam.AlignedSegment

def fetch_chunk(self, chromosome):

fetch_file

Retrieve one of the files registered with pararead.

Parameters

file_key : str
    Which file to fetch.

Returns

object, likely pysam.AlignmentFile
    File ADT instance associated with the requested key.

Raises

CommandOrderException
    If the indicated file hasn't been registered.

def fetch_file(self, file_key):
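The registration/fetch ordering contract can be sketched as a simple module-level registry; the names here are illustrative stand-ins, not pararead's implementation:

```python
class CommandOrderException(Exception):
    # Stand-in for pararead's exception: an operation was requested
    # before its prerequisite (here, file registration) had run.
    pass

_FILES = {}  # module-level mapping, as register_files would populate

def register(file_key, file_obj):
    _FILES[file_key] = file_obj

def fetch_file(file_key):
    try:
        return _FILES[file_key]
    except KeyError:
        raise CommandOrderException(
            "No file registered under {!r}; register files first".format(file_key))
```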

files

Refer to the pararead files mapping.

Returns

dict[str, object]
    Pararead files mapping.

@property
def files(self):

get_chrom_size

Determine the size of the given chromosome.

Parameters

chrom : str
    Name of chromosome of interest.

Returns

int
    Size of chromosome of interest.

Raises

CommandOrderException
    If there's no chromosome sizes map yet.
UnknownChromosomeException
    If the requested chromosome is not in the sizes map.

def get_chrom_size(self, chrom):
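A minimal sketch of the lookup behavior and its two failure modes, with stand-in exception classes (the class and attribute names below are illustrative, not pararead's):

```python
class CommandOrderException(Exception):
    pass

class UnknownChromosomeException(Exception):
    pass

class SizesLookup:
    def __init__(self):
        self.chrom_sizes = None  # populated later, e.g., from an alignment header

    def set_sizes(self, sizes):
        self.chrom_sizes = dict(sizes)

    def get_chrom_size(self, chrom):
        if self.chrom_sizes is None:
            # Sizes map not built yet: an ordering error, not a bad query.
            raise CommandOrderException("No chromosome sizes map yet")
        try:
            return self.chrom_sizes[chrom]
        except KeyError:
            raise UnknownChromosomeException(chrom)
```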

readsfile

Returns

pysam.AlignmentFile | pysam.VariantFile
    Instance of the reads file abstraction appropriate for the given type of input data (e.g., BAM or VCF).

Raises

CommandOrderException
    If a command prerequisite for a parallel reads processor operation has not yet been performed.

@property
def readsfile(self):

register_files

Add to the module map any large/unpicklable variables required by __call__.

Parameters

**file_builder_kwargs
    Arbitrary keyword arguments for the pysam file constructor.

Warnings

A subclass overriding this method should be sure to register the file passed to the constructor, or call this method from the overriding implementation.

Raises

FileTypeException
    If the given path to the reads file doesn't appear to match one of the supported file types.

def register_files(self, **file_builder_kwargs):
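A likely motivation for a module-level map is that open file handles (such as pysam's) are not picklable, so they cannot be shipped to worker processes as ordinary arguments; registering them at module level before workers start sidesteps pickling. A stdlib sketch of that constraint and pattern (all names here are illustrative):

```python
import pickle
import tempfile

PARA_READ_FILES = {}  # module-level registry, populated before workers start

def register_files(**file_builder_kwargs):
    # Illustrative: open each given path and stash the handle at module level.
    for key, path in file_builder_kwargs.items():
        PARA_READ_FILES[key] = open(path)

tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
tmp.write("data")
tmp.close()
register_files(reads=tmp.name)

# The handle itself cannot be pickled, which is why it lives in a
# module map instead of being sent to worker processes as an argument.
try:
    pickle.dumps(PARA_READ_FILES["reads"])
    picklable = True
except TypeError:
    picklable = False
```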

run

Run the defined processing, partitioned across units (e.g., chromosomes).

Parameters

chunksize : int, optional
    Number of reads per processing chunk; if unspecified, the default heuristic is a size such that each core gets roughly four chunks.
interleave_chunk_sizes : bool, default False
    Whether to interleave reads chunk sizes. If off (default), just use the distribution that Python determines.

Returns

collections.Iterable of str
    Names of chromosomes for which the result is non-null.

Raises

MissingHeaderException
    If attempting to run with an unaligned reads file in the context of an aligned file requirement.

def run(self, chunksize=None, interleave_chunk_sizes=False):
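The default chunksize heuristic ("each core gets roughly four chunks") might look like the following sketch; the exact formula pararead uses may differ:

```python
import multiprocessing

def default_chunksize(n_reads, cores=None, chunks_per_core=4):
    # Aim for roughly chunks_per_core chunks per core, so the pool can
    # balance load without paying too much per-chunk overhead.
    cores = cores or multiprocessing.cpu_count()
    return max(1, n_reads // (cores * chunks_per_core))

default_chunksize(1_000_000, cores=8)  # -> 31250
```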

setup_logger

Establish the package-level logger.

This is intended to be called just once per "session", with a "session" defined as an invocation of the main workflow, a testing session, or an import of the primary abstractions, e.g. in an interactive IPython session.

Parameters


stream : str or None, optional
    Standard stream to use as log destination. The default behavior is to write logs to stdout, even if null is passed here. This allows a CLI argument to feed the stream parameter, where it may be undesirable to require the client application to specify a default value just to prevent passing None when no CLI option value is given. To disable standard stream logging, set 'silent' to True or pass a path to a file to which to write logs; a file path gets priority over a standard stream as the destination for log messages.
logfile : str or FileIO[str], optional
    Path to filesystem location to use as logs destination. If provided, this mutes logging to a standard output stream.
make_root : bool, default True
    Whether to use the returned logger as the root logger. This means that the name will be 'root' and that messages will not propagate.
propagate : bool, default False
    Whether to allow messages from this logger to reach parent logger(s).
silent : bool
    Whether to silence logging. This is only guaranteed for messages from this logger and from loggers beneath it in the runtime hierarchy that lack separate handling. If this is not the root logger, propagation must also be turned off separately to ensure that messages are not handled and emitted by a potential parent of the logger built here.
devmode : bool, default False
    Whether to log in development mode. Possibly among other behavioral changes to log handling, this uses a more information-rich message format template.
level : int or str
    Minimum severity threshold for a logging message to be handled.
verbosity : int
    Alternate way to express the logging level that accords better with intuition: it is positively associated with message volume, whereas logging level is negatively associated. This takes precedence over 'level' if both are provided.
fmt : str
    Message format/template.
datefmt : str
    Format/template for the time component of a log record.

Returns

logging.Logger
    Configured logger instance.

def setup_logger(stream=None, logfile=None, make_root=True, propagate=False, silent=False, devmode=False, level='INFO', verbosity=None, fmt=None, datefmt=None):
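A condensed stdlib sketch of the destination-selection logic described above. This is not pararead's implementation; it only mirrors the documented precedence (silent wins, then a logfile mutes standard streams, then the default falls back to stdout even when stream is None):

```python
import logging
import sys

def setup_logger_sketch(stream=None, logfile=None, silent=False, level="INFO"):
    logger = logging.getLogger("pararead_demo")  # illustrative logger name
    logger.handlers = []  # start fresh on each call
    logger.setLevel(level)
    if silent:
        logger.addHandler(logging.NullHandler())
    elif logfile:
        # A file destination mutes standard-stream logging.
        logger.addHandler(logging.FileHandler(logfile))
    else:
        # Default to stdout even when stream is None, matching the
        # described CLI-friendly behavior.
        logger.addHandler(logging.StreamHandler(stream or sys.stdout))
    return logger
```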

add_logging_options

Augment a CLI argument parser with this package's logging options.

Parameters

parser : argparse.ArgumentParser
    CLI options and argument parser to augment with logging options.

Returns

argparse.ArgumentParser
    The input argument, supplemented with this package's logging options.

def add_logging_options(parser):
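An argparse sketch of the augmentation pattern; the flag names below are hypothetical, since pararead defines its own option set:

```python
import argparse

def add_logging_options_sketch(parser):
    # Flag names here are illustrative; pararead defines its own set.
    parser.add_argument("--verbosity", type=int, default=None)
    parser.add_argument("--logfile", default=None)
    parser.add_argument("--silent", action="store_true")
    return parser  # same parser object, now carrying logging options

parser = add_logging_options_sketch(argparse.ArgumentParser())
opts = parser.parse_args(["--verbosity", "2", "--silent"])
```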

logger_via_cli

Convenience function creating a logger.

This module provides the ability to augment a CLI parser with logging-related options/arguments so that client applications do not need intimate knowledge of the implementation. This function completes that process, parsing values for the options supplied therein.

Parameters

opts : argparse.Namespace
    Command-line options/arguments parsed from the command line.
**kwargs : dict
    Additional keyword arguments to the logger configuration function.

Returns

logging.Logger
    Configured logger instance.

Raises

AbsentOptionException
    If one of the expected options isn't available in the given Namespace. Such a case suggests that a client application didn't use this module to add the expected logging options to a parser.

def logger_via_cli(opts, **kwargs):
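A sketch tying the pieces together: read the parsed options and fail loudly when an expected one is absent. The option names and the exception class here are illustrative stand-ins for whatever pararead actually expects:

```python
import argparse
import logging

class AbsentOptionException(Exception):
    # Stand-in: signals that the parser was not augmented with the
    # expected logging options before parsing.
    pass

def logger_via_cli_sketch(opts, **kwargs):
    for name in ("verbosity", "logfile", "silent"):
        if not hasattr(opts, name):
            raise AbsentOptionException(name)
    logger = logging.getLogger("pararead_cli_demo")
    # Higher verbosity means more messages, so it lowers the threshold.
    logger.setLevel(logging.DEBUG if (opts.verbosity or 0) > 1 else logging.INFO)
    return logger

opts = argparse.Namespace(verbosity=2, logfile=None, silent=False)
log = logger_via_cli_sketch(opts)
```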