Base class for parallel processing of sequencing reads.
Implement __call__ to define the work for each chunk of reads (e.g., a chromosome). Unaligned reads are permitted, but the work then cannot rely on any biologically meaningful chunking of the reads unless a partition() function is implemented. If unaligned reads are used and no partition() is implemented, the reads are split into chunks arbitrarily.
def chunk_reads(*args, **kwargs):
Aggregate output from independent read chunks into single output file.
good_chromosomes : Iterable of str
    Identifier (e.g., chromosome) for each chunk of reads processed.
strict : bool
    Whether to throw an exception upon encountering a missing file. If not, simply log a warning message and continue the aggregation process that's underway, working with what is available.
chrom_sep : str
    Delimiter between output from each chromosome.
Returns
Iterable of str Path to each file successfully combined. Raises
MissingOutputFileException
    If executing in strict mode, and there's a reads chunk key for which the derived filepath does not exist.
IllegalChunkException
    If a chunk of reads outside of those declared to be of interest is requested to participate in the combination.
def combine(self, good_chromosomes, strict=False, chrom_sep=None):
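The strict/lenient aggregation described above can be illustrated with a minimal standard-library sketch. This is not pararead's implementation: combine_chunk_outputs, its paths_by_chunk argument, the demo helper, and the local MissingOutputFileException class are hypothetical stand-ins, and the IllegalChunkException check is omitted for brevity.

```python
import logging
import os
import tempfile

_LOGGER = logging.getLogger(__name__)


class MissingOutputFileException(Exception):
    """Local stand-in for the exception named in the documentation."""


def combine_chunk_outputs(paths_by_chunk, outfile, strict=False, chrom_sep=None):
    """Concatenate per-chunk files into outfile; return the paths combined.

    paths_by_chunk is a hypothetical mapping from chunk key to output path.
    """
    combined = []
    with open(outfile, "w") as out:
        for chunk_key, path in paths_by_chunk.items():
            if not os.path.isfile(path):
                if strict:
                    raise MissingOutputFileException(
                        "No output for chunk {}: {}".format(chunk_key, path))
                # Lenient mode: warn and keep going with what is available.
                _LOGGER.warning("Skipping missing output for chunk %s", chunk_key)
                continue
            # Write the delimiter only between chunks, never before the first.
            if combined and chrom_sep is not None:
                out.write(chrom_sep)
            with open(path) as chunk_file:
                out.write(chunk_file.read())
            combined.append(path)
    return combined


def demo():
    """Combine two existing chunk files while a third is missing."""
    workdir = tempfile.mkdtemp()
    paths = {}
    for name in ("chr1", "chr2"):
        paths[name] = os.path.join(workdir, name + ".txt")
        with open(paths[name], "w") as handle:
            handle.write(name + "\n")
    paths["chr3"] = os.path.join(workdir, "chr3.txt")  # never written
    outfile = os.path.join(workdir, "combined.txt")
    combined = combine_chunk_outputs(paths, outfile, chrom_sep="--\n")
    with open(outfile) as handle:
        return combined, handle.read(), paths
```

In lenient mode the missing chr3 file is skipped with a warning; with strict=True the same call raises instead.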
Pull a chunk of sequencing reads from a file.
chromosome : str Identifier for chunk of reads to select. Returns
Iterable of pysam.AlignedSegment
def fetch_chunk(self, chromosome):
Retrieve one of the files registered with pararead.
file_key : str Which file to fetch. Returns
object, likely pysam.AlignmentFile File ADT instance associated with the requested key. Raises
CommandOrderException If the indicated file hasn't been registered.
def fetch_file(self, file_key):
Refer to the pararead files mapping.
dict[str, object] Pararead files mapping.
Determine the size of the given chromosome.
chrom : str Name of chromosome of interest. Returns
int Size of chromosome of interest. Raises
CommandOrderException
    If there's no chromosome sizes map yet.
UnknownChromosomeException
    If requested chromosome is not in the sizes map.
def get_chrom_size(self, chrom):
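The two failure modes documented for this lookup can be sketched as follows. This is an illustration only: the ChromSizes class and its load method are hypothetical, and the exception classes are local stand-ins that merely share names with the ones documented above.

```python
class CommandOrderException(Exception):
    """Stand-in: a prerequisite step has not been performed yet."""


class UnknownChromosomeException(Exception):
    """Stand-in: the requested chromosome is not in the sizes map."""


class ChromSizes:
    """Hypothetical holder for a chromosome-name-to-size mapping."""

    def __init__(self):
        self._sizes = None  # no sizes map until load() is called

    def load(self, sizes):
        self._sizes = dict(sizes)

    def get_chrom_size(self, chrom):
        if self._sizes is None:
            raise CommandOrderException("Chromosome sizes have not been loaded")
        try:
            return self._sizes[chrom]
        except KeyError:
            raise UnknownChromosomeException(chrom)
```

Separating the two exceptions lets a caller distinguish "called too early" from "asked for a chromosome that does not exist".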
Returns
pysam.AlignmentFile | pysam.VariantFile Instance of the reads file abstraction appropriate for the given type of input data (e.g., BAM or VCF).
CommandOrderException If a command prerequisite for a parallel reads processor operation has not yet been performed.
Add to module map any large/unpicklable variables required by __call__.
**file_builder_kwargs Arbitrary keyword arguments for the pysam file constructor. Warnings
A subclass overriding this method should be sure to register the file passed to the constructor, or call this method from the overriding implementation. Raises
FileTypeException If the path to the reads file given doesn't appear to match one of the supported file types.
def register_files(self, **file_builder_kwargs):
Run the defined processing, partitioned across each unit (e.g., chromosome).
chunksize : int, optional
    Number of reads per processing chunk; if unspecified, a default heuristic sizes the chunks so that each core gets roughly 4 chunks.
interleave_chunk_sizes : bool, default False
    Whether to interleave reads chunk sizes. If off (default), just use the distribution that Python determines.
Returns
collections.Iterable of str Names of chromosomes for which result is non-null. Raises
MissingHeaderException If attempting to run with an unaligned reads file in the context of an aligned file requirement.
def run(self, chunksize=None, interleave_chunk_sizes=False):
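The "roughly 4 chunks per core" heuristic mentioned for chunksize can be sketched with the same arithmetic that CPython's multiprocessing.Pool.map applies when no chunksize is given. Whether pararead computes its default exactly this way is an assumption; default_chunksize and chunked are hypothetical helper names.

```python
def default_chunksize(num_items, num_workers, chunks_per_worker=4):
    """Pick a chunk size so each worker receives about chunks_per_worker chunks."""
    if num_items == 0:
        return 0
    size, extra = divmod(num_items, num_workers * chunks_per_worker)
    # Round up so that no items are left over after the last chunk.
    return size + 1 if extra else size


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For 100 items and 4 workers, divmod(100, 16) gives (6, 4), so the chunk size rounds up to 7 and the work splits into 15 chunks.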
Establish the package-level logger.
This is intended to be called just once per "session", with a "session" defined as an invocation of the main workflow, a testing session, or an import of the primary abstractions, e.g. in an interactive IPython session. Parameters
stream : str or None, optional
    Standard stream to use as log destination. The default behavior is to write logs to stdout, even if None is passed here. This allows a CLI argument to be used as input to the stream parameter, where it may be undesirable to require a default value in the client application just to avoid passing None when no CLI option value is given. To disable standard stream logging, set 'silent' to True or pass a path to a file to which to write logs, which gets priority over a standard stream as the destination for log messages.
logfile : str or FileIO[str], optional
    Path to filesystem location to use as logs destination. If provided, this mutes logging to a standard output stream.
make_root : bool, default True
    Whether to use returned logger as root logger. This means that the name will be 'root' and that messages will not propagate.
propagate : bool, default False
    Whether to allow messages from this logger to reach parent logger(s).
silent : bool
    Whether to silence logging. This is only guaranteed for messages from this logger and for those from loggers beneath this one in the runtime hierarchy with no separate handling. Propagation must also be turned off separately, if this is not the root logger, in order to ensure that messages are not handled and emitted from a potential parent to the logger built here.
devmode : bool, default False
    Whether to log in development mode. Possibly among other behavioral changes to logs handling, use a more information-rich message format template.
level : int or str
    Minimum severity threshold of a logging message to be handled.
verbosity : int
    Alternate way to express the logging level that better accords with intuition: it is positively associated with message volume, whereas logging level is negatively associated. This takes precedence over 'level' if both are provided.
fmt : str
    Message format/template.
datefmt : str
    Format/template for time component of a log record.
Returns
logging.Logger Configured logger instance.
def setup_logger(stream=None, logfile=None, make_root=True, propagate=False, silent=False, devmode=False, level='INFO', verbosity=None, fmt=None, datefmt=None):
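The relationship between 'level' and 'verbosity' described above can be sketched with the standard logging module. The specific mapping from verbosity integers to levels is an assumption made for illustration, not the package's documented table; resolve_level and LEVELS_BY_VERBOSITY are hypothetical names.

```python
import logging

# Assumed mapping: higher verbosity means more messages, so a lower threshold.
LEVELS_BY_VERBOSITY = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def resolve_level(level="INFO", verbosity=None):
    """Return a numeric logging level; verbosity wins if both are given."""
    if verbosity is not None:
        # Clamp out-of-range verbosity values to the nearest valid index.
        index = max(0, min(int(verbosity), len(LEVELS_BY_VERBOSITY) - 1))
        return LEVELS_BY_VERBOSITY[index]
    if isinstance(level, str):
        # Standard level names are module attributes, e.g. logging.INFO.
        return getattr(logging, level.upper())
    return int(level)
```

Under this assumed mapping, resolve_level("INFO", verbosity=3) yields logging.DEBUG, because verbosity takes precedence over level.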
Augment a CLI argument parser with this package's logging options.
parser : argparse.ArgumentParser CLI options and argument parser to augment with logging options. Returns
argparse.ArgumentParser The input argument, supplemented with this package's logging options.
Convenience function creating a logger.
This module provides the ability to augment a CLI parser with logging-related options/arguments so that client applications do not need intimate knowledge of the implementation. This function completes that convenience by parsing the values for the options supplied herein. Parameters
opts : argparse.Namespace
    Command-line options/arguments parsed from command line.
**kwargs : dict
    Additional keyword arguments to the logger configuration function.
Returns
logging.Logger Configured logger instance. Raises
AbsentOptionException If one of the expected options isn't available in the given Namespace. Such a case suggests that a client application didn't use this module to add the expected logging options to a parser.
def logger_via_cli(opts, **kwargs):
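The augment-then-parse pattern described above can be sketched with argparse alone. The option names (--logfile, --verbosity), the add_logging_options and logging_kwargs_from_opts helpers, and the local AbsentOptionException class are hypothetical; the package's actual option set may differ.

```python
import argparse


class AbsentOptionException(Exception):
    """Stand-in: an expected logging option is missing from the Namespace."""


# Hypothetical names of the options this sketch adds to a parser.
LOGGING_OPTIONS = ("logfile", "verbosity")


def add_logging_options(parser):
    """Add the sketch's logging options to a parser and return it."""
    parser.add_argument("--logfile", default=None)
    parser.add_argument("--verbosity", type=int, default=None)
    return parser


def logging_kwargs_from_opts(opts):
    """Collect logging option values, failing loudly if one is absent."""
    values = {}
    for name in LOGGING_OPTIONS:
        try:
            values[name] = getattr(opts, name)
        except AttributeError:
            # The client likely never called add_logging_options.
            raise AbsentOptionException(name)
    return values
```

A client would call add_logging_options on its own parser, parse the command line, and pass the resulting values on to the logger configuration function.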