The data is anything that can be represented as collections of genomic regions. Each hub can connect to one or more data providers, which can mix both public and private data sources, enabling users to integrate data from multiple sources and to place internal data in the context of public resources for analysis.
Segment and a Region?
A Segment is simply a Region that belongs to a Segmentation. In the back-end of the segmentation provider, it is assigned a SegmentID and can be used to normalize raw genome intervals into a Segmentation.
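To make the idea of normalizing raw intervals concrete, here is a minimal sketch. It assumes that normalization means mapping a raw interval to the IDs of the segments it overlaps, which the text above does not spell out. The `Region` and `Segment` classes, the `normalize` function, the ID strings, and the 0-based half-open coordinate convention are all hypothetical and are not taken from the episb code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)


@dataclass(frozen=True)
class Segment:
    segment_id: str  # hypothetical ID format
    region: Region


def normalize(interval, segments):
    """Return the IDs of all segments that overlap the raw interval."""
    return [
        s.segment_id
        for s in segments
        if s.region.chrom == interval.chrom
        and s.region.start < interval.end
        and interval.start < s.region.end
    ]


# A toy segmentation: three non-overlapping segments.
segmentation = [
    Segment("seg1", Region("chr1", 0, 100)),
    Segment("seg2", Region("chr1", 100, 250)),
    Segment("seg3", Region("chr2", 0, 500)),
]

# A raw interval that straddles the boundary between seg1 and seg2.
print(normalize(Region("chr1", 80, 120), segmentation))  # ['seg1', 'seg2']
```

A linear scan keeps the sketch short; a real provider would more likely use an indexed lookup.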
Segmentation?
A Segmentation is a SO::sequence_collection with an additional constraint: there are no overlapping Regions. We use this to refer to a division of the genome into parts, but the segments need not cover 100% of the genome (though they may).
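The no-overlap constraint can be checked mechanically. The sketch below is one possible check, not the provider's actual validation: the function name `is_segmentation` and the `(chrom, start, end)` tuple layout with half-open coordinates are assumptions made for illustration. Gaps between regions are allowed, matching the statement that segments need not cover the whole genome.

```python
from collections import defaultdict


def is_segmentation(regions):
    """Return True if no two regions on the same chromosome overlap.

    regions: iterable of (chrom, start, end) tuples, half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    for intervals in by_chrom.values():
        # After sorting by start, any overlap shows up between neighbours.
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start < prev_end:
                return False
    return True


# Adjacent regions and gaps are fine; overlapping regions are not.
print(is_segmentation([("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 50)]))  # True
print(is_segmentation([("chr1", 0, 100), ("chr1", 90, 200)]))                    # False
```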
An episb-provider server may or may not provide segmentations. If it does, it is called a segmentation provider. We can consider a segmentation provider to be a data provider that happens to provide data of type Segmentation (and Segment).
Why is the distinction important?
Because ideally, most data providers will not provide Segmentations but will simply re-use those provided by a segmentation provider. So, there will be many fewer segmentation providers than data providers, and they are likely to be more centralized and more heavily used. They need to be more inter-connectable and may provide different query optimization.