genomicsdb.org - GenomicsDB - Sparse Array Storage for Genomics

Description: A collaborative open source project for optimizing sparse array storage for genomics. Please donate or contribute code on our GitHub.

Example domain paragraphs

Technology

Using high-level APIs provided in C++, Java*, and Spark*, users can write and read variant records to and from GenomicsDB shared-nothing instances in parallel, with multiple processes running in a Single Program Multiple Data (SPMD) manner.
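As a rough illustration of the SPMD idea, the sketch below shows how every process can run the same program and select its own share of the work from its rank alone. This is not the GenomicsDB API: the names `spmd_slice` and `run_rank`, and the block-wise split, are assumptions made for this example.

```python
def spmd_slice(rank: int, world_size: int, num_items: int) -> range:
    """Return the contiguous block of item indices owned by `rank`.

    Hypothetical helper: items are split as evenly as possible, and the
    first `num_items % world_size` ranks each take one extra item.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(num_items, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def run_rank(rank: int, world_size: int, records: list) -> list:
    """The 'single program': each rank touches only the records it owns."""
    return [records[i] for i in spmd_slice(rank, world_size, len(records))]
```

Because each rank derives its slice independently, no coordination between processes is needed to divide the records, which is what makes the pattern a natural fit for shared-nothing instances.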

GenomicsDB uses columnar sparse arrays in which samples are mapped to rows and genome positions (the sites of variants) are mapped to columns. These columns are partitioned in a shared-nothing fashion across thousands of machines, enabling the joint genotyping workflow in the Broad Institute’s Genome Analysis Toolkit (GATK) to scale to 100,000 samples and beyond.
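The layout described above can be sketched in a few lines. This is a toy model under stated assumptions, not GenomicsDB's actual storage format or API: the class name `PartitionedSparseArray`, the dictionary-backed cells, and the column boundaries are all hypothetical. Rows are sample indices, columns are genome positions, only non-empty cells are stored, and each column range lives in its own independent partition.

```python
from bisect import bisect_right


class PartitionedSparseArray:
    """Toy sparse array: rows are samples, columns are genome positions."""

    def __init__(self, column_starts):
        # column_starts: first column of each partition, strictly increasing
        # and starting at 0 (hypothetical boundaries chosen by the caller).
        starts = list(column_starts)
        if not starts or starts[0] != 0 or starts != sorted(set(starts)):
            raise ValueError("column_starts must start at 0 and be strictly increasing")
        self.column_starts = starts
        # One independent store per partition; nothing is shared between them.
        self.partitions = [dict() for _ in starts]

    def partition_of(self, column: int) -> int:
        """Index of the partition whose column range contains `column`."""
        if column < 0:
            raise ValueError("column must be non-negative")
        return bisect_right(self.column_starts, column) - 1

    def write(self, sample_row: int, position_col: int, value) -> None:
        """Store one non-empty cell; empty cells take no space."""
        part = self.partitions[self.partition_of(position_col)]
        part[(sample_row, position_col)] = value

    def read_column(self, position_col: int) -> dict:
        """Return {sample_row: value} for every sample with data at this position."""
        part = self.partitions[self.partition_of(position_col)]
        return {row: v for (row, col), v in part.items() if col == position_col}
```

Reading one genome position touches a single partition, so a query for one site across all samples never has to contact the machines holding other column ranges.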
