Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.


Wiley LK, Sivley RM, Bush WS.

Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
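To make the nested containment list idea concrete, here is a minimal in-memory sketch in Python. It is not the MyNCList schema or code, which lives in MySQL and is not described in this abstract. The names `Node`, `build_nclist` and `query`, the closed-interval convention, and the sample coordinates are all assumptions made for illustration. The sketch relies on the general NCList property that, once contained intervals are moved into their container's sublist, every sublist is sorted by both start and end, so an overlap query can binary-search to its first candidate and scan forward.

```python
class Node:
    """One interval plus the sublist of intervals it fully contains."""

    def __init__(self, start, end, label):
        self.start, self.end, self.label = start, end, label
        self.children = []


def build_nclist(intervals):
    """Build nested sublists from (start, end, label) tuples (closed coordinates)."""
    # Sort by start ascending, then end descending, so a container
    # always precedes the intervals nested inside it.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top, stack = [], []
    for start, end, label in ordered:
        node = Node(start, end, label)
        # Drop ancestors that end before this interval ends: they cannot contain it.
        while stack and stack[-1].end < node.end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def _first_ending_at_or_after(nodes, pos):
    """Binary search for the first node whose end is >= pos."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end < pos:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query(nodes, q_start, q_end, hits=None):
    """Collect labels of all intervals overlapping the closed range [q_start, q_end]."""
    if hits is None:
        hits = []
    i = _first_ending_at_or_after(nodes, q_start)
    # Within a sublist starts are sorted, so stop at the first start past the query.
    while i < len(nodes) and nodes[i].start <= q_end:
        hits.append(nodes[i].label)
        query(nodes[i].children, q_start, q_end, hits)
        i += 1
    return hits


# Hypothetical annotations, for illustration only.
annotations = [
    (100, 500, "geneA"),
    (150, 200, "exon1"),
    (300, 400, "exon2"),
    (600, 900, "geneB"),
    (650, 700, "exonB1"),
]
nclist = build_nclist(annotations)
print(query(nclist, 180, 320))
```

A query only descends into the sublists of intervals that overlap it, so annotations nested inside non-overlapping regions are never examined.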

Posted in Publications, Will Bush.
Will Bush

William S. Bush, Ph.D., is a human geneticist and bioinformatician, and Assistant Professor within the Cleveland Institute for Computational Biology and the Department of Population and Quantitative Health Sciences at Case Western Reserve University.