Automated extraction of clinical traits of multiple sclerosis in electronic medical records.

The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time-consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course. We used four algorithms based on ICD-9 codes, text keywords, and medications […]

Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.

Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file […]

Genome simulation approaches for synthesizing in silico datasets for human genomics.

Ritchie MD, Bush WS. Simulated data is a necessary first step in the evaluation of new analytic methods because in simulated data the true effects are known. To successfully develop novel statistical and computational methods for genetic analysis, it is vital to simulate datasets consisting of single nucleotide polymorphisms (SNPs) spread throughout the genome at a density similar to […]

Visualizing SNP statistics in the context of linkage disequilibrium using LD-Plus.

Bush WS, Dudek SM, Ritchie MD. Often in human genetic analysis, multiple tables of single nucleotide polymorphism (SNP) statistics are shown alongside a Haploview-style correlation plot. Readers are then asked to make inferences that incorporate knowledge across these multiple sets of results. To better facilitate a collective understanding of all available data, we developed a Ruby-based web application, LD-Plus, to […]

Parallel multifactor dimensionality reduction: a tool for the large-scale analysis of gene-gene interactions.

Parallel multifactor dimensionality reduction is a tool for large-scale analysis of gene-gene and gene-environment interactions. The MDR algorithm was redesigned to allow an unlimited number of study subjects, total variables and variable states, and to remove restrictions on the order of interactions being analyzed. In addition, the algorithm is markedly more efficient, with approximately 150-fold […]