PSB 2020 Tutorial

Contact Us: wsb36[at]case.edu

Introduction:
The majority of accepted papers in biocomputing describe new computational approaches to relevant biological problems. While journals and conferences often require that software and source code be made available, trainees have few resources to help them maximize the distribution and use of their software within the scientific community. The accepted standard is to make source code available for new approaches in published work, but system configuration problems, language and library version conflicts, and other implementation issues often impede the broad distribution and availability of software tools. There are a variety of solutions to these implementation issues, but the learning curve for applying them is steep. In this tutorial for the Pacific Symposium on Biocomputing, we will demonstrate tools and approaches for packaging and distributing published code.

What you will learn:
Our session co-chairs will provide hands-on experience creating the following: simple Docker containers that encapsulate code along with the libraries and software environments necessary to support it (Dr. Brett Beaulieu-Jones); R and Python packages that contain validated source code and are distributed through widely available package repositories (Dr. Nicholas Wheeler); and Jupyter/Colab notebooks, which meld analysis workflows with data visualization (Dr. Christian Darabos). In this highly interactive and informal session, participants are invited to bring laptops and actively work through the packaging processes step by step. We hope that many PSB attendees with accepted papers will attend the session and package their own software for distribution on the PSB website. All participants will be provided with example code for use in the tutorial. For each packaging solution, we will provide a brief overview and demonstration (approximately 15-20 minutes) followed by 20-25 minutes of supervised hands-on activity.


Workshop Organizers:

William S. Bush, Ph.D.

William S. Bush, Ph.D. is an Associate Professor in the Department of Population and Quantitative Health Sciences, and Assistant Director for Computational Methods in the Cleveland Institute for Computational Biology at Case Western Reserve University. Dr. Bush received his Ph.D. in Human Genetics from Vanderbilt University in 2008 and then continued as a post-doctoral fellow in the Neurogenomics Training Program at Vanderbilt. Dr. Bush was recently named a Mt. Sinai Health Care Foundation Scholar. As a human geneticist and bioinformatician, Dr. Bush’s research interests include understanding the functional impact of genetic variation, developing statistical and bioinformatics approaches for integrating functional genomics knowledge into genetic analysis, and using electronic medical records for translational research. Dr. Bush has attended PSB annually since 2010.

Brett Beaulieu-Jones, Ph.D.

Brett Beaulieu-Jones, Ph.D. is a Post-doctoral Research Fellow in Biomedical Informatics in the Kohane lab at Harvard University. He received his Ph.D. from the Perelman School of Medicine at the University of Pennsylvania under the supervision of Dr. Jason Moore and Dr. Casey Greene. Dr. Beaulieu-Jones’ doctoral research focused on using machine learning-based methods to more precisely define phenotypes from large-scale biomedical data repositories, e.g. those contained in clinical records. He is currently performing large-scale data integration (genomic, therapeutic, imaging) both to better understand disease etiology and to provide precise therapeutic recommendations. Initially, he is working to develop targeted models of drug selection for patients with refractory epilepsy and to further develop machine learning methods that model the way patients progress over time using longitudinal data. Dr. Beaulieu-Jones has attended PSB since 2016.

Christian Darabos, Ph.D.

Christian Darabos, Ph.D. is a Bioinformatics Applications Specialist in Research Computing at Dartmouth College. He graduated with a double Ph.D. degree under the supervision of Prof. Marco Tomassini at the Information Systems Department of the Faculty of Business and Economics (HEC) of the University of Lausanne, Switzerland, in collaboration with the Computational Biology Unit of the Molecular Biotechnology Center of the University of Torino in Italy, under the supervision of Prof. Ferdinando Di Cunto. At Dartmouth, Dr. Darabos conducts a series of workshops and tutorials on Reproducible Research. He has attended PSB since 2011.

Nicholas Wheeler, Ph.D.

Nicholas Wheeler, Ph.D. is a Research Associate in the Institute for Computational Biology and the Bush lab at Case Western Reserve University. Dr. Wheeler is a macromolecular scientist and engineer by training with extensive expertise in the use of “big data” technologies for large-scale data aggregation and analysis. In the Bush lab, Dr. Wheeler manages genomic datasets and their associated metadata within a Spark/Hadoop cluster, with extensions to the open-source HAIL platform for genomic analysis, which ensures standardization and reproducibility of experimental analyses. Over the course of his career, Dr. Wheeler has created, validated, and submitted multiple R and Python packages to public repositories. 2019 was Dr. Wheeler’s first PSB meeting, and he will attend PSB 2020.