Symposium Blog

A New Call for Manuscripts! Celebrating the Breadth of Diversity in Human Genomics

Dr. Lucia Hindorff of the National Human Genome Research Institute (NHGRI) and I are excited to announce a new call for manuscripts that highlight diversity in genetics and genomics. This call for manuscripts, known as “Celebrating the Breadth of Diversity in Human Genomics,” is organized as a special Research Topic with the journal Frontiers in Genetics. The scope of the call is broad, spanning basic and clinical human genetics and genomics research such as studies of specific outcomes, studies that include diverse populations, and methodology developed for population-specific or multi-population studies. The call also includes genetics and genomics policy statements and ethical, legal, and social implications (ELSI) research. We are interested in a variety of formats such as original research, case studies, and reviews. For more information regarding the call for manuscripts, as well as the submission and peer-review processes, please visit the Research Topic at Frontiers in Genetics.

NIH All of Us Launches

Photo credit: Janina Jeff, PhD, at the 2017 New Balance Bronx 10 Mile.

It’s official: the All of Us Research Program launched on May 6, 2018.  For those of you unfamiliar with the All of Us Research Program, formerly known as the Precision Medicine Initiative Cohort Program, it is an ambitious effort to recruit at least one million participants in the United States for precision medicine research.  Recruitment efforts will focus on diversity, broadly defined as race/ethnicity, socioeconomic status, sexual orientation, and age.  Participants will be asked to donate biospecimens and data on health, lifestyle, exposures, and behaviors.  The program will use a variety of data collection methods, including tried-and-true surveys as well as emerging technologies such as wearables (think Fitbits) and electronic health records.

The All of Us Research Program is one of several programs under the Precision Medicine Initiative umbrella announced by the previous administration during the State of the Union address in January 2015 (PMID:26122056).  That January 2015 announcement was followed by an infusion of $215 million to the National Institutes of Health’s (NIH) fiscal year 2016 budget.  The first “shovel ready” precision medicine projects were in cancer (PMC5101938), and the National Cancer Institute (NCI) seized the opportunity to expand clinical trials of tumor profiling and genomically-guided therapies for several cancers.  Sub-study data from these clinical trials have already been released, and the preliminary data are promising.

The remainder of the NIH budget focused on activities related to the Precision Medicine Initiative Cohort Program.  NIH first assembled a group of stakeholders including scientists, clinicians, patients, and industry in March 2015.  Between March and September, NIH held a series of workshops including the Public Workshop on Unique Scientific Opportunities for the National Research Cohort (April), the Digital Health Data in a Million-Person Precision Medicine Initiative Cohort (May), the Participant Engagement and Health Equity Workshop (July), and the Mobile and Personal Technologies in Precision Medicine Workshop (July), culminating in the final report delivered by the Precision Medicine Initiative Working Group in September 2015.

Soon after the final report was delivered, NIH released a series of funding opportunities to support cohort activities such as communication, direct volunteer pilot studies, biobanking, healthcare provider organization enrollment, regional medical care enrollment, and participant engagement.  All of these, including a coordinating center, have been awarded as of the end of 2017.  Pilot ascertainment studies began in June 2017, but fewer participants were enrolled by the end of the year than projected, signalling a delay in the official launch of the program (PMID:28883052).  These delays have drawn criticism of the program, with some cautioning that the recruitment goals are too ambitious and the costs are too high.

Despite these criticisms, the All of Us Program forged ahead.  Approximately six weeks prior to launch, the All of Us Program hosted a Research Priorities Workshop on March 21-23, 2018.  The workshop attendees included investigators affiliated and unaffiliated with All of Us across a spectrum of science as well as participants already enrolled in the program.  Through this three-day workshop, the program requested and received input on data collection (what to collect and how to collect) and research priorities.  All input was recorded and is being analyzed to identify high-priority use cases and potential protocol elements for future versions developed for the cohort.

NIH and All of Us investigators anticipate high enrollment rates and rapid accrual of data, and an NIH survey published in August 2016 suggests that, yes, this might be possible (PMC4988644).  The population-based survey of almost 5,000 adults demonstrated high support for the cohort among those surveyed.  While enthusiasm did not differ much by race/ethnicity, it is important to note that increasing levels of education and younger age were independently associated with willingness to participate.  Also, compared with non-Hispanic whites, Hispanics were significantly more enthusiastic about the cohort program.  Note, however, that this survey was conducted pre-2016.  It is unclear if attitudes and trust in government-sponsored research programs have significantly changed over the last year and a half.  Furthermore, the Facebook Cambridge Analytica scandal and the Golden State Killer case (PMID:29880677) prominently preceded the official launch of the cohort, leaving some to speculate that these events, among others, may have a negative impact on research that relies on combining big data with genomics.

Regardless of recent events, we (PMID:29949895) and others have also noted that expressing support for a cohort or expressing willingness to participate in a cohort may not translate to actually signing up and contributing personal data and bodily fluids on a continuous basis.  Actual participation in an intensive and intimate study such as the proposed cohort may hinge on several practical factors, including availability (e.g., time off from work; free time in general), transportation, child care, and internet access, among other factors associated with burden.  Education and familiarity with technology may also be important factors.  Much of the consenting process will be performed electronically, particularly for the direct volunteers.  Data collection will be app-driven.  As an example, Sync for Science was designed to enable sharing of electronic health records between participants and researchers.  Whether this and other recruitment and data collection strategies help All of Us reach its ascertainment goals remains to be seen.


A Spotlight on Diversity in Precision Medicine Research and Call for Manuscripts

The consensus is clear:  there is a need for diversity in precision medicine research.  Most of the current dialogue on diversity centers on study populations.  Genome-wide association studies (GWAS), which began in earnest in 2007 with publication of the landmark Wellcome Trust Case Control Consortium study of seven common diseases (PMC2719288) and a trio of type 2 diabetes studies (PMC3214617, PMC3772310, and PMID17463246), focused almost exclusively on populations of European descent.  The GWAS that immediately followed mostly repeated this population pattern.

The importance of diversity, limited here to race/ethnicity and genetic ancestry, in study populations has long been recognized by genomicists and clinicians alike.  Even before the sequencing of the human genome, population geneticist L. Luca Cavalli-Sforza and colleagues recognized the importance of sampling variation from many populations (PMC1683673) and called for a Human Genome Diversity Project in the early 1990s (PMID1769670 and PMID155803201).  The International HapMap Project (PMC1880871 and PMC2689609) and other efforts (PMID16124863) to catalog human genetic variation and linkage disequilibrium, which included multiple populations, were natural extensions of the Human Genome Project and essential steps in the evolution of the genomic resources required for GWAS, resources that now include the 1000 Genomes Project (PMC4750478 and PMC4617611) and the Exome Aggregation Consortium (PMC5018207).

On the clinical or biomedical side, the use and importance of race/ethnicity in research was (PMID12646675) and continues to be controversial (PMID26912690).  The labels used to describe race/ethnicity are social constructs, not scientific terms rooted in biology, and there is a long history of misuse and confusion (PMID10655044 and PMID1243037) surrounding these labels.  Still, there are legitimate arguments for the usefulness of these labels in biomedical research (PMID12646676), leading some prominent journals to call for their careful use in the biomedical research literature (PMID12771118).

The importance of diversity in basic and biomedical research was clearly recognized long before GWAS, so why is the research community still discussing this and discussing it ever so loudly?  Arguably, it was the creation (PMC2687147) of an on-line GWAS catalog that facilitated analyses clearly showing not just a disparity in which populations were included in GWAS but also its magnitude (PMID21753830).  Updated analyses demonstrate that the situation has not improved much, particularly for Blacks or African Americans and American Indians (PMID27734877).  The lack of progress coupled with recent reports of misdiagnoses related to a dearth of basic frequency data in non-European populations (PMID27532831) has led to an urgent call for action from investigators and funders alike (PMID29151588).  As a result, major new research initiatives such as the Precision Medicine Initiative Cohort Program or All of Us aim to recruit participants from diverse groups (PMID25635347 and PMID2722314).

The North Coast Conference on Precision Medicine series at Case Western Reserve University (CWRU) fully embraces this call and aims to highlight pressing and emerging issues in precision medicine research with an emphasis on diversity.  Established in 2015, the conference series has featured local and national speakers presenting topics ranging from statistical considerations when working with ancestrally diverse data to pharmacogenomics and African Americans to biobanks in the Caribbean islands.  The upcoming 2018 conference promises to feature speakers presenting work related to the return of genomic results, and confirmed speakers’ topics range from barriers to delivery of results through the internet to special considerations for American Indian populations.

New to this conference series is a call for manuscripts for topics related to diversity in precision medicine research, including diversity in the biomedical workforce (which deserves its own blog post).  The call for manuscripts is in response to a Frontiers in Genetics Research Topic, organized by Dr. Dana Crawford, specialty chief-editor for the Applied Genetic Epidemiology section, along with Drs. William Bush and Jessica Cooke Bailey of CWRU.  Abstracts and inquiries are due June 1, 2018, and manuscripts are due October 1, 2018.  All manuscripts will be peer-reviewed prior to consideration for publication.  We anticipate that this Research Topic will further the discussion of diversity in precision medicine and help to both advance and sustain the current call for the necessary action to ensure that precision medicine is truly for all.

An update on the Lacks family

Our second annual symposium featured two members of the Henrietta Lacks family, Shirley Lacks and Veronica (Robinson) Spencer.  Ms. Lacks is the sister-in-law of the late Henrietta Lacks, and she was good friends with Henrietta’s daughter Deborah.  Veronica (Robinson) Spencer is the great-granddaughter of Henrietta Lacks.

The Lacks family story has become and continues to be a significant source of academic and legal discussion concerning many modern bioethical issues.  At the heart of the story are Henrietta’s cells, which famously became immortalized after physicians biopsied her aggressive cervical cancer in 1951.   As is standard even today, a biopsy taken for clinical care can be used for research without a patient’s explicit consent if the samples are de-identified.  We can quibble whether or not using “HeLa,” the first two letters of the patient’s first and last names, to label the cell lines was a sufficient de-identification strategy (hint: it was not; PMID:4942173), but the Common Rule is clear on this issue and has not changed with its recent revision.

According to Rebecca Skloot’s nonfiction book The Immortal Life of Henrietta Lacks, among other sources, the lack of information and economic opportunities were central to the family’s issues with the existence of the cell lines.  The Lacks family was re-contacted in the 1970s for follow-up blood samples drawn by Johns Hopkins University investigators associated with Dr. Victor McKusick’s lab.  Skloot makes it clear in her book that the Lacks family 1) had no idea the cell lines existed until the early 1970s and 2) had no understanding of McKusick’s genetic study.   The Lacks family was under the impression that they were being tested for their mother’s cancer and that they would be receiving results.  Instead, McKusick’s team was trying to establish a genetic marker (G6PD) for HeLa cells by inferring Henrietta’s genotype from the phenotypes of her descendants (PMID:1246620).   McKusick’s study was an important contribution to identifying cell lines contaminated by HeLa (PMID:1246601), but it had nothing to do with testing her descendants for cancer.

This horrible misunderstanding between the Lacks family and McKusick’s team is an example of informed consent gone wrong.   Technically, the Lacks family verbally assented to the blood draw but were not consented for the study associated with the blood draw, an approach that was par-for-the-course until informed consent was formalized and required soon after.  The HeLa case study also provides an embarrassing example of how poorly scientists communicate with participants.  A New York Times article that covered Skloot’s book and the HeLa story recounted an unflattering portrayal of Dr. McKusick in his interaction with family members when they had questions.  A signed copy of Mendelian Inheritance in Man was not the way to start a dialogue with study participants.

To be fair, the spirit and language of informed consent have changed over time due to this and other studies.  Policies and procedures related to privacy have also recently evolved as a result of the HeLa story.  In 2013, three years after Skloot’s book appeared in print, a German group published a manuscript describing the HeLa genome (PMID:23550136) and made these data available via the database of Genotypes and Phenotypes (dbGaP).  On the heels of this publication was an in-press publication from a US group funded by the National Institutes of Health (NIH) (PMID:23925245).  Investigators associated with these sequencing projects did not seek permission or consent from the Lacks family, and NIH did not require this permission prior to funding the study.

Deposition into dbGaP is required of NIH-funded human studies as part of the NIH Genomic Data Sharing Policy.  dbGaP has also evolved to become the de facto method to share data with the larger scientific community regardless of source of funding, presuming the consent language allows for this deposition.  In the case of Henrietta Lacks, there was no consent and she was deceased.  To make matters more complicated, her family and their pedigree were now known world-wide.  Consequently, publication of Henrietta’s DNA sequence potentially divulges disease status or risk for her family members.  In light of these major privacy concerns, the NIH sought consent from living members of the Lacks family for the release of these sequencing data (PMID:23925224).  As a result of this process, the sequence is now accessible in dbGaP on a per-project basis following approval from the HeLa Genome Data Access working group.  At least two members of the Lacks family, David Lacks, Jr. and Veronica (Robinson) Spencer, sit on this working group.

Between Skloot’s book and the HeLa Genome Data Access agreement, the Lacks family have become popular speakers for meetings and educational events.  I first saw the Lacks family speak at the 2013 Annual Biomedical Research Conference for Minority Students (ABRCMS) in Nashville, TN, where David Lacks, Jr. and Shirley Lacks represented the family.  The Lacks family has since spoken at the ICB’s North Coast Conference on Precision Medicine (Veronica (Robinson) Spencer and Shirley Lacks) and revisited ABRCMS in 2016 (Jeri and David Lacks, Jr.).  These 2016 events represent only two of more than 100 appearances made by members of the Lacks family in the last few years, providing much needed opportunities for students and investigators to interact with family members affected by research to better understand and appreciate bioethical and privacy concerns and the need for transparent and truly informed consent.

These events also represent a service provided by the Lacks family compensated by speaker fees.  Many have expressed support for the Lacks family’s ability to earn an income somehow associated with the HeLa cells.  Skloot’s book and countless other articles have commented on the cruel irony of the commercial success of the HeLa cells juxtaposed with the family’s poor financial state and health.  But, again, to be fair, Henrietta’s consent then or even now would not have translated into guaranteed financial gain.  As opposed to commercial laboratories, academic biorepositories like the one established at Johns Hopkins University rely on altruism and the donation of samples and data for research purposes.

The financial injustice angle has had a recent resurgence with anticipation of HBO’s adaptation of the book scheduled for release on April 22, 2017.  Starring and produced by none other than Oprah Winfrey, the movie was made with Skloot as executive co-producer and several of the Lacks family members as (paid) consultants.  As the release date for the movie neared, accusations from disgruntled Lacks family members started to fly.  Earlier this year, two Lacks family members declared a renewed interest in legally seeking compensation for the use of the HeLa cells by Johns Hopkins.  Among other accusations, these Lacks family members also reject the NIH agreement made by the other family members, implying it is not valid without their participation.  After the HBO trailer for the movie dropped, the same two family members complained that they disagreed with their family’s portrayal and that they were not paid consultants for the movie (although they were offered the role and refused).  Oprah and NIH are not having it.

Perhaps the saddest detail of this family feud is the accusation that two of the great-granddaughters of Henrietta Lacks are in fact not biological great-granddaughters.  One of them, Veronica (Robinson) Spencer, spoke to us here in Cleveland.  The accusations are plausible, and Dr. Gonçalo Abecasis at the University of Michigan is working to reconstruct the family pedigree based on DNA from Henrietta Lacks’s cells and her living relatives who consented to providing a DNA sample.  Although more data are needed to confirm or refute the accusations, it is almost certain the damage has been done to this already fractured family with no relief in sight for the foreseeable future.

Welcome to the Symposium Blog!

We have now organized two symposia focusing on big data and precision medicine topics with a third one in the works focusing on measuring exposures.  In an effort to sustain the dialog initiated by each of these events, we are hosting a blog on this website.  This blog will feature short opinion pieces and articles featuring past symposia speakers and their current research efforts related to the original topics discussed in the symposia.  This space will also feature articles on diversity in research participants and the workforce, both major themes of all our symposia.

Although the majority of articles will likely be posted by yours truly, we intend to invite many voices to contribute to this blog.  Please check back regularly to see if your favorite speaker or topic is being featured!

Dr. Marylyn Ritchie and DiscovEHR

In biomedical research, “precision medicine” is the buzz-word or phrase du jour permeating the latest conference abstracts, manuscripts, and grant proposals.  Precision medicine is broadly defined as using a data-driven approach to offer tailored treatments or prevention strategies to patients.  The recent popularity of the precision medicine term is likely due to the 2015 White House launch of the Precision Medicine Initiative (PMI).  The PMI allocated federal funds to several agencies including the National Institutes of Health (NIH), National Cancer Institute (NCI), Food and Drug Administration (FDA), and Office of the National Coordinator for Health Information Technology (ONC) to establish infrastructure and research programs to accelerate the availability and delivery of tailored medical treatment to patients.

Precision medicine research, of course, did not develop overnight.  Several groups, in fact, have been laying the foundation for local precision medicine implementation efforts.  In our first Institute for Computational Biology Symposium in 2015, we heard about these efforts at Geisinger Health System from Dr. Marylyn Ritchie, then Paul Berg Professor of Biochemistry & Molecular Biology at the Pennsylvania State University and Director of Biomedical and Translational Informatics at Geisinger Health System.  Dr. Ritchie described Geisinger’s MyCode Community Health Initiative, a biobank of biospecimens from consented patients seen by providers in the Geisinger integrated health system serving central and northern Pennsylvania (PMID:26866580).  These biospecimens are linked to the patient’s electronic health records (EHRs), and these linked data can be accessed for research purposes.  As of September 2015, ~30,000 patient DNAs had genome-wide genotype data, which contributed to multiple genome-wide association studies for common clinical conditions ranging from cataracts (PMID:25982363) to resistant hypertension (PMID:28222112).

Figure. Dr. Marylyn Ritchie presenting DiscovEHR at ASHG 2016 in Vancouver, Canada.

Dr. Ritchie also reported at the time that ~50,000 patient DNAs had whole-exome sequence (WES) data.  The sequencing efforts actually represent a collaboration between Geisinger Health System and Regeneron Genetics Center, a wholly owned subsidiary of Regeneron Pharmaceuticals.  This collaboration, known as DiscovEHR, began in 2014 with MyCode participants consented for broad genomic research, re-contact, and return of clinically actionable results.  Since her Cleveland presentation in 2015, Dr. Ritchie announced the availability of DiscovEHR for research at the 2016 American Society of Human Genetics (ASHG) in Vancouver, Canada (Figure).  Soon after ASHG, the first set of clinically-relevant analyses in DiscovEHR were published in companion Science articles (PMID:28008009 and PMID:28008010) describing exome-based discovery efforts for HDL-C, LDL-C, triglycerides, and cholesterol levels; the frequency of potential loss of function (pLoF) rare variants; and the frequency of actionable or clinically returnable genetic variants, the latter of which included the frequency of mutations that cause familial hypercholesterolemia.

On the surface, these reports may seem nothing more than standard genome-wide association studies and counting variants in a large population.  Both scale and depth of data set this resource apart from others.  In comparison to DiscovEHR’s sample size of ~50,000, there are larger DNA sample collections including EHR-linked biobanks such as the Veterans Administration’s Million Veteran Program (~400,000; PMID:26441289), Vanderbilt University Medical Center’s BioVU (~225,000), Kaiser Permanente’s Genetic Epidemiology Research on Adult Health and Aging (~110,000; PMID:26092718); epidemiologic cohorts such as the UK Biobank (~500,000); and commercial juggernauts 23andMe (~1 million) and (> 1 million).  While these biobanks dwarf DiscovEHR in the number of DNA samples collected, some are limited in the number of samples with genomic data (e.g., BioVU), and all are primarily limited to genome-wide genotype data.  Furthermore, the commercial biobanks are currently limited to self-reported health and lifestyle data.  The exquisitely phenotyped UK Biobank allows for exome sequencing but is only doing so on a project-by-project basis, and the Million Veteran Program has consented participants for both whole-exome and whole-genome sequencing, but it is not clear when those data will be generated.

As of today, DiscovEHR is the only game in town that offers sequence-level data linked to clinically-collected health outcomes for precision medicine research.  The importance of this resource cannot be overstated.  The one-two punch of sequencing data and EHR data in DiscovEHR provides a much needed catalog of the phenotypic consequences of DNA changes at the individual as well as population level.  Data from electronic health records are far from perfect, but their depth and longitudinal potential given a relatively stable patient population allow investigators such as Dr. Ritchie to ask the seemingly simplest of questions that have remained inadequately answered due to lack of data.  The initial DiscovEHR analyses demonstrate potential in identifying drug targets via patients homozygous for pLoF rare variants (PMID:26933753) and in characterizing penetrance of rare variants in disease-associated genes.  Near-term data mining will likely extend these analyses with the aim of sorting clinically actionable variants from variants of unknown significance.  Longer-term data mining will likely include further genomic discovery studies with an emphasis on potential drug targets, pharmacogenomics, and the consequences of pleiotropy.

Perhaps one major deficiency of DiscovEHR is the lack of diversity in the study population with respect to race/ethnicity.  Approximately 98% of the study participants are European American, consistent with the demographics of the ascertainment sites (93% European American).  It is interesting to note that 3% of Geisinger’s patient population is African American while only 1% of DiscovEHR is African American.  Regeneron recently partnered with Mount Sinai in New York to whole-exome sequence ~33,000 consented participants from BioMe, an EHR-linked biobank known for its diverse study population.  We aim to keep up with Dr. Ritchie and her colleagues as they further mine DiscovEHR in parallel with other WES-EHR efforts in more diverse settings.  These large-scale WES-EHR efforts and their anticipated discoveries will undoubtedly dominate and influence the discussion and direction of precision medicine for at least the next year.
