Data Management in the Cloud Conference

Today, IT professionals are challenged with finding solutions to manage big data being generated at research labs, pharmaceutical companies and medical centers. In order to do this, one must have the compute power, storage solutions, and analytic capability to make the data clinically actionable. Data from disparate sources including omics (genomics, proteomics, metabolomics, etc.), imaging, and sensors must be integrated. Cambridge Healthtech Institute's 2nd Annual Data Management in the Cloud program will bring together key leaders in the fields of cloud architecture and data management to share case studies and to discuss the challenges and solutions they face in their centers. Overall, this event will offer practical solutions for network engineers, data architects, software engineers, etc. to build data ecosystems which enable the goal of personalized medicine.

Final Agenda

Day 1 | Day 2 | Day 3 | Download Brochure

Arrive Early for:

SUNDAY, MARCH 10, 2:00 - 5:00 PM (AFTERNOON SHORT COURSES)

SC8: Data-Driven Process Development in the Clinical Laboratory

SUNDAY, MARCH 10, 5:30 - 8:30 PM (DINNER SHORT COURSES)

SC12: Clinical Informatics: Returning Results from Big Data

MONDAY, MARCH 11, 8:00 - 11:00 AM (MORNING SHORT COURSES)

SC23: Best Practices in Personalized and Translational Medicine

Monday, March 11

10:30 am Conference Program Registration Open (South Lobby)

11:50 Chairperson’s Opening Remarks

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine

12:00 pm Explore the Genomic and Neoepitope Landscape of Pediatric Cancers on the Cloud

Jinghui Zhang, PhD, Chair, Member, Computational Biology, St. Jude Children’s Research Hospital

We will present the driver genes identified from a pan-cancer analysis of 1,699 pediatric cancers and neoepitopes identified from integrative analysis of whole-genome and RNA-seq.

12:30 The GenePattern Notebook Environment for Open Science and Reproducible Bioinformatics Research

Michael Reich, Assistant Director, Bioinformatics, Department of Medicine, University of California San Diego

Interactive analysis notebook environments promise to streamline genomics research through interleaving text, multimedia, and executable code into unified, sharable, reproducible “research narratives.” However, current notebook systems require programming knowledge, limiting their wider adoption by the research community. We have developed the GenePattern Notebook environment (http://www.genepattern-notebook.org), to our knowledge the first system to integrate the dynamic capabilities of notebook systems with an investigator-focused, easy-to-use interface that provides access to hundreds of genomic tools without the need to write code.

1:00 Enjoy Lunch on Your Own

2:30 Chairperson’s Remarks

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine

2:40 KEYNOTE PRESENTATION: Making STRIDES to Accelerate Discovery: The National Institutes of Health, Data Science, and the Cloud

Andrea T. Norris, Director, NIH’s Center for Information Technology, CIO, NIH, Department of Health and Human Services, National Institutes of Health (NIH)

In 2018, NIH released its first-ever Strategic Plan for Data Science, providing a roadmap for modernizing the NIH-funded biomedical data science ecosystem. The Plan addresses high-priority goals to support a more efficient and effective biomedical research infrastructure that promotes better ways to find, access, and use data and analytical resources. To bolster these efforts, NIH launched the STRIDES Initiative to harness the power of commercial cloud computing and provide NIH biomedical researchers access to the most advanced, cost-effective computational infrastructure, tools, and services available.

The STRIDES Initiative, which launched with Google Cloud as its first industry partner and Amazon Web Services as its second, aims to reduce economic and technological barriers to accessing and computing on large biomedical data sets to accelerate biomedical advances.

3:40 UC Health Data Warehouse (UCHDW): An Azure Cloud Migration Case Study

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine

The University of California Health System has built a secure data warehouse (UCHDW) for operational improvement, promotion of quality patient care, and clinical research. The repository currently holds EHR data on 5 million patients from six UC medical centers, treated by 100,000 clinicians. To support secure, cross-institutional access to this data and analytics platform, a multiphase project is underway to move UCHDW into a HIPAA-compliant Azure cloud.

4:10 Big Data Networking to Accelerate Scientific Discovery

Dan Taylor, Director, Internet2

Precision medicine and R&D breakthroughs will be increasingly driven by a global ecosystem enabling collaboration and access to cloud and compute resources. This session will discuss Internet2’s next-gen network, federated identity management, and community resources available to life sciences organizations.

4:40 Refreshment Break and Transition to Plenary Session

5:00 Plenary Keynote Session (Room Location: 3 & 7)

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day

Tuesday, March 12

7:30 am Registration Open and Morning Coffee (South Lobby)

8:00 Plenary Keynote Session (Room Location: 3 & 7)

9:15 Refreshment Break in the Exhibit Hall with Poster Viewing

10:15 Chairperson’s Remarks

Ian Fore, PhD, Senior Biomedical Informatics Program Manager, Center for Biomedical Informatics and Information Technology, National Cancer Institute

10:25 FEATURED PRESENTATION: A Data Commons Framework for Data Management

Robert Grossman, PhD, Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Department of Medicine, University of Chicago

We describe how data commons and data ecosystems can be built using the Data Commons Framework Services (DCFS) and how the DCFS support the management of data objects, such as BAM files, CRAM files and images, and structured data, such as clinical data. By a data ecosystem we mean an interoperable collection of data commons, data and computing resources, and a set of applications that can access these through a well defined set of APIs. We also describe how the DCFS can support applications that access and integrate data from two or more data commons, and some of the issues that arise when accessing data in this way from data commons with different underlying data models.

10:55 Building an Internet of Genomics

Marc Fiume, PhD, Co-Lead, Discovery Work Stream, Global Alliance for Genomics and Health; Co-Founder, CEO, DNAstack; Co-Founder, Canadian Genomics Cloud

The Global Alliance for Genomics & Health (GA4GH) defines technical, ethical, security, and regulatory standards for sharing genomics data. This talk will describe ongoing efforts by the GA4GH Discovery Work Stream to build standards for powering global, distributed, real-time search applications to make shared genomics data more broadly findable, accessible, and useful.

11:25 Storage and Use of dbGaP Data in the Cloud

Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)

The National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP) is an NIH-sponsored archive charged with storing information produced by genome-scale studies. The next-generation sequence data deposited to dbGaP are processed and distributed by NCBI’s Sequence Read Archive (SRA). This presentation will describe how NCBI and MITRE have implemented access to large genomic datasets provisioned on the cloud, via dbGaP approval, thereby eliminating the need for download.

11:55 Enjoy Lunch on Your Own

1:35 Refreshment Break in the Exhibit Hall with Poster Viewing

2:05 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

2:10 Cloud Transformation 2.0: Embracing the Multi-Cloud Future

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

Cloud technologies are mature and have achieved broad adoption. While this has brought many benefits, it also means that organizations must deal with legacy and migration challenges around their aging decade-old cloud systems. The diversity of solutions in the marketplace means that cross-cloud interoperability, data locality, and functional “skew” between clouds can be a significant challenge. This talk will share practical experience and success strategies for managing through this second decade of the cloud.

2:40 Overcoming Internal Hurdles to Cloud Adoption

Tanya Cashorali, CEO, Founder, TCB Analytics

With security, privacy, and performance concerns, many organizations in healthcare and life sciences are hesitant to roll out a cloud-based data and analytics environment. In this session, we’ll review common negative perceptions of the cloud, along with implementation strategies that help mitigate these concerns. We’ll also cover examples of healthcare and pharmaceutical companies that successfully moved to the cloud, and how they navigated pushback from IT and the business.

3:10 Genomics Analysis Powered by the Cloud

Ruchi Munshi, Product Manager, Data Sciences Platform, The Broad Institute

For years, computational biologists have used on-prem infrastructure for all their analytical needs. However, as the amount of genetic data grows, genomics analysis quickly becomes constrained by the available compute resources. Today, cloud platforms provide researchers access to so much compute that the next problem is learning how to use those resources effectively. Let’s talk about various tools that leverage cloud resources to power analysis of genetic data.

3:40 Extended Q&A with Session Speakers

4:10 St. Patrick’s Day Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

6:00 Close of Day

Wednesday, March 13

7:30 am Registration Open and Morning Coffee (South Lobby)

8:00 Plenary Keynote Session (Room Location: 3 & 7)

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall

Moderator: Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center

10:50 Open and Distributed Approaches to Biomedical Research

Michael Kellen, PhD, CTO, Sage Bionetworks

Today’s biomedical researchers are increasingly challenged to integrate diverse, complex datasets and analysis methods into their work. Sage Bionetworks develops open tools that support distributed, data-driven science, and tests their deployment in a variety of research contexts. These experiences informed development of Synapse, a cloud-native informatics platform that serves as a data repository for dozens of multi-institutional research consortia working with large-scale genomics, bioimaging, clinical, and mobile health datasets.

11:00 The Data Commons/Data STAGE Initiatives

Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

This talk describes the NIH Data Commons and NHLBI Data STAGE initiatives. The Data Commons aims to establish a shared, universal virtual space where scientists can work with the digital objects of biomedical research, including data and analytical tools. A closely related project, Data STAGE, aims to use the Data Commons to drive discovery using diagnostic tools, therapeutic options, and prevention strategies to treat heart, lung, blood, and sleep disorders.

11:10 Innovation through Collaboration: New Data-Driven Research Paradigms Being Developed by the Pediatric and Rare Disease Communities

Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia

11:20 Building Trust in Large Biomedical Data Networks

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

11:30 PANEL DISCUSSION: Definitions, Challenges and Innovations of Data Commons

Moderator:
Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center

Panelists:

Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

Michael Kellen, PhD, CTO, Sage Bionetworks

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

What is a data commons and what are the common challenges in building and maintaining data commons?
Why should you organize your data into a commons?
NIH Data Commons Pilot Phase updates and future directions
The role of data commons in promoting open access and open science
Technology innovations

12:30 pm Enjoy Lunch on Your Own

1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing

1:50 Chairperson’s Remarks

Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)

2:00 FEATURED PRESENTATION: "Data Wars": What R&D Organizations Need to Do in Order to Survive the Near Future

John F. Conway, Global Head of R&D&C IT, Science and Enabling Units IT, AstraZeneca

R&D organizations, from startups to mature companies, need to quickly build a culture that treats data, information, and knowledge as assets, and to emulate a data company. They need improved stringency from data capture to contextualization to reuse. The FAIR principles are criteria for measuring success on the journey, but it starts with a written scientific data strategy that outlines the what, the who, and the how from a change management and cadence perspective. Simply put, we have to stop treating our data like trash and instead treat it as another form of currency that has immense value.

2:30 Building an Enterprise Data Lake that is FAIR

Irene Pak, Lead R&D Data Architect, Bristol-Myers Squibb

As with many companies, Bristol-Myers Squibb has embarked on its journey to implement an enterprise data lake as one of the means to reach data nirvana, a state where human and machine can effectively mine our disparate digital data assets and turn them into business insights that will ultimately help our patients. The FAIR data principles play an important role in our undertaking by providing a framework to make our data findable, accessible, interoperable and reusable. In this presentation, I will share some of our learnings in the pursuit of FAIRness for our complex data ecosystem.

3:00 How the pRED Data Commons Facilitates Integration of -omics Data

Jan Kuentzer, Principal Scientist, Data Science, Data Science pRED Informatics, Roche Innovation Center Munich, Roche Diagnostics GmbH

Omics data increasingly influences clinical decision-making. Well-designed and highly integrated informatics platforms become essential for supporting structured data capture, integration, and analytics to enable effective drug development. This talk presents principles and key learnings in designing such a platform, and contrasts our current approach with previous approaches in biomedical informatics. Finally, I will provide insights into the implementation of such a platform at Roche.

3:30 Session Break

3:40 Chairperson’s Remarks

Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center

3:45 Precision Oncology Decision Support

Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center

Molecular profiling is increasingly utilized in the management of cancer patients. Decision support for precision oncology includes guidance on optimal testing and interpretation of test results, including the functional impact of genomic alterations and their therapeutic implications. We will review strategies for decision support and resources for identifying optimal approved or investigational therapies.

4:15 High-Performance Integrated Virtual Environment (HIVE) and BioCompute Objects for Regulatory Sciences

Raja Mazumder, PhD, Associate Professor, Biochemistry and Molecular Medicine, George Washington University

Advances in sequencing technologies combined with extensive systems level -omics analysis have contributed to a wealth of data which requires sophisticated bioinformatic analysis pipelines. Accurate communication describing these pipelines is critical for knowledge and information transfer. In my talk, I will provide an overview of how we have been engaging with the scientific community to develop BioCompute specifications to build a framework to standardize bioinformatics computations and analyses communication with US FDA. I will also describe how BioCompute Objects (https://osf.io/h59uh/) can be created using the High-performance Integrated Virtual Environment (HIVE) and other bioinformatics platforms.

4:45 Integrating Genomic and Immunologic Data to Accelerate Translational Discovery at the Parker Institute for Cancer Immunotherapy

Danny Wells, PhD, Scientist, Informatics, Parker Institute for Cancer Immunotherapy

Immunotherapy is rapidly changing how we treat both solid and hematologic malignancies, and combinations of these therapies are quickly becoming the norm. For any given treatment strategy only a subset of patients will respond, and an emerging challenge is how to effectively identify the right treatment strategy for each patient. This challenge is compounded by a concomitant explosion in the amount of data collected from each patient, from high dimensional single cell measurements to whole exome tumor sequencing. In this talk I will discuss translational research at the Parker Institute, and how we are integrating multiple molecular and clinical data types to characterize the tumor-immune phenotype of each patient.

5:15 Close of Conference Program

Stay Late for:

MARCH 14-15

S10: Data Science, Precision Medicine and Machine Learning
