Today, IT professionals are challenged with finding solutions to manage big data being generated at research labs, pharmaceutical companies and medical centers. In order to do this, one must have the compute power, storage solutions, and analytic capability
to make the data clinically actionable. Data from disparate sources including omics (genomics, proteomics, metabolomics, etc.), imaging, and sensors must be integrated. Cambridge Healthtech Institute's 2nd Annual Data Management in the Cloud program
will bring together key leaders in the fields of cloud architecture and data management to share case studies and to discuss the challenges and solutions they face in their centers. Overall, this event will offer practical solutions for network engineers,
data architects, software engineers, etc. to build data ecosystems which enable the goal of personalized medicine.
Final Agenda
Day 1 | Day 2 | Day 3 | Download Brochure
Arrive Early for:
SUNDAY, MARCH 10, 2:00 - 5:00 PM (AFTERNOON SHORT COURSES)
SC8: Data-Driven Process Development in the Clinical Laboratory
SUNDAY, MARCH 10, 5:30 - 8:30 PM (DINNER SHORT COURSES)
SC12: Clinical Informatics: Returning Results from Big Data
MONDAY, MARCH 11, 8:00 - 11:00 AM (MORNING SHORT COURSES)
SC23: Best Practices in Personalized and Translational Medicine
Monday, March 11
10:30 am Conference Program Registration Open (South Lobby)
11:50 Chairperson’s Opening Remarks
Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine
12:00 pm Explore the Genomic and Neoepitope Landscape of Pediatric Cancers on the Cloud
Jinghui Zhang, PhD, Chair, Member, Computational Biology, St. Jude Children’s Research Hospital
We will present the driver genes identified from a pan-cancer analysis of 1,699 pediatric cancers and neoepitopes identified from integrative analysis of whole-genome and RNA-seq data.
12:30 The GenePattern Notebook Environment for Open Science and Reproducible Bioinformatics Research
Michael Reich, Assistant Director, Bioinformatics, Department of Medicine, University of California San Diego
Interactive analysis notebook environments promise to streamline genomics research through interleaving text, multimedia, and executable code into unified, sharable, reproducible “research narratives.” However, current notebook
systems require programming knowledge, limiting their wider adoption by the research community. We have developed the GenePattern Notebook environment (http://www.genepattern-notebook.org), to our knowledge the first system to integrate the dynamic
capabilities of notebook systems with an investigator-focused, easy-to-use interface that provides access to hundreds of genomic tools without the need to write code.
1:00 Enjoy Lunch on Your Own
2:30 Chairperson’s Remarks
Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine
2:40 KEYNOTE PRESENTATION: Making STRIDES to Accelerate Discovery: The National Institutes of Health, Data Science, and the Cloud
Andrea T. Norris, Director, NIH’s Center for Information Technology, CIO, NIH, Department of Health and Human Services, National Institutes of Health (NIH)
In 2018, NIH released its first-ever Strategic Plan for Data Science, providing a roadmap for modernizing the NIH-funded biomedical data science ecosystem. The Plan addresses high-priority goals to support a more efficient and effective biomedical research
infrastructure that promotes better ways to find, access, and use data and analytical resources. To bolster these efforts, NIH launched the STRIDES Initiative to harness the power of commercial cloud computing and provide NIH biomedical researchers
access to the most advanced, cost-effective computational infrastructure, tools, and services available.
The STRIDES Initiative, which launched with Google Cloud as its first industry partner and Amazon Web Services as its second, aims to reduce economic and technological barriers to accessing and computing on large biomedical data sets to accelerate biomedical
advances.
3:40 UC Health Data Warehouse (UCHDW): An Azure Cloud Migration Case Study
Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine
The University of California Health System has built a secure data warehouse (UCHDW) for operational improvement, promotion of quality patient care, and clinical research. The repository currently holds EHR data on 5 million patients from six UC medical
centers, treated by 100,000 clinicians. To support secure, cross-institutional access to this data and analytics platform, a multiphase project is underway to move UCHDW into a HIPAA-compliant Azure cloud.
4:10 Big Data Networking to Accelerate Scientific Discovery
Dan Taylor, Director, Internet2
Precision medicine and R&D breakthroughs will be increasingly driven by a global ecosystem enabling collaboration and access to cloud and compute resources. This session will discuss Internet2’s next-gen network, federated identity management
and community resources available to life sciences organizations.
4:40 Refreshment Break and Transition to Plenary Session
8:00 Plenary Keynote Session (Room Location: 3 & 7)
6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing
7:30 Close of Day
Tuesday, March 12
7:30 am Registration Open and Morning Coffee (South Lobby)
8:00 Plenary Keynote Session (Room Location: 3 & 7)
9:15 Refreshment Break in the Exhibit Hall with Poster Viewing
10:15 Chairperson’s Remarks
Ian Fore, PhD, Senior Biomedical Informatics Program Manager, Center for Biomedical Informatics and Information Technology, National Cancer Institute
10:25 FEATURED PRESENTATION: A Data Commons Framework for Data Management
Robert Grossman, PhD, Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Department of Medicine, University of Chicago
We describe how data commons and data ecosystems can be built using the Data Commons Framework Services (DCFS) and how the DCFS support the management of data objects, such as BAM files, CRAM files and images, and structured data, such as clinical data.
By a data ecosystem we mean an interoperable collection of data commons, data and computing resources, and a set of applications that can access these through a well defined set of APIs. We also describe how the DCFS can support applications that
access and integrate data from two or more data commons, and some of the issues that arise when accessing data in this way from data commons with different underlying data models.
10:55 Building an Internet of Genomics
Marc Fiume, PhD, Co-Lead, Discovery Work Stream, Global Alliance for Genomics and Health; Co-Founder, CEO, DNAstack; Co-Founder, Canadian Genomics Cloud
The Global Alliance for Genomics & Health (GA4GH) defines technical, ethical, security, and regulatory standards for sharing genomics data. This talk will describe ongoing efforts by the GA4GH Discovery Work Stream to build standards for powering
global, distributed, real-time search applications to make shared genomics data more broadly findable, accessible, and useful.
11:25 Storage and Use of dbGaP Data in the Cloud
Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)
The National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP) is an NIH-sponsored archive charged to store information produced by genome-scale studies. The next-generation sequence data deposited to dbGaP are processed
and distributed by NCBI’s Sequence Read Archive (SRA). This presentation will describe how NCBI and MITRE have implemented access to large genomic datasets provisioned on the cloud, via dbGaP approval, thereby eliminating the need for download.
11:55 Enjoy Lunch on Your Own
1:35 Refreshment Break in the Exhibit Hall with Poster Viewing
2:05 Chairperson’s Remarks
Chris Dwan, Senior Technologist and Independent Life Sciences Consultant
2:10 Cloud Transformation 2.0: Embracing the Multi-Cloud Future
Chris Dwan, Senior Technologist and Independent Life Sciences Consultant
Cloud technologies are mature and have achieved broad adoption. While this has brought many benefits, it also means that organizations must deal with legacy and migration challenges around their aging decade-old cloud systems. The diversity of solutions
in the marketplace means that cross-cloud interoperability, data locality, and functional “skew” between clouds can be a significant challenge. This talk will share practical experience and success strategies for managing through this
second decade of the cloud.
2:40 Overcoming Internal Hurdles to Cloud Adoption
Tanya Cashorali, CEO, Founder, TCB Analytics
With security, privacy, and performance concerns, many organizations in healthcare and life sciences are hesitant to roll out a cloud-based data and analytics environment. In this session, we’ll review common negative perceptions of the cloud,
along with implementation strategies that help mitigate these concerns. We’ll also cover examples of healthcare and pharmaceutical companies that successfully moved to the cloud, and how they navigated pushback from IT and the business.
3:10 Genomics Analysis Powered by the Cloud
Ruchi Munshi, Product Manager, Data Sciences Platform, The Broad Institute
For years, computational biologists have used on-prem infrastructure for all their analytical needs. However, as the amount of genetic data grows, genomics analysis quickly becomes constrained by compute resources available. Today, cloud platforms
provide researchers access to so much compute that the next problem is learning how to use those resources effectively. Let’s talk about various tools that leverage cloud resources to power analysis of genetic data.
3:40 Extended Q&A with Session Speakers
4:10 St. Patrick’s Day Celebration in the Exhibit Hall with Poster Viewing
5:00 Breakout Discussions in the Exhibit Hall
6:00 Close of Day
Wednesday, March 13
7:30 am Registration Open and Morning Coffee (South Lobby)
8:00 Plenary Keynote Session (Room Location: 3 & 7)
10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall
Moderator: Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center
10:50 Open and Distributed Approaches to Biomedical Research
Michael Kellen, PhD, CTO, Sage Bionetworks
Today’s biomedical researchers are increasingly challenged to integrate diverse, complex datasets and analysis methods into their work. Sage Bionetworks develops open tools that support distributed, data-driven science and tests
their deployment in a variety of research contexts. These experiences informed development of Synapse, a cloud-native informatics platform that serves as a data repository for dozens of multi-institutional research consortia working with large-scale
genomics, bioimaging, clinical, and mobile health datasets.
11:00 The Data Commons/Data STAGE Initiatives
Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill
This talk describes the NIH Data Commons and NHLBI Data STAGE initiatives. The Data Commons aims to establish a shared, universal virtual space where scientists can work with the digital objects of biomedical research, including data and analytical
tools. A closely related project, Data STAGE, aims to use the Data Commons to drive discovery using diagnostic tools, therapeutic options, and prevention strategies to treat heart, lung, blood, and sleep disorders.
11:10 Innovation through Collaboration: New Data-Driven Research Paradigms Being Developed by the Pediatric and Rare Disease Communities
Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia
11:20 Building Trust in Large Biomedical Data Networks
Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health
11:30 PANEL DISCUSSION: Definitions, Challenges and Innovations of Data Commons
Moderator: Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center
Panelists:
Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill
Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia
Michael Kellen, PhD, CTO, Sage Bionetworks
Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health
- What is a data commons and what are the common challenges in building and maintaining data commons?
- Why should you organize your data into a commons?
- NIH Data Commons Pilot Phase updates and future directions
- The role of data commons in promoting open access and open science
- Technology innovations
12:30 pm Enjoy Lunch on Your Own
1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing
1:50 Chairperson’s Remarks
Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)
2:00 FEATURED PRESENTATION: “Data Wars”: What R&D Organizations Need to Do in Order to Survive the Near Future
John F. Conway, Global Head of R&D&C IT, Science and Enabling Units IT, AstraZeneca
R&D organizations, from startup to mature, need to quickly build a culture that treats data, information, and knowledge as an asset and to emulate a data company. R&D organizations need improved stringency from data capture to contextualization
to reuse. The FAIR principles are criteria for measuring success on the journey, but it starts with a written scientific data strategy that outlines the what, the who, and the how from a change management and cadence perspective. Simply
put, we have to stop treating our data like trash and instead treat it as another form of currency that has immense value.
2:30 Building an Enterprise Data Lake that is FAIR
Irene Pak, Lead R&D Data Architect, Bristol-Myers Squibb
As with many companies, Bristol-Myers Squibb has embarked on its journey to implement an enterprise data lake as one of the means to reach data nirvana, a state where human and machine can effectively mine our disparate digital data assets and
turn them into business insights that will ultimately help our patients. The FAIR data principles play an important role in our undertaking by providing a framework to make our data findable, accessible, interoperable and reusable.
In this presentation, I will share some of our learnings in the pursuit of FAIRness for our complex data ecosystem.
3:00 How the pRED Data Commons Facilitates Integration of –omics Data
Jan Kuentzer, Principal Scientist, Data Science, Data Science pRED Informatics, Roche Innovation Center Munich, Roche Diagnostics GmbH
Omics data increasingly influences clinical decision-making. Well-designed and highly integrated informatics platforms become essential for supporting structured data capturing, integration and analytics to enable effective drug development. This
talk presents principles and key learnings in designing such a platform, and contrasts our current approach with previous approaches in biomedical informatics. Finally, I will provide insights into the implementation of such a platform at Roche.
3:30 Session Break
3:40 Chairperson’s Remarks
Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center
3:45 Precision Oncology Decision Support
Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center
Molecular profiling is increasingly utilized in the management of cancer patients. Decision support for precision oncology includes guidance on optimal testing and interpretation of test results, including the functional impact of
genomic alterations and their therapeutic implications. We will review strategies for decision support and resources for identifying optimal approved or investigational therapies.
4:15 High-Performance Integrated Virtual Environment (HIVE) and BioCompute Objects for Regulatory Sciences
Raja Mazumder, PhD, Associate Professor, Biochemistry and Molecular Medicine, George Washington University
Advances in sequencing technologies combined with extensive systems level -omics analysis have contributed to a wealth of data which requires sophisticated bioinformatic analysis pipelines. Accurate communication describing these pipelines
is critical for knowledge and information transfer. In my talk, I will provide an overview of how we have been engaging with the scientific community to develop BioCompute specifications to build a framework to standardize how bioinformatics
computations and analyses are communicated to the US FDA. I will also describe how BioCompute Objects (https://osf.io/h59uh/) can be created using the High-performance Integrated Virtual Environment (HIVE) and other bioinformatics platforms.
4:45 Integrating Genomic and Immunologic Data to Accelerate Translational Discovery at the Parker Institute for Cancer Immunotherapy
Danny Wells, PhD, Scientist, Informatics, Parker Institute for Cancer Immunotherapy
Immunotherapy is rapidly changing how we treat both solid and hematologic malignancies, and combinations of these therapies are quickly becoming the norm. For any given treatment strategy only a subset of patients will respond, and an
emerging challenge is how to effectively identify the right treatment strategy for each patient. This challenge is compounded by a concomitant explosion in the amount of data collected from each patient, from high dimensional single
cell measurements to whole exome tumor sequencing. In this talk I will discuss translational research at the Parker Institute, and how we are integrating multiple molecular and clinical data types to characterize the tumor-immune phenotype
of each patient.
5:15 Close of Conference Program
Stay Late for:
MARCH 14-15
S10: Data Science, Precision Medicine and Machine Learning