Research

Deep-CDS: Deep Learning Semantic Data Lake for Clinical Decision Support

National Institute of General Medical Sciences 1R44GM143996 

More than 5 million patients are admitted annually to United States ICUs, with reported average mortality rates ranging from 8% to 19%, or about 500,000 deaths annually. Sepsis is the leading cause of in-hospital mortality, accounting for one in three inpatient deaths. The incidence of sepsis has been increasing, with 1.7 million sepsis cases and 270,000 deaths per year. Early identification of deterioration has been shown to reduce the need for patient transfer to higher care units, reduce lengths of stay, and improve survival rates. Each hour of delay in ICU admission has been associated with a 1.5% increased risk of ICU death and a 1% increase in risk of hospital death. Many studies support that mortality increases with every hour of delay in antibiotic administration. Pairing patient risk stratification with appropriate levels of hospital intervention is essential to reduce risk of mortality. Deterioration is especially difficult to predict for patients in intermediate units, where the level of monitoring falls between that of floor units and ICUs. Automated monitoring, alerts, and trend analysis are essential to identifying patients under duress and intervening proactively. Current methods of monitoring patient health have low specificity and have significant room for improvement. This project will develop Deep-CDS, a cloud-based deep learning system for context-sensitive clinical decision support in monitoring and predicting the deterioration of patient health and progression of sepsis risk factors in real time to improve outcomes and optimize the management of care across the hospital population. To support the clinical care team, Deep-CDS provides team members with (a) a clinical care knowledgebase, (b) an early warning score for deteriorating health conditions, (c) a model for predicting septic conditions, (d) evidence-based clinical practice guidelines, and (e) visualization of patient health status trends.
Deep-CDS addresses NIGMS Priorities for Small Business Development of Sepsis Diagnostics and Therapeutics, NOT-GM-20-028: 1) Diagnostic tools for emergency department settings; 2) Predictive clinical algorithms and point-of-care diagnostics; 3) Technologies that combine various types of data for diagnosis of sepsis patients; and 4) Clinical decision support, including use of artificial intelligence and machine learning approaches, to develop tools for early recognition of sepsis, assessment of treatment responses and patient deterioration, and long-term prognosis prediction in various care settings.

Quantitative Radiogenomics for Precision Medicine in Breast Cancer

National Cancer Institute 75N91021C00025
The use of advanced medical imaging can provide a non-invasive means for accurately assessing molecular subtypes, overcoming the limitations of biopsies. Medical images can capture a full picture of tumor phenotypes and their environments throughout treatment with a low risk to patients. Therefore, the combination of radiomics, which refers to the extraction and storage of quantitative data from digital images with clinical data in a shared database, and radiogenomics, which correlates genomic and radiomic data, is poised to play a critical role in the diagnosis of individual lesions. The goal of this project is to develop an innovative platform combining radiomics and radiogenomics for diagnosing and characterizing breast cancer tumors throughout therapy, creating an efficient and robust prognostic model to aid clinicians making personalized treatment decisions for cancer patients. The aims of the project are: (1) Develop a radiogenomics model combining radiomic and genomic features capable of inferring breast cancer subtype, stage, and response criteria for tumors captured within MRI images. (2) Validate the model’s diagnostic accuracy, utility for treatment planning, and generality across multiple sites and vendor platforms to show potential for widespread impact on personalized cancer treatment. (3) Develop an intuitive user experience supporting the inspection and interpretation of model outputs in the context of source MRI images to support personalized treatment decisions.

Platform for High-Throughput Analysis of Integrated Cancer Imaging and Multi-Omics Data

National Cancer Institute 261201700040C-002
Cancer studies increasingly include medical imaging and measurements from multiple omics techniques. The main impetus for data integration is that, through these integrated data sets, an improved understanding of the underlying biology is obtained to be better able to predict a phenotype and to gain further insight into mechanistic aspects of the system at the molecular level. In this project we propose to develop innovative technologies to integrate metabolite data with multi-omic (metabolomics, proteomics and transcriptomics) and cancer imaging data to enable the detection of subtler and more complex associations among variables, with the medical imaging and the metabolome providing phenotypic measurements to which we can anchor the global measurements of the transcriptome and proteome. The proposed Multi-omics and Imaging Data Analysis System (MIDAS) will provide for the ingestion, annotation, quality control, and analysis of in vivo imaging data combined with ex vivo -omics data to advance research in cancer. MIDAS is aimed at helping the cancer and overall public health research communities advance faster towards the larger goal of precision medicine through valid and reliable data harmonization of metabolomics, transcriptomics, proteomics, radiomics and other imaging data.

Microbiome Meta-Analysis Platform

National Institute of General Medical Sciences 1R44GM123827-01A1
Researchers from very diverse fields are expanding their research to microbiome studies to understand the interactions between microbes, hosts, and the environment. As new technologies for accelerated production of microbiome sequence data have enabled this type of research, there is a pressing need for high performance computation resources that accommodate flexible and consistent configuration, deployment, and execution of constantly improving analytical pipelines and that enable data harmonization and interoperability. Standards for data and experimental representation are still being developed and enhanced, resulting in semantic inconsistency and incompatible data formats and conventions, and therefore presenting data integration and management challenges. Meta-analyses of pooled data are becoming more widespread as computational power increases. Assessment of the sources of variation in microbiota profiling is sorely needed to understand how to combine and integrate data from different studies. As the field rapidly evolves and new sequencing and processing techniques are developed, the use of hard-coded scientific pipelines limits the scope of biological interpretations. We propose to develop a “Microbiome Meta-Analysis Platform” (MIMAP) that takes advantage of cluster computing, software containerization, and semantic data integration technologies to enable building, modifying and evaluating alternative bioinformatics pipelines for reproducibility studies, new studies, and meta-analysis of microbiome data from different cohorts, from cross-sectional and longitudinal studies, from public sources, collaborators and in-house studies. 
MIMAP enables deployment and testing of existing and emerging bacterial identification and downstream analysis algorithms, substitution of tools to test new approaches, and semantic modeling of data for pooling of multiple studies and for integration of clinical information through a friendly user interface designed with the guidance of an expert team of microbiome specialists. It also allows researchers to perform quality control evaluations using positive and negative controls and provenance data. In Phase I, the main workflow execution, data modeling, and evaluation strategies will be prototyped to demonstrate feasibility. During Phase II development, the complete MIMAP system will be created as a solution for the execution of microbiome research.

Immunology Information Integration

National Institute of Allergy and Infectious Diseases 272201700018C-001
The contractor will develop the Immunology Information Integration (I3) platform. I3 employs an ontology-based data model to index, query, and virtualize data/metadata, knowledge and tools of the DAIT repositories. I3 relies on GeneTegra data source models, which are representations of the data schema codified in the Web Ontology Language (OWL). A user interface will be developed to prototype integrated data query, retrieval, and analyses. The Contractor will work with the expert consultant team in defining common workflows, identifying the tools needed for such workflows to be executed, and building tool and workflow descriptors within Galaxy.

Metabolomics Data Integration and Quality Assessment

National Cancer Institute 261201600065C-0-0-1
Several studies suggest that metabolomics can play a significant role in the investigation of the etiology and progression for many diseases, including cancer. The variety of analytical technologies, laboratory platforms, sample preparation methods, and experimental designs used for metabolic profiling mean that the integration and harmonization of metabolomics data is a difficult challenge. To address these complexities the contractor proposes to develop the Metabolomics Data Integration System (MODIS). MODIS will employ cutting-edge cluster computing technologies to provide bioinformatics methods and database formats for the storage, processing, quality control, and integration of metabolite data across various laboratory platforms and analytical technologies, including LC-MS, GC-MS, and NMR. The development of these methods will position the research community to better leverage existing metabolomics data for the discovery of novel biomarkers and the understanding of the biology of cancer and other diseases.

Semantic Data Lake for Biomedical Research

National Cancer Institute, National Library of Medicine 1R44CA206782-01A1
Capitalizing on the transformative opportunities afforded by the extremely large and ever-growing volume, velocity, and variety of biomedical data being continuously produced is a major challenge. The development and increasingly widespread adoption of several new technologies, including next generation genetic sequencing, electronic health records and clinical trials systems, and research data warehouses means that we are in the midst of a veritable explosion in data production. This in turn results in the migration of the bottleneck in scientific productivity into data management and interpretation: tools are urgently needed to assist cancer researchers in the assembly, integration, transformation, and analysis of these Big Data sets. In this project, we propose to develop the Semantic Data Lake for Biomedical Research (SDL-BR) system, a cluster-computing software environment that enables rapid data ingestion, multifaceted data modeling, logical and semantic querying and data transformation, and intelligent resource discovery. SDL-BR is based on the idea of a data lake, a distributed store that does not make any assumptions about the structure of incoming data, and that delays modeling decisions until data is to be used. This project adds to the data lake paradigm methods for semantic data modeling, integration, and querying, and for resource discovery based on learned relationships between users and data resources.
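
The data lake idea described above, storing incoming data without assumptions about its structure and deferring modeling until the data is used, can be illustrated with a minimal sketch. This is not the SDL-BR implementation: the class `TinyDataLake`, its methods, the file names, and the sample records are all hypothetical, and a real system would use a distributed store rather than an in-memory dictionary.

```python
import csv
import io
import json


class TinyDataLake:
    """Hypothetical in-memory stand-in for a data lake (illustration only)."""

    def __init__(self):
        self._objects = {}

    def ingest(self, key, raw_text):
        # Ingestion stores the payload untouched: no parsing, no schema check.
        self._objects[key] = raw_text

    def read(self, key, parser):
        # The caller supplies the model (a parser) only when the data is used.
        return parser(self._objects[key])


def parse_csv(raw_text):
    """Interpret a raw payload as CSV rows keyed by the header line."""
    return list(csv.DictReader(io.StringIO(raw_text)))


lake = TinyDataLake()
# Two differently structured, made-up sources are ingested the same way.
lake.ingest("labs.csv", "patient_id,glucose\np1,5.4\np2,6.1\n")
lake.ingest("notes.json", '{"patient_id": "p1", "note": "stable"}')

# Structure is imposed at read time, per use.
labs = lake.read("labs.csv", parse_csv)
note = lake.read("notes.json", json.loads)
print(labs[0]["glucose"], note["note"])
```

Because the parser is chosen at read time, the same stored payload could later be read under a different model without re-ingesting it.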

Ontology-Based Knowledge and Belief Management System

National Institute of General Medical Sciences 5R44GM097851-04
Dramatic advances in the development of biomedical ontologies hold the promise of a deeper and clearer understanding of the molecular and genetic aspects that affect human health. Biomedical data and knowledge stored in ontologies and databases have the potential to empower researchers in the life sciences to access and find conclusive evidence that can be translated to medical diagnosis and treatment. Enormous effort is being expended to create suites of interoperable ontologies that can encompass the life sciences. However, the extensive knowledge codifications created and curated by the developers of existing ontologies rarely interact with the beliefs and hypotheses postulated by other researchers. This inability to make use of codified and established knowledge hinders the ability of researchers to take advantage of the capabilities afforded by Semantic Web technologies in terms of computational reasoning. In this project, we propose to develop the GeneBel system as a software solution to allow researchers in biology and genetics to postulate hypotheses, and to test and verify these hypotheses against the body of knowledge existing in multiple interconnected ontologies. At the core of the proposed GeneBel system, a belief and hypothesis encoding mechanism permits the creation of hypotheses as belief assertions, an ontology generation and alignment component allows the interconnection of multiple ontologies and data sources, and a process of hypothesis verification finds ontology assertions that either corroborate or contradict hypotheses. In Phase I of the project, the specific belief encoding techniques and underlying reasoning and hypothesis verification methods will be implemented in a prototype solution and tested against a set of predefined research scenarios and hypotheses. During Phase II, the complete GeneBel system will be constructed and evaluated in the execution of real-world hypothesis verification in genetics and biomedical research.

Scalable Automated Brain Tumor Segmentation

National Cancer Institute 261201400045C-0-0-1
Brain tumor segmentation in Magnetic Resonance Imaging is an important task for neurosurgeons, oncologists, and radiologists to assess disease burden and measure tumor response to treatment. Over 237,000 individuals worldwide are estimated to have been diagnosed with malignant brain and CNS tumors, with over 174,000 deaths. In the United States alone, over 66,000 new cases of primary malignant and non-malignant brain and CNS tumors are expected to be diagnosed in 2014. Detection of brain tumors with the exact location and orientation is extremely important for effective diagnosis, treatment planning, and analysis of treatment effectiveness; however, manual delineation of the tumor takes considerable time and is prone to error and wide variability. The overall goal of this proposal is to develop a scalable and automated approach for the segmentation of brain tumors. The aims of the project are: 1) Produce a clinic-ready software package with a user-friendly graphical user interface to manage the process of brain tumor segmentation and quantitative imaging. 2) Implement the production software module to accurately detect and classify brain tissues from multi-channel MRI data. 3) Support quantitative imaging, system interoperability, structured reporting, and knowledge integration through the use of semantics and annotation standards. 4) Demonstrate that the software produces clinically validated results for accurate assessment from MRI data of the brain under varying conditions of noise, spatial inhomogeneities, localized scanner settings and vendor equipment. 5) Package, deploy, and test the Scalable Automated Brain Tumor Segmentation (SABTS) tools to be used in clinical practice for the accurate detection, visualization, and assessment of disease progression in patients with brain tumors.

Multi-Model Detection and Quantification of Multiple Sclerosis in MR Imaging

National Institute of Neurological Disorders and Stroke 1R41NS060473-01A2
Multiple sclerosis (MS), a neurodegenerative disease that afflicts the central nervous system, is characterized by lesion formation and atrophy of the brain and spinal cord. Atrophy has been reported to occur early in the disease and to increase with disease progression in various cortical and sub-cortical regions, reflecting widespread loss of myelin, axons and neural cell bodies. Published studies in the past decade have demonstrated that recent advances in Magnetic Resonance Imaging (MRI) have enabled great progress in the detection, visualization and quantification of the onset and progression of MS. In order for these advances in the diagnosis and assessment of the progression of the disease to continue and systematically change the clinical evaluation of MS, techniques for accurate, automated, and robust detection of MS lesions and quantification of brain atrophy must be developed to enable increased utilization in clinical settings. The main objective of this proposal is to develop an artificial immune classification (AIC) technique for accurate, automated and robust MRI data analysis for the purpose of MS lesion detection and quantification of regional brain atrophy. The proposed AIC technique for quantitative measurement of the effect and progression of MS aims to tackle current challenges in assessing MS through a generic and unified approach that relies on artificial immune functions to enable accurate identification of different tissue classes in the brain. During Phase I of the grant, a prototype of the proposed AIC technique will be developed and evaluated in a pilot study involving real MRI data of MS patients and controls. In addition, the evaluation will involve simulated MRI data at varying levels of MS disease burden, noise, and intensity inhomogeneity.
Phase I will provide a proof-of-concept of the proposed AIC technique as well as demonstrate its practical feasibility for assessment of MS lesion burden and regional accuracy for quantifying white matter and gray matter.

Semantic Data Integration for Integrative Cancer Biology Research

National Cancer Institute 2R44CA132293-02A2
Recent advances in high-throughput measurements of critical parameters related to cancer genesis and development have led to a wealth of cancer-related information available in public and private databases. To realize the promise of dramatic advancement in integrative cancer research enabled by this rapidly expanding information, novel informatics tools that allow researchers to efficiently integrate the available data are needed. The main objective of this proposal is to enable enhanced understanding and modeling of cancer processes through the development of the Cancer Biology Data Integration (CBDI) System, a collection of caGrid services and grid client applications capable of integrating data and information from disparate sources. The CBDI System aims to exploit the rich semantic metadata information and the robust data exchange standards established through the Cancer Biomedical Informatics Grid (caBIG) Initiative, enhancing its use through the provision of a coherent ontological view of caBIG semantics, so that data sources can be queried using a standard semantic query language. The CBDI System contains an Ontology View Generator to expose caBIG semantics using the Web Ontology Language (OWL) and a distributed Semantic Query Processor that implements the semCDI query model to execute RDF-based queries using SPARQL against caGrid data services. The CBDI System also enables the integration of local private data collected by investigators and research institutions. An intuitive user interface providing multiple visualization and concept searching abilities is used to build queries and view results. In Phase I of the CBDI project, key algorithms and mechanisms of the CBDI System were developed, including the exposure of ontology views and the conversion of queries from SPARQL into caBIG’s common query language. In addition, the overall feasibility of the CBDI System was demonstrated with proof-of-concept prototypes of system components.
During Phase II, the complete CBDI System will be implemented and tested with caBIG data services at six research institutions, evaluating its use in a variety of real-world operating conditions and functional scenarios.

A Hidden Markov Model Based Segmentation Framework for MR Spectroscopy Imaging

National Institute of Biomedical Imaging and Bioengineering 1R41EB005520-01A1
The convergence of biomedicine and computation in the form of biomedical computing has demonstrated gains in the areas of genetic sequences, biomedical images, qualitative descriptors for health and social science, geospatial images, and chemical formulae. This provides a unique opportunity to utilize novel image processing techniques to more accurately and objectively characterize anatomical and molecular structures and establish tractable measurements and quantification of normal and disease states. Accurate quantification of brain metabolites from Magnetic Resonance Spectroscopic Imaging (MRSI) is becoming increasingly important in the examination of long-term effects of disease and monitoring of the effects of treatment in cancer, neurodegenerative diseases, and mental health. During Phase I, this proposal will develop an innovative image segmentation framework for the analysis of MRSI data, which is accurate, robust, and computationally efficient for eventual use as a tool in monitoring cancer treatment. The proposed framework is based on a novel utilization of Hidden Markov Models (HMM) that are trained to recognize different tissue parameters in the brain and are then used for the segmentation of MR data. The HMM-based segmentation is used for its attractive accuracy, robustness and computational efficiency (as a tradeoff with accuracy), characteristics that are demonstrated by the mathematical foundation of the HMM as well as the preliminary results. The segmentation framework will then be used as part of a system to provide reproducible and tractable quantification of brain metabolites in MRSI analysis for cancer treatment analysis. Based on the success of Phase I, in Phase II, the tools developed from the framework will be integrated within a Picture Archiving and Communication System (PACS) for clinical use utilizing MRSI datasets for monitoring the effects of cancer treatment.
Accurate in vivo quantification of brain metabolites is useful in examination of long-term effects of disease and monitoring the effects of treatment. Co-analysis of segmented MR imaging data and “functional” MR data can improve the accuracy of assessing the burden of disease in patients with neurodegenerative, inflammatory/infectious, and neurovascular disorders.
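
To illustrate how a trained HMM can label a sequence of measurements, the following is a minimal sketch of Viterbi decoding, a standard way to recover the most likely hidden-state sequence. It is not the project's segmentation framework: the two states, the discretized "low"/"high" observations, and all probabilities are made-up assumptions, the parameters are set by hand rather than trained, and real MRSI data would be multi-dimensional and continuous-valued.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # Log probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state for reaching state s at step t.
            prev, best = max(
                ((p, scores[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[t][s] = best + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two tissue classes, two discretized signal levels.
states = ["normal", "lesion"]
start_p = {"normal": 0.8, "lesion": 0.2}
trans_p = {"normal": {"normal": 0.9, "lesion": 0.1},
           "lesion": {"normal": 0.2, "lesion": 0.8}}
emit_p = {"normal": {"low": 0.9, "high": 0.1},
          "lesion": {"low": 0.2, "high": 0.8}}

signal = ["low", "low", "high", "high", "high", "low"]
labels = viterbi(signal, states, start_p, trans_p, emit_p)
print(labels)
```

The transition probabilities favor staying in the same class, which is what lets an HMM smooth noisy measurements into contiguous segments rather than labeling each sample independently.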

Information Integration of Heterogeneous Data Sources

National Center for Research Resources 5R43RR018667-02
The overall goal of this proposal is to develop an information integration architecture and associated tools to support rapid integration of data and knowledge from distributed heterogeneous data sources. The architecture aims to play a significant role in extracting coherent knowledge bases for biomedical research and improving the accuracy, completeness and quality of the extracted knowledge. Towards achieving these goals, the proposed scalable architecture includes innovative generalized integration algorithms and tools for the generation of mediators to capture the functional behavior of data sources, semantic representation of data sources to support automated generation of integration agents, and optimization of integrated data queries. The information integration architecture keeps pace with the evolving Internet-based XML electronic data interchange, semantic web services, and web services discovery standards, thus leveraging Internet technologies and standards to provide lasting state-of-the-art solutions for information integration. In addition, the proposed architecture is inherently scalable in terms of the number of data sources that can be integrated, the number of users of the integrated system, and the range of biomedical problems that can be tackled. During Phase I of the project, prototypes of the proposed integration algorithms and tools will be developed as proofs of concept and to form the foundation for evaluation and pilot testing of the proposed integration mechanisms, using private and public data sources, in terms of scalability and integration capabilities.

An Intelligent System for Clinical Trials

National Institute of Mental Health 1R43MH070977-01A1
The main objective of this proposal is to develop an innovative state-of-the-art intelligent system for managing mental health clinical trials. This system is designed to greatly improve usability and reduce errors and costs, while enabling and supporting the advancement of psychiatric and behavioral research. In the context of a highly scalable architecture, innovative techniques are introduced for (1) a human-computer interaction (HCI) design methodology to define interaction patterns by applying analysis techniques solidly grounded on cognitive theory; (2) a multi-machine user interaction algorithm that uses a common underlying form structure and intelligence to enable the collection of data by different users of the system through a diversity of interface devices, such as Web thin clients, mobile and desktop computers, handheld computers and personal digital assistants (PDAs), and interactive voice response (IVR) tools; and (3) the tight integration of sophisticated, intelligent data representations for complex psychometric instruments and detailed clinical trial protocol modeling. The proposed system is specifically designed to effectively interact with trial investigators and developers, clinical personnel, and patients in order to provide efficient end-to-end trial data acquisition, management, and analysis, and incorporates advanced graphical configuration and visualization tools for protocol and form design, automated decision-support, compliance and adverse event tracking, autonomous data exchange with external systems, and alerting mechanisms. Throughout the project, INFOTECH Soft will rely on its user-centered formative design process and quality assurance strategy to ensure that the Psych Trials System adheres to its functional requirements and involves the end-user in important usability analysis.
During phase I, cognitive dimensions analysis will be undertaken for a tightly defined set of activities that subjects are expected to accomplish using PDA and Web-based interfaces. This analysis will result in the definition of interaction patterns specific to each device. A prototype of the multi-machine user interaction algorithm will be developed as a proof-of-concept to assess the viability and clinical acceptance of the technique and to identify technical issues to be resolved towards phase II development. At the end of Phase II, the Psych Trials System will be pilot tested under a diverse set of realistic conditions in clinical trials to ensure that the system meets the functional specification, usability requirements, and performance demands in real-world clinical environments.

Integrated and Distributed Electronic Clinical Trials

National Center for Research Resources 5R44RR017110-03
The main objective of this proposal is to improve the development and administration of clinical trials by developing an innovative state-of-the-art distributed electronic clinical trial management system along with an integrated suite of clinical trial development tools. The proposed system effectively interacts with clinical trial developers, clinical investigators, and staff conducting clinical trials, and interfaces with external data sources in order to provide efficient end-to-end clinical trial data acquisition, management, and analysis. The integrated suite of clinical trial development tools enables clinical investigators to easily design clinical forms and trial protocols through a graphical user interface, while automating the tasks of form and protocol representation, database creation and configuration, and interfacing with external data sources. The proposal builds upon the technical feasibility established during Phase I in four key areas: (1) electronic representation of clinical trial protocols, forms, and data using the open Extensible Markup Language (XML); (2) design of a web-based trial administration tool incorporating decision support logic to generate clinical alerts for handling adverse events; (3) architecture of a clinical trial development environment for enabling the definition of custom clinical trial protocols and forms; and (4) development of a novel technique for automated generation of the clinical trial database and associated synchronization schemes among sites participating in a clinical trial. The inherent automation, customization, portability and scalability characteristics of the proposed system establish its potential for efficient and effective management of a wide range of clinical trials involving multiple sites and varying requirements, complexities, and protocols.
In the context of this proposal, the system will be evaluated and assessed in terms of clinical acceptance and usability through close collaboration with clinical consultants.

Automated Development of Electronic Data Capture for Clinical Trials

National Institute on Drug Abuse 4R44DA024911 
This proposal is in response to PHS 2007-2 NIDA Topic “Automation of the Development of Electronic Data Capture System for Clinical Trials Data Collection and Management”. We propose to develop the Automated Development of Electronic Data Capture (AD-EDC) System, a set of graphical tools and automated software applications that will help simplify, automate, standardize, and reduce the cost of creating and reporting the research instruments used in substance abuse clinical trials. The main objectives of the project are to develop technologies that will: (1) significantly reduce the costs and errors involved in developing electronic data capture (EDC) instruments for clinical trials; (2) significantly reduce the costs and errors involved in reporting and sharing study results; (3) interoperate with a wide range of information systems and commercial, off-the-shelf clinical trial management systems (CTMS) and utilize Clinical Data Interchange Standards Consortium (CDISC) standards; and (4) provide non-technical, clinical users an easy-to-use system that includes accessibility support, which will allow users with visual, hearing, physical, and cognitive impairments to effectively participate in clinical trials. The Phase I research effort will focus on defining a high-level architecture for the AD-EDC System, refining functional specifications and user interface designs for the AD-EDC software components, defining an open format for electronic data dictionaries, and performing a cost-benefit analysis to demonstrate that, through software automation and computer-assisted design tools, the project’s specific aims are feasible. The Phase II effort will focus on implementing the complete set of study design tools, study reporting tools, and system interoperability components that will be necessary to achieve the project’s aims.
At the end of Phase II, the AD-EDC System will be pilot tested in a diverse set of real-world clinical settings and a cost-benefit analysis will be performed to validate that the system achieves the project aims. PUBLIC HEALTH RELEVANCE: The AD-EDC System consists of a set of graphical tools and automated software modules that will help simplify, automate, standardize, and reduce the cost of creating and reporting clinical research instruments used in substance abuse clinical trials. The AD-EDC System is designed to interoperate with commercial, off-the-shelf clinical trials management systems either through a standard CDISC-compliant ODM interface or through custom external system adapters. Standard data representations based on CDISC standards are used by the tools and components of the AD-EDC System to support interoperability and persistence.