-
GeneTegra at the Medicine X Conference
posted on Wednesday, September 26, 2012
GeneTegra will be presented at the Medicine X conference at Stanford University, September 28-30, 2012. In this presentation, we describe the GeneTegra system, an ontology-based information integration environment. We show its ability to query multiple diverse data sources, and we evaluate the relative performance of different data repositories. GeneTegra uses Semantic Web standards to resolve the semantic and syntactic diversity of the large and increasingly complex body of publicly available data. GeneTegra contains mechanisms to create ontology models of data sources using the OWL Web Ontology Language, and to define, plan, and execute queries against these models using the SPARQL query language. Supported data source formats include relational databases, XML and RDF data sources, and delimited text files. Experimental results show that GeneTegra obtains equivalent results from different data repositories containing the same data, illustrating the ability of the proposed methods to query heterogeneous sources using the same modeling paradigm.
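The idea of obtaining equivalent results from different repositories holding the same data can be sketched in a few lines of Python. This is only an illustration, not GeneTegra's implementation: the sample records, the field names, and the helper functions are assumptions made for the example, and a toy triple-pattern match stands in for OWL models and SPARQL.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Illustrative sample: the same three gene records stored in two formats.
CSV_TEXT = "gene_id,symbol,chromosome\nG1,TP53,17\nG2,BRCA1,17\nG3,EGFR,7\n"
XML_TEXT = """<genes>
  <gene id="G1"><symbol>TP53</symbol><chromosome>17</chromosome></gene>
  <gene id="G2"><symbol>BRCA1</symbol><chromosome>17</chromosome></gene>
  <gene id="G3"><symbol>EGFR</symbol><chromosome>7</chromosome></gene>
</genes>"""


def triples_from_csv(text):
    """Map each delimited row to (subject, predicate, object) triples."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = row["gene_id"]
        triples.add((subject, "symbol", row["symbol"]))
        triples.add((subject, "chromosome", row["chromosome"]))
    return triples


def triples_from_xml(text):
    """Map each <gene> element to the same triple vocabulary."""
    triples = set()
    for gene in ET.fromstring(text).findall("gene"):
        subject = gene.get("id")
        triples.add((subject, "symbol", gene.findtext("symbol")))
        triples.add((subject, "chromosome", gene.findtext("chromosome")))
    return triples


def symbols_on_chromosome(triples, chromosome):
    """Toy two-pattern join: ?g chromosome <c> . ?g symbol ?s"""
    genes = {s for (s, p, o) in triples if p == "chromosome" and o == chromosome}
    return sorted(o for (s, p, o) in triples if p == "symbol" and s in genes)


csv_result = symbols_on_chromosome(triples_from_csv(CSV_TEXT), "17")
xml_result = symbols_on_chromosome(triples_from_xml(XML_TEXT), "17")
print(csv_result == xml_result, csv_result)
```

Because both sources are mapped to one common model before querying, the query itself does not need to know which format the data came from.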
Learn more at http://www.genetegra.com
About Medicine X
Medicine X is a catalyst for new ideas about the future of medicine and health care. The initiative explores how emerging technologies will advance the practice of medicine, improve health, and empower patients to be active participants in their own care. The “X” is meant to encourage thinking beyond numbers and trends—it represents the infinite possibilities for current and future information technologies to improve health. Under the direction of Dr. Larry Chu, Assistant Professor of Anesthesia, Medicine X is a project of the Stanford AIM Lab.
Learn more at http://medicinex.stanford.edu
-
Scalable Automated Brain Tumor Segmentation Funded by NCI
posted on Friday, August 31, 2012
INFOTECH Soft has been awarded a contract by the National Cancer Institute to develop Scalable Automated Brain Tumor Segmentation software. Brain tumor segmentation in Magnetic Resonance Imaging is an important task for neurosurgeons, oncologists, and radiologists to assess disease burden and measure tumor response to treatment. In 2008, over 237,000 individuals worldwide are estimated to have been diagnosed with malignant brain and central nervous system tumors with over 174,000 deaths. Detection of brain tumors with the exact location and orientation is extremely important for effective diagnosis, treatment planning, and analysis of treatment effectiveness; however, manual delineation of the tumor takes considerable time and is prone to error and wide variability. The overall goal of this proposal is to develop a scalable and automated approach for the segmentation of brain tumors based on Hidden Markov Models (HMMs). The objectives of the project are: 1) Develop a tumor segmentation approach based on a novel utilization of HMMs for automated segmentation of multi-sequence brain MRI data for accurate and robust determination of tumor volume; 2) Design a MapReduce model for the HMM-based brain tumor segmentation approach to enable scalable development of the segmentation processes in a cluster environment; 3) Evaluate the HMM-based brain tumor segmentation framework in terms of accuracy, robustness, and performance in the context of multi-sequence MRI data.
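As a rough illustration of how HMM-based labeling and a map/reduce decomposition fit together, the sketch below runs Viterbi decoding over one-dimensional "scanlines" and sums the tumor pixel counts. It is a toy under stated assumptions, not the project's method: the two states, all probabilities, the dark/bright observations, and the function names are invented for the example.

```python
import math
from functools import reduce

# Assumed toy model: two hidden states and two discretized intensity levels.
STATES = ("normal", "tumor")
START = {"normal": 0.9, "tumor": 0.1}
TRANS = {"normal": {"normal": 0.8, "tumor": 0.2},
         "tumor": {"normal": 0.2, "tumor": 0.8}}
EMIT = {"normal": {"dark": 0.9, "bright": 0.1},
        "tumor": {"dark": 0.1, "bright": 0.9}}


def viterbi(observations):
    """Return the most likely state sequence for a list of observations."""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][obs]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def map_slice(scanline):
    """Map step: segment one scanline independently and count tumor pixels."""
    return viterbi(scanline).count("tumor")


def reduce_counts(a, b):
    """Reduce step: combine per-scanline counts into a total."""
    return a + b


scanlines = [
    ["dark", "dark", "bright", "bright", "bright", "dark"],
    ["dark", "bright", "dark", "dark", "dark", "dark"],  # isolated bright pixel
]
tumor_pixels = reduce(reduce_counts, map(map_slice, scanlines), 0)
print(viterbi(scanlines[0]), tumor_pixels)
```

Each scanline is decoded independently, which is what makes the map step easy to distribute across a cluster. The transition probabilities also smooth away the isolated bright pixel in the second scanline.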
-
Data Mining Software for Large-Scale Analyses of Hepatitis Infections Funded by the CDC
posted on Monday, August 20, 2012
INFOTECH Soft has been awarded a contract by the Centers for Disease Control and Prevention (CDC) National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention (NCHHSTP) to develop data mining software for large-scale analyses of infections caused by hepatitis viruses. The goal of this project is to develop data mining software that extracts, transforms, and loads structured data relating to infection with hepatitis viruses from diverse sources into a warehouse appropriate for mainframe, client/server, and PC platforms. This data will include but may not be limited to demographic, clinical, epidemiological, laboratory and phylogenetic information. The software will store and manage the data in a data warehouse system with a web-based interface to provide data access to the scientific community and analysis of relationships in the stored data using end-user defined queries to discover disease patterns and trends. It is expected that the software will generate associations between epidemiological and laboratory data leading to the discovery of new disease patterns, epidemiological trends and proteomic associations. Such discoveries are expected to lead to new strategies for public health interventions, surveillance, prophylaxis and the development of antivirals and vaccines. This software tool will be applicable not only to hepatitis viruses but other pathogens in the areas of epidemiology, laboratory research and public health.
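The extract, transform, and load steps described above can be sketched with Python's standard library. This is a minimal illustration only: the table layout, field names, and sample records are invented for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Invented extracts from two hypothetical sources (clinical and laboratory).
CLINICAL_CSV = "case_id,age,genotype\nC1,34,1a\nC2,51,1b\nC3,47,1a\n"
LAB_CSV = "case,viral_load_iu_ml\nC1,120000\nC2,\nC3,800000\n"


def extract(text):
    """Extract: parse one delimited source into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(clinical_rows, lab_rows):
    """Transform: join on case id and normalize types; missing loads become NULL."""
    loads = {r["case"]: r["viral_load_iu_ml"] for r in lab_rows}
    for row in clinical_rows:
        raw_load = loads.get(row["case_id"], "")
        yield (row["case_id"], int(row["age"]), row["genotype"],
               int(raw_load) if raw_load else None)


def load(conn, records):
    """Load: write the normalized records into a warehouse table."""
    conn.execute(
        "CREATE TABLE cases (case_id TEXT PRIMARY KEY, age INTEGER, "
        "genotype TEXT, viral_load INTEGER)"
    )
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(CLINICAL_CSV), extract(LAB_CSV)))

# An end-user style query relating epidemiological and laboratory fields.
rows = conn.execute(
    "SELECT genotype, COUNT(*), AVG(viral_load) FROM cases "
    "GROUP BY genotype ORDER BY genotype"
).fetchall()
print(rows)
```

Keeping the three steps as separate functions means a new source format only needs a new extract step, while the warehouse schema and queries stay unchanged.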
About NCHHSTP
The National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention (NCHHSTP) is one of the larger centers at CDC, with a budget of approximately $1 billion. The workforce consists of more than 1,800 full-time employees and contractors, including approximately 300 who are assigned to state and local health departments in the United States. NCHHSTP is composed of an Office of the Director (OD) and four divisions, each of which is defined by the diseases it addresses. Although the divisions have their own missions, the National Center OD provides leadership to help coordinate their efforts and foster collaboration. Center staff work in collaboration with governmental and nongovernmental partners at community, state, national, and international levels to accomplish the NCHHSTP mission. Read more at http://www.cdc.gov/nchhstp/
-
Y3 EC-funded Grant for Transatlantic Tumour Model Repositories
posted on Sunday, July 01, 2012
The project aims to develop a European, clinically oriented, semantic-layered cancer digital model repository from existing EU projects that will be interoperable with the US grid-enabled, semantic-layered digital model repository platform at CViT.org (Center for the Development of a Virtual Tumor, Massachusetts General Hospital (MGH), Boston, USA), which is NIH/NCI-caGRID compatible. This interoperable, CViT-interfaced environment will offer a range of services to international cancer modelers, bio-researchers, and eventually clinicians, aimed at supporting both basic quantitative cancer research and individualized optimization of cancer treatment. This transatlantic project will therefore be the starting point for an international validation environment that will support joint applications, verification, and validation of the clinical relevance of cancer models. To ensure the clinical relevance of this joint effort, the development of the project will be based upon specific clinical scenarios that will be implemented within an integrated EU-US workflow environment prototype for predictive, In Silico Oncology-guided clinical studies that will be deployed towards the end of the project. As an end result, a specific, clinically relevant workflow involving both EU and CViT models will be demonstrated, which will clearly highlight the need for and added value of interoperability. To achieve these goals, multiscale models and tools developed and data collected within the framework of three ongoing EC-funded research projects, namely ACGT [Advancing Clinicogenomic Trials on Cancer], ContraCancrum [Clinically Oriented Cancer Multilevel Modeling], and the VPH NoE [Virtual Physiological Human Network of Excellence], in conjunction with models and data from the NIH-supported ICBP Program CViT.org, will drive the development, optimization, and validation of the integrated system. Thus, a new module of the VPH environment will emerge.
MGH-CViT will assign some limited and well-defined software development work to INFOTECH Soft Inc. in Miami, FL (US), as a third party carrying out part of the work. MGH-CViT has worked successfully with INFOTECH Soft before on DMR Phase I (2008-09), delivering an NIH/NCI caBIG-compliance package ahead of schedule and under budget. MGH-CViT and INFOTECH Soft are currently collaborating on DMR Phase II (2009-10), under an NCI-supported and caBIG-mentored NIH contract.
-
"Developing Applications for Elsevier's SciVerse Platform: Developer Perspective" at the 2012 STM Spring Conference
posted on Wednesday, April 25, 2012
INFOTECH Soft will be participating in the International Association of Scientific, Technical & Medical Publishers STM Spring Conference 2012 in Washington D.C. to present "Developing Applications for Elsevier's SciVerse Platform: Developer Perspective."
The presentation will focus on INFOTECH Soft's partnership with Massachusetts General Hospital to develop the Cancer Research Project for Elsevier's new SciVerse Applications platform. The goal of the Cancer Research Project is to provide cancer researchers with tools to help locate relevant in vivo, in vitro, and in silico biological models.
INFOTECH Soft and Massachusetts General Hospital have co-developed two applications for Elsevier's SciVerse platform to enable cross-database semantic search of National Cancer Institute cancer Biomedical Informatics Grid (caBIG) resources. The Cancer Models Gateway links ScienceDirect searches to computational cancer models in the Center for the Development of a Virtual Tumor’s Digital Model Repository (CViT DMR) and animal models of cancer in the Cancer Models Database (caMOD). The Cancer Images Gateway links ScienceDirect full-text articles to matching content in the Cancer Images Database (caIMAGE).
About the STM
STM is the leading global trade association for academic and professional publishers. It has over 110 members in 21 countries who each year collectively publish nearly 66% of all journal articles and tens of thousands of monographs and reference works. STM members include learned societies, university presses, private companies, new starts and established players.
For more information about the STM Spring Conference 2012:
http://www.stm-assoc.org/events/stm-spring-conference-2012/
-
GeneTegra at the 8th Data Integration in the Life Sciences Workshop (DILS 2012)
posted on Friday, April 13, 2012
A paper on the GeneTegra Information Integration Platform has been accepted for the 8th Data Integration in the Life Sciences Workshop (DILS 2012). In this paper, we present the GeneTegra system, an ontology-based information integration environment. We show its ability to query multiple diverse data sources, and we evaluate the relative performance of different data repositories. GeneTegra uses Semantic Web standards to resolve the semantic and syntactic diversity of the large and increasingly complex body of publicly available data. GeneTegra contains mechanisms to create ontology models of data sources using the OWL Web Ontology Language, and to define, plan, and execute queries against these models using the SPARQL query language. Supported data source formats include relational databases, XML and RDF data sources, and delimited text files. Experimental results show that GeneTegra obtains equivalent results from different data repositories containing the same data, illustrating the ability of the proposed methods to query heterogeneous sources using the same modeling paradigm.
-
Roswell Park Cancer Institute licenses GeneTegra Information Integration Platform
posted on Thursday, March 15, 2012
Roswell Park Cancer Institute recently ordered an institutional license for use of the GeneTegra system. RPCI, located in Buffalo, New York, is America’s oldest cancer center, and is designated a Comprehensive Cancer Center by NCI. INFOTECH Soft conducted a thorough trial lasting several months with IT research personnel at RPCI. During the trial, the application was stringently evaluated for functionality, usability, and security, and INFOTECH Soft’s development and support team provided updates reflecting customer requirements. GeneTegra is currently being used by RPCI cancer researchers to integrate information from several databases containing genetic, tissue, pathology, and clinical data.
About GeneTegra:
GeneTegra is a federated information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. The GeneTegra project implements a local-as-view approach to information integration, useful for situations where there is no global reference ontology or taxonomy linked to data sources. The system uses ontologies as a common model for the syntax and semantics of data sources; it is designed to automatically generate ontologies as common representations of heterogeneous data sources, and to align these ontologies with each other to provide a common environment for the execution of queries. As such, it has resulted in a series of algorithms that are currently being implemented as a comprehensive integration solution, including our ASMOV algorithm for ontology alignment, which has consistently been one of the best performing solutions in the Ontology Alignment Evaluation Initiative (OAEI) contest, our semQA query algebra extension for distributing SPARQL queries, and the semCDI query formulation model, which enables the utilization of terminological and ontological relationships for data aggregation to join and query different data models.
GeneTegra was developed under grant (1R43RR018667) with the National Center for Research Resources (NCRR).
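To make the idea of aligning automatically generated source ontologies more concrete, here is a deliberately simple lexical matcher written with Python's standard library. It is not the ASMOV algorithm, which is not described in this post: the class names, the 0.6 threshold, and the greedy one-to-one strategy are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical class names from two automatically generated source ontologies.
SOURCE_A = ["Patient", "TissueSample", "PathologyReport"]
SOURCE_B = ["patient_record", "tissue_sample", "clinical_visit"]


def normalize(name):
    """Lowercase and strip separators so 'TissueSample' matches 'tissue_sample'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    """Lexical similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def align(left, right, threshold=0.6):
    """Greedy one-to-one alignment: accept best-scoring pairs above the threshold."""
    candidates = sorted(
        ((similarity(a, b), a, b) for a in left for b in right),
        reverse=True,
    )
    used_left, used_right, matches = set(), set(), []
    for score, a, b in candidates:
        if score >= threshold and a not in used_left and b not in used_right:
            matches.append((a, b, round(score, 2)))
            used_left.add(a)
            used_right.add(b)
    return matches


matches = align(SOURCE_A, SOURCE_B)
for a, b, score in matches:
    print(f"{a} <-> {b} (similarity {score})")
```

A purely lexical comparison like this cannot detect terms that look different but mean the same thing, or similar-looking terms with different meanings, so it illustrates only the first step of an alignment.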
-
GeneTegra 1.0 Federated Information Integration Platform Released
posted on Wednesday, February 29, 2012
GeneTegra version 1.0 has been released. GeneTegra is a federated information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. The GeneTegra project implements a local-as-view approach to information integration, useful for situations where there is no global reference ontology or taxonomy linked to data sources. The system uses ontologies as a common model for the syntax and semantics of data sources; it is designed to automatically generate ontologies as common representations of heterogeneous data sources, and to align these ontologies with each other to provide a common environment for the execution of queries. As such, it has resulted in a series of algorithms that are currently being implemented as a comprehensive integration solution, including our ASMOV algorithm for ontology alignment, which has consistently been one of the best performing solutions in the Ontology Alignment Evaluation Initiative (OAEI) contest, our semQA query algebra extension for distributing SPARQL queries, and the semCDI query formulation model, which enables the utilization of terminological and ontological relationships for data aggregation to join and query different data models.
GeneTegra was developed under grant (1R43RR018667) with the National Center for Research Resources (NCRR).
-
cancermodels.infotechsoft.com Cross-Database Search Service for Cancer Models Launches
posted on Thursday, March 01, 2012
INFOTECH Soft has launched http://cancermodels.infotechsoft.com, a cross-database search service to assist cancer researchers in locating relevant cancer research resources.
The Cancer Model Gateway applies caBIG® technologies to provide research scientists with content and information relevant to in-vivo, in-vitro, and in-silico biological models through the Elsevier SciVerse Applications portal. It exploits the synergies between Elsevier document content and access to biological models with related information afforded by The Massachusetts General Hospital’s Center for the Development of a Virtual Tumor, CViT, and other accessible, web-based resources. Specifically, using the CViT semantic infrastructure, the Model Gateway provides access to numerous existing databases and repositories, including public resources such as the Cancer Models Database (caMOD), CViT’s Digital Model Repository (DMR), and Cancer Images Database (caIMAGE).
About caBIG:
caBIG® stands for the cancer Biomedical Informatics Grid.® caBIG® is an information network enabling members of the cancer community – researchers, physicians, and patients – to share data and knowledge. The components of caBIG® are widely applicable beyond cancer as well. The mission of caBIG® is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.
-
Y3 Phase II NCI-funded Project for Semantic Data Integration for Integrative Cancer Biology Research
posted on Wednesday, June 01, 2011
INFOTECH Soft begins the final year of funding for the NCI-funded project for Semantic Data Integration for Integrative Cancer Biology Research.
Recent advances in high-throughput measurements of critical parameters related to cancer genesis and development have led to a wealth of cancer-related information available in public and private databases. To realize the promise of dramatic advancement in integrative cancer research enabled by this rapidly expanding information, novel informatics tools that allow researchers to efficiently integrate this available data are needed. The main objective of this proposal is to enable enhanced understanding and modeling of cancer processes through the development of the Cancer Biology Data Integration (CBDI) System, a collection of caGrid services and grid client applications capable of integrating data and information from disparate sources. The CBDI System aims to exploit the rich semantic metadata information and the robust data exchange standards established through the Cancer Biomedical Informatics Grid (caBIG) Initiative, enhancing its use through the provision of a coherent ontological view of caBIG semantics, so that data sources can be queried using a standard semantic query language. The CBDI System contains an Ontology View Generator to expose caBIG semantics using the Web Ontology Language (OWL) and a distributed Semantic Query Processor that implements the semCDI query model to execute RDF-based queries using SPARQL against caGrid data services. The CBDI System also enables the integration of local private data collected by investigators and research institutions. An intuitive user interface providing multiple visualization and concept searching abilities is used to build queries and view results. In Phase I of the CBDI project, key algorithms and mechanisms of the CBDI System were developed, including the exposure of ontology views and the conversion of queries from SPARQL into caBIG's common query language. In addition, the overall feasibility of the CBDI System was demonstrated with proof-of-concept prototypes of system components.
During Phase II, the complete CBDI System will be implemented and tested with caBIG data services at six research institutions, evaluating its use in a variety of real-world operating conditions and functional scenarios. PUBLIC HEALTH RELEVANCE: The Cancer Biology Data Integration System is a collection of caBIG-compatible services that formulate a coherent ontological view of caBIG semantics so that ontology-based queries can be performed using the SPARQL query language over distributed caBIG-compatible data services.
-
Y2 EC-funded Grant for Transatlantic Tumour Model Repositories
posted on Sunday, May 01, 2011
The project aims to develop a European, clinically oriented, semantic-layered cancer digital model repository from existing EU projects that will be interoperable with the US grid-enabled, semantic-layered digital model repository platform at CViT.org (Center for the Development of a Virtual Tumor, Massachusetts General Hospital (MGH), Boston, USA), which is NIH/NCI-caGRID compatible. This interoperable, CViT-interfaced environment will offer a range of services to international cancer modelers, bio-researchers, and eventually clinicians, aimed at supporting both basic quantitative cancer research and individualized optimization of cancer treatment. This transatlantic project will therefore be the starting point for an international validation environment that will support joint applications, verification, and validation of the clinical relevance of cancer models. To ensure the clinical relevance of this joint effort, the development of the project will be based upon specific clinical scenarios that will be implemented within an integrated EU-US workflow environment prototype for predictive, In Silico Oncology-guided clinical studies that will be deployed towards the end of the project. As an end result, a specific, clinically relevant workflow involving both EU and CViT models will be demonstrated, which will clearly highlight the need for and added value of interoperability. To achieve these goals, multiscale models and tools developed and data collected within the framework of three ongoing EC-funded research projects, namely ACGT [Advancing Clinicogenomic Trials on Cancer], ContraCancrum [Clinically Oriented Cancer Multilevel Modeling], and the VPH NoE [Virtual Physiological Human Network of Excellence], in conjunction with models and data from the NIH-supported ICBP Program CViT.org, will drive the development, optimization, and validation of the integrated system. Thus, a new module of the VPH environment will emerge.
MGH-CViT will assign some limited and well-defined software development work to INFOTECH Soft Inc. in Miami, FL (US), as a third party carrying out part of the work. MGH-CViT has worked successfully with INFOTECH Soft before on DMR Phase I (2008-09), delivering an NIH/NCI caBIG-compliance package ahead of schedule and under budget. MGH-CViT and INFOTECH Soft are currently collaborating on DMR Phase II (2009-10), under an NCI-supported and caBIG-mentored NIH contract.
-
INFOTECH Soft at the 2010 "Ontology Alignment Evaluation Initiative"
posted on Friday, September 03, 2010
We have been invited to present our ASMOV system at the Ontology Alignment Evaluation Initiative (OAEI). According to the preliminary results published by the organizing committee, ASMOV shows the highest accuracy of all participants in the benchmark test. This is consistent with past results, as ASMOV has been one of the best performing systems in all four years in which we have participated. The 2010 OAEI campaign is being held in Shanghai, China, associated with the Ontology Matching Workshop at the International Semantic Web Conference (ISWC).
-
"Grid-based cancer model simulation with CViT’s Computational Model Execution Framework" accepted for the caBIG 2010 Annual Meeting, September 13-15
posted on Thursday, July 15, 2010
INFOTECH Soft will be presenting the poster "Grid-based cancer model simulation with CViT’s Computational Model Execution Framework" at the caBIG® 2010 Annual Meeting in Washington D.C. September 13-15, 2010.
The NIH/NCI-supported Center for the Development of a Virtual Tumor, CViT (PI: T. S. Deisboeck), brings together a multi-institutional, interdisciplinary group of investigators with interest in the biomedical, computational and mathematical aspects of cancer research. To foster the collection and sharing of in silico cancer models and simulation-related workflow designs, and to support access to and integration of relevant tumor biology data from disparate sources, CViT’s Digital Model Repository (DMR) was implemented as a semantically enabled Web-based data store. The CViT DMR was expanded in 2008 to provide a caBIG silver-level compliant data service, allowing client applications to securely upload and access models and model metadata within the repository. It is currently being made interoperable with an upcoming clinical cancer modeling repository in Europe, with support from the European Commission. The Computational Model Execution Framework (CMEF) was developed in 2010 to enable the grid-based execution of the computational cancer models deposited within the DMR. Utilizing CMEF, members of the CViT community are able to select and configure models to be executed, determine the data to be used to run these models, and deposit simulation results back to the repository.
The CMEF integrates seamlessly into the CViT.org website, providing grid-based model execution of Java, C/C++, and R programs on 32- and 64-bit Windows and Linux nodes. This expansion of CViT has been performed without affecting the caBIG silver-level compatibility of the CViT DMR data service. In addition, the functionality added to support the execution of models, and especially the semantic metadata added to the system, has been created in compliance with caBIG silver-level compatibility guidelines in preparation for releasing the CMEF as a silver-level compliant analytical service. This poster describes the expansion of CViT’s Digital Model Repository to support the Computational Model Execution Framework.
About the caBIG 2010 Annual Meeting:
The caBIG® Annual Meeting is intended to address the needs of the information technology, software development, research and informatics communities involved with deploying or using caBIG® tools and technology. The meeting content is designed to help current program participants address technical and cultural challenges related to caBIG® development and deployment, as well as provide ideas for innovative uses of caBIG® infrastructure and services in real-world research environments.
For more information about the caBIG 2010 Annual Meeting:
https://cabig.nci.nih.gov/2010AnnualMeeting
About caBIG:
caBIG® stands for the cancer Biomedical Informatics Grid.® caBIG® is an information network enabling members of the cancer community – researchers, physicians, and patients – to share data and knowledge. The components of caBIG® are widely applicable beyond cancer as well. The mission of caBIG® is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.
-
"The GeneTegra Information Integration System for caGrid data services" accepted for the caBIG 2010 Annual Meeting, September 13-15
posted on Thursday, July 15, 2010
INFOTECH Soft will be presenting the poster "The GeneTegra Information Integration System for caGrid data services" at the caBIG® 2010 Annual Meeting in Washington D.C. September 13-15, 2010.
GeneTegra is a graphical information integration environment designed to facilitate concept-based search and querying of genetics and biomedical data from diverse and heterogeneous data sources. It utilizes Semantic Web technologies to address the two main obstacles in the integration of knowledge: syntactic heterogeneity, where data sources have different representation and access mechanisms, and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms may refer to the same concept. semCDI, a semantic representation of cancer-related data services available through caGrid and a methodology to formulate and execute queries against this representation, has been enhanced to work together with GeneTegra’s mechanisms to generate ontology representations of data sources, discover mappings between these ontology representations, and perform queries against these representations, enabling the integrated querying of sources within and outside of caGrid. In this poster, we present a description of GeneTegra and of its incorporation of semCDI, and show initial results that demonstrate the validity and utility of the system.
-
"Planning and Execution of Queries against caGrid Data Services" accepted for the caBIG 2010 Annual Meeting, September 13-15
posted on Thursday, July 15, 2010
INFOTECH Soft will be presenting the poster "Planning and Execution of Queries against caGrid Data Services" at the caBIG® 2010 Annual Meeting in Washington D.C. September 13-15, 2010.
caGrid provides a standard mechanism for representing the semantics of grid-enabled cancer datasets; however, integrated querying of the datasets at a conceptual level remains a challenge. The semCDI query formulation defines a methodology to specify queries against ontology representations of caGrid data services utilizing the SPARQL Query Language for RDF. The execution of these queries within the caGrid environment requires the resolution of three important issues. First, the queries must be subdivided into sub-queries specific to each data service involved. Second, these SPARQL sub-queries must be converted into queries written in the CQL language used within caGrid. Third, the CQL queries must be planned and executed. In this poster, we present the design and implementation of a mechanism for query planning and execution for semCDI. Specifically, we illustrate the use of the semQA query algebra for SPARQL as the mathematical foundation for the subdivision of queries, we discuss the planning solutions implemented to overcome the limitations of CQL when joins between multiple objects are required, and we illustrate the process of execution of queries against multiple caGrid data services as implemented within our GeneTegra information integration system.
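The first of the issues named above, subdividing a query into sub-queries specific to each data service, can be sketched as follows. This is a simplified illustration and not the semQA algebra or a CQL conversion: the service names, predicates, and routing rule (one owning service per predicate) are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog: which predicates each data service can answer.
SERVICE_PREDICATES = {
    "modelService": {"hasModel", "modelType"},
    "imageService": {"hasImage", "imageModality"},
}

# A query expressed as triple patterns; terms starting with '?' are variables.
QUERY = [
    ("?tumor", "hasModel", "?model"),
    ("?model", "modelType", "in-silico"),
    ("?tumor", "hasImage", "?image"),
]


def subdivide(patterns, catalog):
    """Assign each triple pattern to the single service supporting its predicate."""
    subqueries = defaultdict(list)
    for pattern in patterns:
        owners = [svc for svc, preds in catalog.items() if pattern[1] in preds]
        if len(owners) != 1:
            raise ValueError(f"cannot route pattern {pattern}: owners={owners}")
        subqueries[owners[0]].append(pattern)
    return dict(subqueries)


def variables(patterns):
    """Collect the variables mentioned in a list of triple patterns."""
    return {term for p in patterns for term in p if term.startswith("?")}


def join_variables(subqueries):
    """Variables shared by more than one sub-query; results must be joined on these."""
    seen, shared = set(), set()
    for patterns in subqueries.values():
        found = variables(patterns)
        shared |= seen & found
        seen |= found
    return shared


plan = subdivide(QUERY, SERVICE_PREDICATES)
print(plan)
print(join_variables(plan))
```

The shared variables tell a planner where results from different services have to be joined after each sub-query runs, which is the part that a per-service query language cannot do on its own.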
-
"Ontology Modeling of caBIG Semantics" accepted for the caBIG 2010 Annual Meeting, September 13-15
posted on Thursday, July 15, 2010
INFOTECH Soft will be presenting the poster "Ontology Modeling of caBIG® Semantics" at the caBIG® 2010 Annual Meeting in Washington D.C. September 13-15, 2010.
The semCDI methodology has been proposed to provide knowledge representation and concept-based querying within the caGrid environment using Semantic Web standards. The ultimate objective of semCDI is to enable execution of queries against concepts in NCI Thesaurus capable of returning valid data instances from caGrid data services. The core of semCDI consists of the mechanisms designed to achieve a representation of the caBIG® semantics as ontologies formulated in the Web Ontology Language (OWL). In this poster, we present the rationale for two design choices made within semCDI. First, we discuss the modeling of a subsumption hierarchy consisting of UML domain model classes, caDSR object classes, and NCI Thesaurus concepts, which results in the presence of inconsistencies within the corresponding OWL ontologies. We argue that these inconsistencies are already present within the semantics of caBIG®, and we describe the way in which our querying methods take into account these inconsistencies. Second, we discuss the modeling of UML attributes as datatype properties and caDSR associations as object properties, and their relationship with associations in NCI Thesaurus, exploring the utility of these associations when constructing and executing queries.
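The effect of a subsumption hierarchy on querying can be illustrated with a small sketch: a query against a high-level concept returns instances of every class it subsumes. The identifiers, edges, and instances below are hypothetical, and the code does not model OWL semantics or the inconsistencies discussed above.

```python
# Hypothetical subsumption edges (child -> parent) spanning the three layers:
# UML domain model class -> caDSR object class -> thesaurus concept.
SUBCLASS_OF = {
    "uml:TissueSpecimen": "uml:Specimen",
    "uml:Specimen": "cadsr:Specimen",
    "cadsr:Specimen": "ncit:Biospecimen",  # invented concept identifier
}

# Hypothetical data instances, attached to UML classes only.
INSTANCES = {
    "uml:Specimen": ["s1"],
    "uml:TissueSpecimen": ["s2", "s3"],
}


def ancestors(cls):
    """Walk child -> parent edges, stopping if a class repeats (cycle guard)."""
    seen = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in seen:
        cls = SUBCLASS_OF[cls]
        seen.append(cls)
    return seen


def instances_of(concept):
    """Return instances of the concept itself and of every class it subsumes."""
    return sorted(
        item
        for cls, items in INSTANCES.items()
        if cls == concept or concept in ancestors(cls)
        for item in items
    )


print(instances_of("ncit:Biospecimen"))
print(instances_of("uml:TissueSpecimen"))
```

Querying at the concept level returns all three instances, while querying the most specific class returns only its own two.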
About the caBIG® 2010 Annual Meeting:
The caBIG® Annual Meeting is intended to address the needs of the information technology, software development, research and informatics communities involved with deploying or using caBIG® tools and technology. The meeting content is designed to help current program participants address technical and cultural challenges related to caBIG® development and deployment, as well as provide ideas for innovative uses of caBIG® infrastructure and services in real-world research environments.
For more information about the caBIG 2010 Annual Meeting:
https://cabig.nci.nih.gov/2010AnnualMeeting
-
Semi-Structured Assessment for the Genetics of Alcoholism II (SSAGA-II) for ASPECT Released
posted on Wednesday, July 14, 2010
INFOTECH Soft has completed coding the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA-II) for the ASPECT platform.
ASPECT is a robust and scalable electronic data capture platform for the administration of complex assessments. Mainly targeted toward the complex assessments used in mental health and substance use research, ASPECT provides EDC forms such as the Diagnostic Interview for Genetic Studies (DIGS), the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA-I, SSAGA-II), and the research version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID).
The Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), expressly developed for COGA (http://zork.wustl.edu/niaaa), is a polydiagnostic psychiatric interview that covers the major psychiatric disorders in DSM-III-R and provides complete diagnoses in DSM-III-R and ICD-10 as well as diagnoses for Substance Dependence in Feighner and DSM-IV.
-
INFOTECH Soft Exhibitor at the caBIG 2010 Annual Meeting World's Fair, September 13-15
posted on Thursday, July 01, 2010
INFOTECH Soft will be an exhibitor at the World’s Fair exhibition at the caBIG® Annual Meeting “Building a Collaborative Biomedical Network.” The caBIG® World’s Fair is an annual showcase that highlights community contributions to the program through posters, exhibits, and technology demonstrations. Attendees will have an opportunity to network with colleagues and interact with domain experts and support resources.
About the caBIG 2010 Annual Meeting:
The caBIG® Annual Meeting is intended to address the needs of the information technology, software development, research and informatics communities involved with deploying or using caBIG® tools and technology. The meeting content is designed to help current program participants address technical and cultural challenges related to caBIG® development and deployment, as well as provide ideas for innovative uses of caBIG® infrastructure and services in real-world research environments.
For more information about the caBIG 2010 Annual Meeting:
https://cabig.nci.nih.gov/2010AnnualMeeting
-
Y2 Phase II NCI-funded for Semantic Data Integration for Integrative Cancer Biology Research
posted on Tuesday, June 01, 2010
INFOTECH Soft is beginning the second year of Phase II research and development on the NCI-funded project for Semantic Data Integration for Integrative Cancer Biology Research.
Recent advances in high-throughput measurements of critical parameters related to cancer genesis and development have led to a wealth of cancer-related information available in public and private databases. To realize the promise of dramatic advancement in integrative cancer research enabled by this rapidly expanding information, novel informatics tools that allow researchers to efficiently integrate this available data are needed. The main objective of this proposal is to enable enhanced understanding and modeling of cancer processes through the development of the Cancer Biology Data Integration (CBDI) System, a collection of caGrid services and grid client applications capable of integrating data and information from disparate sources. The CBDI System aims to exploit the rich semantic metadata information and the robust data exchange standards established through the Cancer Biomedical Informatics Grid (caBIG) Initiative, enhancing its use through the provision of a coherent ontological view of caBIG semantics, so that data sources can be queried using a standard semantic query language. The CBDI System contains an Ontology View Generator to expose caBIG semantics using the Web Ontology Language (OWL) and a distributed Semantic Query Processor that implements the semCDI query model to execute RDF-based queries using SPARQL against caGrid data services. The CBDI System also enables the integration of local private data collected by investigators and research institutions. An intuitive user interface providing multiple visualization and concept searching abilities is used to build queries and view results. In Phase I of the CBDI project, key algorithms and mechanisms of the CBDI System were developed, including the exposure of ontology views and the conversion of queries from SPARQL into caBIG's common query language. In addition the overall feasibility of the CBDI System was demonstrated with proof-of-concept prototypes of system components. 
During Phase II, the complete CBDI System will be implemented and tested with caBIG data services at six research institutions, evaluating its use in a variety of real-world operating conditions and functional scenarios. PUBLIC HEALTH RELEVANCE: The Cancer Biology Data Integration System is a collection of caBIG-compatible services that formulate a coherent ontological view of caBIG semantics so that ontology-based queries can be performed using the SPARQL query language over distributed caBIG-compatible data services.
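To make the idea of converting a SPARQL query into caBIG's common query language more concrete, here is a minimal sketch of the translation for the simplest case: one subject variable, one class, and equality constraints on its attributes. It is not the CBDI algorithm. The function name, the pre-parsed triple-pattern input, and the example class and attribute are hypothetical, and the element and attribute names (`CQLQuery`, `Target`, `Group`, `Attribute`, `EQUAL_TO`) are a simplified approximation of CQL that should be checked against the caGrid schema.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "rdf:type"


def triples_to_cql(triples):
    """Translate triple patterns about ONE subject variable into CQL-like XML.

    `triples` is a list of (subject, predicate, object) tuples standing in for
    an already parsed SPARQL basic graph pattern (an assumption of this sketch).
    """
    subjects = {s for s, _, _ in triples}
    if len(subjects) != 1:
        raise ValueError("this sketch handles a single subject variable only")

    # The rdf:type pattern names the class that becomes the query target.
    targets = [o for _, p, o in triples if p == RDF_TYPE]
    if len(targets) != 1:
        raise ValueError("exactly one rdf:type pattern is required")

    query = ET.Element("CQLQuery")
    target = ET.SubElement(query, "Target", name=targets[0])

    # Every other pattern becomes an equality constraint on an attribute.
    constraints = [(p, o) for _, p, o in triples if p != RDF_TYPE]
    parent = target
    if len(constraints) > 1:
        parent = ET.SubElement(target, "Group", logicRelation="AND")
    for prop, value in constraints:
        ET.SubElement(parent, "Attribute",
                      name=prop, value=value, predicate="EQUAL_TO")

    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    pattern = [
        ("?g", RDF_TYPE, "gov.nih.nci.cabio.domain.Gene"),  # illustrative class
        ("?g", "symbol", "BRCA1"),                          # illustrative attribute
    ]
    print(triples_to_cql(pattern))
```

A real translator would also have to handle joins across associations, optional patterns, and filters, none of which this sketch attempts.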
-
Informatics for Data and Resource Discovery in Addiction Research Conference, July 8-9
posted on Tuesday, June 01, 2010
INFOTECH Soft is attending the Informatics for Data and Resource Discovery in Addiction Research conference sponsored by the National Institute on Drug Abuse. The conference takes place in Rockville, Maryland on July 8th and 9th.
About the Conference:
Addiction research is amassing increasing amounts of complex data, and creating greater numbers and types of research resources, ranging from software tools and chemical reagents to animal models, images, biomarkers, biological and behavioral assays, biomaterial repositories, specialized data sources, and web portals; however, most remain hidden in unstructured or semi-structured sources such as journal articles or web pages centered on particular laboratories, institutions or grants. Concurrently, the broader biomedical research community is developing additional tools and data which also can inform and advance addiction research. With over 1500 different databases, alone, useful to neuroscientists, how do addiction researchers find, query, compare, relate, and employ appropriate data and resources efficiently and effectively? Equally important, how do they collect, report and share their own data and resources to make them interoperable and discoverable beyond a single research paper or web posting? To foster knowledge growth in this complex environment, informaticians are turning to resource registries, data federation, semantic tools and other approaches to enable data and resource discovery and analyses, as well as hypothesis generation and testing.
For more information about the conference:
http://www.seiservices.com/nida/1014080/Default.aspx
-
NCI Recertifies INFOTECH Soft as a Licensed caBIG Support Service Provider
posted on Sunday, May 03, 2009
INFOTECH Soft has been recertified by the National Cancer Institute as a Licensed caBIG® Support Service Provider. Our experienced team of researchers and technical staff applies the latest developments in the Semantic Web and a deep understanding of the caBIG® technology stack to create, adapt, and enhance your innovative software solutions.
Our support services enrich the caBIG® software tools and procedures with our own semDS Toolkit to achieve rapid development, deployment, and expansion of caBIG® compatible applications. semDS contains a set of tools that leverage Semantic Web standards to create domain information models, Resource Description Framework (RDF) data stores, and Web Ontology Language (OWL) knowledge models, and to enable semantic querying by converting expressions in the SPARQL Query Language for RDF into CQL, the caBIG® Common Query Language. Our semDS tools streamline the process of creation of caBIG applications and provide unparalleled flexibility to manage their evolution and growth.
-
Y2 NIDA-funded for Automated Development of Electronic Data Capture
posted on Monday, May 03, 2010
INFOTECH Soft continues a second year of research and development on a NIDA-funded research grant for the Automated Development of Electronic Data Capture for Clinical Trials.
In this NIDA-funded research grant, INFOTECH Soft is developing the Automated Development of Electronic Data Capture (AD-EDC) System, a set of graphical tools and automated software applications that will help simplify, automate, standardize, and reduce the cost of creating and reporting the research instruments used in substance abuse clinical trials. The main objectives of the project are to develop technologies that will: (1) significantly reduce the costs and errors involved in developing electronic data capture (EDC) instruments for clinical trials; (2) significantly reduce the costs and errors involved in reporting and sharing study results; (3) interoperate with a wide range of information systems and commercial, off-the-shelf clinical trial management systems (CTMS) and utilize Clinical Data Interchange Standards Consortium (CDISC) standards; and (4) provide non-technical, clinical users an easy-to-use system that includes accessibility support, which will allow users with visual, hearing, physical, and cognitive impairments to effectively participate in clinical trials. The Phase I research effort will focus on defining a high-level architecture for the AD-EDC System, refining functional specifications and user interface designs for the AD-EDC software components, defining an open format for electronic data dictionaries, and performing a cost-benefit analysis to demonstrate that, through software automation and computer-assisted design tools, the project's specific aims are feasible. The Phase II effort will focus on implementing the complete set of study design tools, study reporting tools, and system interoperability components that will be necessary to achieve the project's aims. At the end of Phase II, the AD-EDC System will be pilot tested in a diverse set of real-world clinical settings and a cost-benefit analysis will be performed to validate that the system achieves the project aims.
PUBLIC HEALTH RELEVANCE: The AD-EDC System consists of a set of graphical tools and automated software modules that will help simplify, automate, standardize, and reduce the cost of creating and reporting clinical research instruments used in substance abuse clinical trials. The AD-EDC System is designed to interoperate with commercial, off-the-shelf clinical trials management systems either through a standard CDISC-compliant ODM interface or through custom external system adapters. Standard data representations based on CDISC standards are used by the tools and components of the AD-EDC System to support interoperability and persistence.
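The sketch below suggests how an electronic data dictionary might be exported as CDISC ODM-style metadata. It is not the AD-EDC format: the dictionary layout, the function name, the OID scheme, and the sample questions are hypothetical. The output uses ODM element names (`FormDef`, `ItemGroupDef`, `ItemDef`) but omits the namespace and several attributes a schema-valid ODM file requires.

```python
import xml.etree.ElementTree as ET


def dictionary_to_odm(form_name, items):
    """Build simplified ODM-style metadata for one form.

    `items` is a list of dicts with keys "name", "type", "question" and an
    optional "required" flag (a hypothetical data dictionary layout).
    """
    odm = ET.Element("ODM")
    study = ET.SubElement(odm, "Study", OID="ST.DEMO")
    mdv = ET.SubElement(study, "MetaDataVersion", OID="MDV.1", Name="Draft")

    # The form references one item group that holds every question.
    group_oid = f"IG.{form_name}"
    form = ET.SubElement(mdv, "FormDef", OID=f"F.{form_name}",
                         Name=form_name, Repeating="No")
    ET.SubElement(form, "ItemGroupRef", ItemGroupOID=group_oid, Mandatory="Yes")

    group = ET.SubElement(mdv, "ItemGroupDef", OID=group_oid,
                          Name=form_name, Repeating="No")
    for item in items:
        mandatory = "Yes" if item.get("required") else "No"
        ET.SubElement(group, "ItemRef", ItemOID=f"I.{item['name']}",
                      Mandatory=mandatory)

    # Item definitions carry the data type and the question text.
    for item in items:
        item_def = ET.SubElement(mdv, "ItemDef", OID=f"I.{item['name']}",
                                 Name=item["name"], DataType=item["type"])
        question = ET.SubElement(item_def, "Question")
        ET.SubElement(question, "TranslatedText").text = item["question"]

    return ET.tostring(odm, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"name": "AGE", "type": "integer",
         "question": "What is your age?", "required": True},
        {"name": "NOTES", "type": "text",
         "question": "Additional comments"},
    ]
    print(dictionary_to_odm("DEMOGRAPHICS", sample))
```

Keeping the dictionary as plain data and generating the interchange format from it is one way a single instrument definition could feed both form rendering and an external system adapter.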
-
EC Grant Funded for Transatlantic Tumour Model Repositories
posted on Saturday, May 01, 2010
The project aims at developing a European clinically oriented semantic-layered cancer digital model repository from existing EU projects that will be interoperable with the US grid enabled semantic layered digital model repository platform at CViT.org (Center for the Development of a Virtual Tumor, Massachusetts General Hospital (MGH), Boston, USA) which is NIH/NCI-caGRID compatible. This interoperable, CViT interfaced, environment will offer a range of services to international cancer modelers, bio-researchers and eventually clinicians aimed at supporting both basic cancer quantitative research and individualized optimization of cancer treatment. This Transatlantic project will therefore be the starting point for an international validation environment which will support joint applications, verification and validation of the clinical relevance of cancer models. To ensure the clinical relevance of this joint effort, the development of the project will be based upon specific clinical scenarios that will be implemented within an integrated EU-US workflow environment prototype for predictive, In Silico Oncology-guided clinical studies that will be deployed towards the end of the project. As an end result, a specific, clinically relevant workflow involving both EU and CViT models will be demonstrated, which will clearly highlight the need for and added value of interoperability. To achieve these goals, multiscale models/tools developed and data collected within the framework of three ongoing EC funded research projects namely ACGT [Advancing Clinicogenomic Trials on Cancer], ContraCancrum [Clinically Oriented Cancer Multilevel Modeling] and the VPH NoE [Virtual Physiological Human Network of Excellence], in conjunction with models and data from the NIH supported ICBP Program CViT.org will drive the development, optimization and validation of the integrated system. Thus, a new module of the VPH environment will emerge.
MGH-CViT will assign some limited and well-defined software development work to INFOTECH Soft Inc. of Miami, FL (US), as a third party carrying out part of the work. MGH-CViT has worked successfully with INFOTECH Soft before on DMR Phase I (2008-09), delivering an NIH/NCI caBIG-compliance package ahead of schedule and under budget. MGH-CViT and INFOTECH Soft are currently collaborating on DMR Phase II (2009-10), under an NCI-supported and caBIG-mentored NIH contract.
-
Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA-I) for ASPECT Released
posted on Wednesday, April 14, 2010
INFOTECH Soft has completed coding the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA-I) for the ASPECT platform.
ASPECT is a robust and scalable electronic data capture platform for the administration of complex assessments. Mainly targeted toward the complex assessments used in mental health and substance use research, ASPECT provides EDC forms such as the Diagnostic Interview for Genetic Studies (DIGS), the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), and the research version of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID).
The Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), expressly developed for COGA (http://zork.wustl.edu/niaaa), is a polydiagnostic psychiatric interview that covers the major psychiatric disorders in DSM-III-R and provides complete diagnoses in DSM-III-R and ICD-10 as well as diagnoses for Substance Dependence in Feighner and DSM-IV.
-
Y3 NCRR-funded for Information Integration of Heterogeneous Data Sources
posted on Thursday, April 01, 2010
INFOTECH Soft is starting a third and final year on the NCRR-funded research grant for Information Integration of Heterogeneous Data Sources.
The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this pool of rapidly expanding information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data that can be subsequently analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles have to be overcome in order to attain an effective integration of knowledge from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution building on the core components developed during Phase I, incorporating a highly usable interface for query creation and execution, a mechanism for registration, sharing, and re-use of information using Web Services standards, a mechanism for determining quality of data and query reliability, and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated to specifically address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions. PUBLIC HEALTH RELEVANCE: The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies, and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.
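The two obstacles named above, syntactic heterogeneity and semantic variability, can be illustrated with a toy example. The sketch below is not GeneTegra code: the `TERM_MAP` vocabulary, the `ex:` terms, the function names, and the sample rows are invented for illustration. It reads the same data from a delimited text source and from a relational table, maps the differing column names to shared terms, and shows that one query pattern works over both.

```python
import csv
import io
import sqlite3

# Hypothetical mapping from source-specific column names to shared terms.
# It stands in for the semantic correspondences an ontology would provide.
TERM_MAP = {
    "gene_symbol": "ex:symbol",
    "symbol": "ex:symbol",
    "chrom": "ex:chromosome",
    "chromosome": "ex:chromosome",
}


def load_csv(text):
    """Read a delimited text source into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def load_sqlite(conn, table):
    """Read a relational table into a list of row dicts."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(f"SELECT * FROM {table}")]


def rows_to_triples(rows, key):
    """Turn rows into (subject, predicate, object) triples in shared terms."""
    triples = set()
    for row in rows:
        subject = f"gene:{row[key]}"
        for column, value in row.items():
            predicate = TERM_MAP.get(column)
            if predicate is not None:
                triples.add((subject, predicate, str(value)))
    return triples


def match(triples, predicate, obj):
    """Return subjects that have the given predicate and object, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


def demo():
    """Load the same toy data from two source formats and return both triple sets."""
    csv_rows = load_csv("gene_symbol,chrom\nBRCA1,17\nTP53,17\n")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genes (symbol TEXT, chromosome TEXT)")
    conn.executemany("INSERT INTO genes VALUES (?, ?)",
                     [("BRCA1", "17"), ("TP53", "17")])
    db_rows = load_sqlite(conn, "genes")
    conn.close()

    return (rows_to_triples(csv_rows, key="gene_symbol"),
            rows_to_triples(db_rows, key="symbol"))


if __name__ == "__main__":
    csv_triples, db_triples = demo()
    print(match(csv_triples, "ex:chromosome", "17"))
    print(csv_triples == db_triples)
```

The hand-written `TERM_MAP` is the part a real system would have to derive automatically; here it simply makes the role of a shared vocabulary visible.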
-
Computational Model Execution Framework Completed for CViT.org
posted on Monday, March 08, 2010
INFOTECH Soft has completed work on the Computational Model Execution Framework (CMEF) for the Center for the Development of a Virtual Tumor (CViT).
The CMEF project aims to design, develop, implement, and test methodologies for the execution of computational cancer models deposited within the CViT Digital Model Repository (DMR). In this way, members of the CViT community will be able to select and configure models to be executed, determine the data to be used to run these models, and deposit simulation results back to the repository.
To share tools and data throughout the nine current ICBP Centers, the ICBP relies on access to and integration/usage of relevant in vitro, in vivo and in silico tumor biology data from disparate sources. Usage includes storage, search and retrieval of heterogeneous data formats and qualities. Therefore, a semantically enabled, preferably web-browser accessible knowledge warehouse platform is required: a platform that allows data upload and download to enable community-wide collaborations. Such a platform can best be described as a “semantic digital model repository”. An innovative “semantically enabled digital model repository” must prove its value for the community in a set of use cases that include a range of likely scenarios, such as data sharing, simulation-related workflow designs, and related aspects, such as parallelization, multi-resolution modeling and visualization.
To enable the execution of computational cancer models within the CViT DMR, the project team will design the structure of the semantic metadata necessary to describe computational models in order to enable their deployment for execution. Currently, the CViT DMR contains semantic metadata that specifies a name and description of each model, its ownership and provenance information, and categorizations of the model according to its type and purpose. This semantic metadata will be expanded to include information pertaining to the execution characteristics of a model, including the types of input data parameters needed, its programming language, the type of computing environment or operating system required for its execution, and the nature of the output data to be produced.
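One way to picture the expanded metadata is as a record with descriptive fields plus execution fields. The sketch below is an assumption-laden illustration, not the CViT DMR schema: the class names, field names, and the rule that every execution field must be filled in before a model can run are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputParameter:
    """One input a model needs, with a free-text data type label."""
    name: str
    data_type: str


@dataclass
class ModelMetadata:
    # Descriptive metadata of the kind the repository is said to hold already.
    name: str
    description: str
    owner: str
    provenance: str
    model_type: str
    purpose: str
    # Execution metadata of the kind the project proposes to add.
    inputs: List[InputParameter] = field(default_factory=list)
    programming_language: str = ""
    runtime_environment: str = ""
    output_description: str = ""

    def missing_execution_fields(self) -> List[str]:
        """List execution fields that are still empty.

        Assumption of this sketch: a model needs at least one declared input.
        """
        missing = []
        if not self.inputs:
            missing.append("inputs")
        if not self.programming_language:
            missing.append("programming_language")
        if not self.runtime_environment:
            missing.append("runtime_environment")
        if not self.output_description:
            missing.append("output_description")
        return missing

    def is_executable(self) -> bool:
        """True when every execution field has been filled in."""
        return not self.missing_execution_fields()


if __name__ == "__main__":
    model = ModelMetadata(
        name="demo-model", description="Toy tumor growth model",
        owner="Example Lab", provenance="uploaded for illustration",
        model_type="agent-based", purpose="simulation",
    )
    print(model.missing_execution_fields())
```

A check like `is_executable` could gate which repository entries are offered for execution, while entries with descriptive metadata only remain browsable.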
-
Phase II NCI-funded for Semantic Data Integration for Integrative Cancer Biology Research
posted on Monday, June 01, 2009
INFOTECH Soft has started Phase II of research and development for the NCI-funded project for Semantic Data Integration for Integrative Cancer Biology Research.
Recent advances in high-throughput measurements of critical parameters related to cancer genesis and development have led to a wealth of cancer-related information available in public and private databases. To realize the promise of dramatic advancement in integrative cancer research enabled by this rapidly expanding information, novel informatics tools that allow researchers to efficiently integrate this available data are needed. The main objective of this proposal is to enable enhanced understanding and modeling of cancer processes through the development of the Cancer Biology Data Integration (CBDI) System, a collection of caGrid services and grid client applications capable of integrating data and information from disparate sources. The CBDI System aims to exploit the rich semantic metadata information and the robust data exchange standards established through the Cancer Biomedical Informatics Grid (caBIG) Initiative, enhancing its use through the provision of a coherent ontological view of caBIG semantics, so that data sources can be queried using a standard semantic query language. The CBDI System contains an Ontology View Generator to expose caBIG semantics using the Web Ontology Language (OWL) and a distributed Semantic Query Processor that implements the semCDI query model to execute RDF-based queries using SPARQL against caGrid data services. The CBDI System also enables the integration of local private data collected by investigators and research institutions. An intuitive user interface providing multiple visualization and concept searching abilities is used to build queries and view results. In Phase I of the CBDI project, key algorithms and mechanisms of the CBDI System were developed, including the exposure of ontology views and the conversion of queries from SPARQL into caBIG's common query language. In addition the overall feasibility of the CBDI System was demonstrated with proof-of-concept prototypes of system components. 
During Phase II, the complete CBDI System will be implemented and tested with caBIG data services at six research institutions, evaluating its use in a variety of real-world operating conditions and functional scenarios. PUBLIC HEALTH RELEVANCE: The Cancer Biology Data Integration System is a collection of caBIG-compatible services that formulate a coherent ontological view of caBIG semantics so that ontology-based queries can be performed using the SPARQL query language over distributed caBIG-compatible data services.
-
NCI Certifies INFOTECH Soft as a Licensed caBIG Support Service Provider
posted on Sunday, May 03, 2009
INFOTECH Soft has been certified by the National Cancer Institute as a Licensed caBIG® Support Service Provider. Our experienced team of researchers and technical staff applies the latest developments in the Semantic Web and a deep understanding of the caBIG® technology stack to create, adapt, and enhance your innovative software solutions.
Our support services enrich the caBIG® software tools and procedures with our own semDS Toolkit to achieve rapid development, deployment, and expansion of caBIG® compatible applications. semDS contains a set of tools that leverage Semantic Web standards to create domain information models, Resource Description Framework (RDF) data stores, and Web Ontology Language (OWL) knowledge models, and to enable semantic querying by converting expressions in the SPARQL Query Language for RDF into CQL, the caBIG® Common Query Language. Our semDS tools streamline the process of creation of caBIG applications and provide unparalleled flexibility to manage their evolution and growth.
-
Y2 NCRR-funded for Information Integration of Heterogeneous Data Sources
posted on Wednesday, April 01, 2009
INFOTECH Soft is starting a second year on the NCRR-funded research grant for Information Integration of Heterogeneous Data Sources.
The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this pool of rapidly expanding information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data that can be subsequently analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles have to be overcome in order to attain an effective integration of knowledge from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution building on the core components developed during Phase I, incorporating a highly usable interface for query creation and execution, a mechanism for registration, sharing, and re-use of information using Web Services standards, a mechanism for determining quality of data and query reliability, and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated to specifically address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions. PUBLIC HEALTH RELEVANCE: The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies, and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.
-
Phase II NIDA-funded for Automated Development of Electronic Data Capture
posted on Sunday, May 03, 2009
INFOTECH Soft is beginning Phase II of research and development on a NIDA-funded research grant for the Automated Development of Electronic Data Capture for Clinical Trials.
In this NIDA-funded research grant, INFOTECH Soft is developing the Automated Development of Electronic Data Capture (AD-EDC) System, a set of graphical tools and automated software applications that will help simplify, automate, standardize, and reduce the cost of creating and reporting the research instruments used in substance abuse clinical trials. The main objectives of the project are to develop technologies that will: (1) significantly reduce the costs and errors involved in developing electronic data capture (EDC) instruments for clinical trials; (2) significantly reduce the costs and errors involved in reporting and sharing study results; (3) interoperate with a wide-range of information systems and commercial, off-the-shelf clinical trial management systems (CTMS) and utilize Clinical Data Interchange Standards Consortium (CDISC) standards; and (4) provide non-technical, clinical users an easy to use system that includes accessibility support, which will allow users with visual, hearing, physical, and cognitive impairments to effectively participate in clinical trials. The Phase I research effort will focus on defining a high-level architecture for the AD-EDC System, refining functional specifications and user interface designs for the AD-EDC software components, defining an open format for electronic data dictionaries, and performing a cost-benefit analysis to demonstrate that, through software automation and computer-assisted design tools, the project's specific aims are feasible. The Phase II effort will focus on implementing the complete set of study design tools, study reporting tools, and system interoperability components that will be necessary to achieve the project's aims. At the end of Phase II, the AD-EDC System will be pilot tested in a diverse set of real-world clinical settings and a cost-benefit analysis will be performed to validate that the system achieves the project aims. 
PUBLIC HEALTH RELEVANCE: The AD-EDC System consists of a set of graphical tools and automated software modules that will help simplify, automate, standardize, and reduce the cost of creating and reporting clinical research instruments used in substance abuse clinical trials. The AD-EDC System is designed to interoperate with commercial, off-the-shelf clinical trials management systems either through a standard CDISC-compliant ODM interface or through custom external system adapters. Standard data representations based on CDISC standards are used by the tools and components of the AD-EDC System to support interoperability and persistence.
-
Phase II NCRR-funded for Information Integration of Heterogeneous Data Sources
posted on Tuesday, April 01, 2008
INFOTECH Soft is starting Phase II research and development on the NCRR-funded grant for Information Integration of Heterogeneous Data Sources.
The wealth of biological and biomedical data constantly being generated promises dramatic advancement in the life sciences. To realize this promise, this pool of rapidly expanding information needs to be efficiently integrated, that is, combined in such a way that it can be queried to extract relevant data that can be subsequently analyzed to answer meaningful research questions. The main objective of this proposal is to develop the GeneTegra System, an information integration solution that provides a common interaction environment to query data and knowledge from multiple sources. Two main obstacles have to be overcome in order to attain an effective integration of knowledge from different data sources: syntactic heterogeneity, where data sources have different representation and access mechanisms; and semantic variability, where similar lexical terms may refer to multiple concepts or dissimilar terms refer to the same concept. The GeneTegra System addresses these obstacles through the use of Semantic Web technologies: ontologies constructed using the Web Ontology Language (OWL) as a common data and knowledge representation for data sources of diverse formats, automated mechanisms for the generation and maintenance of these ontology representations, and a robust system architecture based on reusable, service-oriented mediators. The core of the proposed system consists of general algorithms, procedures, and mechanisms developed during Phase I of this project that enable the automatic generation of ontologies, the automated identification of semantic correspondences between ontology models, and the creation and execution of queries over these ontology-modeled, distributed, heterogeneous sources.
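To make the two obstacles concrete, the following is a minimal Python sketch, not GeneTegra's implementation. It assumes two hypothetical sources (a row-per-record table and a tab-delimited text file) that describe the same gene with different structure and different vocabulary. Small per-source "mediator" functions handle the syntactic differences by emitting subject-predicate-object triples, and a hypothetical synonym table handles the semantic differences. All names, fields, and data values are invented for illustration; a real system would use OWL ontologies and SPARQL rather than plain dictionaries.

```python
# Illustrative only: source layouts, field names, the predicate name, and the
# synonym table are hypothetical and are not taken from GeneTegra.

# Two sources holding the same fact in different formats and vocabularies.
relational_rows = [{"gene_symbol": "BRCA1", "disorder": "breast carcinoma"}]
delimited_text = "symbol\tphenotype\nBRCA1\tBreast Cancer\n"

# Semantic variability: dissimilar terms mapped to one shared concept label.
CONCEPT_SYNONYMS = {
    "breast carcinoma": "breast cancer",
    "breast cancer": "breast cancer",
}


def normalize(term):
    """Map a source-specific term to the shared concept label."""
    key = term.strip().lower()
    return CONCEPT_SYNONYMS.get(key, key)


def from_relational(rows):
    """Mediator for the row-per-record source: emit common-model triples."""
    return [
        ("gene:" + row["gene_symbol"], "associatedWith", normalize(row["disorder"]))
        for row in rows
    ]


def from_delimited(text):
    """Mediator for the tab-delimited source: emit the same kind of triples."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    triples = []
    for line in lines[1:]:
        record = dict(zip(header, line.split("\t")))
        triples.append(
            ("gene:" + record["symbol"], "associatedWith", normalize(record["phenotype"]))
        )
    return triples


def query(triples, predicate, obj):
    """Return the distinct subjects linked to the given concept, sorted."""
    target = normalize(obj)
    return sorted({s for s, p, o in triples if p == predicate and o == target})


all_triples = from_relational(relational_rows) + from_delimited(delimited_text)
genes = query(all_triples, "associatedWith", "Breast Carcinoma")
print(genes)
```

Because both mediators emit the same triple for the same underlying fact, a single query over the combined model returns one answer regardless of which source it came from. This is the property the paragraph above attributes to a common ontology-based representation.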
In Phase II, the GeneTegra System will be developed, implemented, and tested as a human-centered solution building on the core components developed during Phase I, incorporating a highly usable interface for query creation and execution, a mechanism for registration, sharing, and re-use of information using Web Services standards, a mechanism for determining quality of data and query reliability, and a security and privacy subsystem that allows the construction of collaborative communities while ensuring that users are properly authenticated and authorized to access information through the system. The GeneTegra System will be designed and evaluated to specifically address the integration of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.
PUBLIC HEALTH RELEVANCE: The GeneTegra System is an information integration solution that provides a common interaction environment to query data and knowledge from multiple heterogeneous sources. It uses ontologies as the base formalism for semantic and syntactic modeling, and contains automated mechanisms for the generation of these ontologies and for the reuse and sharing of integration configurations. It is specifically designed to address the integrated querying of sources relevant to investigations of genotype-phenotype associations and to the identification of genes responsible for human diseases and conditions.