New Release! Version 2.2 of the ProvCaRe repository is now available, with 166 million provenance triples extracted from more than 1.6 million full-text articles.
Publication: ProvCaRe paper selected as a finalist for the AMIA Distinguished Paper Award!
Publication: ProvCaRe paper published in the International Journal of Medical Informatics.
What is ProvCaRe?
The Provenance for Clinical and Healthcare Research (ProvCaRe) framework aims to model, extract, and analyze provenance information to support reproducibility of research studies. The ProvCaRe framework consists of:
- S3 Model: extends the World Wide Web Consortium (W3C) PROV specifications to model provenance metadata describing the Study Method, Study Tool, and Study Data of a research study. The S3 Model is formalized in the ProvCaRe ontology (available for download from this site).
- Provenance Text Processing Pipeline: uses the ProvCaRe ontology to identify and extract provenance metadata from published articles describing biomedical research studies.
- Provenance Knowledge Repository: stores provenance information extracted from published research studies, which users can query and explore using "hypothesis-based search queries". Query results are ranked using a provenance-based ranking approach that characterizes the reproducibility of each research study based on its associated provenance metadata.
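To give a feel for what the extraction step produces, here is a deliberately simplified Python sketch, assuming a small hypothetical lexicon that maps terms to S3 categories. It uses plain word-boundary matching and emits tuples loosely resembling provenance triples; the actual ProvCaRe pipeline is not described on this page and should not be assumed to work this way. The lexicon entries, the `extract_triples` function, and the article identifier are all invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping lower-case terms to an S3 category.
# These entries are invented for illustration; they are not taken
# from the ProvCaRe ontology.
LEXICON: Dict[str, str] = {
    "randomized controlled trial": "StudyMethod",
    "polysomnography": "StudyTool",
    "apnea-hypopnea index": "StudyData",
    "body mass index": "StudyData",
}

# (article identifier, S3 category, matched term)
Triple = Tuple[str, str, str]


def extract_triples(article_id: str, text: str, lexicon: Dict[str, str]) -> List[Triple]:
    """Return one (article_id, category, term) tuple per lexicon term found in text."""
    lowered = text.lower()
    triples: List[Triple] = []
    for term, category in lexicon.items():
        # Word-boundary match so a term is not found inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            triples.append((article_id, category, term))
    return triples


if __name__ == "__main__":
    sentence = ("In this randomized controlled trial, sleep was recorded with "
                "polysomnography and the apnea-hypopnea index was computed.")
    for triple in extract_triples("article-1", sentence, LEXICON):
        print(triple)
```

A real pipeline would also need to handle synonyms, negation, and values attached to terms; the sketch only shows the shape of the output.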
What is Provenance?
Provenance describes the source or history of data, for example, the study protocol, the methods used to assign values to study variables, and the procedures used to curate or process data.
Features of the ProvCaRe Framework
ProvCaRe S3 Model
The S3 Model represents the three core categories of provenance metadata required to support scientific reproducibility:
- Study Method describes the approach used to conduct the research study.
- Study Data describes the category, type, and threshold value of the data recorded and analyzed in the study.
- Study Tool describes the instruments used to record data, their parameters, and their capabilities.
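One way to picture the three S3 categories as a data structure is sketched below. The `S3Category` enum and `StudyProvenance` record are hypothetical names, not part of the ProvCaRe ontology; the sketch simply records extracted terms per category and reports which categories are still empty, i.e., which kinds of provenance a study did not describe.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class S3Category(Enum):
    """The three S3 provenance categories."""
    STUDY_METHOD = "Study Method"
    STUDY_DATA = "Study Data"
    STUDY_TOOL = "Study Tool"


@dataclass
class StudyProvenance:
    """Hypothetical record type; not defined by the ProvCaRe ontology."""
    study_id: str
    terms: Dict[S3Category, List[str]] = field(default_factory=dict)

    def add(self, category: S3Category, term: str) -> None:
        """Record one provenance term under the given S3 category."""
        self.terms.setdefault(category, []).append(term)

    def missing_categories(self) -> Set[S3Category]:
        """S3 categories for which no provenance term has been recorded."""
        return {c for c in S3Category if not self.terms.get(c)}


if __name__ == "__main__":
    record = StudyProvenance("study-1")
    record.add(S3Category.STUDY_METHOD, "cohort study")
    # A Study Data entry carrying a threshold value.
    record.add(S3Category.STUDY_DATA, "apnea-hypopnea index >= 15")
    # Only STUDY_TOOL is still missing for this study.
    print(record.missing_categories())
```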
The S3 Model is formalized in the ProvCaRe ontology, which is expressed in the W3C Web Ontology Language (OWL 2) and extends the W3C PROV Ontology.
The ProvCaRe ontology is designed to support provenance extraction from unstructured text, querying of the ProvCaRe knowledgebase, and ranking of query results.
The ProvCaRe ontology OWL file is available for download from this site.
The ProvCaRe platform supports hypothesis-driven search, which lets users find research studies that address a given hypothesis, together with the provenance information extracted from the PubMed articles describing those studies.
Search results are ranked using a provenance-based ranking algorithm that orders research studies according to whether they provide the provenance metadata needed to support scientific reproducibility.
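The ranking algorithm itself is not specified on this page. As a rough illustration only, the sketch below keeps studies whose hypothesis text mentions every query keyword, then orders them by how many of the three S3 categories have extracted provenance, breaking ties by the total number of provenance terms. The keyword filter, the scoring rule, the `Study` type, and the sample data are all hypothetical; this is not the ProvCaRe ranking method.

```python
from dataclasses import dataclass
from typing import Dict, List

S3_CATEGORIES = ("StudyMethod", "StudyData", "StudyTool")


@dataclass
class Study:
    study_id: str
    hypothesis: str
    # Hypothetical: provenance terms extracted per S3 category.
    provenance: Dict[str, List[str]]


def coverage(study: Study) -> int:
    """Number of S3 categories with at least one provenance term (0 to 3)."""
    return sum(1 for c in S3_CATEGORIES if study.provenance.get(c))


def term_count(study: Study) -> int:
    """Total number of provenance terms across all S3 categories."""
    return sum(len(study.provenance.get(c, [])) for c in S3_CATEGORIES)


def rank(studies: List[Study], keywords: List[str]) -> List[Study]:
    """Keep studies whose hypothesis mentions all keywords; richest provenance first."""
    kws = [k.lower() for k in keywords]
    hits = [s for s in studies if all(k in s.hypothesis.lower() for k in kws)]
    # Sort by category coverage, then by total term count, both descending.
    return sorted(hits, key=lambda s: (coverage(s), term_count(s)), reverse=True)


if __name__ == "__main__":
    studies = [
        Study("A", "Sleep apnea increases hypertension risk",
              {"StudyMethod": ["cohort study"], "StudyData": ["blood pressure"]}),
        Study("B", "Sleep apnea is associated with hypertension",
              {"StudyMethod": ["cross-sectional study"],
               "StudyData": ["apnea-hypopnea index", "blood pressure"],
               "StudyTool": ["polysomnography"]}),
        Study("C", "Caffeine intake affects sleep duration",
              {"StudyTool": ["actigraphy"]}),
    ]
    print([s.study_id for s in rank(studies, ["sleep apnea", "hypertension"])])  # ['B', 'A']
```

Study B ranks above A because its extracted provenance covers all three S3 categories, while A lacks any Study Tool information; C is dropped because its hypothesis does not match the query.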
For more details on our work, please see our publications:
Sahoo SS, Valdez J, Kim M, Rueschman M, Redline S. ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata. International Journal of Medical Informatics. 2019;121:10-18.