Master's Thesis / Bachelor's Thesis
A reproducibility service for extracting metadata from PubMed Central
Research Area
Advisers
Description
Reproducibility is a fundamental aspect of scientific
research, ensuring that results can be independently verified and extended by other
researchers. PubMed Central, a digital repository of biomedical and life sciences journal
literature, hosts a vast array of publications. However, the extraction and organization
of reproducibility metadata from these publications remain a challenging task. This thesis
aims to develop a reproducibility service built on top of state-of-the-art
tools, designed to automate the extraction of metadata and facilitate the reproduction of
scientific results.
This service will regularly collect
publications from PubMed Central, extract relevant information, and identify references to
GitHub repositories and Jupyter Notebooks. By cloning these repositories and setting up
the appropriate execution environments, the service will enable seamless reproduction of
the computational aspects of research findings. This approach is intended to
improve the efficiency and reliability of reproducibility efforts in the
scientific community.
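The step of identifying references to GitHub repositories and Jupyter Notebooks can be illustrated with a minimal sketch. It assumes the article text has already been obtained as a plain string. The function name `find_github_references` and the URL pattern are illustrative assumptions, not part of any existing tool, and a real service would likely need to handle more URL variants.

```python
import re

# Assumed pattern: github.com/<owner>/<repo> followed by an optional path.
GITHUB_URL = re.compile(
    r"https?://(?:www\.)?github\.com/"
    r"([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)(?:/[^\s<>()\[\]]*)?"
)


def find_github_references(text):
    """Return (repositories, notebooks) found in the given article text.

    repositories: sorted list of unique (owner, repo) pairs
    notebooks:    sorted list of unique URLs ending in ".ipynb"
    """
    repositories = set()
    notebooks = set()
    for match in GITHUB_URL.finditer(text):
        owner, repo = match.groups()
        # Drop sentence punctuation and a ".git" suffix from the repo name.
        repo = repo.rstrip(".")
        if repo.endswith(".git"):
            repo = repo[:-4]
        repositories.add((owner, repo))
        # Strip trailing punctuation before checking for a notebook link.
        url = match.group(0).rstrip(".,;")
        if url.endswith(".ipynb"):
            notebooks.add(url)
    return sorted(repositories), sorted(notebooks)
```

Calling `find_github_references(article_text)` yields the repositories that could be cloned and the notebooks that are linked directly, which would then feed into the environment setup step.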
In more detail, the service should build on
state-of-the-art tools to automate the extraction of
reproducibility metadata from PubMed Central publications. The user interface of this
service should be intuitive, supporting researchers and data scientists in efficiently
accessing and reproducing research results. The system architecture must ensure robust and
accurate metadata extraction, repository cloning, and environment setup. The key feature
of the solution will be automated metadata extraction, which will use existing
tools to extract metadata from PubMed Central publications, including
references to GitHub repositories and Jupyter Notebooks. The service will continuously
collect and update publications from PubMed Central so that its dataset remains
current and comprehensive. The solution will provide a facility to clone
referenced GitHub repositories and set up the necessary execution environments for Jupyter
Notebooks, facilitating seamless reproduction of computational research. A web-based UI
will allow users to easily search for publications, review extracted metadata, and
initiate the reproduction process. The solution will support seamless integration with
existing research tools and platforms, enhancing the overall research workflow. The system
should be designed to be easily extensible to accommodate new metadata types,
repositories, and computational environments.
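As an illustration of the environment setup step, the following sketch inspects an already cloned repository, picks a dependency file, and lists the notebooks it contains. The file names and their priority order in `ENVIRONMENT_FILES`, as well as the function names, are assumptions made for this example. The list is kept as plain data so that further environment types could be added later.

```python
from pathlib import Path

# Assumed priority order of dependency files; a real service may need more cases.
ENVIRONMENT_FILES = [
    ("environment.yml", "conda"),
    ("environment.yaml", "conda"),
    ("requirements.txt", "pip"),
    ("Pipfile", "pipenv"),
    ("pyproject.toml", "pip"),
]


def detect_environment(repo_dir):
    """Return (tool, path) for the first known dependency file, or None."""
    root = Path(repo_dir)
    for filename, tool in ENVIRONMENT_FILES:
        candidate = root / filename
        if candidate.is_file():
            return tool, candidate
    return None


def list_notebooks(repo_dir):
    """Return all Jupyter Notebook files in the repository, sorted by path."""
    return sorted(
        path
        for path in Path(repo_dir).rglob("*.ipynb")
        # Skip autosave copies that Jupyter keeps in checkpoint folders.
        if ".ipynb_checkpoints" not in path.parts
    )
```

A result of `None` from `detect_environment` would indicate that the repository declares no dependencies in a recognized form, which the service could record as part of the reproducibility metadata.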
The objective of this
thesis is to analyze the current state of reproducibility efforts in scientific research,
identify existing challenges, and develop a comprehensive solution that addresses these
needs. This includes designing and implementing the reproducibility service, followed by
an experimental evaluation through a pilot study to demonstrate its effectiveness and
usability. By automating these steps, this
thesis aims to improve the efficiency and reliability of reproducing
scientific results, thereby strengthening the credibility and robustness of biomedical and
life sciences research.