Distributed and Self-organizing Systems

Master's thesis / Bachelor's thesis

A reproducibility service for extracting metadata from PubMedCentral

Research Area

Web Engineering

Advisers

Samuel

Description

Reproducibility is a fundamental aspect of scientific research: it ensures that results can be independently verified and extended by other researchers. PubMed Central, a digital repository of biomedical and life sciences journal literature, hosts a vast number of publications. However, extracting and organizing reproducibility metadata from these publications remains challenging. This thesis aims to develop a reproducibility service, built on state-of-the-art tools, that automates the extraction of this metadata and facilitates the reproduction of scientific results.

The service will regularly collect publications from PubMed Central, extract relevant information, and identify references to GitHub repositories and Jupyter Notebooks. By cloning these repositories and setting up the appropriate execution environments, it will allow the computational parts of published findings to be reproduced with little manual effort. Automating this end-to-end workflow is intended to make reproducibility efforts in the scientific community more efficient and reliable.
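As an illustration of the collection and identification steps, the following Python sketch assumes that publications are queried through NCBI's E-utilities ESearch endpoint and retrieved as JATS XML (for example via EFetch with `db=pmc`). The description does not name a specific harvesting interface, so this choice is an assumption; PMC's OAI-PMH service would be an alternative. The helper names `esearch_url` and `extract_references` and the regular expressions are hypothetical. The sketch looks for GitHub links both in JATS `ext-link` elements and in the running text, normalizes them to `https://github.com/<owner>/<repo>`, and collects `.ipynb` references.

```python
import re
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical helpers: the endpoint choice and the patterns are assumptions.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Matches https://github.com/<owner>/<repo>; the repo name stops at "/".
GITHUB_RE = re.compile(r"https?://(?:www\.)?github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE)
# Matches file names or URLs ending in .ipynb.
NOTEBOOK_RE = re.compile(r"[\w./:-]+\.ipynb\b", re.IGNORECASE)


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build an ESearch URL that returns matching PMC IDs as JSON."""
    params = {"db": "pmc", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def extract_references(jats_xml: str) -> dict:
    """Collect GitHub repositories and notebook references from a JATS article."""
    root = ET.fromstring(jats_xml)
    # Tagged links first: JATS marks external URLs with <ext-link xlink:href=...>.
    candidates = [el.get(XLINK_HREF, "") for el in root.iter("ext-link")]
    # Also scan the running text, since URLs are not always tagged.
    candidates.append(" ".join(root.itertext()))
    repos, notebooks = set(), set()
    for text in candidates:
        for owner, repo in GITHUB_RE.findall(text):
            # Drop sentence-final dots and a trailing ".git" suffix.
            repo = repo.rstrip(".").removesuffix(".git")
            repos.add(f"https://github.com/{owner}/{repo}")
        notebooks.update(NOTEBOOK_RE.findall(text))
    return {"repositories": sorted(repos), "notebooks": sorted(notebooks)}
```

The regular expressions are deliberately simple. Links broken across lines, shortened URLs, or code hosted elsewhere (for example GitLab or Zenodo) would need additional extractors.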

The service will use state-of-the-art tools to automate the extraction of reproducibility metadata from PubMed Central publications. Its user interface should be intuitive and help researchers and data scientists access and reproduce research results efficiently, and its architecture must make metadata extraction, repository cloning, and environment setup robust and accurate.

The key features of the solution will include automated metadata extraction, which will use existing tools to extract metadata from PubMed Central publications, including references to GitHub repositories and Jupyter Notebooks. The service will continuously collect new and updated publications from PubMed Central so that its dataset stays current and comprehensive. It will clone the referenced GitHub repositories and set up the execution environments that their Jupyter Notebooks require, so that computational research can be reproduced without manual setup. A web-based UI will allow users to search for publications, review the extracted metadata, and start the reproduction process. The solution will also integrate with existing research tools and platforms to fit into researchers' established workflows. Finally, the system should be designed to be easy to extend with new metadata types, repositories, and computational environments.
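The cloning and environment-setup feature, together with the extensibility requirement, could be organized around a registry of environment handlers. The sketch below is a hypothetical design, not the thesis's implementation. It maps specification files found in a cloned repository (`environment.yml`, `requirements.txt`, `pyproject.toml`) to setup commands and lists the notebooks to execute. The names `register`, `plan_setup`, `find_notebooks`, `clone_command`, and `execute_command` are illustrative. The functions only build command lines for `git clone`, `conda`/`pip`, and `jupyter nbconvert --execute`; actually running them, for example with `subprocess.run` inside an isolated container, is left out. Existing tools such as Project Jupyter's repo2docker already recognize many more configuration formats.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical extensible registry: each entry pairs a spec file name with a
# function that builds the setup command. Earlier registrations win.
EnvHandler = Callable[[Path], list[str]]
HANDLERS: list[tuple[str, EnvHandler]] = []


def register(spec_file: str):
    """Decorator that adds a handler for repositories containing spec_file."""
    def decorator(fn: EnvHandler) -> EnvHandler:
        HANDLERS.append((spec_file, fn))
        return fn
    return decorator


@register("environment.yml")
def conda_env(repo: Path) -> list[str]:
    return ["conda", "env", "create", "--file", str(repo / "environment.yml")]


@register("requirements.txt")
def pip_requirements(repo: Path) -> list[str]:
    return ["python", "-m", "pip", "install", "-r", str(repo / "requirements.txt")]


@register("pyproject.toml")
def pip_project(repo: Path) -> list[str]:
    return ["python", "-m", "pip", "install", str(repo)]


def plan_setup(repo: Path) -> Optional[list[str]]:
    """Return the setup command for the first registered spec file present."""
    for spec_file, handler in HANDLERS:
        if (repo / spec_file).is_file():
            return handler(repo)
    return None


def find_notebooks(repo: Path) -> list[Path]:
    """List notebooks relative to repo, skipping Jupyter checkpoint copies."""
    return sorted(p.relative_to(repo) for p in repo.rglob("*.ipynb")
                  if ".ipynb_checkpoints" not in p.parts)


def clone_command(url: str, dest: Path) -> list[str]:
    # Shallow clone of the default branch (assumption: latest commit suffices).
    return ["git", "clone", "--depth", "1", url, str(dest)]


def execute_command(notebook: Path) -> list[str]:
    # Re-run every cell and overwrite the notebook with the fresh outputs.
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute",
            "--inplace", str(notebook)]
```

Supporting a new kind of environment would then mean registering one more handler, leaving the rest of the pipeline unchanged.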

The objective of this thesis is to analyze the current state of reproducibility efforts in scientific research, identify existing challenges, and develop a comprehensive solution that addresses them. This includes designing and implementing the reproducibility service, followed by an experimental evaluation in a pilot study to demonstrate its effectiveness and usability. Through this automated system, the thesis aims to make the reproduction of scientific results more efficient and reliable, thereby strengthening the credibility and robustness of biomedical and life sciences research.

