Distributed and Self-organizing Systems

Master's Thesis / Bachelor's Thesis

A web-based solution for metadata management in Jupyter Notebooks

Research Area

Web Engineering

Advisers

samuel

gaedke

Description

Efficient metadata management is crucial for the organization, retrieval, and analysis of data within Jupyter Notebooks. As data scientists and researchers increasingly rely on Jupyter Notebooks for their interactive and iterative data exploration and analysis, the need for a streamlined solution to manage metadata becomes evident. A robust metadata management system enhances productivity, collaboration, and reproducibility by systematically organizing information about data sources, analysis workflows, and results. Current methods for managing metadata in Jupyter Notebooks are often ad hoc and manually intensive, leading to inconsistencies and inefficiencies. There is a pressing need for an automated, web-based solution that integrates seamlessly with Jupyter Notebooks, enabling users to efficiently manage metadata without interrupting their workflow. This solution should not only support the capture and organization of metadata but also facilitate easy retrieval, sharing, and updating of this information.

This thesis aims to develop a web-based software tool designed to automate metadata management in Jupyter Notebooks. The user interface of this web-based system must be intuitive and tailored to support data scientists and researchers in automatically capturing and managing metadata, guided by insights from previous studies on metadata management and existing tools in the field. The key features of the solution need to include Automated Metadata Capture, a User-Friendly Interface, Extensibility, and Persistent and Referenceable Metadata. The system needs to automatically extract metadata from Jupyter Notebooks, including information about data sources, code execution, results, and workflow context. The user interface must facilitate easy viewing, editing, and updating of metadata. The system must be easily extensible to accommodate new types of metadata and integrate with additional tools and workflows. The solution must ensure that metadata entries are persisted and can be referenced via persistent URLs, supporting reproducibility and long-term data management needs. The metadata captured from the Jupyter Notebooks is stored as a knowledge graph, enabling querying of the metadata.
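As a rough illustration of automated metadata capture, the following minimal sketch reads the JSON structure of a notebook (nbformat 4 layout) and flattens a few basic properties into subject-predicate-object triples, the basic building block of a knowledge graph. It is only one possible starting point, not the required design: the function names, the vocabulary namespace `https://example.org/nbmeta#`, and the notebook URL are hypothetical placeholders, and the selection of captured fields is an assumption.

```python
import json

# Hypothetical vocabulary namespace and notebook identifier (assumptions for illustration).
VOCAB = "https://example.org/nbmeta#"
NOTEBOOK_URL = "https://example.org/notebooks/demo"


def extract_metadata(notebook: dict) -> dict:
    """Collect basic metadata from a parsed .ipynb document (nbformat 4 layout)."""
    nb_meta = notebook.get("metadata", {})
    cells = notebook.get("cells", [])
    code_cells = [c for c in cells if c.get("cell_type") == "code"]
    return {
        "kernel": nb_meta.get("kernelspec", {}).get("name"),
        "language": nb_meta.get("language_info", {}).get("name"),
        "cell_count": len(cells),
        "code_cell_count": len(code_cells),
        # A code cell counts as executed if it carries an execution counter.
        "executed_cell_count": sum(
            1 for c in code_cells if c.get("execution_count") is not None
        ),
        "output_count": sum(len(c.get("outputs", [])) for c in code_cells),
    }


def to_triples(subject: str, metadata: dict) -> list:
    """Flatten metadata into (subject, predicate, object) triples, skipping missing values."""
    return [
        (subject, f"{VOCAB}{key}", value)
        for key, value in metadata.items()
        if value is not None
    ]


# A tiny in-memory notebook; a real tool would load it with json.load(open(path)).
sample = json.loads("""
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {"name": "python3", "display_name": "Python 3"},
    "language_info": {"name": "python"}
  },
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Demo"},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "source": "1 + 1",
     "outputs": [{"output_type": "execute_result", "execution_count": 1,
                  "data": {"text/plain": "2"}, "metadata": {}}]},
    {"cell_type": "code", "metadata": {}, "execution_count": null, "source": "x = 2",
     "outputs": []}
  ]
}
""")

meta = extract_metadata(sample)
triples = to_triples(NOTEBOOK_URL, meta)

for triple in triples:
    print(triple)
```

In a full solution, such triples would typically be loaded into an RDF store and queried with SPARQL, with the subject URL serving as the persistent reference for the notebook's metadata entry.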

The objective of this thesis is to analyze the current state of metadata management in Jupyter Notebooks, identify existing gaps and requirements, and develop a comprehensive solution that addresses these needs. This includes designing and implementing the web-based tool, followed by an experimental evaluation through a pilot study to demonstrate its effectiveness and usability. By advancing metadata management in Jupyter Notebooks through this web-based system, this thesis aims to significantly enhance the efficiency and effectiveness of data science workflows, promoting better data organization, collaboration, and reproducibility.

