Master's thesis
Building a Web Crawler to collect a dataset of websites with chatbots
Completion
2024/12
Research Area
Advisers
Description
Chatbots are an emerging technology that has been used increasingly
in recent years to interact with users on websites. Research on chatbots in
human-computer interaction is currently spread over several areas, covering both the
graphical and the conversational side of the interface. To date, no dataset of
websites with chatbots is available that could serve as a common basis for research "in
the wild". Most existing datasets either concentrate on one specific way of integrating a
chatbot or contain conversations from one specific chatbot. There is therefore a need for a
holistic dataset of websites with chatbots.
The task is to survey existing web
crawlers and then to build and evaluate a web crawler that addresses the problem described
above. The crawler should find chatbots of the following types (and their
variations): chatbots provided by third-party platforms, chatbots developed for and embedded
in the website itself, and websites that use an API to generative AI chatbots. The following
may be included but are not a focus: chatbots on social media or social networks, and
generative AI websites such as Bard or ChatGPT. The thesis must include a state-of-the-art
review of current web crawler technology and of available chatbot datasets, evaluated
against previously elicited requirements. Based on this evaluation, a concept
for the web crawler has to be designed and implemented. The feasibility of the web crawler
has to be evaluated in terms of the number of chatbot instances found, the coverage of the
chatbot types listed above, and the breadth of the dataset. It should also be compared
against currently available web crawlers that pursue a similar goal.
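
As an illustration of the first chatbot type (third-party platforms), one possible detection heuristic is to compare the hosts of embedded scripts and iframes on a page against a list of known widget hosts. The following is a minimal sketch under stated assumptions, not the method prescribed by this topic: the host names in `WIDGET_HOSTS`, the platform labels, and the function name `detect_chatbot_platforms` are all hypothetical placeholders, and a real crawler would need a curated signature list plus separate heuristics for self-developed chatbots and generative AI integrations.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical signature table: script/iframe host -> platform label.
# These hosts are placeholders, not real chatbot vendors.
WIDGET_HOSTS = {
    "widget.chat-vendor-a.example": "vendor-a",
    "cdn.chat-vendor-b.example": "vendor-b",
}


class SrcCollector(HTMLParser):
    """Collects the src attribute of every <script> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def detect_chatbot_platforms(html):
    """Return the set of platform labels whose widget host is embedded in html."""
    collector = SrcCollector()
    collector.feed(html)
    found = set()
    for src in collector.sources:
        # hostname is None for relative URLs, which never match a signature.
        host = urlparse(src).hostname
        if host in WIDGET_HOSTS:
            found.add(WIDGET_HOSTS[host])
    return found
```

Applied to a page that loads a script from one of the listed hosts, the function returns that platform's label; pages without a matching host yield an empty set. Because the check only inspects static markup, widgets injected at runtime by JavaScript would be missed unless the page is rendered first.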