We live in the era of data. Data has become an essential resource, crucial not only for sophisticated mathematical models but also for commonplace applications. Regardless of the field or domain, data needs to be shared in order to improve these tools, which means that systems must speak the same language. To address this challenge, ontologies are growing in popularity. These are formal structures of knowledge that enable a common understanding of the concepts of a specific domain, both by humans and by machines. However, creating them is a complex task that requires knowledge of both the domain and ontology engineering. Many ontologies already exist, and extending them with new concepts and paradigms is common practice. Extending existing ontologies also promotes standardization and the reuse of standards, and avoids “reinventing the wheel”.
In parallel, the fast evolution of Artificial Intelligence is opening up many possibilities for automating tasks that have hitherto been completely manual. In particular, the use of Large Language Models (LLMs) is booming, though their use to support the extension of existing ontologies has not yet been exploited. This research aims to answer the question: How can LLMs be integrated into a semi-automated and domain-independent process for the extension of existing ontologies?
This master’s thesis, carried out in collaboration with TNO, follows the Design Science Research Approach. The problems associated with ontology engineering and with the integration of LLMs in the ontology extension process are analyzed by interviewing 11 ontology engineers and experts in the field. The analysis of the interviews is used to generate a model of the current ontology extension process and to produce a set of 22 high-level design requirements that guide the design of a process framework for human-LLM collaboration in the ontology extension process. The proposed design prototype consists of an extended version of the current ontology extension process model, in which LLMs assist with various ontology extension tasks and additional ontology engineering tools and stakeholders are incorporated into the process. The design includes a set of prompt templates that the ontology engineer can customize to extend an ontology in any domain. The final design is demonstrated and evaluated with a real use case, which consists of extending the Common Greenhouse Ontology (CGO), developed by TNO, with the use case Semantic Explanation and Navigation System (SENS), previously executed by TNO. The demonstration and evaluation are performed using OpenAI’s model GPT-4 Omni, through the creation of a GPT Assistant.
The ontology extension generated using the process framework and the GPT Assistant is similar to the manually crafted extension. Remarkably good results are achieved in tasks such as defining business scenarios and creating a glossary of terms, generating Competency Questions (CQs), formalizing the ontology (in OWL) and the CQs (as SPARQL queries), and verifying the generated ontology extension using mock data and CQs. An end-user (an ontology engineer) is asked to use the proposed process framework prototype and qualitatively evaluate the tasks. The end-user concludes that this approach is very useful for ontology engineers with various levels of expertise, helping them to structure the current ontology extension process and increase its quality.
Although the proposed design holds great potential to be implemented and integrated with the methodologies and tools currently used by ontology engineers, several challenges need to be tackled before LLMs are widely adopted and integrated in the ontology extension process. These challenges go beyond the technical performance of LLMs and revolve around the societal implications of using this technology. The loss of enriching human interactions and expertise, and the environmental and ethical impact, are major concerns for ontology engineers. These must be addressed before the potential of LLMs for the ontology extension process can be realized.