Uncovering Domain Relations by Using Large Language Models as Domain Experts


Suitable for:
Master project

Description:

Imagine an artificial agent tasked with explaining a particular domain, e.g., when assisting students with homework. Such an agent needs access to a model of the knowledge it aims to explain. However, constructing a high-quality knowledge representation is usually expensive, and automating this process for explanation generation is especially difficult: causal relations between entities, as well as their relations to so-called “common sense knowledge”, are often not stated explicitly in the available written sources.

In classical approaches to knowledge base construction, the elicitation of knowledge from human experts has been widely researched (Cooke, 1994). Modern large language models (LLMs) could potentially play the role of such an expert and explicate relations within a particular domain, owing to the massive amounts of data they were trained on and the quality of the representations in their embedding space.

The goal of this thesis is to investigate the use of LLMs for knowledge base construction in the context of explanation generation. This involves researching, implementing, and integrating existing methods from classical knowledge elicitation research with LLM-based approaches to, among others, entity and relation extraction (D. Xu et al., 2024), knowledge distillation (X. Xu et al., 2024), and retrieval-augmented generation (Gao et al., 2024).

References:

Cooke, N. J. (1994). Varieties of knowledge elicitation techniques. International Journal of Human-Computer Studies, 41, 801–849.
Xu, D., Chen, W., Peng, W., Zhang, C., Xu, T., Zhao, X., Wu, X., Zheng, Y., Wang, Y., & Chen, E. (2024). Large Language Models for Generative Information Extraction: A Survey (arXiv:2312.17617). arXiv.
Xu, X., Li, M., Tao, C., Shen, T., Cheng, R., Li, J., Xu, C., Tao, D., & Zhou, T. (2024). A Survey on Knowledge Distillation of Large Language Models (arXiv:2402.13116). arXiv.
Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-Augmented Generation for Large Language Models: A Survey (arXiv:2312.10997). arXiv.

Contact:
Lina Mavrina