Generating dialog-based analogical explanations about everyday tasks

This project is part of the SAIL network (Sustainable Life-Cycle of Intelligent Socio-Technical Systems) funded by the Ministry of Culture and Science of the State of NRW.

Within SAIL we contribute to the research theme R1: “Human agency to shape cooperative intelligence”. There are many tasks and domains in which humans can benefit from cooperating with intelligent systems, provided that the system correctly responds to the needs of the user and the user has the agency to influence the system’s response, for example by setting goals or providing feedback. We are developing an assistive system that supports users in accomplishing everyday tasks at home. The system interacts with users by means of spoken dialogue and provides analogical explanations that build on the user’s existing knowledge and enable its transfer to a new domain. By using analogies as a natural means of explanation, we aim to make the system cognitively ergonomic. We also research the effects of such a system by evaluating its use under field conditions.

Our approach

We tackle the problem of planning an appropriate analogical explanation based on both (1) domain knowledge and (2) mechanisms of dialogue interaction. Both aspects are captured in the system’s conversational memory, a graph-based representation that integrates domain knowledge with dialogue state and history information. When explaining a new domain to the user, the system applies Gentner’s structure-mapping theory of human analogical reasoning to identify candidate domains that can serve as the source of an analogy for the current explanandum (the thing being explained). At the same time, it evaluates these candidates with respect to various constraints imposed by the social nature of adaptive explanations.
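
As a rough illustration of this selection step (a minimal sketch, not the project’s implementation), the following Python snippet scores candidate source domains against an explanandum over a toy memory of relational triples; the domains, relations, and brute-force mapping search are all invented for this example.

```python
from itertools import permutations

# Each domain in the toy memory is a set of relational triples
# (relation, subject, object) -- a graph in flattened form.
MEMORY = {
    "espresso_machine": {("heats", "boiler", "water"),
                         ("pushes", "pump", "water"),
                         ("flows_through", "water", "coffee_grounds")},
    "central_heating":  {("heats", "boiler", "water"),
                         ("pushes", "pump", "water"),
                         ("flows_through", "water", "radiator")},
    "bicycle":          {("transfers", "chain", "force")},
}

def structural_overlap(source, target):
    """Count relations shared under a consistent entity mapping --
    a crude, brute-force stand-in for structure mapping."""
    src_entities = sorted({e for (_, a, b) in source for e in (a, b)})
    tgt_entities = {e for (_, a, b) in target for e in (a, b)}
    best = 0
    for perm in permutations(tgt_entities, min(len(src_entities), len(tgt_entities))):
        mapping = dict(zip(src_entities, perm))
        mapped = {(r, mapping.get(a, a), mapping.get(b, b)) for (r, a, b) in source}
        best = max(best, len(mapped & target))
    return best

explanandum = MEMORY["espresso_machine"]
scores = {name: structural_overlap(domain, explanandum)
          for name, domain in MEMORY.items() if domain is not explanandum}
print(max(scores, key=scores.get))  # -> central_heating
```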

Based on observable signals from the explainee (the agent receiving the explanation, in this case the user), the explainer makes predictions about relevant mental states of the explainee, e.g., understanding, interest, or engagement, and adapts its behaviour and the presentation of the explanandum accordingly. We are working on mechanisms that allow the system to predict what the user might have inferred from previously established information, how well previous information might have been retained, what kind of user feedback is to be expected in response to an explanation, and so on. These predictions serve as constraints for explanation planning and the selection of analogy sources.
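
A minimal sketch of one such prediction, assuming a simple Bayesian formulation: the system maintains a probability that the user has understood the last explanation move and updates it from observed feedback signals. The signal likelihoods below are made-up placeholders, not project data.

```python
# P(signal | understood), P(signal | not understood) -- invented values.
LIKELIHOOD = {
    "nod":                   (0.70, 0.20),
    "verbal_ack":            (0.60, 0.30),
    "clarification_request": (0.05, 0.50),
    "silence":               (0.30, 0.40),
}

def update_understanding(prior, signal):
    """One Bayesian update step: P(U | s) is proportional to P(s | U) * P(U)."""
    p_s_given_u, p_s_given_not_u = LIKELIHOOD[signal]
    num = p_s_given_u * prior
    return num / (num + p_s_given_not_u * (1.0 - prior))

belief = 0.5  # uninformed prior before the explanation move
for signal in ["nod", "silence", "clarification_request"]:
    belief = update_understanding(belief, signal)
    print(f"after {signal:>22}: P(understood) = {belief:.2f}")

# A planner could then branch on the belief, e.g. re-explain below a threshold.
if belief < 0.4:
    print("-> plan: rephrase using a simpler analogy source")
```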

Eventually, the system is supposed to interact with users by means of spoken dialogue. Large language models (LLMs) offer many advantages for natural language processing: they can be used for language understanding, allowing users to speak freely about complex concepts, or for the automated generation of domain knowledge representations from text. However, operating LLMs requires significant resources, especially under the low-latency conditions of live interaction. We are therefore investigating and evaluating the use of smaller LLMs, fine-tuned for specific tasks in our dialogue processing pipeline, and will apply our findings by building an LLM-supported dialogue system that processes spoken language incrementally (based on our dialogue management framework flexdiam).
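
The toy sketch below shows what incremental processing means here; it is not the flexdiam API. Partial input is consumed word by word, and a placeholder rule-based intent guess (standing in for a small, task-fine-tuned NLU model) is revised as soon as new words arrive, rather than after the utterance is complete.

```python
def guess_intent(words):
    """Cheap rule-based placeholder for a small fine-tuned NLU model."""
    text = " ".join(words)
    if "how" in words and ("work" in text or "works" in text):
        return "request_explanation"
    if words and words[0] in {"what", "why", "when"}:
        return "question"
    return "unknown"

def incremental_nlu(word_stream):
    """Re-run understanding on every increment; yield only revisions."""
    committed = []
    last_intent = None
    for word in word_stream:
        committed.append(word)
        intent = guess_intent(committed)
        if intent != last_intent:
            yield list(committed), intent
            last_intent = intent

stream = ["how", "does", "the", "espresso", "machine", "work"]
for prefix, intent in incremental_nlu(stream):
    print(f"{' '.join(prefix)!r:45} -> {intent}")
```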

Contact: Lina Mavrina, Stefan Kopp

Project website

Scalable hybrid Avatar-Agent-Technologies for everyday social interaction in XR (HiAvA)

HiAvA investigates and develops technologies for enabling multi-user applications in Social VR, mitigating the challenges of social distancing. The goal is to improve upon current solutions by maintaining immersion and social presence even on hardware devices that only allow for limited tracking or rendering. The resulting system should exceed the capabilities of current video communication in terms of scalability, immersion, and comfort. Our group contributes AI-based models for the speech-driven generation of non-verbal behavior of human avatars, in particular meaningful, human-like gesticulation suitable for use in avatar-based face-to-face interaction systems.

Full Project Page

Adaptive generative models for interaction-aware conversational behavior

A key challenge for interactive artificial agents is to produce multimodal behavior that is communicatively effective and robust in a given, dynamically evolving interaction context. This project investigates the automatic generation of speech and gesture. We develop cognitive, generative models that incorporate information about the realtime interaction context to allow for adaptive multimodal behavior that can steer and support the conversational interaction. Our goal is to (a) learn models for generating speech and meaningful (representational) gestures in realtime, (b) make these models adaptive to changes in the interlocutor’s behavior, and (c) validate them empirically in human-agent studies.
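
A minimal PyTorch sketch of this kind of conditioning (architecture and dimensions are assumptions, not the project’s model): speech features and a vector summarizing the interlocutor’s realtime behavior are jointly encoded to predict a pose sequence.

```python
import torch
import torch.nn as nn

class ContextAwareGestureModel(nn.Module):
    """Illustrative only: speech + interaction context -> pose sequence."""
    def __init__(self, speech_dim=26, context_dim=8, pose_dim=45, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(speech_dim + context_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, pose_dim)  # one pose per frame

    def forward(self, speech_feats, context):
        # speech_feats: (batch, frames, speech_dim), e.g. mel features
        # context:      (batch, context_dim), e.g. interlocutor engagement cues
        frames = speech_feats.size(1)
        ctx = context.unsqueeze(1).expand(-1, frames, -1)  # repeat per frame
        h, _ = self.encoder(torch.cat([speech_feats, ctx], dim=-1))
        return self.decoder(h)  # (batch, frames, pose_dim)

model = ContextAwareGestureModel()
speech = torch.randn(1, 100, 26)   # ~2 s of audio features at 50 fps
context = torch.randn(1, 8)        # realtime interaction-context summary
poses = model(speech, context)
print(poses.shape)                 # torch.Size([1, 100, 45])
```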

Contact: Hendric Voß

Creating explanations in collaborative human-machine knowledge exploration

The Transregional Collaborative Research Center (TRR 318) “Constructing Explainability” investigates how explanations of algorithmic decisions can be jointly constructed by the explainer and the explainee. Project C05 investigates how human decision makers and intelligent systems can collaboratively explore a decision problem to reach a decision that is accountable and hence explainable. The goal is to enable medical experts to understand and assess medical decisions and their implications by posing queries and receiving causal or counterfactual answers from the intelligent system. The project develops the required methods for probabilistic causal reasoning, decision-process analysis, and language-based interaction. We study medical decision-making processes, develop a formal model of the exploration and decision process, and apply it in an interactive system that guides a medical expert towards a better, more explainable decision.
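
To give a flavor of such queries (a toy example with invented variables and equations, not a validated medical model), the sketch below answers a counterfactual question over a tiny structural causal model by replacing one mechanism with an intervention (the do-operator).

```python
def run_model(dose, severity, do_dose=None):
    """Tiny structural causal model: recovery depends on dose and severity.
    All variables and coefficients are invented for illustration."""
    dose = do_dose if do_dose is not None else dose   # intervention point
    side_effects = 0.3 * dose
    recovery = 0.6 * dose - 0.4 * severity - 0.2 * side_effects
    return {"dose": dose, "side_effects": side_effects, "recovery": recovery}

observed = run_model(dose=1.0, severity=0.5)
counterfactual = run_model(dose=1.0, severity=0.5, do_dose=0.5)

print(f"observed recovery:   {observed['recovery']:.2f}")
print(f"under do(dose=0.5):  {counterfactual['recovery']:.2f}")
# Answers "what if we had halved the dose?" while severity stays fixed.
```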

Adaptive Explanation Generation

The Transregional Collaborative Research Center (TRR 318) “Constructing Explainability” investigates how explanations of algorithmic decisions can be made more efficient through joint construction by the explainer and the explainee. Project A01 “Adaptive explanation generation” investigates the cognitive and interactive mechanisms of adaptive explanations. The goal of our work is to develop a dynamic, computational model of the representations and decision processes with which “pragmatic explainers” adapt their explanations to their addressees. To that end, cognitively motivated methods for model-based machine learning are developed, applied, and evaluated.
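
As a schematic illustration (all numbers and variants invented), a pragmatic explainer can be framed as choosing among explanation variants by trading off expected comprehension under its current partner model against the cost of a longer explanation:

```python
PARTNER_KNOWS_DOMAIN = 0.3   # partner-model estimate of prior knowledge

# (variant, P(understood | knowledgeable), P(understood | novice), cost)
VARIANTS = [
    ("one-line summary", 0.90, 0.20, 0.1),
    ("step-by-step",     0.95, 0.70, 0.5),
    ("full analogy",     0.90, 0.90, 0.8),
]

def utility(p_known, p_novice, cost, p_knows=PARTNER_KNOWS_DOMAIN):
    """Expected comprehension minus a (made-up) cost penalty."""
    expected = p_knows * p_known + (1 - p_knows) * p_novice
    return expected - 0.3 * cost

best = max(VARIANTS, key=lambda v: utility(*v[1:]))
print("chosen variant:", best[0])   # -> full analogy for a likely novice
```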

Computational cognitive modeling of the predictive active self in situated action (COMPAS)

The COMPAS project aims to develop a computational cognitive model of the execution and control of situated action in an embodied cognitive architecture that allows for (1) a detailed explanation, in computational terms, of the mechanisms and processes underlying the sense of agency; (2) the simulation of situated actions along with the subjectively perceived sense of control and its impact on how actions are regulated; and (3) empirical validation through comparison with data obtained in experimental studies with human participants.
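
One common formulation of point (1) is the comparator model, sketched below with hypothetical numbers (an assumed formulation, not the COMPAS architecture): the sense of agency is high when the forward model’s predicted sensory outcome of an action matches the outcome actually observed, and drops with the prediction error.

```python
import math

def predicted_outcome(motor_command):
    """Hypothetical forward model: where the hand should end up."""
    return [c * 1.0 for c in motor_command]

def sense_of_agency(motor_command, observed, sensitivity=4.0):
    """Map prediction error to a 0..1 agency rating (exponential decay)."""
    pred = predicted_outcome(motor_command)
    error = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, observed)))
    return math.exp(-sensitivity * error)

command = [0.5, 0.2]
print(f"self-caused:  {sense_of_agency(command, [0.5, 0.2]):.2f}")   # ~1.0
print(f"perturbed:    {sense_of_agency(command, [0.8, -0.1]):.2f}")  # lower
```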

Realtime Mentalizing in Human-Agent Collaboration

This project explores how AI-based agents can be equipped with an ability to cooperate that is grounded in a Theory of Mind, i.e., the attribution of hidden mental states to other agents, inferred from their observable behavior. In contrast to the usual approach of studying this capability in offline, observer-based settings, we aim to fuse mentalizing with strategic planning and interaction in realtime situated cooperation. Previous work has developed Bayesian ToM models capable of performing mentalizing in adaptive, “satisficing” ways, i.e., solving the trade-off between accuracy and efficiency. Ongoing work looks at how this can be integrated bi-directionally with realtime planning and monitoring of cooperative behavior. We also investigate the cooperative abilities of LLM-based agents, as well as how humans cooperate and communicate in the game environment “Overcooked”.
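
A minimal sketch of such Bayesian mentalizing (illustrative only, not the project’s model): a partner’s hidden goal in a grid world is inferred by Bayesian inversion of a noisy-rational action model.

```python
import math

GOALS = {"onion_station": (0, 4), "plate_station": (4, 0)}

def action_likelihood(pos, action, goal, beta=2.0):
    """P(action | goal): softmax over how much each step reduces distance."""
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    def dist_after(a):
        nx, ny = pos[0] + moves[a][0], pos[1] + moves[a][1]
        return abs(nx - goal[0]) + abs(ny - goal[1])
    scores = {a: math.exp(-beta * dist_after(a)) for a in moves}
    return scores[action] / sum(scores.values())

belief = {g: 0.5 for g in GOALS}                       # uniform prior
for pos, action in [((2, 2), "up"), ((2, 3), "up")]:   # observed trajectory
    for g, target in GOALS.items():
        belief[g] *= action_likelihood(pos, action, target)
    total = sum(belief.values())
    belief = {g: p / total for g, p in belief.items()}

print(belief)   # posterior mass shifts toward "onion_station"
```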

Contact: Florian Schröder (fschroeder@techfak.uni-bielefeld.de), Stefan Kopp (skopp@techfak.uni-bielefeld.de)

Publications: