
A system for interactive learning in dialogue with a tutor


Academic year: 2022


Danijel Skočaj, Matej Kristan (University of Ljubljana, Slovenia), Miroslav Janíček (DFKI, Saarbrücken, Germany), Michael Zillich (TU Vienna, Austria), Marc Hanheide (University of Birmingham, UK), Thomas Keller (Albert-Ludwigs-Universität Freiburg, Germany)

1. Introduction

Interactive continuous learning is an important characteristic of a cognitive agent that is supposed to operate and evolve in an ever-changing environment. We present the representations and mechanisms that are necessary for continuous learning of visual concepts in dialogue with a tutor. We present an approach for modelling beliefs and show how these beliefs are created by processing visual and linguistic information. Based on the detected knowledge gaps represented in the beliefs, the motivation and planning mechanism implements four types of interaction for learning. These principles have been implemented in an integrated system.

3. Learning visual concepts

The visual concepts are represented as generative models that take the form of probability density functions over the feature space. They are based on the multivariate online discriminative Kernel Density Estimator (odKDE) [2] and are constructed in an online fashion from new observations, by adapting from positive examples (learning) as well as negative examples (unlearning), and by taking into account the probability that a concept that has not been observed before has been encountered, by maintaining a representation of the unknown model.

Detection of incompleteness in knowledge:
- the AP of M0 (the unknown model) is the best -> knowledge gap
- the AP of the best Mi is low -> uncertainty
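As a rough illustration of the mechanism above, the following is a minimal sketch assuming a fixed-bandwidth, one-dimensional KDE; the actual odKDE [2] additionally compresses components and adapts bandwidths online, and the class names and thresholds here are purely illustrative.

```python
import math

class OnlineKDE:
    """Toy stand-in for the odKDE: one Gaussian kernel per observation."""

    def __init__(self, bandwidth=0.1):
        self.h = bandwidth
        self.samples = []   # kernel centres
        self.weights = []   # kernel weights

    def learn(self, x):
        # Positive example: add a new kernel.
        self.samples.append(float(x))
        self.weights.append(1.0)

    def unlearn(self, x):
        # Negative example: down-weight kernels near the refuted observation.
        for i, s in enumerate(self.samples):
            if abs(s - x) < 2 * self.h:
                self.weights[i] *= 0.5

    def density(self, x):
        if not self.samples:
            return 0.0
        norm = self.h * math.sqrt(2 * math.pi)
        num = sum(w * math.exp(-0.5 * ((x - s) / self.h) ** 2) / norm
                  for s, w in zip(self.samples, self.weights))
        return num / sum(self.weights)

def recognise(models, x, p_unknown=0.05):
    """Pick the best concept; flag knowledge gaps and uncertainty.

    Simplification: the unknown model is approximated by a constant
    density p_unknown instead of a learned representation."""
    scores = {name: m.density(x) for name, m in models.items()}
    best, ap = max(scores.items(), key=lambda kv: kv[1])
    if ap < p_unknown:          # the unknown model wins -> knowledge gap
        return None, 'knowledge gap'
    if ap < 2 * p_unknown:      # the best model is weak -> uncertainty
        return best, 'uncertain'
    return best, 'confident'
```

In this sketch a knowledge gap triggers when no known concept explains the observation better than the unknown model, and uncertainty when the winning concept only barely does.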

5. Situated dialogue

Situated dialogue understanding and production is treated as an abductive problem. Language understanding is treated as inference to the most appropriate intention and beliefs behind a communicative act, whereas production is inference to the most appropriate realization of the robot's (communicative) intention and beliefs.

Given a goal, the abductive reasoner builds up and continually refines a set of partial defeasible explanations of the input, conditioned on the verification of the knowledge gaps they contain. This verification is done by executing test actions, thereby going beyond the initial context [3].
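To make the refinement step concrete, here is a toy illustration (not the actual reasoner of [3]): each partial explanation assumes some facts, unverified assumptions are checked by a test action, and explanations contradicted by the outcome are discarded.

```python
def refine(explanations, test_action):
    """Keep the explanations whose assumptions survive verification.

    explanations: list of (intention, assumptions) pairs, where
                  assumptions is a set of assumed facts (knowledge gaps).
    test_action:  callable mapping an assumed fact to True/False,
                  e.g. by asking the tutor or inspecting the scene."""
    surviving = []
    for intention, assumptions in explanations:
        if all(test_action(fact) for fact in assumptions):
            surviving.append(intention)
    return surviving

# Hypothetical example: two candidate explanations of an utterance.
explanations = [('assert-colour', {'object-visible'}),
                ('query-colour', {'object-visible', 'colour-unknown'})]
facts = {'object-visible': True, 'colour-unknown': False}
surviving = refine(explanations, facts.get)
```

The fact names and intentions are invented for the example; the point is only the shape of the loop: assume, test, discard.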

4. Modelling beliefs and intentions

Beliefs express factual information about the state of the world. In our approach, they are relational structures that account for the inherent uncertainty using multivariate probability distributions over properties and their values. They are situated, anchored to a given situation, and mutually interlinked. We model three degrees of belief attribution, which we call the epistemic status: private, attributed and shared. Private beliefs are internal to the robot, and are usually the result of perception or deliberation. Attributed beliefs are beliefs that other agents expressed by communicative means. Finally, shared beliefs form the common ground established in the interaction.

Intentions, on the other hand, are closely related to rational aspects of interaction. Behind every (intentional) action, there is an underlying intention. For instance, when asking a question, the intention is to elicit an answer, i.e. to get to a state in which the question is answered. We use intentions as a unified representation for actions of both the robot and the human.
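A belief as described above can be sketched as a small data structure; the field names are illustrative, not the system's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Belief:
    epistemic_status: str   # 'private', 'attributed' or 'shared'
    situation: str          # anchor to a given situation
    # property -> {value: probability}, a per-property distribution
    distributions: dict = field(default_factory=dict)
    linked_to: list = field(default_factory=list)  # related beliefs

    def most_likely(self, prop):
        """Most probable value of a property, or None if unmodelled."""
        dist = self.distributions.get(prop, {})
        return max(dist, key=dist.get) if dist else None

# A private belief resulting from perception of one scene object.
b = Belief('private', 'scene-1',
           {'colour': {'red': 0.8, 'yellow': 0.2},
            'shape': {'box': 0.9, 'ball': 0.1}})
```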

8. The system


The system architecture is based on CAS (CoSy Architecture Schema) [7]. The schema is essentially a distributed working memory model, where representations are linked within and across the working memories, and are updated asynchronously and in parallel. Using this architecture, a complex, distributed, asynchronous, and heterogeneous system has been built [8].
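The distributed-working-memory idea can be sketched as follows: components write entries to a working memory and other components react to change events. This is a toy, synchronous version; the real CAST middleware [7] runs components asynchronously across processes.

```python
class WorkingMemory:
    """Minimal change-event working memory, loosely in the spirit of CAS."""

    def __init__(self, name):
        self.name = name
        self.entries = {}
        self.subscribers = []   # callbacks receiving (op, key, value)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def write(self, key, value):
        op = 'overwrite' if key in self.entries else 'add'
        self.entries[key] = value
        for cb in self.subscribers:
            cb(op, key, value)

# One component subscribes to changes another component produces.
events = []
visual_wm = WorkingMemory('visual')
visual_wm.subscribe(lambda op, k, v: events.append((op, k)))
visual_wm.write('object-1', {'colour': 'red'})
visual_wm.write('object-1', {'colour': 'red', 'shape': 'box'})
```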

References

[1] K. Zhou et al. (2011). Visual Information Abstraction for Interactive Robot Learning. In Proceedings of ICAR 2011, pages 328-334, Tallinn, Estonia.

[2] M. Kristan and A. Leonardis (2010). Online Discriminative Kernel Density Estimation. In Proceedings of the ICPR 2010, pages 581-584, Istanbul, Turkey.

[3] M. Janíček (2011). Abductive Reasoning for Continual Dialogue Understanding. In Proceedings of the ESSLLI 2011, Ljubljana, Slovenia.

[4] A. Vrečko, A. Leonardis, and D. Skočaj (2012). Modeling Binding and Cross-modal Learning in Markov Logic Networks. Neurocomputing.

[5] M. Hanheide et al. (2010). A Framework for Goal Generation and Management. In Proceedings of the AAAI Workshop on Goal-Directed Autonomy, Atlanta, Georgia.

[6] M. Brenner and B. Nebel (2009). Continual planning and acting in dynamic multiagent environments. JAAMAS, 19(3):297-331.

[7] N. Hawes and J. Wyatt (2010). Engineering intelligent information processing systems with CAST. Advanced Engineering Informatics, 24(1):27-39.

[8] D. Skočaj et al. (2011). A system for interactive learning in dialogue with a tutor. In Proceedings of IROS 2011, pages 3387-3394, San Francisco, CA, USA.


Video at http://cogx.eu/results/george

Alen Vrečko, Marko Mahnič (University of Ljubljana, Slovenia), Geert-Jan M. Kruijff (DFKI, Saarbrücken, Germany), Kai Zhou (TU Vienna, Austria), Nick Hawes (University of Birmingham, UK)

7. Behaviour generation

The motivation management [5] monitors the beliefs and, based on them, creates goals and selects which of them to pass on to planning. The planner [6] then builds a plan to satisfy a given goal, which is subsequently executed. In this way the system behaviour is generated and controlled.
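The monitor-generate-manage-plan-execute cycle can be sketched as below; the function names and the one-question-per-gap "plan" are assumptions for illustration, not the actual APIs of [5] or [6].

```python
def generate_goals(beliefs):
    """Turn detected knowledge gaps in beliefs into epistemic goals."""
    return [('verify', b) for b in beliefs if b.get('gap')]

def manage(goals, capacity=1):
    """Activate a bounded number of goals (here: first come, first served)."""
    return goals[:capacity]

def plan(goal):
    """Stand-in for the planner: ask about the gap, then update the model."""
    _, belief = goal
    return [('ask', belief['property']), ('update-model', belief['property'])]

# Beliefs as produced by perception, one of them with a knowledge gap.
beliefs = [{'property': 'colour', 'gap': True},
           {'property': 'shape', 'gap': False}]
actions = [step for goal in manage(generate_goals(beliefs))
           for step in plan(goal)]
```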

Implemented learning mechanisms:
- Situated tutor-driven learning: the human drives the learning. (H: "The box is red.")
- Situated tutor-assisted learning: the robot takes the initiative. (R: "Is this yellow?")
- Non-situated tutor-assisted learning: introspection and model analysis. (R: "Could you show me something red?")
- Autonomous learning: the robot automatically updates the models.
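One way to read the four mechanisms is as a decision conditioned on who holds the initiative and whether the referent is in the current scene; the following decision logic is illustrative, not the system's actual policy.

```python
def choose_mechanism(tutor_speaks, situated, model_needs_data):
    """Select a learning mechanism (illustrative decision order)."""
    if tutor_speaks:
        return 'tutor-driven'                 # H: "The box is red."
    if situated:
        return 'tutor-assisted'               # R: "Is this yellow?"
    if model_needs_data:
        return 'non-situated tutor-assisted'  # R: "Could you show me something red?"
    return 'autonomous'                       # silent model update
```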

[System architecture figure: the Visual SA (Kinect server and video/stereo server providing images and 3D points, segmentor, coarse and fine SOI analysis, object analyser, object recognizer, visual learner/recognizer, visual mediator), Dialogue SA (speech recognition, word lattice, parsing, reference resolution, dialogue interpretation and comprehension, output planning, speech synthesis), Binder SA (binding maps; private, attributed and shared beliefs), Planning SA (goal generation and management, planner, executor, interaction monitor), Spatial SA (PTU control) and Manipulation SA (arm control, move and point actions) exchange representations such as proto-objects, beliefs, user and robot intentions, epistemic goals, plans and learning instructions, organized into pre-attentive, attentive, mediative, motivation, planning and execution layers.]

2. Visual processing

Visual processing serves to provide the object hypotheses together with visual properties about which the system will subsequently learn. Given that the system learns from a variety of as yet unknown objects, we implemented a generic segmentation scheme, exploiting the fact that objects are presented on planar supporting surfaces [1]. The vision subsystem is an active observer using a wide field of view Kinect sensor and a pair of narrow field of view stereo cameras for foveated vision, both mounted on a pan-tilt unit (PTU).

The system switches between different behaviours:
- Answer questions: answer the question verbally; point at an object.
- Learn object properties: invoke different learning mechanisms.
- Look around: look around the scene and try to recognize all objects.

Acknowledgment

This work was supported by the EC FP7 IST project CogX-215181.

6. Binding and reference resolution

Binding, the ability to combine two or more modal representations of the same entity into a single shared representation, is vital for every cognitive system operating in a complex environment. Reference resolution is a process akin to binding that relates information attributed to another agent to the robot's own perceptions. We developed a general probabilistic binding method based on Markov Logic Networks and applied it to the problem of reference resolution in our cognitive system [4].
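A much simpler scoring scheme than the Markov Logic Network method of [4] already conveys the idea: score each pairing of an attributed description with a perceived object by how well their property distributions agree, and resolve the reference to the best-scoring object. All names below are invented for the example.

```python
def binding_score(attributed, percept):
    """Agreement between two entities, each a dict property -> {value: prob}.

    The score is the product over shared properties of the probability
    that both entities take the same value."""
    score = 1.0
    for prop, dist in attributed.items():
        p = percept.get(prop, {})
        score *= sum(dist[v] * p.get(v, 0.0) for v in dist)
    return score

def resolve(attributed, percepts):
    """Resolve an attributed description to the best-matching percept."""
    return max(percepts, key=lambda name: binding_score(attributed, percepts[name]))

# Two perceived objects and the attributed content of "the red one".
percepts = {'obj-1': {'colour': {'red': 0.9, 'yellow': 0.1}},
            'obj-2': {'colour': {'yellow': 0.8, 'red': 0.2}}}
said = {'colour': {'red': 1.0}}
```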

The objects are detected based on the plane-pop-out approach using the Kinect 3D point cloud. Then, every object is attended to by moving the PTU accordingly, and segmented in the higher-resolution 2D image using the graph-cut algorithm initialized by 2D and 3D data. The features are then extracted from the segmented image regions and corresponding 3D data, which are then used for recognition and learning of objects and their colors and shapes.
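The plane-pop-out step can be sketched with a basic RANSAC plane fit: estimate the dominant supporting plane from the point cloud, then treat points far from it as object candidates. This is a minimal sketch of the idea behind [1], not its implementation; the graph-cut 2D segmentation is omitted.

```python
import random

def fit_plane(points, iters=200, tol=0.01, rng=random):
    """RANSAC plane fit. points: list of (x, y, z).
    Returns (a, b, c, d) with ax + by + cz + d = 0 and unit normal."""
    best, best_inliers = None, -1
    for _ in range(iters):
        p1, p2, p3 = rng.sample(points, 3)
        u = [p2[i] - p1[i] for i in range(3)]
        v = [p3[i] - p1[i] for i in range(3)]
        # Normal = cross product of two in-plane vectors.
        n = (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
        norm = sum(c * c for c in n) ** 0.5
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n = tuple(c / norm for c in n)
        d = -sum(n[i] * p1[i] for i in range(3))
        inliers = sum(abs(sum(n[i] * p[i] for i in range(3)) + d) < tol
                      for p in points)
        if inliers > best_inliers:
            best, best_inliers = n + (d,), inliers
    return best

def pop_out(points, plane, height=0.02):
    """Points farther than `height` from the supporting plane pop out
    as object candidates."""
    a, b, c, d = plane
    return [p for p in points if abs(a*p[0] + b*p[1] + c*p[2] + d) > height]
```

A typical use: fit the plane to the full cloud (the table dominates), then keep only the popped-out points for segmentation.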

[Figure: probabilistic binding. In the perceptual layer, sensory information processing and modal learning and recognition yield recognized features, entity estimates and learned concepts. Percept configurations enter as weighted evidence into a graphical model (a Markov network built from predicates and rule templates); cross-modal knowledge takes the form of weighted concept-grounding rules, covering both concept grounding and instance grounding. Inference (binding) produces a union configuration of grounded beliefs that feeds high-level cognition (planning, dialogue, ...).]

