information fusion. In probabilistic graphical models, the joint probability distribution of all involved variables, including a priori independence assumptions, is modeled, and the parameters are estimated from data. The structure of the model has to be determined beforehand and can thus be designed such that it reflects the prior assumptions about the modeled heterogeneous system. Special models have been proposed for modeling multi-modal data, e.g., [135, 65, 101]. Also, kernel systems such as support vector machines have been extended to model-based fusion methods. In multiple kernel learning, the features of each input channel are modeled with separate kernels, which are best suited for the respective modality; see [61]. Popular methods for parameter estimation in sensor fusion are the Kalman filter and non-linear methods such as the extended Kalman filter, the unscented Kalman filter, and the particle filter. These methods are used for estimating the state of a system, e.g., the position of a moving object, by integrating uncertain measurements from either a single or multiple sensors; see [77].
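As a concrete illustration of this family of estimators, the following minimal Python sketch implements one predict-update cycle of the standard linear Kalman filter; the matrices F, H, Q, and R are generic placeholders rather than quantities from the cited works.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of the standard linear Kalman filter.

    x, P : prior state estimate and its covariance
    z    : new (uncertain) measurement, e.g., a sensor reading
    F, H : state transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the state and its uncertainty through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: fuse the measurement, weighted by the Kalman gain K.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

With independent sensor noise, measurements from multiple sensors can be fused by applying the update step once per sensor, each with its own H and R.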

experimental results on multiple datasets show that the efficient multi-linear models can reach a performance similar to that of non-linear models, at reduced computational complexity.

In Chapter 4, we deal with the challenging problem of fusing semantic and sensory information. We propose novel machine learning models for this task, combining tensor factorization for semantic modeling with deep learning models, which are well suited to modeling sensory data. Experiments on the task of visual relationship detection in images show promising results for this novel direction of research.

All significant contributions of this thesis have been published as peer-reviewed papers at conferences and workshops, as listed below.

[9] Stephan Baier, Denis Krompass, and Volker Tresp. Learning representations for discrete sensor networks using tensor decompositions. IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, 2016

[14] Stephan Baier and Volker Tresp. Factorizing sparse tensors for supervised machine learning. NIPS Workshop on Tensor Methods, 2016

[12] Stephan Baier, Sigurd Spieckermann, and Volker Tresp. Attention-based information fusion using multi-encoder-decoder recurrent neural networks. Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2017

[13] Stephan Baier, Sigurd Spieckermann, and Volker Tresp. Tensor decompositions for modeling inverse dynamics. Proceedings of the Congress of the International Federation of Automatic Control, 2017

[10] Stephan Baier, Yunpu Ma, and Volker Tresp. Improving visual relationship detection using semantic modeling of scene descriptions. International Semantic Web Conference, 2017

[11] Stephan Baier, Yunpu Ma, and Volker Tresp. Improving information extraction from images with learned semantic models. International Joint Conference on Artificial Intelligence, 2018

I am the main author of all listed publications. I wrote the papers and conducted all the experiments myself. At the beginning of each chapter, we clearly state where the respective contributions have been published and which parts are taken from the original publications.

Attention-based Representation Fusion

In this chapter, we first propose a neural network architecture for fusing latent representations from multiple data channels. We then address the problem of how to dynamically determine what information to fuse. We therefore extend the proposed architecture using a self-attention mechanism, which automatically determines the importance of each data channel based on the current system state.

We apply our model to the modeling of sensor networks, where we derive the latent representations for each sensor station using recurrent neural networks. The main contributions of this chapter are published in:

[12] Stephan Baier, Sigurd Spieckermann, and Volker Tresp. Attention-based information fusion using multi-encoder-decoder recurrent neural networks. Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2017

Sections 2.1, 2.2, 2.5, and 2.6 of this thesis correspond to Sections 1, 2, 3, and 4 of [12], but have been substantially extended and edited. Figure 2.1 has been copied and modified from [12]. Sections 2.3 and 2.4 are entirely new. I am the main author of [12]. The paper has been written by me, and all the experiments have been conducted by me.

2.1 Introduction

Traditionally, multi-channel data is processed using multivariate models, where the data from all channels is concatenated at the model input. From an information fusion perspective, this corresponds to an early fusion approach. In this chapter, we propose a model-based fusion approach using neural networks.

Our proposed architecture extends the popular encoder-decoder framework by applying dedicated encoders to each input channel. The latent representations from the different channels are then fused and fed into one or multiple decoder functions, which generate the predictions.
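A minimal PyTorch sketch of this idea follows; the GRU encoders, dimensions, and the simple concatenation-based fusion are illustrative assumptions, not the exact architecture of this chapter (which replaces the plain concatenation with the attention-based fusion introduced below).

```python
import torch
import torch.nn as nn

class MultiEncoderDecoder(nn.Module):
    """One dedicated encoder per input channel; the latent representations
    are fused and passed to a shared decoder."""
    def __init__(self, channel_dims, latent_dim, out_dim):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.GRU(d, latent_dim, batch_first=True) for d in channel_dims]
        )
        self.decoder = nn.Linear(len(channel_dims) * latent_dim, out_dim)

    def forward(self, channels):
        # channels: one tensor of shape (batch, time, dim_i) per input channel
        latents = []
        for encoder, x in zip(self.encoders, channels):
            _, h = encoder(x)                # final hidden state: (1, batch, latent_dim)
            latents.append(h.squeeze(0))
        fused = torch.cat(latents, dim=-1)   # fusion of the latent representations
        return self.decoder(fused)
```

Because each channel has its own encoder, every encoder can specialize in the statistics of its channel, while the decoder operates on the fused representation.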

One problem in information fusion is that of determining which channels to fuse. An approach to this problem is to determine the cross-correlations between multiple channels and integrate those that show high cross-correlation. However, in dynamic systems, cross-correlations between different channels may vary in time.

It is therefore desirable to also adjust the fusion process dynamically in time. Consequently, we extend the proposed neural network architecture to incorporate an attention-based fusion layer that assesses the importance of the different input channels dynamically. Attention mechanisms have become popular in neural network research over recent years. We apply attention in a novel way to address the dynamic fusion problem.
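As an illustration, the following sketch shows one way such a fusion layer can be realized; the scaled dot-product scoring function is our illustrative choice and not necessarily the one used in this chapter. The current system state queries the per-channel latent representations, and the fused vector is their attention-weighted sum.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuses per-channel latent representations by attention: the current
    state determines how much each channel contributes."""
    def __init__(self, state_dim, latent_dim):
        super().__init__()
        self.query = nn.Linear(state_dim, latent_dim)

    def forward(self, state, latents):
        # state: (batch, state_dim); latents: (batch, channels, latent_dim)
        q = self.query(state).unsqueeze(1)                     # (batch, 1, latent_dim)
        scores = (q * latents).sum(-1) / latents.size(-1) ** 0.5
        alpha = torch.softmax(scores, dim=1)                   # channel importance weights
        fused = torch.bmm(alpha.unsqueeze(1), latents).squeeze(1)
        return fused, alpha
```

Since the weights alpha are recomputed from the current state, the contribution of each channel can change over time, which is exactly the dynamic behavior motivated above.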

We apply our architecture to the modeling of distributed sensor networks, in which information from multiple data streams is combined. The sensor networks considered consist of multiple stations, where each station can measure multiple features at a single location. For each station, we implement an encoder function using recurrent neural networks. The latent representations from the dedicated RNN models are combined in the attention-based fusion layer. After the representations are fused, they are passed to a decoder model, which makes a prediction.

We address the task of sequence-to-sequence prediction, where the decoder network is another RNN that predicts the future behavior of a particular sensor station.
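Continuing the sketches above, and under the same caveat that all design choices are illustrative, such a decoder could be rolled out step by step, re-querying the channel representations through the attention layer at every prediction step:

```python
import torch
import torch.nn as nn

class AttentiveSeq2SeqDecoder(nn.Module):
    """RNN decoder that re-attends over the channel representations at
    every step, so the fusion weights adapt as the prediction unrolls.
    Reuses the AttentionFusion sketch from above."""
    def __init__(self, latent_dim, out_dim):
        super().__init__()
        self.attend = AttentionFusion(latent_dim, latent_dim)
        self.cell = nn.GRUCell(out_dim + latent_dim, latent_dim)
        self.out = nn.Linear(latent_dim, out_dim)

    def forward(self, latents, y_last, steps):
        # latents: (batch, channels, latent_dim); y_last: last observation (batch, out_dim)
        h = latents.mean(dim=1)        # simple choice of initial decoder state
        y, predictions = y_last, []
        for _ in range(steps):
            context, _ = self.attend(h, latents)   # fusion weights depend on h
            h = self.cell(torch.cat([y, context], dim=-1), h)
            y = self.out(h)
            predictions.append(y)
        return torch.stack(predictions, dim=1)     # (batch, steps, out_dim)
```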

Moreover, the proposed architecture can be easily generalized to other settings, such as classification or anomaly detection, by using different decoder functions.

Given the rising number of connected devices and sensors, often referred to as the Internet of Things (IoT), modeling sensor networks and multi-agent systems is attracting increasing interest. We discuss the parallelizable nature of our proposed architecture in both training and inference contexts and show how this could be helpful when deploying the model in distributed environments.

We demonstrate the effectiveness of the multi-sequence-to-sequence network on three datasets. Two of these datasets contain climatological measurements from sensor stations spread across Quebec and Alberta. The third dataset contains energy load profiles from multiple zones of a smart energy grid. In our experiments on sensory data, we show that the proposed architecture outperforms both purely local models for each agent and one central model of the whole system. This can be explained by the fact that the local sub-models learn to adapt to the peculiarities of the respective sensor station and, at the same time, integrate relevant information from other stations through the interconnection layer, which allows the model to exploit cross-correlations between the data streams of multiple stations.

The remainder of this chapter is organized as follows. In Section 2.2, we explain both the architecture of our proposed model and the attention-based fusion mechanism. Section 2.3 elaborates on the model’s distributed training and inference. In Section 2.4, we discuss related work. Section 2.5 presents the experimental settings and results. Section 2.6 concludes our work and discusses possible directions of future research.