
To investigate H2, we repeated our experiments six times, each time excluding one of the atomic hand motions from the training data for transfer learning. We also experimented with omitting more than one class in the training data but observed that no transfer method outperformed the baseline of naively applying the source model to the target space data.

The average results across participants and trials are depicted in Figure 8.4. Table 8.3 shows the results without extension motions in the training data. We observe the following significant effects using a one-sided Wilcoxon signed-rank test.

1. If at least 32 data points are available for training, EM transfer learning outperforms a naive application of the source space model (p < 10⁻³).

2. Irrespective of the number of available data points, EM transfer learning outperforms a retrained model on the target data (p < 10⁻³).

3. If at least 12 data points are available for training, EM transfer learning outperforms the a-SVM (p < 10⁻³).

4. If extension, pronation, supination, or spread are excluded and at least 32 data points are available for training, EM transfer learning outperforms GLVQ transfer learning (p < 0.01).

In conjunction, these results support H2. We also note again that ARC-t and HFA resulted in errors consistently above 70% on these data, such that our method significantly outperforms these reference methods across all conditions.
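As an illustration of the statistical methodology, the following minimal sketch shows how such a one-sided Wilcoxon signed-rank test can be computed with SciPy. The error values are hypothetical placeholders, not our experimental data.

```python
# Hedged sketch of a one-sided Wilcoxon signed-rank test, as used above.
# The error arrays below are hypothetical placeholders.
import numpy as np
from scipy.stats import wilcoxon

# hypothetical mean classification errors per participant and trial
errors_em_transfer = np.array([0.12, 0.09, 0.15, 0.11, 0.10, 0.13, 0.08, 0.14])
errors_naive       = np.array([0.31, 0.28, 0.35, 0.30, 0.29, 0.33, 0.27, 0.32])

# one-sided alternative: the EM transfer errors are smaller than the baseline's
statistic, p_value = wilcoxon(errors_em_transfer, errors_naive, alternative='less')
print(f'W = {statistic:.1f}, p = {p_value:.4f}')
```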

9 CONCLUSIONS AND OUTLOOK

In this dissertation, I have addressed the challenge of metric learning for structured data and enhanced the utility of a learned metric. In Chapter 3, I have developed a gradient-based metric learning scheme for all sequence edit distances that can be expressed in terms of a signature, a differentiable algebra, and an edit tree grammar. Experimentally, I have shown that this scheme can improve the classification of biological sequences and computer programs. Further, I have extended this scheme to trees in Chapter 4, decreased the runtime complexity, thus making metric learning applicable to much larger data sets, and parametrized the edit distance in terms of symbol embeddings, which guarantees metric properties, is more interpretable, and simplifies the application to large alphabets. I also demonstrated experimentally that my proposed metric learning scheme outperforms a state-of-the-art method for metric learning on structured data.

Once we have learned a metric, we typically wish to apply it to downstream tasks. Existing methods already cover mappings to vectorial outputs, such as dimensionality reduction, classification, clustering, and regression. However, mapping to a distance representation as output has not yet been subject to extensive research. In Chapter 5, I established such an approach based on Gaussian process regression to perform time series prediction on structured data. Experimentally, I have shown that my proposed scheme outperforms baselines such as one-nearest neighbor regression and kernel regression.
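The following minimal sketch conveys the general idea of such distance-based prediction, though not the exact model of Chapter 5: each structured datum is represented by its vector of distances to a set of reference data, and a Gaussian process regressor maps the representation at time t to the representation at time t + 1. All data here are random placeholders.

```python
# Hedged sketch of time series prediction in a distance representation.
# D[t] holds the distances of the datum at time t to all reference data;
# here, D is filled with random placeholders instead of actual distances.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
D = rng.random((50, 10))          # 50 time steps, 10 reference distances

X_train, Y_train = D[:-1], D[1:]  # pairs (representation_t, representation_{t+1})
gpr = GaussianProcessRegressor().fit(X_train, Y_train)

next_repr = gpr.predict(D[-1:])   # predicted distance representation at t + 1
print(next_repr.shape)            # (1, 10)
```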

I applied this novel technique in Chapter 6 to support students in learning computer programming. Whenever a student gets stuck before completing a programming task, my proposed scheme can predict what a capable student would do in the student’s situation and I can infer an edit that guides a student closer to a correct solution along a path that a capable student would take. In experiments on real-world student data, I showed that my proposed model could accurately predict what capable students would do and that the pedagogical quality of the resulting hints was on par with state-of-the-art baselines.

Another challenge in applying a learned metric is that the distribution or representation of target data may differ from the source data on which the metric was learned. In Chapter 7, I have developed a novel framework to address this challenge by learning a transfer mapping from the target space to the source space, such that the learned source space metric becomes applicable again. I have provided two implementations of this framework, one for transfer learning on generalized matrix learning vector quantization classifiers, and one for transfer learning on labeled Gaussian mixture models. Further, I applied transfer learning in Chapter 8 to counteract disturbances in bionic prosthesis control.

To date, such disturbances prevent patients from using bionic prostheses to their full potential because the prostheses fail to execute the desired motions in everyday life. Using transfer learning, I could correct for electrode shifts in the data and thus enhance the accuracy of a bionic prosthesis user interface. I also showed that transfer learning requires much less data and computation time than several baselines.
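To illustrate the transfer mapping idea from Chapter 7 without reproducing the exact GLVQ or EM objectives, the following least-squares sketch learns a linear map H from labeled target data to the same-class prototypes of a source space model; all names and data are hypothetical.

```python
# Hedged sketch of learning a linear transfer mapping: target data are
# mapped into the source space so that the source model applies again.
# This least-squares variant only illustrates the idea; the thesis
# optimizes GLVQ and labeled Gaussian mixture objectives instead.
import numpy as np

def fit_linear_transfer(X_target, y_target, prototypes, proto_labels):
    """Solve min_H ||X_target @ H.T - same-class prototypes||_F^2."""
    # regression targets: for each target point, its (first) class prototype,
    # assuming at least one prototype per class
    T = np.stack([prototypes[proto_labels == label][0] for label in y_target])
    H_t, *_ = np.linalg.lstsq(X_target, T, rcond=None)
    return H_t.T  # H such that H @ x maps a target point into the source space

# usage: y_pred = source_model.predict(X_target_new @ H.T)
```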

Limitations: The work presented in this dissertation still offers opportunities for further improvement. First, as mentioned in Chapter 3, the gradient computation via ADP for edit distance learning is too slow to be applicable to large-scale tasks, and our proposed improved version of the method from Chapter 4, embedding edit distance learning (BEDL), has not yet been combined with the ADP framework, which is a gap in this work.

Second, BEDL does not yet reliably improve classification accuracy on all tasks, which indicates that there are still generalization issues to be addressed.

Third, the time series prediction method via Gaussian process regression (GPR) suggested in Chapter 5 still relies on an eigenvalue correction, which distorts the space and complicates the application to novel data. Further, our proposed method requires storing all training samples to perform predictions, which may become prohibitive for very large structured data sets. In such large-scale scenarios, a parametric model with an explicit vectorial embedding, such as a recursive neural network, may be more promising.
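For illustration, the following sketch shows eigenvalue correction by clipping: an indefinite similarity matrix is made positive semi-definite by setting its negative eigenvalues to zero. This conveys the general technique, not necessarily the exact correction used in Chapter 5.

```python
# Hedged sketch of eigenvalue correction by clipping: negative eigenvalues
# of a symmetric but indefinite similarity matrix are set to zero, which
# distorts the space exactly as discussed above.
import numpy as np

def clip_eigenvalues(K):
    """Return the closest positive semi-definite matrix to the symmetric K."""
    K = 0.5 * (K + K.T)                 # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.maximum(eigvals, 0.0)  # clip negative eigenvalues to zero
    return eigvecs @ np.diag(eigvals) @ eigvecs.T

# example: a symmetric but indefinite matrix
K = np.array([[ 1.0, 0.9, -0.5],
              [ 0.9, 1.0,  0.8],
              [-0.5, 0.8,  1.0]])
print(np.linalg.eigvalsh(clip_eigenvalues(K)))  # all eigenvalues are now >= 0
```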

Fourth, while we could improve predictive performance over several baselines, these results did not translate into significantly better hint quality for intelligent tutoring systems in Chapter 6, indicating that the translation from kernel space to primal space could still be improved, either via secondary criteria such as syntactic correctness or unit test performance, or by using multiple edits instead of a single edit.

Fifth, our transfer learning method proposed in Chapter 7 is currently limited to linear functions, which may be insufficient for more complicated disturbances. Conversely, a full linear transformation may entail too many free parameters for very simple disturbances like the electrode shifts in Chapter 8. In this scenario, we could inject more prior knowledge to simplify the problem further and thus achieve better results with even less data, in particular fewer classes to record.

Outlook: Beyond improvements of the methods presented here, this thesis opens up multiple exciting avenues for further research.

First, I have shown that grammars and automata can serve as efficient and general interfaces to compute continuous gradients over discrete structures. In Chapter 3, I have used this connection to compute gradients over general string edit distances. Beyond edit distances, this connection could be useful for any domain that can be modeled in terms of formal grammars, such as computer programs (Aho et al. 2006), biological structures (Searls 2012), or chemical molecules (Weininger 1988). Kusner, Paige, and Hernández-Lobato (2017) have taken promising first steps in this direction by modeling chemical molecules via a grammar and then learning continuous vectorial representations for the words produced by said grammar.

Second, this work has explored the connection between representation learning and metric learning. In vectorial metric learning, this connection is obvious since metric learning corresponds to a linear mapping of the input data into an alternative space, i.e. an alternative representation (Bunte et al. 2012). However, this connection has not yet been well explored for structured data. Previous work has shown that any pseudo-Euclidean distance and any kernel, including those for structured data, correspond to an implicit vectorial representation (Pekalska and Duin 2005; also refer to Section 2.1). In this work, I have developed metric learning for edit distances on structured data by learning an explicit vectorial representation of symbols (refer to Chapter 4), which can be seen as a supervised version of word embedding learning (Mikolov et al. 2013; Pennington, Socher, and Manning 2014). I have also shown that we can translate affine combinations in the pseudo-Euclidean space of edit distances back to actual structured data (refer to Chapter 6). Future work could extend this link between metric learning and vectorial representations of structured data, with the aim of making such representations easier to learn, easier to interpret, and easier to invert.
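As a minimal illustration of this implicit representation (in the spirit of Pekalska and Duin 2005, not a reproduction of the thesis implementation), the following sketch computes a pseudo-Euclidean embedding of a symmetric dissimilarity matrix via double centering.

```python
# Hedged sketch of a pseudo-Euclidean embedding: double-centering the
# squared dissimilarities yields a (possibly indefinite) Gram matrix whose
# eigendecomposition provides explicit vectorial coordinates.
import numpy as np

def pseudo_euclidean_embedding(D):
    """Embed an n x n symmetric dissimilarity matrix D; return coordinates and signature."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    W = -0.5 * J @ (D ** 2) @ J             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(W)
    X = eigvecs * np.sqrt(np.abs(eigvals))  # coordinates, scaled axis-wise
    signature = (int(np.sum(eigvals > 1e-9)), int(np.sum(eigvals < -1e-9)))
    return X, signature
```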

Third, we have seen that we can interpret edit distances as shortest paths in a graph

this application in more detail, for example in the form of classroom studies regarding how much students actually profit from edit hints, and by incorporating additional constraints for possible edits, such as syntactic or semantic correctness. Beyond this application, edit distances provide an avenue towards interpreting learned models in machine learning more generally. For example, we could ask which edits we would need to apply to a structured datum such that it is classified differently, maximizes a certain property, or moves along a desired trajectory in the space of possible structured data.

Finally, I posed the general problem of supervised transfer learning with explicit transfer functions, and developed a particularly data- and time-efficient expectation maximization transfer learning algorithm in order to make a learned model from one domain applicable in another domain. This makes bionic hand prostheses easy to re-calibrate after everyday disturbances. Future work in this regard could go further and incorporate more domain-specific knowledge regarding the form of the transfer function, and evaluate the utility of transfer learning in clinical studies. Supervised transfer learning could also be applicable far beyond prosthetic research. By exploring nonlinear transfer functions, alternative parametrizations, and transfer functions for structured data, supervised transfer learning could become a useful tool for transferring machine learning models from the lab to actual, real-world applications using only minimal data and computational effort.

Overall, this thesis provides ample opportunity for further research incorporating knowledge from classical grammar theory, representation learning, and application domains to push the boundaries of machine learning on structured data.

PUBLICATIONS IN THE CONTEXT OF THIS THESIS

Mokbel, Bassam, Benjamin Paaßen, et al. (2015). “Metric learning for sequences in relational LVQ”. English. In: Neurocomputing 169, pp. 306–322. doi: 10.1016/j.neucom.2014.11.082.

Paaßen, Benjamin, Bassam Mokbel, and Barbara Hammer (2015a). “A Toolbox for Adaptive Sequence Dissimilarity Measures for Intelligent Tutoring Systems”. In: Proceedings of the 8th International Conference on Educational Data Mining (EDM 2015). (Madrid, Spain). Ed. by Olga Christina Santos et al. International Educational Datamining Society, pp. 632–632. url: http://www.educationaldatamining.org/EDM2015/uploads/papers/paper_257.pdf.

— (2015b). “Adaptive structure metrics for automated feedback provision in Java programming”. English. In: Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015). (Bruges, Belgium). Ed. by Michel Verleysen. Best student paper award. i6doc.com, pp. 307–312. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-43.pdf.

Göpfert, Christina, Benjamin Paaßen, and Barbara Hammer (2016). “Convergence of Multi-pass Large Margin Nearest Neighbor Metric Learning”. In: Proceedings of the 25th International Conference on Artificial Neural Networks (ICANN 2016). (Barcelona, Spain). Ed. by Alessandro E.P. Villa, Paolo Masulli, and Antonio Javier Pons Rivero. Vol. 9886. Lecture Notes in Computer Science. Springer, pp. 510–517. doi: 10.1007/978-3-319-44778-0_60.

Paaßen, Benjamin, Christina Göpfert, and Barbara Hammer (2016). “Gaussian process prediction for time series of structured data”. In: Proceedings of the 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2016). (Bruges, Belgium). Ed. by Michel Verleysen. i6doc.com, pp. 41–46. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-109.pdf.

Paaßen, Benjamin, Joris Jensen, and Barbara Hammer (2016). “Execution Traces as a Powerful Data Representation for Intelligent Tutoring Systems for Programming”. English. In: Proceedings of the 9th International Conference on Educational Data Mining (EDM 2016). (Raleigh, North Carolina, USA). Ed. by Tiffany Barnes, Min Chi, and Mingyu Feng. Exemplary Paper. International Educational Datamining Society, pp. 183–190. url: http://www.educationaldatamining.org/EDM2016/proceedings/paper_17.pdf.

Paaßen, Benjamin, Bassam Mokbel, and Barbara Hammer (2016). “Adaptive structure metrics for automated feedback provision in intelligent tutoring systems”. In: Neurocomputing 192, pp. 3–13. doi: 10.1016/j.neucom.2015.12.108.

Paaßen, Benjamin, Alexander Schulz, and Barbara Hammer (2016). “Linear Supervised Transfer Learning for Generalized Matrix LVQ”. In: Proceedings of the Workshop New Challenges in Neural Computation (NC2 2016). (Hannover, Germany). Ed. by Barbara Hammer, Thomas Martinetz, and Thomas Villmann. Best presentation award, pp. 11–18. url: https://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_04_2016.pdf#page=14.

Prahm, Cosima et al. (2016). “Transfer Learning for Rapid Re-calibration of a Myoelectric Prosthesis after Electrode Shift”. In: Proceedings of the 3rd International Conference on NeuroRehabilitation (ICNR 2016). (Segovia, Spain). Ed. by Jaime Ibáñez et al. Vol. 15. Converging Clinical and Engineering Research on Neurorehabilitation II. Biosystems & Biorobotics. Runner-Up for Best Student Paper Award. Springer, pp. 153–157. doi: 10.1007/978-3-319-46669-9_28.

Paaßen, Benjamin et al. (2017). “An EM transfer learning algorithm with applications in bionic hand prostheses”. In: Proceedings of the 25th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2017). (Bruges, Belgium). Ed. by Michel Verleysen. i6doc.com, pp. 129–134. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2017-57.pdf.

Paaßen, Benjamin (2018). Revisiting the tree edit distance and its backtracing: A tutorial. arXiv: 1805.06869 [cs.DS].

Paaßen, Benjamin, Claudio Gallicchio, et al. (2018). “Tree Edit Distance Learning via Adaptive Symbol Embeddings”. In: Proceedings of the 35th International Conference on Machine Learning (ICML 2018). (Stockholm, Sweden). Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research, pp. 3973–3982. url: http://proceedings.mlr.press/v80/paassen18a.html.

Paaßen, Benjamin, Christina Göpfert, and Barbara Hammer (2018). “Time Series Prediction for Graphs in Kernel and Dissimilarity Spaces”. In: Neural Processing Letters 48.2, pp. 669–689. doi: 10.1007/s11063-017-9684-5.

Paaßen, Benjamin, Barbara Hammer, et al. (2018). “The Continuous Hint Factory – Providing Hints in Vast and Sparsely Populated Edit Distance Spaces”. In: Journal of Educational Datamining 10.1, pp. 1–35. url: https://jedm.educationaldatamining.org/index.php/JEDM/article/view/158.

Paaßen, Benjamin et al. (2018). “Expectation maximization transfer learning and its application for bionic hand prostheses”. In: Neurocomputing 298, pp. 122–133. doi: 10.1016/j.neucom.2017.11.072.

REFERENCES

Adamatzky, Andrew (2002). Collision-Based Computing. Berlin/Heidelberg, Germany: Springer. isbn: 978-1-4471-0129-1.

Ahmad, A.S. et al. (2014). “A review on applications of ANN and SVM for building electrical energy consumption forecasting”. In: Renewable and Sustainable Energy Reviews 33, pp. 102–109. doi: 10.1016/j.rser.2014.01.069.

Ahmad, Farooq and Grzegorz Kondrak (2005). “Learning a Spelling Error Model from Search Query Logs”. In: Proceedings of the Conference on Human Language Technology (HLT 2005). Ed. by Raymond Mooney, pp. 955–962. doi: 10.3115/1220575.1220695.

Aho, Alfred et al. (2006). Compilers: Principles, Techniques, and Tools. 2nd ed. Boston, MA, US: Addison Wesley. isbn: 978-0321486813.

Aiolli, Fabio and Michele Donini (2015). “EasyMKL: a scalable multiple kernel learning algorithm”. In: Neurocomputing 169, pp. 215–224. doi: 10.1016/j.neucom.2014.11.078.

Aiolli, Fabio, Giovanni Da San Martino, and Alessandro Sperduti (2015). “An Efficient Topological Distance-Based Tree Kernel”. In: IEEE Transactions on Neural Networks and Learning Systems 26.5, pp. 1115–1120. doi: 10.1109/TNNLS.2014.2329331.

Akutsu, Tatsuya (2010). “Tree Edit Distance Problems: Algorithms and Applications to Bioinformatics”. In: IEICE Transactions on Information and Systems E93-D.2, pp. 208–218. doi: 10.1587/transinf.E93.D.208.

Aleven, Vincent, Bruce M. McLaren, et al. (2006). “The Cognitive Tutor Authoring Tools (CTAT): Preliminary Evaluation of Efficiency Gains”. In: Proceedings of the 8th International Conference on Intelligent Tutoring Systems (ITS 2006). Ed. by Mitsuru Ikeda, Kevin D. Ashley, and Tak-Wai Chan. Springer, pp. 61–70. doi: 10.1007/11774303_7.

Aleven, Vincent, Ido Roll, et al. (2016). “Help Helps, But Only So Much: Research on Help Seeking with Intelligent Tutoring Systems”. In: International Journal of Artificial Intelligence in Education 26.1, pp. 205–223. doi: 10.1007/s40593-015-0089-1.

Augsten, Nikolaus, Michael Böhlen, and Johann Gamper (2008). “The pq-gram Distance Between Ordered Labeled Trees”. In: ACM Transactions on Database Systems 35.1, 4:1–4:36. doi: 10.1145/1670243.1670247.

Bacciu, Davide, Federico Errica, and Alessio Micheli (2018). “Contextual Graph Markov Model: A Deep and Generative Approach to Graph Processing”. In: Proceedings of the 35th International Conference on Machine Learning (ICML 2018). Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research, pp. 294–303. url: http://proceedings.mlr.press/v80/bacciu18a.html.

Bacciu, Davide, Claudio Gallicchio, and Alessio Micheli (2016). “A reservoir activation kernel for trees”. In: Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN 2016). Ed. by Michel Verleysen. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-172.pdf.

Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf (2003). “Learning to Find Pre-images”. In: Proceedings of the 16th International Conference on Neural Information Processing Systems (NIPS 2003). Ed. by S. Thrun, L. K. Saul, and B. Schölkopf, pp. 449–456. url: https://papers.nips.cc/paper/2417-learning-to-find-pre-images.

Bakır, Gökhan H., Alexander Zien, and Koji Tsuda (2004). “Learning to Find Graph Pre-images”. In: Proceedings of the fourth German Conference on Pattern Recognition (DAGM 2004), pp. 253–261. doi: 10.1007/978-3-540-28649-3_31.

Balcan, Maria-Florina, Avrim Blum, and Nathan Srebro (2008). “A theory of learning with similarity functions”. In: Machine Learning 72.1, pp. 89–112. doi: 10.1007/s10994-008-5059-5.

Barabási, Albert-László and Réka Albert (1999). “Emergence of Scaling in Random Networks”. In: Science 286.5439, pp. 509–512. doi: 10.1126/science.286.5439.509.

Barber, David (2012). Bayesian Reasoning and Machine Learning. Cambridge, UK: Cambridge University Press. isbn: 978-0-521-51814-7. url: http://www0.cs.ucl.ac.uk/staff/d.barber/brml/.

Barnes, Tiffany, Behrooz Mostafavi, and Michael Eagle (2016). “Data-driven domain models for problem solving”. In: Domain Modeling. Ed. by Robert A. Sottilare et al. Vol. 4. Design Recommendations for Intelligent Tutoring Systems. US Army Research Laboratory, pp. 137–145. isbn: 978-0-9893923-9-6. url: https://gifttutoring.org/documents/105.

Barnes, Tiffany and John Stamper (2008). “Toward Automatic Hint Generation for Logic Proof Tutoring Using Historical Student Data”. In: Proceedings of the 9th International Conference on Intelligent Tutoring Systems (ITS 2008). Ed. by Beverley P. Woolf et al., pp. 373–382. doi: 10.1007/978-3-540-69132-7_41.

Barnett, Susan and Stephen Ceci (2002). “When and where do we apply what we learn?: A taxonomy for far transfer”. In: Psychological Bulletin 128.4, p. 612. doi: 10.1037/0033-2909.128.4.612.

Barrett, Christopher L., Henning S. Mortveit, and Christian M. Reidys (2000). “Elements of a theory of simulation II: sequential dynamical systems”. In: Applied Mathematics and Computation 107.2-3, pp. 121–136. doi: 10.1016/S0096-3003(98)10114-5.

— (2003). “ETS IV: Sequential dynamical systems: fixed points, invertibility and equivalence”. In: Applied Mathematics and Computation 134.1, pp. 153–171. doi: 10.1016/S0096-3003(01)00277-6.

Barrett, Christopher L. and Christian M. Reidys (1999). “Elements of a Theory of Computer Simulation I: Sequential CA over Random Graphs”. In: Applied Mathematics and Computation 98.2-3, pp. 241–259. doi: 10.1016/S0096-3003(97)10166-7.

Bellet, Aurélien, Amaury Habrard, and Marc Sebban (2012). “Good edit similarity learning by loss minimization”. In: Machine Learning 89.1, pp. 5–35. doi: 10.1007/s10994-012-5293-8.

— (2014). A Survey on Metric Learning for Feature Vectors and Structured Data. arXiv: 1306.6709 [cs.LG].

Ben-David, Shai et al. (2006). “Analysis of representations for domain adaptation”. In: Proceedings of the 19th Advances in Neural Information Processing Systems Conference (NIPS 2006). Ed. by Bernhard Schölkopf, John C. Platt, and T. Hoffman, pp. 137–144. url: https://papers.nips.cc/paper/2983-analysis-of-representations-for-domain-adaptation.

Bengio, Yoshua, Aaron Courville, and Pascal Vincent (2013). “Representation Learning: A Review and New Perspectives”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8, pp. 1798–1828. doi: 10.1109/TPAMI.2013.50.

Bergstra, James and Yoshua Bengio (2012). “Random Search for Hyper-Parameter Optimization”. In: Journal of Machine Learning Research 13, pp. 281–305. url: http://www.jmlr.org/papers/v13/bergstra12a.html.

Biddiss, Elaine A. and Tom T. Chau (2007). “Upper limb prosthesis use and abandonment: A survey of the last 25 years”. In: Prosthetics and Orthotics International 31.3, pp. 236–257. doi: 10.1080/03093640600994581.

Biehl, Michael et al. (2015). “Stationarity of Matrix Relevance LVQ”. In: Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN 2015). Ed. by Yoonsuck Choe and De-Shuang Huang, pp. 1–8. doi: 10.1109/IJCNN.2015.7280441.

Bille, Philip (2005). “A survey on tree edit distance and related problems”. In: Theoretical Computer Science 337.1, pp. 217–239. doi: 10.1016/j.tcs.2004.12.030.

Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Berlin/Heidelberg, Germany: Springer. isbn: 0387310738.

Blitzer, John, Ryan McDonald, and Fernando Pereira (2006). “Domain Adaptation with Structural Correspondence Learning”. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006). Ed. by Dan Jurafsky and Eric Gaussier, pp. 120–128. url: https://aclanthology.info/pdf/W/W06/W06-1615.pdf.

Blöbaum, Patrick, Alexander Schulz, and Barbara Hammer (2015). “Unsupervised Dimensionality Reduction for Transfer Learning”. In: Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015). Ed. by Michel Verleysen, pp. 507–512. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-134.pdf.

Borgwardt, Karsten and Hans-Peter Kriegel (2005). “Shortest-path kernels on graphs”. In: Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005). Ed. by Jiawei Han et al. doi: 10.1109/ICDM.2005.132.

Boyer, Laurent, Amaury Habrard, and Marc Sebban (2007). “Learning Metrics Between Tree Structured Data: Application to Image Recognition”. In: Proceedings of the 18th European Conference on Machine Learning (ECML 2007). Ed. by Joost N. Kok et al., pp. 54–66. doi: 10.1007/978-3-540-74958-5_9.

Bunte, Kerstin et al. (2012). “Limited Rank Matrix Learning, discriminative dimension reduction and visualization”. In: Neural Networks 26, pp. 159–173. doi: 10.1016/j.neunet.2011.10.001.

Casteigts, Arnaud et al. (2012). “Time-varying graphs and dynamic networks”. In: International Journal of Parallel, Emergent and Distributed Systems 27.5, pp. 387–408. doi: 10.1080/17445760.2012.668546.

Chang, Chih-Chung and Chih-Jen Lin (2011). “LIBSVM: A Library for Support Vector Machines”. In: ACM Transactions on Intelligent Systems and Technology 2.3. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 27:1–27:27. doi: 10.1145/1961189.1961199.

Cho, Kyunghyun et al. (2014). “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014). Ed. by Alessandro Moschitti, Bo Pang, and Walter Daelemans, pp. 1724–1734. url: https://www.aclweb.org/anthology/D14-1179.

Choudhury, Rohan Roy, Hezheng Yin, and Armando Fox (2016). “Scale-Driven Automatic Hint Generation for Coding Style”. In: Proceedings of the 13th International Conference on Intelligent Tutoring Systems (ITS 2016). Ed. by Alessandro Micarelli, John Stamper, and Kitty Panourgia, pp. 122–132. doi: 10.1007/978-3-319-39583-8_12.

Chung, Junyoung et al. (2015). “A Recurrent Latent Variable Model for Sequential Data”. In: Proceedings of the 28th Conference on Advances in Neural Information Processing Systems (NIPS 2015). Ed. by C. Cortes et al., pp. 2980–2988. url: http://papers.nips.cc/paper/5653-a-recurrent-latent-variable-model-for-sequential-data.

Clauset, Aaron (2013). “Generative Models for Complex Network Structure”. In: Proceedings of the 8th International School and Conference on Network Science (NetSci 2013). url: http://www2.imm.dtu.dk/~tuhe/cnmml/pdf/clauset.pdf.

Cortes, Corinna et al. (2008). “Sample Selection Bias Correction Theory”. In: Proceedings of the 19th International Conference on Algorithmic Learning Theory (ALT 2008). Ed. by Yoav Freund et al., pp. 38–53. doi: 10.1007/978-3-540-87987-9_8.

Cover, Thomas and Peter Hart (1967). “Nearest neighbor pattern classification”. In: IEEE Transactions on Information Theory 13.1, pp. 21–27. doi: 10.1109/TIT.1967.1053964.

Da San Martino, Giovanni and Alessandro Sperduti (2010). “Mining Structured Data”. In: Computational Intelligence Magazine 5.1, pp. 42–49. doi: 10.1109/MCI.2009.935308.

Damerau, Fred (1964). “A Technique for Computer Detection and Correction of Spelling Errors”. In: Communications of the ACM 7.3, pp. 171–176. doi: 10.1145/363958.363994.

Davis, Jason et al. (2007). “Information-theoretic Metric Learning”. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Ed. by Claude Sammut and Zoubin Ghahramani, pp. 209–216. doi: 10.1145/1273496.1273523.

— (2010). “Metric learning to Rank”. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010). Ed. by Stefan Wrobel, Johannes Fürnkranz, and Thorsten Joachims, pp. 775–782. url: https://bmcfee.github.io/papers/mlr.pdf.

De Vries, Harm, Roland Memisevic, and Aaron Courville (2016). “Deep learning vector quantization”. In: Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN 2016). Ed. by Michel Verleysen. url: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-112.pdf.

Deisenroth, Marc Peter and Jun Wei Ng (2015). “Distributed Gaussian Processes”. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015). Ed. by Francis Bach and David Blei, pp. 1481–1490. url: http://proceedings.mlr.press/v37/deisenroth15.html.

Demaine, Erik D. et al. (2009). “An Optimal Decomposition Algorithm for Tree Edit Distance”. In: ACM Transactions on Algorithms 6.1, 2:1–2:19. doi: 10.1145/1644015.1644017.

Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). “Maximum likelihood from incomplete data via the EM algorithm”. In: Journal of the Royal Statistical Society. Series B 39.1, pp. 1–38. url: https://www.jstor.org/stable/2984875.

Ditzler, Gregory et al. (2015). “Learning in Nonstationary Environments: A Survey”. In: IEEE Computational Intelligence Magazine 10.4, pp. 12–25. doi: 10.1109/MCI.2015.2471196.

Duan, Lixin, Dong Xu, and Ivor Tsang (2012). “Learning with Augmented Features for Heterogeneous Domain Adaptation”. In: Proceedings of the 29th International Conference on Machine Learning (ICML 2012). (Edinburgh, UK). Ed. by Andrew McCallum, John Langford, and Joelle Pineau. url: https://arxiv.org/abs/1206.4660.

Eagle, Michael and Tiffany Barnes (2013). “Evaluation of automatically generated hint feedback”. In: Proceedings of the 6th International Conference on Educational Data Mining (EDM 2013). Ed. by S. K. D’Mello, R. A. Calvo, and A. Olney, pp. 372–374. url: http://www.educationaldatamining.org/EDM2013/papers/rn_paper_87.pdf.

Eagle, Michael, Matthew Johnson, and Tiffany Barnes (2012). “Interaction Networks: Generating High Level Hints Based on Network Community Clustering”. In: Proceedings of the 5th International Conference on Educational Data Mining (EDM 2012). Ed. by K. Yacef et al., pp. 164–167. url: https://eric.ed.gov/?id=ED537223.

Emms, Martin (2012). “On Stochastic Tree Distances and Their Training via Expectation-Maximisation”. In: Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (ICPRAM 2012). Ed. by Pedro Carmona, Salvador Sánchez, and Ana Fred, pp. 144–153.

Fackler, Paul L. (2005). Notes on matrix calculus. Tech. rep. North Carolina State University. url: http://www4.ncsu.edu/~pfackler/MatCalc.pdf.

Farina, Dario et al. (2014). “The Extraction of Neural Information from the Surface EMG for the Control of Upper-Limb Prostheses: Emerging Avenues and Challenges”. In: IEEE Transactions on Neural Systems and Rehabilitation Engineering 22.4, pp. 797–809. doi: 10.1109/TNSRE.2014.2305111.

Feragen, Aasa et al. (2013). “Scalable kernels for graphs with continuous attributes”. In: Proceedings of the 26th Conference on Advances in Neural Information Processing Systems (NIPS 2013). Ed. by C. J. C. Burges et al., pp. 216–224. url: http://papers.nips.cc/paper/5155-scalable-kernels-for.

Filippone, Maurizio et al. (2008). “A survey of kernel and spectral methods for clustering”. In: Pattern Recognition 41.1, pp. 176–190. doi: 10.1016/j.patcog.2007.05.018.

Fleming, Malcolm L. and W. Howard Levie (1993). Instructional Message Design: Principles from the Behavioral and Cognitive Sciences. Englewood Cliffs, NJ, USA: Educational Technology Publications. isbn: 978-0877782537.

Floyd, Robert W. (1962). “Algorithm 97: Shortest Path”. In: Communications of the ACM 5.6, p. 345. doi: 10.1145/367766.368168.

Freeman, Paul, Ian Watson, and Paul Denny (2016). “Inferring Student Coding Goals Using Abstract Syntax Trees”. In: Proceedings of the 24th International Conference on Case-Based Reasoning Research and Development (ICCBR 2016). Ed. by Ashok Goel, M. Belén Díaz-Agudo, and Thomas Roth-Berghofer, pp. 139–153. doi: 10.1007/978-3-319-47096-2_10.

Gallicchio, Claudio and Alessio Micheli (2010). “Graph Echo State Networks”. In: Proceedings of the 23rd International Joint Conference on Neural Networks (IJCNN 2010). Ed. by Pillar Sobrevilla et al., pp. 1–8. doi: 10.1109/IJCNN.2010.5596796.

— (2013). “Tree Echo State Networks”. In: Neurocomputing 101, pp. 319–337. doi: 10.1016/j.neucom.2012.08.017.

Gao, Xinbo et al. (2010). “A survey of graph edit distance”. In: Pattern Analysis and Applications 13.1, pp. 113–129. doi: 10.1007/s10044-008-0141-y.

Garcia Duran, Alberto and Mathias Niepert (2017). “Learning Graph Representations with Embedding Propagation”. In: Proceedings of the 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017). Ed. by I. Guyon et al., pp. 5119–5130. url: http://papers.nips.cc/paper/7097-learning-graph-representations-with-embedding-propagation.

Garcia, Dan, Brian Harvey, and Tiffany Barnes (2015). “The Beauty and Joy of Computing”. In: ACM Inroads 6.4, pp. 71–79. doi: 10.1145/2835184.

Gardner, Martin (1970). “Mathematical Games – The fantastic combinations of John Conway’s new solitaire game ‘life’”. In: Scientific American 223, pp. 120–123.

Gentner, Dedre and Arthur Markman (1997). “Structure mapping in analogy and similarity”. In: American Psychologist 52.1, p. 45. doi: 10.1037/0003-066X.52.1.45.

Gepperth, Alexander and Barbara Hammer (2016). “Incremental learning algorithms and applications”. In: Proceedings of the 24th European Symposium on Artificial Neural Networks (ESANN 2016). Ed. by Michel Verleysen. url: http://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2016-19.pdf.

Giegerich, Robert, Carsten Meyer, and Peter Steffen (2004). “A discipline of dynamic programming over sequence data”. In: Science of Computer Programming 51.3, pp. 215–263. doi: 10.1016/j.scico.2003.12.005.

Girard, Agathe et al. (2003). “Gaussian process priors with uncertain inputs – application to multiple-step ahead time series forecasting”. In: Proceedings of the 15th Conference on Advances in Neural Information Processing Systems (NIPS 2002). Ed. by S. Becker, S.
