Gómez-Rodríguez and Fernández-González (2015) present a non-deterministic oracle for Covington's (2001) unrestricted parsing algorithm, which could be argued to be transition-based (Nivre, 2008), but is strictly slower, with an O(n²) time complexity with respect to the input.

Finally, as an alternative to dynamic oracles, recent work has also focused on developing approximate dynamic oracles using machine learning techniques. The basic idea is to use machine learning to decide what the latent transitions should be. This involves ideas such as searching for the best transitions in the presence of mistakes (Straka et al., 2015), or trying to directly learn a function that returns the best possible transition in the presence of mistakes (Le and Fokkens, 2017). Recently, Yu et al. (2018) combined these ideas using reinforcement learning, where a function is learned using the gold standard tree as features.

same as when using a static oracle. In a broad sense, the conclusion is that the non-deterministic oracles are never harmful compared to their static counterparts, although they sometimes also do not yield any improvements.

A secondary result of our experiments is a thorough comparison of all static and non-deterministic oracles. The general result can be summed up by saying that fewer swaps in the training sequences tend to improve performance. This is apparent from the comparison between the static oracles EAGER and LAZY, corroborating and extending Nivre et al.'s (2009) results. While the static MINIMAL oracle theoretically reduces the number of swaps even further, the reduction is in practice rather small due to the nature of the treebanks, since the difference in swaps when moving from LAZY to MINIMAL is rather minor. The results from the experiments with non-deterministic oracles also point to the fact that fewer swaps lead to better performance. We saw this in the analysis of Hungarian, where the ND-ALL oracle had a strong tendency to overswap, yielding more swaps and also worse results than the EAGER oracle.

Chapter 5

Joint Sentence Segmentation and Dependency Parsing

5.1 Introduction

In the previous chapter we studied the utility of non-deterministic oracles for transition-based dependency parsing. The novelty with respect to previous work was that we used the idea of latent structures for learning search-based transition-based dependency parsers. In this chapter we will look at another aspect that the framework from Chapter 2 is concerned with: the update methods required and their importance vis-à-vis the length of the sequences that need to be learned. We will extend the dependency parsing task to not just parse single sentences, but to parse a sequence of tokens (i.e., a document), where the beginnings and ends of the sentences are not known.

The default approach to parsing documents is to build a pipeline of NLP components, solving a number of sub-tasks sequentially. Such a pipeline would start with a sentence boundary detector which splits the input document into sentences. Each sentence would then be fed through a tokenizer, followed by a part-of-speech tagger and morphological analyzer, and only then would the parser step in. When working with carefully copy-edited text documents, sentence boundary detection can be viewed as a minor preprocessing task in such a pipeline, solvable with very high accuracy. However, when dealing with the output of automatic speech recognition or "noisier" texts such as blogs and emails, non-trivial sentence segmentation issues do occur. Dridan and Oepen (2013), for example, show that fully automatic preprocessing can result in considerable drops in parsing quality when moving from well-edited to less-edited text.
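To make the pipeline structure concrete, the following is a minimal sketch of such a sequential setup; the component objects (segmenter, tokenizer, tagger, parser) and their methods are hypothetical placeholders, not the API of any particular toolkit.

```python
# Hypothetical pipeline sketch: each stage consumes the previous stage's
# output, so a segmentation error propagates into tagging and parsing.

def parse_document(document, segmenter, tokenizer, tagger, parser):
    """Run a classic NLP pipeline over a raw document string."""
    parses = []
    for sentence in segmenter.split(document):   # sentence boundary detection
        tokens = tokenizer.tokenize(sentence)    # tokenization
        tagged = tagger.tag(tokens)              # POS tags and morphology
        parses.append(parser.parse(tagged))      # one dependency tree per sentence
    return parses
```

The sketch also makes the weakness of the pipeline visible: the parser never gets a chance to revise a segmentation decision made in the very first stage.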

Two possible strategies to approach this problem are (i) to exploit other cues for sentence boundaries, such as prosodic phrasing and intonation in speech (Kolář et al., 2006) or formatting cues in text documents (Read et al., 2012), and (ii) to emulate the human ability to exploit syntactic competence for segmentation. By coupling the prediction of sentence boundaries with syntax, we will aim for the latter. The basic intuition is that segmentations that would give rise to suboptimal syntactic structures will also be more difficult to parse. Therefore, erroneous segmentations can be caught early during search and discarded in favor of segmentations where the syntactic structure receives a high score from the parsing model.
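To illustrate this intuition in isolation, the sketch below scores candidate segmentations by the parsing model's own score and keeps the best one; candidate_segmentations and parse_score are hypothetical helpers standing in for the joint search procedure developed later in this chapter.

```python
# Illustrative sketch (not the actual search procedure): prefer the
# segmentation whose sentences the parsing model can analyze with the
# highest total score.

def best_segmentation(tokens, candidate_segmentations, parse_score):
    """Return the candidate segmentation with the highest total parse score.

    candidate_segmentations(tokens) yields segmentations, each a list of
    sentences (token lists); parse_score(sentence) returns the parsing
    model's score for one sentence.  Both are assumed helpers.
    """
    return max(
        candidate_segmentations(tokens),
        key=lambda sentences: sum(parse_score(s) for s in sentences),
    )
```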

Our technical approach will be to extend the transition system from the previous chapter to predict sentence boundaries and syntax jointly. We will refine the transition system with a dedicated transition that labels sentence boundaries and augment the states to keep track of this information. We characterize the necessary preconditions on the transition system in order to keep the resulting output well-formed.
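The precise system is defined in Section 5.2. Purely as a rough illustration of the idea, the sketch below augments a parser state with a record of sentence boundaries and adds a dedicated boundary transition; the names and the specific precondition are illustrative assumptions, not the definitive system.

```python
# Illustrative sketch of a transition state extended with sentence boundaries.
# The precondition on the boundary transition (stack reduced to a single root)
# is one simple way to keep each sentence's dependency tree well-formed; the
# actual preconditions are characterized in Section 5.2.

from dataclasses import dataclass, field

@dataclass
class State:
    stack: list = field(default_factory=list)       # partially processed tokens
    buffer: list = field(default_factory=list)      # remaining input tokens
    arcs: set = field(default_factory=set)          # (head, dependent) pairs
    boundaries: list = field(default_factory=list)  # root of each completed sentence

def can_mark_boundary(state):
    # Only allow a boundary when everything built so far for the current
    # sentence hangs off a single remaining stack element.
    return len(state.stack) == 1

def mark_boundary(state):
    assert can_mark_boundary(state)
    root = state.stack.pop()
    state.boundaries.append(root)   # close off the current sentence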

Although this joint system and, consequently, the machine-learning problem are by and large similar to what we saw in the previous chapter, they differ strongly in terms of the length of the transition sequences. We instantiate the framework from Chapter 2 similarly as in the previous chapter, using both a greedy, classifier-based model as a baseline and a beam search parser as the contrastive system. We evaluate the update methods for the approximate search setting and find, similarly to the results on coreference resolution, that the update methods that discard training data are inadequate for this problem as they fail to outperform the baseline. However, when we apply DLaSO we find that the beam search parser outperforms the baseline.
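Purely as a hedged sketch of the delayed-update idea (and not the actual implementation of the Chapter 2 framework), the following shows how a structured-perceptron pass over one long transition sequence could record a violation whenever the gold prefix drops out of the beam, continue the search from the gold prefix, and apply all accumulated updates only at the end; beam_step and phi are assumed helpers, and weights and feature vectors are assumed to support addition and subtraction (e.g., numpy arrays).

```python
# Hedged sketch of delayed (LaSO-style) perceptron updates: no part of the
# training sequence is discarded; violations are collected and all updates
# are applied after the whole sequence has been processed.

def train_sequence(weights, initial_item, gold_transitions, beam_step, phi):
    """One training sequence with delayed updates (illustrative only)."""
    violations = []                # (best wrong prefix, gold prefix) pairs
    beam = [initial_item]
    for t in range(len(gold_transitions)):
        beam, gold_in_beam, gold_item = beam_step(weights, beam, gold_transitions, t)
        if not gold_in_beam:
            violations.append((beam[0], gold_item))  # record the violation ...
            beam = [gold_item]                       # ... and continue from gold
    for predicted, gold in violations:               # apply all updates at the end
        weights = weights + phi(gold) - phi(predicted)
    return weights
```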

From the computational linguistics perspective, the joint system allows us to test the utility of syntax when predicting sentence boundaries, as compared to a pipeline baseline where both tasks are performed independently of each other. With a careful selection of data sets and baselines for the sentence segmentation problem, we demonstrate empirically that syntactic information is helpful for the task and can to a large extent make up for missing or unreliable cues from punctuation.

For our analysis, we use the Wall Street Journal as the standard benchmark set and as a representative of copy-edited text. We also use the Switchboard corpus of transcribed dialogues as a representative of data where punctuation cannot give clues to a sentence boundary predictor. Other types of data that may exhibit this property to varying degrees are web content, e.g., forum posts or chat protocols, or (especially historical) manuscripts. While the Switchboard corpus gives us a realistic scenario for a setting with unreliable punctuation, the syntactic complexity of telephone conversations is rather low compared to the Wall Street Journal. Therefore, as a controlled experiment for assessing how far syntactic competence alone can take us if we stop trusting punctuation and capitalization entirely, we also perform joint sentence boundary detection/parsing on a lower-cased, no-punctuation version of the Wall Street Journal.

5.2 Joint Transition-based Sentence Segmentation and