
Possible Extensions

Alternative Approaches to the Recursive Decoding Algorithm

In this chapter, the basic iterative Decoding algorithm has been argued to have too high a memory footprint, and the recursive divide-and-conquer algorithm has been developed as an alternative. Yet, several other approaches are feasible to reduce the memory footprint of the Decoding task.

First, merging of Proxels in the modified Forward step effectively discards all but one successor Proxel with a given state. Thus, some Proxels may not have successors in the next time step. Without successors, these Proxels can never be part of the most likely path of internal states and can consequently be discarded.

When they are discarded, their former predecessor Proxels may in turn no longer have successors and may be discarded as well. This whole process of discarding successorless Proxels can continue up to the Proxel set of the first time step.

Such an individual Proxel discarding scheme may be implemented through reference counting of the Proxels, or may occur naturally in programming languages that feature a tracing garbage collector [85]. Depending on the model structure, this approach may discard a substantial fraction of the model's Proxels, reducing the approach's memory footprint. And since relevant Proxels are never discarded (as opposed to the recursive Decoding algorithm, which also discards relevant Proxels and recomputes them later), the links to predecessor Proxels could directly be implemented as pointers and need not replicate the state of the predecessor Proxel, further reducing memory consumption. The downside of this approach is its increased complexity and the unpredictability of how many Proxels can actually be discarded. Furthermore, memory for Proxels needs to be allocated per Proxel (instead of using a region-based approach to memory management), further reducing the efficiency of the algorithm.
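
The following minimal sketch illustrates how such a reference-counting discard cascade might look; the class layout and all names are illustrative, not taken from an actual implementation.

    # Minimal sketch of a reference-counted Proxel discard cascade.
    # All names (Proxel, successor_count, ...) are illustrative.
    class Proxel:
        def __init__(self, state, age_vector, probability, predecessor=None):
            self.state = state
            self.age_vector = age_vector
            self.probability = probability
            self.predecessor = predecessor    # pointer to the parent Proxel
            self.successor_count = 0          # number of surviving children
            if predecessor is not None:
                predecessor.successor_count += 1

    def discard_if_successorless(proxel):
        # Walk towards the first time step, discarding every Proxel whose
        # last remaining successor has just been removed.
        while proxel is not None and proxel.successor_count == 0:
            parent = proxel.predecessor
            proxel.predecessor = None         # sever the link; the Proxel
                                              # is now unreachable
            if parent is not None:
                parent.successor_count -= 1
            proxel = parent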

Second, other Proxel discarding schemes are possible. For example, one may decide to perform the initial modified Forward computation for all $n$ time steps and to retain every $\sqrt{n}$th time step as a "checkpoint". The backtracking part of the Decoding algorithm then initially requires only the reconstruction of the Proxel sets of the $\sqrt{n}$ time steps between the last checkpoint and the final time step. All of those are kept in memory, and the Proxels on the most likely path are determined for all those time steps. Afterwards, the Proxel sets for all these time steps can be discarded, and those for the time steps between the last and the last-but-one checkpoint can be reconstructed in order to determine their Proxels on the most likely path, and so on for all checkpoint intervals up to the first time step. Overall, this approach requires the concurrent storage of $2\sqrt{n}$ time-step Proxel sets, and each of the Proxel sets has to be computed at most twice (once to generate the checkpoints, and once for the backtracking between the checkpoints). The approach is generalizable to a hierarchy of $m$ levels of checkpoints, yielding a memory complexity of $O(m\sqrt[m]{n})$ Proxel sets and a time complexity of $O(mn)$ time steps. A sketch of the single-level variant follows.
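
The sketch below assumes hypothetical helpers forward_step (computes the next time step's Proxel set), best_proxel (selects the final Proxel of the most likely path), and predecessor_of (finds a Proxel's predecessor in the previous set); none of these names are from the actual implementation.

    import math

    # Sketch of sqrt(n)-checkpointed Decoding: keep every stride-th Proxel
    # set during the forward pass, then re-expand one checkpoint interval
    # at a time while backtracking.
    def decode_with_checkpoints(initial_set, n, forward_step,
                                best_proxel, predecessor_of):
        stride = max(1, math.isqrt(n))

        # Forward pass: retain only every stride-th Proxel set.
        checkpoints = {0: initial_set}
        current = initial_set
        for t in range(1, n + 1):
            current = forward_step(current, t)
            if t % stride == 0:
                checkpoints[t] = current

        # Backward pass: roughly 2*sqrt(n) Proxel sets live at any time.
        path = [None] * (n + 1)
        path[n] = best_proxel(current)
        for start in range((n // stride) * stride, -1, -stride):
            end = min(start + stride, n)
            # Recompute the Proxel sets of this checkpoint interval.
            interval = [checkpoints[start]]
            for t in range(start + 1, end + 1):
                interval.append(forward_step(interval[-1], t))
            # Resolve the most likely path within the interval.
            for t in range(end, start, -1):
                path[t - 1] = predecessor_of(path[t], interval[t - 1 - start])
            del interval                      # free the recomputed sets
        return path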

Third, the original iterative CHnMM Decoding algorithm may be used, and Proxel sets of time steps that are not currently needed may simply be stored on background storage devices such as hard disks or even digital tape, which both feature abundant storage capacity. Since all Proxels of a time step can be read or written at the same time, random access is not required and even those mechanical storage systems can sustain a relatively high throughput. However, even their linear read/write throughput is usually one to two orders of magnitude lower than that of PC RAM. Furthermore, writing data to disk usually requires some kind of serialization, which further slows down the computation.
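
As a rough illustration, spilling a time step's Proxel set could look like the following sketch, which assumes Proxels are serializable with Python's pickle module; the file layout and names are purely illustrative.

    import os
    import pickle

    # Sketch: spill the Proxel set of an inactive time step to disk and
    # load it back later. The whole set is written sequentially, so no
    # random access is required.
    def spill_proxel_set(proxels, t, directory="proxel_spill"):
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, f"step_{t}.pkl"), "wb") as f:
            pickle.dump(proxels, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load_proxel_set(t, directory="proxel_spill"):
        with open(os.path.join(directory, f"step_{t}.pkl"), "rb") as f:
            return pickle.load(f)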

And finally, instead of requiring links to parents and backtracking, each Proxel may directly store the most likely path that led to it. With this, the most likely path can easily be found after the modified Forward computation has reached the final time step, since it is already completely stored in the Proxel with the highest probability for that time step. Thus, no backtracking is required, and time steps whose successors have been computed can be discarded as with the original Forward algorithm for the Evaluation task. The downside of this approach is that the size of Proxels grows with the trace length. Thus, for longer traces, it might be infeasible to even keep the Proxels of a single time step in memory. And since the stored list of predecessor states grows in size, in a naïve implementation the time complexity of generating a Proxel will grow linearly with the number of observations in the trace.
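
A minimal sketch of such path-carrying Proxels, with illustrative names, makes the linear per-Proxel cost visible:

    from dataclasses import dataclass, field

    # Sketch: each Proxel carries the discrete states of the most likely
    # path leading to it, so no backtracking links are needed.
    @dataclass
    class PathProxel:
        state: int
        age_vector: tuple
        probability: float
        path: tuple = field(default=())   # states visited so far

    def extend(proxel, new_state, new_age_vector, change_probability):
        # Appending copies the stored path, so a naive implementation
        # needs O(t) time and memory per Proxel at time step t.
        return PathProxel(
            state=new_state,
            age_vector=new_age_vector,
            probability=proxel.probability * change_probability,
            path=proxel.path + (proxel.state,),
        )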

Extensions to the Decoding Task

The algorithms that solve the Decoding task may be slightly modified to solve further tasks on CHnMMs.

First, since the observations are caused by the completion of activities and not by remaining in a state (as with HMMs), a practitioner may be interested in the most likely sequence of completed activities instead of the most likely sequence of discrete states passed. Fortunately, the latter can easily be converted into the former: for CHnMMs, at most one activity causes the state change from one state $S_i$ to another state $S_j$. Thus, if on the most likely path the $n$th discrete state is $S_i$ and the $(n+1)$th discrete state is $S_j$, then the $(n+1)$th activity whose completion caused that state change must have been $a_{ij}$. This lookup can be performed for every pair of adjacent most likely states to retrieve the sequence of most likely activities, as the following sketch shows.
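
A minimal sketch, assuming the model provides an activity lookup table indexed by pairs of discrete states (the table and all names are illustrative):

    # Sketch: convert the most likely discrete-state path into the most
    # likely activity sequence via a per-transition lookup.
    def states_to_activities(state_path, activity_between):
        return [activity_between[(s_i, s_j)]
                for s_i, s_j in zip(state_path, state_path[1:])]

    # Illustrative model with two activities:
    activity_between = {(0, 1): "a01", (1, 0): "a10"}
    print(states_to_activities([0, 1, 0, 1], activity_between))
    # -> ['a01', 'a10', 'a01']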

Second, it might be of interest to not only find the one most likely path of internal system states, but to find the $n$ most likely paths. In order to find those, it is not sufficient to find the $n$ Proxels of the final time step with the highest probabilities and to backtrack the corresponding paths, because the modified Forward algorithm discards all Proxels that are not locally part of the most likely path, and thus no Proxel in the final time step may even exist for the second most likely to $n$th most likely path. Instead, Proxels would have to separately store the path probabilities of the $n$ most likely paths through them, along with the predecessor's discrete state and age vector for all $n$ paths. On Proxel merging, only those paths that do not fall into the $n$ most likely list are discarded. With this approach, it is guaranteed that Proxels exist in the final time step for all $n$ most likely paths, and that those paths can be backtracked. Several of those top $n$ paths may end in the same Proxel, but need not.
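
The modified merge rule could be sketched as follows, where each per-Proxel path entry is assumed to be a tuple of path probability, predecessor state, and predecessor age vector (an illustrative layout, not the thesis's data structure):

    import heapq

    # Sketch: on Proxel merging, keep only the n most likely paths of the
    # two merged Proxels instead of the single best one.
    def merge_path_lists(paths_a, paths_b, n):
        # Each entry: (path_probability, predecessor_state, predecessor_ages)
        return heapq.nlargest(n, paths_a + paths_b,
                              key=lambda entry: entry[0])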

Third, in addition to the most likely path itself, the Decoding algorithm may be modified to compute the conditional probability of that path given the observation sequence. This probability can be useful to assess whether the most likely path is overwhelmingly likely to have caused the observation, or whether it is just slightly more likely than many other paths. That conditional probability for a trace of $n$ observations is formally:

\[
P(Q \mid O) = \frac{P(Q \cap O)}{P(O)} = \frac{s_n \, P(Q \cap O)}{s_n \, P(O)}
\]

Here, $s_n$ is the unknown scaling factor introduced through the virtual instantaneous state change probability of each of the $n$ Forward computation steps (cf. Equation 4.2 on page 44). With its help, the equation shows that the desired conditional probability can be computed as a fraction of two virtual probabilities, even though the scaling factor used in computing these virtual probabilities remains unknown. The virtual probability of the numerator is the one computed for the Proxel of the final time step that ends the most likely path as part of the modified Forward computation. It is thus already computed as a side effect of solving the Decoding problem. The denominator is the Evaluation probability as computed in Chapter 4. So, the probability of the most likely internal path given an observation sequence can be computed alone from (intermediate) results of the existing Decoding and Evaluation algorithms.
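
In code, the computation reduces to a single division of the two virtual probabilities, since the common scaling factor cancels; the parameter names below are illustrative:

    # Sketch: P(Q|O) as the ratio of two virtual probabilities. The common
    # scaling factor s_n cancels: s_n*P(Q and O) / (s_n*P(O)) = P(Q|O).
    def conditional_path_probability(virtual_decoding_probability,
                                     virtual_evaluation_probability):
        # numerator: modified-Forward probability of the final Proxel on
        #            the most likely path
        # denominator: Evaluation probability of the whole trace
        return virtual_decoding_probability / virtual_evaluation_probability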