
6.6.1 Application Example

For an application example we return to the example used for Decoding in Chapter 5: in the Tester model (cf. Figure 3.1 on page 32), both machines produce items with the same probability of a defect. The individual defective items have to be attributed to the originating machine in order to better plan the repairs. We again use the same synthetic trace of 1500 symbols (about a day of observations) for this task, but this time we use the Smoothing task to determine the most likely source for each individual observation. This is done with the Smoothing evaluation function for the recursive Smoothing algorithm given in Algorithm 11. It simply sums up the Smoothing probabilities of Proxels sharing the same discrete state and then uses a majority vote to determine the most likely source.

The results of this experiment are shown in Table 6.1. Using this Smoothing-based approach, 83.8% of the produced items could be attributed to the correct machine, whereas the Decoding-based approach (cf. Section 5.3) only attributed 78.2% of the items correctly. This difference was to be expected, since the Smoothing approach indeed attempts to identify the most likely source of each individual observation, while Decoding only finds the most likely sequence of sources, even if some elements of this sequence are rather unlikely.

Algorithm 11: MostLikelyStateEvalFunc
Input: A, TR, R_t, R_{t+1}, O, t, N, T, K
Result: The most likely discrete state q_t after the t-th symbol emission

    if t = 0 then return;
    p_M1 = Σ_{ρ ∈ R_t s.t. ρ.q = S_{prev from 1}} ρ.γ;
    p_M2 = Σ_{ρ ∈ R_t s.t. ρ.q = S_{prev from 2}} ρ.γ;
    if p_M1 ≥ p_M2 then
        q_t = S_{prev from 1};
    else
        q_t = S_{prev from 2};
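To make the evaluation function more concrete, the following is a minimal Python sketch of the majority-vote attribution performed by Algorithm 11. The simplified Proxel record (with fields q for the discrete state and gamma for the Smoothing probability) and the state names are assumptions made for illustration only; they are not the implementation used for the experiments.

    from collections import namedtuple

    # Simplified Proxel record: discrete state and Smoothing probability.
    Proxel = namedtuple("Proxel", ["q", "gamma"])

    S_PREV_FROM_1 = "prev_from_1"   # last item was produced by machine 1
    S_PREV_FROM_2 = "prev_from_2"   # last item was produced by machine 2

    def most_likely_state(proxels_at_t, t):
        """Return the most likely source of the t-th observed item."""
        if t == 0:
            return None   # no symbol has been emitted yet

        # Sum the Smoothing probabilities of all Proxels sharing a discrete state.
        p_m1 = sum(p.gamma for p in proxels_at_t if p.q == S_PREV_FROM_1)
        p_m2 = sum(p.gamma for p in proxels_at_t if p.q == S_PREV_FROM_2)

        # Majority vote: attribute the item to the more likely machine.
        return S_PREV_FROM_1 if p_m1 >= p_m2 else S_PREV_FROM_2

    # Example: machine 2 is slightly more likely for this time step.
    R_t = [Proxel(S_PREV_FROM_1, 0.3), Proxel(S_PREV_FROM_2, 0.45),
           Proxel(S_PREV_FROM_1, 0.1), Proxel(S_PREV_FROM_2, 0.15)]
    print(most_likely_state(R_t, t=5))   # -> "prev_from_2"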

                           Decoding   Smoothing
  Correctly Classified      78.2%      83.8%
  Incorrectly Classified    21.8%      16.2%

Table 6.1: Results for the attribution of produced items to the correct source for a trace of 1500 observations of the Tester model.

The downside of the more accurate Smoothing approach is that its reconstructed sources may not form a valid sequence of internal states. For example, the Smoothing approach may attribute four consecutive observations to the same machine, even though it is virtually impossible in this setting that the other machine did not produce a single item in that time interval. Whether the more accurate Smoothing or the guaranteed-consistent Decoding is preferable depends on the actual application scenario.
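The consistency issue can be made tangible with a small check. The following hypothetical Python snippet flags long runs of attributions to a single machine, which the per-observation Smoothing attribution can produce even though such runs are virtually impossible for a valid state sequence in this setting; the run-length threshold is an assumption chosen only for illustration.

    def flag_implausible_runs(attributions, max_run=3):
        """Yield (start_index, length) of runs longer than max_run in which
        every item was attributed to the same machine."""
        start = 0
        for i in range(1, len(attributions) + 1):
            if i == len(attributions) or attributions[i] != attributions[start]:
                if i - start > max_run:
                    yield start, i - start
                start = i

    # Example: four consecutive items attributed to machine 1.
    print(list(flag_implausible_runs([1, 1, 1, 1, 2, 1, 2])))   # -> [(0, 4)]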

6.6.2 Comparison of Iterative and Recursive Approaches

To assess the feasibility of the iterative and the recursive Smoothing algorithms, their computation time and memory consumption are measured. Here, again, the Car Rental Agency model is used for all experiments, since the Tester model is too small and is solved too quickly for accurate measurements.

In the first experiment the computation time under increasing trace length is assessed. Figure 6.1 shows the results for both algorithms, and the corresponding values for the Decoding algorithms for comparison. For Decoding and Smoothing alike, the iterative algorithm is faster than the recursive one, since it does not have to recompute the Forward probabilities for discarded time steps.

For each type of algorithm (recursive and iterative), the Smoothing algorithm is slightly slower than the corresponding Decoding algorithm. The explanation is that for Decoding, an algorithm has to perform a certain number of Forward steps; the additional backtracking involves only locating a single Proxel per time step, which barely impacts the computation time. For Smoothing, on the other hand, a Backward computation step with essentially the same computation time as a Forward step has to be performed for every time step in addition to the Forward steps.

[Figure 6.1: CPU time (s) plotted over the trace length (0 to 1500) for the series Smoothing Recursive, Smoothing Iterative, Decoding Recursive and Decoding Iterative.]

Figure 6.1: Plot of the computation time (CPU time) required by the iterative and recursive Smoothing algorithms for the Car Rental Agency model for different trace lengths. The computation time of the corresponding Decoding algorithms is shown for comparison.

This also explains why the computation times of the Smoothing algorithms differ by a much smaller factor (about three versus five) than those of the Decoding algorithms: both Smoothing algorithms have to perform the same number of Forward steps as their Decoding counterparts, but they additionally have to perform the same number of Backward steps (one per time step), which brings their relative computation times closer together.

Overall, the computation times of the Smoothing algorithms are very similar to those of the Decoding algorithms. As with the Decoding algorithms, the factor by which the iterative algorithm is faster than the recursive one increases with the trace length, from 1.8 for 125 observations to 2.8 for 1500 observations. This behavior was to be expected, since the Smoothing algorithms perform as many Forward computation steps as the corresponding Decoding algorithms, and additionally perform at most as many (equally computationally expensive) Backward steps as Forward steps. Thus, the number of Forward computation steps is the dominant factor in the time complexity, which should therefore be equal to that of the Decoding algorithms: O(n) for the iterative approach and O(n log(n)) for the recursive one in the trace length n. Judging by the computation time alone, both approaches should therefore be practically feasible even for vastly longer traces (e.g. 100,000 observations and more).
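As a back-of-the-envelope illustration of why the Smoothing variants end up closer together than the Decoding variants, the following Python sketch counts computation steps under the simplifying assumptions that the iterative approach performs n Forward steps, the recursive approach roughly n·log2(n), and that Smoothing adds one Backward step per observation at the same cost as a Forward step. The absolute ratios differ from the measured factors because all constant factors of the implementation are ignored; only the compression of the ratio is the point.

    import math

    def step_count_ratios(n):
        """Rough step-count model; all constants are illustrative assumptions."""
        fwd_iterative = n                  # one Forward step per observation
        fwd_recursive = n * math.log2(n)   # Forward steps are partly recomputed
        bwd = n                            # Smoothing: one Backward step per observation

        decoding_ratio = fwd_recursive / fwd_iterative
        smoothing_ratio = (fwd_recursive + bwd) / (fwd_iterative + bwd)
        return decoding_ratio, smoothing_ratio

    for n in (125, 1500, 100_000):
        dec, smo = step_count_ratios(n)
        print(f"n = {n:>6}: recursive/iterative ratio ~{dec:.1f} for Decoding, "
              f"~{smo:.1f} for Smoothing")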

However, the memory consumption of the algorithms limits the general feasibility. Figure 6.2 shows the peak memory consumption of the two algorithms under different trace lengths. The results are very similar to the same measurements for the Decoding algorithms (cf. Figure 5.3 on page 70). Indeed, the memory consumption for Smoothing is smaller than that for Decoding of the same trace length by a constant factor, because Smoothing does not need to store the age vector of a parent Proxel in each Proxel.
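This constant-factor difference can be sketched as two hypothetical Proxel records: the Decoding Proxel has to carry its parent's age vector for backtracking, the Smoothing Proxel does not. The field names and types below are assumptions for illustration; only the presence or absence of the parent age vector is taken from the discussion above.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class SmoothingProxel:
        q: str                          # discrete state
        tau: Tuple[float, ...]          # age vector of the state
        alpha: float                    # Forward probability
        gamma: float                    # Smoothing probability

    @dataclass
    class DecodingProxel:
        q: str
        tau: Tuple[float, ...]
        delta: float                    # probability of the most likely path so far
        parent_tau: Tuple[float, ...]   # parent's age vector, kept for backtracking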

Since the memory consumption and computation time of the Smoothing and Decoding algorithms are similar, their limits of practical feasibility are close as well.

[Figure 6.2: peak memory usage for Proxel storage (MB) plotted over the trace length (125 to 1500); left panel: iterative algorithm, right panel: recursive algorithm.]

Figure 6.2: Plot of the memory consumption for storing the required Proxels for the Car Rental Agency model under different trace lengths. The diagram on the left-hand side shows the peak memory consumption for a single trace under the iterative approach. The right-hand side shows the average of the peak memory consumptions of 10 traces under the recursive Smoothing algorithm. Note the vastly different scales between the two graphs.

The iterative algorithms show a linear increase in memory consumption with increasing trace length and reached the limit of the physical memory present in 2011 commodity hardware at traces of about 1500 observations. For the recursive algorithms, on the other hand, the memory consumption increases only with the logarithm of the trace length, so that even far longer traces can be processed without a prohibitive increase in memory consumption. The drawback of the recursive algorithms is their slightly increased computational complexity compared to the iterative ones. But since their computational complexity is only O(n log(n)) in the number of observations (compared to O(n) for the iterative approaches), this barely impacts the practical feasibility.
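To illustrate the difference in memory growth, the following sketch estimates the peak Proxel storage under the simplifying assumptions that the iterative algorithm keeps the Proxels of all time steps in memory, while the recursive algorithm keeps only about log2(n) checkpointed time steps plus a small working set. The per-step memory value is a placeholder, not a measurement.

    import math

    PER_STEP_MB = 1.5   # placeholder value for the Proxels of a single time step

    def peak_proxel_memory_mb(n, recursive):
        """Rough peak-memory model; constants are illustrative assumptions only."""
        if recursive:
            # Only about log2(n) checkpointed time steps plus a small
            # working set are kept in memory at any point.
            steps_in_memory = math.ceil(math.log2(n)) + 2
        else:
            # The iterative algorithm keeps the Proxels of every time step.
            steps_in_memory = n
        return steps_in_memory * PER_STEP_MB

    for n in (125, 1500, 100_000):
        print(f"n = {n:>6}: iterative ~{peak_proxel_memory_mb(n, False):9.0f} MB, "
              f"recursive ~{peak_proxel_memory_mb(n, True):5.0f} MB")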