

MiTemP : Mining Temporal Patterns

4.4 Generalizations and Specializations of Patterns

Similar to Dehaspe [Deh98], we define a quasi-order “more general than” on the set of patterns. Following Dehaspe [Deh98, p. 80], for a given set $L$ and a binary relation $\succeq$ on $L$, $\succeq$ is a quasi-order if and only if it is reflexive and transitive:

$\forall x \in L: x \succeq x$ (Reflexivity)

$\forall x, y, z \in L: x \succeq y \text{ and } y \succeq z \Rightarrow x \succeq z$ (Transitivity)
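The two conditions can be checked mechanically on a finite relation. The following is a minimal sketch, assuming the relation is given as a set of pairs; the function name `is_quasi_order` and the toy "divides" relation are illustrative and do not come from the original text.

```python
from itertools import product

def is_quasi_order(elements, rel):
    """Check reflexivity and transitivity of rel, a set of (x, y) pairs, on elements."""
    reflexive = all((x, x) in rel for x in elements)
    transitive = all(
        (x, z) in rel
        for (x, y), (y2, z) in product(rel, rel)
        if y == y2
    )
    return reflexive and transitive

# Toy relation, unrelated to patterns: "a divides b" on {1, 2, 4}.
elements = {1, 2, 4}
divides = {(a, b) for a in elements for b in elements if b % a == 0}
print(is_quasi_order(elements, divides))             # reflexive and transitive
print(is_quasi_order(elements, divides - {(1, 4)}))  # transitivity is broken
```

Removing the pair (1, 4) leaves (1, 2) and (2, 4) without their transitive consequence, so the second check fails.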

Our pattern description consists of the basic pattern, the temporal restriction, and the concept restriction. All these parts must be taken into account in the definition of the generalization (sub-pattern) relation:

Definition 4.37 (Generalization Relation). Let $p_1 = (cp_1, \mathcal{TR}_1, \mathcal{CR}_1)$ and $p_2 = (cp_2, \mathcal{TR}_2, \mathcal{CR}_2)$ be two temporal patterns with conjunctive pattern sizes $s_1 = size(cp_1)$ and $s_2 = size(cp_2)$, and let $cp_1 = ap_{1,1} \wedge \ldots \wedge ap_{1,s_1}$ and $cp_2 = ap_{2,1} \wedge \ldots \wedge ap_{2,s_2}$. $p_1$ subsumes $p_2$ ($p_1 \succeq p_2$) if and only if $s_1 \le s_2$ and there is an injective mapping $\mu: \{1, \ldots, s_1\} \to \{1, \ldots, s_2\}$ such that $\forall i, j: i < j \Rightarrow \mu(i) < \mu(j)$ (the order must be preserved).

Furthermore, $\forall i \in \{1, \ldots, s_1\}: pi_{1,i} = pi_{2,\mu(i)}$ with $ap_{1,i} = (pi_{1,i}, args_{1,i})$ and $ap_{2,\mu(i)} = (pi_{2,\mu(i)}, args_{2,\mu(i)})$. There must exist a substitution $\theta$ such that $(ap_{1,1}, \ldots, ap_{1,s_1})\theta = (ap_{2,\mu(1)}, \ldots, ap_{2,\mu(s_1)})$. It is also required that the temporal relation sets of $p_1$ are supersets of the mapped ones in $p_2$:

\[ \forall i, j \in \{1, \ldots, s_1\}: \mathcal{TR}_{i,j} \supseteq \mathcal{TR}_{\mu(i),\mu(j)} \]

with $i < j$, $(i, j, \mathcal{TR}_{i,j}) \in \mathcal{TR}_1$, and $(\mu(i), \mu(j), \mathcal{TR}_{\mu(i),\mu(j)}) \in \mathcal{TR}_2$.

Let $t_1 = arg(cp_1) = (t_{1,1}, \ldots, t_{1,l_1})$ and $t_2 = arg((ap_{2,\mu(1)}, \ldots, ap_{2,\mu(s_1)})) = (t_{2,1}, \ldots, t_{2,l_2})$ be the tuples of terms of the conjunctive patterns. Then

\[ \forall i: (t_{1,i}, ci_{1,i}) \in \mathcal{CR}_1 \Rightarrow (t_{2,i}, ci_{2,i}) \in \mathcal{CR}_2 \wedge (\text{is-a}(ci_{2,i}, ci_{1,i}) \vee ci_{2,i} = ci_{1,i}). \]

In other words, a pattern $p_1$ subsumes another pattern $p_2$ if there exists an order-preserving mapping from the elements of the conjunctive pattern of the general pattern to those of the special pattern such that the conjunctive pattern subsumes the conjunction of the mapped elements. Additionally, the temporal restrictions for all predicate pairs of the general pattern must be supersets of the corresponding temporal restrictions in the special pattern, and the concept restriction for each argument of the conjunctive pattern must subsume the corresponding concept in the concept restriction of the special pattern. The proper “more general than” relation $\succ$ is defined by:

$p_1 \succ p_2$ iff $p_1 \succeq p_2 \wedge p_2 \not\succeq p_1$.
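The subsumption test of Definition 4.37 can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the thesis implementation: a pattern is a triple of a list of `(predicate, args)` pairs, a dictionary from 0-based index pairs to sets of interval relations, and a dictionary from terms to concepts. Variables are assumed to start with an upper-case letter, every term is assumed to have a concept entry, and the concept hierarchy is a hypothetical child-to-parent map `parent`.

```python
from itertools import combinations

def is_a(sub, sup, parent):
    """True if `sub` is a proper sub-concept of `sup` in the child-to-parent map."""
    while sub in parent:
        sub = parent[sub]
        if sub == sup:
            return True
    return False

def subsumes(p1, p2, parent):
    """True if p1 is more general than or equal to p2 (sketch of Def. 4.37)."""
    (cp1, tr1, cr1), (cp2, tr2, cr2) = p1, p2
    s1, s2 = len(cp1), len(cp2)
    # Increasing index tuples are exactly the order-preserving injective mappings.
    for mu in combinations(range(s2), s1):
        if any(cp1[i][0] != cp2[mu[i]][0] for i in range(s1)):
            continue  # predicate identifiers must agree
        theta, ok = {}, True
        for i in range(s1):
            for a, b in zip(cp1[i][1], cp2[mu[i]][1]):
                if a[0].isupper():           # variable: bind consistently
                    ok = ok and theta.setdefault(a, b) == b
                else:                        # constant: must be identical
                    ok = ok and a == b
                c1, c2 = cr1[a], cr2[b]      # concept of p2 must equal or refine p1's
                ok = ok and (c1 == c2 or is_a(c2, c1, parent))
        # Temporal relation sets of p1 must be supersets of the mapped sets of p2.
        ok = ok and all(tr1[(i, j)] >= tr2[(mu[i], mu[j])]
                        for i, j in combinations(range(s1), 2))
        if ok:
            return True
    return False

parent = {"player": "object", "ball": "object"}  # hypothetical hierarchy
p = ([("approaches", ("A", "B")), ("inBallControl", ("C", "D"))],
     {(0, 1): {"older", "head-to-head"}},
     {"A": "object", "B": "object", "C": "player", "D": "ball"})
p_t = (p[0], {(0, 1): {"older"}}, p[2])
p_u = ([("approaches", ("A", "B")), ("inBallControl", ("B", "D"))],
       {(0, 1): {"older", "head-to-head"}},
       {"A": "object", "B": "player", "D": "ball"})
print(subsumes(p, p_t, parent), subsumes(p_t, p, parent))
print(subsumes(p, p_u, parent), subsumes(p_u, p, parent))
```

In this toy setting `p` subsumes both `p_t` (one interval relation removed) and `p_u` (two variables unified), while neither of them subsumes `p`, which matches the proper relation $p \succ p_t$ and $p \succ p_u$.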

We also define the direct, proper “more general than” relation and denote it by $\succ_1$:

$tp_1 \succ_1 tp_2 \iff tp_1 \succ tp_2 \wedge \nexists\, tp_3: tp_1 \succ tp_3 \wedge tp_3 \succ tp_2$

The $\succ_1$-relation can be used for enumerating the pattern space. It allows for systematically searching for frequent patterns starting from the most general pattern.

The pattern mining algorithm has to perform a search through the pattern space.

We define five different refinement operations in order to specialize a pattern: lengthening, temporal refinement, variable unification, concept refinement, and instantiation. If applied to a pattern $p$, each of these operations leads to a set of specialized patterns which are subsumed by $p$. These operations are similar to those defined by Lee [Lee06], with two differences: in our case predicates have a temporal extent, so interval relations are used for temporal restrictions, and Lee’s operations do not include the concept refinement as defined here. We follow Lee’s notation w.r.t. the refinement operations [Lee06].

Definition 4.38 (Refinement Operator). A refinement operator $\rho: L_{tp} \to 2^{L_{tp}}$ maps a pattern $p$ of the pattern language to a set of patterns such that $\rho(p) \subseteq \{p' \mid p \succ_1 p'\}$.

Definition 4.39 (Lengthening). Let $tp = ((ap_1, \ldots, ap_m), \mathcal{TR}, \{(t_1, ci_1), \ldots, (t_n, ci_n)\})$ be a temporal pattern. The lengthening operator $\rho_L$ adds an atomic pattern to the conjunctive pattern:

\[ \rho_L(tp) = \{((ap_1, \ldots, ap_{i-1}, ap', ap_i, \ldots, ap_m), \mathcal{TR}', \{(t_1, ci_1), \ldots, (t_n, ci_n), (V_1, ci'_1), \ldots, (V_{arity}, ci'_{arity})\})\} \]

with $ap' = (pi', (V_1, \ldots, V_{arity}))$, $pt' = (pi', (ci'_1, \ldots, ci'_{arity}))$, and $pi' \in \mathcal{PI}$. Variables $V_1, \ldots, V_{arity}$ must not occur in any of the previously existing atomic patterns and must be mutually unequal. The new temporal restriction is defined as $\mathcal{TR}' = \{(1, 2, \mathcal{TR}'_{1,2}), \ldots, (m, m+1, \mathcal{TR}'_{m,m+1})\}$ with:

\[
\mathcal{TR}'_{k,l} =
\begin{cases}
\mathcal{TR}_{k,l} & \text{if } k, l < i \\
\mathcal{TR}_{k,l-1} & \text{if } k < i,\ l > i \\
\mathcal{TR}_{k-1,l-1} & \text{if } k, l > i \\
IR_{older} & \text{if } (k = i \vee l = i) \wedge ap_{new,k} >_{lex} ap_{new,l} \\
IR & \text{if } (k = i \vee l = i) \wedge ap_{new,k} \le_{lex} ap_{new,l}
\end{cases}
\]

with $1 \le k < (m+1)$ and $1 < l \le (m+1)$, $k < l$.

The new temporal restriction in the definition above keeps the temporal relations for the existing atomic pattern pairs and introduces new interval relation sets for the new atomic pattern in combination with all existing ones. Depending on the lexicographic order, the new interval relation set is set to $IR_{older}$ or $IR$.
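The index shifting in the lengthening definition is easy to get wrong, so a small sketch may help. It assumes 1-based positions as in the definition, compares atomic patterns lexicographically by predicate name only, and uses two hypothetical relation sets `IR_OLDER` and `IR_FULL` whose contents are inferred from Example 4.18 rather than taken from a definition in this section. The extension of the concept restriction is omitted.

```python
from itertools import combinations

IR_OLDER = frozenset({"older"})                 # assumed content of IR_older
IR_FULL = frozenset({"older", "head-to-head"})  # assumed content of the full set

def lengthen(cp, tr, i, ap_new):
    """Insert ap_new at 1-based position i and re-index the temporal restriction."""
    new_cp = cp[: i - 1] + [ap_new] + cp[i - 1 :]
    new_tr = {}
    for k, l in combinations(range(1, len(new_cp) + 1), 2):  # all pairs k < l
        if k == i or l == i:
            # Pair involves the new atomic pattern: choose the set by lexicographic order.
            later_first = new_cp[k - 1][0] > new_cp[l - 1][0]
            new_tr[(k, l)] = IR_OLDER if later_first else IR_FULL
        else:
            # Map positions behind the insertion point back to the old numbering.
            old_k = k if k < i else k - 1
            old_l = l if l < i else l - 1
            new_tr[(k, l)] = tr[(old_k, old_l)]
    return new_cp, new_tr

cp = [("approaches", ("A", "B")), ("inBallControl", ("C", "D"))]
tr = {(1, 2): IR_FULL}
new_cp, new_tr = lengthen(cp, tr, 3, ("inBallControl", ("E", "F")))
print(new_tr)
```

Appending `inBallControl(E, F)` at position 3 keeps the set for the pair (1, 2) and adds the full set for the pairs (1, 3) and (2, 3), as in the lengthening case of Example 4.18.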

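Definition 4.40 (Temporal Refinement). Let $p = (cp, \mathcal{TR}, \mathcal{CR})$ be a temporal pattern with $n = size(cp)$ atomic patterns. The set of temporal refinements for $p$ is defined as

\[ \rho_T(p) = \{(cp, \mathcal{TR}', \mathcal{CR}) \mid \exists! i, j: (i, j, \mathcal{TR}'_{i,j}) \in \mathcal{TR}' \wedge |\mathcal{TR}'_{i,j}| + 1 = |\mathcal{TR}_{i,j}| \wedge \mathcal{TR}'_{i,j} \subset \mathcal{TR}_{i,j} \wedge \forall k, l: (k \ne i \vee l \ne j) \Rightarrow (k, l, \mathcal{TR}_{k,l}) \in \mathcal{TR}'\} \]

with $i < j$ and $k < l$.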
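Read operationally, a temporal refinement removes exactly one interval relation from exactly one pair and leaves all other pairs untouched. A minimal sketch, assuming the temporal restriction is a dictionary from index pairs to frozensets of relation names (a representation chosen here for illustration):

```python
def temporal_refinements(tr):
    """Yield each restriction obtained by removing one relation from one pair."""
    for pair, relations in tr.items():
        for removed in sorted(relations):
            refined = dict(tr)                       # all other pairs stay unchanged
            refined[pair] = relations - {removed}    # exactly one relation fewer
            yield refined

tr = {(1, 2): frozenset({"older", "head-to-head"})}
refinements = list(temporal_refinements(tr))
print(refinements)
```

The definition as stated does not exclude a refinement that leaves an empty relation set; this sketch follows it literally and leaves any such filtering to the caller.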

Definition 4.41 (Variable Unification). Let $tp = (cp, \mathcal{TR}, \mathcal{CR})$ be a temporal pattern and $V_1, V_2$ be variables occurring in $cp$. The variable unification operator unifies two variables $V_1$ and $V_2$ with $V_1 \ne V_2$: $\rho_U((cp, \mathcal{TR}, \mathcal{CR})) = \{(cp\theta, \mathcal{TR}, \mathcal{CR}') \mid \theta = \{V_1/V_2\}\}$ where $V_1$ and $V_2$ occur in $cp$.

In the concept restriction, the entry of $V_1$ must be removed and $V_2$ must be updated according to the previously defined concepts of $V_1$ and $V_2$: $\mathcal{CR}' = \mathcal{CR} \setminus \{(V_1, ci_1), (V_2, ci_2)\} \cup \{(V_2, ci'_2)\}$ with $ci'_2 = ci_2 \iff \text{is-a}(ci_2, ci_1)$ and $ci'_2 = ci_1 \iff \text{is-a}(ci_1, ci_2)$. The variables to be unified must be compatible: $\text{is-a}(ci_1, ci_2) \vee \text{is-a}(ci_2, ci_1)$.
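The unification step and the accompanying update of the concept restriction can be sketched as follows. The representation (list of `(predicate, args)` pairs, dictionary from terms to concepts, child-to-parent map `parent`) is an assumption made for illustration. The sketch additionally treats two identical concepts as compatible, which the compatibility condition only covers if is-a is read as reflexive.

```python
def is_a(sub, sup, parent):
    """True if `sub` is a proper sub-concept of `sup` in the child-to-parent map."""
    while sub in parent:
        sub = parent[sub]
        if sub == sup:
            return True
    return False

def unify(cp, cr, v1, v2, parent):
    """Replace v1 by v2 in cp; return None if the concepts are incompatible."""
    c1, c2 = cr[v1], cr[v2]
    if c1 == c2 or is_a(c1, c2, parent):
        merged = c1                      # concept of v1 is the more special one
    elif is_a(c2, c1, parent):
        merged = c2                      # concept of v2 is the more special one
    else:
        return None                      # neither concept subsumes the other
    new_cp = [(pred, tuple(v2 if t == v1 else t for t in args)) for pred, args in cp]
    new_cr = {t: c for t, c in cr.items() if t != v1}  # drop the entry of v1
    new_cr[v2] = merged                                # v2 keeps the more special concept
    return new_cp, new_cr

parent = {"player": "object", "ball": "object"}  # hypothetical hierarchy
cp = [("approaches", ("A", "B")), ("inBallControl", ("C", "D"))]
cr = {"A": "object", "B": "object", "C": "player", "D": "ball"}
print(unify(cp, cr, "C", "B", parent))
print(unify(cp, cr, "C", "D", parent))
```

Unifying `C` into `B` yields `inBallControl(B, D)` with `B` restricted to `player`, as in the unification case of Example 4.18; `C` and `D` cannot be unified because `player` and `ball` are unrelated in the toy hierarchy.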

Definition 4.42 (Concept Refinement). Let $(cp, \mathcal{TR}, \{(t_1, ci_1), \ldots, (t_n, ci_n)\})$ be a temporal pattern. A concept refinement replaces one of the $n$ concepts in the concept restriction by one of its direct sub-concepts: $\rho_C((cp, \mathcal{TR}, \{(t_1, ci_1), \ldots, (t_n, ci_n)\})) = \{(cp, \mathcal{TR}, \mathcal{CR}') \mid \mathcal{CR}' = \{(t_1, ci_1), \ldots, (t_{i-1}, ci_{i-1}), (t_i, ci'_i), (t_{i+1}, ci_{i+1}), \ldots, (t_n, ci_n)\}, \text{is-a}(ci'_i, ci_i)\}$ with $1 \le i \le n$.
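A concept refinement can be enumerated directly from the direct sub-concepts of each entry. The sketch below assumes a hypothetical map `children` from a concept to its direct sub-concepts and a dictionary representation of the concept restriction; both are illustrative choices.

```python
def concept_refinements(cr, children):
    """Yield each concept restriction with one concept replaced by a direct sub-concept."""
    for term, concept in cr.items():
        for sub in children.get(concept, ()):
            refined = dict(cr)      # all other entries stay unchanged
            refined[term] = sub
            yield refined

children = {"object": ["player", "ball"]}  # hypothetical hierarchy
cr = {"A": "object", "B": "object", "C": "player", "D": "ball"}
refined_crs = list(concept_refinements(cr, children))
print(len(refined_crs))
```

With this toy hierarchy only `A` and `B` can be refined, each to `player` or `ball`, which includes the restriction used in the concept refinement case of Example 4.18.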

It is possible to calculate the number of concept refinements for a term in a temporal pattern by computing the distance of the current concept to the most special concept of the predicate templates of the pattern:

Definition 4.43 (Concept Refinement Level). Let $tp = ((ap_1, \ldots, ap_n), \mathcal{TR}, \mathcal{CR})$ be a temporal pattern with atomic patterns $ap_i = (pt_i, (t_{i,1}, \ldots, t_{i,arity}))$ and $pt_i = (pi_i, (ci_{i,1}, \ldots, ci_{i,arity})) \in \mathcal{PT}$. The set of concept identifiers for a term $t$ in the temporal pattern $tp$ is then defined as $CI_t = \{ci_{j,k} \mid t_{j,k} = t\}$. The most special concept is $ci_{spec}$ if and only if $\forall ci \in CI_t: \text{is-a}(ci_{spec}, ci)$. As it is not allowed that two concepts mutually subsume each other (Def. 4.15), the distance between two concepts in the concept hierarchy is defined as: $dist(ci_1, ci_2) := |\{ci \in CI \mid \text{is-a}(ci_1, ci) \wedge ci = ci_2\}|$. The concept refinement level for a concept with identifier $ci$ w.r.t. a temporal pattern $tp$ is defined as: $crl(t, ci, tp) := dist(ci, ci_{spec})$.

Definition 4.44 (Instantiation). Let $tp = (cp, \mathcal{TR}, \mathcal{CR})$ be a temporal pattern, $V$ be a variable occurring in $cp$, and $o \in \mathcal{O}$ be an object occurring in the dynamic scene. The binding of a variable $V$ to an object $o$ is denoted as instantiation. Similar to a variable unification, the set of refined patterns is defined by a substitution:

$\rho_I((cp, \mathcal{TR}, \mathcal{CR})) = \{(cp\theta, \mathcal{TR}, \mathcal{CR}\theta) \mid \theta = \{V/o\}\}$ where $V$ must occur in $cp$ and $o \in \mathcal{O}$.
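Instantiation applies the same substitution to the conjunctive pattern and to the concept restriction. A minimal sketch, using an illustrative list-and-dictionary representation that is not prescribed by the text:

```python
def instantiate(cp, cr, var, obj):
    """Apply the substitution {var/obj} to the conjunctive pattern and the concept restriction."""
    new_cp = [(pred, tuple(obj if t == var else t for t in args)) for pred, args in cp]
    new_cr = {(obj if t == var else t): c for t, c in cr.items()}
    return new_cp, new_cr

cp = [("approaches", ("A", "B")), ("inBallControl", ("C", "D"))]
cr = {"A": "object", "B": "object", "C": "player", "D": "ball"}
print(instantiate(cp, cr, "D", "b"))
```

Binding `D` to the object `b` gives `inBallControl(C, b)` and moves the `ball` concept to `b`, as in the instantiation case of Example 4.18.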

Example 4.18. Let p be a temporal pattern p = (approaches(A, B) ∧ inBallControl(C, D), {(1, 2, {older, head-to-head})}, {(A, object), (B, object), (C, player), (D, ball)}). Then these are examples of valid refinements:

Lengthening: pL = (approaches(A, B) ∧ inBallControl(C, D) ∧ inBallControl(E, F), {(1, 2, {older, head-to-head}), (1, 3, {older, head-to-head}), (2, 3, {older, head-to-head})}, {(A, object), (B, object), (C, player), (D, ball), (E, player), (F, ball)})

Temporal refinement: pT = (approaches(A, B) ∧ inBallControl(C, D), {(1, 2, {older})}, {(A, object), (B, object), (C, player), (D, ball)})

Unification: pU = (approaches(A, B) ∧ inBallControl(B, D), {(1, 2, {older, head-to-head})}, {(A, object), (B, player), (D, ball)})

Concept Refinement: pC = (approaches(A, B) ∧ inBallControl(C, D), {(1, 2, {older, head-to-head})}, {(A, ball), (B, object), (C, player), (D, ball)})

Instantiation: pI = (approaches(A, B) ∧ inBallControl(C, b), {(1, 2, {older, head-to-head})}, {(A, object), (B, object), (C, player), (b, ball)})

The combination of the five refinement operations forms the refinement operator on the pattern language $L_{tp}$: $\rho(p) = \rho_L(p) \cup \rho_T(p) \cup \rho_U(p) \cup \rho_C(p) \cup \rho_I(p)$.

One important property in pattern mining approaches is the anti-monotonicity of specialization operators w.r.t. the support. It is used, e.g., in Apriori in order to prune the search space [AS94]: If an itemset is found to be not frequent, none of its specializations can be frequent due to the anti-monotonicity property.
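The pruning idea can be illustrated with a level-wise search that never refines an infrequent pattern. The sketch below uses plain itemsets as a stand-in for temporal patterns; the functions `refine` and `support`, the transactions, and the threshold are hypothetical. It is a simplified search, not the Apriori candidate generation of [AS94].

```python
def mine_frequent(start, refine, support, min_support):
    """Level-wise search that stops refining as soon as a pattern is infrequent."""
    frequent, frontier, seen = [], [start], {start}
    while frontier:
        next_frontier = []
        for pattern in frontier:
            if support(pattern) < min_support:
                continue  # anti-monotonicity: no refinement can be frequent
            frequent.append(pattern)
            for child in refine(pattern):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return frequent

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
items = "abc"

def refine(pattern):
    # Specialize an itemset by adding one item it does not contain yet.
    return [pattern | {x} for x in items if x not in pattern]

def support(pattern):
    # Number of transactions that contain every item of the pattern.
    return sum(1 for t in transactions if pattern <= t)

result = mine_frequent(frozenset(), refine, support, min_support=2)
print(sorted("".join(sorted(p)) for p in result))
```

With a minimum support of 2, the itemset {b, c} is found to be infrequent and is never refined further; {a, b, c} is still reached via {a, b} and rejected there.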

Theorem 4.3 (Anti-monotonicity of the refinement operator). All refinements of a pattern have the same or a smaller support than the pattern itself, i.e., $\forall sp \in \rho(p): supp(p) \ge supp(sp)$.

Proof 4.3. In order to prove the anti-monotonicity property, it must be shown that for each of the five refinement operations the support of the resulting patterns cannot increase in comparison to the support of the original pattern $p$. A pattern can only have a match where a generalization of the pattern has a match, because for each atomic pattern in the conjunctive pattern a matching predicate instance must be found such that a valid substitution of the conjunctive pattern exists and the temporal restriction and concept restriction are satisfied. There is only one operation which changes the size of the conjunctive pattern, namely the lengthening operation. In all other cases, the size of the conjunctive pattern does not change.

We show for both cases that the support cannot increase.

1. Temporal refinement, variable unification, concept refinement, instantiation:

In all these refinements, the size of the pattern does not change. A refined pattern $p'$ can only have a match where $p$ also has a match, i.e., $matches(p') \subseteq matches(p)$. For each match $m \in matches(p)$ there are only two possibilities: it either also satisfies the additional refinement, i.e., $m \in matches(p')$, or it does not, i.e., $m \notin matches(p')$. In particular, these cases are possible:

(a) Temporal refinement: Let $\mathcal{TR}'_{i,j} \subset \mathcal{TR}_{i,j}$ be the refined temporal restriction with $\mathcal{TR}'_{i,j} = \mathcal{TR}_{i,j} \setminus \{tr\}$ and match $M = \{(1, pred_1), \ldots, (n, pred_n)\}$ with $pred_i = (pi, o_1, \ldots, o_m, s_i, e_i, b)$. If $ir(s_i, e_i, s_j, e_j) = tr$ then $M \notin matches(p')$, otherwise $M \in matches(p')$.

(b) Variable unification: Let $V_1$ and $V_2$ be the two variables in $p$ before unification. For each match $m$ of the pattern $p$ a substitution $\theta$ with $\{V_1/o_1, V_2/o_2\} \subseteq \theta$ has been performed. After unification, only those matches of $p$ where both variables are substituted by the same object can also be matches of $p'$, i.e., $m \in matches(p')$ if and only if $o_1 = o_2$.

(c) Concept refinement: Let $\mathcal{CR} = \{(o_1, ci_1), \ldots, (o_i, ci_i), \ldots, (o_n, ci_n)\}$ and $\mathcal{CR}' = \{(o_1, ci_1), \ldots, (o_i, ci'_i), \ldots, (o_n, ci_n)\}$ be the concept restrictions of $p$ and $p'$, respectively, with $\text{is-a}(ci'_i, ci_i)$. If the corresponding object $o_i$ is still an instance of the sub-concept $ci'_i$, i.e., $\text{instance-of}(o_i, ci'_i)$, then $M \in matches(p')$.

(d) Instantiation: Let $V$ be the variable in $p$ that has been instantiated by $o_1$ and let $o_2$ be the instance that has been used in the substitution $\theta$ with $\{V/o_2\} \subseteq \theta$ in order to match. Then $M \in matches(p')$ if and only if $o_1 = o_2$.

2. Lengthening: As lengthening increases the size of the conjunctive pattern, the matches of $p$ cannot be directly used as matches for $p'$. The reason is that $p'$ has one more atomic pattern and there is no corresponding predicate in the existing matches of $p$. Nevertheless, $p'$ can only have matches that extend a match $M \in matches(p)$ because the conjunctive pattern without the added atomic pattern $ap_{new}$ must also match. Thus, $matches(p)$ can be seen as the relevant set of sub-matches where matches of $p'$ can occur. Ignoring the temporal restriction and concept restriction for simplicity, a match $M'$ can only be a valid match of $p'$ if there is a match $M \in matches(p)$ with

$\forall i: (i, pred_i) \in M \Rightarrow \exists j: (j, pred_i) \in M'$ with $i \le j$, and there exists a predicate $(k, pred_{new}) \in M'$ with $pred_{new} = (pi, o_1, \ldots, o_n, s, e, true)$ which occurs concurrently to the match interval of $M$. It is required that $e \ge (s_{max} - w)$ and $s < e_{min} + w$.

If a predicate satisfying these conditions exists, it must be shown that it cannot extend the validity interval of match $M$. The validity interval of a match $M$ is $vi = (s_{max_i}, e_{min_i} + w]$ by definition. The following cases can occur:

$s \le s_{max_i}$: Then $s'_{max_i} = s_{max_i}$, i.e., the lower bound of the validity interval does not change.

$s > s_{max_i}$: Then $s'_{max_i} > s_{max_i}$, i.e., the interval length decreases.

$e \ge e_{min_i}$: Then $e'_{min_i} = e_{min_i}$, i.e., the upper bound of the validity interval does not change.

$e < e_{min_i}$: Then $e'_{min_i} < e_{min_i}$, i.e., the interval length decreases.

Thus, it has been shown that the validity intervals of the matches of $p'$ can only be within the validity intervals of the matches of $p$, and it follows that $supp(p) \ge supp(p')$.
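The four cases can be replayed numerically. The sketch below computes the validity interval $(s_{max}, e_{min} + w]$ of a match from the start and end times of its predicates; the concrete times and the window size are hypothetical values chosen for illustration.

```python
def validity_interval(preds, w):
    """Return (s_max, e_min + w) for a match given as a list of (start, end) pairs."""
    s_max = max(s for s, _ in preds)
    e_min = min(e for _, e in preds)
    return s_max, e_min + w

w = 5
match = [(0, 10), (2, 8)]            # two predicates of an existing match
inner = match + [(4, 6)]             # new predicate starts later and ends earlier
outer = match + [(1, 12)]            # new predicate starts earlier and ends later

base = validity_interval(match, w)
print(base, validity_interval(inner, w), validity_interval(outer, w))
```

In both extensions the lower bound does not move down and the upper bound does not move up, so the new validity interval lies within the old one.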

Fig. 4.10 illustrates an existing match (A and B) and three cases of an additional predicate C. C1 and C3 are not relevant as they do not occur concurrently to the previous match interval. C2 illustrates the only situation where the new interval lies within the previous match interval. As shown in the proof above, it does not matter if C2 starts before or ends after the match interval.

Figure 4.10: Different cases for a match applying the lengthening operation

From Def. 4.36 it follows directly that $\forall sp \in \rho(p): freq(p) \ge freq(sp)$. It also holds that any pattern that can be generated by more than one application of the refinement operator must have a frequency equal to or less than the original pattern’s frequency:

$p \preceq q \Rightarrow freq(p) \le freq(q)$, where $p \preceq q$ denotes that $q$ is more general than or equal to $p$.