Exercises for DW & DM
Institut für Informationssysteme – TU Braunschweig - http://www.ifis.cs.tu-bs.de
Technische Universität Braunschweig Institut für Informationssysteme http://www.ifis.cs.tu-bs.de Wolf-Tilo Balke, Silviu Homoceanu
Exercises for DW & DM Sheet 5 (until 26.01.2012)
Please drop your solution in the silver homework box (second floor, where the IfIS is located) by Tuesday, before the lecture (the date is given above). You may answer in either German or English. You are encouraged to work in teams of two students (no more than two) and to submit your solution as a team. Please mention in your email the names of both students together with their matriculation numbers.
Exercise 1 (15P)
Simulate the functionality of the Multiple Minimum Supports mining algorithm on the transactions provided in Annex 1, presenting each of the 2 steps, as well as the initialization, k=2 and generalization phases for step 1. Minimum support values are also provided in the Annex 3. φ = 20% and minconf = 60%. (15P)
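As a reminder of how the first two phases of step 1 are usually organised, here is a minimal Python sketch of the initialization pass and the k=2 candidate generation. It assumes the common textbook formulation with per-item minimum supports (MIS) and the support difference constraint φ. The function names, item names and all numbers are made up for illustration; they are not taken from Annex 1 or Annex 3.

```python
def init_pass(mis, sup):
    """Build the seed list L: items sorted by ascending MIS, starting at the
    first item i with sup(i) >= MIS(i); every later item j is kept if
    sup(j) >= MIS(i)."""
    seeds = []
    threshold = None
    for item in sorted(mis, key=mis.get):
        if threshold is None:
            if sup[item] >= mis[item]:
                threshold = mis[item]
                seeds.append(item)
        elif sup[item] >= threshold:
            seeds.append(item)
    return seeds


def frequent_1(seeds, mis, sup):
    """F1: seed items that meet their own minimum support."""
    return [i for i in seeds if sup[i] >= mis[i]]


def level2_candidates(seeds, mis, sup, phi):
    """C2: pairs (l, h) with l before h in the seed list, sup(l) >= MIS(l),
    sup(h) >= MIS(l) and |sup(h) - sup(l)| <= phi."""
    pairs = []
    for idx, l in enumerate(seeds):
        if sup[l] < mis[l]:
            continue
        for h in seeds[idx + 1:]:
            if sup[h] >= mis[l] and abs(sup[h] - sup[l]) <= phi:
                pairs.append((l, h))
    return pairs


# Hypothetical example values (NOT the data of this exercise sheet).
example_mis = {"1": 0.10, "2": 0.20, "3": 0.05, "4": 0.06}
example_sup = {"1": 0.09, "2": 0.25, "3": 0.06, "4": 0.03}

seeds = init_pass(example_mis, example_sup)
f1 = frequent_1(seeds, example_mis, example_sup)
c2 = level2_candidates(seeds, example_mis, example_sup, phi=0.10)
```

With these made-up values, item 4 is dropped from the seed list, item 1 stays in the seed list without being frequent itself, and the pair of items 3 and 2 is rejected by the support difference constraint.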
Exercise 2 (15P)
Simulate the functionality of the GSP mining algorithm on the transactions provided in Annex 2, considering a min_sup of 2. (15P)
Hint for the generalization step:
Joining:
Two sequences s1 and s2 can be joined if, after dropping the first item from s1 and the last
item from s2, we obtain the same sequence. E.g.:
<bc> and <ca> can be joined since dropping b from <bc> and a from <ca> yields <c> in both cases.
The joined result is <bca>. <ba> and <(ab)> can also be joined, and we obtain <b(ab)>.
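The join rule can be sketched in Python as follows. This is only an illustration under assumptions: a sequence is represented as a tuple of elements, each element is a tuple of items in alphabetical order, and the names drop_first, drop_last and join are hypothetical. The sketch covers joins of sequences with at least two items, as in the examples; generating 2-sequences from single items needs separate handling.

```python
def drop_first(seq):
    """Remove the first item of the first element."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq):
    """Remove the last item of the last element."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def join(s1, s2):
    """Return the joined candidate, or None if s1 and s2 cannot be joined."""
    if drop_first(s1) != drop_last(s2):
        return None
    last_item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # The last item of s2 was an element of its own: append a new element.
        return s1 + ((last_item,),)
    # Otherwise it shared an element: extend the last element of s1.
    merged = tuple(sorted(s1[-1] + (last_item,)))
    return s1[:-1] + (merged,)


bc = (("b",), ("c",))        # <bc>
ca = (("c",), ("a",))        # <ca>
ba = (("b",), ("a",))        # <ba>
ab_set = (("a", "b"),)       # <(ab)>

bca = join(bc, ca)           # <bca>
b_ab = join(ba, ab_set)      # <b(ab)>
```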
Pruning:
Similar to the Apriori algorithm, <bca> passes pruning only if <bc>, <ba> and <ca> ∈ F2
<b(ab)> passes pruning only if <ba>, <bb> and <(ab)> ∈ F2
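The pruning check can be sketched the same way: a candidate survives only if every subsequence obtained by dropping one item is frequent. The representation (tuple of elements, each a tuple of items in alphabetical order) and the function names are assumptions, and the set F2 used below is a made-up example, not the solution for Annex 2.

```python
def one_item_dropped(seq):
    """All subsequences obtained by removing exactly one item from seq."""
    subs = set()
    for i, element in enumerate(seq):
        for j in range(len(element)):
            rest = element[:j] + element[j + 1:]
            subs.add(seq[:i] + ((rest,) if rest else ()) + seq[i + 1:])
    return subs


def passes_pruning(candidate, frequent):
    """True if all (k-1)-subsequences of the candidate are in `frequent`."""
    return one_item_dropped(candidate) <= frequent


bca = (("b",), ("c",), ("a",))     # <bca>
b_ab = (("b",), ("a", "b"))        # <b(ab)>

# Hypothetical F2 for illustration only: {<bc>, <ba>, <ca>}
example_f2 = {
    (("b",), ("c",)),
    (("b",), ("a",)),
    (("c",), ("a",)),
}

keep_bca = passes_pruning(bca, example_f2)     # all three subsequences present
keep_b_ab = passes_pruning(b_ab, example_f2)   # <bb> and <(ab)> are missing
```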
Annex 1
Annex 2
SID Sequence
1 <(dc)b(ac)>
2 <bc(bac)>
3 <(ab)a>
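To check support counts by hand against a program, the following Python sketch tests whether a data sequence contains a pattern and counts the supporting sequences. It assumes plain subsequence containment without time constraints; the names contains and support are hypothetical, and each element is modelled as a set of items. The database below transcribes the three sequences of Annex 2.

```python
def contains(data_seq, pattern):
    """True if every element of `pattern` is a subset of a distinct element
    of `data_seq`, with the order of elements preserved."""
    pos = 0
    for element in pattern:
        # Greedily take the earliest data element that covers this one.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def support(database, pattern):
    """Number of data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


# Annex 2, one list of item sets per SID.
annex2 = [
    [{"c", "d"}, {"b"}, {"a", "c"}],   # SID 1: <(dc)b(ac)>
    [{"b"}, {"c"}, {"a", "b", "c"}],   # SID 2: <bc(bac)>
    [{"a", "b"}, {"a"}],               # SID 3: <(ab)a>
]

# Example call: support of the pattern <(ab)>.
count_ab = support(annex2, [{"a", "b"}])
```

A pattern is frequent when its support reaches min_sup, which is 2 in Exercise 2.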