Continuous Inverse Ranking Queries in Uncertain Streams
Thomas Bernecker*, Hans-Peter Kriegel*,
Nikos Mamoulis**, Matthias Renz* and Andreas Zuefle*
*) Ludwig-Maximilians-Universität München (LMU), Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, kriegel, renz, zuefle}@dbs.ifi.lmu.de
**) University of Hong Kong (HKU), Hong Kong
http://www.cs.hku.hk
nikos@cs.hku.hk
1. Motivation: Probabilistic Inverse Ranking
2. Continuous Inverse Ranking Queries
– Initial Computation
– Incremental Processing
3. Experimental Evaluation
4. Summary
– Inverse Ranking: Return the position of the query object q w.r.t. the score function S
– Probabilistic Inverse Ranking: Find all possible positions of q
• Example: Stock rating system with score S = Chances - Risk
[Figure: query object q and the uncertain objects Stock I, Stock II and Stock III, with Risk as one axis]
– Rank 1? → 0 %
– Rank 2? → 50 %
– Rank 3? → 50 %
– Rank 4? → 0 %
• Probabilistic Inverse Ranking (PIR) Query
– Probabilistic database DB where |DB| = n
– Uncertain object o: m alternative locations (discrete uncertainty) or pdf (continuous uncertainty)
– Query object q
– Score function S : DB → ℝ₀⁺
– Definition: ∀ i = 1, ..., k : P(q is on rank i w.r.t. S) = P(there exist exactly i - 1 objects o ∈ DB with S(o) > S(q))
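The definition above can be illustrated by a brute-force sketch that enumerates all possible worlds of a small discrete database. The function name `pir_possible_worlds`, the assumption that objects are independent, and the concrete scores below are hypothetical; they were chosen only so that the output reproduces the 0 % / 50 % / 50 % / 0 % distribution of the stock example.

```python
from itertools import product

def pir_possible_worlds(objects, q_score):
    """Rank distribution of q by enumerating every possible world.

    objects: list of uncertain objects; each object is a list of
             (score, probability) alternatives summing to 1.
             Objects are assumed to be mutually independent.
    Returns {rank: probability}, where rank 1 is the highest score.
    """
    ranks = {i: 0.0 for i in range(1, len(objects) + 2)}
    for world in product(*objects):      # one alternative per object
        prob = 1.0
        higher = 0
        for score, p in world:
            prob *= p
            if score > q_score:
                higher += 1              # this object outranks q
        ranks[higher + 1] += prob        # i-1 higher objects => rank i
    return ranks

# Hypothetical scores: one stock certainly above q, one certainly
# below, one above or below with probability 0.5 each.
stocks = [
    [(8.0, 1.0)],
    [(6.0, 0.5), (4.0, 0.5)],
    [(2.0, 1.0)],
]
print(pir_possible_worlds(stocks, q_score=5.0))
# {1: 0.0, 2: 0.5, 3: 0.5, 4: 0.0}
```

The enumeration is exponential in the number of uncertain objects, so it serves only as a reference for the definition, not as a practical algorithm.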
• Challenge: Application to dynamic data
– General stream model with location updates retrieved at a time t
– P(q is on rank i at time t) = $P_q^t(i)$
– Initial computation of $P_q^t(i)$
– Incremental processing
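– Object $o \in DB$: $p_o^t = P(S(o) > S(q) \text{ at time } t)$
– j objects have been processed so far ($o_j$ is the latest)
– Successive processing by the Poisson Binomial Recurrence (PBR):

$$P^t_{i,j} = \begin{cases} 1 & \text{if } i = 0 \land j = 0 \\ 0 & \text{if } i > j \lor i < 0 \\ P^t_{i-1,j-1} \cdot p^t_{o_j} + P^t_{i,j-1} \cdot (1 - p^t_{o_j}) & \text{else} \end{cases}$$

– $P^t_{i,j}$: i out of j objects o satisfy S(o) > S(q)
– $P^t_{i-1,j-1} \cdot p^t_{o_j}$: i-1 out of j-1 objects o satisfy S(o) > S(q), and $S(o_j) > S(q)$
– $P^t_{i,j-1} \cdot (1 - p^t_{o_j})$: i out of j-1 objects o satisfy S(o) > S(q), and $S(o_j) \le S(q)$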
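The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pbr` and the in-place single-array layout are assumptions, and only the counts i = 0, ..., k-1 are kept, which gives the O(k·n) cost for n objects.

```python
def pbr(probs, k):
    """Poisson Binomial Recurrence, truncated to the counts 0..k-1.

    probs: list with one value p_o = P(S(o) > S(q)) per object.
    k:     number of counts to track (k >= 1).
    Returns P with P[i] = P(exactly i objects score higher than q).
    """
    # Base case: P_{0,0} = 1 and P_{i,0} = 0 for i > 0.
    P = [1.0] + [0.0] * (k - 1)
    for p in probs:
        # Iterate from high i to low i so that P[i - 1] still holds
        # the value for j-1 objects when P[i] is overwritten.
        for i in range(k - 1, 0, -1):
            P[i] = P[i - 1] * p + P[i] * (1.0 - p)
        P[0] = P[0] * (1.0 - p)   # P_{-1,j-1} = 0
    return P

# Two objects with p = 0.1 and p = 0.6, tracking the counts 0 and 1.
print(pbr([0.1, 0.6], k=2))   # approximately [0.36, 0.58]
```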
• j = n (∀i = 0,...,k-1): $P^t_{i,j} = P^t_{i,n} = P^t_q(i+1)$
⇒ PIR result for q ⇒ runtime: O(k·n)
• Optimizations:
– $p_o^t = 0$ ⇒ o has no effect on the rank of q
– $p_o^t = 1$ ⇒ increment counter $C^t$
• General case ($0 < p_o^t < 1$) ⇒ process o by PBR: ∀i = 0,...,k-1 :
P(i objects processed by PBR have a higher score than q) = $P^t_{PBR}(i)$
• Initial PIR result:

$$P^t_q(i) = \begin{cases} P^t_{PBR}(i - 1 - C^t) & \text{if } C^t + 1 \le i \le C^t + 1 + k \\ 0 & \text{else} \end{cases}$$
• Example: n = 4, k = 2, j = 1
– $p^t_{o_1} = 0.1$, $p^t_{o_2} = 0$, $p^t_{o_3} = 0.6$, $p^t_{o_4} = 1$
[Figure: query object q and the uncertain objects o1, o2, o3 and o4]
• j = 1:
$P^t_{0,1} = P^t_{-1,0} \cdot p^t_{o_1} + P^t_{0,0} \cdot (1 - p^t_{o_1}) = 0 \cdot 0.1 + 1 \cdot 0.9 = 0.9$
$P^t_{1,1} = P^t_{0,0} \cdot p^t_{o_1} + P^t_{1,0} \cdot (1 - p^t_{o_1}) = 1 \cdot 0.1 + 0 \cdot 0.9 = 0.1$
• $C^t = 0$
• Example: n = 4, k = 2, j = 3
• j = 1: $P^t_{0,1} = 0.9$, $P^t_{1,1} = 0.1$
• j = 3 ($o_2$ is skipped because $p^t_{o_2} = 0$, so $o_3$ is the second object processed by PBR):
$P^t_{0,2} = P^t_{-1,1} \cdot p^t_{o_3} + P^t_{0,1} \cdot (1 - p^t_{o_3}) = 0 \cdot 0.6 + 0.9 \cdot 0.4 = 0.36$
$P^t_{1,2} = P^t_{0,1} \cdot p^t_{o_3} + P^t_{1,1} \cdot (1 - p^t_{o_3}) = 0.9 \cdot 0.6 + 0.1 \cdot 0.4 = 0.58$
• $C^t = 0$ after $o_3$; $C^t = 1$ once $o_4$ with $p^t_{o_4} = 1$ has been processed
• Example: n = 4, k = 2
• j = 3: $P^t_{0,2} = 0.36 = P^t_{PBR}(0)$, $P^t_{1,2} = 0.58 = P^t_{PBR}(1)$
• $C^t = 1$
• Initial PIR result:
$P^t_q(1) = P^t_{PBR}(1 - 1 - 1) = P^t_{PBR}(-1) = 0$
$P^t_q(2) = P^t_{PBR}(2 - 1 - 1) = P^t_{PBR}(0) = 0.36$
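The initial computation, including the two optimizations (skip objects with probability 0, count objects with probability 1 in a counter), can be sketched as follows. The function name `initial_pir` and the returned closure `rank_prob` are hypothetical. As a simplifying assumption, the rank lookup returns 0 whenever the index i - 1 - C falls outside the k tracked counts, rather than testing the rank interval explicitly.

```python
def initial_pir(probs, k):
    """Initial PIR computation with the counter optimization.

    probs: list of p_o = P(S(o) > S(q)), one entry per object.
    k:     number of PBR counts to track (k >= 1).
    Returns (P_pbr, C, rank_prob) where rank_prob(i) = P(q is on rank i).
    """
    C = 0                                # objects certainly above q
    P = [1.0] + [0.0] * (k - 1)          # PBR state over counts 0..k-1
    for p in probs:
        if p == 0.0:
            continue                     # no effect on the rank of q
        if p == 1.0:
            C += 1                       # shifts every rank by one
            continue
        for i in range(k - 1, 0, -1):    # general case: one PBR step
            P[i] = P[i - 1] * p + P[i] * (1.0 - p)
        P[0] *= 1.0 - p

    def rank_prob(i):
        idx = i - 1 - C                  # uncertain objects above q
        return P[idx] if 0 <= idx < k else 0.0

    return P, C, rank_prob

# Probabilities of the worked example: o1..o4 with n = 4, k = 2.
P, C, rank_prob = initial_pir([0.1, 0.0, 0.6, 1.0], k=2)
print(C)                # 1
print(rank_prob(1))     # 0.0
print(rank_prob(2))     # approximately 0.36
```

Only o1 and o3 pass through the recurrence here; o2 is skipped and o4 only increments the counter.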
• Starting from $P_q^t(i)$, compute the updated PIR result ∀i = 1,...,k
• Naive solution: Apply PBR ⇒ O(n) ∀i = 1,...,k
• Enhanced solution: just consider update of o ⇒ O(1) ∀i = 1,...,k
– Phase 1
• Remove effect of old value $p_o^t$ from $P^t_{PBR}(i)$ ∀i = 0,...,k-1
• Obtain intermediate result $\hat{P}^t_{PBR}(i)$
– Phase 2
• Incorporate effect of new value $p_o^{t+1}$ in $\hat{P}^t_{PBR}(i)$
• Obtain new PIR result $P_q^{t+1}(i)$
• Phase 1: Three cases
1. $p_o^t = 0$ ⇒ $\hat{P}^t_{PBR}(i) = P^t_{PBR}(i)$
2. $p_o^t = 1$ ⇒ $\hat{P}^t_{PBR}(i) = P^t_{PBR}(i)$ and $C^{t+1} = C^t - 1$
3. $0 < p_o^t < 1$ ⇒ remove $p_o^t$ from $P^t_{PBR}(i)$:

$$P^t_{PBR}(i) = \hat{P}^t_{PBR}(i-1) \cdot p_o^t + \hat{P}^t_{PBR}(i) \cdot (1 - p_o^t)$$

$$\Rightarrow \hat{P}^t_{PBR}(i) = \frac{P^t_{PBR}(i) - \hat{P}^t_{PBR}(i-1) \cdot p_o^t}{1 - p_o^t}, \qquad \hat{P}^t_{PBR}(0) = \frac{P^t_{PBR}(0)}{1 - p_o^t}$$
• Phase 2: Three cases
1. $p_o^{t+1} = 0$ ⇒ $P^{t+1}_{PBR}(i) = \hat{P}^t_{PBR}(i)$
2. $p_o^{t+1} = 1$ ⇒ $P^{t+1}_{PBR}(i) = \hat{P}^t_{PBR}(i)$ and $C^{t+1} = C^t + 1$
3. $0 < p_o^{t+1} < 1$ ⇒ compute $P^{t+1}_{PBR}(i)$ applying PBR:

$$P^{t+1}_{PBR}(i) = \hat{P}^t_{PBR}(i-1) \cdot p_o^{t+1} + \hat{P}^t_{PBR}(i) \cdot (1 - p_o^{t+1})$$

• New PIR result:

$$P^{t+1}_q(i) = \begin{cases} P^{t+1}_{PBR}(i - 1 - C^{t+1}) & \text{if } C^{t+1} + 1 \le i \le C^{t+1} + 1 + k \\ 0 & \text{else} \end{cases}$$
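The two phases can be sketched as a pair of small functions, each touching the k tracked counts once. The names `remove_object`, `add_object` and `update` are hypothetical, and the state is assumed to be the pair (PBR array, counter C) produced by the initial computation.

```python
def remove_object(P, C, p_old):
    """Phase 1: remove the effect of the old probability p_old."""
    if p_old == 0.0:                     # case 1: object had no effect
        return list(P), C
    if p_old == 1.0:                     # case 2: object was counted in C
        return list(P), C - 1
    # Case 3: invert the recurrence, starting from the count i = 0.
    P_hat = [0.0] * len(P)
    P_hat[0] = P[0] / (1.0 - p_old)
    for i in range(1, len(P)):
        P_hat[i] = (P[i] - P_hat[i - 1] * p_old) / (1.0 - p_old)
    return P_hat, C

def add_object(P_hat, C, p_new):
    """Phase 2: incorporate the effect of the new probability p_new."""
    if p_new == 0.0:                     # case 1: no effect
        return list(P_hat), C
    if p_new == 1.0:                     # case 2: only the counter grows
        return list(P_hat), C + 1
    # Case 3: one ordinary PBR step on the intermediate result.
    P = [P_hat[0] * (1.0 - p_new)]
    for i in range(1, len(P_hat)):
        P.append(P_hat[i - 1] * p_new + P_hat[i] * (1.0 - p_new))
    return P, C

def update(P, C, p_old, p_new):
    """Apply one object update; the cost does not depend on n."""
    P_hat, C = remove_object(P, C, p_old)
    return add_object(P_hat, C, p_new)

# State after the initial computation of the example: [0.36, 0.58], C = 1.
P, C = update([0.36, 0.58], 1, p_old=0.6, p_new=0.2)
print(P, C)   # approximately [0.72, 0.26], counter 1
P, C = update(P, C, p_old=1.0, p_new=0.0)
print(P, C)   # PBR values unchanged, counter 0
```

One point worth noting about this sketch: case 3 of phase 1 divides by 1 - p_old, so values of p_old very close to 1 may amplify floating-point error.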
• Example: n = 4, k = 2
– $p^t_{o_1} = 0.1$, $p^t_{o_2} = 0$, $p^t_{o_3} = 0.6 \to p^{t+1}_{o_3} = 0.2$, $p^t_{o_4} = 1 \to p^{t+2}_{o_4} = 0$, $C^t = 1$
[Figure: positions of q, o1, o2, o3 and o4 before and after the update]
– Phase 1 (Case 3):
$\hat{P}^t_{PBR}(0) = \frac{P^t_{PBR}(0)}{1 - p^t_{o_3}} = \frac{0.36}{0.4} = 0.9$
$\hat{P}^t_{PBR}(1) = \frac{P^t_{PBR}(1) - \hat{P}^t_{PBR}(0) \cdot p^t_{o_3}}{1 - p^t_{o_3}} = \frac{0.58 - 0.9 \cdot 0.6}{0.4} = 0.1$
– Phase 2 (Case 3):
$P^{t+1}_{PBR}(0) = \hat{P}^t_{PBR}(-1) \cdot p^{t+1}_{o_3} + \hat{P}^t_{PBR}(0) \cdot (1 - p^{t+1}_{o_3}) = 0 \cdot 0.2 + 0.9 \cdot 0.8 = 0.72$
$P^{t+1}_{PBR}(1) = \hat{P}^t_{PBR}(0) \cdot p^{t+1}_{o_3} + \hat{P}^t_{PBR}(1) \cdot (1 - p^{t+1}_{o_3}) = 0.9 \cdot 0.2 + 0.1 \cdot 0.8 = 0.26$
– PIR result:
$P^t_q(1) = 0 \to P^{t+1}_q(1) = 0$
$P^t_q(2) = 0.36 \to P^{t+1}_q(2) = 0.72$
• Example: n = 4, k = 2
– Update of $o_4$: its probability changes from 1 to $p^{t+2}_{o_4} = 0$; the counter is 1 before the update
– Phase 1 (Case 2): the old value is 1, so the counter is decremented to 0 and
$\hat{P}^{t+1}_{PBR}(0) = P^{t+1}_{PBR}(0) = 0.72$
$\hat{P}^{t+1}_{PBR}(1) = P^{t+1}_{PBR}(1) = 0.26$
– Phase 2 (Case 1): the new value is 0, so
$P^{t+2}_{PBR}(0) = \hat{P}^{t+1}_{PBR}(0) = 0.72$
$P^{t+2}_{PBR}(1) = \hat{P}^{t+1}_{PBR}(1) = 0.26$
– PIR result:
$P^{t+1}_q(1) = 0 \to P^{t+2}_q(1) = P^{t+2}_{PBR}(1 - 1 - 0) = P^{t+2}_{PBR}(0) = 0.72$
$P^{t+1}_q(2) = 0.72 \to P^{t+2}_q(2) = P^{t+2}_{PBR}(2 - 1 - 0) = P^{t+2}_{PBR}(1) = 0.26$
• $C^{t+2} = 0$
[Figure: time per update [ms], enhanced vs. naive]
[Figure: time to process the full stream [ms] vs. database size n, enhanced vs. naive]
• dimensions = 2, m = 10, σ = 5, k = n, buffer = 3
[Figure: time to process the full stream [ms], enhanced vs. naive]
[Figure: time to process the full stream [ms] vs. standard deviation σ, enhanced vs. naive]
• n = 10,000, dimensions = 2, m = 10, k = n, buffer = 3
• Incremental processing yields update costs of O(k) instead of O(k·n)
• The framework can be adapted to other query types, e.g. the probabilistic threshold inverse ranking query
• Future work: approximate approach using lower and upper bounds for the probabilities and applying the concept of Generating Functions
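As a rough illustration of the adaptation mentioned in the summary, a threshold variant could be layered on top of an already computed rank distribution. The sketch below assumes that a probabilistic threshold inverse ranking query returns every rank whose probability is at least a threshold τ; this reading, the function name `threshold_inverse_ranking`, and the sample distribution are assumptions rather than definitions taken from the slides.

```python
def threshold_inverse_ranking(rank_probs, tau):
    """Return all ranks i with P(q is on rank i) >= tau.

    rank_probs: dict {rank: probability}, e.g. a PIR result.
    tau:        probability threshold in [0, 1] (assumed semantics).
    """
    return [rank for rank, prob in sorted(rank_probs.items())
            if prob >= tau]

# Illustrative rank distribution (hypothetical values).
result = {1: 0.0, 2: 0.72, 3: 0.26}
print(threshold_inverse_ranking(result, tau=0.5))   # [2]
```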