Efficient Computation of Reverse Skyline Queries
Evangelos Dellis (University of Marburg, Germany)
Bernhard Seeger (University of Marburg, Germany)
Outline
Skyline
Dynamic Skyline Query
Reversed Skyline Query
Branch-and-Bound for Reversed Skylines
Reversed Skylines with Approximations
Experimental Results
Skyline
Important new class of queries
Given: a set of d-dimensional points
Result: the points that are not dominated by any other point
x dominates y: x is as good as y in all dimensions and better in at least one dimension
Example (collection of used cars)
Goal: cheapest car with lowest mileage
[Figure: used cars plotted by price and mileage]
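The dominance test and the skyline definition above can be sketched in a few lines of Python. This is only a quadratic-time illustration, assuming that smaller values are better in every dimension; the function names `dominates` and `skyline` and the sample cars are made up for the example and do not come from the talk.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: x <= y in every dimension and x < y in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (simple O(n^2) scan)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical used cars as (price, mileage in km); smaller is better in both.
cars = [(9000, 80000), (12000, 30000), (7000, 120000), (13000, 60000), (9500, 90000)]
result = skyline(cars)
# (13000, 60000) is dominated by (12000, 30000),
# (9500, 90000) is dominated by (9000, 80000).
```

Here `result` keeps the three cars that are not beaten on both price and mileage at once.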
2. Dynamic Skyline Query
Motivation (customer perspective)
Ideal used car: 120 hp, 30000 km, built 2005, …
Find all cars that are close to the customer's specification
Skyline query relative to a reference point ref
x dominates y iff x is no farther from ref than y in all dimensions and closer to ref in at least one dimension
Example (Used Car Database)
[Figure: used cars plotted by mileage, with the reference point ref at 30k]
The distance function
Distance function f: R^d → R^d
f(q) = (0, …, 0)
f(c_1, …, c_{i-1}, x_i, c_{i+1}, …, c_d) is linearly decreasing in x_i for x_i < q_i
f(c_1, …, c_{i-1}, x_i, c_{i+1}, …, c_d) is linearly increasing in x_i for x_i > q_i
Extension to a more general class of distance functions is possible
Without loss of generality
f(x) = (|x_1 - q_1|, |x_2 - q_2|, …, |x_d - q_d|)
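As an illustration of the distance function f and of dominance relative to a reference point, the following Python sketch computes a dynamic skyline by brute force. It assumes the absolute-difference form of f given above; the names `distance_vector`, `dyn_dominates`, `dynamic_skyline` and the sample cars are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_vector(x: Sequence[float], q: Sequence[float]) -> Point:
    """f(x) = (|x_1 - q_1|, ..., |x_d - q_d|), so f(q) is the zero vector."""
    return tuple(abs(a - b) for a, b in zip(x, q))


def dyn_dominates(x: Sequence[float], y: Sequence[float], ref: Sequence[float]) -> bool:
    """x is no farther from ref than y in all dimensions and closer in at least one."""
    fx, fy = distance_vector(x, ref), distance_vector(y, ref)
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))


def dynamic_skyline(points: List[Point], ref: Sequence[float]) -> List[Point]:
    """Points not dominated relative to ref (simple O(n^2) scan)."""
    return [p for p in points if not any(dyn_dominates(o, p, ref) for o in points)]


# Hypothetical cars as (hp, mileage in thousand km); ideal car: 120 hp, 30k km.
ref = (120, 30)
cars = [(110, 35), (130, 28), (100, 50), (125, 60), (120, 45)]
dsl = dynamic_skyline(cars, ref)
```

With this data, `dsl` contains (130, 28) and (120, 45): the first is closest in mileage, the second matches the requested horsepower exactly.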
Dynamic Skyline Query
[Figure: used cars plotted by hp and mileage, with the reference point ref at 30k and its dynamic skyline DSL(ref)]
3. Reverse Skyline Query
Motivation (dealer perspective)
Given: the preferences of customers and the collection of used cars
Does it make sense to offer a car X to one of my customers?
Car X is interesting if it is in the skyline of a preference.
[Figure: customer preferences and used cars plotted by hp and mileage, with the candidate car X]
Reverse Skyline Query
Monochromatic Problem
Given a set P of d-dimensional points and a query point q
Reverse skyline query of q
RSL(q) = points p ∈ P whose dynamic skyline DSL(p) contains q
Two Algorithms
Assumption: R-tree on set P
Branch-and-bound algorithm (BBRS)
Reversed Skyline Search with Approximations (RSSA)
4. BBRS: Branch-and-Bound algorithm
Assumption
Multidimensional index (e.g. R-tree) on point set P
Goal
Processing the reversed skyline of point q without transformation
Global Skyline GSL(q)
Points that are not globally dominated
Point x globally dominates y if there exists ε ∈ {-1, 1}^d such that for all i: 0 ≤ ε_i (x_i - q_i) ≤ ε_i (y_i - q_i)
[Figure: example points a, b, c, d, e, f, h around the query point q]
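The global dominance condition can be checked one dimension at a time, because each sign ε_i may be chosen independently. The Python sketch below does this by brute force instead of using an index. One assumption is added that the slide does not state: x must also be strictly closer to q in at least one dimension, so that a point never dominates itself. The function names and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def globally_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    """Per dimension, x lies on the same side of q as y and is at least as close to q."""
    same_side_and_closer = all(
        0 <= (xi - qi) <= (yi - qi) or 0 <= (qi - xi) <= (qi - yi)
        for xi, yi, qi in zip(x, y, q)
    )
    # Assumption (not on the slide): strictly closer in at least one dimension.
    strictly_closer = any(abs(xi - qi) < abs(yi - qi) for xi, yi, qi in zip(x, y, q))
    return same_side_and_closer and strictly_closer


def global_skyline(points: List[Point], q: Sequence[float]) -> List[Point]:
    """Points that no other point globally dominates (simple O(n^2) scan)."""
    return [p for p in points if not any(globally_dominates(o, p, q) for o in points)]


q = (5, 5)
points = [(6, 6), (8, 8), (3, 7), (2, 9), (7, 2), (4, 4)]
gsl = global_skyline(points, q)
# (8, 8) is globally dominated by (6, 6); (2, 9) is globally dominated by (3, 7).
```

Points in different orthants around q never dominate each other, which is why (7, 2) and (4, 4) both stay in `gsl`.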
Important Properties
RSL(q) ⊆ GSL(q)
A point a ∈ GSL(q) is not in RSL(q) if
there is a b ∈ P such that for all i: |b_i - a_i| < |a_i - q_i|.
[Figure: example points a, b, c, d, e, f, g, h around the query point q]
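The second property amounts to a boolean test on a window centered at a with half-width |a_i - q_i| in each dimension. A minimal Python sketch of that test follows; it scans all points instead of querying an index, and it assumes that a itself does not count as a witness b. The name `window_contains_point` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def window_contains_point(a: Point, q: Sequence[float], points: List[Point]) -> bool:
    """True if some b != a satisfies |b_i - a_i| < |a_i - q_i| for all i."""
    return any(
        b != a and all(abs(bi - ai) < abs(ai - qi) for bi, ai, qi in zip(b, a, q))
        for b in points
    )


q = (5, 5)
points = [(6, 6), (8, 8), (7, 9)]

# Window of (8, 8) is the open box (5, 11) x (5, 11); it contains (6, 6) and (7, 9).
rejected = window_contains_point((8, 8), q, points)
# Window of (6, 6) is the open box (5, 7) x (5, 7); no other point falls inside.
kept = not window_contains_point((6, 6), q, points)
```

A candidate whose window contains another point is dropped; a candidate with an empty window stays in the reverse skyline.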
Algorithm BBRS
Given: query point q, point set P
Return: the reversed skyline RSL(q)
Sketch
Candidate generation:
branch-and-bound computation of the global skyline GSL(q)
For each candidate p in GSL(q) perform a boolean window query
Results
Correctness
Minimum number of candidates
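The two phases of the sketch above (candidate generation, then one boolean window query per candidate) can be illustrated as follows. This is not the branch-and-bound algorithm itself: the R-tree traversal is replaced by linear scans, so the code only shows the filter-and-verify structure. It assumes that global dominance requires strict improvement in at least one dimension and that a candidate is not its own witness; all names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def globally_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    """x is in the same orthant around q as y, at least as close, strictly closer once."""
    ok = all(
        0 <= (xi - qi) <= (yi - qi) or 0 <= (qi - xi) <= (qi - yi)
        for xi, yi, qi in zip(x, y, q)
    )
    return ok and any(abs(xi - qi) < abs(yi - qi) for xi, yi, qi in zip(x, y, q))


def global_skyline(points: List[Point], q: Sequence[float]) -> List[Point]:
    """Phase 1: candidates are the points that are not globally dominated."""
    return [p for p in points if not any(globally_dominates(o, p, q) for o in points)]


def window_is_empty(a: Point, q: Sequence[float], points: List[Point]) -> bool:
    """Phase 2: no b != a with |b_i - a_i| < |a_i - q_i| in every dimension."""
    return not any(
        b != a and all(abs(bi - ai) < abs(ai - qi) for bi, ai, qi in zip(b, a, q))
        for b in points
    )


def reverse_skyline_sketch(points: List[Point], q: Sequence[float]) -> List[Point]:
    """Generate candidates from GSL(q), then keep those with an empty window."""
    return [a for a in global_skyline(points, q) if window_is_empty(a, q, points)]


q = (5, 5)
points = [(6, 7), (9, 8), (3, 4), (1, 1), (8, 2), (2, 9), (10, 4)]
candidates = global_skyline(points, q)
rsl = reverse_skyline_sketch(points, q)
# (8, 2) is a candidate, but its window contains (10, 4), so it is dropped.
```

Only five of the seven points reach the verification phase, and four of them survive it.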
5. Reverse Skyline with Approximations
Important property
If any s ∈ DSL(p) dominates q with respect to p, then p is not in RSL(q)
[Figure: point p, its dynamic skyline DSL(p), and the query point q]
Approximations
For each p we keep a subset of DSL(p) of constant size
Parameter k
Filter Step
If q dominates one of the samples, then p is in RSL(q)
If a sample dominates q, then p is not in RSL(q)
Otherwise, call the refinement step
[Figure: point p, the samples of DSL(p), and the query point q]
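The three outcomes of the filter step can be written as a small decision function. The Python sketch below assumes that dominance is measured relative to p with the absolute-difference distance and that `samples` is the stored subset of DSL(p); the function names and the string labels are made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(x: Sequence[float], y: Sequence[float], ref: Sequence[float]) -> bool:
    """x is no farther from ref than y in all dimensions and closer in at least one."""
    fx = [abs(a - r) for a, r in zip(x, ref)]
    fy = [abs(b - r) for b, r in zip(y, ref)]
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))


def filter_step(p: Point, q: Point, samples: List[Point]) -> str:
    """Decide from the samples alone whether p is in RSL(q), if possible."""
    if any(dyn_dominates(q, s, p) for s in samples):
        return "in RSL"       # q dominates a sample, so nothing can dominate q
    if any(dyn_dominates(s, q, p) for s in samples):
        return "not in RSL"   # a sample dominates q
    return "refine"           # samples are inconclusive: run the refinement step


p = (0, 0)
samples = [(1, 4), (3, 2)]  # hypothetical stored samples of DSL(p), k = 2

accepted = filter_step(p, (1, 3), samples)
discarded = filter_step(p, (2, 5), samples)
undecided = filter_step(p, (2, 3), samples)
```

Only the third query point, which is incomparable to both samples, needs the refinement step.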
Refinement Step
Empty range query
Instead of one big range query, up to 2d small range queries
Dynamic Maintenance
Insertion of a new point x
Algorithm
Compute the global skyline GSL(x)
For every a ∈ GSL(x), examine the approximation of DSL(a)
If x dominates at least one sample, update the approximation
[Figure: points a, c, d, f, g, h, the new point x, and DSL(a)]
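The insertion procedure can be sketched as below. The slides do not say what "update the approximation" does, so this sketch assumes one plausible rule: samples dominated by x (relative to a) are dropped and x is stored in their place. The global skyline is computed by a linear scan instead of an index, building the approximation of the new point x itself is left out, and all names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dyn_dominates(x: Sequence[float], y: Sequence[float], ref: Sequence[float]) -> bool:
    """x is no farther from ref than y in all dimensions and closer in at least one."""
    fx = [abs(a - r) for a, r in zip(x, ref)]
    fy = [abs(b - r) for b, r in zip(y, ref)]
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))


def globally_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    ok = all(
        0 <= (xi - qi) <= (yi - qi) or 0 <= (qi - xi) <= (qi - yi)
        for xi, yi, qi in zip(x, y, q)
    )
    return ok and any(abs(xi - qi) < abs(yi - qi) for xi, yi, qi in zip(x, y, q))


def global_skyline(points: List[Point], q: Sequence[float]) -> List[Point]:
    return [p for p in points if not any(globally_dominates(o, p, q) for o in points)]


def insert_point(x: Point, points: List[Point], approx: Dict[Point, List[Point]]) -> List[Point]:
    """Insert x and return the points whose stored samples changed."""
    changed = []
    for a in global_skyline(points, x):
        dominated = [s for s in approx[a] if dyn_dominates(x, s, a)]
        if dominated:
            # Assumed update rule: replace the dominated samples by x.
            approx[a] = [s for s in approx[a] if s not in dominated] + [x]
            changed.append(a)
    points.append(x)  # the approximation of x itself is not built here
    return changed


points = [(2, 2), (8, 8), (5, 9)]
approx = {(2, 2): [(8, 8), (5, 9)], (8, 8): [(5, 9)], (5, 9): [(8, 8)]}
changed = insert_point((4, 4), points, approx)
```

In this example only the samples of (2, 2) change, because the new point is closer to (2, 2) than both of its stored samples.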
Computing Approximations
d = 2
An algorithm based on the dynamic programming paradigm produces an optimal approximation.
d > 2
Greedy algorithm
Iteratively add the point with the maximum approximation gain
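A greedy selection of k samples might look like the following Python sketch. The slides do not define the approximation gain, so the sketch assumes a stand-in measure: the number of probe points newly dominated by a candidate sample. All points are given as distance vectors relative to p (smaller is better), and the names `greedy_samples` and `probes` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x <= y in every dimension and x < y in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))


def greedy_samples(skyline_pts: List[Point], probes: List[Point], k: int) -> List[Point]:
    """Pick up to k skyline points, each time the one with the largest gain."""
    chosen: List[Point] = []
    covered: Set[int] = set()
    remaining = list(skyline_pts)

    def cover(s: Point) -> Set[int]:
        # Indices of the probe points dominated by s.
        return {i for i, t in enumerate(probes) if dominates(s, t)}

    for _ in range(min(k, len(remaining))):
        # Assumed gain: probes dominated by s that are not covered yet.
        best = max(remaining, key=lambda s: len(cover(s) - covered))
        covered |= cover(best)
        chosen.append(best)
        remaining.remove(best)
    return chosen


skyline_pts = [(1, 6), (3, 3), (6, 1)]
probes = [(2, 7), (2, 9), (4, 4), (5, 5), (4, 6), (7, 7), (7, 2), (5, 4)]
picked = greedy_samples(skyline_pts, probes, 2)
```

The first pick is (3, 3), which dominates five probes; the second is (1, 6), which adds two probes that (3, 3) does not cover.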
Related literature
Jagadish et al.: Optimal Histograms with Quality Guarantees, VLDB 1998
Xuemin Lin, Yidong Yuan, Qing Zhang, Ying Zhang: Selecting Stars: The k Most Representative Skyline Operator, ICDE 2007
6. Experiments
Data sets
Real Data
CarDB: d = 2; N = 50000
NBA: d = 4; N = 17000
Synthetic Data
Uniform distribution: d = 2,…,4; N = 80000
Cluster distribution: d = 2,…,4; N = 80000
Queries
100 reversed skyline queries
Implementation
XXL library (newest version on request)
RSSA algorithm
Performance as a function of k
Size of the reversed skyline in comparison to the size of the global skyline
Comparison RSSA vs. BBRS
Average number of I/Os (logarithmic scale)
Comparison RSSA vs. BBRS
Performance as a function of dimensionality
Conclusions
Reverse Skylines are important for finding interesting points
Dealer perspective: What kind of items are interesting to my customers?