Efficient Computation of Reverse Skyline Queries


Evangelos Dellis (University of Marburg, Germany)

Bernhard Seeger (University of Marburg, Germany)

(2)

Outline

Skyline

Dynamic Skyline Query

Reversed Skyline Query

Branch-and-Bound for Reversed Skylines

Reversed Skylines with Approximations

Experimental Results

(3)

1. Skyline

Important new class of queries

Given: a set of d-dimensional points

Result: points that are not dominated by others

x dominates y

x is at least as good as y in all dimensions and strictly better in at least one dimension

Example (collection of used cars)

Goal: Cheapest car with lowest mileage

[Figure: used cars plotted by price and mileage]
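The dominance definition above can be illustrated with a minimal sketch. It assumes that smaller values are better in every dimension (as in the price/mileage example) and uses a plain quadratic scan instead of an index; the names `dominates` and `skyline` and the sample cars are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """x dominates y: no worse in every dimension, strictly better in at least one.

    Assumption: smaller values are better in all dimensions.
    """
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Keep every point that is not dominated by any other point (O(n^2) scan)."""
    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical used cars as (price, mileage) pairs.
cars = [(9000, 80000), (12000, 30000), (15000, 20000), (13000, 60000), (16000, 25000)]
result = skyline(cars)
print(result)
```

In this toy data set, (13000, 60000) is dominated by (12000, 30000) and (16000, 25000) by (15000, 20000), so three cars remain in the skyline.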

(4)

2. Dynamic Skyline Query

Motivation (customer perspective)

ideal used car: 120 hp, 30000 km, built 2005, …

Find all cars that are close to the customer’s specification

Skyline query relative to a reference point ref

x dominates y iff x is no farther from ref than y in all dimensions and closer to ref in at least one dimension

Example (Used Car Database)

[Figure: used cars plotted by mileage, with the reference point ref at 30k]

(5)

The distance function

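Distance function $f: \mathbb{R}^d \to \mathbb{R}^d$

$f(q) = (0, \ldots, 0)$

$f(c_1, \ldots, c_{i-1}, x_i, c_{i+1}, \ldots, c_d)$ is linearly decreasing in $x_i$ for $x_i < q_i$

$f(c_1, \ldots, c_{i-1}, x_i, c_{i+1}, \ldots, c_d)$ is linearly increasing in $x_i$ for $x_i > q_i$

Generalization to a more general class is possible

Without loss of generality

$f(x) = (|x_1 - q_1|, |x_2 - q_2|, \ldots, |x_d - q_d|)$

[Figure: the query point q and the distance function f]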
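A dynamic skyline query can be sketched by mapping every point through the distance function $f(x) = (|x_1 - q_1|, \ldots, |x_d - q_d|)$ and computing an ordinary skyline in the transformed space. The function names and the sample cars (hp, mileage in thousand km) are hypothetical, and the quadratic scan stands in for an index-based method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_vector(x: Sequence[float], q: Sequence[float]) -> Point:
    """f(x) = (|x_1 - q_1|, ..., |x_d - q_d|), the per-dimension distance to q."""
    return tuple(abs(xi - qi) for xi, qi in zip(x, q))


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no larger in every dimension, strictly smaller in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points whose distance vector to q is not dominated by that of another point."""
    images = [distance_vector(p, q) for p in points]
    return [
        p
        for p, fp in zip(points, images)
        if not any(dominates(fo, fp) for fo in images)
    ]


# Hypothetical reference point: 120 hp, 30 (thousand km).
ref = (120, 30)
cars = [(110, 35), (130, 28), (100, 50), (120, 45), (150, 30)]
dsl = dynamic_skyline(cars, ref)
print(dsl)
```

Here the car (110, 35) drops out because (130, 28) is equally close in hp and closer in mileage, even though the two lie on opposite sides of ref.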

(6)

Dynamic Skyline Query

[Figure: used cars plotted by hp and mileage, with the reference point ref at 30k and its dynamic skyline DSL(ref)]

(7)

3. Reverse Skyline Query

Motivation (dealer perspective)

Given: the preferences of customers, the collection of used cars

Does it make sense to offer a car X to one of my customers?

Car X is interesting if it is in the dynamic skyline of at least one customer preference.

[Figure: customer preferences, used cars, and car X plotted by hp and mileage]

(8)

Reverse Skyline Query

Monochromatic Problem

Given a set P of d-dimensional points and a query point q

Reverse Skyline query of q

RSL(q) = all points p ∈ P whose dynamic skyline DSL(p) contains q

Two Algorithms

Assumption: R-tree on set P

Branch-and-bound algorithm (BBRS)

Reversed Skyline Search with Approximations (RSSA)
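The definition of RSL(q) on this slide can be checked directly by brute force, which is useful as a reference when testing faster algorithms. The sketch below is not BBRS or RSSA; it assumes the absolute-difference distance function from the earlier slide and tests, for every p, whether some other point dominates q relative to p. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_vector(x: Sequence[float], center: Sequence[float]) -> Point:
    """Per-dimension absolute distance of x to the given center point."""
    return tuple(abs(a - b) for a, b in zip(x, center))


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no larger in every dimension, strictly smaller in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def reverse_skyline_naive(points: List[Point], q: Point) -> List[Point]:
    """All p such that no other point dominates q relative to p (O(n^2) scan)."""
    result = []
    for i, p in enumerate(points):
        fq = distance_vector(q, p)
        blocked = any(
            j != i and dominates(distance_vector(o, p), fq)
            for j, o in enumerate(points)
        )
        if not blocked:
            result.append(p)
    return result


q = (5, 5)
P = [(4, 6), (8, 8), (1, 1), (6, 3), (9, 2)]
rsl = reverse_skyline_naive(P, q)
print(rsl)
```

In this example only (9, 2) is excluded: relative to (9, 2), the point (6, 3) is closer than q in both dimensions.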

(9)

4. BBRS: Branch-and-Bound algorithm

Assumption

Multidimensional index (e.g. R-tree) on point set P

Goal

Process the reversed skyline of point q without transforming the data

Global Skyline GSL(q)

points that are not globally dominated

point x globally dominates y

if there exists $\varepsilon \in \{-1, 1\}^d$ such that for all $i$: $0 \le \varepsilon_i (x_i - q_i) \le \varepsilon_i (y_i - q_i)$

[Figure: example points a, b, c, d, e, f, h and the query point q, illustrating the global skyline GSL(q)]
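The global dominance test can be sketched as follows. Two points are comparable only if they lie on the same side of q in every dimension, which is what the sign vector ε expresses. The sketch additionally requires strict improvement in at least one dimension so that a point does not dominate itself; that requirement is an assumption not spelled out on the slide. The linear scan replaces the R-tree traversal, and all names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def globally_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    """x globally dominates y w.r.t. q.

    Same side of q in every dimension, no farther from q in any dimension,
    and (assumption) strictly closer in at least one dimension.
    """
    same_side = all((xi - qi) * (yi - qi) >= 0 for xi, yi, qi in zip(x, y, q))
    no_farther = all(abs(xi - qi) <= abs(yi - qi) for xi, yi, qi in zip(x, y, q))
    closer_once = any(abs(xi - qi) < abs(yi - qi) for xi, yi, qi in zip(x, y, q))
    return same_side and no_farther and closer_once


def global_skyline(points: List[Point], q: Point) -> List[Point]:
    """Points that no other point globally dominates (O(n^2) scan, no index)."""
    return [
        p for p in points if not any(globally_dominates(o, p, q) for o in points)
    ]


q = (5, 5)
P = [(4, 6), (8, 8), (7, 9), (1, 1), (9, 9), (2, 8)]
gsl = global_skyline(P, q)
print(gsl)
```

In this data set (9, 9) is globally dominated by (8, 8) and (2, 8) by (4, 6), while (8, 8) and (7, 9) share an orthant but are incomparable.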

(10)

Important Properties

$RSL(q) \subseteq GSL(q)$

A point $a \in GSL(q)$ is not in $RSL(q)$ if

there is a $b \in P$, $b \neq a$, such that for all $i$: $|b_i - a_i| < |a_i - q_i|$.

[Figure: example points a, b, c, d, e, f, g, h and the query point q]

(11)

Algorithm BBRS

Given: query point q, point set P

Return: the reversed skyline RSL(q)

Sketch

Candidate generation:

branch-and-bound computation of the global skyline GSL(q)

For each candidate p in GSL(q) perform a boolean window query

Results

Correctness

Minimum number of candidates
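The two phases of the sketch above can be put together in a compact illustration. It is not the R-tree based BBRS itself: both the global skyline and the boolean window query are done by linear scans here. The window test is written as a dominance test relative to the candidate, which agrees with the strict condition $|b_i - a_i| < |a_i - q_i|$ except for points lying exactly on the window boundary. All names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def globally_dominates(x: Sequence[float], y: Sequence[float], q: Sequence[float]) -> bool:
    """Same side of q everywhere, no farther from q, strictly closer once (assumed)."""
    triples = list(zip(x, y, q))
    return (
        all((xi - qi) * (yi - qi) >= 0 for xi, yi, qi in triples)
        and all(abs(xi - qi) <= abs(yi - qi) for xi, yi, qi in triples)
        and any(abs(xi - qi) < abs(yi - qi) for xi, yi, qi in triples)
    )


def window_is_empty(points: List[Point], idx: int, q: Point) -> bool:
    """Boolean window query: True if no other point dominates q relative to points[idx]."""
    a = points[idx]
    extent = [abs(ai - qi) for ai, qi in zip(a, q)]
    for j, b in enumerate(points):
        if j == idx:
            continue
        offset = [abs(bi - ai) for bi, ai in zip(b, a)]
        inside = all(o <= e for o, e in zip(offset, extent))
        strict = any(o < e for o, e in zip(offset, extent))
        if inside and strict:
            return False
    return True


def bbrs_sketch(points: List[Point], q: Point) -> List[Point]:
    """Candidate generation via the global skyline, then one window test per candidate."""
    candidates = [
        i
        for i, p in enumerate(points)
        if not any(globally_dominates(o, p, q) for o in points)
    ]
    return [points[i] for i in candidates if window_is_empty(points, i, q)]


q = (5, 5)
P = [(4, 6), (8, 8), (7, 9), (1, 1), (9, 9), (2, 8)]
rsl = bbrs_sketch(P, q)
print(rsl)
```

Four of the six points survive candidate generation, and the window tests then remove (8, 8) and (7, 9), which block each other.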

(12)

5. Reverse Skyline with Approximations

Important property

If any s ∈ DSL(p) dominates q with respect to p, then p is not in RSL(q)

[Figure: point p with its dynamic skyline DSL(p) and the query point q, shown in two configurations]

(13)

Approximations

For each p we keep a subset of DSL(p) of constant size

Parameter k

Filter Step

If q dominates one of the samples, then p is in RSL(q)

If a sample dominates q, then p is not in RSL(q)

Otherwise, call the refinement step

[Figure: point p, its dynamic skyline DSL(p), and three example positions of q]
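The three outcomes of the filter step can be sketched as a small decision function. It assumes that dominance is evaluated on absolute per-dimension distances to p, and that `samples` holds the stored subset of DSL(p); the function name, the return labels, and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_vector(x: Sequence[float], center: Sequence[float]) -> Point:
    """Per-dimension absolute distance of x to the given center point."""
    return tuple(abs(a - b) for a, b in zip(x, center))


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no larger in every dimension, strictly smaller in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def filter_step(p: Point, q: Point, samples: List[Point]) -> str:
    """Decide membership of p in RSL(q) from the stored samples of DSL(p) only.

    Returns "in", "out", or "refine" when the samples cannot decide.
    """
    fq = distance_vector(q, p)
    images = [distance_vector(s, p) for s in samples]
    if any(dominates(fq, fs) for fs in images):
        return "in"       # q dominates a skyline point, so nothing dominates q
    if any(dominates(fs, fq) for fs in images):
        return "out"      # a sample dominates q relative to p
    return "refine"       # undecided: fall back to the refinement step


p = (0, 0)
samples = [(1, 4), (3, 2)]  # hypothetical subset of DSL(p), k = 2
print(filter_step(p, (1, 1), samples))
print(filter_step(p, (4, 5), samples))
print(filter_step(p, (2, 3), samples))
```

The first two cases are decided without touching the data set; only the third needs a range query.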

(14)

Refinement Step

Empty range query

Instead of one big range query, up to $2^d$ small range queries

(15)

Dynamic Maintenance

Insertion of a new point x

Algorithm

Compute the global skyline GSL(x)

For every a ∈ GSL(x) examine the approximation of DSL(a).

If x dominates at least one sample, update the approximation

[Figure: points a, c, d, f, g, h and the new point x, with x relative to DSL(a)]

(16)

Computing Approximations

d=2

An algorithm based on the dynamic programming paradigm produces an optimal approximation.

d>2

Greedy algorithm

Iteratively add the point with the maximum approximation gain
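The greedy selection can be illustrated with a small sketch. The slides do not define the approximation gain, so the sketch assumes a stand-in measure: the number of probe points that a skyline point dominates and that no previously chosen sample already dominates. Both this gain measure and all names and data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """u dominates v: no larger in every dimension, strictly smaller in one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def greedy_approximation(skyline_pts: List[Point], probes: List[Point], k: int) -> List[Point]:
    """Pick up to k skyline points, each time taking the one with the largest gain.

    Assumed gain: number of probe points newly dominated by the candidate.
    Ties are broken in favor of the earlier candidate.
    """
    chosen: List[Point] = []
    covered: Set[int] = set()
    remaining = list(skyline_pts)

    def coverage(s: Point) -> Set[int]:
        return {i for i, x in enumerate(probes) if dominates(s, x)}

    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda s: len(coverage(s) - covered))
        chosen.append(best)
        covered |= coverage(best)
        remaining.remove(best)
    return chosen


# Hypothetical skyline points and probe points, already in distance space.
skyline_pts = [(1, 6), (2, 3), (5, 1)]
probes = [(3, 4), (4, 5), (6, 2), (2, 7), (6, 6)]
approx = greedy_approximation(skyline_pts, probes, k=2)
print(approx)
```

With this gain, (2, 3) is chosen first because it dominates four probes, and (5, 1) second because it adds the one probe that is still uncovered.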

Related literature

Jagadish et al.: Optimal Histograms with Quality Guarantees, VLDB 1998

Lin et al.: Selecting Stars: The k Most Representative Skyline Operator, ICDE 2007

(17)

6. Experiments

Data sets

Real Data

CarDB: d = 2; N = 50000

NBA: d = 4; N = 17000

Synthetic Data

Uniform distribution: d = 2,…,4; N = 80000

Cluster distribution: d = 2,…,4; N = 80000

Queries

100 reversed skyline queries

Implementation

XXL library (newest version on request)

(18)

RSSA algorithm

Performance as a function of k

(19)

Size of the reversed skyline

in comparison to the size of the global skyline

(20)

Comparison RSSA vs. BBRS

Average number of I/Os (logarithmic scale)

(21)

Comparison RSSA vs. BBRS

Performance as a function of dimensionality

(22)

Conclusions

Reverse Skylines are important for finding interesting points

Dealer perspective:

What kind of items are interesting to my customers?

Two Algorithms

BBRS

Adaptation of the original BBS algorithm

RSSA

Filter-and-refinement paradigm

Preprocessing approximations of skylines

Updates are expensive

Future Work

Accurate Approximation of skylines for d > 2

Bichromatic Reversed Skylines
