Spatial Data Mining for Customer Segmentation

(1)

Data Mining in Practice Seminar, Dortmund, 2003

Dr. Michael May

Fraunhofer Institut Autonome Intelligente Systeme

(2)

Introduction: a classic example for spatial analysis

Dr. John Snow

Deaths in the cholera epidemic

London, September 1854

Infected water pump?

A good representation is the key to solving a problem

Disease cluster

(3)

Good representation because...

Represents spatial relation of objects of the same type

Represents spatial relation of objects to other objects

It is not only important where a cluster is, but also what else is there (e.g. a water pump)!

Shows only relevant aspects and hides irrelevant ones

(4)

Goals of Spatial Data Mining

Identifying spatial patterns

Identifying spatial objects that are potential generators of patterns

Identifying information relevant for explaining the spatial pattern (and hiding irrelevant information)

Presenting the information in a way that is intuitive and supports further analysis

(5)

Approach to Spatial Knowledge Discovery

Data Mining + Geographic Information Systems = SPIN!

(6)

UK, Greater Manchester, Stockport

Geographic layers: Buildings, Rivers, Streets, Hospitals, Medical establishments, Shopping areas

Census attributes: Persons per household, No. of cars, Long-term illness, Age, Profession, Ethnic group, Unemployment, Education, Migrants

(7)

Representation of spatial data in Oracle Spatial

A set of relations $R_1, \dots, R_n$ such that each relation $R_i$ has a geometry attribute $G_i$ or an identifier $A_i$ by which $R_i$ can be linked (joined) to a relation $R_k$ having a geometry attribute $G_k$

– Geometry attributes $G_i$ consist of ordered sets of x,y-pairs defining points, lines, or polygons

– Different types of spatial objects are organized in different relations $R_i$ (geographic layers), e.g. streets, rivers, enumeration districts, buildings

– Each layer can have its own set of attributes $A_1, \dots, A_n$ and at most one geometry attribute $G$
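The layered representation can be illustrated with a small sketch. It is not Oracle Spatial code: the class names, the field `unemployment_rate`, and the zone identifiers are hypothetical, and plain Python lists stand in for database relations. It only shows how an attribute table without geometry reaches a geometry through a shared identifier.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class DistrictLayerRow:
    """Row of a geographic layer: a relation that has a geometry attribute G."""
    zone_id: str
    geometry: List[Point]  # ordered x,y-pairs defining a polygon


@dataclass
class CensusRow:
    """Row of an attribute table: no geometry, only an identifier (hypothetical fields)."""
    zone_id: str
    unemployment_rate: float


def join_on_identifier(census: List[CensusRow], layer: List[DistrictLayerRow]):
    """Link each census row to the geometry of the layer row with the same zone_id."""
    geometry_by_id = {row.zone_id: row.geometry for row in layer}
    return [(c, geometry_by_id[c.zone_id]) for c in census if c.zone_id in geometry_by_id]


# Hypothetical data: one enumeration district polygon and two census rows.
layer = [DistrictLayerRow("ED1", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])]
census = [CensusRow("ED1", 0.12), CensusRow("ED9", 0.30)]
joined = join_on_identifier(census, layer)
```

Census rows whose identifier has no matching layer row (here "ED9") are dropped, as in an inner join.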

(8)

Stockport Database Schema

Attribute data: 95 tables with census data (TAB01, ..., TAB61, ..., TAB95), ~8000 attributes, linked to ED by =zone_id

Geographical layers: 85 tables, e.g. ED, Water, River, Building, Street, Shopping Region, Vegetation, linked by spatial relations (spatially interacts, inside)

Spatial Hierarchy

• County

• District

• Wards

• Enumeration district (ED)

(9)

Spatial Predicates in Oracle Spatial

Topological relation (Egenhofer 1991):

• A disjoint B, B disjoint A

• A meets B, B meets A

• A overlaps B, B overlaps A

• A equals B, B equals A

• A covers B, B covered-by A

• A covered-by B, B covers A

• A contains B, B inside A

• A inside B, B contains A

Distance relation: Minimum distance between 2 points
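To make the eight topological relations concrete, here is a minimal sketch for the special case of axis-aligned rectangles with positive area. This is a simplifying assumption: Oracle Spatial evaluates these predicates on arbitrary geometries, and the names `Box`, `relation` and `min_distance` are illustrative only.

```python
from math import dist
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle with positive area (simplifying assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _within(a: Box, b: Box) -> bool:
    """True if a lies in b, boundaries allowed to touch."""
    return b.xmin <= a.xmin and a.xmax <= b.xmax and b.ymin <= a.ymin and a.ymax <= b.ymax


def _strictly_within(a: Box, b: Box) -> bool:
    """True if a lies in the interior of b, boundaries not touching."""
    return b.xmin < a.xmin and a.xmax < b.xmax and b.ymin < a.ymin and a.ymax < b.ymax


def relation(a: Box, b: Box) -> str:
    """Name the topological relation 'a <relation> b'."""
    if a == b:
        return "equals"
    # No common point at all.
    if a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin:
        return "disjoint"
    interiors_intersect = (a.xmin < b.xmax and b.xmin < a.xmax
                           and a.ymin < b.ymax and b.ymin < a.ymax)
    if not interiors_intersect:
        return "meets"  # only the boundaries share points
    if _within(a, b):
        return "inside" if _strictly_within(a, b) else "covered-by"
    if _within(b, a):
        return "contains" if _strictly_within(b, a) else "covers"
    return "overlaps"


def min_distance(points_a, points_b) -> float:
    """Distance relation: minimum distance between any two points of two point sets."""
    return min(dist(p, q) for p in points_a for q in points_b)
```

For example, `relation(Box(0, 0, 4, 4), Box(0, 0, 2, 2))` gives "covers" because the smaller box shares part of the larger box's boundary, while moving it to `Box(1, 1, 2, 2)` gives "contains".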

(10)

Typical Data Mining representation

Data Mining for spatial data: strong discrepancy between usual and adequate problem representation

‘spreadsheet data’

exactly 1 table

atomic values

(11)

SPIN! – The Elements

(12)

1. Spatial Data Mining Platform

(13)

Providing an integrated data mining platform

• Data access to heterogeneous and distributed data sources (Oracle RDBMS, flat file, spatial data)

• Organizing and documenting analysis tasks

• Launching analysis tasks

• Visualizing results

Note: Same software basis as MiningMart!

(14)

SPIN! Architecture: Enterprise Java Bean-based

Client: Java Swing based Client (Workspace, Algorithm Component, Visual Component)

Enterprise Java Bean Container: JBoss application server (Workspace Entity Bean, Client Entity Bean, Algorithm Session Bean, Persistent object)

Database: Object-relational spatial database (Oracle9i), Data

Links: JDBC (Connections), RMI/IIOP (References)

(15)

SPIN! User Interface

Workspace Tree

Point-and-click tool for defining analysis tasks

Property editor

(16)

2. Visual Exploratory Analysis

(17)

Interactive Exploratory Analysis

Combining spatial and non-spatial displays

Variables selected and manipulated by the user

Powerful for low-dimensional dependencies (3-4)

Displays: Scatter Plot, Parallel Coordinate Plot, Choropleth maps showing distribution of variable(s) in space

Displays dynamically linked

(18)

3. Searching for Explanatory Patterns

(19)

Data Mining Tasks in SPIN!

• Looking for associations between subsets of spatial and non-spatial attributes → Spatial Association Rules

• A phenomenon of interest (e.g. death rate) is given, but it is not clear which of a large number of spatial and non-spatial attributes is relevant for explaining it → Spatial Subgroup Discovery

• A quantitative variable of interest is given and we ask how much this variable changes when one of the relevant independent variables is changed → Bayesian Local Regression
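As a small illustration of the first task, the sketch below computes the usual support and confidence of a single rule linking a spatial attribute to a non-spatial one. It is not the SPIN! association rule algorithm. The attribute names and the eight example districts are made up, and the spatial attribute is assumed to have been derived beforehand (e.g. from a "crossed by" predicate).

```python
def rule_support_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    rows: list of dicts; antecedent, consequent: dicts of attribute -> required value.
    """
    matches_a = [r for r in rows
                 if all(r.get(k) == v for k, v in antecedent.items())]
    matches_both = [r for r in matches_a
                    if all(r.get(k) == v for k, v in consequent.items())]
    support = len(matches_both) / len(rows) if rows else 0.0
    confidence = len(matches_both) / len(matches_a) if matches_a else 0.0
    return support, confidence


# Hypothetical districts: one spatial attribute (crossed_by_m60), one non-spatial (illness).
example_districts = [
    {"crossed_by_m60": "yes", "illness": "high"},
    {"crossed_by_m60": "no", "illness": "high"},
    {"crossed_by_m60": "yes", "illness": "high"},
    {"crossed_by_m60": "yes", "illness": "low"},
    {"crossed_by_m60": "no", "illness": "low"},
    {"crossed_by_m60": "no", "illness": "low"},
    {"crossed_by_m60": "yes", "illness": "high"},
    {"crossed_by_m60": "no", "illness": "low"},
]

support, confidence = rule_support_confidence(
    example_districts, {"crossed_by_m60": "yes"}, {"illness": "high"})
```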

(20)

Subgroup Discovery Search

• Subgroup discovery is a multi-relational approach that searches for probabilistically defined deviation patterns (Klösgen 1996, Wrobel 1997)

• Top-down search from most general to most specific subgroups, exploiting the partial ordering of subgroups ($S_1$ more general than $S_2$)

• Beam search, expanding only the n best ones at each level of search

• Evaluating hypotheses according to a quality function:

$$q = \frac{p(T \mid C) - p(T)}{\sqrt{p(T)\,(1 - p(T))}} \, \sqrt{n} \, \sqrt{\frac{N}{N - n}}$$

T = target group, C = concept

Example: T = (long-term illness = high), C = (unemployment = high)
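The search and its quality function can be sketched in a few lines. The sketch rests on assumptions the slide does not state explicitly: the quality function is read as q = (p(T|C) - p(T)) / sqrt(p(T)(1 - p(T))) · sqrt(n) · sqrt(N / (N - n)), with n taken as the number of objects in the subgroup C and N as the size of the whole population. The search works on a single in-memory table rather than on multiple relations, and the function names and the example districts are hypothetical.

```python
from math import sqrt


def quality(n, n_target, N, N_target):
    """Quality of a subgroup with n objects, n_target of them in the target group,
    in a population of N objects, N_target of them in the target group."""
    if n == 0 or n == N:
        return 0.0  # empty subgroup or whole population: no deviation to report
    p_t = N_target / N
    if p_t in (0.0, 1.0):
        return 0.0  # target share cannot deviate
    p_t_given_c = n_target / n
    return (p_t_given_c - p_t) / sqrt(p_t * (1 - p_t)) * sqrt(n) * sqrt(N / (N - n))


def beam_search(rows, is_target, conditions, beam_width=2, max_depth=2):
    """Top-down search: start from the most general subgroup (no conditions) and
    refine only the beam_width best subgroups of each level by one more condition."""
    N = len(rows)
    N_target = sum(1 for r in rows if is_target(r))
    scores = {}
    beam = [()]  # the empty conjunction describes the whole population
    for _ in range(max_depth):
        level = []
        for concept in beam:
            used = {attr for attr, _ in concept}
            for attr, value in conditions:
                if attr in used:
                    continue
                refined = tuple(sorted(concept + ((attr, value),)))
                if refined in scores:
                    continue  # already reached via another refinement order
                members = [r for r in rows if all(r[a] == v for a, v in refined)]
                hits = sum(1 for r in members if is_target(r))
                scores[refined] = quality(len(members), hits, N, N_target)
                level.append(refined)
        beam = sorted(level, key=scores.get, reverse=True)[:beam_width]
        if not beam:
            break
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical enumeration districts.
districts = [
    {"unemployment": "high", "m60": "yes", "illness": "high"},
    {"unemployment": "high", "m60": "no", "illness": "high"},
    {"unemployment": "high", "m60": "yes", "illness": "high"},
    {"unemployment": "low", "m60": "yes", "illness": "low"},
    {"unemployment": "low", "m60": "no", "illness": "low"},
    {"unemployment": "low", "m60": "no", "illness": "low"},
    {"unemployment": "low", "m60": "yes", "illness": "high"},
    {"unemployment": "low", "m60": "no", "illness": "low"},
]

ranked = beam_search(
    districts,
    is_target=lambda r: r["illness"] == "high",
    conditions=[("unemployment", "high"), ("m60", "yes")],
)
```

On this toy data the subgroup unemployment = high ranks first, since all three of its districts belong to the target group while the population share is one half.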

(21)

Division of labour between Oracle RDBMS and Search Manager

Mining Server (Search Algorithm):

• search in hypothesis space

• generation and evaluation of hypotheses (subgroup patterns)

Database Server:

• Database integration: efficiently organize mining queries

• Mining query delivers statistics (aggregations) sufficient for evaluating many hypotheses

Exchange: the Mining Server sends a mining query, the Database Server returns sufficient statistics
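The idea of a mining query can be sketched with Python's built-in `sqlite3` module standing in for the Oracle database. The table name `district`, its columns and its rows are hypothetical, and the actual SPIN! queries are not shown in the slides. One GROUP BY query returns cell counts, and several hypotheses are then evaluated from those counts without touching the data again.

```python
import sqlite3


def demo_connection():
    """In-memory stand-in for the database server, filled with hypothetical districts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE district (unemployment TEXT, m60 TEXT, illness TEXT)")
    conn.executemany(
        "INSERT INTO district VALUES (?, ?, ?)",
        [("high", "yes", "high"), ("high", "no", "high"), ("high", "yes", "high"),
         ("low", "yes", "low"), ("low", "no", "low"), ("low", "no", "low"),
         ("low", "yes", "high"), ("low", "no", "low")],
    )
    return conn


def sufficient_statistics(conn):
    """One mining query: per combination of attribute values, the number of
    districts (n) and the number of those in the target group (n_target)."""
    query = """
        SELECT unemployment, m60,
               COUNT(*) AS n,
               SUM(CASE WHEN illness = 'high' THEN 1 ELSE 0 END) AS n_target
        FROM district
        GROUP BY unemployment, m60
    """
    return {(u, m): (n, t) for u, m, n, t in conn.execute(query)}


def counts_for(stats, unemployment=None, m60=None):
    """Evaluate one hypothesis from the aggregated cells only (no further query)."""
    n = n_target = 0
    for (u, m), (cell_n, cell_t) in stats.items():
        if unemployment is not None and u != unemployment:
            continue
        if m60 is not None and m != m60:
            continue
        n += cell_n
        n_target += cell_t
    return n, n_target


stats = sufficient_statistics(demo_connection())
```

The counts returned by `counts_for` are what a quality function needs (subgroup size and target hits), so the search algorithm on the mining server can score unemployment = high, m60 = yes and their conjunction from a single round trip.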

(22)

Data Mining visualization

Linked Display: Spatial Venn Diagram, Subgroup Overview (p(T|C) vs. p(C))

Subgroup: High long-term illness in districts crossed by M60

(23)

Customer Analysis Rodgau, Germany

(24)

System Demo: Customer Analysis using MiningMart and SPIN!

(25)

Summary & Outlook

• SPIN! tightly integrates Data Mining analysis and GIS-based visualization

• Main features:

– A spatial data mining platform

– New spatial data mining algorithms for subgroup discovery, association rules, Bayesian MCMC

– New visualization methods

• Integrating spatial data yields results that could not be achieved otherwise

• MiningMart can be usefully applied to some pre-processing tasks

• Future tasks: Integrating spatial preprocessing in MiningMart
