PAC Learning and The VC Dimension

(1)

PAC Learning and

The VC Dimension

(2)

Rectangle Game

Fix a rectangle (unknown to you):

From An Introduction to Computational Learning Theory by Kearns and Vazirani

(3)

Rectangle Game

Draw points from some fixed unknown distribution:

(4)

Rectangle Game

You are told the points and whether they are in or out:

(5)

Rectangle Game

You propose a hypothesis:

(6)

Rectangle Game

Your hypothesis is tested on points drawn from the same distribution:

(7)

Goal

We want an algorithm that, with high probability, will choose a hypothesis that is approximately correct.
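
Written out (this is the standard formalization; err_D(h) denotes the probability that the hypothesis h disagrees with the true rectangle on a fresh point drawn from D):

Pr[ err_D(h) ≤ ε ] ≥ 1 − δ,

where the outer probability is over the m training points drawn i.i.d. from D.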

(8)

Minimum Rectangle Learner:

Choose the minimum-area rectangle h that contains all of the positive points.
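
A minimal sketch of this learner in Python (the function names and the point/label layout are my own illustration, not from the slides):

def min_rectangle_learner(points, labels):
    """Return the smallest axis-aligned rectangle containing all positive points."""
    pos = [p for p, y in zip(points, labels) if y == +1]
    if not pos:
        return None  # no positive points: predict negative everywhere
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    """Classify a point as +1 exactly when it lies inside the learned rectangle."""
    if rect is None:
        return -1
    xmin, xmax, ymin, ymax = rect
    x, y = point
    return +1 if (xmin <= x <= xmax and ymin <= y <= ymax) else -1

points = [(0.2, 0.3), (0.5, 0.8), (2.0, 2.0)]
labels = [+1, +1, -1]
h = min_rectangle_learner(points, labels)
print(h, predict(h, (0.3, 0.5)))  # rectangle around the positives; (0.3, 0.5) -> +1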

(9)

How Good is this?

Derive a PAC bound. For fixed:

R : Rectangle
D : Data Distribution
ε : Test Error
δ : Probability of failing
m : Number of Samples

(10)

Proof:

We want to show that, with high probability, the region between the hypothesis h and the target rectangle R (h lies inside R) has mass at most ε, measured with respect to D.

(11)

Proof (continued):

Split the region between h and R into four strips, one along each side of R; it suffices to show that, with high probability, each strip has mass at most ε/4.
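
To see why ε/4 per strip is enough: the four strips cover the error region, so

P(region between h and R) ≤ P(top strip) + P(bottom strip) + P(left strip) + P(right strip) ≤ 4 · (ε/4) = ε.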

(12)

Proof (continued):

Define T to be the region that contains exactly ε/4 of the mass of D, sweeping down from the top of R. Let T' be the corresponding strip of the error region: the region between the top of R and the top of h.

p(T') > ε/4 = p(T) if and only if T' contains T.

T' contains T if and only if none of our m samples are from T (a sample landing in T is a positive point, so the top of h would then lie inside T and T' would fail to cover T).

So: what is the probability that all m samples miss T?

(13)

Proof (continued):

What is the probability that all m samples miss T? Each sample lands in T with probability ε/4, so all m of them miss it with probability (1 − ε/4)^m.

What is the probability that we miss any one of the four rectangular strips? Apply the union bound.

(14)

Union Bound

For any events A and B: P(A ∪ B) ≤ P(A) + P(B); more generally, the probability of a union of events is at most the sum of their probabilities.

(15)

Proof (continued):

The probability that all m samples miss T is (1 − ε/4)^m.

By the union bound, the probability that any one of the four strips is missed by all m samples is at most 4(1 − ε/4)^m.
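
Tying this back to the goal: the error of h can exceed ε only if some strip of the error region has mass greater than ε/4, which in turn happens only if the corresponding ε/4-region was missed by every sample. So

Pr[ error of h > ε ] ≤ Pr[ some ε/4-region is missed by all m samples ] ≤ 4(1 − ε/4)^m.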

(16)

Proof (continued):

The probability that, after m samples, any of the four strips still has weight greater than ε/4 is at most 4(1 − ε/4)^m.

If we fix m such that 4(1 − ε/4)^m ≤ δ, then with probability 1 − δ we achieve an error rate of at most ε.

(17)

Extra Inequality

Common inequality: (1 − x) ≤ e^(−x).

We can show: 4(1 − ε/4)^m ≤ 4e^(−εm/4), so it suffices to choose m with 4e^(−εm/4) ≤ δ.

Solving obtains a lower bound on the number of samples: m ≥ (4/ε) ln(4/δ).
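
As a quick numeric sanity check in Python (the particular values ε = 0.1 and δ = 0.05 below are just an example, not from the slides):

import math

def rectangle_game_samples(eps, delta):
    """Samples sufficient for the rectangle learner: m >= (4/eps) * ln(4/delta)."""
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

# With eps = 0.1 and delta = 0.05, about 176 samples suffice.
print(rectangle_game_samples(0.1, 0.05))  # 176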

(18)

VC Dimension

Provides a measure of the complexity of a "hypothesis space", or the "power" of a learning machine.

Higher VC dimension implies the ability to represent more complex functions.

The VC dimension is the maximum number of points that can be arranged so that f shatters them.

What does it mean to shatter?

(19)

Define: Shattering

A classifier f can shatter a set of points if and only if, for every assignment of labels to those points, f achieves zero training error.

Example: f(x, b) = sign(x·x − b)

(20)

Example Continued:

What is the VC dimension of the classifier f(x, b) = sign(x·x − b)?
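
A brute-force check in Python (illustrative code, not from the slides). Because the classifier only thresholds the squared norm x·x, shattering reduces to choosing a scalar threshold b: a single point can always be shattered, but for two points at different distances from the origin the labelling "near point +, far point −" is unreachable, so the VC dimension comes out to 1.

import itertools

def can_shatter(points):
    """Check whether f(x, b) = sign(x·x - b) realizes every labelling of the points."""
    norms = [sum(c * c for c in p) for p in points]
    s = sorted(set(norms))
    # Candidate thresholds: below all norms, between consecutive norms, above all norms.
    candidates = [s[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(s, s[1:])] + [s[-1] + 1.0]
    for labels in itertools.product([+1, -1], repeat=len(points)):
        realized = any(all((+1 if n > b else -1) == y for n, y in zip(norms, labels))
                       for b in candidates)
        if not realized:
            return False
    return True

print(can_shatter([(2.0, 0.0)]))              # True: one point can be shattered
print(can_shatter([(1.0, 0.0), (2.0, 0.0)]))  # False: two points cannot, so VC-dim = 1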

(21)

VC Dimension of 2D Half-Space:

Conjecture: 3

Easy proof (lower bound): exhibit 3 points (not all on one line) that can be shattered; see the sketch after the next slide.

(22)

VC Dimension of 2D Half-Space:

Harder proof (upper bound): show that no set of 4 points can be shattered.
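
A brute-force check of both directions in Python (illustrative code, not from the slides; linear separability is tested with an LP feasibility problem): three non-collinear points are shattered by half-spaces, while four points in the familiar XOR configuration are not, consistent with a VC dimension of 3.

import itertools
import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    """LP feasibility: does some (w, b) satisfy y_i * (w·x_i + b) >= 1 for all i?"""
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    d = X.shape[1]
    # Variables are (w_1, ..., w_d, b); each constraint is -y_i*(w·x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(y), 1))])
    b_ub = -np.ones(len(y))
    res = linprog(np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success

def shattered(points):
    """True if sign(w·x + b) realizes every labelling of the given points."""
    return all(separable(points, labels)
               for labels in itertools.product([+1, -1], repeat=len(points)))

print(shattered([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))  # True: VC-dim >= 3
print(shattered([(0, 0), (1, 1), (0, 1), (1, 0)]))      # False: the XOR labelling fails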

(23)

VC-Dim: Axis-Aligned Rectangles

VC Dimension Conjecture: 4

Lower bound: exhibit 4 points that can be shattered; see the sketch below.
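
A brute-force check of the lower bound in Python (illustrative code, not from the slides): four points arranged in a diamond are shattered by axis-aligned rectangles, because for any labelling the minimal rectangle around the positive points excludes every negative point.

import itertools

def rect_realizes(points, labels):
    """Can some axis-aligned rectangle contain exactly the +1 points?"""
    pos = [p for p, y in zip(points, labels) if y == +1]
    neg = [p for p, y in zip(points, labels) if y == -1]
    if not pos:
        return True  # an empty rectangle labels everything -1
    xmin, xmax = min(x for x, _ in pos), max(x for x, _ in pos)
    ymin, ymax = min(y for _, y in pos), max(y for _, y in pos)
    # If any rectangle works, the minimal one around the positives does.
    return all(not (xmin <= x <= xmax and ymin <= y <= ymax) for x, y in neg)

def shattered(points):
    return all(rect_realizes(points, labels)
               for labels in itertools.product([+1, -1], repeat=len(points)))

# A diamond of four points is shattered, so VC-dim >= 4.
print(shattered([(0, 1), (0, -1), (1, 0), (-1, 0)]))  # True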

(24)

VC-Dim: Axis-Aligned Rectangles

VC Dimension Conjecture: 4

Upper bound (more difficult): no set of 5 points can be shattered. For any 5 points, label the topmost, bottommost, leftmost and rightmost ones positive; some remaining point lies inside their bounding rectangle, so it cannot also be labelled negative.

(25)

General Half-Spaces (d dimensions)

What is the VC dimension of f(x; w, b) = sign(w·x + b), with x in R^d?

Proof (lower bound):

Pick point locations {x_1, …, x_n}.

The adversary gives label assignments {y_1, …, y_n}, and you choose w = (w_1, …, w_d) and b; a worked choice of points and weights follows below.
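
A worked instance of the lower bound in Python (the particular point choice, the origin plus the d standard basis vectors, is my illustration; the slides' own construction did not survive extraction). For every labelling the adversary can pick, a separating (w, b) can be written in closed form, so these d + 1 points are shattered and VC-dim >= d + 1.

import itertools
import numpy as np

d = 3  # dimension; the d + 1 points below are the origin and the basis vectors

points = np.vstack([np.zeros(d), np.eye(d)])

for labels in itertools.product([+1, -1], repeat=d + 1):
    y = np.array(labels, dtype=float)
    b = y[0] / 2.0   # handles the origin: sign(b) = y_0
    w = y[1:] - b    # handles e_i: w_i + b = y_i, so sign(w·e_i + b) = y_i
    preds = np.sign(points @ w + b)
    assert np.all(preds == y), (labels, preds)

print("all", 2 ** (d + 1), "labellings realized: VC-dim >=", d + 1)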

(26)

Extra Space:

(27)

General Half-Spaces

Proof (upper bound): VC-Dim = d + 1

Observe that the last d + 1 points can always be expressed as:
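
The equation this line refers to did not survive extraction; the following is an assumption about the intended step, based on the standard argument. Any d + 2 points x_1, …, x_{d+2} in R^d are affinely dependent, so one of them, say x_j, can be written in terms of the others:

x_j = Σ_{i ≠ j} a_i x_i   with   Σ_{i ≠ j} a_i = 1.

Labelling x_i positive when a_i > 0 and negative otherwise, and x_j negative, then gives an assignment no half-space can realize, so no d + 2 points are shattered and VC-Dim ≤ d + 1.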


(29)

Extra Space
