(1)

Computer Vision: a challenge in Artificial Intelligence

Prof. Carsten Rother

Computer Vision Lab Dresden Institute of Artificial Intelligence

Computer Vision – a hard case for AI 11/12/2013

(2)

Roadmap for this lecture

• A few more words on the history of AI and subareas of AI

• An introduction to Computer Vision

• What is it?

• Why is it hard?

• How can we solve it?

• What can we do with it?

• Roadmap for the remaining lecture

(4)

From first lecture

(5)

Going back to 1973

• Sir James Lighthill report to the British Parliament

Full report on Youtube: http://www.youtube.com/watch?v=FLnqHzpLPws&list=PL27303EC6EC90FD5A

“The general purpose robot is a mirage”

(7)

What do we have today … Personal Conclusion

• He is correct … we don’t have the general purpose robot.

• AI Research split into many sub/related areas:

Machine Learning, Computer Vision, … (more later)

• In some areas we are doing a very good job:

• Natural Language Processing (NLP)

• Playing chess

• Some areas turned out to be very hard:

• Robotics

• Computer Vision seems like one of the hardest ones (a few success stories come later)

(8)

Scene understanding … in the 70s

(9)

Scene understanding - today

We are getting there … 40 years later

[Xiao et al. NIPS 2012]

(11)

Today: Topics / Subareas in AI

Applications:

• Natural Language Processing

• Planning

• Computer Vision

• Robotics

• Biology

• Human-Computer Interaction

Theory:

• Logic

• Machine Learning

• Probability Theory

• Decision Theory

• Automated Reasoning

[derived from first lecture]

Models:

• Knowledge representation

• Undirected graphical models

• Directed Graphical models

• Unstructured models

Algorithms:

• Search

• Discrete Optimization

• Continuous Optimization

• Probabilistic Inference

• Learning

• AI overlaps with many disciplines

• There is not one unique, overarching theory

• AI has impact in many domains

(12)

Books for the following lecture

• Artificial Intelligence: A Modern Approach. Russell, Norvig (Third Edition, English)

(we cover (parts of) sections 4, 5, 6)

• Pattern Recognition and Machine Learning. Bishop. Springer 2006.

• Learning from Data: A Short Course. Abu-Mostafa, Magdon-Ismail, Hsuan-Tien Lin. AMLbook.

• Markov Random Fields for Vision and Image Processing. Blake, Kohli, Rother. MIT Press 2011.

(13)

Roadmap for this lecture

• A few more words on history of AI and subareas of AI

• An introduction to Computer Vision

• What is it?

• Why is it hard?

• How can we solve it?

• What can we do with it?

• Roadmap for the remaining lecture

(15)

What is Computer Vision?

11/12/2013 Computer Vision I: Introduction 15

(Potential) Definition:

Developing computational models and algorithms

to interpret digital images and visual data in order

to understand the visual world we live in.

(16)

What does it mean to “understand”?

Physics-based vision:

• Geometry

• Segmentation

• Camera parameters

• Emitted light (sun)

• Surface properties: reflectance, material

Semantic-based vision:

• Objects: class, pose

• Scene: outdoor, …

• Attributes/Properties

(Potential) Definition:

Developing computational models and algorithms to …

(17)

Image-formation model

[Slide Credits: John Winn, ICML 2008]

Image

Very many sources of variability

(19)

Image-formation model

[Slide Credits: John Winn, ICML 2008]

• Scene type: street scene

• Scene geometry

• Object classes: Sky, Building×3, Road, Sidewalk, Tree×3, Person×4, Bicycle, Car×5, Bench, Bollard

(27)

Image-formation model

[Slide Credits: John Winn, ICML 2008]

Scene type

Scene geometry

Object classes

Object position

Object orientation

Object shape

Depth/occlusions

Object appearance

Illumination

Shadows

Motion blur

Camera effects

(28)

The “Scene Parsing” challenge: a “grand challenge” of computer vision

(Probabilistic) Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

Single image

(29)

Why is “scene parsing” hard?

Computer Vision can be seen as “inverse graphics”

Computer Graphics: 3D rich representation → 2D pixel representation

Computer Vision: 2D pixel representation → 3D rich representation

3D rich representation: Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

(30)

Example of a recent work

Input

Scene graph

(31)

Example: General Object recognition & segmentation

[TextonBoost; Shotton et al, ‘06]

Good results …

(32)

Example: General Object recognition & segmentation

Failure cases…

(33)

Comparison: CV to NLP

Computer Vision (Scene Understanding)

• Amount of input data: 10 Mpixel/second for a robot

• Images are 2D (much harder inference!)

• Rules/models are hard to define since images are so varied (see next lecture)

• Scene understanding is far from being solved: the best method has a 47% chance of being correct for 20 object classes

Natural Language Processing

• Amount of input data: audiobooks have 2.2 words per second, i.e. ~20 letters per second

• Sound is 1D

• Strong rules exist (e.g. context-free grammars)

• Real-time speech translation exists, more or less

“Real-time Speech translation”
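
To put the two input rates from this comparison side by side, here is a rough Python sketch that uses only the figures quoted on the slide. Pixels and letters are not equivalent units of information, so the ratio is an order-of-magnitude illustration at best; the variable names are illustrative.

```python
# Figures quoted on the slide.
PIXELS_PER_SECOND = 10_000_000   # "10 Mpixel/second for a robot"
WORDS_PER_SECOND = 2.2           # audiobook speaking rate
LETTERS_PER_SECOND = 20          # the slide's rounded figure

# Word length implied by the slide's two audiobook numbers.
letters_per_word = LETTERS_PER_SECOND / WORDS_PER_SECOND

# How many pixels arrive in the time it takes to hear one letter.
pixels_per_letter = PIXELS_PER_SECOND / LETTERS_PER_SECOND

print(f"implied letters per word: {letters_per_word:.1f}")
print(f"pixels per spoken letter: {pixels_per_letter:,.0f}")
```

Under these numbers a vision system receives several hundred thousand pixels for every letter a speech system receives.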

(36)

Visual Data is everywhere

• Visual Data is dense, structured data

• Real world:

• RGB photo/video cameras

• Mobile phones

• Depth cameras

• Laser scanners

• Robotics

• Medicine

• Microscopy

• Surveillance

(37)

How can we interpret visual data?

• What general (prior) knowledge of the world (not necessarily visual) can be exploited?

• What properties / cues from the image can be used?

Both aspects are quite well understood (a lot is based on physics) … but how to use them efficiently is an open challenge (see later)

Computer Graphics: 3D rich representation → 2D pixel representation

Computer Vision: 2D pixel representation → 3D rich representation

3D rich representation: Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

(39)

Prior knowledge (examples)

• “Hard” prior knowledge

• Trains do not fly in the air

• Objects are connected in 3D

• “Soft” prior knowledge:

• The camera is more likely to be 1.70 m above the ground than 0.1 m.

• Self-similarity: “all black pixels belong to the same object”

(40)

Prior knowledge – harder to describe

• Describe Image Texture

• Microscopic images: what is the true shape of these objects?

Not a real Image zoom

Real Image zoom

(41)

The importance of Prior knowledge

[Edward Adelson]

Which patch is brighter: A or B?

(43)

The importance of Prior knowledge

[Figure: the 2D image (local view) of patches A and B, next to two 3D interpretations lit by direct and ambient light, each showing the true colours of A and B in the 3D world]

• 2D image (local): what the computer sees

• The most likely 3D representation: this is what humans see implicitly. Ideally the computer sees the same.

• An unlikely 3D representation (hard to see for a human)
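
A small numerical sketch can show why one 2D observation admits several 3D explanations. It assumes a deliberately simplified image-formation rule, pixel value = reflectance × illumination, which is not stated on the slide; all numbers and names are hypothetical.

```python
def observed_intensity(reflectance, illumination):
    """Assumed rule: pixel value = surface reflectance x incoming light."""
    return reflectance * illumination


# Hypothetical numbers: patch A is a dark tile in direct light,
# patch B is a lighter tile lying in shadow.
pixel_a = observed_intensity(reflectance=0.3, illumination=1.0)
pixel_b = observed_intensity(reflectance=0.6, illumination=0.5)
print(pixel_a, pixel_b)  # the two pixel values coincide

# Going backwards is ambiguous: several (reflectance, illumination)
# pairs on a coarse grid explain the same pixel value.
explanations = [
    (r / 10, i / 10)
    for r in range(1, 11)
    for i in range(1, 11)
    if abs(observed_intensity(r / 10, i / 10) - pixel_a) < 1e-9
]
print(explanations)
```

Prior knowledge (for example, that shadows darken a surface without changing its material) is what lets a viewer prefer one of these explanations over the others.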

(44)

The importance of Prior knowledge

[Figure: a 2D image next to its 3D representation with a light source]

Humans do not see an image as a set of 2D pixels. They understand an image as a projection of the 3D world we live in.

Humans have prior knowledge about the world encoded, such as:

• Light casts shadows

(45)

Male or Female?

(46)

How can we interpret visual data?

• What general (prior) knowledge of the world (not necessarily visual) can be exploited?

• What properties / cues from the image can be used?

Computer Graphics: 3D rich representation → 2D pixel representation

Computer Vision: 2D pixel representation → 3D rich representation

3D rich representation: Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

(47)

Cue: Appearance (Colour, Texture) for object recognition

To which object does the patch belong?

(48)

Cue: Outlines (shape) for object recognition

(49)

Guess the Object

• Colour

• Texture

• Shape

[from John Winn, ICML 2008]

(50)

Cue: Context for object recognition

(52)

Cue: stereo vision (2 frames) for geometry estimation

(53)

Cue: Multiple Frames for geometry estimation

(54)

Cue: Shading & shadows for geometry and Light estimation

(55)

Texture gradient for geometry estimation

(56)

The “Scene Parsing” challenge: a “grand challenge” of computer vision

(Probabilistic) Script = {Camera, Light, Geometry, Material, Objects, Scene, Attributes, Others}

Many applications do not have to extract the full probabilistic script but only a subset, e.g. “does the image contain a car?”

Single image

(57)

… many application scenarios are in reach

To simplify the problem:

1) Richer input:

- Modern sensing technology

- Moving images

- User involvement

2) Rich data to learn from:

- Use the web

- Crowdsourcing to get labels (online games, Mechanical Turk)

- Powerful graphics engines

(58)

Real-time pedestrian detection

(59)

Animate the world

[Chen et al. UIST ‘12]

(60)

Example: Xbox people tracking

(61)

Example: people tracking (test data)

(62)

Body tracking and gesture recognition have many applications

StartUp 2012: Try Fashion online

(63)

Start-Up Company: Like.com

(65)

Example: Image Segmentation

11/12/2013 Introducing the Computer Vision Lab Dresden 68

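Image with User Input

Output: $\boldsymbol{y} \in \{0,1\}^n$, where typically $n$ is large ($n \geq 1\text{M}$)

Undirected graphical model: each node $y_i$ has a unary term $\theta_i(y_i)$, and each pair of connected nodes $y_i$, $y_j$ has a pairwise term $\theta_{ij}(y_i, y_j)$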

(66)

Example: Image Segmentation

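Image with User Input

Modelling: how to formulate the graphical model, e.g. $P(\boldsymbol{y} \mid \boldsymbol{\theta})$

Inference/Optimization: $\boldsymbol{y}^* = \operatorname{argmax}_{\boldsymbol{y}} P(\boldsymbol{y} \mid \boldsymbol{\theta})$, with $\boldsymbol{y} \in \{0,1\}^n$ (typically $n$ is large, $n \geq 1\text{M}$)

(this is one of many tasks)

Graphical model: unary terms $\theta_i(y_i)$ on each node $y_i$, pairwise terms $\theta_{ij}(y_i, y_j)$ between connected nodes $y_i$ and $y_j$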
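
A minimal Python sketch of such a model on a toy 1D "image" may help. It rests on several assumptions not made on the slide: the $\theta$ terms are treated as costs, $P(\boldsymbol{y} \mid \boldsymbol{\theta})$ is taken to be proportional to $\exp(-E(\boldsymbol{y}))$ so that the argmax becomes a minimum of the total cost $E$, and the pixel values, the smoothness weight and all function names are hypothetical.

```python
import itertools

# Hypothetical 1D "image": six pixel intensities in [0, 1].
pixels = [0.1, 0.2, 0.6, 0.8, 0.9, 0.4]
SMOOTHNESS = 0.3  # assumed cost when neighbouring pixels get different labels


def unary(i, label):
    """theta_i(y_i): background (0) expects dark pixels, foreground (1) bright ones."""
    return abs(pixels[i] - label)


def pairwise(label_i, label_j):
    """theta_ij(y_i, y_j): Potts-style penalty when neighbours disagree."""
    return 0.0 if label_i == label_j else SMOOTHNESS


def energy(y):
    """Total cost E(y); P(y | theta) is assumed proportional to exp(-E(y))."""
    data_term = sum(unary(i, label) for i, label in enumerate(y))
    smooth_term = sum(pairwise(y[i], y[i + 1]) for i in range(len(y) - 1))
    return data_term + smooth_term


# Exhaustive inference: try all 2^n labelings (only feasible for tiny n).
labelings = list(itertools.product((0, 1), repeat=len(pixels)))
y_map = min(labelings, key=energy)

# Baseline that ignores the pairwise terms: label each pixel on its own.
y_independent = tuple(
    min((0, 1), key=lambda label: unary(i, label)) for i in range(len(pixels))
)

print("per-pixel decision:", y_independent)
print("MAP labeling:      ", y_map)
```

In this toy case the pairwise term changes the label of the ambiguous last pixel so that it agrees with its neighbour. Exhaustive search visits $2^n$ labelings, which is why it cannot be used when $n \geq 1\text{M}$.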

(67)

What is Learning?

Probabilistic model: $P(\boldsymbol{y} \mid \boldsymbol{\theta})$

Training: given images and ground truth, and an error function that says how we compare results, find the weights $\boldsymbol{\theta}$ (can be up to 10M parameters)

Testing: inference by maximum probability: $\boldsymbol{y}^* = \operatorname{argmax}_{\boldsymbol{y}} P(\boldsymbol{y} \mid \boldsymbol{\theta})$

(68)

Model versus Inference (Algorithm)

[Data courtesy of Oliver Woodford]

Input:

Image sequence

Output: New view

(69)

Another Example: Model versus Algorithm

Why is the result not perfect? Model or inference?

Results shown:

• Ground truth

• Belief Propagation

• ICM, Simulated Annealing

• Graph Cut with truncation [Rother et al. ‘05]

• QPBOP [Boros et al. ’06; Rother et al. ‘07]

The result panels carry the labels “(approximate solution)” three times and “(exact solution)” once.
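
The distinction between a model and the algorithm that optimizes it can be tried out on a toy problem. The Python sketch below is only an illustration under stated assumptions: the chain-structured cost function, the pixel values and the smoothness weight are hypothetical, exhaustive search stands in for an exact solver, and ICM is implemented in its simplest greedy form.

```python
import itertools

pixels = [0.1, 0.2, 0.6, 0.8, 0.9, 0.4]  # hypothetical intensities in [0, 1]
SMOOTHNESS = 0.5  # assumed penalty for neighbouring pixels with different labels


def energy(y):
    """Cost of a labeling: per-pixel data term plus a penalty per label change."""
    data_term = sum(abs(p - label) for p, label in zip(pixels, y))
    smooth_term = sum(SMOOTHNESS for a, b in zip(y, y[1:]) if a != b)
    return data_term + smooth_term


def exact_minimum():
    """Exact inference by exhaustive search over all 2^n labelings."""
    return min(itertools.product((0, 1), repeat=len(pixels)), key=energy)


def icm(start, max_sweeps=10):
    """Iterated conditional modes: greedily re-label one pixel at a time."""
    y = list(start)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            # Flip pixel i only if that strictly lowers the total cost.
            flipped = y[:i] + [1 - y[i]] + y[i + 1:]
            if energy(flipped) < energy(y):
                y = flipped
                changed = True
        if not changed:
            break
    return tuple(y)


y_exact = exact_minimum()
y_icm = icm(start=(0,) * len(pixels))
print("exact:", y_exact, round(energy(y_exact), 2))
print("ICM:  ", y_icm, round(energy(y_icm), 2))
```

Both runs use the same model, yet ICM started from the all-zero labeling stays in a local minimum with a higher cost than the exact solution. An imperfect result can therefore come from the inference algorithm even when the model is unchanged.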

(70)

Summary: The key questions for the upcoming lectures

• What is the modelling language:

undirected / directed graphical models; unstructured models

• What does the model look like:

• What is the structure?

• What do the functions look like?

• Can we learn the model from data:

• Learn structure

• Learn potential functions

• Probabilistic Learning / Discriminative Learning

• How do we optimize the model (perform inference):

(71)

Is Machine Learning feasible?

• We are looking at a mapping:

$X = \{0,1\}^3 \rightarrow Y = \{0,1\}$

• We are given 5 training data instances:

[example from book: Learning from Data; Abu-Mostafa et al.]

(73)

Is Machine Learning feasible?

• Let us look at all possible functions: $f(x_1, x_2, x_3) = y$

• We have in total $2^{2^3} = 256$ possible functions

• With the training data fixed, 8 functions remain, since each of the 3 unseen inputs can still map to 0 or 1 ($2^3 = 8$)

• Without any information about $f$, any solution for $f$ is equally good!

• We need information about $f$

[example from book: Learning from Data; Abu-Mostafa et al.]
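
The counting argument can be checked by brute force. In the Python sketch below the five training labels are an assumption chosen for illustration, because the extracted text does not reproduce the slide's table; the counts 256 and 8 do not depend on which labels are used.

```python
import itertools

# The 8 points of {0,1}^3 and all 2^8 = 256 truth tables over them.
inputs = list(itertools.product((0, 1), repeat=3))
all_functions = list(itertools.product((0, 1), repeat=len(inputs)))

# Hypothetical training set of 5 labelled inputs (assumed, not from the slide text).
training = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 0,
    (1, 0, 0): 1,
}

# Keep only the functions that agree with every training example.
consistent = [
    f for f in all_functions
    if all(f[inputs.index(x)] == y for x, y in training.items())
]

print(len(all_functions), len(consistent))

# On each unseen input the consistent functions split evenly between 0 and 1.
unseen = [x for x in inputs if x not in training]
votes_for_one = {x: sum(f[inputs.index(x)] for f in consistent) for x in unseen}
print(votes_for_one)
```

The even split on every unseen input is the formal version of the statement that, without further information about $f$, the training data alone gives no reason to prefer one prediction over the other.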

(74)

Is Machine Learning feasible?

Assume $f$ is “smooth” in 3D space $(x_1, x_2, x_3)$, i.e. few “0-1” transitions in Manhattan space (neighbourhood drawn by lines)

[Figure: cubes with axes $x_1$, $x_2$, $x_3$ showing candidate functions; the optimal one has 6 transitions]
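
The smoothness assumption can be turned into a selection rule: among all functions that fit the training data, pick the one with the fewest label changes along the 12 edges of the cube. The Python sketch below does this for an assumed training set (the extracted text does not reproduce the slide's table, so the labels are hypothetical); the function and variable names are illustrative.

```python
import itertools

inputs = list(itertools.product((0, 1), repeat=3))

# Cube edges: pairs of inputs that differ in exactly one coordinate.
edges = [
    (a, b) for a, b in itertools.combinations(inputs, 2)
    if sum(u != v for u, v in zip(a, b)) == 1
]

# Hypothetical training set of 5 labelled inputs (assumed, not from the slide text).
training = {
    (0, 0, 0): 0,
    (0, 0, 1): 1,
    (0, 1, 0): 1,
    (0, 1, 1): 0,
    (1, 0, 0): 1,
}
unseen = [x for x in inputs if x not in training]


def transitions(f):
    """Number of cube edges whose two endpoints get different labels."""
    return sum(f[a] != f[b] for a, b in edges)


# Enumerate every way to label the unseen inputs and keep the smoothest one.
candidates = []
for labels in itertools.product((0, 1), repeat=len(unseen)):
    f = dict(training)
    f.update(zip(unseen, labels))
    candidates.append(f)

best = min(candidates, key=transitions)
print({x: best[x] for x in unseen}, transitions(best))
```

With this assumed training set the smoothest consistent function is unique, so the prior alone settles the labels of all three unseen inputs.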

(75)

Roadmap for this lecture

• A few more words on history of AI and subareas of AI

• An introduction to Computer Vision

• What is it?

• Why is it hard?

• How can we solve it?

• What can we do with it?

• Roadmap for the remaining lecture

(76)

Roadmap for next lectures

• 11.12 (1): Computer Vision – a hard case for AI

• 11.12 (2): Introduction to probability theory

• 18.12 (1): Exercise: probability theory

• 18.12 (2): Unstructured models: Decision theory

• 8.1 (1): Unstructured models: Probabilistic Learning

• 8.1 (2): Unstructured models: Discriminative Learning Intro

• 15.1 (1): Exercise: Learning

• 15.1 (2): Unstructured models: Discriminative Learning

(77)

Roadmap for next lectures

• 22.1 (1): Undirected Graphical models: Models and Inference

• 22.1 (2): Undirected Graphical models: Models and Inference

• 29.1 (1): Exercise: Learning

• 29.1 (2): Undirected Graphical models: Learning

• 5.2 (1): Directed Graphical models

• 5.2 (2): Wrap up; Putting theory to practice

Lecturers: Carsten Rother and Dimitri Schlesinger

(78)

Related Lectures in Master / Bachelor / Diploma

• Computer Vision 1: Algorithms and Applications (winter term; 2+2)

• Machine Learning (winter term; 2+2)

• Computer Vision 2: Models, Inference, and Learning (summer term; 4+2)

• Many seminars and practical sessions

• Topics for Bachelor, Master, Diploma Thesis
