Machine Learning Techniques for Multimedia

Case Studies on Organization and Retrieval

Edited by

Matthieu Cord, Pádraig Cunningham

1st edition 2008. Book. xvi, 289 pp. Hardcover. ISBN 978 3 540 75170 0

Format (W x L): 15.5 x 23.5 cm. Weight: 625 g

Part I Introduction to Learning Principles for Multimedia Data

1 Introduction to Bayesian Methods and Decision Theory . . . . 3

Simon P. Wilson, Rozenn Dahyot, and Pádraig Cunningham

1.1 Introduction . . . 3

1.2 Uncertainty and Probability . . . 4

1.2.1 Quantifying Uncertainty . . . 4

1.2.2 The Laws of Probability . . . 5

1.2.3 Interpreting Probability . . . 6

1.2.4 The Partition Law and Bayes’ Law . . . 7

1.3 Probability Models, Parameters and Likelihoods . . . 8

1.4 Bayesian Statistical Learning . . . 9

1.5 Implementing Bayesian Statistical Learning Methods . . . 10

1.5.1 Direct Simulation Methods . . . 11

1.5.2 Markov Chain Monte Carlo . . . 12

1.5.3 Monte Carlo Integration . . . 13

1.5.4 Optimization Methods . . . 14

1.6 Decision Theory . . . 15

1.6.1 Utility and Choosing the Optimal Decision . . . 16

1.6.2 Where Is the Utility? . . . 17

1.7 Naive Bayes . . . 17

1.8 Further Reading . . . 18

References . . . 19

2 Supervised Learning . . . . 21

Pádraig Cunningham, Matthieu Cord, and Sarah Jane Delany

2.1 Introduction . . . 21

2.2 Introduction to Statistical Learning . . . 22

2.2.1 Risk Minimization . . . 22

2.2.2 Empirical Risk Minimization . . . 23

2.2.3 Risk Bounds . . . 24

2.3 Support Vector Machines and Kernels . . . 26

2.3.1 Linear Classification: SVM Principle . . . 26

2.3.2 Soft Margin . . . 27

2.3.3 Kernel-Based Classification . . . 28

2.4 Nearest Neighbour Classification . . . 29

2.4.1 Similarity and Distance Metrics . . . 31

2.4.2 Other Distance Metrics for Multimedia Data . . . 32

2.4.3 Computational Complexity . . . 35

2.4.4 Instance Selection and Noise Reduction . . . 36

2.4.5 k-NN: Advantages and Disadvantages . . . 39

2.5 Ensemble Techniques . . . 40

2.5.1 Introduction . . . 40

2.5.2 Bias–Variance Analysis of Error . . . 41

2.5.3 Bagging . . . 41

2.5.4 Random Forests . . . 44

2.5.5 Boosting . . . 45

2.6 Summary . . . 46

References . . . 47

3 Unsupervised Learning and Clustering . . . . 51

Derek Greene, Pádraig Cunningham, and Rudolf Mayer

3.1 Introduction . . . 51

3.2 Basic Clustering Techniques . . . 52

3.2.1 k-Means Clustering . . . 52

3.2.2 Fuzzy Clustering . . . 53

3.2.3 Hierarchical Clustering . . . 54

3.3 Modern Clustering Techniques . . . 58

3.3.1 Kernel Clustering . . . 58

3.3.2 Spectral Clustering . . . 60

3.4 Self-organizing Maps . . . 65

3.4.1 SOM Architecture . . . 66

3.4.2 SOM Algorithm . . . 66

3.4.3 Self-organizing Map and Clustering . . . 69

3.4.4 Variations of the Self-organizing Map . . . 70

3.5 Cluster Validation . . . 73

3.5.1 Internal Validation . . . 75

3.5.2 External Validation . . . 79

3.5.3 Stability-Based Techniques . . . 84

3.6 Summary . . . 87

References . . . 87

4 Dimension Reduction . . . . 91

Pádraig Cunningham

4.1 Introduction . . . 91

4.2 Feature Transformation . . . 93

4.2.1 Principal Component Analysis . . . 94

4.2.2 Linear Discriminant Analysis . . . 97

4.3 Feature Selection . . . 99

4.3.1 Feature Selection in Supervised Learning . . . 99

4.3.2 Unsupervised Feature Selection . . . 104

4.4 Conclusions . . . 110

References . . . 110

Part II Multimedia Applications

5 Online Content-Based Image Retrieval Using Active Learning . . . . 115

Matthieu Cord and Philippe-Henri Gosselin

5.1 Introduction . . . 115

5.2 Database Representation: Features and Similarity . . . 117

5.2.1 Visual Features . . . 117

5.2.2 Signature Based on Visual Pattern Dictionary . . . 117

5.2.3 Similarity . . . 118

5.2.4 Kernel Framework . . . 119

5.2.5 Experiments . . . 120

5.3 Classification Framework for Image Collection . . . 121

5.3.1 Classification Methods for CBIR . . . 122

5.3.2 Query Updating Scheme . . . 123

5.4 Active Learning for CBIR . . . 124

5.4.1 Notations for Selective Sampling Optimization . . . 125

5.4.2 Active Learning Methods . . . 125

5.5 Further Insights on Active Learning for CBIR . . . 127

5.5.1 Active Boundary Correction . . . 128

5.5.2 MAP vs Classification Error . . . 130

5.5.3 Batch Selection . . . 130

5.6 CBIR Interface: Result Display and Interaction . . . 132

6 Conservative Learning for Object Detectors . . . . 139

Peter M. Roth and Horst Bischof

6.1 Introduction . . . 140

6.2 Online Conservative Learning . . . 143

6.2.1 Motion Detection . . . 143

6.2.2 Reconstructive Model . . . 144

6.2.3 Online AdaBoost for Feature Selection . . . 146

6.2.4 Conservative Update Rules . . . 148

6.3 Experimental Results . . . 149

6.3.1 Description of Experiments . . . 149

6.3.2 CoffeeCam . . . 151

6.3.3 Switch to Caviar . . . 153

6.3.4 Further Detection Results . . . 156

6.4 Summary and Conclusions . . . 156

7 Machine Learning Techniques for Face Analysis . . . . 159

Roberto Valenti, Nicu Sebe, Theo Gevers, and Ira Cohen

7.1 Introduction . . . 160

7.2 Background . . . 160

7.2.1 Face Detection . . . 160

7.2.2 Facial Feature Detection . . . 161

7.2.3 Emotion Recognition Research . . . 162

7.3 Learning Classifiers for Human–Computer Interaction . . . 163

7.3.1 Model Is Correct . . . 165

7.3.2 Model Is Incorrect . . . 166

7.3.3 Discussion . . . 167

7.4 Learning the Structure of Bayesian Network Classifiers . . . 168

7.4.1 Bayesian Networks . . . 168

7.4.2 Switching Between Simple Models . . . 169

7.4.3 Beyond Simple Models . . . 169

7.4.4 Classification-Driven Stochastic Structure Search . . . 170

7.4.5 Should Unlabeled Be Weighed Differently? . . . 171

7.4.6 Active Learning . . . 172

7.4.7 Summary . . . 173

7.5 Experiments . . . 173

7.5.1 Face Detection Experiments . . . 174

7.5.2 Facial Feature Detection . . . 178

7.5.3 Facial Expression Recognition Experiments . . . 183

7.6 Conclusion . . . 184

8 Mental Search in Image Databases: Implicit Versus Explicit Content Query . . . . 189

Simon P. Wilson, Julien Fauqueur, and Nozha Boujemaa

8.1 Introduction . . . 189

8.2 “Mental Image Search” Versus Other Search Paradigms . . . 190

8.3 Implicit Content Query: Mental Image Search Using Bayesian Inference . . . 191

8.3.1 Bayesian Inference for CBIR . . . 191

8.3.2 Mental Image Category Search . . . 193

8.3.3 Evaluation . . . 195

8.3.4 Remarks . . . 196

8.4 Explicit Content Query: Mental Image Search by Visual Composition Formulation . . . 197

8.4.1 System Summary . . . 198

8.4.2 Visual Thesaurus Construction . . . 198

8.4.3 Symbolic Indexing, Boolean Search and Range Query Mechanism . . . 199

8.4.4 Results . . . 201

8.4.5 Summary . . . 203

9 Combining Textual and Visual Information for Semantic Labeling of Images and Videos . . . . 205

Pınar Duygulu, Muhammet Baştan, and Derya Ozkan

9.1 Introduction . . . 206

9.2 Semantic Labeling of Images . . . 207

9.3 Translation Approach . . . 210

9.3.1 Learning Correspondences Between Words and Regions . . . 211

9.3.2 Linking Visual Elements to Words in News Videos . . . 212

9.3.3 Translation Approach to Solve Video Association Problem . . . 213

9.3.4 Experiments on News Videos Data Set . . . 214

9.4 Naming Faces in News . . . 218

9.4.1 Integrating Names and Faces . . . 218

9.4.2 Finding Similarity of Faces . . . 219

9.4.3 Finding the Densest Component in the Similarity Graph . . . 220

9.4.4 Experiments . . . 221

9.5 Conclusion and Discussion . . . 223

10 Machine Learning for Semi-structured Multimedia Documents: Application to Pornographic Filtering and Thematic Categorization . . . . 227

Ludovic Denoyer and Patrick Gallinari

10.1 Introduction . . . 227

10.2 Previous Work . . . 229

10.2.1 Structured Document Classification . . . 230

10.2.2 Multimedia Documents . . . 231

10.3 Multimedia Generative Model . . . 231

10.3.1 Classification of Documents . . . 231

10.3.2 Generative Model . . . 232

10.3.3 Description . . . 232

10.4 Learning the Meta Model . . . 238

10.4.1 Maximization of L_structure . . . 238

10.4.2 Maximization of L_content . . . 239

10.5 Local Generative Models for Text and Image . . . 239

10.5.1 Modelling a Piece of Text with Naive Bayes . . . 240

10.5.2 Image Model . . . 240

10.6 Experiments . . . 241

10.6.1 Models and Evaluation . . . 241

10.6.2 Corpora . . . 242

10.6.3 Results over the Pornographic Corpus . . . 243

10.6.4 Results over the Wikipedia Multimedia Categorization Corpus . . . 244

10.7 Conclusion . . . 246

11 Classification and Clustering of Music for Novel Music Access Applications . . . . 249

Thomas Lidy and Andreas Rauber

11.1 Introduction . . . 250

11.2 Feature Extraction from Audio . . . 251

11.2.1 Low-Level Audio Features . . . 251

11.2.2 MPEG-7 Audio Descriptors . . . 252

11.2.3 MFCCs . . . 255

11.2.4 MARSYAS Features . . . 256

11.2.5 Rhythm Patterns . . . 258

11.2.6 Statistical Spectrum Descriptors . . . 259

11.2.7 Rhythm Histograms . . . 260

11.3 Automatic Classification of Music into Genres . . . 262

11.3.1 Evaluation Through Music Classification . . . 263

11.3.2 Benchmark Data Sets for Music Classification . . . 264

11.4 Creating and Visualizing Music Maps Based on Self-organizing Maps . . . 267

11.4.1 Class Visualization . . . 268

11.4.2 Hit Histograms . . . 269

11.4.3 U-Matrix . . . 270

11.4.4 P-Matrix . . . 271

11.4.5 U*-Matrix . . . 272

11.4.6 Gradient Fields . . . 272

11.4.7 Component Planes . . . 273

11.4.8 Smoothed Data Histograms . . . 274

11.5 PlaySOM – Interaction with Music Maps . . . 276

11.5.1 Interface . . . 276

11.5.2 Interaction . . . 277

11.5.3 Playlist Creation . . . 278

11.6 PocketSOMPlayer – Music Retrieval on Mobile Devices . . . 280

11.6.1 Interaction . . . 281

11.6.2 Playing Scenarios . . . 282

11.6.3 Conclusion . . . 282

Index . . . 287