Speech recognition yesterday … and tomorrow


- 1 - Digital Signal Processing and Pattern Recognition

Speech recognition yesterday

… and tomorrow


- 2 - Digital Signal Processing and Pattern Recognition

Objective

Classification of signals, pattern recognition
e.g. speech, gestures, handwritten text, ECG, traffic situations, …

Processing chain (block diagram): sensor signal → preprocessing (computation of feature vectors) → classification (comparison with reference patterns) → recognized class (Word 1, Word 2, …, Word n)


- 3 - Digital Signal Processing and Pattern Recognition

Contents

Distance Measure for Vector Sequences with Different Lengths
Dynamic programming, dynamic time warping

Computation of a Reference Pattern from Several Examples
Mean value, variance, Mahalanobis distance

Random Variables
Modeling of uncertainty, normal distribution

Hidden Markov Models, Learning from Examples
Viterbi training, expectation maximization (EM), k-means algorithm, mixture distributions

Statistical Dependency, Covariance
Reduction of classification errors, n-dimensional normal distribution


- 4 - Digital Signal Processing and Pattern Recognition

Contents (optional)

Maximum Likelihood Estimation
Theoretical foundation of statistical learning algorithms

Decorrelation, Principal Component Analysis
Simplification of the classification problem through preprocessing


- 5 - Digital Signal Processing and Pattern Recognition

Course Material

Ilias or http://mitarbeiter.hs-heilbronn.de/~vstahl

Slides used in the lecture
Exercises
Data for exercises
Literature hints


- 6 - Digital Signal Processing and Pattern Recognition

Grading

Programming project in parallel with the lectures
Development of a reliable classifier; successive improvement, homework
Programming language: C, Java, Python or Matlab
Teamwork (max. 3 students)
Deadline: first lecture after Christmas


- 7 - Digital Signal Processing and Pattern Recognition

Literature

Elementare Einführung in die Wahrscheinlichkeitsrechnung
Karl Bosch (E-Book, 519.Bosch)

Elementare Einführung in die angewandte Statistik
Karl Bosch (519.Bosch)

Mustererkennung mit Markov-Modellen – Theorie, Praxis, Anwendungsgebiete

Gernot A. Fink (621.841 Fin)

Pattern Classification and Scene Analysis

Richard O. Duda, Peter E. Hart (519.Dud)

Mathematical Methods and Algorithms for Signal Processing

Todd K. Moon, Wynn C. Stirling

The Elements of Statistical Learning

Trevor Hastie, Robert Tibshirani, Jerome Friedman (519.Has)


- 8 - Digital Signal Processing and Pattern Recognition

1957: "Sputnik shock"

Foundation of ARPA in 1958 (Advanced Research Projects Agency)
under the US Department of Defense

DARPA (1972-1993), ARPA (1993-1996), DARPA (1996-…); current budget approx. $3.5 billion


- 9 - Digital Signal Processing and Pattern Recognition

1969: ARPANET

Further projects: stealth technology, GPS, …

Grand Challenge 2004, Mojave Desert, Nevada, 241 km
Grand Challenge 2005
Urban Challenge 2007


- 13 - Digital Signal Processing and Pattern Recognition

September 2014: Internationale Automobilausstellung, Frankfurt. Daimler presents an autonomously driving truck.

Start of production (SOP) planned for 2025.


- 14 - Digital Signal Processing and Pattern Recognition

“Google’s Next Phase in Driverless Cars: No Steering Wheel or Brake Pedals”

www.nytimes.com May 27, 2014


- 15 - Digital Signal Processing and Pattern Recognition

"Keine Folgen für Tesla nach tödlichem Unfall"
("No consequences for Tesla after fatal accident")

FAZ, 20.1.2017


- 16 - Digital Signal Processing and Pattern Recognition

Daimler AG 2017 prototype


- 17 - Digital Signal Processing and Pattern Recognition

Pattern Recognition

Example: character recognition
(Figure: image of the letter "A")


- 18 - Digital Signal Processing and Pattern Recognition

Recognition means Classification

Given:
Signal (or features derived from it): pattern
Finite number of classes

Wanted:
The class to which the pattern "belongs"

Examples:
Character recognition
Visual quality control (ok, reject)
Speech recognition (phonemes, words)
Medical diagnostics (X-ray images, ECG, …)


- 19 - Digital Signal Processing and Pattern Recognition

Classification means Comparison with Reference Patterns

Given:
Pattern to be classified (test pattern)
A reference pattern for each class

Wanted:
The class whose reference pattern is most "similar" to the given pattern

Problem: a "suitable" similarity/distance measure between patterns?
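As a minimal sketch of this decision rule (the function and variable names are mine, and the distance function is a placeholder; a suitable measure for vector sequences is developed on the following slides):

import numpy as np

def classify(test_pattern, references, dist):
    # references: dict mapping class label -> reference pattern
    # return the class whose reference pattern is most "similar",
    # i.e. has the smallest distance to the test pattern
    return min(references, key=lambda c: dist(test_pattern, references[c]))

# toy usage with fixed-length feature vectors and Euclidean distance
refs = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
print(classify(np.array([0.9, 0.2]), refs, lambda x, y: np.linalg.norm(x - y)))  # -> A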


- 20 - Digital Signal Processing and Pattern Recognition

Example: character recognition

(Figure: a reference pattern for class A, a reference pattern for class B, and a test pattern showing an "A".)

Similarity / distance measure?


- 21 - Digital Signal Processing and Pattern Recognition

Preprocessing

Computation of distinctive features of a pattern: feature vector.
A time signal (speech, gestures, ECG, …) is converted into a sequence of feature vectors.

Preprocessing is application dependent!
Different features for speech (formant frequencies), gestures, handwritten text, images, ...


- 22 - Digital Signal Processing and Pattern Recognition

Distance measure for sequences of feature vectors

Reference pattern class A

Reference pattern class B

Test pattern


- 23 - Digital Signal Processing and Pattern Recognition

Euclidean Distance
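The formula itself did not survive the text extraction; for n-dimensional feature vectors it is presumably the standard definition, and for two sequences of equal length T the per-vector distances are accumulated over time (this accumulation is the matching cost used later):

d(\mathbf{x},\mathbf{y}) = \|\mathbf{x}-\mathbf{y}\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
D = \sum_{t=1}^{T} d(\mathbf{x}_t,\mathbf{y}_t)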


- 24 - Digital Signal Processing and Pattern Recognition

Reference pattern class A

Reference pattern class B

Test pattern

Distance measure for sequences with different lengths?


- 25 - Digital Signal Processing and Pattern Recognition

Matching

(Figure: the test sequence matched to the reference sequence.)


- 26 - Digital Signal Processing and Pattern Recognition

Summary

Classification of a signal means comparison with reference signals.
Comparison is not done on the level of pixels or samples but on features (or feature vector sequences) derived from the signals.
Feature extraction is application dependent: different features e.g. for speech / image / letter recognition.
Features for speech recognition are usually derived from Fourier coefficients of short signal sections (e.g. 32 ms).
A meaningful distance measure is needed for feature vector sequences with different lengths.
Task: match the test sequence to the reference sequence!
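As a hedged sketch of such a front end (32 ms frames, log magnitude spectrum as feature vector; the actual feature extraction program provided for the course may compute different features):

import numpy as np

def extract_features(signal, sample_rate=16000, frame_ms=32):
    # cut the signal into short frames and compute a log magnitude
    # spectrum per frame -> sequence of feature vectors
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    features = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        windowed = frame * np.hamming(frame_len)        # reduce spectral leakage
        spectrum = np.abs(np.fft.rfft(windowed))        # Fourier coefficients (magnitudes)
        features.append(np.log(spectrum + 1e-10))       # log compression
    return np.array(features)                           # shape: (n_frames, frame_len // 2 + 1)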


- 27 - Digital Signal Processing and Pattern Recognition

Interpretation as a path in a search grid

(Figure: search grid with the test sequence on one axis and the reference sequence on the other; a matching corresponds to a path through the grid.)


- 28 - Digital Signal Processing and Pattern Recognition

Restrictions for matching sequences

Each test vector has to be mapped to exactly one reference vector (the mapping is a function!).

Temporal order has to be maintained (the function is monotonically increasing): a later test vector cannot be mapped to an earlier reference vector.
(Figure: forbidden mappings, illustrated with the word sequences "three hundred two" and "two hundred three".)


- 29 - Digital Signal Processing and Pattern Recognition

Restrictions for matching sequences (continued)

The first and last vectors of the two sequences have to be mapped to each other.

At most one reference vector may be skipped at a time.
(Figure: forbidden mappings, illustrated with the word sequence "three hundred two".)


- 30 - Digital Signal Processing and Pattern Recognition

Wanted: functional path through the grid with minimal cost

from bottom left to top right
monotonically increasing
maximum slope 2 (skip at most one reference vector)

(Figure: search grid with test and reference axes.)


- 31 - Digital Signal Processing and Pattern Recognition

Conclusion: Only 3 kinds of transitions in the grid allowed

Next transition

Loop transition

Skip transition


- 32 - Digital Signal Processing and Pattern Recognition

Computation of the optimal path

Viterbi Algorithm:
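The algorithm itself is developed graphically on the following slides (which are figure-only in this text dump). As a hedged sketch, a dynamic-programming implementation with exactly the three transitions above (loop, next, skip) and Euclidean local distance could look like this; all names are mine:

import numpy as np

def viterbi_match(test, ref):
    # test: (T, d) array, ref: (R, d) array
    # returns (total distance, reference state assigned to each test vector)
    T, R = len(test), len(ref)
    D = np.full((T, R), np.inf)          # accumulated cost of the best path to each cell
    back = np.zeros((T, R), dtype=int)   # best predecessor reference index per cell
    D[0, 0] = np.linalg.norm(test[0] - ref[0])   # first vectors must be mapped to each other
    for t in range(1, T):
        for r in range(R):
            # allowed predecessors: loop (r), next (r - 1), skip (r - 2)
            best_cost, best_p = min((D[t - 1, p], p) for p in (r, r - 1, r - 2) if p >= 0)
            if np.isfinite(best_cost):
                D[t, r] = best_cost + np.linalg.norm(test[t] - ref[r])
                back[t, r] = best_p
    # backtrack from the top-right cell (last vectors must be mapped to each other)
    path = [R - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return D[T - 1, R - 1], path[::-1]

The backtracked path is what the milestone below asks for in addition to the distance, and it is reused for the segmentation step in Viterbi training.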


- 40 - Digital Signal Processing and Pattern Recognition

Milestone for your project:

Implement the Viterbi matching algorithm and test it.
Compute the distance between feature vector sequences as well as the optimal path (the path will be needed in the next step).
Do some speech recognition experiments using the feature extraction program on my web page
(long/short words, different speaker for test/reference signal, noise, …).


- 41 - Digital Signal Processing and Pattern Recognition

Improvement: more than one reference sequence for a class

Problem:
Matching with each reference sequence is time consuming.

Solution:
Compute an "average" over all sequences of one class.

Example:
(Figure: the reference sequences for the class are combined into a mean value for the class, the "model"; its vectors are the "states".)


- 42 - Digital Signal Processing and Pattern Recognition

Problem:
Reference vector sequences of a class may have different lengths.
(Figure: two reference sequences from the same class.)

Solution:
Match all reference vector sequences of a class to a common model with fixed length.


- 43 - Digital Signal Processing and Pattern Recognition

Viterbi Training of a Model

Outline:

Initial estimation:
Determine the length of the model.
Match each reference sequence to the model length (assume linear time distortion).
Initial estimation of the model vector sequence by averaging.

Iterative improvement:
Match each reference sequence to the current model (dynamic time warping, Viterbi matching as above).
Recalculate the model by averaging.


- 44 - Digital Signal Processing and Pattern Recognition

Choice of the model length

(Figure: a model with too many states compared to the reference sequences: the model is too long.)

Choose the model length e.g. as ½ × the median of the lengths of the reference sequences; then the model length is ok.
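A one-line version of this rule of thumb (the function name is mine):

import numpy as np

def model_length(reference_sequences):
    # model length = half the median length of the reference sequences
    lengths = [len(seq) for seq in reference_sequences]
    return max(1, int(round(0.5 * float(np.median(lengths)))))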


- 45 - Digital Signal Processing and Pattern Recognition

Linear Segmentation

Linear mapping of a reference vector sequence to the model states.
(Values of the feature vectors are irrelevant in this step.)

(Figure: reference sequence mapped linearly onto the model states.)
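A minimal sketch of such a linear mapping; the particular rounding (state = floor(t · model_length / ref_length)) is my assumption:

def linear_segmentation(ref_length, model_length):
    # assign reference vector t (0-based) linearly to one of the model states
    return [(t * model_length) // ref_length for t in range(ref_length)]

# e.g. a reference sequence of length 7 mapped to a model with 3 states:
print(linear_segmentation(7, 3))   # -> [0, 0, 0, 1, 1, 2, 2]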


- 46 - Digital Signal Processing and Pattern Recognition

Example

Given: two reference sequences of a class (lengths 6 and 7).
Wanted: a model for the class (length 3).


- 47 - Digital Signal Processing and Pattern Recognition

Linear Segmentation


- 48 - Digital Signal Processing and Pattern Recognition

Initial estimation of the model vectors (initial model)

Model vector = mean value of all reference vectors which have been mapped to the model state.
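A small sketch of this averaging step (it assumes every state receives at least one reference vector, which linear segmentation guarantees when the references are at least as long as the model):

import numpy as np

def estimate_model(reference_sequences, model_len, segmentations):
    # segmentations[k][t] = model state assigned to vector t of reference k
    model = []
    for s in range(model_len):
        mapped = [ref[t]
                  for ref, seg in zip(reference_sequences, segmentations)
                  for t in range(len(ref)) if seg[t] == s]
        model.append(np.mean(mapped, axis=0))   # model vector = mean of mapped vectors
    return np.array(model)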


- 49 - Digital Signal Processing and Pattern Recognition

Match the reference sequences with the model using the Viterbi algorithm (segmentation).


- 50 - Digital Signal Processing and Pattern Recognition

Reestimate the model vectors:
Model vector = mean value of all reference vectors which have been mapped to the state.

Iterate:
Recompute the segmentation with the new model.
Reestimate the model using the new segmentation.

Expectation Maximization principle (EM algorithm)
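Putting the hypothetical helpers from the previous sketches together (model_length, linear_segmentation, estimate_model, viterbi_match), the training iteration could be sketched as follows; the fixed iteration count is an arbitrary choice, and a convergence test on the total matching distance would work as well:

def viterbi_training(reference_sequences, iterations=10):
    # initial estimation: linear segmentation + averaging
    m = model_length(reference_sequences)
    segs = [linear_segmentation(len(ref), m) for ref in reference_sequences]
    model = estimate_model(reference_sequences, m, segs)
    # iterative improvement: Viterbi segmentation + averaging (EM-style)
    for _ in range(iterations):
        segs = [viterbi_match(ref, model)[1] for ref in reference_sequences]
        model = estimate_model(reference_sequences, m, segs)
    return model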


- 52 - Digital Signal Processing and Pattern Recognition

Milestone for your project:

Implement Viterbi training and test it.
Run recognition experiments with a varying number of reference recordings for each word.
Make two sets of recordings: one for training the models and one for evaluating the recognition error rates. Automate this process.
Make two models of the same word spoken by different speakers.
Instead of speech recognition you can do speaker recognition that way.


- 53 - Digital Signal Processing and Pattern Recognition

Example
