Speech recognition yesterday … and tomorrow


- 1 - Digital Signal Processing and Pattern Recognition

Speech recognition yesterday

… and tomorrow


- 2 - Digital Signal Processing and Pattern Recognition

Objective

Classification of signals, pattern recognition
e.g. speech, gestures, handwritten text, ECG, traffic situations, …

Processing chain (block diagram): sensor signal → preprocessing (computation of feature vectors) → classification (comparison with reference patterns) → recognized class (Word 1, Word 2, …, Word n)


- 3 - Digital Signal Processing and Pattern Recognition

Contents

Distance Measure for Vector Sequences with Different Lengths
Dynamic programming, dynamic time warping

Computation of a Reference Pattern from Several Examples
Mean value, variance, Mahalanobis distance

Random Variables
Modeling of uncertainty, normal distribution

Hidden Markov Models, Learning from Examples
Viterbi training, expectation maximization (EM), k-means algorithm, mixture distributions

Statistical Dependency, Covariance
Reduction of classification errors, n-dimensional normal distribution


- 4 - Digital Signal Processing and Pattern Recognition

Contents (optional)

Maximum Likelihood Estimation
Theoretical foundation of statistical learning algorithms

Decorrelation, Principal Component Analysis
Simplification of the classification problem through preprocessing


- 5 - Digital Signal Processing and Pattern Recognition

Course Material

Ilias or http://mitarbeiter.hs-heilbronn.de/~vstahl

Slides used in the lecture
Exercises
Data for exercises
Literature hints


- 6 - Digital Signal Processing and Pattern Recognition

Grading

Programming project in parallel with the lectures
Development of a reliable classifier; successive improvement, homework
Programming language: C, Java, Python or Matlab
Teamwork (max. 3 students)
Deadline: first lecture after Christmas


- 7 - Digital Signal Processing and Pattern Recognition

Literature

Elementare Einführung in die Wahrscheinlichkeitsrechnung
Karl Bosch (E-Book, 519.Bosch)

Elementare Einführung in die angewandte Statistik
Karl Bosch (519.Bosch)

Mustererkennung mit Markov-Modellen – Theorie, Praxis, Anwendungsgebiete

Gernot A. Fink (621.841 Fin)

Pattern Classification and Scene Analysis

Richard O. Duda, Peter E. Hart (519.Dud)

Mathematical Methods and Algorithms for Signal Processing

Todd K. Moon, Wynn C. Stirling

The Elements of Statistical Learning

Trevor Hastie, Robert Tibshirani, Jerome Friedman (519.Has)


- 8 - Digital Signal Processing and Pattern Recognition

1957: "Sputnik shock"

Foundation of ARPA in 1958 (Advanced Research Projects Agency)
under the US Department of Defense

DARPA (1972-1993), ARPA (1993-1996), DARPA (1996-…); current budget approx. $3.5 billion


- 9 - Digital Signal Processing and Pattern Recognition

1969: ARPANET

Further projects: stealth technology, GPS, …

Grand Challenge 2004, Mojave Desert, Nevada, 241 km
Grand Challenge 2005
Urban Challenge 2007


- 13 - Digital Signal Processing and Pattern Recognition

September 2014: Internationale Automobilausstellung, Frankfurt. Daimler presents an autonomously driving truck.

Start of production (SOP) planned for 2025.


- 14 - Digital Signal Processing and Pattern Recognition

“Google’s Next Phase in Driverless Cars: No Steering Wheel or Brake Pedals”

www.nytimes.com May 27, 2014


- 15 - Digital Signal Processing and Pattern Recognition

"Keine Folgen für Tesla nach tödlichem Unfall"
("No consequences for Tesla after fatal accident")

FAZ, 20.1.2017


- 16 - Digital Signal Processing and Pattern Recognition

Daimler AG 2017 prototype


- 17 - Digital Signal Processing and Pattern Recognition

Pattern Recognition

Example: character recognition
(Figure: image of the letter "A")


- 18 - Digital Signal Processing and Pattern Recognition

Recognition means Classification

Given:
Signal (or features derived from it): pattern
Finite number of classes

Wanted:
The class to which the pattern "belongs"

Examples:
Character recognition
Visual quality control (ok, reject)
Speech recognition (phonemes, words)
Medical diagnostics (X-ray images, ECG, …)


- 19 - Digital Signal Processing and Pattern Recognition

Classification means Comparison with Reference Patterns

Given:
Pattern to be classified (test pattern)
A reference pattern for each class

Wanted:
The class whose reference pattern is most "similar" to the given pattern

Problem: a "suitable" similarity/distance measure between patterns?
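As a minimal sketch of this decision rule (the function and variable names are mine, and the distance function is a placeholder; a suitable measure for vector sequences is developed on the following slides):

import numpy as np

def classify(test_pattern, references, dist):
    # references: dict mapping class label -> reference pattern
    # return the class whose reference pattern is most "similar",
    # i.e. has the smallest distance to the test pattern
    return min(references, key=lambda c: dist(test_pattern, references[c]))

# toy usage with fixed-length feature vectors and Euclidean distance
refs = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}
print(classify(np.array([0.9, 0.2]), refs, lambda x, y: np.linalg.norm(x - y)))  # -> A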


- 20 - Digital Signal Processing and Pattern Recognition

Example: character recognition

(Figure: a reference pattern for class A, a reference pattern for class B, and a test pattern showing an "A".)

Similarity / distance measure?


- 21 - Digital Signal Processing and Pattern Recognition

Preprocessing

Computation of distinctive features of a pattern: feature vector.
A time signal (speech, gestures, ECG, …) is converted into a sequence of feature vectors.

Preprocessing is application dependent!
Different features for speech (formant frequencies), gestures, handwritten text, images, ...


- 22 - Digital Signal Processing and Pattern Recognition

Distance measure for sequences of feature vectors

Reference pattern class A

Reference pattern class B

Test pattern


- 23 - Digital Signal Processing and Pattern Recognition

Euclidean Distance
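The formula itself did not survive the text extraction; for n-dimensional feature vectors it is presumably the standard definition, and for two sequences of equal length T the per-vector distances are accumulated over time (this accumulation is the matching cost used later):

d(\mathbf{x},\mathbf{y}) = \|\mathbf{x}-\mathbf{y}\| = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
D = \sum_{t=1}^{T} d(\mathbf{x}_t,\mathbf{y}_t)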


- 24 - Digital Signal Processing and Pattern Recognition

Reference pattern class A

Reference pattern class B

Test pattern

Distance measure for sequences with different lengths?


- 25 - Digital Signal Processing and Pattern Recognition

Matching

(Figure: the test sequence matched to the reference sequence.)


- 26 - Digital Signal Processing and Pattern Recognition

Summary

Classification of a signal means comparison with reference signals.
Comparison is not done on the level of pixels or samples but on features (or feature vector sequences) derived from the signals.
Feature extraction is application dependent: different features e.g. for speech / image / letter recognition.
Features for speech recognition are usually derived from Fourier coefficients of short signal sections (e.g. 32 ms).
A meaningful distance measure is needed for feature vector sequences with different lengths.
Task: match the test sequence to the reference sequence!
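As a hedged sketch of such a front end (32 ms frames, log magnitude spectrum as feature vector; the actual feature extraction program provided for the course may compute different features):

import numpy as np

def extract_features(signal, sample_rate=16000, frame_ms=32):
    # cut the signal into short frames and compute a log magnitude
    # spectrum per frame -> sequence of feature vectors
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    features = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        windowed = frame * np.hamming(frame_len)        # reduce spectral leakage
        spectrum = np.abs(np.fft.rfft(windowed))        # Fourier coefficients (magnitudes)
        features.append(np.log(spectrum + 1e-10))       # log compression
    return np.array(features)                           # shape: (n_frames, frame_len // 2 + 1)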


- 27 - Digital Signal Processing and Pattern Recognition

Interpretation as a path in a search grid

(Figure: search grid with the test sequence on one axis and the reference sequence on the other; a matching corresponds to a path through the grid.)


- 28 - Digital Signal Processing and Pattern Recognition

Restrictions for matching sequences

Each test vector has to be mapped to exactly one reference vector (the mapping is a function!).

Temporal order has to be maintained (the function is monotonically increasing): a later test vector cannot be mapped to an earlier reference vector.
(Figure: forbidden mappings, illustrated with the word sequences "three hundred two" and "two hundred three".)


- 29 - Digital Signal Processing and Pattern Recognition

Restrictions for matching sequences (continued)

The first and last vectors of the two sequences have to be mapped to each other.

At most one reference vector may be skipped at a time.
(Figure: forbidden mappings, illustrated with the word sequence "three hundred two".)


- 30 - Digital Signal Processing and Pattern Recognition

Wanted: functional path through the grid with minimal cost

from bottom left to top right
monotonically increasing
maximum slope 2 (skip at most one reference vector)

(Figure: search grid with test and reference axes.)


- 31 - Digital Signal Processing and Pattern Recognition

Conclusion: Only 3 kinds of transitions in the grid allowed

Next transition

Loop transition

Skip transition


- 32 - Digital Signal Processing and Pattern Recognition

Computation of the optimal path

Viterbi Algorithm:
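The algorithm itself is developed graphically on the following slides (which are figure-only in this text dump). As a hedged sketch, a dynamic-programming implementation with exactly the three transitions above (loop, next, skip) and Euclidean local distance could look like this; all names are mine:

import numpy as np

def viterbi_match(test, ref):
    # test: (T, d) array, ref: (R, d) array
    # returns (total distance, reference state assigned to each test vector)
    T, R = len(test), len(ref)
    D = np.full((T, R), np.inf)          # accumulated cost of the best path to each cell
    back = np.zeros((T, R), dtype=int)   # best predecessor reference index per cell
    D[0, 0] = np.linalg.norm(test[0] - ref[0])   # first vectors must be mapped to each other
    for t in range(1, T):
        for r in range(R):
            # allowed predecessors: loop (r), next (r - 1), skip (r - 2)
            best_cost, best_p = min((D[t - 1, p], p) for p in (r, r - 1, r - 2) if p >= 0)
            if np.isfinite(best_cost):
                D[t, r] = best_cost + np.linalg.norm(test[t] - ref[r])
                back[t, r] = best_p
    # backtrack from the top-right cell (last vectors must be mapped to each other)
    path = [R - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return D[T - 1, R - 1], path[::-1]

The backtracked path is what the milestone below asks for in addition to the distance, and it is reused for the segmentation step in Viterbi training.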


- 40 - Digital Signal Processing and Pattern Recognition

Milestone for your project:

Implement the Viterbi matching algorithm and test it.
Compute the distance between feature vector sequences as well as the optimal path (the path will be needed in the next step).
Do some speech recognition experiments using the feature extraction program on my web page
(long/short words, different speaker for test/reference signal, noise, …).


- 41 - Digital Signal Processing and Pattern Recognition

Improvement: more than one reference sequence for a class

Problem:
Matching with each reference sequence is time consuming.

Solution:
Compute an "average" over all sequences of one class.

Example:
(Figure: the reference sequences for the class are combined into a mean value for the class, the "model"; its vectors are the "states".)


- 42 - Digital Signal Processing and Pattern Recognition

Problem:
Reference vector sequences of a class may have different lengths.
(Figure: two reference sequences from the same class.)

Solution:
Match all reference vector sequences of a class to a common model with fixed length.


- 43 - Digital Signal Processing and Pattern Recognition

Viterbi Training of a Model

Outline:

Initial estimation:
Determine the length of the model.
Match each reference sequence to the model length (assume linear time distortion).
Initial estimation of the model vector sequence by averaging.

Iterative improvement:
Match each reference sequence to the current model (dynamic time warping, Viterbi matching as above).
Recalculate the model by averaging.


- 44 - Digital Signal Processing and Pattern Recognition

Choice of the model length

(Figure: a model with too many states compared to the reference sequences: the model is too long.)

Choose the model length e.g. as ½ × the median of the lengths of the reference sequences; then the model length is ok.
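A one-line version of this rule of thumb (the function name is mine):

import numpy as np

def model_length(reference_sequences):
    # model length = half the median length of the reference sequences
    lengths = [len(seq) for seq in reference_sequences]
    return max(1, int(round(0.5 * float(np.median(lengths)))))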


- 45 - Digital Signal Processing and Pattern Recognition

Linear Segmentation

Linear mapping of a reference vector sequence to the model states.
(Values of the feature vectors are irrelevant in this step.)

(Figure: reference sequence mapped linearly onto the model states.)
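A minimal sketch of such a linear mapping; the particular rounding (state = floor(t · model_length / ref_length)) is my assumption:

def linear_segmentation(ref_length, model_length):
    # assign reference vector t (0-based) linearly to one of the model states
    return [(t * model_length) // ref_length for t in range(ref_length)]

# e.g. a reference sequence of length 7 mapped to a model with 3 states:
print(linear_segmentation(7, 3))   # -> [0, 0, 0, 1, 1, 2, 2]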


- 46 - Digital Signal Processing and Pattern Recognition

Example

Given: two reference sequences of a class (lengths 6 and 7).
Wanted: a model for the class (length 3).


- 47 - Digital Signal Processing and Pattern Recognition

Linear Segmentation


- 48 - Digital Signal Processing and Pattern Recognition

Initial estimation of the model vectors (initial model)

Model vector = mean value of all reference vectors which have been mapped to the model state.
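A small sketch of this averaging step (it assumes every state receives at least one reference vector, which linear segmentation guarantees when the references are at least as long as the model):

import numpy as np

def estimate_model(reference_sequences, model_len, segmentations):
    # segmentations[k][t] = model state assigned to vector t of reference k
    model = []
    for s in range(model_len):
        mapped = [ref[t]
                  for ref, seg in zip(reference_sequences, segmentations)
                  for t in range(len(ref)) if seg[t] == s]
        model.append(np.mean(mapped, axis=0))   # model vector = mean of mapped vectors
    return np.array(model)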


- 49 - Digital Signal Processing and Pattern Recognition

Match the reference sequences with the model using the Viterbi algorithm (segmentation).


- 50 - Digital Signal Processing and Pattern Recognition

Reestimate the model vectors:
Model vector = mean value of all reference vectors which have been mapped to the state.

Iterate:
Recompute the segmentation with the new model.
Reestimate the model using the new segmentation.

Expectation Maximization principle (EM algorithm)
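Putting the hypothetical helpers from the previous sketches together (model_length, linear_segmentation, estimate_model, viterbi_match), the training iteration could be sketched as follows; the fixed iteration count is an arbitrary choice, and a convergence test on the total matching distance would work as well:

def viterbi_training(reference_sequences, iterations=10):
    # initial estimation: linear segmentation + averaging
    m = model_length(reference_sequences)
    segs = [linear_segmentation(len(ref), m) for ref in reference_sequences]
    model = estimate_model(reference_sequences, m, segs)
    # iterative improvement: Viterbi segmentation + averaging (EM-style)
    for _ in range(iterations):
        segs = [viterbi_match(ref, model)[1] for ref in reference_sequences]
        model = estimate_model(reference_sequences, m, segs)
    return model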


- 52 - Digital Signal Processing and Pattern Recognition

Milestone for your project:

Implement Viterbi training and test it.
Run recognition experiments with a varying number of reference recordings for each word.
Make two sets of recordings: one for training the models and one for evaluating the recognition error rates. Automate this process.
Make two models of the same word spoken by different speakers.
Instead of speech recognition you can do speaker recognition that way.


- 53 - Digital Signal Processing and Pattern Recognition

Example
