Language-Based Multimedia Information Retrieval


Franciska de Jong
University of Twente, Dept. of Computer Science/CTIT, P.O. Box 217, 7500 AE Enschede, Netherlands
fdejong@cs.utwente.nl

Jean-Luc Gauvain
LIMSI-CNRS, B.P. 133, 91403 Orsay Cedex, France
gauvain@limsi.fr

Djoerd Hiemstra
University of Twente, Dept. of Computer Science/CTIT, P.O. Box 217, 7500 AE Enschede, Netherlands
hiemstra@cs.utwente.nl

Klaus Netter
Language Technology, German Research Center for Artificial Intelligence (DFKI GmbH), Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
netter@dfki.de

Abstract

This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE built on subtitles or captions as the prime language key for disclosing video fragments, OLIVE makes use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.

1 Introduction

In archives of all kinds, detailed documentation and profiling of the archived material is a prerequisite for efficient and precise access to the data. While in the domain of textual digital libraries advanced methods of information retrieval can support such processes, there are so far no effective methods for automatically profiling, indexing and retrieving image and video material on the basis of a direct analysis of its visual content. Although there have of course been advances in the automatic analysis and recognition of images, these are still so limited that they do not provide a sufficiently robust basis for profiling large amounts of homogeneous audio-visual data.

Without any doubt, image and video processing have made enormous progress over the past years, for example in the analysis of low-level or maybe even higher-level image features and in the segmentation of continuous video material. Low-level analysis of texture and colour histograms can already form the basis for retrieving images of similar kinds. The detection of movements can help to identify and retrieve sequences in which movements occur, which by itself can already be an application, as in the case of automatic surveillance. The recognition of simple shapes can be a first step in the direction of recognising and identifying certain objects. The progress that has been made in the area of segmentation, i.e. the identification of shots or even scenes, by now also makes it possible to identify more complex shot boundaries, as they are found in wipes, fading or other kinds of transitions.

Segmentation is relevant not only for identifying the extension of a coherent sequence, but also and above all for breaking down continuous material into static or spatial dimensions by representing such shots through one single picture. In the simplest case, this is typically achieved by key frame extraction, where one significant frame is taken to represent one shot. However, there are again also much more complex types of representation building on mosaicing, where the camera sweep of an entire shot is put together into one static panoramic picture.


Despite all this progress, there still appear to be two major unsolved problems in the broad-scale indexing and retrieval of video material on the basis of the described technologies, viz. (a) image and video processing is still far away from understanding the content of a picture in the sense of a knowledge-based understanding, and (b) there is no effective query language (in the wider sense) for searching image and video databases. In a certain sense, these two problems are really two sides of the same issue: leaving aside all the philosophical intricacies associated with this question, there is simply no escape from the fact that human language plays a central role in representing, expressing and also processing knowledge. One of the crucial consequences of this state of the art is of course that, so far, indexing and retrieval of image and video objects is still practically impossible without the intervention of interpretation by a human, who understands the audio-visual content and describes it in the format of a human language, which can then serve as the platform for language-based search and retrieval.

To tackle the problem of automatic disclosure and retrieval of audio-visual material, the two EU-funded projects POP-EYE (see footnote 1) and OLIVE (see footnote 2) are therefore trying to exploit the linguistic information associated with such data. They both build on human language as the media interlingua, making the assumption that as long as there is no possibility to carry out both a broad-scale recognition of visual objects and an automatic mapping from such objects to linguistic representations, the detailed content of video material is best disclosed through the linguistic content associated with the images. In the case of POP-EYE, which was launched at a time when practically only written material could be reliably processed, the prime linguistic data were subtitles (closed or open captions) associated with videos. OLIVE, more or less a follow-up project, extends the range or variety of linguistic data and focuses on speech technology processing of the sound track, but also takes into account other linguistic material associated with video documents. Both projects make the assumption that, while we have no access to the visual content directly, one should at least make use of the linguistic content of video data, which may also provide a direct or indirect reflection of the visual content. Clearly this cannot provide a universal solution for the problem of automatic disclosure and retrieval, but at least it can contribute to the automatic capturing of as much of the information as is possible by the state of the art.

The main objective of these projects is thus to develop a radio and video archiving and retrieval tool that will facilitate efficient access to large libraries of audio-visual material. In order to allow a detailed retrieval, the indices that are built from the associated linguistic material are related to time codes whenever this is possible, i.e. they point to particular frames or shots in the video, rather than to the video as a whole. While subtitles themselves already provide a time-coded textual basis which only has to be indexed appropriately, such a textual basis has to be created in the case of spoken input material through automatic speech recognition. Thus, in the OLIVE project a prototype is developed and tested which automatically partitions the audio channel and transcribes the speech portions, producing a time-coded orthographic transcription. From the transcript an index of appropriate terms is derived, with each phrase being linked to specific time points of the video programme. This process is complemented by various alignment techniques that take into account other textual material which is not yet time-coded, but which can be brought into a relation with the time-coded material. For the retrieval part, tools are developed which support users in searching for material via natural language queries, including cross-lingual access based on off-line machine translation of the archived documents, or alternatively on-line query translation.

The consortia are comprised of users, technology providers and integrators. The primary users are broadcast organisations (ARTE, BRTN, SWR and TROS), a national audio-video archive (INA) and a large service provider for broadcasting and TV productions (NOB). Technology providers include TNO, the University of Twente and DFKI for retrieval technology and natural language processing, LIMSI-CNRS for speech recognition technology, the University of Tübingen for evaluation, and two industrial companies, VECSYS and VDA, for integration and exploitation.

1 Pop-Eye is an EU-funded project within the Telematics Application Programme, sector Language Engineering (LE-4234). Duration: 1997-1998.

2 Olive is an EU-funded project within the Telematics Application Programme, sector Language Engineering (LE-8364). Duration: Spring 1998 - Summer 2000.

This paper presents an overview of the project goals both from the perspective of the users and the technology developers. Section 2 addresses the user needs and Section 3 describes the core human language technologies used for speech recognition, indexation and retrieval. Finally, in Section 5 some more detailed project information is given, including an overview of the major achievements thus far in the projects and a short description of the demonstrators that have been built.

2 User Needs

The prime target users of the projects are professionals with an interest in efficient, detailed and direct access to their video archives. For the user institutions, disclosure of video material plays an important role, be it for the purpose of rebroadcasting or reselling existing productions, for reusing part of the material in new productions, or for supporting research in video databases. With rising production costs, rebroadcasting is an important means of writing off the costs over time. Reselling material, in particular across country and language boundaries, is likewise an additional source of income, which makes multilingual access to archives a desirable feature. Reusing and integrating existing material can reduce the cost for a new production by a factor of [...] or more. Enabling detailed research is one of the main functions of public audio-video archives such as INA, but can also play a role for producers and editors in TV stations.

Most of these needs make it very important that the users of the archives have direct access to the content of the video material without having to view the entire document. This implies that indexes to videos have to refer not just to the video production as a whole, but also to fragments of the material via their time code.

When video archives are disclosed, this is typically carried out by archivists and documentalists, who view the video and in parallel note its content through keywords or descriptive expressions (see footnote 3). While this method is maximally precise and detailed for the purpose of capturing the visual content of a video, it is also extremely time- and cost-consuming. For the detailed disclosure of a video a ratio of 1:15 can be assumed, i.e. for one hour of video up to fifteen hours of description time can be necessary. It is quite clear that such a method can only be applied to selected productions and that the vast majority of material cannot be disclosed on this basis at all.

The projects aim to support such human archiving processes by developing a system which automatically produces full-text indexes from a transcription of the sound track of a programme. This indexing method is meant to complement traditional methods by offering another and in some cases an exclusive information channel into the video material.

In addition to the detailed content disclosure, the systems also provide access to the digitised video material through network technology, specifically web browsing. This answers the growing demand to preview material remotely before actually obtaining the material from the archives. Rather than having to collect the material for browsing, a user is able to query a digital video library from his desktop, browse through the returned descriptions and then download and preview the relevant sequences. The overall philosophy behind these search environments for video material is that the user can narrow down his search by first inspecting information in the form of index terms, text passages (transcriptions or subtitles), story boards or sequences of key frames, in order to finally focus in on the actual condense data objects, such as video sequences.

3 Institutions which carry out detailed disclosure processes are, for example, German ARD TV stations or the Belgian VRT.


3 Core Technologies

To answer the problems and demands described above, OLIVE attempts to provide on-line access to video material on the basis of linguistic material associated with the visual data. The linguistic data connected with a video can basically be divided into those which are inherently linked to the temporal dimension of the video and those which are not. Among the former are subtitles, which carry some invisible time code, and of course the spoken word itself, which is time-coded through the alignment of the sound track with the video signal.

One of the main technical tasks to be faced is therefore to segment and process the linguistic data such that each linguistic expression which qualifies as an index term can be directly associated with the time code referring to a corresponding video sequence. This is trivially achieved if the linguistic expression is already in a time-coded textual format, as in the case of subtitles. For all other data the time code and the textual representation have to be derived. Speech recognition, developed in OLIVE for French and German, which is being used to automatically generate time-coded transcriptions of the sound track, is therefore one of the core technologies to provide the necessary information.
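
The association between index terms and time codes can be illustrated with a minimal sketch. It assumes a transcript that is already time-coded per segment; the segment data, the stop list and the function name `build_time_coded_index` are hypothetical and are not taken from the OLIVE system.

```python
from collections import defaultdict

# Hypothetical time-coded transcript: (start_seconds, end_seconds, text).
SEGMENTS = [
    (0.0, 4.2, "the president visited the flooded region"),
    (4.2, 9.8, "rescue teams evacuated the region overnight"),
]

# Tiny illustrative stop list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "of", "in"}


def build_time_coded_index(segments, stop_words=STOP_WORDS):
    """Map each index term to the (start, end) time codes of the segments containing it."""
    index = defaultdict(list)
    for start, end, text in segments:
        # Each distinct term is recorded once per segment.
        for term in sorted(set(text.lower().split())):
            if term not in stop_words:
                index[term].append((start, end))
    return dict(index)


index = build_time_coded_index(SEGMENTS)
print(index["region"])  # time codes of both segments that mention "region"
```

A query term then resolves directly to video fragments rather than to the video as a whole, which is the property the paragraph above asks for.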

For non-time-coded texts, such as scripts or manual transcriptions produced for translation or subtitling, a time coding is derived by automatic alignment with the time-coded data. Since non-time-coded data typically consists of manually produced and controlled textual material, the quality of the index terms from such data could even be more reliable than the one derived from speech transcriptions.

The retrieval functionality builds on some of the core functions of a search engine whose very first foundations were developed in the Twenty-One project (http://twentyone.tpd.tno.nl). This search engine is described in more detail below. To support cross-lingual search and retrieval, different approaches are pursued, such as employing translation technology for off-line document translation, where the translated documents serve as the basis for indexing, and for on-line translation of query terms, where the translated query is then matched against an index built from the text in the original language.

3.1 Speech Recognition

To address the various user needs, OLIVE supports different transcription modes: segmentation, guided and fully automatic transcription. For the segmentation task, a perfect transcription of the spoken data is assumed and this transcript is time-aligned with the acoustic signal. However, existing transcripts are unlikely to be exact transcripts of what was said and/or may only be partial transcripts, which can be used to guide the search during recognition, what can be qualified as informed speech recognition. The time codes produced by the speech recognizer can be used to align the hypothesized transcription with the text of the original document.

Fully automatic transcription is provided by the state-of-the-art speech recognizer developed at LIMSI [hub4y97, cacm00]. This recognizer makes use of continuous mixture density HMMs for acoustic modeling, combined with a [...]k-word four-gram language model. Decoding is carried out in multiple passes, incorporating cluster-based test-set acoustic adaptation. Confidence scores are associated with each hypothesized word to allow further processing steps to take into account the reliability of the candidates.

Prior to word recognition, the acoustic signal is partitioned into homogeneous segments, and appropriate labels are associated with the segments [icslp98]. This partitioning algorithm first detects and rejects non-speech segments using Gaussian mixture models (GMMs). An iterative maximum likelihood segmentation/clustering procedure is then applied to the speech segments using GMMs and an agglomerative clustering algorithm. The result of the partitioning process is a set of speech segments with cluster, gender and telephone/wideband labels.

The speech recognizer [hub4y97] uses context-dependent triphone-based phone models, where each phone model is a tied-state left-to-right CD-HMM with Gaussian mixtures, and the tied states are obtained by means of a phonemic decision tree. Word recognition is performed in three steps: initial hypothesis generation, word graph generation, and final hypothesis generation. The initial hypotheses are used for cluster-based acoustic model adaptation, which aims to reduce the mismatch between the models and the data, a critical step for generating accurate word graphs.

Taking advantage of the corpora available through the LDC, the speech recognizer [hub4y97, icslp98] was first developed and tested on American English. The acoustic models are trained on [...] hours of transcribed audio data, with the language models trained on [...]M words of broadcast news transcriptions and [...]M words of newspaper and newswire texts. Using about [...] hours of broadcast data collected in OLIVE for each language, LIMSI has ported its American English system to French and German.

Experiments with [...] hours of unrestricted broadcast news data indicate that word error rates around [...]% are obtained for American English (measured on a representative [...]-hour subset). Experiments on about [...] hours of French and German broadcast news data indicate that the word error rates are slightly higher (about [...]%), which can be expected as these languages are more highly inflected than English and less training data are available. However, it has to be kept in mind that for the purpose of indexing and retrieval perfect word recognition is not necessary, since not every word will have to make it into the index and not every expression in the index is likely to be queried. Research into the differences between text retrieval and spoken document retrieval indicates that, given the current level of performance of information retrieval techniques, recognition errors do not add new problems for the retrieval task [mayb97, cacm00].
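
For readers unfamiliar with the measure, the word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a standard word-level edit distance; it illustrates the general definition only and is not the scoring tool used in the experiments reported here.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words seen
    # before the current one and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the rate can exceed 1.0 for very poor hypotheses.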

3.2 Alignment of non-time-coded text

Besides time-coded texts coming from subtitles or speech recognition, for many programs there is also a rich source of non-time-coded texts available. These texts can be the direct or indirect result of the production itself, such as scripts, autocue files or manual transcriptions produced for translation, but the texts can also come from other sources, as for instance press releases or reviews.

Since non-time-coded data typically consists of manually produced and controlled textual material, the quality of the index terms from such data could even be more reliable than the one derived from automatic speech recognition. The OLIVE alignment module derives a time-coding for these texts by automatically aligning them with the time-coded output of speech recognition. The results of alignment can be used to replace the imperfect output of automatic speech recognition, or otherwise to complement the output of speech recognition. Users may want to choose the former option if the non-time-coded text is a manual transcription of the program, and the latter option for related texts.

For the development of the alignment module, several statistics- and heuristics-based approaches were tested, using for example character frequencies, word frequencies and stop lists [sluis00]. Preliminary tests with alignment could not be performed on the actual output of speech recognition, because this data would only be available near the end of the OLIVE project. Lacking this data, alignment was tested using time-coded closed-caption subtitle files of news broadcasts provided by one of the users.
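
A word-overlap alignment of this general kind can be sketched as follows. The similarity measure (Jaccard overlap of content words), the threshold and all names and data are assumptions made for illustration; the approaches actually tested are described in [sluis00].

```python
# Hypothetical data: time-coded segments and sentences without time codes.
STOP_WORDS = {"the", "in", "on"}
TIMED = [
    (0.0, 5.0, "floods hit the northern region"),
    (5.0, 9.0, "parliament votes on the budget"),
]
UNTIMED = [
    "the budget vote in parliament",
    "floods in the northern region",
    "football results",
]


def content_words(text, stop_words):
    """Lower-cased words of the text with stop words removed."""
    return {word for word in text.lower().split() if word not in stop_words}


def align(untimed_sentences, timed_segments, stop_words, threshold=0.3):
    """Give each sentence the time code of its best-overlapping segment, or None."""
    alignments = []
    for sentence in untimed_sentences:
        words = content_words(sentence, stop_words)
        best_score, best_time = 0.0, None
        for start, end, text in timed_segments:
            segment_words = content_words(text, stop_words)
            union = words | segment_words
            score = len(words & segment_words) / len(union) if union else 0.0
            if score > best_score:
                best_score, best_time = score, (start, end)
        # Sentences without a sufficiently similar segment stay unaligned.
        alignments.append(best_time if best_score >= threshold else None)
    return alignments


result = align(UNTIMED, TIMED, STOP_WORDS)
print(result)
```

The threshold lets related material contain items that should not be aligned at all, at the cost of some recall.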

In a first evaluation, autocue files referring to the same programs as the subtitle files were aligned to the time-coded subtitle files. The autocue files in these tests serve as near-perfect manual transcriptions of the program that could be used to replace the results of speech recognition. The evaluation showed an average performance of [...] precision and [...] recall on manually aligned test data. The use of additional heuristics that take into account the successful alignments for surrounding sentences could improve the recall up to [...], reducing the precision only to [...].

In a second test, the time codes were removed from closed-caption files belonging to news programs that were broadcast on the same day as the programs of the time-coded subtitle files. The resulting non-time-coded subtitle files serve as related material: they belong to programs that cover a lot of the same news events, but possibly in a different order and possibly with one or two items that should not be aligned at all. On these data the basic alignment algorithm achieved a precision of [...] and a recall of [...] on average. The use of additional heuristics that take into account surrounding alignments improved the performance results considerably, to a precision of [...] and a recall of [...] on average.
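
Precision and recall of an alignment can be computed against manually aligned test data as in the following sketch. The representation (a mapping from sentence identifiers to segment identifiers) and the sample data are hypothetical.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted alignments against a manual gold standard.

    Both arguments map a sentence id to a segment id; sentences that were
    left unaligned are simply absent from the mapping.
    """
    correct = sum(1 for key, value in predicted.items() if gold.get(key) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: four gold alignments, three predictions, two correct.
gold = {1: "a", 2: "b", 3: "c", 4: "d"}
predicted = {1: "a", 2: "b", 3: "x"}
precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

Heuristics that propose more alignments typically raise recall and may lower precision, which is the trade-off described for the evaluations above.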

Although the pilot evaluations were not done with the actual output of speech recognition, it is nevertheless quite likely that they are a reliable indication of the alignment performance in the final OLIVE system.

3.3 Indexing and Retrieval

The retrieval functionality employed builds on technology developed within the Twenty-One project, which produced the first on-line search engine in Europe supporting cross-language retrieval, accessible since [...]. The system supports the automatic disclosure of information in a heterogeneous document environment, covering documents of different types and languages.

The Twenty-One retrieval technology was evaluated on two tasks of the international IR evaluation conference TREC. Both in the main task and in the cross-language task, the Twenty-One system performed at the level of today's world-leading experimental IR systems. Cf. [trec99].

The objective of the Twenty-One system was to develop domain-independent technology to improve the dissemination level of digitised and non-digitised multimedia information. It has set a baseline for a series of EU-funded projects developing multimedia indexing tools. An application of the system in the domain of sustainable development can be inspected at http://twentyone.tpd.tno.nl/twentyone. Cf. also [twlt98] and [isdn98].

The language elements in the documents to be disclosed are the basis for the automatic generation of a text-based index that enables the kind of functionality commonly known as full-text retrieval. This provides users access to information not just via a controlled set of search terms, but via any word in the document. It allows users not only to look for entire documents, but also for information within the documents.

The retrieval system thus consists of two crucial sets of software: (i) software to disclose multimedia information, including a series of natural language processing modules, and (ii) software to retrieve multimedia information with state-of-the-art browsing applications from remote or local servers, or from a local CD-ROM. The retrieval module contains a search kernel supporting several query modes and query languages.

The disclosure subsystem builds on linguistic software which includes morphological analysis and part-of-speech tagging, parsing (noun phrase extraction) and translation. This goes beyond the analysis parts of standard full-text retrieval systems, in as far as such systems often do not even comprise lemmatisation, let alone phrasal structuring, in their analysis part. The parser output consists of a version of the original document in which the noun phrases (NPs) or other phrasal units, which are considered to be potential index terms, have been marked. For the output of the speech recogniser, linguistic analysis and segmentation at a higher (phrasal, clausal or sentential) level is even more important, as here the text typically consists of an unsegmented stream of words. Parsing and structural analysis are practically indispensable for retrieval on the basis of higher meaningful linguistic units and for the possibility to present the results to the user in such a format.

The automatically acquired text-based index is the link between the disclosure and retrieval modules and supports the retrieval of the stored textual representations and fragments of the objects linked to the index terms. The system exploits language as a means to filter and narrow down in several steps the space of potentially relevant target objects. One of the obvious advantages of this stepwise process is that the downloading of condense data objects such as images, video streams or sound tracks can be postponed until there is confirmed evidence that there is a match with the actual information need.

Unlike in most ordinary retrieval systems, the index is also in many other respects not limited to an index based on single words or lemmata. In fact, it is a combination of several indexes, comprising a fuzzy phrase-based index, a weighted lemma-based index and a bibliographic index. Through the phrase-based index users are allowed to query the system by using not only simple keywords but also complete phrases, such as “effects of acid rain on forests in the Netherlands”. The matching between query text and index can be done via a one-run fuzzy match that ranks documents on the basis of similarity and number of matching phrases. The incorporation of a weighted lemma-based index that uses a successful new probabilistic term weighting algorithm developed at the University of Twente [trec99] allows a user to improve the initial retrieval results by feeding the most relevant pages back into the retrieval system to get similar documents returned. This mixed approach has been proven to yield a considerable improvement in retrieval performance. Recall profits from the morphological analysis, including compound splitting and fuzzy matching. Stepwise retrieval with user interaction and relevance feedback improves precision.
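
To make the idea of ranking by weighted terms concrete, the sketch below scores documents with a plain tf-idf weighting. This is a generic textbook scheme swapped in for illustration; it is not the probabilistic term weighting algorithm of [trec99], and the documents and the function name `rank` are hypothetical.

```python
import math
from collections import Counter


def rank(query, documents):
    """Return document ids sorted by descending tf-idf score for the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term]:
                idf = math.log(1 + n_docs / doc_freq[term])
                score += (counts[term] / len(tokens)) * idf
        scores[doc_id] = score
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


DOCUMENTS = {
    "d1": "acid rain damages forests",
    "d2": "rain is expected tomorrow",
    "d3": "parliament votes on budget",
}
ranking = rank("acid rain forests", DOCUMENTS)
print(ranking)
```

In a lemma-based index the tokens would be lemmata produced by morphological analysis rather than surface word forms, so that inflected variants of a query term also match.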

On top of monolingual retrieval, OLIVE supports cross-language information retrieval (CLIR), following also the approach developed within Twenty-One. For example, videos with a German sound track are made accessible via queries in any of the languages French, English, Dutch and German. For this aspect of the retrieval functionality two options are developed: off-line document translation using commercial Machine Translation (MT) software, specifically the LOGOS MT server, and on-line query translation. Which option is offered depends mainly on the resources available, e.g. translation dictionaries, for each language pair.

In order to evaluate the viability of information retrieval from automatically generated transcriptions, the retrieval precision from both machine- and human-created transcripts on a small set of audio and video documents was measured. This data, used in the TREC Spoken Document Retrieval track, contains approximately [...] hours of radio and television broadcast news. Using the LIMSI speech recogniser and the TNO information retrieval system, the results obtained on this data with the machine transcripts (average precision of [...]) are quite comparable to those obtained with the human transcripts (average precision of [...]).
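
The average precision figures referred to above summarise the quality of a ranked result list. A minimal sketch of the measure follows, assuming the usual non-interpolated definition (the mean of the precision values at the rank of each relevant document); the document identifiers are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision of one ranked result list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


# Relevant documents found at ranks 2 and 4: (1/2 + 2/4) / 2.
ap = average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"})
print(ap)
```

Averaging this value over all test queries gives a single figure per system, which makes runs on machine and human transcripts directly comparable.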

3.4 Cross-language Retrieval

On top of monolingual retrieval, the projects described also support cross-language information retrieval (CLIR), where cross-language retrieval means that information originally available in one language is retrieved as a response to a query in another language. The basic options available for CLIR are illustrated in Figure 1.


Figure 1: Options for Cross-Language Information Retrieval

The first option we will refer to as off-line document translation. To our knowledge, this method was first incorporated in a publicly accessible search engine in the Twenty-One project (see footnote 4). In this approach the documents in one language L1 are automatically translated off-line into another language L2. On this translated document base L2 standard monolingual IR techniques can be applied, i.e. an index for L2 can be created, which can then be accessed via a query in the language L2. This document base could of course also be a mixed document base, which contains original documents of L2 in addition to the translated documents.

The second approach is commonly referred to as on-line query translation. In this case, the query in the language L1 of the user is translated into a query in the language L2. This query is then matched against an index created from the original documents in L2, and the original documents are retrieved. One of the first projects to incorporate this approach was the Mulinex project [Mulinex Reference].
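
On-line query translation can be sketched with a small bilingual dictionary. The dictionary entries, the index and the function names below are hypothetical and only illustrate the principle: every candidate translation of a query term is looked up in an index built from the original-language documents.

```python
# Hypothetical bilingual dictionary from the user's language (German here)
# to the document language (English here); terms may have several candidates.
DICTIONARY = {"regen": ["rain"], "wald": ["forest", "wood"]}

# Hypothetical index over the original documents: term -> document ids.
INDEX = {"rain": {"doc1", "doc2"}, "forest": {"doc1"}, "budget": {"doc3"}}


def translate_query(query, dictionary):
    """Replace each query term by all of its candidate translations."""
    translated = []
    for term in query.lower().split():
        # Terms missing from the dictionary are passed through unchanged.
        translated.extend(dictionary.get(term, [term]))
    return translated


def search(query_terms, index):
    """Return document ids ordered by the number of matching query terms."""
    hits = {}
    for term in query_terms:
        for doc_id in index.get(term, ()):
            hits[doc_id] = hits.get(doc_id, 0) + 1
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


terms = translate_query("Regen Wald", DICTIONARY)
results = search(terms, INDEX)
print(terms, results)
```

An interactive system would typically show the candidate translations to the user instead of applying all of them silently.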

The third option for providing multilingual access builds on information extraction and the construction of language-independent representations in the form of interlingual templates or other relational structures. From these representations, either textual representations in the different languages are created through automatic multilingual text generation, or the representations have some other direct correspondences in the form of menu items. Typically such interlingual representations are queried by means of forms or other structures, which can be mapped onto the templates. This approach has been realised for one of the first times in the Mietta project, which provides a combination of class-based querying with free text search, in addition to the cross-lingual search facilities based on off-line document translation [cf. Buitelaar, Netter, Xu].

[Figure 1 elements: Query L1, Index L1, Document Base L1; Query L2, Index L2, Document Base L2; Query Translation; Document Translation; Information Extraction; Interlingual Template Index; Free Text Query; Class-Based Query / Multilingual Generation]

4 In all of the projects under discussion here which made use of document translation, i.e., Twenty-One, Mulinex, Pop-Eye, OLIVE, and Mietta, the Translation Server of the LOGOS Corporation has been employed.

All of these options are realised in one way or the other in the projects under discussion. Pop-Eye was built exclusively on off-line document translation, while OLIVE focuses more, but not exclusively, on query translation. Another EU-funded multimedia retrieval project, Mumis, which is about to be launched, will realise the third approach described. Mumis will realise a detailed disclosure of videos of soccer matches by again exploiting different sources of linguistic information, such as spoken comments transcribed by automatic speech recognition, newspaper reports on matches, or other kinds of material. All of this material is submitted to an information extraction process whose objective is to extract templates or frames from the text which describe certain actions in the game. The extracted information is then stored in a concept-like representation, which can be searched in different languages through direct mappings of concepts onto language-specific terms.

Now the question is of course which of these three options provides the best solution. Unfortunately, there is neither a theoretically clear nor an empirically fully satisfactory answer to this question. In an ideal world, where fully automatic MT works with high precision and for all language pairs, and where there are no space and time limits, the document translation approach would most likely be the ideal solution. It requires the least knowledge of the foreign language from the user, as he can both formulate the query in his own language and retrieve the document in the language of his choice, independent of the original language. In practice, the solution is not quite as ideal. Not all language pairs that are needed and required are covered by commercial MT systems. Even the high-quality systems still produce only translations which are suitable for understanding the wider content of the original. The approach requires that the documents be translated and the translations stored, if one wants to provide the translations as an option to the users. And, which may be most crucial from the retrieval point of view, the MT system determines the retrieval quality. If it mistranslates a term, and if this mistranslation is indexed, the user has practically no possibility to get back to the original term and the original meaning.

Query translation approaches, which often serve mainly as translation aids to the user, do not have this disadvantage. Typically, the user is offered a range of possible translations by the system, from which he can choose the best translation or even add a translation if he is unsatisfied with all of the options. The biggest disadvantage of the approach is of course that it requires at least some passive knowledge of the foreign language from the user.

The third approach, using information extraction and multilingual generation, is the best controlled way of providing multilingual information; however, it is probably also the most restricted one. It is suitable above all for those configurations and contexts where the information of interest is best represented in a highly structured way (as in a relational database), where the respective structures are highly repetitive and where not all of the information available is of interest, but only selected parts. Thus, video contents which are very diversified are probably the most unsuitable for this approach, while, for example, soccer games, where the basic types of actions are in essence fairly limited and also quite repetitive, can most likely be described by fairly standardised representations, which are then also easily mapped onto different languages. Whether this expectation is borne out will be one of the interesting outcomes of the Mumis project referred to above.

3.5 Inherent Limitations

Obviously, the discourse and linguistic data associated with a video will not always be a direct reflection of the images and the visual content of the video. In particular there will be a broad range of variation between more descriptive texts like documentaries, where the commentary refers to and explains the visual content, and programmes of the drama type, where the dialogue and discourse complement the visual content. Thus, the approach described will have some clear limitations, and future experience and evaluation will have to show for what type of programmes the approach is most suitable.


4 Screenshots from the OLIVE Lab Model


5 Project Information

OLIVE is funded by the European Commission under the Telematics Application Programme in the sector Language Engineering, which has now turned into the Human Language Technology action line. The project (LE-8364) started in April 1998 and will last until 2000. The results thus far comprise a detailed overview of user requirements, a detailed functional design for the demonstrator, an update of the data capture tools developed within Pop-Eye, and a so-called lab model, which offers the proof of concept for speech-based video retrieval. This lab model contains a limited amount of digitised video material from an American English news show, with a variety of speakers: anchor man, studio guests, and people calling in from outside the studio. The sound track has been transcribed by the recognition tools for American English from LIMSI, developed previously [hub4y97, icslp98]. The resulting transcripts have been indexed by the disclosure modules and translated with commercial MT software (LOGOS). Queries can be submitted in French, German and English, and the system returns the relevant phrases plus the links to the relevant fragments, which can be viewed with a Real Video plug-in.

The users in the OLIVE consortium are two television stations, ARTE (Strasbourg, France) and TROS (Hilversum, Netherlands), as well as the French national audio-video archive INA/Inathèque (Paris, France), and NOB, a large service provider for broadcasting and TV productions (Hilversum, Netherlands). Technology development and system implementation involve TNO-TPD (Delft), the project coordinator, supplying the core indexing and retrieval functionality; VDA BV (Hilversum), building the video capturing software; the University of Twente and the LT Lab of DFKI GmbH (Saarbrücken), responsible among others for the natural language technology; and LIMSI-CNRS (Orsay, France) and Vecsys SA (Les Ulis, France), developing and integrating the speech recognition modules, respectively.


More information about OLIVE, the lab model and links to other relevant projects such as Twenty-One and Pop-Eye can be found under http://twentyone.tpd.tno.nl/olive.

References

[cacm00] J.L. Gauvain, L. Lamel & G. Adda (2000), “Transcribing broadcast news for audio and video indexing.” In: Communications of the ACM, 43(2).

[hub4y97] J.L. Gauvain, G. Adda, L. Lamel & M. Adda-Decker (1997), “Transcribing Broadcast News: The LIMSI Nov96 Hub4 System.” In: Proceedings of the ARPA Speech Recognition Workshop, pp. 56-63.

[icslp98] J.L. Gauvain, L. Lamel & G. Adda (1998), “Partitioning and Transcription of Broadcast News Data.” In: Proceedings of ICSLP’98, Sydney, pp. 1335-1338.

[twlt98] F.M.G. de Jong (1998), “Twenty-One: a baseline for multilingual multimedia retrieval.” In: Proceedings of the 14th Twente Workshop on Language Technology (TWLT-14), University of Twente, pp. 189-194.

[isdn98] W.G. ter Stal, J-H. Beijert, G. de Bruin, J. van Gent, F.M.G. de Jong, W. Kraaij, K. Netter & G. Smart (1998), “Twenty-One: Cross-language disclosure and retrieval of multimedia documents on sustainable development.” In: Journal of Computer Networks and ISDN Systems, Vol. 30, Elsevier, pp. 1237-1248.

[trec99] D. Hiemstra & W. Kraaij (1999), “Twenty-One at TREC-7: Ad-hoc and Cross-language track.” In: Proceedings of the Seventh Text Retrieval Conference TREC-7, NIST Special Publications.

[mayb97] G. Jones, J. Foote, K. Sparck Jones & S. Young (1997), “The video mail retrieval project: experiences in retrieving spoken documents.” In: Mark T. Maybury (ed.), Intelligent Multimedia Information Retrieval, AAAI Press.

[sluis00] I.F. van der Sluis & F.M.G. de Jong (2000), “Enriching Textual Documents with Timecodes from Video Fragments.” In this volume.
