
The number of Bulgarian internet users increased by 31% in 2009 compared with 2007, and 46% of the total population already uses the internet. According to a study by gemiusAudience [15], published in the report “Do you CEE?” [16], Bulgaria is among the countries with the highest internet penetration.

According to data published by internetworldstats.co [17], there are about 3.5 million internet users in Bulgaria, and according to statistics published by Gemius the growth of sites observed by analysts is almost 10.7% on an annual basis. In 2010 the number of users increased by a further 5%.

In addition to the ubiquitous international web sites, the most popular web sites on the Bulgarian part of the internet are Bulgarian news portals (dir.bg, gbg.bg, news.bg, etc.). The Bulgarian Wikipedia, an important source for natural language processing, contains approximately 117 000 articles. It is considerably smaller than the biggest Wikipedias (English, German and French), but by number of articles it ranks 34th [18] among 270 Wikipedia language editions.

It is oen claimed that English dominates computers and the internet, and that those wishing to use either must first learn English. at may have been true in the early days of the technology but lack of English is no longer the barrier it once was. What began as an an-glophone phenomenon has rapidly become a multilin-gual affair. Soware has been made capable of display-ing many different kinds of script. Many corporate web-sites now employ multilingual strategies making choice of language a ‘user preference’. Machine translation of web content is only a mouse-click away.


For language technology, the growth of the internet matters in two ways. On the one hand, the large amount of digitally available language data represents a rich source for analysing the usage of natural language, in particular by collecting statistical information. On the other hand, the internet offers a wide range of application areas involving language technology.

One important aspect of equal opportunities is covered by the Law on Equal Opportunities for the Disabled, which came into force in 2002 and addresses the issue of barrier-free information technology. It requires public agencies to make sure that people with disabilities can use their websites and internet services without any restrictions. User-friendly language technology tools are a key means of meeting this requirement, for example by offering speech synthesis that reads the content of web pages aloud for blind users.

Internet users and providers of web content can also use language technology in less obvious ways, for example by automatically translating web page contents from one language into another. Despite the high cost of manually translating this content, comparatively little language technology has been developed and applied to website translation, given the supposed need. This may be due to the complexity of the Bulgarian language and to the range of different technologies involved in typical applications.

The next chapter gives an introduction to language technology and its core application areas, together with an evaluation of current language technology support for Bulgarian.

4 LANGUAGE TECHNOLOGY SUPPORT FOR BULGARIAN

Language technology is used to develop software systems designed to handle human language and is therefore often called “human language technology”. Human language comes in spoken and written forms. While speech is the oldest and, in terms of human evolution, the most natural form of language communication, complex information and most human knowledge is stored and transmitted through the written word. Speech and text technologies process or produce these different forms of language, using dictionaries, rules of grammar, and semantics. This means that language technology (LT) links language to various forms of knowledge, independently of the media (speech or text) in which it is expressed. Figure 1 illustrates the LT landscape.

When we communicate, we combine language with other modes of communication and information media: for example, speaking can involve gestures and facial expressions. Digital texts link to pictures and sounds. Movies may contain language in spoken and written form. In other words, speech and text technologies overlap and interact with other multimodal communication and multimedia technologies.

In this section, we will discuss the main application areas of language technology, i.e., language checking, web search, speech interaction, and machine translation. These applications and basic technologies include

• spelling correction

• authoring support

• computer-assisted language learning

• information retrieval

• information extraction

• text summarisation

• question answering

• speech recognition

• speech synthesis

Language technology is an established area of research with an extensive set of introductory literature. The interested reader is referred to the following references: [19, 20, 21, 22].

Before discussing the above application areas, we will briefly describe the architecture of a typical LT system.

4.1 APPLICATION ARCHITECTURES

Soware applications for language processing typically consist of several components that mirror different as-pects of language. While such applications tend to be very complex, figure2shows a highly simplified archi-tecture of a typical text processing system. e first three modules handle the structure and meaning of the text input:

1. Pre-processing: cleans the data, analyses or removes formatting, detects the input languages, and so on.

2. Grammatical analysis: finds the verb, its objects, modifiers and other sentence elements; detects the sentence structure.

3. Semantic analysis: performs disambiguation (i.e., computes the appropriate meaning of words in a given context); resolves anaphora (i.e., which pronouns refer to which nouns in the sentence); represents the meaning of the sentence in a machine-readable way.

Figure 1: Language technology in context (diagram relating Multimedia & Multimodality Technologies, Language Technologies, Speech Technologies, Text Technologies and Knowledge Technologies)

Aer analysing the text, task-specific modules can per-form other operations, such as automatic summarisation and database look-ups.

In the remainder of this section, we first introduce the core application areas for language technology, and follow this with a brief overview of the state of LT research and education today, and a description of past and present research programmes. Finally, we present an expert estimate of core LT tools and resources for Bulgarian in terms of various dimensions such as availability, maturity and quality. The general situation of LT for the Bulgarian language is summarised in figure 7 (p. 69) at the end of this chapter. This table lists all tools and resources that are boldfaced in the text. LT support for Bulgarian is also compared to other languages that are part of this series.