Thai Language Segmentation by Automatic Ranking Trie with Misspelling Correction

Dissertation

for the award of the academic degree of Doktor-Ingenieur of the Faculty of Mathematics and Computer Science (Fakultät für Mathematik und Informatik) of the FernUniversität in Hagen

by Chalermpol Tapsai, Nonthaburi, Thailand

Hagen 2019

Supervisor:

Prof. Dr.-Ing. habil. Herwig Unger


Contents

Abstract
Acknowledgements
Glossary and Transcription

1 Introduction
  1.1 Motivation
  1.2 Research objective and concepts
    1.2.1 Datasets
    1.2.2 Research steps
  1.3 Contribution
  1.4 Structure of the Thesis

2 Fundamental knowledge
  2.1 Natural Language Processing
  2.2 Fundamental knowledge of Thai language principles
    2.2.1 Thai alphabets
    2.2.2 Thai word
    2.2.3 Transforming of verbs into nouns
    2.2.4 Transforming of adjectives into nouns
    2.2.5 Transforming of adjectives into adverbs
    2.2.6 Numeral and quantity representation
    2.2.7 Quantifying Noun
  2.3 Thai sentences

3 Thai word Segmentation
  3.1 Syllable segmentation
  3.2 Word segmentation
  3.3 Trie vs Tree
    3.3.1 Trie creation algorithm
    3.3.2 Trie parsing algorithm
  3.4 Word segmentation based on surrounding contexts
  3.5 Comparison of Thai Segmentation algorithms
  3.6 Problems in Thai word segmentation

4 New techniques and segmentation algorithm
  4.1 Ranking Trie
    4.1.1 Ranking Trie creation algorithm
  4.2 Word Usage Frequency analysis
    4.2.1 Text Corpus
    4.2.2 The results of Word Usage Frequency analysis
    4.2.3 Character statistics
    4.2.4 Consonants and Vowels
    4.2.5 Word length
  4.3 Word segmentation with Automatic Ranking Trie
    4.3.1 Model evaluation
  4.4 Solving problems of misspelling and various patterns of spelling
    4.4.1 Soundex
    4.4.2 Traditional Soundex code
    4.4.3 Completed Soundex
    4.4.4 Complete Soundex encoding
    4.4.5 Completed Soundex encoding process
    4.4.6 Completed Soundex similarity values
    4.4.7 Evaluation of Completed Soundex
  4.5 Solving of compound words problem
  4.6 Word Segmentation with Misspelling Correction
    4.6.1 The Experiment for performance evaluation
    4.6.2 The results on word segmentation performance testing
    4.6.3 The results on misspelling words correction performance testing

5 Application on word segmentation
  5.1 Conceptual framework
  5.2 Semantic analysis
  5.3 Output transformation
  5.4 Information Retrieval Model
  5.5 Lexical Analysis
  5.6 Syntactic Analysis and Semantic Analysis
  5.7 SQL Transformation
  5.8 SQL Processing
  5.9 Functional testing and model improvement
  5.10 Performance evaluation of the model
  5.11 Conclusion

6 Conclusion and Future Work
  6.1 Contribution and Review of Results
  6.2 Conclusion and Future Work

References

Appendix

Abstract

The objective of this research is to present a high-performance word segmentation algorithm named "Thai Language Segmentation by Automatic Ranking Trie with Misspelling Correction (TLS-ART-MC)," which contributes to advanced Natural Language Processing for practical use. Two new techniques, "Automatic Ranking Trie (ART)" and "Completed Soundex," are proposed to improve segmentation efficiency and to solve the crucial problems of Thai word segmentation that occur with previous algorithms.

Automatic Ranking Trie is a new algorithm that reorganizes the structure of the traditional Trie to reduce the number of vocabulary entries and comparisons used in the segmentation process. Using the actual Word Usage Frequency (WUF) analyzed from a text corpus covering 14 types of content, words with higher frequency are placed at the beginning of the Trie, so they can be found and segmented more quickly. The results of each segmentation are also used to update the word frequencies. Hence, the structure of the Trie automatically adapts to the actual usage of each user.

Completed Soundex is another new technique, a sound-based spelling code system, applied to overcome the problems of misspelled words and of multiple spelling forms of proper names and foreign vocabulary. With a new code structure and new encoding rules, Completed Soundex encodes all components of a word to represent its pronunciation more precisely, which resolves the errors that occur in misspelling correction with Traditional Soundex.

For the compound word problem, the segmentation process of TLS-ART-MC is divided into two steps. In the first step, a text message is segmented into base words by parsing with the Automatic Ranking Trie. In the second step, all base words are analyzed and combined into compound words based on rules derived from Thai grammar. The performance was evaluated in comparison with state-of-the-art algorithms. For the first time, the TLS-ART-MC algorithm is able to handle the problems of compound words, misspellings, and multiple spelling forms of proper names and foreign words with a high level of accuracy and efficiency. Its accuracy, precision, and recall are at the same level as those of comparable state-of-the-art algorithms.

Summary (Zusammenfassung)

The aim of this research is to present a high-performance word segmentation algorithm named "Thai Language Segmentation by Automatic Ranking Trie with Misspelling Correction (TLS-ART-MC)," which will contribute to advanced natural language processing in practical use. The new techniques "Automatic Ranking Trie (ART)" and "Completed Soundex" are proposed to improve segmentation efficiency and to solve the crucial problems of Thai word segmentation that occur with previous algorithms.

Automatic Ranking Trie is a new algorithm that reorganizes the structure of the traditional Trie in order to reduce the number of vocabulary entries and comparison operations used in the segmentation process. Using the actual Word Usage Frequency (WUF) analyzed from a text corpus covering 14 types of content, more frequent words are placed at the beginning of the Trie, where they can be found and segmented more quickly. The results of each segmentation are also used to update the word frequencies. The structure of the Trie therefore adapts automatically to the actual usage of each user.

Completed Soundex is a further new technique, related to a coding system for spelling and sound, that is applied to overcome the problem of misspelled words and of multiple spelling forms of particular names and foreign vocabulary. With a new code structure and new encoding rules, Completed Soundex encodes all components of a word in order to represent its pronunciation more precisely, which removes the errors that occurred in misspelling correction with Traditional Soundex.

For the compound word problem, the segmentation process of TLS-ART-MC is divided into two steps. In the first step, a text message is divided into base words by parsing with the Automatic Ranking Trie. In the second step, all base words are analyzed and combined into compound words on the basis of rules derived from Thai grammar. The performance was evaluated in comparison with the most recent algorithms. For the first time, the TLS-ART-MC algorithm is able to resolve the problem of compound words as well as misspellings and multiple spellings of particular names and foreign words with a high degree of accuracy and efficiency. Its accuracy, precision, and recall values correspond to those of the most recent comparable algorithms.

Acknowledgements

This thesis was accomplished with the support of many people, and I would like to express my gratitude for their kindness. First of all, I thank my parents, who inspired me to commit to education, to develop my knowledge, and to overcome all obstacles.

I am thankful to the FernUniversität in Hagen and to King Mongkut's University of Technology North Bangkok for the opportunity to be a student in the binational Ph.D. program, and for all their support from the beginning to its successful completion.

I am profoundly grateful to my supervisors, Prof. Dr. Dr.-Ing. habil. Herwig Unger and Prof. Dr. Phayung Meesad, for their advice and encouragement on the research and on the writing of this thesis.

Last but not least, I would like to thank Dr.-Ing. habil. Mario Kubek and the staff of Kommunikationsnetze at the FernUniversität in Hagen, Barbara Kleine and Erik Deussen, for facilitating and supporting my travel for research and presentations in Germany and Spain.

Glossary and Transcription

1. Acronyms

NLP: Natural Language Processing

The processing of human-language sentences or texts to analyze their meaning, produce results, or provide an interface between computers and humans, which allows general users to operate computers in their own language.

WUF: Word Usage Frequency

The count of how often each word is used in daily life.

RT: Ranking Trie

A new type of Trie structure that applies Word Usage Frequency as the criterion for arranging higher-frequency words closer to the beginning of the Trie structure, in order to increase the efficiency of word segmentation.

TLS-ART: Thai Language Segmentation with Automatic Ranking Trie

A word segmentation algorithm that uses the Ranking Trie technique to increase word segmentation efficiency.
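The Ranking Trie entry above can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each node keeps its children in a list sorted by the highest word frequency found below them, and that lookup scans this list linearly. The names `Node`, `insert`, and `comparisons_to_find`, as well as the toy frequencies, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    char: str = ""
    freq: int = 0                                  # highest word frequency below this node
    is_word: bool = False
    children: list = field(default_factory=list)   # child nodes, most frequent first


def insert(root: Node, word: str, freq: int) -> None:
    """Insert a word and keep every sibling list ordered by descending frequency."""
    node = root
    for ch in word:
        child = next((c for c in node.children if c.char == ch), None)
        if child is None:
            child = Node(ch)
            node.children.append(child)
        child.freq = max(child.freq, freq)
        node.children.sort(key=lambda c: -c.freq)
        node = child
    node.is_word = True


def comparisons_to_find(root: Node, word: str) -> int:
    """Count character comparisons needed to reach the word, or -1 if it is absent."""
    node, count = root, 0
    for ch in word:
        for child in node.children:
            count += 1
            if child.char == ch:
                node = child
                break
        else:
            return -1
    return count if node.is_word else -1


root = Node()
for w, f in [("กา", 5), ("นก", 20), ("นอน", 50)]:   # assumed toy frequencies
    insert(root, w, f)
print(comparisons_to_find(root, "นอน"))
```

With these toy frequencies, the most frequent word นอน is matched by the first child at every level, which is the effect the ranking is meant to achieve.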

2. Symbols, Annotations, and Describing Patterns

In order to help the reader understand the Thai language text more clearly, the following symbols, annotations, and describing patterns are used in all parts of this book:

1. The pronunciation of Thai characters, words, and sentences is written within brackets [ ] using IPA symbols, with syllables separated by the symbol -.

2. The meaning of Thai words and sentences in English is written within parentheses ( ).

3. In some cases, the author uses " " to specify the scope of characters, words, and sentences to prevent undesired confusion.

The following examples show the pronunciation and meaning descriptions of some Thai characters, words, and sentences:

ก [k]

The character ก has the IPA phonetic annotation [k].

ะ [à], ◌ะ [à]

The vowel ะ, also shown as ◌ะ, has the IPA phonetic annotation [à].

กะ [kà] (estimate)

The word กะ has the IPA phonetic annotation [kà] and means "estimate" in English.

สะพาน [sà-pan] (bridge)

The word สะพาน has the IPA phonetic annotation [sà-pan] and means "bridge" in English.

นกกินหนอนตัวใหญ่
→ นก กิน หนอน ตัว ใหญ่
→ [nók̚] [kin] [nɔ̌n] [tua] [jài]
→ (bird) (eat) (worm) (body) (big)
Birds eat big worms

"นกกินหนอนตัวใหญ่" is a sentence consisting of the words นก [nók̚] (bird), กิน [kin] (eat), หนอน [nɔ̌n] (worm), ตัว [tua] (body), and ใหญ่ [jài] (big), which means "Birds eat big worms" in English.

Note: The Thai language has no articles and does not add an s to mark singular or plural nouns; therefore, "นกกินหนอนตัวใหญ่" may correspond to any of these sentences:

• Birds eat big worms.

• Birds eat a big worm.

• A bird eats big worms.

• A bird eats a big worm.

4. Since Thai is written continuously without spaces between words, readers may find it difficult to follow. Therefore, in some cases, the author uses the symbol - as a separator so that readers can quickly notice the boundaries of each word.

Chapter 1

Introduction

1.1 Motivation

Natural Language Processing (NLP) is the processing of natural language, that is, the human language used in daily life, to analyze its meaning and produce results that accurately respond to user requirements in various ways. Many NLP studies have been conducted since the 1950s with the aim of creating interface systems that let humans communicate with and command computers in familiar natural languages, so that general users can use computers easily without having to learn any computer language. For these reasons, NLP has gained widespread interest, and many studies have applied it to various purposes, for example, language translation [1], summarization of information from text documents [2], [3], [4], human-computer interaction [5], and data retrieval from databases [6], [7]. Currently, NLP research has been conducted in many languages, such as English, Chinese, Japanese, Arabic, and Thai.

For the Thai language, NLP research began in 1981 and focused on syllable and word segmentation, the most important process, which strongly affects the precision of the subsequent processes, including Semantic Analysis and the creation of results. The first phase of these studies dealt with syllable segmentation [8], [9], [10] in order to serve word processing applications, which were popular programs at that time. Many more studies on word segmentation followed, using various techniques that fall into three types: Rule-based Word Segmentation (RBWS) [11], Dictionary-based Word Segmentation (DBWS) [12], [13], and Learning-based Word Segmentation (LBWS) [14], [15]. However, despite this extensive research and development, the results still contain many errors and leave significant problems unsolved, which can be summarized in four topics:

1. Problem with the efficiency of word segmentation
2. Problem with misspelled words
3. Problems with named entities and words with multiple spelling patterns
4. Problems with compound words

These problems are a significant cause of incorrect results and produce many non-word fragments in all Thai word segmentation algorithms. As a result, other studies related to Thai NLP could not progress with good-quality results, and only a few further studies on Thai NLP have been conducted since. For this reason, the researcher intends to develop new techniques that make Thai word segmentation more accurate and efficient.

1.2 Research objective and concepts

The objective of this research is to develop a new high-performance word segmentation algorithm that provides high-precision results for practical use. Focusing on the four problems discussed above, this research implements two new techniques, namely Ranking Trie (pronounced [ráŋ kìŋ trai]) and Completed Soundex, to address the problems of word segmentation efficiency, misspelled words, and multiple spelling patterns. For the compound word problem, a two-pass segmentation process is applied with predefined rules, without storing all compound words in the dictionary.

Moreover, an analysis of Thai word usage that is consistent with real-life usage, covering most types of content and all levels of language, is performed for more accurate and reliable results. The concepts of this research are shown as a diagram in Figure 1.1.

Fig. 1.1: Research concepts

1.2.1 Datasets

The datasets used in this research are 6,658 text files collected from various online and offline sources, consistent with real-life usage and covering 14 types of content, including:


• Agriculture

• Economics and business

• Society, Politics

• Entertainment

• Education

• Health

• Fashion and beauty

• Sport

• Technology

• Religion

• Royal news

• Nature

• Others

All text files are divided into two sets: the Development Dataset and the Test Dataset. The Development Dataset consists of 5,958 text files divided into two subsets: 4,558 randomly selected files are used as a Text Corpus for the analysis of Word Usage Frequency (WUF), which is essential data for word segmentation processing, and the remaining 1,400 text files are used for functional testing to improve the model. The Test Dataset consists of 700 text files used as input for model evaluation.
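The split sizes described above can be checked with a small Python sketch. It assumes that all three subsets are drawn by one seeded random shuffle; the text only states that the 4,558 corpus files are selected randomly, so the rest of the procedure, the function name `split_files`, and the file names are hypothetical.

```python
import random


def split_files(files, corpus_size=4558, functional_size=1400, test_size=700, seed=0):
    """Split the file list into (text corpus, functional-test set, test set)."""
    if len(files) != corpus_size + functional_size + test_size:
        raise ValueError("subset sizes must add up to the number of files")
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle (assumed)
    test = shuffled[:test_size]
    development = shuffled[test_size:]         # 5,958 development files
    corpus = development[:corpus_size]         # 4,558 files for WUF analysis
    functional = development[corpus_size:]     # 1,400 files for functional testing
    return corpus, functional, test


files = [f"doc_{i:04d}.txt" for i in range(6658)]   # placeholder file names
corpus, functional, test = split_files(files)
print(len(corpus), len(functional), len(test))
```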

1.2.2 Research steps

The process of this research is divided into three steps:

1. Dictionary Creation
2. Model Development
3. Model Evaluation

In the first step, Dictionary Creation, all vocabulary from the Lexitron dictionary [28], the most popular dictionary for Thai word segmentation, was used as input to generate the Soundex code and to specify the type of each word according to the Royal Institute Dictionary definition. The output of this step is stored in a database used as the initial dictionary for the next step.


Step 2, Model Development, consists of three processes: Word Usage Frequency Analysis, Model Creation, and Model Improvement. In the Word Usage Frequency Analysis, the Text Corpus (4,558 text files from the Development Dataset) is segmented into words, which are counted to provide the WUF used to improve the dictionary. In Model Creation, all functional modules are created, including Word Segmentation, Unknown-word Scoping, Completed Soundex Analysis, User Interfacing, and Dictionary Update. Then, in Model Improvement, 1,400 text files are used as input for functional testing to make the model more accurate and efficient.

Step 3, the last step, tests the performance of the model: the Test Dataset of 700 text files is segmented into words, and the results are compared with those of segmentation by LexTo.
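The Word Usage Frequency Analysis in Step 2 amounts to counting words in already segmented text. The sketch below is a minimal illustration under that assumption; the function names `word_usage_frequency` and `rank_dictionary` and the two toy documents are hypothetical, and the actual analysis procedure is the one described in Chapter 4.

```python
from collections import Counter


def word_usage_frequency(segmented_docs):
    """Count how often each word occurs across documents given as lists of words."""
    counts = Counter()
    for words in segmented_docs:
        counts.update(words)
    return counts


def rank_dictionary(dictionary, wuf):
    """Order dictionary words by descending frequency, ties broken alphabetically."""
    return sorted(dictionary, key=lambda w: (-wuf.get(w, 0), w))


docs = [["นก", "กิน", "หนอน"], ["นก", "นอน"]]      # toy segmented documents
wuf = word_usage_frequency(docs)
print(rank_dictionary(["กิน", "นก", "นอน", "สะพาน"], wuf))
```

Words that never occur in the corpus, such as สะพาน in this toy run, receive a frequency of 0 and therefore sort last.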

1.3 Contribution

Many Thai word segmentation algorithms have been researched and developed since the 1980s based on three major techniques: rule-based, dictionary-based, and learning-based. So far, four crucial problems, namely the efficiency of the segmentation process, misspelled words, multiple spelling patterns, and the enormous number of compound nouns, remain the main causes of errors and unexpected results. Consequently, the major goal of this thesis is to develop a high-performance word segmentation algorithm focused on solving these four problems. Two new techniques, Ranking Trie and Completed Soundex, are invented and implemented for this purpose. A comparison with the state-of-the-art algorithm showed that the new algorithm can solve the problems of misspelling, multiple spelling patterns, and compound words, which no previous algorithm could do.

Moreover, the dictionary used by the new algorithm is significantly smaller, and its parsing process is faster than that of the state of the art. At the end of this thesis, follow-up research confirms the empirical contribution of the segmentation algorithm by means of a model for retrieving and processing data from a database using natural language. The results confirm that high-precision word segmentation allows Semantic Analysis and Output Transformation to be carried out accurately and efficiently, as detailed in Chapter 5.

1.4 Structure of the Thesis

The rest of this thesis is divided into five chapters. Chapter 2 presents Natural Language Processing and fundamental knowledge of Thai language principles, providing the essential background for the following chapters. Chapter 3 presents related principles and theories, as well as earlier studies on Thai word segmentation. At the end of that chapter, the researcher outlines the problems that must be solved to make the segmentation algorithm more efficient and accurate. Chapter 4 presents the new techniques that the researcher has invented and developed to create a highly efficient word segmentation algorithm that solves the main problems of previous studies. Details of each technique, as well as the working process of the algorithm, are presented one by one together with the experiments and evaluation results. Chapter 5 covers the contribution of the research results to higher-level NLP, confirming that effective, highly accurate word segmentation makes other NLP processes, such as Semantic Analysis and Output Transformation, accurate and effective as well. Chapter 6, the last chapter, presents the conclusion and a discussion of future work.

Chapter 2

Fundamental knowledge

This chapter presents basic knowledge related to the research. It starts with the process of Natural Language Processing. Then, fundamental knowledge of Thai language principles, including the different types of letters, pronunciation, words, and sentence construction, is presented with examples to make the following chapters easier to understand.

2.1 Natural Language Processing

The work process in NLP is divided into four steps [6], as shown in Figure 2.1.

Fig. 2.1: Natural Language Processing steps.

1. Lexical Analysis

This step analyzes a natural language sentence and splits it into small items called “tokens”, identifying the type of each token and other essential information used by the next step. This process is the most important part of Natural Language Processing, especially in unsegmented languages such as Thai, Lao, Japanese, and Chinese. Since all words in these languages are written continuously without spaces or delimiters, it is a nontrivial task to identify precise word boundaries without errors in all usage cases. Word segmentation is presented in more detail in Chapter 3.

2. Syntactic Analysis

In this step, all tokens are parsed against a predefined sentence structure (syntax) to check validity and to provide information used in the meaning analysis process.

3. Semantic Analysis

Semantic Analysis interprets the meaning of a sentence by parsing the information derived from the previous step against a semantic structure, such as an ontology or a semantic web structure, to provide data that represents the meaning of the sentence. It is an important process for making computers understand natural language. Many studies on semantic analysis have been presented using various techniques, such as semantic pattern matching [16], semantic pattern mining [17], rule bases (fixed rules and algorithms) [18], ontologies [19], and the Ontology-based Semantic Web Service Architecture (SWSA) [20]. However, despite plenty of studies on semantic analysis and applications of NLP, these studies only cover natural language at a general usage level, without the processing-type sentences that are commonly used in daily life.

4. Output Transformation Process

This step transforms the output of Semantic Analysis into results that meet the objectives of the target task, such as SQL commands for information retrieval from a database.
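To give a rough idea of what the Lexical Analysis step has to do for an unsegmented language, the sketch below splits the example sentence from the glossary by greedy longest matching against a tiny hand-made dictionary. This is a generic dictionary-based technique chosen for brevity, not the algorithm developed in this thesis; the function name `longest_match_segment` and the dictionary contents are assumptions.

```python
def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
        else:
            tokens.append(text[i])   # no dictionary word starts here
            i += 1
    return tokens


dictionary = {"นก", "กิน", "หนอน", "นอน", "ตัว", "ใหญ่"}   # toy dictionary
print(longest_match_segment("นกกินหนอนตัวใหญ่", dictionary))
```

For this input the sketch returns the five words นก, กิน, หนอน, ตัว, and ใหญ่. Greedy matching of this kind fails on ambiguous or misspelled input, which is part of the motivation for the techniques discussed in the later chapters.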

2.2 Fundamental knowledge of Thai language principles

This section presents basic information about Thai language principles. It begins with the smallest components, the different types of Thai letters. The next topics are the structure of a word, how words are formed, and the types of words. Lastly, sentence structure is presented. To explain Thai words, the IPA phonetic notation [21] and the English meaning are added to help the reader. The IPA phonetic notation is written within brackets [ ], while the English meaning is written within parentheses ( ). When a word has more than one meaning, a slash (/) is used as a separator. Examples of Thai word descriptions are shown below:

• นอน [nɔːn] (Sleep)

• สะพาน [sà pan] (bridge)

2.2.1 Thai alphabets

Words are the smallest units of a language that have meaning. Each word in the Thai language consists of one or more syllables, which are formed from three types of letters: consonants, vowels, and tonal marks. Thai consonants function as initial consonants and final consonants; each can be a single letter or a cluster of letters, combined with a vowel and possibly a tonal mark to form a syllable, as detailed in the next section. Some examples of Thai words are shown in Table 2.1.

Table 2.1: Examples of Thai words.

Word | First syllable: Initial cons. | Final cons. | Vowel | Tonal mark | Second syllable: Initial cons. | Final cons. | Vowel | Tonal mark
กา [kaː] (crow) | ก [k] | - | า [aː] | - | - | - | - | -
สั้น [sân] (short) | ส [s] | น [n] | ◌ั [a] | ◌้ [tʰoː] | - | - | - | -
สะพาน [sà pan] (bridge) | ส [s] | - | ะ [a] | - | พ [p] | น [n] | า [aː] | -
เครื่องจักร [kʰrɯ̂aŋ tɕàk̚] (machine) | คร [kʰr] | ง [ŋ] | เ◌ือ [ɯa] | ◌่ [ʔàːk̚] | จ [tɕ] | กร [kr] | ◌ั [a] | -
โคลน [kʰloːn] (mud) | คล [kʰl] | น [n] | โ [o] | - | - | - | - | -
ทั้งคู่ [tʰáŋ kʰûː] (both) | ท [tʰ] | ง [ŋ] | ◌ั [a] | ◌้ [tʰoː] | ค [kʰ] | - | ◌ู [uː] | ◌่ [ʔàːk̚]

Thai words are written from left to right, and characters can be placed on up to four levels at each position, as shown in Figure 2.2. The base level (Level 1) holds the consonants and some vowels, the lower level (Level 0) holds some vowels, and the two upper levels (Levels 2 and 3) hold vowels and tonal marks. Each type of Thai letter is detailed below.

Fig. 2.2: Levels of written Thai words.
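The four writing levels can be approximated from Unicode code points, as in the sketch below. The level assignment is an assumption made for illustration: below-base signs go to Level 0, the first above-base sign on a base character goes to Level 2, and a second above-base sign goes to Level 3. The function name `writing_levels` and the two character sets are hypothetical and may not match Figure 2.2 in every detail.

```python
# Below-base signs: sara u, sara uu, phinthu.
BELOW = set("\u0e38\u0e39\u0e3a")
# Above-base signs: mai han-akat, sara i/ii/ue/uee, maitaikhu,
# the four tone marks, thanthakhat, and nikhahit.
ABOVE = set("\u0e31\u0e34\u0e35\u0e36\u0e37\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c\u0e4d")


def writing_levels(word):
    """Return (character, level) pairs using the assumed level scheme."""
    levels = []
    upper_used = 0                    # above-base signs already stacked on the current base
    for ch in word:
        if ch in BELOW:
            levels.append((ch, 0))
        elif ch in ABOVE:
            upper_used += 1
            levels.append((ch, min(1 + upper_used, 3)))
        else:
            upper_used = 0            # a new base character starts a new column
            levels.append((ch, 1))
    return levels


print(writing_levels("สั้น"))
```

In สั้น, the vowel ◌ั sits on Level 2 and the tonal mark ◌้ above it on Level 3, while both consonants stay on Level 1.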

Consonant

The Thai language has 44 consonants, which can be divided into six groups based on the organ in the mouth that produces the sound:

1. กัณฐชะ [kan tʰa tɕʰá]: the consonants pronounced from the base of the throat, which are:

ก [k] ข [kʰ] ฃ [kʰ] ค [kʰ] ฅ [kʰ] ฆ [kʰ], and ง [ŋ].

2. ตาลุชะ [taː lu tɕʰá]: the consonants pronounced from the base of the palate, which are:

จ [tɕ] ฉ [tɕʰ] ช [tɕʰ] ซ [s] ฌ [tɕʰ], and ญ [j].

3. มุทธชะ [mut̚ tʰa tɕʰá]: the consonants pronounced from the base of the gum with the tongue, which are:

ฎ [d] ฏ [t] ฐ [tʰ] ฑ [tʰ] ฒ [tʰ], and ณ [n].

4. ทันตะชะ [tʰan ta tɕʰá]: the consonants pronounced from the base of the teeth with the tongue, which are:

ด [d] ต [t] ถ [tʰ] ท [tʰ] ธ [tʰ], and น [n].

5. โอฐชะ [ʔoːt̚ tʰa tɕʰá]: the consonants pronounced from the base of the lips, which are:

บ [b] ป [p] ผ [pʰ] ฝ [f] พ [pʰ] ฟ [f] ภ [pʰ], and ม [m].

6. อวรรค [ʔà wak̚]: the consonants pronounced from the base of other organs, which are:

ย [j] ร [r] ล [l] ว [w] ศ [s] ษ [s] ส [s] ห [h] ฬ [l] อ [ʔ], and ฮ [h].

Currently, two consonants, ฃ and ฅ, are no longer in use. Therefore, the remaining 42 consonants are used as initial consonants. According to the level of pronunciation, the initial consonants are divided into three types: low-tone, medium-tone, and high-tone, as follows (the lists below still include ฃ and ฅ and therefore total 44):

• The Low-tone initial consonants consist of 24 consonants including:

ค [kʰ] ฅ [kʰ] ง [ŋ] ฆ [kʰ] ช [tɕʰ] ซ [s] ฌ [tɕʰ] ญ [j] ฑ [tʰ] ฒ [tʰ] ณ [n] ท [tʰ] น [n] ธ [tʰ] พ [pʰ] ฟ [f] ม [m] ย [j] ร [r] ล [l] ว [w] ภ [pʰ] ฬ [l] ฮ [h].

• The Medium-tone initial consonants consist of 9 consonants including:

ก [k] จ [tɕ] ด [d] ต [t] ฎ [d] ฏ [t] บ [b] ป [p] อ [ʔ].

• The High-tone initial consonants consist of 11 consonants including:

ข [kʰ] ฃ [kʰ] ฉ [tɕʰ] ฐ [tʰ] ถ [tʰ] ผ [pʰ] ฝ [f] ส [s] ษ [s] ศ [s] ห [h].
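The three tone classes listed above translate directly into a lookup table. The sketch below copies the three lists as given in the text, including the two obsolete consonants; the names `TONE_CLASS` and `tone_class` are hypothetical.

```python
LOW = "ค ฅ ง ฆ ช ซ ฌ ญ ฑ ฒ ณ ท น ธ พ ฟ ม ย ร ล ว ภ ฬ ฮ".split()
MEDIUM = "ก จ ด ต ฎ ฏ บ ป อ".split()
HIGH = "ข ฃ ฉ ฐ ถ ผ ฝ ส ษ ศ ห".split()

# Map every initial consonant to the name of its tone class.
TONE_CLASS = {}
for name, group in (("low", LOW), ("medium", MEDIUM), ("high", HIGH)):
    for consonant in group:
        TONE_CLASS[consonant] = name


def tone_class(consonant):
    """Return 'low', 'medium', or 'high', or None for a non-consonant."""
    return TONE_CLASS.get(consonant)


print(tone_class("ก"), tone_class("ข"), tone_class("ค"))
```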

For final consonants, only 35 Thai consonants are used and divided into eight categories according to its pronunciation as follow:

1. The final consonants, which pronounced as [k̚], are ก ข ค, and ฆ.

2. The final consonants, which pronounced as [t̚], are ด จ ช ซ ฎ ฏ ฐ ฑ ฒ ต ถ ท ธ ศ ส, and ษ.

3. The final consonants, which pronounced as [p̚], are บ ป พ ฟ , and ภ.

4. The final consonant, which pronounced as [ŋ], is ง.

(19)

5. The final consonants, which pronounced as [n], are น ญ ณ ร ล , and ฬ.

6. The final consonant, which pronounced as [m], is ม.

7. The final consonant, which pronounced as [j], is ย.

8. The final consonant, which pronounced as [w], is ว.

The nine consonants that cannot be used as a final consonant are ฃ ฅ ฉ ฌ ผ ฝ ห อ, and ฮ.
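The eight categories of final consonants can likewise be expressed as a mapping from each consonant to its final sound. This is only a sketch: the names `FINAL_GROUPS`, `FINAL_SOUND`, `NO_FINAL`, and `final_sound` are hypothetical, and the unreleased stops [k̚], [t̚], [p̚] are written as plain `k`, `t`, `p` to keep the keys simple.

```python
# Final-consonant categories, copied from the eight groups above.
# Keys are the final sounds; unreleased stops are written without the diacritic.
FINAL_GROUPS = {
    "k": "กขคฆ",
    "t": "ดจชซฎฏฐฑฒตถทธศสษ",
    "p": "บปพฟภ",
    "ŋ": "ง",
    "n": "นญณรลฬ",
    "m": "ม",
    "j": "ย",
    "w": "ว",
}

# Invert the groups: consonant -> final sound.
FINAL_SOUND = {c: sound for sound, group in FINAL_GROUPS.items() for c in group}

# The nine consonants that cannot be used as a final consonant.
NO_FINAL = set("ฃฅฉฌผฝหอฮ")


def final_sound(consonant):
    """Return the final sound of a consonant, or None if it cannot be final."""
    return FINAL_SOUND.get(consonant)
```

For example, `final_sound("ร")` gives `"n"`, which matches the word พร [pʰɔːn] used later in this chapter.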

As mentioned above, the pronunciation of a consonant may differ depending on whether it is used as an initial consonant or as a final consonant, as shown in Table 2.2.

Table 2.2: Thai consonants and their pronunciation when used as initial and final consonants.

Consonant  Initial consonant  Final consonant  Consonant  Initial consonant  Final consonant

ก [k] [k̚] ท [tʰ] [t̚]

ข [kʰ] [k̚] ธ [tʰ] [t̚]

ฃ [kʰ] N/A น [n] [n]

ค [kʰ] [k̚] บ [b] [p̚]

ฅ [kʰ] N/A ป [p] [p̚]

ฆ [kʰ] [k̚] ผ [pʰ] N/A

ง [ŋ] [ŋ] ฝ [f] N/A

จ [tɕ] [t̚] พ [pʰ] [p̚]

ฉ [tɕʰ] N/A ฟ [f] [p̚]

ช [tɕʰ] [t̚] ภ [pʰ] [p̚]

ซ [s] [t̚] ม [m] [m]

ฌ [tɕʰ] N/A ย [j] [j]

ญ [j] [n] ร [r] [n]

ฎ [d] [t̚] ล [l] [n]

ฏ [t] [t̚] ว [w] [w]

ฐ [tʰ] [t̚] ศ [s] [t̚]

ฑ [tʰ] [t̚] ษ [s] [t̚]

ฒ [tʰ] [t̚] ส [s] [t̚]

ณ [n] [n] ห [h] N/A

ด [d] [t̚] ฬ [l] [n]

ต [t] [t̚] อ [ʔ] N/A

ถ [tʰ] [t̚] ฮ [h] N/A


Thai Vowel

Twenty-one characters are used alone or in combination to form the 32 Thai vowels, which are divided into two types: Short-sound vowels and Long-sound vowels. Each vowel and its pronunciation are shown in Table 2.3.

Table 2.3: Thai vowels

Short-sound vowel  Phonetic annotation  Long-sound vowel  Phonetic annotation

ะ, ◌ั◌ [à] า [aː]

◌ิ [ì] ◌ี [iː]

◌ึ [ɯ̀] ◌ื◌ [ɯː]

◌ุ [ù] ◌ู [uː]

เ◌ะ, เ◌็◌ [è] เ [eː]

แ◌ะ, แ◌็◌ [ɛ̀] แ [ɛː]

โ◌ะ [ò] โ [oː]

เ◌าะ, ◌็อ◌ [ɔ̀] ◌อ, ◌็ [ɔː]

เ◌อะ [ɤ̀ʔ] เ◌อ [ɤː]

เ◌◌ียะ [iàʔ] เ◌◌ีย [ia]

เ◌◌ือะ [ɯàʔ] เ◌◌ือ [ɯa]

◌ัวะ [uàʔ] ◌ัว [ua]

ฤ [rɯ́] ฤๅ [rɯː]

ฦ [lɯ́] ฦๅ [lɯː]

◌ํา [am] ใ◌ [ai]

ไ◌ [ai] เ◌า [au]

How to use Thai vowels

Usually, vowels cannot be written without an initial consonant. Only four vowels, ฤ [rɯ́], ฤๅ [rɯː], ฦ [lɯ́], and ฦๅ [lɯː], which are adapted from the Sanskrit language, are exceptions that can be written alone without being combined with an initial consonant [22]. The use of vowels in Thai can be summarized in five characteristics, as follows:

1. Regular Forms: most vowels are written in their regular forms, as in the examples shown in Table 2.4.

Table 2.4: Examples of Thai vowels in the Regular Forms usage.

Word  Phonetic annotation  Meaning in English

มา [maː] Come

กิน [kin] Eat

ดู [duː] Look

ปี [piː] Year

อาหาร [ʔaː hǎːn] Food

2. Reduction: the vowel is written only in part, or not written at all, but is still pronounced. Some examples are shown as follows:

Initial consonant  Vowel  Final consonant  Vowel-form Reduction  Word  Meaning in English

ล [l] + โ◌ะ [ò] + ง [ŋ] → โ◌ะ → ลง [loŋ] Down

พ [pʰ] + ◌อ [ɔː] + ร [n] → ◌อ → พร [pʰɔːn] Blessing

ล [l] + เ◌อ [ɤː] + ย [j] → ◌อ → เลย [lɤːj] Pass

ส [s] + ◌ัว [ua] + น [n] → ◌ั → สวน [suǎn] Garden

3. Changing: the written form of the vowel is changed, for example:

Initial consonant  Vowel  Final consonant  Vowel-form Changing  Word  Meaning in English

ก [k] + ◌ะ [à] + ด [t̚] → ◌ะ→◌ั [à] → กัด [kàt̚] Bite

ล [l] + เ◌ะ [è] + ง [ŋ] → เ◌ะ→เ◌็◌ [è] → เล็ง [leŋ] Point to a target

ป [p] + เ◌อ [ɤː] + ด [t̚] → เ◌อ→เ◌ิ [ɤː] → เปิด [pɤ̀ːt̚] Open

ก [k] + เ◌าะ [ɔ̀] + N.A. → เ◌าะ→◌็ [ɔ̀] → [ɔ̂ː] → ก็ [kɔ̂ː] Then; So

ก [k] + เ◌าะ [ɔ̀] + ก [k] → เ◌าะ→◌็อ [ɔ̀] → [ɔ́] → ก็อก [kɔ́ːk̚] Tap

4. Cutting: the initial consonant อ [ʔ], pronounced as [ʔà], is cut from the beginning of some words that come from Pali and Sanskrit, for example:

Original word  Vowel Cutting  New Word  Meaning in English

อดิเรก [ʔà dì ràk̚] → อ [ʔà] → ดิเรก [dì ràk̚] Prosper

อภิปราย [ʔà pʰî plaːj] → อ [ʔà] → ภิปราย [pʰî plaːj] Debate

5. Adding: an additional vowel form is written. Specifically, อ is added when the vowel ◌ื is used without any final consonant, for example:

Original word  Vowel Adding  New Word  Meaning in English

คื [kʰɯː] → อ → คือ [kʰɯː] Is

ลื [lɯː] → อ → ลือ [lɯː] Spread

มื [mɯː] → อ → มือ [mɯː] Hand

ถื [tʰɯ̌ː] → อ → ถือ [tʰɯ̌ː] Carry


Tonal marks

Thai words have five different tones. To change the tone of a word, the four tonal marks shown in Table 2.5 are combined with consonants and vowels.

Table 2.5: Thai tonal marks

Tonal mark  Name of Tonal mark

◌่ เอก [ʔàːk̚]

◌้ โท [tʰoː]

◌๊ ตรี [triː]

◌๋ จัตวา [tɕàt̚ taː waː]

Forming of tone

The combination of an initial consonant, vowel, tonal mark, and final consonant may form five different tones of pronunciation, Mid, Low, Falling, High, and Rising, depending on which of two types the word belongs to: คําเป็น [kʰam pen] or คําตาย [kʰam taːj] [22].

คําเป็น [kʰam pen] are words composed with a long-sound vowel and no final consonant, or words whose final consonant is pronounced [ŋ], [n], [m], [j], or [w]. The usage of tonal marks and the pronunciation of คําเป็น [kʰam pen] are shown in Table 2.6.

Table 2.6: The usage of tonal marks in คําเป็น [kʰam pen]

Initial consonants  Tone: Mid  Low  Falling  High  Rising

Low-tone: ค [kʰ] ฅ [kʰ] ง [ŋ] ฆ [kʰ] ช [tɕʰ] ซ [s] ฌ [tɕʰ] ญ [j] ฑ [tʰ] ฒ [tʰ] ณ [n] ท [tʰ] น [n] ธ [tʰ] พ [pʰ] ฟ [f] ม [m] ย [j] ร [r] ล [l] ว [w] ภ [pʰ] ฬ [l] ฮ [h]

Words  ทา  N.A.  ท่า  ท้า  N.A.

Phonetic annotation  [taː]  N.A.  [tâː]  [táː]  N.A.

Medium-tone: ก [k] จ [tɕ] ด [d] ต [t] ฎ [d] ฏ [t] บ [b] ป [p] อ [ʔ]

Words  กา  ก่า  ก้า  ก๊า  ก๋า

Phonetic annotation  [kaː]  [kàː]  [kâː]  [káː]  [kǎː]

High-tone: ข [kʰ] ฃ [kʰ] ฉ [tɕʰ] ฐ [tʰ] ถ [tʰ] ผ [pʰ] ฝ [f] ส [s] ษ [s] ศ [s] ห [h]

Words  N.A.  ผ่า  ผ้า  N.A.  ผา

Phonetic annotation  N.A.  [pʰàː]  [pʰâː]  N.A.  [pʰǎː]


คําตาย [kʰam taːj] are words composed with a short-sound vowel, or words whose final consonant is pronounced [k̚], [t̚], or [p̚]. The usage of tonal marks and the pronunciation of คําตาย [kʰam taːj] are shown in Table 2.7.

Table 2.7: The usage of tonal marks in คําตาย [kʰam taːj]

Initial consonants  Tone: Mid  Low  Falling  High  Rising

Low-tone: ค [kʰ] ฅ [kʰ] ง [ŋ] ฆ [kʰ] ช [tɕʰ] ซ [s] ฌ [tɕʰ] ญ [j] ฑ [tʰ] ฒ [tʰ] ณ [n] ท [tʰ] น [n] ธ [tʰ] พ [pʰ] ฟ [f] ม [m] ย [j] ร [r] ล [l] ว [w] ภ [pʰ] ฬ [l] ฮ [h]

Words  N.A.  N.A.  ค่ะ  คะ  ค๋ะ

Phonetic annotation  N.A.  N.A.  [kʰâ]  [kʰá]  [kʰǎ]

Medium-tone: ก [k] จ [tɕ] ด [d] ต [t] ฎ [d] ฏ [t] บ [b] ป [p] อ [ʔ]

Words  N.A.  กะ  ก้ะ  ก๊ะ  ก๋ะ

Phonetic annotation  N.A.  [kà]  [kâ]  [ká]  [kǎ]

High-tone: ข [kʰ] ฃ [kʰ] ฉ [tɕʰ] ฐ [tʰ] ถ [tʰ] ผ [pʰ] ฝ [f] ส [s] ษ [s] ศ [s] ห [h]

Words  N.A.  ขัด  ขั้ด  N.A.  N.A.

Phonetic annotation  N.A.  [kʰàt̚]  [kʰât̚]  N.A.  N.A.
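The distinction between คําเป็น [kʰam pen] and คําตาย [kʰam taːj] can be sketched as a small decision function. The sketch below assumes that the final sound, when present, decides the type, and that vowel length is consulted only when there is no final consonant; this ordering is an assumption made here to resolve cases such as กิน [kin] (short vowel with final [n]). The names `syllable_type`, `LIVE_FINALS`, and `DEAD_FINALS` are hypothetical.

```python
from typing import Optional

# Final sounds of the two syllable types, as given in the definitions above.
# Unreleased stops are written without the diacritic.
LIVE_FINALS = {"ŋ", "n", "m", "j", "w"}
DEAD_FINALS = {"k", "t", "p"}


def syllable_type(long_vowel: bool, final: Optional[str] = None) -> str:
    """Classify a syllable as 'live' (kham pen) or 'dead' (kham tai).

    Assumption: a final consonant decides the type; vowel length is
    used only for syllables without a final consonant.
    """
    if final is None:
        return "live" if long_vowel else "dead"
    if final in LIVE_FINALS:
        return "live"
    if final in DEAD_FINALS:
        return "dead"
    raise ValueError(f"unknown final sound: {final!r}")
```

With this function, กา (long vowel, no final) is classified as live, กะ (short vowel, no final) as dead, and ขัด (final [t̚]) as dead, in agreement with the example words of Tables 2.6 and 2.7.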

2.2.2 Thai word

There are seven types of Thai words: nouns, pronouns, verbs, modifiers (adjectives and adverbs), conjunctions, prepositions, and exclamations [23].

In terms of formation, Thai words are divided into two types: Base-words and Compound-words.

• A Base-word is the smallest meaningful unit of the language and cannot be separated into two or more smaller words. A Base-word may have one syllable or several syllables. Some examples of Base-words are shown in Table 2.8.

Table 2.8: Examples of Thai words

Amount of syllables  Word  Phonetic annotation  Meaning in English

1  กิน [kin] Eat

1  เดิน [dɤːn] Walk

2  สะอาด [sà-ʔàt̚] Clean

2  สหาย [sà-hǎij] Friend

More than 2  สวัสดี [sà-wàt̚-diː] Hello

More than 2  อนาคต [ʔà-na-kót̚] Future


• A Compound-word is a word formed from two or more Base-words to create a new word. The various combination types are shown in Table 2.9.

Table 2.9: Examples of Compound-words

Combination  First word  Second Word  Compound-word

Noun + Noun  คน [kʰon] (Human) + สวน [suǎn] (Garden) → คนสวน [kʰon-suǎn] (Gardener)

Noun + Verb  ห้อง [hɔ̂ŋ] (Room) + นอน [nɔːn] (Sleep) → ห้องนอน [hɔ̂ŋ-nɔːn] (Bed room)

Noun + Modifier  ข้าว [kʰâuw] (Rice) + สวย [suǎj] (Beautiful) → ข้าวสวย [kʰâuw-suǎj] (Steamed rice)

Verb + Verb  เดิน [dɤːn] (Walk) + ทาง [taːŋ] (Way) → เดินทาง [dɤːn-taːŋ] (Travel)

Verb + Noun  ลง [loŋ] (Down) + โทษ [tôːt̚] (Punishment) → ลงโทษ [loŋ-tôːt̚] (Punish)

Verb + Modifier  กิน [kin] (Eat) + แหลก [lɛ̀ːk̚] (Crushed) → กินแหลก [kin-lɛ̀ːk̚] (Eat everything)

Modifier + Modifier  ดี [diː] (Good) + งาม [ŋaːm] (Beautiful) → ดีงาม [diː-ŋaːm] (Very good)

There are also compound-words that are created for other purposes, such as transforming verbs into nouns, adjectives into nouns, and adjectives into adverbs, as follows:

2.2.3 Transforming of verbs into nouns

In the Thai language, verbs can be transformed into nouns by adding certain words in front of the verb, such as "ผู้ [pʰû]", "นัก [nák̚]", "ตัว [tua]", "เครื่อง [kʰlɯ̂ŋ]", "การ [kaːn]", etc. These transformations can be divided into four cases as follows:

Case 1: Transforming verbs into nouns that denote living things by adding the word "ผู้ [pʰû]" or "นัก [nák̚]" in front of the verb (similar to adding the suffix "er" or "or" in English), as in the examples shown in Table 2.10.

Table 2.10: Examples of word Transforming by "ผู้ [pʰû]" and "นัก [nák̚]"

Verb  Phonetic annotation  Meaning in English  Noun  Phonetic annotation  Meaning in English

เขียน [kʰiǎn] Write ผู้เขียน [pʰû-kʰiǎn] Writer

เขียน [kʰiǎn] Write นักเขียน [nák̚-kʰiǎn] Writer

ซื้อ [sɯ́ː] Buy ผู้ซื้อ [pʰû-sɯ́ː] Buyer

แสดง [sà-dɛːŋ] Act นักแสดง [nák̚-sà-dɛːŋ] Actor


Case 2: Transforming verbs into nouns that denote non-living things by adding the word "ตัว [tua]" or "เครื่อง [kʰlɯ̂ŋ]" in front of the verb (similar to adding the suffix "er" or "or" in English). The word "เครื่อง [kʰlɯ̂ŋ]" is used for nouns that are machines or devices, while the word "ตัว [tua]" is used in the other cases. Some examples are shown in Table 2.11.

Table 2.11: Examples of word Transforming by "เครื่อง [kʰlɯ̂ŋ]" and "ตัว [tua]"

Verb  Phonetic annotation  Meaning in English  Noun  Phonetic annotation  Meaning in English

พิมพ์ [pim] Print เครื่องพิมพ์ [kʰlɯ̂ŋ-pim] Printer

คํานวณ [kʰam-nuan] Calculate เครื่องคํานวณ [kʰlɯ̂ŋ-kʰam-nuan] Calculator

เร่ง [rêːŋ] Accelerate ตัวเร่ง [tua-rêːŋ] Accelerator

กระตุ้น [krà tûn] Activate ตัวกระตุ้น [tua-krà-tûn] Activator

Case 3: Transforming verbs into nouns that refer to an action by adding the word "การ [kaːn]" in front of the verb (similar to adding the suffix "ing" in English). Some examples are shown in Table 2.12.

Table 2.12: Examples of word Transforming by "การ [kaːn]"

Verb  Phonetic annotation  English  Noun  Phonetic annotation  English

ทํางาน [tum-ŋaːn] Work การทํางาน [kaːn-tum-ŋaːn] Working

กิน [kin] Eat การกิน [kaːn-kin] Eating

วิ่ง [wîŋ] Run การวิ่ง [kaːn-wîŋ] Running

Case 4: Transforming verbs into nouns that do not refer to an action by adding the word "ความ [kʰwaːm]" in front of the verb, as in the examples shown in Table 2.13.

Table 2.13: Examples of word Transforming by "ความ [kʰwaːm]"

Word  Phonetic annotation  English  Noun  Phonetic annotation  English

ตาย [taːj] Die ความตาย [kʰwaːm-taːj] Death

เห็น [hěn] See ความเห็น [kʰwaːm-hěn] Opinion

คิด [kíːd] Think ความคิด [kʰwaːm-kíːd] Thought
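The four cases above all follow the same pattern of a fixed word placed in front of a verb, so a derived noun can be recognized by checking its beginning. The following minimal sketch illustrates this; the case labels (`person`, `thing`, `action`, `abstract`) and the function name `split_nominal_prefix` are illustrative assumptions. A real system would also have to confirm that the remainder is a known verb, since a plain prefix check can misfire on unrelated words that happen to begin with the same letters.

```python
# Nominalizing words from Cases 1 to 4, grouped by the kind of noun they form.
PREFIXES = {
    "person": ("ผู้", "นัก"),        # Case 1: living things
    "thing": ("ตัว", "เครื่อง"),     # Case 2: non-living things
    "action": ("การ",),             # Case 3: actions
    "abstract": ("ความ",),          # Case 4: nouns that are not actions
}


def split_nominal_prefix(word):
    """Split a derived noun into (kind, prefix, verb), or return None.

    Only the surface form is checked; the remainder is not validated
    against a dictionary of verbs.
    """
    for kind, prefixes in PREFIXES.items():
        for prefix in prefixes:
            if word.startswith(prefix) and len(word) > len(prefix):
                return kind, prefix, word[len(prefix):]
    return None
```

For example, `split_nominal_prefix("การกิน")` separates the word into the prefix การ and the verb กิน, corresponding to the second row of Table 2.12.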


2.2.4 Transforming of adjectives into nouns

To transform adjectives into nouns, the word "ความ [kʰwaːm]" should be added in front of the adjectives. Some examples are shown in Table 2.14.

Table 2.14: Examples of word Transforming by "ความ [kʰwaːm]"

Adjective  Phonetic annotation  English  Noun  Phonetic annotation  English

ดี [diː] Good ความดี [kʰwaːm-diː] Goodness

สูง [sǔːŋ] High ความสูง [kʰwaːm-sǔːŋ] Height

สวย [sǔaj] Beautiful ความสวย [kʰwaːm-sǔaj] Beauty

2.2.5 Transforming of adjectives into adverbs

By adding the word "อย่าง [jàŋ]" in front of an adjective, the resulting word can be used as an adverb. Some examples are shown in Table 2.15.

Table 2.15: Examples of word Transforming by "อย่าง [jàŋ]"

Adjective  Phonetic annotation  English  Adverb  Phonetic annotation  English

ดี [diː] Good อย่างดี [jàŋ-diː] Well

สูง [sǔːŋ] High อย่างสูง [jàŋ-sǔːŋ] Highly

เร็ว [rew] Quick อย่างเร็ว [jàŋ-rew] Quickly

Because a Compound-word can be broken down into smaller words with different meanings, compound words are one of the main problems in word segmentation: an incorrect segmentation result has a severe impact on the Semantic Analysis of sentences.
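The ambiguity described here can be made concrete with a toy example. The sketch below enumerates every way of splitting a string into words of a given lexicon; the three-word lexicon and the function name `segmentations` are assumptions chosen only for illustration, and this exhaustive enumeration is not the segmentation method proposed in this work.

```python
def segmentations(text, lexicon):
    """Return every way to split text into words found in the lexicon."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment the remainder recursively.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Toy lexicon taken from the Noun + Noun row of Table 2.9.
LEXICON = {"คน", "สวน", "คนสวน"}

for words in segmentations("คนสวน", LEXICON):
    print(" | ".join(words))
```

The string คนสวน yields two candidate segmentations, the two Base-words คน and สวน (human, garden) and the single Compound-word คนสวน (gardener), so a segmenter needs additional information to choose the intended reading.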

2.2.6 Numeral and quantity representation

Numbers and quantities can be represented by Thai numerals, by Arabic numerals, or written out in text, as in the examples shown in Table 2.16.

Table 2.16: Thai numbers and quantity representation.

Arabic numerals  Thai numerals  Thai numeral texts  Phonetic annotation

0 ๐ ศูนย์ [sǔːn]

1 ๑ หนึ่ง [nɯ̀ŋ]

2 ๒ สอง [sɔ̌ːŋ]

3 ๓ สาม [sǎːm]

4 ๔ สี่ [sìː]

5 ๕ ห้า [hâː]

6 ๖ หก [hòk̚]

7 ๗ เจ็ด [tɕèt̚]

8 ๘ แปด [pɛ̀ːt̚]

9 ๙ เก้า [kâu]

10 ๑๐ สิบ [sìp̚]

20 ๒๐ ยี่สิบ [jîː-sìp̚]

27 ๒๗ ยี่สิบเจ็ด [jîː-sìp̚-tɕèt̚]

53 ๕๓ ห้าสิบสาม [hâː-sìp̚-sǎːm]

605 ๖๐๕ หกร้อยห้า [hòk̚-rɔ̂ːj-hâː]

1380 ๑๓๘๐ หนึ่งพันสามร้อยแปดสิบ [nɯ̀ŋ-pʰaːn-sǎːm-rɔ̂ːj-pɛ̀ːt̚-sìp̚]

23456 ๒๓๔๕๖ สองหมื่นสามพันสี่ร้อยห้าสิบหก [sɔ̌ːŋ-mɯ̀ːn-sǎːm-pʰaːn-sìː-rɔ̂ːj-hâː-sìp̚-hòk̚]

2.2.7 Quantifying Noun

A Quantifying Noun is a word used together with numerals to specify the quantity of a noun [23]. Some examples of Quantifying Nouns are shown in Table 2.17.

Table 2.17: Examples of Quantifying Nouns

Category  Nouns  Quantifying Noun

Clothes  เสื้อ [sɯ́a] (Shirt); กางเกง [kaŋ-keŋ] (Trousers, Short)  ตัว [tua]

Animals  นก [nók̚] (Bird); ปลา [plaː] (Fish)  ตัว [tua]

Land vehicles  รถยนต์ [ród-yon] (Car); รถบัส [ród-bât̚] (Bus)  คัน [kʰan]

Air or water vehicles  เครื่องบิน [klɯ̂aŋ-bin] (Airplane); เรือ [rɯaː] (Boat)  ลํา [lam]

Uncountable noun  กาแฟ [kaː-fɛː] (coffee); ข้าว [kʰâːw] (rice); นํ้าตาล [nâːm-taːn] (sugar)  ถ้วย [tûaj] (cup); จาน [tɕaːn] (dish); กรัม [kram] (gram)

Others  ดินสอ [din-sɔ̌ː] (pencil); ขนมปัง [kʰà-nǒm-paŋ] (bread); เนื้อ [nɯ́a] (meat)  แท่ง [tɛ̂ŋ] (stick); ก้อน [kɔ̂n] (loaf, lump); ชิ้น [tɕʰín] (piece); อัน [ʔan] (stick, loaf, lump)

Unlike in English, a Thai Quantifying Noun must be specified whenever the amount of a noun is denoted, for example:

เสื้อ [sɯ́a] + สอง [sɔ̌ŋ] + ตัว [tua] → เสื้อสองตัว [sɯ́a-sɔ̌ŋ-tua]

(shirt) (two) - (Two shirts)

นก [nók̚] + สาม [sǎːm] + ตัว [tua] → นกสามตัว [nók̚-sǎːm-tua]

(bird) (three) - (Three birds)

รถยนต์ [rôd-yon] + สาม [sǎːm] + คัน [kʰan] → รถยนต์สามคัน [rôd-yon-sǎːm-kʰan]

(Car) (three) - (Three cars)

ดินสอ [din-sɔ̌ː] + 2 [sɔ̌ːŋ] + แท่ง [tɛ̂ŋ] → ดินสอ2แท่ง [din-sɔ̌ː-sɔ̌ːŋ-tɛ̂ŋ]

(pencil) (two) (stick) (Two pencils)

ขนมปัง [kʰà-nǒm-paŋ] + ห้า [hâː] + ก้อน [kɔ̂n] → ขนมปังห้าก้อน [kʰà-nǒm-paŋ-hâː-kɔ̂n]

(bread) (five) (loaf) (Five loaves of bread)

เนื้อ [nɯá] + 4 [sìː] + ชิ้น [tɕʰín] → เนื้อ4ชิ้น [nɯá-sìː-tɕʰín]

(meat) (four) (piece) (Four pieces of meat)

กาแฟ [kaː-fɛː] + 3 [sǎːm] + ถ้วย [tûaj] → กาแฟ3ถ้วย [kaː-fɛː-sǎːm-tûaj]

(coffee) (three) (cup) (Three cups of coffee)
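The pattern shown in these examples, noun followed by number followed by Quantifying Noun, can be sketched with a small lookup table. The dictionary below contains only three nouns from Table 2.17, and the names `CLASSIFIERS`, `count_phrase`, and `modified_phrase` are hypothetical; the second function follows the same idea for a noun followed by its Quantifying Noun and a modifier.

```python
# Quantifying Noun for a few nouns from Table 2.17 (illustrative subset only).
CLASSIFIERS = {
    "เสื้อ": "ตัว",   # shirt
    "นก": "ตัว",     # bird
    "ปลา": "ตัว",    # fish
}


def count_phrase(noun: str, number: str) -> str:
    """Build 'noun + number + Quantifying Noun', e.g. three birds."""
    return noun + number + CLASSIFIERS[noun]


def modified_phrase(noun: str, modifier: str) -> str:
    """Build 'noun + Quantifying Noun + modifier', e.g. this bird."""
    return noun + CLASSIFIERS[noun] + modifier
```

For instance, `count_phrase("นก", "สาม")` reproduces the phrase นกสามตัว (Three birds) from the examples above.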


Aside from telling the number, Quantifying Nouns are also used with nouns in the case of adding modifiers. For example:

สุนัข [sù-nák] + ตัว [tua] + นี้ [ní] → สุนัขตัวนี้ [sù-nák-tua-ní]

(dog) (body) (this) (This dog)

นก [nók̚] + ตัว [tua] + ใหญ่ [jài] → นกตัวใหญ่ [nók̚-tua-jài]

(bird) (body) (big) (Big bird)

บ้าน [bâːn] + หลัง [lǎŋ] + นั้น [nán] → บ้านหลังนั้น [bâːn-lǎŋ-nán]

(house) (building) (that) (That house)

2.3 Thai sentences

As shown in Figure 2.3, the Thai sentence consists of two main parts: Subject and Predicate. The Subject is the part to represent the actor, which may be a noun, pronoun, noun-phrase, or sentence. In some cases, modifiers are added to describe properties or details of the Subject.

The Predicate is a verb or verb phrase that shows the action of the Subject. Like the Subject, the Predicate may include a modifier. Moreover, some verbs require an Object to complete the meaning of the sentence. Like the Subject, the Object is a noun, pronoun, noun-phrase, or sentence, which sometimes has a modifier [22].

Fig. 2.3: Thai Sentence Structure

Examples of Thai sentences are shown as follows:

Sentence  Words  Meaning in English

ฉันเดิน → ฉัน [tɕʰǎn] (I) เดิน [dɤːn] (walk) → I walk

นกกินหนอน → นก [nók̚] (Bird) กิน [kin] (eat) หนอน [nɔ̌n] (worm) → A bird eats a worm

นกกินหนอนตัวใหญ่ → นก [nók̚] (Bird) กิน [kin] (eat) หนอน [nɔ̌n] (worm) ตัว [tua] (body) ใหญ่ [jài] (big) → A bird eats a big worm

นกตัวใหญ่กินหนอน → นก [nók̚] (Bird) ตัว [tua] (body) ใหญ่ [jài] (big) กิน [kin] (eat) หนอน [nɔ̌n] (worm) → A big bird eats a worm

Note: The Thai language has no articles and does not add "s" to mark singular or plural nouns; therefore, "นกกินหนอน" may mean any one of these sentences:

• Birds eat worms.

• Birds eat a worm.

• A bird eats worms.

• A bird eats a worm.

As the examples above show, modifiers are always placed after the nouns or verbs whose meaning they expand, for example:

ทํา [tam] + ดี [di] → ทําดี [tam-di]

(do) (good) (Do good things)

นก [nók̚] + ตัว [tua] + ใหญ่ [jài] → นกตัวใหญ่ [nók̚-tua-jài]

(bird) (body) (big) (Big bird)

บ้าน [bâːn] + สวย [sǔaj] → บ้านสวย [bâːn-sǔaj]

(house) (beautiful) (Beautiful house)