
3.2 Representation of Information

3.2.3 Hierarchical Representations

Data and the information it contains can be structured by grouping items and by specifying the relations between them. A thesaurus can be regarded as an intermediate step towards more elaborate methods. It stores collections of words and terms that can be regarded, for example, as synonyms: “A thesaurus is a set of items (phrases or words) plus a set of relations between these items.”135 This is supported by the comment of Daniel Naber:

“A thesaurus is a dictionary that lists words which have a similar or related meaning. The most simple case is a pair of synonyms, i.e. two words which have the same meaning. [...] It can serve as a simple dictionary-replacement that explains the meaning of a word by listing other words with the same or with a similar meaning. [...] Knowledge about synonyms might also be used by search engines to find documents that contain information about the subject one is looking for, but that uses different terminology.” (Naber 2004)

This method attempts to group terms according to their characteristics and relations.
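The definition above (a set of items plus a set of relations between them) and the search-engine use mentioned by Naber can be illustrated with a minimal sketch. The class `Thesaurus`, its method names, and the sample terms are assumptions made for illustration only; they do not come from the cited works.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical minimal model: items plus named relations between items."""

    def __init__(self):
        # relation name -> item -> set of related items
        self._relations = defaultdict(lambda: defaultdict(set))

    def add_synonyms(self, a, b):
        # Synonymy is symmetric, so both directions are stored.
        self._relations["synonym"][a].add(b)
        self._relations["synonym"][b].add(a)

    def synonyms(self, item):
        # Return a copy so callers cannot modify the internal state.
        return set(self._relations["synonym"].get(item, set()))

    def expand_query(self, terms):
        """Return the query terms together with their direct synonyms."""
        expanded = set(terms)
        for term in terms:
            expanded |= self.synonyms(term)
        return expanded


thesaurus = Thesaurus()
thesaurus.add_synonyms("car", "automobile")
thesaurus.add_synonyms("car", "motorcar")

expanded = thesaurus.expand_query(["car", "rental"])
print(sorted(expanded))  # ['automobile', 'car', 'motorcar', 'rental']
```

A search engine could match documents against the expanded set instead of the original terms, so that a document using "automobile" is still found for the query "car".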

It might also be possible to arrange sets of terms hierarchically.
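Such a hierarchical arrangement can be sketched as a tree of terms. The sketch assigns each node a code that encodes its path from the root, anticipating the taxonomy definition quoted in the next paragraph. The dotted numbering scheme, the class `TaxonomyNode`, and the animal example are illustrative assumptions, not the scheme of any cited author.

```python
class TaxonomyNode:
    """Hypothetical taxonomy node whose code encodes the path from the root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is None:
            self.code = "1"  # assumed code for the root node
        else:
            parent.children.append(self)
            # Child code = parent code + position among its siblings.
            self.code = f"{parent.code}.{len(parent.children)}"

    def is_subclass_of(self, other):
        """True if this node equals `other` or lies below it in the hierarchy."""
        return self.code == other.code or self.code.startswith(other.code + ".")

    def path(self):
        """Names from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


animal = TaxonomyNode("Animal")
mammal = TaxonomyNode("Mammal", animal)
bird = TaxonomyNode("Bird", animal)
dog = TaxonomyNode("Dog", mammal)

print(dog.code)                    # 1.1.1
print(dog.path())                  # ['Animal', 'Mammal', 'Dog']
print(dog.is_subclass_of(animal))  # True
print(dog.is_subclass_of(bird))    # False
```

Because the code of a node contains the codes of all its ancestors as prefixes, the class/subclass test reduces to a string comparison and needs no traversal of the tree.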

This leads to a method called taxonomy. One should be aware that this term is used differently in several disciplines. In natural science, for example, it often denotes a hierarchical classification of research subjects according to their features. In computer science or IT, a more pragmatic understanding prevails: “A taxonomy indicates only class/subclass relationship [...] A taxonomy is a hierarchy and a unique code is usually assigned to each node of the hierarchy. This code also encodes its path.”136 A format that was released in the context of the Semantic Web is, as mentioned, RDFS. RDFS can be seen as a format in which taxonomies can be realised. For example, classes and subclasses or property ranges and domains can be provided.137 The authors Eckstein and Eckstein even say (translated from the German): “RDF Schema, by contrast, additionally allows the formation of concept hierarchies, so-called ontologies, for the semantic classification of terms.”138 Taxonomies and ontologies can be treated as similar methods, but they differ slightly. Furthermore, one should note that some researchers subsume thesauri, taxonomies, and possibly also semantic networks under the term ontology. Barry Smith also mentions connections between a taxonomy and an ontology:

135Jing and Croft 1994

136Dogac, Laleci, Kabak, and Cingil 2002

137see Ziegler 2004, p.128

138Eckstein and Eckstein 2004, p.235

34 Chapter 3 Information Modelling and Representation

“Gradually, however, it was recognized that the provision, once and for all, of a common reference ontology – a shared taxonomy of entities – might provide significant advantages over such case-by-case resolution, and the term – ontology – came to be used by information scientists to describe the construction of a canonical description of this sort. An ontology is in this context a dictionary of terms formulated in a canonical syntax and with commonly accepted definitions designed to yield a lexical or taxonomical framework for knowledge-representation which can be shared [...] More ambitiously, an ontology is a formal theory within which not only definitions but also a supporting framework of axioms is included (perhaps the axioms themselves provide implicit definitions of the terms involved).” (Smith 2003)

One can observe that there can be only slight distinctions between a taxonomy and an ontology.

An important difference, however, may be the “framework of axioms”, which is included only in the model of an ontology. In this thesis, taxonomies and ontologies are therefore regarded as separate methods, even though both are similar ways of expressing hierarchical structures. Because of the importance of ontologies to the presented approach, a brief excursus on the origin of ontologies is given in the following.

The term ontology originates from philosophy, where it denotes the study of being or existence in general.139 Puppe, Stoyan, and Studer explain that an ontology can be used to describe and classify the existence of things in the world.140 When the term was transferred to AI, it was restricted to the modelling of real-world concepts in computers.141 The following definition is often preferred in AI research: “An ontology is a formal, explicit specification of a shared conceptualisation.”142 In general, an ontology represents a hierarchical conception of a part of the world, called a “domain”. The representation should be produced in a machine-readable language. Natalya Fridman Noy and Deborah L. McGuinness explain:

“An ontology defines a common vocabulary for researchers who need to share information in a domain.”143 An ontology consists of a set of objects, concepts, and further entities, which are related to each other.144 These are divided into classes (also called concepts), properties (also called slots or roles), and restrictions on the roles.145 Additionally, so-called instances of the classes represent individual objects of the selected domain. By relating the main classes of an ontology to more specialised ones, a hierarchy can be created.

Regarding the relations between objects of a domain, different kinds of relations are possible:

The most common relation is called isa. Charniak and McDermott explain: “...isa says that one class is a more general version of another. Roughly speaking, the distinction is like that in set theory between element and subset.”146 This also means that a more specific class inherits the properties of the more general one. The isa relation is often used for building taxonomies

139Philosophie-Lexikon 1991

140Puppe, Stoyan, and Studer 2000, p.622

141see Puppe, Stoyan, and Studer 2000, p.622

142Gruber 1992, p.199

143Noy and McGuinness 2001, p.1

144see Noy and Hafner 1997, p.53

145see Noy and McGuinness 2001, p.3

146Charniak and McDermott 1985, pp.26/27


of animal species, wine varieties, etc. A second relation is the part-whole relation. As the name suggests, it defines relations between an object and the parts it contains. Such relations are used in many fields: “Part-whole relations are one of the basic structuring primitives of the universe, and many applications require representation of them - catalogues of parts, fault diagnosis, anatomy, geography, etc.”147 This kind of relation should not be confused with the relation between a class and its instances. Charniak and McDermott define the latter as inst: “...inst says that a particular individual is a member of some class...”148 In contrast to a part-whole relation, inst relates a set to an atomic element that cannot be divided further. Two classes, however, can be related as a whole and its part, and the class representing the part can in turn have instances of its own.
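The three relations discussed above (isa, part-whole, and inst) can be kept apart in a small sketch. The class `KnowledgeBase`, its attribute names, and the wine and car examples are assumptions chosen for illustration; the sketch is not an implementation from Charniak and McDermott.

```python
class KnowledgeBase:
    """Hypothetical store that keeps isa, inst and part-whole relations separate."""

    def __init__(self):
        self.isa = {}         # class -> more general class
        self.inst = {}        # individual -> class it is a member of
        self.part_of = {}     # part class -> whole class
        self.properties = {}  # class -> properties declared on that class

    def add_class(self, name, parent=None, **props):
        if parent is not None:
            self.isa[name] = parent
        self.properties[name] = dict(props)

    def ancestors(self, cls):
        """All more general classes reachable by following isa links."""
        chain = []
        while cls in self.isa:
            cls = self.isa[cls]
            chain.append(cls)
        return chain

    def inherited_properties(self, cls):
        """Properties of a class, including those inherited via isa."""
        merged = {}
        # Apply general classes first so that specific classes can override.
        for c in reversed([cls] + self.ancestors(cls)):
            merged.update(self.properties.get(c, {}))
        return merged

    def is_instance_of(self, individual, cls):
        """inst membership, extended along isa (not along part_of)."""
        direct = self.inst.get(individual)
        return direct == cls or cls in self.ancestors(direct)


kb = KnowledgeBase()
kb.add_class("Wine", has_alcohol=True)
kb.add_class("RedWine", "Wine", colour="red")
kb.inst["bottle_17"] = "RedWine"

kb.add_class("Car")
kb.add_class("Engine")
kb.part_of["Engine"] = "Car"
kb.inst["engine_42"] = "Engine"

print(kb.inherited_properties("RedWine"))     # {'has_alcohol': True, 'colour': 'red'}
print(kb.is_instance_of("bottle_17", "Wine"))  # True: inst combined with isa
print(kb.is_instance_of("engine_42", "Car"))   # False: part-whole is not membership
```

The last line shows the distinction made in the text: an engine instance belongs to the class representing the part, but being part of a car does not make it an instance of the class of cars.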

In addition to these different kinds of relations within ontologies, different types of ontologies can also be defined. Because ontologies are used in many application areas, Puppe, Stoyan, and Studer distinguish the following types: domain ontologies apply only to certain areas, whereas a general ontology is valid beyond a single domain. A third type is the method ontology, which is often used for problem-solving strategies. More functional ontologies define types of special tasks.149

Various languages are available for representing ontologies of any kind in a machine-readable format. They can be divided into informal (or graphical) and formal representation languages.150 CML (Conceptual Modelling Language), a semi-formal language with a graphical notation, can be classed as informal, whereas KIF (Knowledge Interchange Format), which extends first-order predicate calculus, belongs to the formal languages.151

As mentioned, OWL is a standard that was developed in the context of the Semantic Web. Data in OWL is written in an XML syntax, even though the theoretical background comprises further ideas, e.g. formal-logic conceptions. As Antoniou and Harmelen state, OWL builds on RDF and RDF Schema and uses RDF’s XML syntax.152 Because OWL combines logic-based constructs with the syntax of the popular mark-up language XML, a wide range of applications is possible: logic-based applications as well as manipulation and retrieval with XML-related programming languages.
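The following sketch illustrates the retrieval side of this combination: a small OWL fragment in RDF/XML syntax is read with Python's standard XML parser, and the class/subclass pairs are extracted. The ontology fragment, the wine class names, and the function `subclass_pairs` are assumptions for illustration. The code is plain XML processing, not OWL reasoning, and it only handles the one serialisation pattern shown here; real RDF/XML documents can express the same statements in other ways and would call for a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of RDF, RDF Schema and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology fragment used only for this example.
OWL_DOCUMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Wine"/>
  <owl:Class rdf:ID="RedWine">
    <rdfs:subClassOf rdf:resource="#Wine"/>
  </owl:Class>
  <owl:Class rdf:ID="WhiteWine">
    <rdfs:subClassOf rdf:resource="#Wine"/>
  </owl:Class>
</rdf:RDF>
"""


def subclass_pairs(xml_text):
    """Return (subclass, superclass) pairs found in top-level owl:Class elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            # rdf:resource holds a reference such as "#Wine".
            parent = sub.get(f"{{{RDF}}}resource", "").lstrip("#")
            pairs.append((name, parent))
    return pairs


print(subclass_pairs(OWL_DOCUMENT))
# [('RedWine', 'Wine'), ('WhiteWine', 'Wine')]
```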

The formal language OWL is available in three species: OWL Full, OWL DL (Description Logic), and OWL Lite. They were developed for the different needs of users. OWL Full uses all OWL language primitives and also allows the pre-defined primitives to be changed.153 This means that every object of an ontology can be defined in any way. In contrast, the sub-language OWL DL restricts some options of the full version in order to avoid problems for further applications. Logic-based applications in particular should therefore be used with an OWL DL ontology. Antoniou and Harmelen state about the restrictions: “...any resource is allowed to be only either a class, a datatype, a datatype properties, an object properties, an individuals,

147http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole (last accessed October 30, 2007)

148Charniak and McDermott 1985, p.26

149see Puppe, Stoyan and Studer 2000, p.623

150Puppe, Stoyan, and Studer 2000, p.624

151Puppe, Stoyan, and Studer 2000, p.624

152see Antoniou and Harmelen 2003, p.68

153see Antoniou and Harmelen 2003, p.70


a data value or part of the built-in vocabulary, and not more than one of these.”154 Objects must be defined unambiguously and cannot be used in different ways. Users who only create a limited ontology or hierarchical structure can use OWL Lite, a subset of OWL DL. In OWL Lite, enumerated classes, disjointness statements, and arbitrary cardinality are not available.155 In the next section, the presented modelling methods are compared and discussed in order to determine which kind of model should be used for the presented approach.
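The OWL DL restriction quoted above, that a resource may belong to only one of the listed kinds, can be illustrated with a small check. The category labels, the function `find_dl_violations`, and the sample declarations are assumptions for illustration. The sketch covers only this single restriction and is by no means a complete OWL DL validator.

```python
from collections import defaultdict

# Kinds of resources named in the quoted restriction (labels are assumed).
DL_CATEGORIES = {
    "class",
    "datatype",
    "datatype_property",
    "object_property",
    "individual",
    "data_value",
    "builtin_vocabulary",
}


def find_dl_violations(declarations):
    """Return resources that were declared with more than one kind.

    `declarations` is an iterable of (resource, category) pairs.
    """
    kinds = defaultdict(set)
    for resource, category in declarations:
        if category not in DL_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        kinds[resource].add(category)
    return {res: sorted(cats) for res, cats in kinds.items() if len(cats) > 1}


declarations = [
    ("Wine", "class"),
    ("hasColour", "object_property"),
    ("Merlot", "class"),
    ("Merlot", "individual"),  # permitted in OWL Full, not in OWL DL
]

print(find_dl_violations(declarations))
# {'Merlot': ['class', 'individual']}
```

In OWL Full the double use of `Merlot` would be permitted, whereas an OWL DL ontology would have to model it as either a class or an individual.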