
Ontological Analysis

In the document What is the Real Question? (pages 53-59)

Duff and Johnson (2001) remain close to the syntactic level of the inquiry when analyzing the given and wanted information entities. The Wanteds remain abstract and unspecific, while the Givens describe only isolated information entities with no further meaning or context.

Furthermore, the types of questions submitted reveal the essentially archival point of view taken by Duff and Johnson (2001) during the analysis of the inquiries. The Wanteds in particular demonstrate shortcomings in the analysis of and reflection upon the interest of the users.

Because the sheer quantity of records held in archives makes it difficult to convey the contents of archival materials to the user, digital archival information systems and (digital) archival finding aids are suitable means of enabling and facilitating access. These systems rely on reducing and normalizing the available information about records in order to handle formal queries and to retrieve sets of relevant records. Merely knowing the type of given information that users bring to an archive and its information systems, however, does not suffice to gain a deeper understanding of user needs and interests, and certainly not to design better archival information systems.

The ultimate aim of the study at hand is to contribute to the improvement of automatic query facilities of archival information systems. These automatic queries must be formulated in an ontological form. The intension of the term ontology has expanded significantly over the last decade. In philosophy, the term means the study of being. Since the adoption of the term, especially by the computer and information sciences, its scope has grown to include a broad range of data models, controlled vocabularies, and explanatory models. In this study, ontology is understood as a conceptual model which comprises formalized knowledge as clearly defined concepts and relationships pertaining to the possible states of affairs in a knowledge domain.

Ontology does not denote a data model as in the computer sciences but rather the domain of discourse referenced by the inquiries.

The analysis conducted by Duff and Johnson (2001) exhibits a semantic mismatch with a query system. This is because it does not provide the necessary ontological structures with which to investigate how an information system needs to be designed in order to adequately respond to the interest of a real or hypothetical question or to develop new and better data structures. In order to better understand the sense of an inquiry and to identify the subject matter of its interest, the character of the Givens and especially of the Wanteds needs to be further investigated and analyzed on an ontological level.

This study, therefore, further interprets the Wanteds by analyzing the subject matter which typically appears as the interest of the inquiry from a user point of view, or, in other words, the common domain of discourse to which the inquiries refer. Duff and Johnson (2001) adhere to the syntactic form (Aussageform) of an inquiry, while the methodological approach of this study proceeds further to the analysis of the epistemological form (Erkenntnisform) of an inquiry. The guiding question during the interpretative analysis asks what the user needs to know in order to satisfy the perceived interest of the inquiry.

In order to answer questions, the first basic prerequisite for an information system is a structuring of its information contents according to an appropriate schema. However, there is a semantic mismatch between the question and the schema because one cannot expect to recognize a schematic entity in the question. An ontology is needed between the schema and the question because the meaning of natural language is only comparable with an ontology. The technical reason is that our thinking, and hence the meaning of natural language, implies deep subsumption hierarchies of concepts and relationships, which cannot be represented with relational or XML schemata (Oldman et al., 2014). Examples of ontologies created for natural language processing are CyC [27], one of the first formal ontologies, and WordNet [28]. No simpler method of representing the meaning of natural language has been found so far.
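The role of subsumption can be illustrated with a minimal sketch. It assumes a small, purely hypothetical concept hierarchy (the concept names and record identifiers below are illustrative and are not taken from any existing ontology) and contrasts a flat, schema-style lookup by label with a lookup that follows the hierarchy.

```python
# Hypothetical concept hierarchy: each concept points to its direct parent.
# The names are illustrative assumptions, not part of any real ontology.
SUBCLASS_OF = {
    "Photographer": "Person",
    "Person": "Actor",
    "Group": "Actor",
    "Actor": "Entity",
}

# Hypothetical records, each typed with its most specific concept.
RECORDS = [("r1", "Photographer"), ("r2", "Group"), ("r3", "Place")]


def subsumes(general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = SUBCLASS_OF.get(current)  # None once the top is reached
    return False


def find_by_label(concept):
    """Flat, schema-style lookup: only an exact label match counts."""
    return [rid for rid, c in RECORDS if c == concept]


def find_by_concept(concept):
    """Ontology-style lookup: match the concept and everything it subsumes."""
    return [rid for rid, c in RECORDS if subsumes(concept, c)]


print(find_by_label("Actor"))    # no record carries the label "Actor" itself
print(find_by_concept("Actor"))  # photographers and groups are actors
```

In this sketch a question about actors finds nothing through the flat lookup, whereas the hierarchy-aware lookup returns the photographer and the group. A real ontology would also apply subsumption to relationships, which is omitted here.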

As has been shown in Doerr (2003) and Doerr and Iorizzo (2008), the representation of relationships is pivotal to such ontologies because queries need relationships as structural parts, whereas terms can be dealt with as data items. For this reason, this study creates an ontology and does not supplement a schema such as EAD [29] because this would render the results dependent on a particular form of the schema, here the tree-like data structure of XML and the tags and attributes of EAD, and would neglect the sense of the questions. The result would only be a projection of the latter, and there would be no control over the adequacy and completeness of the data structure (based on the schema) in relation to the original question. Rather, this study deems it more appropriate to begin with the inquiry, its grammar and words, to proceed to the ontology, and then continue from the ontology to the schema; in doing so, this study focuses on the penultimate step in the process. Different schemas and data structures can be created from the ontology and remain compatible. These data structures can then be queried in an information system. Whether these queries are useful or meaningful can only be determined on the level of the ontology because the latter represents the sense and interest of the inquiries. This relationship between schema and ontology has been discussed, for example, by Gruber (1995, 2007), Guarino (1998), and Calvanese et al. (1998), and has not been challenged since.

More in-depth knowledge of the ontological character of the given and wanted information is therefore needed in order to obtain a deeper understanding of the character and subject matter of the interest of the inquiries. Their interpretation is determined not only by the immediately recognizable sense of the questions and associated contextual information, both made explicit by the given and wanted entities, but also by an epistemological framework. In other words, the inquiries are posed in the context of a particular domain of discourse, which determines how the sense and subject matter of the interest of a question can be further determined through interpretation. This epistemological framework comprises the archival domain of record keeping and the domain of historical inquiry for which traces and evidence can be expected to be found in the archive. However, the selected ontology for formalizing the results from the interpretative analysis also influences the interpretative process. Furthermore, the interpretation and ontological analysis of the inquiries rely on educated intuition regarding both domains and necessarily filter probable implicit questions.

[27] http://www.cyc.com/

[28] https://wordnet.princeton.edu/

[29] http://www.loc.gov/ead/

The first step in the ontological analysis is the identification and categorization of the given and wanted entities referenced in the inquiries. They constitute the first building blocks of the ontological model.

1.2.1 Referenced Entities

While Duff and Johnson (2001) have conducted a predominantly linguistic analysis by identifying different types of information entities as Givens and Wanteds, thus adhering to the paradigm of the keyword, this study applies an ontological analysis. In order to create an adequate ontological model, the entities referenced in the domain of discourse of the inquiries need to be identified.

In the case of the Givens, for example, Duff and Johnson (2001) categorize according to the type of given information entity, such as proper name for a given “personal name”, “corporate name”, or any other “name of an entity”. In contrast, this study analyzes which entity is referenced by a Given; for example, a person is referenced by a personal name or a group is referenced by a corporate name. In order to distinguish between the different conceptualizations, the term given entity will henceforth be used instead of “Given”. A given entity represents an entity referenced by the user in the question or the associated contextual information by a word or expression in order to describe and qualify the wanted entity.

The given entity is determined by its use in the wider context of the inquiry. For example, “German Democratic Republic (GDR)” may be referred to as a geographical location, as a conscious actor, or as a period denoting the time frame of the existence of the GDR. Furthermore, the given entities are distinguished into particular entities and general types of entities; for example, a particular person, such as “Konrad Adenauer” or “Erich Honecker”, and a type of person, such as “photographer” or “grandfather”.
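The two distinctions made here, the referenced kind of entity and particular versus general type, can be sketched as a small data structure. The class and field names are assumptions introduced for illustration only; they do not reproduce the coding scheme of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GivenEntity:
    """Hypothetical record for one coded given entity (illustrative only)."""
    expression: str   # the word or expression used in the inquiry
    category: str     # the kind of entity referenced in this context
    particular: bool  # True for a particular entity, False for a general type


# One expression, three possible readings, depending on its use in the inquiry.
gdr_readings = [
    GivenEntity("German Democratic Republic (GDR)", "Place", True),
    GivenEntity("German Democratic Republic (GDR)", "Actor", True),
    GivenEntity("German Democratic Republic (GDR)", "Time", True),
]

# A particular person versus a type of person.
adenauer = GivenEntity("Konrad Adenauer", "Actor", True)
photographer = GivenEntity("photographer", "Actor", False)

for reading in gdr_readings:
    print(reading.expression, "->", reading.category)
```

The point of the sketch is that the expression alone does not determine the record: the category is assigned only after the context of the inquiry has been interpreted.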

In this study, supplementary information is not excluded from coding or analysis. On the one hand, it is difficult to categorize any information as insignificant, especially in an archival setting where contextual information can take on a vital role in discovery processes and the location of archival materials. On the other hand, all information a user provides to describe the information need can be deemed relevant in some way simply because the user decided to provide it. Finally, every item of information provided does carry a potential relevance for the analysis of the subject matter of the interest of the inquiry and should therefore be considered.

In the case of the Wanteds, Duff and Johnson (2001) only provide general and highly accumulative categories broadly determining the type of the wanted information such as “form”, “recommendation”, or “general or background”. Determining the Wanteds in this way proves to be neither suitable nor productive for the purposes of the study at hand, particularly in view of the unspecific and overly broad categories for the wanted information types determined by Duff and Johnson (2001).

Here, however, the Wanteds will be referred to as wanted entities. These represent a type of entity primarily referenced by the wanted archival or non-archival material or by a wanted fact. For example, in the case of a material-finding question for documents about a particular person, the wanted entity would be that particular person. In the case of a factual question asking, for instance, for the name of a particular person who has been at a particular place, the wanted entity would again be that particular person.

The type of question already indicates whether the user is interested in (or wants) a resource or factual information. The wanted entity further substantiates the information the user seeks to obtain by asking the question. This information is typically not simply the location of a specific document, or general or background information, but relates to a specific type of entity such as a particular person, event, or place. The wanted entity thus also indicates the most recognizable general interest of an inquiry.

Interpreting the Wanteds this way allows our analysis to move away from the immediate utterance of the question and to take a closer look at the actual epistemological interest of the user embodied in the inquiry. To be sure, the Wanteds only approximate the information need and immediately recognizable primary interest of the questions – again beyond an interest in resources or factual information – in terms of the most recognizable referenced entity.

The categories for the referenced given and wanted entities both emerged iteratively from the analysis of the questions and the associated contextual information. Similar kinds of entities were successively allocated to broader categories, mainly based on the conceptualization of the ontology CIDOC CRM (IV:3.2), which provides a sound conceptual framework for determining and formalizing the type of referenced entities.

The given entities are grouped into the following seven categories: Actors, including persons and groups; Documents; Events and Activities; Time; Place; Things; and Other Entities, including general context and identifiers. The wanted entities comprise as principal categories Actors, including persons and groups; Events, including unintentional events and activities (both concepts are explained further in IV:3.2); Documents; Things; Places; and General Topics. Most categories contain one or more sub-categories, further specifying the nature or context of the wanted entity. Both the wanted and given entities are discussed in more detail and exemplified in the next chapter (V:1.2).

The identification and abstraction of the given and wanted entities to common conceptualizations is the first step of the ontological analysis. The occurring phenomena in the inquiries and their domain of discourse are reduced significantly during this process.

However, the ontological analysis of the wanted and given entities is only the initial step away from the level of utterance of the question and towards the identification and ontological formalization of the wider shared domain of discourse and the subject matter of the inquiries.

1.2.2 Patterns

The next step in the ontological analysis further interprets the inquiries with regard to their shared domain of discourse. The goal is to ontologically formalize relevant entity types and especially the relationships between these entities, which together describe the shared subject matter of the interests of the inquiries. The previously identified given and wanted entities provide initial building blocks.

Formally, an ontology engineering approach is employed in that the inquiries and their interpretations are translated into an ontological model. The domain of discourse described by these entity types and relationships represents the typical and shared subject matter of the interests of the inquiries. The result of this process is an ontology called the Archival Knowledge Model (AKM).

As a whole, the AKM can be partially understood in analogy to what has been described as a frame. Minsky (1974) defines a frame as a “framework to be adapted to fit reality by changing details as necessary” and as a “data-structure for representing a stereotyped situation” consisting of a “network of nodes and relations”. The AKM resembles such a frame in that it describes a general network of entity types, the “nodes”, and relationships (IV:3.1) between these nodes, and thus also describes general “stereotyped situations” of a past historical reality appearing as the subject matter of user needs and interests of inquiries made of archives.
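As a rough illustration of such a “network of nodes and relations”, the following sketch stores entity types as nodes and relationships as typed edges. The entity types and relationship names are hypothetical placeholders and do not reproduce the actual AKM.

```python
# Hypothetical general pattern: entity types ("nodes") and typed relationships.
# None of these names are taken from the AKM; they are placeholders.
GENERAL_PATTERN = {
    "nodes": {"Actor", "Event", "Place", "Document"},
    "relations": {
        ("Actor", "participated_in", "Event"),
        ("Event", "took_place_at", "Place"),
        ("Document", "refers_to", "Event"),
    },
}


def is_well_formed(pattern):
    """Check that every relationship connects two declared entity types."""
    return all(
        source in pattern["nodes"] and target in pattern["nodes"]
        for source, _, target in pattern["relations"]
    )


def relations_of(pattern, node):
    """Return all relationships in which an entity type takes part."""
    return sorted(r for r in pattern["relations"] if node in (r[0], r[2]))


print(is_well_formed(GENERAL_PATTERN))
for relation in relations_of(GENERAL_PATTERN, "Event"):
    print(relation)
```

The structure holds no particular persons, events, or places. It only states which kinds of entities can be connected and how, which is what allows it to be adapted to many concrete situations.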

The AKM can be further differentiated into several general patterns, which constitute typical segments or aggregations of the ontology as a whole. These general patterns also resemble frames and consequently embody various specific “stereotyped situations” of historical contexts to which the subject matter of the interest of inquiries epistemologically refers. In other words, they describe typical and specific contexts of user interests communicated to archival information systems.

The AKM and its general patterns are the result of the interpretation and ontological analysis of the inquiries. The main constituents of the ontology AKM are relationships, entity types and scope notes; that is, definitions of the relationships and entity types based on the further analysis of the given and wanted entities, and the relevant relationships that exist between them.

Which relationships are relevant depends on the further determination of the common subject matter that adequately describes a historical reality to which the interests of inquiries refer. In other words, entities and relationships together provide the necessary historical and contextual knowledge in order to formulate a query that would retrieve potentially adequate documents or facts serving the perceived interest of an inquiry.

Since the inquiries are not self-contained, the determination of said subject matter is further based on the wider epistemological context of the interpretation, which includes the historical and archival domain (IV:2), as well as common sense background knowledge. Both domains provide concepts necessary to adequately represent the subject matter of the interest of an inquiry.

A second principal type of pattern which primarily serves demonstrative purposes will be introduced at this point. The general patterns are situated on the schema level of the AKM and hold entity types and relationships as the general concepts of the relevant phenomena which exist in the domain of discourse. The instance level contains particular examples of the general entity types and relationships, and together with the schema level constitutes the knowledge base. Here we will distinguish between two types of instantiated patterns.

The first type of instantiated pattern is that of query patterns. The general patterns can be instantiated with specific examples of entity types and thus adapted to the aforementioned “specific needs”. In the context of this study, these “specific needs” are the specific interests of inquiries. Query patterns represent instantiated general patterns with given entities from an inquiry and specify a primary query target. In other words, query patterns indicate potentially adequate queries for retrieving relevant documents or facts in order to respond to the interest of a question.
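How a general pattern becomes a query pattern can be sketched as follows. The sketch assumes that the instance level is held as simple (subject, relationship, object) triples and that variables are written with a leading question mark. The relationship name, the identifiers, and both helper functions are hypothetical; the example follows a factual question asking which person has been at a given place.

```python
# Instance level: particular facts as (subject, relationship, object) triples.
# All identifiers and the relationship name are hypothetical placeholders.
FACTS = [
    ("person:P1", "was_present_at", "place:X"),
    ("person:P2", "was_present_at", "place:Y"),
    ("person:P3", "was_present_at", "place:X"),
]

# Schema level: a general pattern whose positions are still open variables.
GENERAL_PATTERN = ("?person", "was_present_at", "?place")


def instantiate(pattern, given):
    """Replace variables with given entities; unbound variables stay open."""
    return tuple(given.get(term, term) for term in pattern)


def run_query(pattern, facts):
    """Return the variable bindings of every fact that matches the pattern."""
    results = []
    for fact in facts:
        binding = {}
        for p_term, f_term in zip(pattern, fact):
            if p_term.startswith("?"):
                binding[p_term] = f_term  # open variable: record the value
            elif p_term != f_term:
                break                     # fixed term does not match
        else:
            results.append(binding)
    return results


# The given entity fixes the place; "?person" remains the query target.
query_pattern = instantiate(GENERAL_PATTERN, {"?place": "place:X"})
print(query_pattern)
print(run_query(query_pattern, FACTS))
```

The same general pattern can be instantiated with any other place, or with a given person and an open place, without changing the pattern itself.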

As will be discussed in greater detail in the next chapter (V:1.5), query patterns always instantiate elements from two general patterns: one provides entities which describe the provenance of the wanted archival materials or facts while the second provides entities which represent the “aboutness” of the same.
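The combination of a provenance constraint with an “aboutness” constraint can be sketched as the intersection of two lookups over the same set of facts. The relationship names `created_by` and `is_about` and all identifiers are assumptions made for this illustration, not relationships defined by the AKM.

```python
# Hypothetical instance data: each document has a creator (provenance)
# and a subject (aboutness). Names and identifiers are placeholders.
FACTS = {
    ("doc1", "created_by", "group:G1"),
    ("doc1", "is_about", "event:E1"),
    ("doc2", "created_by", "group:G1"),
    ("doc2", "is_about", "event:E2"),
    ("doc3", "created_by", "group:G2"),
    ("doc3", "is_about", "event:E1"),
}


def subjects(relationship, obj):
    """Return all subjects linked to `obj` by `relationship`."""
    return {s for s, r, o in FACTS if r == relationship and o == obj}


def find_documents(creator, topic):
    """Documents that satisfy both the provenance and the aboutness part."""
    return sorted(subjects("created_by", creator) & subjects("is_about", topic))


print(find_documents("group:G1", "event:E1"))
```

Either constraint alone returns two documents in this example; only the combination narrows the result to the single document that was created by the given group and concerns the given event.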

The query patterns are based on the assumption that machine-supported search and discovery systems may effectively process data described by a schema based on the AKM. Such a search system, by way of the query patterns, allows a potentially unlimited number of queries as instantiations of general patterns and thus a virtually unlimited number of questions.

As will be discussed in more detail during the ontological analysis of the inquiries (V:1.3), the interpretation of questions often allows for conclusions on more than one potentially adequate query, so-called indirections. Since distinct query patterns cannot be created for each individual potential query, query patterns typically subsume more than one. In this regard, the query patterns neither constitute efficient query formulas nor do they even cover every potential indirection. The query patterns illustrate that the interest of a question can be served by a formal potential query against the subject matter described by the AKM. As such, the AKM provides a relatively small set of relevant entity types and relationships necessary for creating data structures adequate for serving typical interests of archival inquiries.

The second type of instantiated pattern is called an archival pattern and is solely used in the chapter “Application” (VI). While query patterns indicate potential queries and exemplify the relationship between the inquiries and the ontology, the AKM, archival patterns show how real-life metadata, here encoded with EAD (VI:1), can be represented in general patterns of the AKM. In this regard, archival patterns exemplify the relationship between schemas and data structures and the AKM. Again, archival patterns are auxiliary vehicles for demonstrating how real-life data can be analyzed and that the latter already contain explicit and implicit information relevant to the AKM.
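The step from EAD-encoded metadata to an instantiated pattern might look roughly like the following sketch. The XML fragment is invented and strongly simplified, although the element names (`did`, `unitid`, `unittitle`, `unitdate`, `origination`, `corpname`) are EAD elements. The relationship names on the output side are hypothetical and are not those of the AKM.

```python
import xml.etree.ElementTree as ET

# Invented, simplified EAD-like fragment (no namespace, a single component).
EAD_FRAGMENT = """
<c level="file">
  <did>
    <unitid>X 123</unitid>
    <unittitle>Correspondence</unittitle>
    <unitdate>1950-1955</unitdate>
    <origination><corpname>Example Office</corpname></origination>
  </did>
</c>
"""


def ead_to_triples(xml_text):
    """Map the descriptive identification of one component to triples.

    The relationship names are hypothetical placeholders.
    """
    did = ET.fromstring(xml_text.strip()).find("did")
    unit = did.findtext("unitid")
    triples = [
        (unit, "has_title", did.findtext("unittitle")),
        (unit, "has_time_span", did.findtext("unitdate")),
    ]
    # A corporate name under <origination> implies a creating group.
    for corp in did.findall("origination/corpname"):
        triples.append((unit, "created_by_group", corp.text))
    return triples


for triple in ead_to_triples(EAD_FRAGMENT):
    print(triple)
```

The tree position of `corpname` inside `origination` carries the implicit statement that the group created the unit. The mapping makes this relationship explicit, which is the kind of information an archival pattern is meant to expose.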

The general patterns, query patterns, and archival patterns will be visualized in diagrams. The notation used in these diagrams will be introduced later in this chapter (IV:3.3).
