Fundamental Phenomena - What is the Real Question?

As previously discussed (V:1.3), query patterns subsume several possible queries. Not all possible indirections, or interpretations, are shown in the diagrams or discussed in the accompanying text, but only the most relevant. Additionally, some basic modelling patterns of fundamental phenomena are omitted from the following discussions and diagrams in order to retain readability and reduce unnecessary complexity and redundancy. These fundamental phenomena represent basic and recurring ontological structures which apply to all general patterns, such as the fact that activities may occur in sequence or that documents may form logical parts of other documents.

These fundamental phenomena are normally hidden, or implicit, in the diagrams and not explicitly discussed in the text since they typically pertain to any general pattern or query pattern.

Such fundamental phenomena would need to be considered in query formulation in the context of an information system. In the context of this study, query patterns as exemplary instantiations of general patterns demonstrate that representing the interests of inquiries is feasible in principle. As already mentioned, the purpose of the query patterns is not to constitute actual computational queries. Tzompanaki and Doerr (2012b) presented some fundamental patterns (by which they mean short-cuts) which hide more complex query patterns. Fundamental phenomena are similar and, indeed, when proceeding to actual query implementation, such more complete representations would need to be considered. The focus of this study is, however, the theoretical and ontological level.

The first fundamental phenomenon pertains to actors; that is, persons and groups who carry out activities and actively or passively participate in events. If the inquiry clearly allows for the conclusion that either a person or a group must have acted in the particular case, then this entity is used in the query pattern. However, in any case, the person or the group could have been a member of another group which may have carried out the activity or under whose authority the person or group may have acted. Therefore, as shown in Figure 22, a query will always have to consider a person and a group, both of which may be a member of one or more other groups, and all of which may have carried out an activity.

Figure 22 – Groups or persons carrying out an activity.

An inquiry may not allow for a conclusion as to whether a person or a group must have carried out an activity. In such cases, the query pattern will only contain an E39 Actor. Figure 23 shows the complete underlying pattern of the fundamental phenomenon.

The class E39 Actor entails possible instances of the classes E21 Person and E74 Group, and, as previously discussed, membership in a group also needs to be considered. Figure 23 is thus a shorthand version of Figure 22.

Figure 23 – Actors carrying out an activity.

The aforementioned fundamental phenomenon essentially also holds for the participation of persons and groups in events. Actors may participate as a single person, as a group, or as a member of a group.
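The expansion that this first fundamental phenomenon requires can be illustrated with a short Python sketch. It is an illustration only and not part of the AKM or of any actual query implementation. It assumes a toy in-memory store in which group membership corresponds to the CRM property P107 has current or former member and performance of an activity to P14 carried out by; all instance names, dictionaries and function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy data: group -> direct members
# (stands in for CRM P107 has current or former member).
MEMBERS: Dict[str, Set[str]] = {
    "Ministry": {"Department A"},
    "Department A": {"Clerk Hansen"},
}

# Hypothetical toy data: activity -> actors recorded as carrying it out
# (stands in for CRM P14 carried out by).
CARRIED_OUT_BY: Dict[str, Set[str]] = {
    "Issuing of permit": {"Department A"},
    "Writing of letter": {"Clerk Hansen"},
}


def groups_of(actor: str) -> Set[str]:
    """Return every group the actor belongs to, directly or via other groups."""
    found: Set[str] = set()
    frontier = [actor]
    while frontier:
        current = frontier.pop()
        for group, members in MEMBERS.items():
            if current in members and group not in found:
                found.add(group)
                frontier.append(group)
    return found


def activities_possibly_involving(actor: str) -> Set[str]:
    """Activities carried out by the actor itself or by any group it belongs to."""
    candidates = {actor} | groups_of(actor)
    return {
        activity
        for activity, actors in CARRIED_OUT_BY.items()
        if actors & candidates
    }
```

Under these assumptions, asking about the hypothetical person "Clerk Hansen" also returns the activity recorded only for the department to which that person belongs, which is the kind of indirection the query patterns leave implicit.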

The second fundamental phenomenon pertains to part-whole relations and sequences. Events and activities as well as information objects and physical objects may be composed of sub-parts or they may be related to each other in some sequential order. Even though the sequences and part-whole relations are briefly introduced in the context of general patterns, these fundamental phenomena receive no further attention in the query patterns.
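In an implementation, both part-whole relations and sequences would amount to following a relation repeatedly. The following Python sketch shows this under simplifying assumptions: the data are invented, each whole lists only its direct parts, each activity has at most one direct successor, and the names PARTS, NEXT, all_parts and following are hypothetical. The corresponding CRM properties are deliberately not modelled here.

```python
from typing import Dict, List

# Hypothetical part-whole data: whole -> direct parts.
PARTS: Dict[str, List[str]] = {
    "Case file": ["Letter", "Minutes"],
    "Minutes": ["Attendance list"],
}

# Hypothetical sequence data: activity -> activity that directly follows it.
NEXT: Dict[str, str] = {
    "Application": "Review",
    "Review": "Decision",
}


def all_parts(whole: str) -> List[str]:
    """Return direct and indirect parts of a whole, depth first."""
    result: List[str] = []
    for part in PARTS.get(whole, []):
        result.append(part)
        result.extend(all_parts(part))
    return result


def following(activity: str) -> List[str]:
    """Return all activities that come after the given one, in order."""
    result: List[str] = []
    seen = {activity}
    current = NEXT.get(activity)
    while current is not None and current not in seen:  # guard against cycles
        result.append(current)
        seen.add(current)
        current = NEXT.get(current)
    return result
```

A query about the hypothetical "Case file" would thus also have to consider the attendance list contained in its minutes, even though no such step appears in the diagrams.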

While the previous two fundamental phenomena are only hidden from the general and query patterns, the following are excluded from the discussion that follows, mostly due to their complexity, which would exceed the scope of this study and indeed be unnecessary from an illustrative point of view.

The third fundamental phenomenon pertains to temporal and spatial entities. As already mentioned during the introduction of the CRM (IV:3.2), the representation of time and place is not further explored. Time-spans during which activities or events occurred are simply represented using the class E52 Time-Span and the geographical location of an activity or event is represented using the class E53 Place. The fully fledged and adequate representation in both cases would require additional classes and properties, and entail further conceptual issues and use cases such as one time-span of an activity falling within another.
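The use case of one time-span falling within another can be made concrete with a minimal Python sketch. It assumes, as a strong simplification, that a time-span is a closed interval between two exactly known dates; the class TimeSpan, its method falls_within and the example dates are hypothetical and stand in for an E52 Time-Span only loosely. A fully adequate representation would also have to handle imprecise or unknown bounds, which is one reason the topic is excluded here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimeSpan:
    """Simplified stand-in for an E52 Time-Span: a closed interval of dates."""
    begin: date
    end: date

    def falls_within(self, other: "TimeSpan") -> bool:
        """True if this time-span lies entirely inside the other one."""
        return other.begin <= self.begin and self.end <= other.end


# Hypothetical example: a two-day meeting during a year-long project.
meeting = TimeSpan(date(1950, 3, 1), date(1950, 3, 2))
project = TimeSpan(date(1950, 1, 1), date(1950, 12, 31))
```

With these invented dates, meeting.falls_within(project) holds, while the reverse does not.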

The fourth fundamental phenomenon pertains to the representation of particulars and types in the CRM. As already mentioned, the CRM allows for a form of type categorization that does not result in new classes but in a terminological differentiation of an existing instance.

This facility can be used to address interests, that is, queries, concerning both particular things and kinds of things: for example, the particular document with the minutes of a particular press conference versus all documents of the type “minutes” about all activities of the type “press conference”. In principle, this mechanism can be used everywhere and is not explicitly represented in the general patterns. The class E55 Type is also used in this role in several specific cases already in the CRM, such as to represent the general use of an item.
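The difference between asking about a particular and asking about a type can be sketched as follows. The Python fragment is a hedged illustration, not an implementation of the CRM: the field type_label merely stands in for an E55 Type assigned to an instance, and all classes, instances and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Activity:
    label: str       # identifies the particular activity
    type_label: str  # stands in for an E55 Type assigned to the activity


@dataclass(frozen=True)
class Document:
    label: str       # identifies the particular document
    type_label: str  # stands in for an E55 Type assigned to the document
    about: Activity  # the activity the document refers to


# Hypothetical instances, invented for illustration only.
CONFERENCE_1 = Activity("Press conference 1", "press conference")
CONFERENCE_2 = Activity("Press conference 2", "press conference")
BOARD_MEETING = Activity("Board meeting", "meeting")

DOCUMENTS: List[Document] = [
    Document("Minutes A", "minutes", CONFERENCE_1),
    Document("Minutes B", "minutes", CONFERENCE_2),
    Document("Photograph C", "photograph", CONFERENCE_1),
    Document("Minutes D", "minutes", BOARD_MEETING),
]


def about_particular(doc_type: str, activity: Activity) -> List[str]:
    """Documents of a given type about one particular activity."""
    return [d.label for d in DOCUMENTS
            if d.type_label == doc_type and d.about == activity]


def about_type(doc_type: str, activity_type: str) -> List[str]:
    """Documents of a given type about any activity of a given type."""
    return [d.label for d in DOCUMENTS
            if d.type_label == doc_type and d.about.type_label == activity_type]
```

The first function corresponds to the interest in the minutes of one particular press conference, the second to the interest in all minutes of all press conferences.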

Finally, the fifth fundamental phenomenon pertains to the representation of appellations and identifications of CRM entities. The instances represented in a knowledge base using the CRM can be named and provided with an identifier. The CRM provides the class E41 Appellation, which comprises “all proper names, words, phrases or codes, either meaningful or not, that are used or can be used to identify a specific instance of some class within a certain context” (Crofts et al., 2006, 20). Several sub-classes allow for the representation of specific kinds of appellations, such as actor appellations, place appellations, time appellations, addresses, spatial coordinates, titles, or identifiers. The general patterns neither explicitly represent such a level of detail nor will the query or archival patterns do so. This study refers to the facilities of the CRM for the representation of appellations and identifiers but will not discuss them further.

In particular, the identification and location of the physical items within the archive is explicitly excluded from the following discussion. The workings of archival identification systems for archival items, for example via call numbers, are neither discussed nor explicitly included in the general patterns.

The interest of the study at hand is not the identification of particular resources within an archival information or documentation system. The query patterns do not, therefore, make statements about the actual identification of physical items in an archive. For example, call numbers are only explicitly represented in the Documents pattern (V:2.2.6) in order to demonstrate that, in principle, all necessary archival identification information could be incorporated. Here too, the AKM does not include a detailed model of the archive. Its primary interest is in the “historical reality”, not the current “archival reality”.

The meaning and identity of instances of classes will be sufficiently apparent from their labels provided in the diagrams. In reality, such labels would have to be replaced by proper global identifiers and readable titles or texts that would allow for precise searches and string-matching. Their implementation is the task of cataloguing rules such as Resource Description and Access (RDA).⁶⁷

The general patterns will therefore typically omit the aforementioned fundamental phenomena for better readability. The fundamental phenomena will normally be introduced only once at the first and most general level of occurrence.

1.7 Summary

The current section has reported on various results from the interpretative analysis, each of which constitutes an important outcome but also, at the same time, paves the way for the elaboration of the general patterns as the primary contribution of this study.

The initial, linguistically oriented analysis of the user and case files from the German Federal Archives and the National Archives of Norway resulted in 762 single inquiries. The determination of the type of question showed that 364 inquiries (48%), nearly half of the sample, were of the type resource discovery and 112 inquiries (15%) were of the type fact-finding. The remaining 286 inquiries (38%) were of the type non-discovery. Inquiries of the type resource discovery and fact-finding constitute the sample analyzed and interpreted further, in total 476 single inquiries (62%).

⁶⁷ http://www.rda-jsc.org/

The next step in the analysis identified the referenced entities. The given entities are those referenced by propositions in the questions and the associated contextual information, and the wanted entities are those primarily and most recognizably referred to by the wanted archival or non-archival material, or by the wanted fact. They indicate the principal interest of the inquiry.

In other words, the wanted entities are things about which the user is seeking factual information or information objects referring to these, and the given entities are those which contextualize the wanted entities and render them identifiable.

The primary categories of given entities identified were Actors with 609 occurrences, which equals 37% of all given entities, followed by Time and Events with 284 (17%) and 281 (17%) occurrences respectively. Documents with 207 (13%) and Places with 168 (10%) counts came next. Other Entities, that is, identifiers and general contexts, as well as Things played only a minor role. The given entities have been further differentiated into particulars and types and will reappear as the basic building blocks of the general patterns.

The primary categories of the wanted entities were Actors, including persons and groups, amounting to 292 counts, which equals 61% of all wanted entities. Next came Events / Activities, including unintentional events and activities, where unintentional events were close to insignificant with only 2 occurrences, compared to 120 activities, which equal 25% of all wanted entities. Documents with 43 (9%), Things with 12 (2.7%), General Topics with 4 (0.8%), and Places with only 2 (0.4%) occurrences were much less important.

The determination of the referenced entities made it possible to significantly reduce the complexity of the phenomena in the inquiries. Further, the wanted entities already indicate the perceived immediate ontological interest of the inquiry. In combination with the type of question they allow for an assessment of the general interest of the inquiries, where the type of question indicates whether the user is first and foremost seeking information objects or facts, and the wanted entity constitutes an approximation of what the inquiry is about in general as well as what the desired information objects and facts are about in particular.

However, the wanted entities constitute only a preliminary understanding of the users’ interests and need to be interpreted further in order to arrive at a more thorough understanding and explicit representation of archival user needs. On the ontological level, the inquiries refer to or are about a specific historical context which may be expressed and described more or less implicitly or explicitly, in greater or lesser detail. The interpretative analysis of the inquiries renders this historical context explicit by formally describing its relevant entities in the AKM.

The interpretative and ontological analysis of the inquiries reveals three levels of interest: material fact, psychological and collective statistical. These levels indicate how immediately the interest of an inquiry can be translated into an ontological model adequately representing its subject matter in the form of material facts. Most questions (399) are of the type material fact, which amounts to 84% of all analyzed inquiries. Only 53 (11%) are collective statistical and 24 (5%) are psychological. On the one hand, most inquiries seem to contain many questions already fit for appropriate representation, i.e. as material facts, and on the other hand one can expect to be able to translate and answer most of these questions more or less directly.
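The reported shares of the three levels of interest can be checked with a few lines of arithmetic. The Python sketch below uses only the counts given in the text; the variable names are illustrative, and rounding to whole percentages is an assumption about how the figures were derived.

```python
# Counts per level of interest as reported in the text.
levels = {
    "material fact": 399,
    "collective statistical": 53,
    "psychological": 24,
}

# The counts add up to the sample of resource-discovery and fact-finding inquiries.
total = sum(levels.values())

# Share of each level, rounded to whole percentages (assumed rounding).
shares = {level: round(100 * count / total) for level, count in levels.items()}
```

The total equals the 476 inquiries analyzed further, and the rounded shares reproduce the percentages stated above.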

The continuing analysis further results in an ontological core framework representing the most dominant entity types and relationships in the domain of discourse of the analyzed inquiries. As a point of reference for the interpretation of the inquiries, the framework guided the ontological modelling process. The general patterns are derived from the framework by further extending and describing segments of the historical reality described. Dominant entities were information objects as well as plans and wills, acts and actors.

While ontologically modelling the general patterns, two fundamental kinds of historical contexts emerged. The provenance context generally describes how things have been created or produced, who kept things, or to whom these things have been designated or sent. Within the scope of this study, these things are mostly textual information objects. The aboutness context generally describes the historical reality to which these information objects relate in a descriptive or topical sense. Both the provenance context and the aboutness context are principal ontological categories which subsume several concrete general patterns. While the AKM as such comprises a range of interrelated entities (entity types and properties as well as scope notes), a general pattern represents a particular aggregation of adequate entities from the AKM, describing typical contexts or situations, pertaining either to the provenance context or to the aboutness context. Each general pattern therefore belongs to one of the two contexts.

In the case of resource-discovery questions inquiring about any kind of information object (material-finding, research question, consultation), specific types of information object (specific type), or a particular information object (specific item), it would appear reasonable to describe the context of provenance or the topical aboutness of the desired information object. However, in the case of fact-finding questions, the fact may be found in either context. Since this study assumes a fundamental need for evidence for the requested factual information (IV:2.1), information objects potentially capable of providing such evidence are implicitly considered secondary objects of the interest of the question. Fact-finding questions are therefore always described with both contexts.

The next section will present the general patterns that have been identified based on the method previously discussed and the insights obtained thus far from the interpretative analysis of written natural language inquiries placed with archives. The discussion will begin with the general patterns belonging to the provenance context and then proceed to those from the aboutness context.

2 General Patterns

In this section, the general patterns will be introduced along with exemplifying query patterns.⁶⁸ The general patterns are the result of the deeper interpretative analysis of the inquiries and of the subject matter of their interests. The CRM has been used as a means to formalize and explicate these results in the form of ten general patterns, which together constitute an ontological model, the AKM. In the first sub-section, the general patterns from the provenance context will be discussed. The second sub-section will introduce the general patterns pertaining to the aboutness context.

Each general pattern will be introduced with an informal description of its scope and a diagram showing its ontological structure. Then each component of the general pattern will be discussed in more detail, followed by one or more query patterns providing additional and exemplary explanations.⁶⁹ Since query patterns always instantiate entities from two general patterns, one from the provenance context and one from the aboutness context, those query patterns that are provided first in the context of the Provenance patterns will necessarily anticipate the components used from general patterns from the aboutness context, introduced in detail later.

However, the examples have been chosen in such a way as to render them comprehensible without further explanation of the aboutness context. Further, as already mentioned, the query patterns are exemplary in that they typically indicate more than one potential query adequate to retrieve relevant information objects or facts. The examples are neither exhaustive nor comprehensive.

The presentation of each general pattern closes with a statistic on its occurrence within the sample and a summary of its semantics.
