Object Parts (Structures) - Structuring Descriptive Data of Organisms

Figure 39. A sparsely filled matrix based on 40 object parts (top) × 25 observed properties (left; axis label “Property/Method/Data type”). The 200 of the 1000 potential combinations that were selected for data entry are shown as black circles.

Figure 40. In the SDD model (left), views (for data entry, report generation, sorting, etc.) may be added as another concept tree (right side of left diagram). In the Prometheus model (right), a separate selection mechanism (called “pro-forma”) is used multiple times (represented here through differently shaded dots and combined into a single diagram). To simplify the illustration, only non-overlapping views that are congruent with the part and property hierarchies are shown. However, this is not a constraint of the model.

Table 38. Comparison of the character decomposition and concept hierarchy models.

Columns: “Decomposition” = decomposition model (e.g., Prometheus description model); “Concept hierarchy” = concept hierarchy + character model (e.g., similar to SDD).

Multiple part hierarchies

■ Decomposition: One hierarchy is central to the model and is the basis for data storage and retrieval. This hierarchy must be centrally managed and accepted by all participants in a federation. Additional secondary hierarchies based on different data structures may be added; only these can be federated.

■ Concept hierarchy: All hierarchies are symmetric and use the same structures. None is relevant for data storage. In a federated model, different participants may choose different hierarchies.

Multiple property hierarchies

■ Decomposition: Similar to the part hierarchy (but in the current Prometheus probably flat rather than hierarchical).

■ Concept hierarchy: As above; all concept hierarchies function in the same manner.

Multiple method hierarchies

■ Decomposition: Not supported in Prometheus, but extending the decomposition model with a further dimension is possible.

■ Concept hierarchy: As above; all concept hierarchies function in the same manner.

Multiple views

■ Decomposition: Based on a selection process defining a subset of matrix cells. May be federated.

■ Concept hierarchy: Based on a concept tree defining a subset of characters. May be federated.

Creating a new terminology

■ Decomposition: Well-defined part and property hierarchies need to be set up before data entry starts. For well-understood groups this may be done centrally in funded projects; for small groups this may become a problem.

■ Concept hierarchy: An “ad-hoc mode” is explicitly supported: characters may be introduced by defining a type and labels, without including them in any organizing concept hierarchy.

Adding new “characters”

■ Decomposition: Not required. Any matrix cell may be used for data storage. Federation depends only on the defining part, property, and property value hierarchies.

■ Concept hierarchy: This is a separate process. Creating a character creates an ID used when storing or retrieving data and requires some metadata (type, a simple label). IDs must be managed globally, but when GUIDs are used, members of a federation may add new characters independently.

Adding new concepts

■ Decomposition: In the defining part and property hierarchies this may require central management. As with characters, GUIDs might allow members of a federation to add new parts and properties independently.

■ Concept hierarchy: Uncritical; for coded data, concepts are only local to a single hierarchy. However, concepts may also be used for natural language markup, which creates a situation similar to Prometheus that is managed through GUIDs.

Revising semantics of concepts

■ Decomposition: Very problematic. In a federated situation any matrix cell may contain data. Changing the concept for a part would require identifying and reviewing all existing data (for all properties/methods) in all databases.

■ Concept hierarchy: Uncritical; data do not depend on the concepts. The analog in SDD is the revision of a character, which would, however, involve only a single character at a time. On the other hand, because ad-hoc definition of characters is supported, poorly defined characters that have to be revised are much more likely.

Extracting ontological information

■ Decomposition: Presumably relatively easy, since the model is based on explicit ontological concepts.

■ Concept hierarchy: Possible, but less reliable. SDD introduces some mechanisms that let authors express that a hierarchy may be read as ontological information. Full ontological information is expected to be associated with the concept definitions rather than with concept usage in the concept trees.

Supporting relational characters (p.122)

■ Decomposition: Special support exists for part relations, but not for two-property relations. Properties depending on two parts must be defined asymmetrically.

■ Concept hierarchy: Characters may be associated with multiple property concepts as well as multiple part concepts. Relations are always symmetrical.
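Several rows of Table 38 (where data are stored, how views are formed, what “adding characters” and “revising concepts” affect) can be made concrete in a small sketch. This is a minimal Python illustration under assumed, hypothetical names (matrix, score_cell, apply_pro_forma, ConceptNode, etc.); it is not the actual Prometheus or SDD data model.

```python
from dataclasses import dataclass, field

# --- Decomposition model (hypothetical sketch) ---
# Each datum sits in a matrix cell addressed by a (part, property) pair
# taken from the centrally managed part and property hierarchies.
matrix: dict[tuple[str, str], object] = {}

def score_cell(part: str, prop: str, value: object) -> None:
    """Any cell may receive data; no 'character' has to be defined first."""
    matrix[(part, prop)] = value

def cells_to_review(part: str) -> list[tuple[str, str]]:
    """Revising the concept of a part touches every cell that uses it."""
    return sorted(cell for cell in matrix if cell[0] == part)

def apply_pro_forma(cells: set[tuple[str, str]]) -> dict[tuple[str, str], object]:
    """A view defined as a selected subset of matrix cells."""
    return {cell: value for cell, value in matrix.items() if cell in cells}

# --- Concept hierarchy + character model (hypothetical sketch) ---
@dataclass
class Character:
    char_id: str
    label: str
    data_type: str  # e.g. "categorical" or "quantitative"

@dataclass
class ConceptNode:
    concept: str
    char_ids: list[str] = field(default_factory=list)
    children: list["ConceptNode"] = field(default_factory=list)

characters: dict[str, Character] = {}
character_data: dict[str, object] = {}

def add_character(char_id: str, label: str, data_type: str) -> None:
    """'Ad-hoc mode': an ID, a type and a label suffice; no hierarchy needed."""
    characters[char_id] = Character(char_id, label, data_type)

def score_character(char_id: str, value: object) -> None:
    """Data are keyed by character ID only; concept trees are not consulted."""
    character_data[char_id] = value

# Usage: the same observations stored in both models.
score_cell("leaf", "length", 12.5)
score_cell("leaf", "color", "green")
score_cell("petal", "color", "white")

add_character("c1", "leaf length", "quantitative")
add_character("c2", "leaf color", "categorical")
score_character("c1", 12.5)
score_character("c2", "green")
parts_tree = ConceptNode("plant", children=[ConceptNode("leaf", ["c1", "c2"])])

print(cells_to_review("leaf"))  # [('leaf', 'color'), ('leaf', 'length')]
# Redefining the concept in the tree leaves the stored character data untouched.
parts_tree.children[0].concept = "lamina"
print(character_data)
```

In the decomposition sketch, the part concept is part of the storage key, so revising it implies a review of all affected cells. In the concept hierarchy sketch, the tree only points at character IDs.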

Both the “character decomposition” and the “character+concept hierarchy” models have advantages and disadvantages, some of which are compared in Table 38. A major disadvantage of the “character+concept hierarchy” model is the lack of methods to express more than a general association between concept and character. For example, a (constructed and hypothetical!) character “ratio of profemur length to distance from front to middle leg insertion at the body” could be associated with:

The complexity of this example cannot be adequately solved in decomposition models either (compare Table 38, last row), but for simpler examples with exactly one property and two parts the decomposition model may be less ambiguous.
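The limitation of a purely general association can be illustrated with a short sketch. The concept sets below are hypothetical and assumed only for illustration. A flat association records that a character relates to certain parts and properties, but not which property applies to which part. A decomposition-style relational record for a simpler case (one property, two parts) can state the roles explicitly.

```python
from itertools import product

# Hypothetical flat association for the constructed ratio character: the
# model records only THAT these concepts are involved, not HOW.
ratio_character = {
    "label": "ratio of profemur length to distance from front to "
             "middle leg insertion at the body",
    "parts": {"profemur", "front leg", "middle leg", "body"},
    "properties": {"length", "distance", "ratio"},
}

def candidate_readings(assoc: dict) -> list[tuple[str, str]]:
    """All (part, property) pairings the flat association cannot rule out."""
    return list(product(sorted(assoc["parts"]), sorted(assoc["properties"])))

# Decomposition-style relational record for a simpler case (one property,
# two parts): the roles are explicit, though defined asymmetrically.
distance_record = {"property": "distance",
                   "from_part": "front leg", "to_part": "middle leg"}

readings = candidate_readings(ratio_character)
print(len(readings))  # 12 pairings, of which only a few are intended
```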

In general, there are several reasons why the association between a character and parts (organs, morphological structures) may be ambiguous:

■ If an object part is dominant, its properties tend to be viewed as properties of the entire organism. Examples: the color of fern leaves or of the fruiting body of a fungus is generally considered the color of the entire fern or fungus, even though the fern roots or the fungal mycelium are usually differently colored.

■ If the border between object parts is not easily perceived (though perhaps well-defined in theory). Example: the hypocotyl of a plant may be considered part of the “root” as well as part of the “shoot”.

■ If object parts are distinguished morphologically, but evolutionary forces have produced a convergent appearance. Examples: cladodes (or “cladophylls”, flattened stems resembling and functioning as a leaf) or rhizomes (stems resembling and functioning as roots). Even trained botanists, lacking the time to do adequate developmental or anatomical studies, may misinterpret such concepts. Authors of a data set may decide to place characters where users expect them, rather than where they properly belong. Although this problem can be solved by distinguishing between part-of and kind-of relations, the data offer no validation that the content author understood or cared about these problems.

■ If the compositional hierarchy allows multiple logical arrangements. Example: a “pedicel” (i.e., the stalk bearing a single flower) may with equally good reasons be treated as “child of flower” or as “sibling of flower” and “child of inflorescence”.

■ Finally, and highly relevant to the problem of identification, the relation between an observable character and a body structure may be observable only with great difficulty in the field. Parts of the legs of small insects may be colorful and form a good field characteristic, but detecting exactly which part of the leg has which color may require a stereomicroscope. Similarly, in many sitting or swimming birds the exact relation between coloration and tail versus wing feathers hardly matters and is difficult to ascertain.
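The pedicel case shows how an arbitrary arrangement choice leaks into retrieval. In the sketch below, both arrangements and the attached characters are hypothetical. A query for “all characters of the flower” returns different results depending on which of the two equally defensible hierarchies a data set uses.

```python
# Two equally defensible (hypothetical) arrangements of the same parts,
# written as parent -> children.
arrangement_a = {"inflorescence": ["flower"], "flower": ["pedicel", "petal"]}
arrangement_b = {"inflorescence": ["flower", "pedicel"], "flower": ["petal"]}

# Hypothetical characters attached to part concepts.
chars_by_part = {"pedicel": ["pedicel length"], "petal": ["petal color"]}

def characters_under(tree: dict[str, list[str]], node: str) -> list[str]:
    """Characters attached to `node` or to any of its descendants."""
    found = list(chars_by_part.get(node, []))
    for child in tree.get(node, []):
        found.extend(characters_under(tree, child))
    return found

# "All characters of the flower" depends on the chosen arrangement:
print(characters_under(arrangement_a, "flower"))  # ['pedicel length', 'petal color']
print(characters_under(arrangement_b, "flower"))  # ['petal color']
```

At the inflorescence level both arrangements return the same characters. The ambiguity only affects queries at intermediate nodes.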

In principle these problems are all solvable, but in practice one data set may be optimized for phylogenetic analysis and another for routine identifications, leading to a certain ambiguity when attempting to retrieve ontological information from concept trees that are superimposed on a character list.

Whether the character decomposition model is better or worse at handling these cases is not immediately clear. The simple model (Table 32 or Fig.39) has similar problems: for example, providing exact information about the part/property relations in the ratio example requires substantially more complex data structures in the character decomposition model as well (see p.117).

Character-plus-concept hierarchy models often fulfill the same requirements as character decomposition models. As an example, consider the Prometheus feature that allows character states to be scored directly for an object part (compare p.119). In Prometheus, after selecting an object component, a state may be scored from the full list of all applicable states. Provided that the list of states is not overly long, this kind of scoring comes close to natural language descriptions, where the character or property is often implied in the term (compare the example on p.39). In Prometheus, the property is not stored but implied through the relationship defined between states and categorical properties (“state groups”) in the terminology.
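The scoring behaviour described above can be imitated in a few lines. This is a sketch under simplifying assumptions, not the Prometheus implementation. The terminology (STATE_GROUPS) and all function names are hypothetical, and every state is treated as applicable to every part, which a real terminology would restrict. Only (part, state) pairs are stored, and the property is recovered through the state-to-state-group relation.

```python
# Hypothetical terminology: categorical properties ("state groups") and their states.
STATE_GROUPS: dict[str, list[str]] = {
    "shape": ["ovate", "lanceolate", "linear"],
    "surface": ["glabrous", "pubescent"],
}
# Inverted index used to recover the implied property from a state.
STATE_TO_GROUP = {state: group for group, states in STATE_GROUPS.items()
                  for state in states}

# Scored data: only (part, state) pairs are stored; the property is not.
scores: list[tuple[str, str]] = []

def pick_list() -> list[str]:
    """Full sorted list of states offered after selecting a part (simplified)."""
    return sorted(STATE_TO_GROUP)

def score_state(part: str, state: str) -> None:
    if state not in STATE_TO_GROUP:
        raise ValueError(f"unknown state: {state!r}")
    scores.append((part, state))

def expand(part: str, state: str) -> tuple[str, str, str]:
    """Reconstruct (part, property, state) via the terminology."""
    return part, STATE_TO_GROUP[state], state

score_state("leaf", "lanceolate")
score_state("leaf", "pubescent")
print(pick_list())
print([expand(p, s) for p, s in scores])
```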

If a non-decomposition model is supported by a compositional concept hierarchy (e.g., as in SDD), the same user interface may be generated by browsing through the compositional concept hierarchy and, at a given node, requesting a list of all applicable characters and states. It is then a question of implementation whether all characters applicable to this node and all its subnodes, or only those applicable to the current node, are displayed (the latter is possibly desirable). Similarly, the implementation may display the categorical states organized by character, organized by another concept (e.g., property), or directly in a sorted list. In the user interface, selecting a state from the sorted list works identically to the Prometheus model. Only the storage differs: the data are stored by retrieving the character variable associated with the state (implied information) and storing the state as character data for the current description.

An interesting aspect of multiple concept hierarchies is that combined hierarchies may be created algorithmically, often removing the need to also create “operational” hierarchies for practical purposes (see Figs. 41-42).


Figure 41. Two concept hierarchies (a compositional and a methodological one) associated with a flat character list (represented by character IDs in square brackets). Compare Fig.42 for further information.

Figure 42. Multiple concept hierarchies (from Fig.41) may be combined, providing additional information wherever the primary hierarchy contains multiple characters at a concept.
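One possible combination rule can be sketched as follows. This is a hypothetical illustration, not necessarily the procedure behind Fig.42, and the hierarchies, character IDs and names are assumed. Wherever the primary (compositional) hierarchy holds more than one character at a concept, the characters are regrouped under intermediate nodes taken from a secondary (methodological) hierarchy.

```python
# Hypothetical primary (compositional) hierarchy: concept -> characters, children.
primary = {
    "plant": {"chars": [], "children": ["leaf", "flower"]},
    "leaf": {"chars": ["c1", "c2", "c3"], "children": []},
    "flower": {"chars": ["c4"], "children": []},
}
# Hypothetical secondary (methodological) hierarchy: method -> characters.
secondary = {"hand lens": ["c1", "c4"], "microscope": ["c2", "c3"]}

def method_of(char_id: str) -> str:
    for method, ids in secondary.items():
        if char_id in ids:
            return method
    return "unclassified"

def combine(concept: str) -> dict:
    """Copy the primary tree, splitting multi-character nodes by method."""
    node = primary[concept]
    children = [combine(child) for child in node["children"]]
    chars = list(node["chars"])
    if len(chars) > 1:
        groups: dict[str, list[str]] = {}
        for char_id in chars:
            groups.setdefault(method_of(char_id), []).append(char_id)
        children += [
            {"concept": f"{concept}: {method}", "chars": ids, "children": []}
            for method, ids in sorted(groups.items())
        ]
        chars = []  # characters now live in the intermediate nodes
    return {"concept": concept, "chars": chars, "children": children}

combined = combine("plant")
print([child["concept"] for child in combined["children"][0]["children"]])
```

Because the combined tree is derived, it does not need to be stored or curated as a separate “operational” hierarchy.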

126. Concept hierarchies that are superimposed on a flat list of characters may be a desirable alternative to strict character decomposition models.

127. The combination of concept hierarchies with a flat character list is desirable when support of existing (“legacy”) data is a requirement. Concept hierarchies may be modeled as an optional part of the information model, whereas strict character decomposition models require decomposition information to be available to handle descriptive information. Concept hierarchies provide many of the organizational and semantic advantages of character decomposition models without breaking compatibility with existing data.

128. Multiple concept hierarchies are desirable to express aspects of methodology, instrumentation, or simply arbitrary character subsets/filters, in addition to the object-part and property classifications.
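Using several hierarchies as filters reduces to set operations on character IDs once each concept node is resolved to the characters below it. In the following sketch, the hierarchy names, concepts and IDs are all hypothetical, and descendant characters are assumed to be already included in each node's set.

```python
# Hypothetical hierarchies, each reduced to concept -> set of character IDs
# (descendants already included). Any hierarchy can act as a filter or view.
hierarchies: dict[str, dict[str, set[str]]] = {
    "parts": {"leaf": {"c1", "c2", "c3"}, "flower": {"c4", "c5"}},
    "methods": {"hand lens": {"c1", "c4"}, "microscope": {"c2", "c3", "c5"}},
    "views": {"field key": {"c1", "c4", "c5"}},
}

def select(*criteria: tuple[str, str]) -> set[str]:
    """Intersect the character subsets named by (hierarchy, concept) pairs."""
    subsets = [hierarchies[h][c] for h, c in criteria]
    return set.intersection(*subsets) if subsets else set()

print(sorted(select(("parts", "leaf"), ("methods", "hand lens"))))        # ['c1']
print(sorted(select(("methods", "microscope"), ("views", "field key"))))  # ['c5']
```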

4.12. Descriptive ontologies

The choice between “character decomposition” and “character plus concept hierarchy” discussed above is highly relevant to data exchange models like SDD. Although both models are equivalent with respect to many requirements, they introduce strongly divergent data representation models.

This makes future migration of data difficult, and applications may require a redesign rather than evolutionary development. A sound basis for the decision is therefore necessary (compare Table 38, p.128). To a large extent, the answer depends on whether well-defined and stable concept hierarchies (semantic ontologies) for object parts (structures), properties, methods or instrumentation, and development points, applicable to all organism groups from mammals to viruses, can be developed in the coming years. The following sections therefore discuss the problems the author sees in this respect. Many of the problems presented can be solved in principle, but often no immediate solution is known or has been tried. Work on these problems would be highly valuable. The goal of listing difficult cases without necessarily proposing solutions is to prepare a foundation for future decisions. This helps information model designers achieve a balance between abstraction (which at some level may hinder communication about a problem), complexity, and functionality of the model being evaluated.

The concepts discussed here largely have analogs in explicit ontology languages such as OWL (McGuinness & van Harmelen 2004). However, because of the desire to continue to use UML to illustrate the examples, the following discussion will largely use the terms and diagrams used in software development. The UML diagrams are to be read as illustrations of a problem, not as proposals for a descriptive information model. Most diagrams are drawn from a naïve perspective that would result in a non-generalized software model applicable only to one taxonomic group (compare section “Level of abstraction of descriptive information models”, p.42).

From the document Structuring Descriptive Data of Organisms — Requirement Analysis and Information Models (pages 127-131)