
Ambiguous or border-line cases of ‘descriptive data’

4. Fundamental aspects of description models

4.2. Context, recognition, and language

An essential prerequisite for comparing diverse objects is a generalized conceptual model (or “framework”) within which differences can be compared in a meaningful way. Such a model is essentially a language for object descriptions, defining the concepts and names of object parts, methods, properties, and property values. Unfortunately, a language general enough to be applicable to all objects is often not expressive enough to distinguish similar objects, at least not efficiently. This is easily overlooked, because humans perceive the context-specific vocabularies (i.e., “linguistic registers”) required to communicate about everyday objects as part of the general language.
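To make the idea of a framework as a “language” more concrete, the following minimal sketch models it as a vocabulary of parts, methods, properties, and permitted property values, and checks whether a statement can be expressed in it at all. The class and method names (`Property`, `Framework`, `is_expressible`) and the example vocabulary are hypothetical illustrations, not part of any existing standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Property:
    name: str                  # e.g. "color"
    allowed_values: frozenset  # the value vocabulary defined for this property


@dataclass
class Framework:
    """Hypothetical minimal description language (all names are illustrative)."""
    parts: set = field(default_factory=set)         # part names, e.g. "leaf"
    methods: set = field(default_factory=set)       # observation methods
    properties: dict = field(default_factory=dict)  # property name -> Property

    def is_expressible(self, part, prop, value, method):
        """True only if every term of the statement is defined in the framework."""
        p = self.properties.get(prop)
        return (
            part in self.parts
            and method in self.methods
            and p is not None
            and value in p.allowed_values
        )


framework = Framework(
    parts={"leaf", "petal"},
    methods={"unaided eye", "hand-lens"},
    properties={"color": Property("color", frozenset({"green", "red", "white"}))},
)

ok = framework.is_expressible("leaf", "color", "green", "unaided eye")
not_ok = framework.is_expressible("leaf", "venation", "reticulate", "unaided eye")
```

Here “venation” cannot be expressed because the general framework lacks that context-specific property; extending the vocabulary is exactly the trade-off between generality and expressiveness described above.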

For example, the description of a car model will use many context-specific terms that define parts and their arrangement, functional concepts, complex properties, and property values: “steering wheel”, “automatic transmission”, “sunroof”, “metallic” paint, etc. Although most readers will understand these terms, it is easy to imagine how many of them could be misunderstood without knowledge of the “car vocabulary”. Without an appropriate language, describing the composition of objects may become difficult, if not impossible. One cannot simply start in a corner of a car and describe the properties of every part encountered in a linear sequence.

The names of object parts (also called “components” or “structural elements”) often imply more than a position in a composition. A term like “steering wheel” implies a general functional and structural description, and often even default (or “most common”) properties such as shape and color. The description of the steering wheel of a specific car model will therefore use the general concept while omitting the most general properties, which the consumer of the description already infers (e.g., “the bright red steering wheel is shaped almost like a rectangle …”, but not “the steering wheel is circular and painted black”).
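The interplay between implied defaults and stated deviations can be sketched as two complementary operations: a reader merges stated values over the defaults implied by the concept, while a writer states only what differs from them. The default values and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical defaults implied by the concept "steering wheel".
STEERING_WHEEL_DEFAULTS = {"shape": "circular", "color": "black", "material": "plastic"}


def effective_description(defaults, stated):
    """What a reader infers: implied defaults, overridden by explicitly stated values."""
    return {**defaults, **stated}


def minimal_description(defaults, full):
    """What a writer states: only the values that differ from the implied defaults."""
    return {k: v for k, v in full.items() if defaults.get(k) != v}


stated = {"shape": "almost rectangular", "color": "bright red"}
inferred = effective_description(STEERING_WHEEL_DEFAULTS, stated)
# inferred == {"shape": "almost rectangular", "color": "bright red", "material": "plastic"}

full = {"shape": "circular", "color": "bright red", "material": "plastic"}
written = minimal_description(STEERING_WHEEL_DEFAULTS, full)
# written == {"color": "bright red"}
```

The sketch only works if writer and reader share the same defaults; a reader unfamiliar with the concept would wrongly treat omitted properties as unknown.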

Similarly, property names and values may be context-specific. This may even apply to very general concepts like color, which may be overloaded with context-specific terminology like the proprietary marketing terms for color that car manufacturers use.
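One common way to keep such overloaded terminology comparable is a lookup table that translates context-specific terms into a general vocabulary. The sketch below assumes invented marketing names; it does not reproduce any manufacturer's actual color list.

```python
# Hypothetical mapping from proprietary, context-specific color names to a
# general color vocabulary. The marketing names are invented examples.
MARKETING_TO_GENERAL = {
    "Alpine Frost": "white",
    "Midnight Pearl": "black",
    "Racing Flame": "red",
}


def normalize_color(term):
    """Translate a context-specific color term into the general vocabulary.

    Terms that are already general (or unknown) are returned unchanged,
    so unknown terms remain visible instead of being silently dropped.
    """
    return MARKETING_TO_GENERAL.get(term, term)
```

After normalization, “Alpine Frost” and “white” compare as equal, whereas the raw strings would not.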

The problem of recognizing a part as belonging to a certain category (e.g., “rear mirror”, “plant leaf”, “car antenna”, or “insect antenna”) is difficult to address formally. The relationship between the composition of parts (presence and multiplicity) and object properties is one of reciprocal dependencies and consequently difficult to model (Figs. 8-9). Creating part definitions from general properties alone (color, shape, etc.) is often very difficult. Many properties are specific to the context of the described part (e.g., “venation”, the arrangement of the vascular bundles of a leaf), or conversely imply the presence of further child parts (e.g., “hairiness”). Similarly, the relation to neighboring or parent parts will often be part of the definition of a part (e.g., “sepals” as opposed to “bracts”).
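The dependency of composition on properties can be partly captured as consistency rules: recording certain property values implies that a child part exists. The following sketch flags child parts that are implied but not described; the rule table and function name are hypothetical.

```python
# Hypothetical rule table: certain property values imply that a child part
# exists (e.g. a leaf described as "hairy" must have hairs).
IMPLIES_CHILD_PART = {
    ("hairiness", "hairy"): "hair",
    ("venation", "reticulate"): "vein",
}


def missing_child_parts(part_description):
    """Return child parts implied by property values but absent from the description."""
    props = part_description.get("properties", {})
    children = set(part_description.get("children", []))
    implied = {
        child
        for (prop, value), child in IMPLIES_CHILD_PART.items()
        if props.get(prop) == value
    }
    return implied - children


leaf = {"properties": {"hairiness": "hairy", "venation": "reticulate"}, "children": ["vein"]}
missing = missing_child_parts(leaf)
```

Such rules cover only one direction of the dependency; the reverse case, where a part is identified through its relation to parent or neighboring parts, is harder to formalize.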

This can best be studied by considering out-of-context identifications of components. For instance, during a criminal investigation it may be necessary to identify car parts without knowing their context. A classification system of car parts that is truly general, and that can be applied by someone who has never studied the parts before, is severely constrained in the properties it can use. In general, a much more successful method would be to have someone study all car parts in their context (placement within the car, function, manufacturer, and manufacturing period) and rely on human associative memory. This memory could be supported by a classification system that is partly based on compositional context (e.g., “parts of light bulbs”) and partly on general properties (e.g., “metallic”). Clearly, knowing that a piece of glass is part of a light bulb greatly simplifies the task of identifying the car model to which it may belong.
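A classification system that combines compositional context with general properties can be sketched as a two-stage filter: first restrict candidates to the known context, then match general properties. The catalogue entries and function name below are hypothetical.

```python
# Hypothetical catalogue: each part type records its compositional context
# and a few general properties.
CATALOGUE = [
    {"name": "headlight lens", "context": "headlight", "props": {"material": "glass"}},
    {"name": "bulb envelope", "context": "light bulb", "props": {"material": "glass"}},
    {"name": "bulb filament", "context": "light bulb", "props": {"material": "metallic"}},
    {"name": "mirror glass", "context": "rear mirror", "props": {"material": "glass"}},
]


def candidates(catalogue, observed_props, context=None):
    """Filter part types, first by compositional context (if known), then by properties."""
    pool = [c for c in catalogue if context is None or c["context"] == context]
    return [
        c["name"] for c in pool
        if all(c["props"].get(k) == v for k, v in observed_props.items())
    ]


without_context = candidates(CATALOGUE, {"material": "glass"})
with_context = candidates(CATALOGUE, {"material": "glass"}, context="light bulb")
```

With the general property alone, three candidates remain; once the piece of glass is known to belong to a light bulb, only one does.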

In biology, completely out-of-context identification of parts (e.g., airborne spores, or animal parts when studying predator diet) is rare. However, this is offset by the enormous diversity of form and properties of life on earth. The missing context is often not the placement in a part composition, but the placement in a taxonomic group (viruses, bacteria, annelids, fungi, nematodes, etc.).

Independent terminologies have been developed for most taxonomic groups, sometimes introducing superfluous synonyms for existing concepts (similar to car colors), but most often introducing terms and concepts for object parts and properties that are necessary, efficient, and convenient. Returning to the car example, a similar situation may arise in the criminal investigation: recognizing that a machine part comes from a racing car, a truck, or a construction machine may be vital to solving the case.

Some other problems involved in the reciprocal dependencies of recognition and analysis are:

■ Individual variability: This problem is typically smaller for manufactured objects like cars than for biological objects, which may show high within-class variation (due to genetic polymorphisms and phenotypic plasticity).

[Figures 8 and 9 (diagrams): object part, object composition, temporal stage, property values, orientation and composition, and neighboring parts, linked in a heuristic cycle of “recognition of properties” and “recognition of object part” that starts from an initial hypothesis.]

Figure 8. Recognition of object parts is a cyclic identification process based on recognition of object properties, life-cycle stage, and composition of previously recognized object parts.

Figure 9. Interpretation of the recognition of object parts as a heuristic cycle. Each recognition step is a hypothesis tested through evaluation of properties and information about neighboring parts. Ultimately, the general orientation (front/back, top/bottom) and the object composition are resolved. Additional steps may be required to recognize life stage or taxonomic context.

■ Aging and life-cycle dependency: A new brake may differ from a worn-out one as much as a young plant differs from an old (“senescent”) one. Recognizing the life-cycle stage is an essential part of the heuristic process of recognizing parts and properties of objects (Fig. 8).

■ Descriptions are strongly influenced by the methods used to observe properties. A property value recorded without knowledge of the method used to obtain it may be of limited value (Fig. 10).

The relation between properties, methods, and part-composition is a central problem when modeling descriptive data and will be discussed repeatedly in the following sections.

■ Taxonomic diversity leads to increased diversity of components, properties, and methods. The greater the taxonomic diversity, the more difficult it becomes to establish a common framework within which comparisons remain meaningful (Fig. 10).
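The heuristic cycle of Figs. 8-9 can be approximated, under strong simplifying assumptions, as iterative hypothesis elimination (a simple constraint-propagation loop): each observed structure starts with candidate identities, and a candidate is discarded when it conflicts with the observed properties or with all remaining candidates of a neighboring structure. The loop repeats until no hypothesis changes. This is an illustrative stand-in, not the author's method; all names, the adjacency rules, and the example data are hypothetical.

```python
def recognize(candidates, observed, expected_props, allowed_neighbors, neighbors):
    """Iteratively discard part hypotheses until none changes (hypothetical sketch).

    candidates:        structure -> set of candidate part names (initial hypotheses)
    observed:          structure -> dict of observed property values
    expected_props:    part name -> property values the part is required to show
    allowed_neighbors: set of (part, part) pairs that may be adjacent
    neighbors:         structure -> list of adjacent structures
    """
    hyp = {s: set(c) for s, c in candidates.items()}
    changed = True
    while changed:  # one pass corresponds to one turn of the cycle
        changed = False
        for s in hyp:
            keep = set()
            for part in hyp[s]:
                # Test the hypothesis against the observed properties ...
                props_ok = all(observed[s].get(k) == v
                               for k, v in expected_props.get(part, {}).items())
                # ... and against the current hypotheses for each neighbor.
                nbrs_ok = all(any((part, q) in allowed_neighbors for q in hyp[n])
                              for n in neighbors.get(s, []))
                if props_ok and nbrs_ok:
                    keep.add(part)
            if keep != hyp[s]:
                hyp[s] = keep
                changed = True
    return hyp


result = recognize(
    candidates={"A": {"sepal", "bract"}, "B": {"petal", "leaf"}},
    observed={"A": {"color": "green"}, "B": {"color": "white"}},
    expected_props={"leaf": {"color": "green"}},
    allowed_neighbors={("sepal", "petal"), ("petal", "sepal"),
                       ("bract", "leaf"), ("leaf", "bract")},
    neighbors={"A": ["B"], "B": ["A"]},
)
# result == {"A": {"sepal"}, "B": {"petal"}}
```

In the example, the white color rules out “leaf” for structure B; this in turn removes the only neighbor that would support “bract” for structure A. A property observation thus propagates through the composition, as in the cycle shown in the figures.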

[Figure 10 (diagram): “Taxonomic Diversity”, connected by “depends on” relations to three questions that also depend on one another: Which part is observed? (examples: head, hair, pigment); Which property is observed? (examples: presence, shape, color); How is it observed (method)? (examples: unaided eye, hand-lens, microscope, growth conditions).]

Figure 10. Additional dependencies on observation methodology and taxonomic diversity, extending Figs. 8-9. The diversity of methods, properties, and components (i.e., structural diversity) increases with taxonomic diversity.
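The dependencies in Fig. 10 suggest storing each property value together with the part it belongs to and the method used to observe it, and comparing values only when these agree. The record type and function below are a hypothetical sketch of that idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: a value is stored with its part, property, and method."""
    part: str    # e.g. "hair"
    prop: str    # e.g. "shape"
    value: str   # e.g. "glandular"
    method: str  # e.g. "hand-lens", "microscope"


def comparable(a, b):
    """Values are directly comparable only if part, property, and method all agree."""
    return (a.part, a.prop, a.method) == (b.part, b.prop, b.method)


x = Observation("hair", "shape", "simple", "hand-lens")
y = Observation("hair", "shape", "glandular", "microscope")
z = Observation("hair", "shape", "glandular", "hand-lens")
```

Observations `x` and `y` are not treated as contradictory, because the difference may reflect the method rather than the object. Only `x` and `z` are directly comparable.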

One solution to this problem is to choose analytical methods that are as context-independent as possible. In the car-part example, if small machine parts have to be identified out of context, forensic practice may resort to expensive but highly general and feature-rich properties such as isotope composition. Similarly, biological identification increasingly uses context-free methods based on molecular patterns or sequences (including “DNA barcoding”). Although still comparatively expensive, these methods typically provide sufficient information to obtain an identification result in a single step, are usually applicable to a large taxonomic domain, and are usually independent of the compositional complexity of biological objects (the latter applies to DNA-based, but often not to RNA- or protein-based methods).
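The single-step character of sequence-based identification can be illustrated by a nearest-match search against a reference library using the uncorrected p-distance (the proportion of differing sites). The toy sequences, the threshold value, and the function names are assumptions for illustration only. Real pipelines align sequences first and rely on curated reference libraries; this sketch skips both by assuming pre-aligned sequences of equal length.

```python
# Hypothetical, pre-aligned reference "barcodes" (toy sequences, not real data).
REFERENCES = {
    "taxon A": "ACGTACGTAC",
    "taxon B": "ACGTTCGTAA",
    "taxon C": "TCGAACGGAC",
}


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, references, threshold=0.03):
    """Return the closest reference taxon, or None if no match is within the threshold.

    The default threshold is an illustrative assumption, not a recommended value.
    """
    best = min(references, key=lambda t: p_distance(query, references[t]))
    return best if p_distance(query, references[best]) <= threshold else None
```

No knowledge of the compositional context of the sampled tissue enters the comparison, which is what makes such methods context-independent.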

The examples above illustrate how much associative thinking biologists normally use when, for example, they group strongly differing structures as “leaves” while recognizing other leaf-like structures as “stipules” or “cladodes”. Language can incorporate many of these intuitions, and descriptions addressed to humans can make use of them. However, the specialized language used by taxonomists is often specific to a given taxonomic group and depends on intensive training and experience. With an increasing diversity of objects and an increasing diversity of user expertise, finding an appropriate common description language becomes more and more difficult.
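Context-specific vocabularies can be sketched as separate term tables, so that the same term (e.g., “antenna”) resolves to a different concept depending on the context in which it is used. The vocabularies, their short definitions, and the function name are hypothetical simplifications.

```python
# Hypothetical context-specific vocabularies: the same term may denote
# different concepts depending on the context in which it is used.
VOCABULARIES = {
    "insects": {"antenna": "paired sensory appendage of the head"},
    "cars": {"antenna": "radio aerial"},
    "plants": {"leaf": "lateral photosynthetic organ of the shoot"},
}


def resolve(term, context):
    """Look up a term in the vocabulary of a given context; raise if it is undefined there."""
    try:
        return VOCABULARIES[context][term]
    except KeyError:
        raise KeyError(f"term {term!r} is not defined for context {context!r}") from None
```

A common description language would have to reconcile such tables, which becomes harder as more contexts and more users with different expertise are involved.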

No complete solution is known to the problems of descriptive data, i.e., one that combines the desirable property of allowing associative thinking with retaining the analytical properties of data sets and adequate manageability by biologists. Although it is probably possible to build a fully axiomatic descriptive terminology for biology, in which each term is defined from first principles or from other, previously defined terms, the author knows of no such attempt. Moreover, at least for the purpose of identification, the value of such an enterprise is questionable. In the practice of biological identification, even many “normal” biological definitions that require anatomical, ultrastructural, or other properties that are very difficult or expensive to observe may be replaced by terms appealing to intuition and associative memory.
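The structural requirement of a fully axiomatic terminology, namely that each term is defined only from primitives or previously defined terms, amounts to requiring that the definition graph is acyclic. The following sketch checks this with Python's standard `graphlib.TopologicalSorter`. The example terms and their definitional dependencies are a hypothetical simplification, not an actual biological terminology.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical axiomatic terminology: each term lists the terms used in its
# definition; primitive terms (defined "from first principles") list none.
DEFINITIONS = {
    "cell": set(),
    "tissue": {"cell"},
    "vascular bundle": {"tissue"},
    "leaf": {"tissue", "vascular bundle"},
    "venation": {"leaf", "vascular bundle"},
}


def definition_order(definitions):
    """Return an order in which every term is defined only by earlier terms,
    or raise ValueError if the definitions are circular."""
    try:
        return list(TopologicalSorter(definitions).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes forming the cycle
        raise ValueError(f"circular definitions: {err.args[1]}") from None


order = definition_order(DEFINITIONS)
```

Checking acyclicity is the easy part; the costly part, as argued above, is that many terms defined in this way would rest on properties that are hard to observe in practice.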

The following sections discuss the advantages and trade-offs of various description models.

The first section introduces the most traditional and perhaps most intuitive method of describing biological objects: free-form text.