Basic property types - Structuring Descriptive Data of Organisms

In the context of the Nemisys/Genisys model, Diederich, Fortuner and Milton developed a special classification system for quantitative and categorical data called “basic properties” (Diederich & al. 1997, 1998, Table 9). The system primarily covers morpho-anatomical descriptive data. The authors do not use the term “type” or “data type” for the “basic properties” they define.

Instead, basic properties are developed as part of a character decomposition system (see “The Nemisys/Genisys model”, p.117). They do point out, however, that some fundamental semantics of basic properties must be further defined through a “default range” with the values “binary, discrete, continuous” and a measurement scale with the supported values “nominal, ordinal, interval, ratio” (Diederich & al. 1997).
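The two additional semantic dimensions named above can be attached to each basic property as plain enumerations. The following Python sketch is only an illustration: the class names, the `relational` flag, and the range/scale assignments for the three sample properties are assumptions made here and are not taken from Diederich & al.

```python
from dataclasses import dataclass
from enum import Enum


class DefaultRange(Enum):
    BINARY = "binary"
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class MeasurementScale(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


@dataclass(frozen=True)
class BasicProperty:
    """A basic property together with its default range and measurement scale."""
    name: str
    default_range: DefaultRange
    scale: MeasurementScale
    relational: bool = False  # True for properties relating two structures


# Hypothetical assignments, chosen for illustration only.
LENGTH = BasicProperty("Length", DefaultRange.CONTINUOUS, MeasurementScale.RATIO)
PRESENCE = BasicProperty("Presence", DefaultRange.BINARY, MeasurementScale.NOMINAL)
DISTANCE_TO = BasicProperty(
    "Distance-to", DefaultRange.CONTINUOUS, MeasurementScale.RATIO, relational=True
)
```

Keeping range and scale as separate fields, rather than folding them into the property name, would let software choose processing rules from these fields alone.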

Table 9. “Basic morpho-anatomical properties” proposed in Diederich & al. (1997, 1998; identically). A “*” indicates properties expressing a relation between two structures.

Appearance: Posture, Shape, Kind, Texture

Arrangement: Symmetry, Placement/Location, Position-relative-to *, Distance-to *, Orientation, Angle

Dimension: Length, Height, Width, Diameter, Depth, Ratio-of *, Size

Quantity: Presence, Quantity, Number

(Differences in Diederich (1997) are: “Presence” is classified in Appearance rather than Quantity; “Posture” is missing; “Color” is accepted as a basic property within Appearance, rather than presumably being subsumed in “Kind” as shown above.)

Identifying type-like concepts on a higher level than data analysis has several advantages in that semantics and processing rules may be defined closer to the descriptive language (with less abstraction) and in a reusable way. On the other hand, the list above is incomplete, and some points may be argued, e.g.:

■ “Ratio-of” is not a basic property acting on an object part (structure), since it must be combined with a basic property like size, length, width. It is a relation between two parts plus a property, rather than a single property acting on two parts. Diederich & al. (1997) mention this (guideline 5), but they do not propose a solution to handle this in an information model. The example shows that the simplicity of basic properties as a short, flat list is somewhat deceptive.

■ In addition to an absolute “Angle” (presumably relative to the earth surface, in normal living conditions), a relational expression “Angle *” between two structures seems to be missing.

■ Why is “Presence” not treated as a special case of “Quantity” (with absence corresponding to “Quantity=0”)?

■ Why are certain quantitative measurements recognized under “Dimension” whereas others (weight, temperature, speed, conductivity, etc.) are subsumed under “Number”?

■ Why is diameter (of circle) recognized as a special property, uniquely different from length, height, or width, but parameters of other shapes (e.g., “eccentricity of an ellipse”) not?

■ Why is depth recognized, but not thickness?

■ Presence and Quantity should be relational properties (“Presence-at *”, defaulting to the entire organism). They express a part-of relation (optionally with multiplicity) between two object parts (structures). In many cases it does not matter whether a specific structure is named or not (e.g., “eyes present” → two insect eyes are part of the insect head, “insect wing count = 4” → 4 wings are part-of thorax), but in other cases this does matter (e.g., “three bars on hind-wing”, “bristles at tip of antennae”).
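The last point, presence and quantity as a part-of relation with optional multiplicity, can be sketched as a small record type. This is a hypothetical illustration in Python; the class name `PartOf`, its fields, and the rule that an unstated count implies presence are assumptions made here, not part of the Nemisys/Genisys model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartOf:
    """A part-of relation between two structures, optionally with a count."""
    part: str
    whole: str = "organism"      # default: relation to the entire organism
    count: Optional[int] = None  # None means the multiplicity is not stated

    @property
    def present(self) -> bool:
        # Assumption: a recorded relation asserts presence unless count is 0.
        return self.count is None or self.count > 0


# The examples from the text, with the containing structure made explicit.
eyes = PartOf("eye", whole="head", count=2)
wings = PartOf("wing", whole="thorax", count=4)
bars = PartOf("bar", whole="hind-wing", count=3)
bristles = PartOf("bristle", whole="tip of antenna")
```

In the first two examples the `whole` could be left at its default without losing much; in the last two the named structure carries the actual information.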

Most notably, as discussed in Diederich & al. (1997), all properties not considered of primary importance are subsumed under catch-all “generic properties”:

■ unassigned categorical properties under “Kind”, and

■ unassigned quantitative properties under “Quantity” or “Number”.

Table 10. A modified basic-property-concept extended with a generalization hierarchy (example).

Appearance Shape

Two-dimensional (perhaps: open / closed?) Three-dimensional (perhaps: open / closed?) Symmetry

color space values (sRGB, HSL, etc.) Pigmentation

Color granularity Taste/Smell

Five basic taste sensations:

acid, salty, sweet, bitter, umami

smell (human taste impressions are a combination of these and smell sensations)

Cardinality/Multiplicity

against absolute orientation Angle-to *

Ratio (including, e.g., Density) is excluded because it has to be considered on a higher level. A mapping exists between “count” and “presence” that may allow one property to be calculated from the other: three legs → legs present, legs present → at least one leg, zero legs → legs absent, legs absent → zero legs. No special property of Boolean type is considered: coding “winged/wingless” is not different from “wings present: true/false”, and any property with exactly two states (a set of two things) is equivalent to a Boolean. Sets of only two states or things do have special properties in calculations, but software can deduce this without a need to introduce a Boolean data or property type.
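The mapping between “count” and “presence” described in the note above is asymmetric: a count always determines presence, whereas presence only constrains the count. A minimal Python sketch, with function names chosen here purely for illustration:

```python
from typing import Optional, Tuple


def presence_from_count(count: int) -> bool:
    """Three legs -> legs present; zero legs -> legs absent."""
    if count < 0:
        raise ValueError("a count cannot be negative")
    return count > 0


def count_bounds_from_presence(present: bool) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) count; a maximum of None means unbounded.

    Legs present -> at least one leg; legs absent -> exactly zero legs.
    """
    return (1, None) if present else (0, 0)
```

Only the “absent” case can be converted back to an exact count, which is one reason to store counts where they are known and derive presence from them.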

Which properties are considered more important depends to a substantial degree on the organism group and the methodology used to describe organisms. It seems problematic to embed such decisions into the general model. For example, while some basic properties already provide reusable sets (enumerations) of states (i.e. categorical values), no reuse is possible for properties subsumed under “Kind”, although reuse would be desirable. An example of a property hierarchy that is richer than Diederich's (as in Table 9) is shown in Table 10. This is, however, only another selection of preferred properties, informed by a specific point of view. Other views would lead to a different selection.

Diederich & al. (1998) argue that the concept of basic properties may be extended to cover physiological data, but doing so requires the introduction of abstract concepts (such as “resting period”) in the place of structures. These pseudo-structures then may have properties like presence, duration, etc. However, it seems that much of the clarity and reusability that is present with morphological data is lost by doing so. Most notably, it is not possible to have these pseudo-structures refer to specific morphological pseudo-structures. The approach offers a way to occasionally include physiological data, but it appears to be a fix rather than a general solution.

One consequence of these problems is that basic properties will probably be under revision for an extended period of time. Designers of information models would be wise to provide a generalization mechanism rather than limiting themselves to those basic properties shown above. To the present author’s knowledge no information model for basic properties has been published yet that would achieve both the desired generalization and specificity of basic property types. Further studies are necessary to decide whether “basic properties” indeed are fundamental enough to justify the development of such a model.

An interesting example of the kind of problems encountered with basic properties is that even the seemingly intuitive example of “shape” as a basic property seems to be full of pitfalls. Going back to the example in Fig.15 (p.45) it is obvious that shape is a typology to classify objects. “Shape” in Fig.15 uses categories that combine more fundamental attributes or properties (number of corners, symmetry, number and length of linear sections, angles, etc.). Thus shape can also be viewed as a “summary character” in the sense of Diederich (1997) and Diederich & al. (1997), which stands in contrast to their view of shape as a basic and atomic property. Nevertheless, the shape typology is indeed in most scenarios a very basic, useful element in descriptions. A combination of the constituent properties/attributes is difficult to recognize and visualize, whereas the shape type names are easily recognized and remembered by humans. However, the same argument applies to most typologies that under the basic property concept should be treated as summary characters and split into more atomic characters.

The fundamental question is therefore often how much information to include in a complex type, and where to split it into multiple elements. For example, one might want to define equal-sided triangle or right-angled triangle as additional types. Conversely, the definition of a square with rounded corners as a type will become questionable as soon as triangles, hexagons, etc. with rounded corners are to be described. Either one might extract “rounded corners” as an additional property from the complex “shape type”, or one may want to model an additional substructure “corner” (present only for some shape type values!) with a property “rounded”. These are just examples of the kind of terminological instability and inconsistency that Diederich and coworkers seek to prevent.

Furthermore, shape values are a complicated mixture of singular (“distinct”, like triangle, four-sided polygon, pentangle, etc.) and extended categories (i.e., shape instances may be intermediate between categories). Fig.17f on p.54 (with some categories connected but others distinct) has been drawn with the example of “shape” in mind. Thus, in addition to the question of decomposition of shape types into properties, also the definition of the resulting categorical values will be under discussion. For example, the ellipse shown in the second row of Fig.15 (p.45) is a very weak ellipse that may well be called “nearly circular”. Since in biology exact shapes almost never occur, it would be natural to a biologist to do so (calling it “subglobose” in biological terminology). A very similar problem occurs when attempting to separate the aspects of rounded corners from the shape typology, since an infinite number of intermediate shapes exists between a square with rounded corners and a circle (Fig.18).

Figure 18. Squares with rounded corners form a continuum of intermediate shapes between a square and a circle.
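The continuum in Figure 18 can be made concrete with a single parameter. The Python sketch below assumes a square of side `side` whose corners are replaced by quarter-circles of radius `radius`; the function name and the use of the isoperimetric quotient 4πA/P² as a “circularity” measure are choices made here for illustration, not taken from the source.

```python
import math


def rounded_square(side: float, radius: float):
    """Return (area, perimeter, circularity) of a square with rounded corners.

    radius = 0 gives a plain square, radius = side / 2 gives a circle.
    """
    if not 0 <= radius <= side / 2:
        raise ValueError("radius must lie between 0 and side / 2")
    # Each rounded corner removes a square of side `radius` and adds back
    # a quarter disc of the same radius.
    area = side ** 2 - (4 - math.pi) * radius ** 2
    # Four straight sections plus four quarter-circle arcs.
    perimeter = 4 * (side - 2 * radius) + 2 * math.pi * radius
    circularity = 4 * math.pi * area / perimeter ** 2
    return area, perimeter, circularity
```

At `radius = 0` the circularity is π/4 (a square) and at `radius = side / 2` it is 1 (a circle). Any categorical boundary between “square with rounded corners” and “circle” therefore has to be drawn somewhere on a continuous scale.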

The probable consequence of the problems discussed here is that the information model should expect terminological change and should not (as the basic property model seems to do) aim to reach stability by fixing a set of “basic” properties as special and unchangeable. Further, although it is desirable that different data sets use the same terminology, it is probably more important to focus on comparability. One way to do this is to make the assumptions behind the terminology readily available to machine-reasoning. An attractive model may be to treat values in complex characters (type-value, such as shape) as objects that again may have descriptions in terminologies using more completely atomized properties. However, many alternatives are conceivable (e.g., multiple class inheritance: a triangle with rounded corners being both an “object with rounded corners” and a “polygon with three sides”). A more detailed analysis of actual data storage models follows in a later section (“Description storage models”, p.104).
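The idea of treating shape values as objects that are themselves described by more atomized properties can be sketched as follows. This Python example is a simplified assumption: the property names (`sides`, `rounded_corners`, `equal_sides`) and the four shape terms are invented for illustration, and a real terminology would need a much richer vocabulary.

```python
# Each shape term is described by atomized properties (hypothetical names).
SHAPE_TERMS = {
    "triangle": {"sides": 3, "rounded_corners": False},
    "rounded triangle": {"sides": 3, "rounded_corners": True},
    "square": {"sides": 4, "rounded_corners": False, "equal_sides": True},
    "rounded square": {"sides": 4, "rounded_corners": True, "equal_sides": True},
}


def compare(term_a: str, term_b: str, terms=SHAPE_TERMS) -> dict:
    """Compare two shape terms on the atomized properties both define."""
    a, b = terms[term_a], terms[term_b]
    shared = a.keys() & b.keys()
    return {
        "agree": sorted(k for k in shared if a[k] == b[k]),
        "differ": sorted(k for k in shared if a[k] != b[k]),
    }
```

For instance, `compare("rounded triangle", "rounded square")` reports agreement on `rounded_corners` and a difference in `sides`, even though neither term appears in the other's definition. Two data sets using different shape vocabularies could be compared in the same way, provided both publish such descriptions of their terms.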

31. “Basic properties” according to the Nemisys/Genisys model are a derived type system optimized for morpho-anatomical data. The simple system of 20 basic properties with one level of hierarchy offers pragmatic guidance for structuring such data, but is incomplete and not suitable as a general information model for descriptive data. The selection of quantitative measures not subsumed under “Quantity” or “Number”, and the selection of categorical properties not subsumed under “Kind” may be pragmatic for common morpho-anatomical data but is not essential.

32. A property classification, preferably with more than one level of hierarchy, is desirable to structure descriptive data. The model should be able to cope with different generalization hierarchies of properties, rather than fixing these.

33. A generalized “property type” system is conceivable, but will be complex and may be expected to be under considerable terminological evolution for an extended period. The information model must be able to support property information in a way that does not affect existing applications relying on the information model.

34. The information model should provide means to make descriptions based on different terminologies (using different property choices, different levels of decomposition of value types, such as shape) comparable using machine-reasoning.

4.6. Mapping between data types

As already mentioned, the same fundamental biological phenomenon can often be measured in various ways, resulting in data on different measurement scales (p.49). Color may be measured quantitatively (spectrographically or as color model values, see p.59), or categorically by comparing it with the fine categories of a color standard, or by referring to common “fuzzy” vernacular color names. Clearly these data are related and it is desirable that the information model provides mechanisms to make relations explicit. The following sections discuss various cases.
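As a rough illustration of such a relation, the Python sketch below maps quantitative sRGB values to coarse vernacular color names via the standard-library `colorsys` module. The hue boundaries, the lightness and saturation thresholds, and the list of names are arbitrary assumptions made here; they do not represent any color standard.

```python
import colorsys

# Upper hue bounds in degrees with the assumed vernacular name.
HUE_NAMES = [
    (20, "red"), (50, "orange"), (70, "yellow"),
    (170, "green"), (260, "blue"), (330, "purple"),
]


def coarse_color_name(r: int, g: int, b: int) -> str:
    """Map 8-bit sRGB components to a coarse color name (assumed thresholds)."""
    hue, lightness, saturation = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    if lightness < 0.1:
        return "black"
    if lightness > 0.9:
        return "white"
    if saturation < 0.15:
        return "grey"
    degrees = hue * 360
    for upper, name in HUE_NAMES:
        if degrees < upper:
            return name
    return "red"  # hues of 330 degrees and above wrap around to red
```

The mapping runs in one direction only: many sRGB values share a name, so a vernacular name can at best be mapped back to a region of color space, not to a single value.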

From the document: Structuring Descriptive Data of Organisms: Requirement Analysis and Information Models (pages 62-66)