Generalization of object parts (compositional concepts)

Composition versus generalization

In parallel to the object compositions discussed so far, a generalization hierarchy exists for the concepts referring to the physical parts of objects. Whereas composition hierarchies are expressed through “has-a/is-part-of” (or “contains/contained”) relationships, generalization hierarchies define “is-kind-of” or “is-type-of” relationships. In information modeling, this is generally called a classification or generalization/specialization relationship. Another term is “typological hierarchy”.

Generalization and composition hierarchies are often confused. For example, the taxonomic hierarchy of non-extinct (recent) organisms on earth is a generalization hierarchy (whether it is phylogenetic or based on arbitrarily selected generalizations). A species concept is a refinement of the more general genus concept, or the genus is a generalization of all species it contains.

However, when illustrating the taxonomy, a tree will be drawn where genus and species are arranged in space (as a tree, as headings in a text, or as nested boxes). These visualizations may lead to the erroneous intuition that the species is “part-of” the genus “container”.

An abstract geometrical example and associated UML class model for object generalization and composition is shown in Fig.70. The composition and the generalization hierarchy are independent. Biological examples of objects that occur multiple times on various parts of individuals are hairs, finger and toe nails, or various kinds of spores or leaves (e.g., Fig.52, p.137).

Often more relevant, however, are the cases where related parts have different names and somewhat different definitions in different taxonomic classes. These parts remain comparable (e.g., for phylogenetic analysis or identification) only through generalizations.

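[Figure 70 diagram not reproduced. Classes shown: ComplexObject, Cross, SymmetricCross, BroadCross, Circle, HatchedCircle, Rectangle, SymmetricCrossWithCircles, BroadCrossWithCircles; the composition links carry multiplicities of 1 and 2.]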

Figure 70. UML class diagram showing an object generalization (white triangles, top to bottom) and composition (black diamonds, bottom to top) model for the geometric object shown in the note shape.
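
The two kinds of relationship in Fig.70 can be sketched in code. The following minimal Python sketch maps generalization to class inheritance and composition to owned attributes. The class names follow the figure; the multiplicities used here (one cross plus two circles) and the helper `kinds_of` are assumptions made for illustration, not a transcription of the diagram.

```python
from dataclasses import dataclass, field


class ComplexObject:
    """Most general concept in this sketch."""


class Circle(ComplexObject):
    pass


class HatchedCircle(Circle):      # generalization: is-kind-of Circle
    pass


class Cross(ComplexObject):
    pass


class SymmetricCross(Cross):      # generalization: is-kind-of Cross
    pass


@dataclass
class SymmetricCrossWithCircles(ComplexObject):
    # Composition: has-a relationships. The multiplicities are assumed.
    cross: SymmetricCross = field(default_factory=SymmetricCross)
    circles: list = field(default_factory=lambda: [Circle(), HatchedCircle()])

    def parts(self):
        """All direct parts of the composite."""
        return [self.cross, *self.circles]


def kinds_of(obj):
    """Names of the class of obj and all its generalizations."""
    return [cls.__name__ for cls in type(obj).__mro__ if cls is not object]
```

In this sketch a circle is part of the composite but not a kind of it: `isinstance(whole.circles[0], SymmetricCrossWithCircles)` is false for an instance `whole`, whereas `isinstance(HatchedCircle(), Circle)` is true. The two hierarchies answer different questions and can be varied independently.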

Whether a hierarchy is a composition or a generalization may be ambiguous. An example is the anatomical composition hierarchy of the major organ systems of the human body (Fig.71).

The parts are distributed through various parts of the body, but because they are generally connected to each other it is plausible to assume that they form a composition (part-of relations). On closer examination, however, not all bones or muscles of the musculoskeletal system are directly connected, making it dependent on other parts not in the system. In the case of the endocrine system, the parts are not only often disconnected, but also belong to other organ systems: testicles (reproductive system) produce testosterone, kidneys (excretory system) produce renin, and the hypothalamus (nervous system) produces corticotropin-releasing hormone (CRH) and many other hormones. Clearly the endocrine system is not appropriately described as a composition, where each part may be a member of only a single composite object. It may perhaps be modeled as a UML aggregation (this would be depicted with a white diamond instead of the black composition diamond). Alternatively, a generalization hierarchy may be more appropriate: skin is a kind of integumentary system, liver is a kind of endocrine system (Fig.71 right). However, the concept of “system” strongly suggests composition; “shoulder bone is a kind of musculoskeletal system” seems the wrong perspective. It may be desirable to create a mixture of the interpretations shown in the left and right side of Fig.71, using composition for all unambiguously assigned elements, aggregation for parts belonging to multiple objects, and generalization for the endocrine system.
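
The difference between strict composition and aggregation discussed above can be illustrated with a short Python sketch. The organ and system names come from the text; the classes `Part`, `Composite`, and `Aggregate` are hypothetical and only demonstrate the single-owner constraint of composition versus the shared membership that aggregation permits.

```python
class Part:
    def __init__(self, name):
        self.name = name
        self.owner = None        # composition: at most one composite
        self.member_of = set()   # aggregation: any number of groupings


class Composite:
    """Strict composition: a part may belong to only one composite."""

    def __init__(self, name):
        self.name = name
        self.parts = []

    def add(self, part):
        if part.owner is not None and part.owner is not self:
            raise ValueError(f"{part.name} is already part of {part.owner.name}")
        if part.owner is None:
            part.owner = self
            self.parts.append(part)


class Aggregate:
    """Shared aggregation: parts may be members of several groupings."""

    def __init__(self, name):
        self.name = name
        self.members = []

    def add(self, part):
        if self.name not in part.member_of:
            part.member_of.add(self.name)
            self.members.append(part)


testicles = Part("testicles")
kidneys = Part("kidneys")

reproductive = Composite("reproductive system")
excretory = Composite("excretory system")
endocrine = Aggregate("endocrine system")

reproductive.add(testicles)
excretory.add(kidneys)
endocrine.add(testicles)   # allowed: aggregation does not claim ownership
endocrine.add(kidneys)
```

Attempting `excretory.add(testicles)` raises a `ValueError` in this sketch, which mirrors the reason why the endocrine system cannot be a composition if its organs already belong to other composites.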

Figure 71. UML class diagrams showing the nine major organ systems of the human body with examples of organs for each system. The left diagram interprets the organ systems as an anatomical composition hierarchy. The right diagram shows primarily a generalization with a more general composition added. Note: multiplicities are omitted from this and some of the following diagrams.

In some cases it seems that composition relations (part-of) may also have a generalization (kind-of) quality. For example, in the case of a surface hierarchy (upper/lower surface, margin, Fig.72), both “margin is part of the surface” and “margin is a kind of surface” make sense.

Fig.73 adds a specific leaf context to the surface terms (as subclasses of the general terms). It is now no longer entirely clear whether the specific leaf concepts inherit their compositional qualities from the more general classes, or whether the general classes have two implicit concepts for which the same names are used.

[Figure 72 diagram not reproduced. Classes shown: FlatOrientedObject, Surface, UpperSurface, Margin, LowerSurface.]

Figure 72. UML class diagram modeling the surface (e.g., of a plant leaf). Surfaces of flat oriented objects can be decomposed into a lower, upper, and marginal surface (composition, black diamond). Independently, each part is also a kind of surface (generalization, white triangle).

Figure 73. UML class diagram adding a leaf context to the surface abstraction shown in Fig.72.

159. Generalization and composition are distinct forms of relations that have different properties and lead to different conclusions. For physical objects (parts of the described objects) the information model must support both a composition and a generalization hierarchy.

Generalization concepts for object parts

Similar to the multiple composition hierarchies discussed above, multiple generalization concepts exist. Examples are morphological (e.g., “arm and leg to extremities”) and histological tissue types (e.g., epithelial tissues, muscle tissues, nerve tissues, and connective tissues). The following generalization concepts are of special interest in descriptive information models for biology:

■ phylogenetic homology,

■ functional similarity,

■ appearance or morphological similarity (things look similar),

■ compositional similarity (things consist of similar parts down to chemical compounds).

Generalizations may combine several of these aspects (Table 42); it is unclear how this can be expressed.

Table 42. Examples from biology for generalizations of object parts under three different generalization concepts.

| Object part¹ | Function | Phylogenetic origin: identical (homologous) | Phylogenetic origin: not identical (analogous or heterologous) |
|---|---|---|---|
| Similar | Similar | pairing chromosomes; basal leaf and stem leaf → leaf; conidia of hyphomycetes and urediniospores of rusts → mitospores; different pheromones of sister species (with minor chemical modification) | conidia (“mitospores”) and meiospores → diaspores; leaf and phylloclade → leaf-like structure; mimicry systems like warning coloration (Müllerian or Batesian mimicry) |
| Similar | Dissimilar | mandibles of normal beetles versus stag beetles (Lucanus cervus) → eating versus sexual attraction; tail feathers of pheasant versus male peacock → flying versus sexual attraction (function often changes morphology beyond immediate similarity, so these cases are rare) | plant twigs and walking sticks (Phasmida, using camouflage) |
| Dissimilar | Similar | human and elephant toenails; leaves of woodruff (Galium verum, needle-like) and of horse chestnut (large palmate) | eyes of mammals or snails; wings of birds, bats and insects; beak of woodpeckers and elongated third finger of aye-aye (Daubentonia madagascariensis, a squirrel-like primate), both gathering wood-boring insects; spines (derived from leaves) and prickles/thorns (derived from stem tissue) |
| Dissimilar | Dissimilar | photosynthetic leaves and scales on a rhizome; flipper of seals and human arm; middle ear of mammals: malleus (hammer), incus (anvil), and stapes (stirrup) | (pairs without generalization: perhaps animal eye and plant root) |

¹ Compositional or structural similarity = morphological, anatomical, or chemical composition.

The phylogenetic homology of object parts is a central aspect of many evolutionary or phylogenetic studies. Identifying this homology is necessary both to infer a phylogeny based on descriptive data specific to parts (mostly morpho-anatomical data, but also organ-specific protein expression or gene regulation patterns), and to reconstruct individual characteristics based on a known phylogeny (based on other, e.g., molecular characters). In many cases phylogenetic homology is already embedded in common composition (and property) concepts in biology. However, the information model for descriptive data should not be restricted to data fulfilling homology assumptions. For example, although data expressed in NEXUS (which was developed for phylogenetic purposes) will usually contain homologous characters, NEXUS is also used in the context of Linnaeus II identifications, where this assumption does not hold.

Functional similarity generalizations without respect to homology are of interest, for example, when studying the geographical or ecological distribution of functional characteristics, when studying correlations of functions in organism communities, or when studying the evolution of functional characteristics by mapping them to a known phylogeny. Examples of functional generalizations used in the study of DNA sequences are: (a) transcribed or non-transcribed; (b) protein-coding, rRNA-coding, or non-coding; (c) intron/insert, or exon; (d) conserved, variable, or hypervariable; (e) structural or regulatory; or (f) monomorphic or polymorphic (i.e., single or multiple alleles in population). Most of these classifications overlap considerably, and preference for a certain classification depends strongly on the purpose of the user of the data.

In some cases, the question whether a generalization is a homology or not may even be debatable and depend on the perspective. For example, the genes within gene families (e.g., myoglobin, α- and β-hemoglobins belong to the globin gene family) are homologous insofar as they are assumed to be derived from a common ancestor gene within the genome, and the homology assumption is meaningful for the purpose of gene phylogenies. However, for the purpose of phylogenetic inference of the organism, the different members of a gene family are not homologous.

Appearance and compositional similarity are the most important aspects for identification. This is only partly about achieving error tolerance by generalizing object parts likely to be confused during identification.

The difference between appearance and compositional similarity, as proposed here, is frequently blurred because the aspects often occur together. However, color, for example, may be a result of dissimilar structures (object compositions) such as pigments or structures causing the creation of “physical colors” by means of interference (the colors in the wings of certain dragonflies or butterflies like Morpho are the result of the same effect that colors thin oil films floating on water). Conversely, compositional similarity may be difficult to judge under the aspect of morpho-anatomical appearance. In molecular biology compositional similarity is the basis of the design of PCR primers or oligonucleotide probes for DNA microarrays. Both methods depend on the similarity of nucleotide sequences, not on their homology.

In contrast to composition hierarchies, generalization hierarchies can, in principle, always be joined into a single directed acyclic graph. However, when forcing the hierarchy to be a tree rather than a directed acyclic graph, information may be lost. Generalization trees can usually only be joined at the top, whereas in reality more direct generalizations would be possible.
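
The information loss caused by forcing a tree can be made concrete with a small Python sketch. The spore terms are taken from this section; the particular set of edges, the choice of which parent to keep, and the function names `ancestors` and `force_tree` are assumptions made for illustration only.

```python
# child -> list of direct generalizations (a DAG: several parents allowed)
PARENTS = {
    "basidiospore": ["meiospore", "ballistospore"],
    "ascospore": ["meiospore"],
    "conidium": ["mitospore"],
    "meiospore": ["diaspore"],
    "mitospore": ["diaspore"],
    "ballistospore": ["diaspore"],
    "diaspore": [],
}


def ancestors(term, parents):
    """All direct and indirect generalizations of term."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def force_tree(parents):
    """Keep only the first parent of each term (an arbitrary choice)."""
    return {term: ps[:1] for term, ps in parents.items()}


# Generalizations of "basidiospore" that disappear in the tree version.
lost = ancestors("basidiospore", PARENTS) - ancestors(
    "basidiospore", force_tree(PARENTS)
)
```

Here `lost` contains the functional generalization “ballistospore”: in the tree, a basidiospore and a ballistospore are related only through the top-level term, although a more direct generalization exists in the graph.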

160. Multiple generalization perspectives exist (e.g., phylogenetic, functional, morphological similarity, or compositional similarity) and must be supported in the model.

161. If generalization hierarchies support directed acyclic graphs, a single graph may incorporate the generalization hierarchies for multiple perspectives. However, for clarity of expression, clearly labeled separate graphs may be preferable.
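
One possible way to reconcile both options is to store a single graph whose edges are labeled with the perspective, and to derive separate graphs on demand. The following Python sketch assumes a simple edge-list representation; the concept names (loosely based on the examples in Table 42), the generalized term “forelimb”, and the function names are hypothetical.

```python
from collections import defaultdict

# (specific concept, general concept, perspective)
EDGES = [
    ("bird wing", "wing", "functional"),
    ("insect wing", "wing", "functional"),
    ("leaf", "leaf-like structure", "morphological"),
    ("phylloclade", "leaf-like structure", "morphological"),
    ("seal flipper", "forelimb", "phylogenetic"),
    ("human arm", "forelimb", "phylogenetic"),
    ("bird wing", "forelimb", "phylogenetic"),
]


def by_perspective(edges):
    """Split one labeled graph into one child -> parents map per perspective."""
    graphs = defaultdict(lambda: defaultdict(set))
    for child, parent, perspective in edges:
        graphs[perspective][child].add(parent)
    return graphs


def generalizations(term, edges, perspective=None):
    """Direct generalizations of term, optionally limited to one perspective."""
    return {
        parent
        for child, parent, p in edges
        if child == term and (perspective is None or p == perspective)
    }
```

With this layout, `generalizations("bird wing", EDGES)` returns both the functional and the phylogenetic generalization, while passing `"functional"` as the third argument restricts the answer to one clearly labeled perspective.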

Problems with specialized, context-dependent names for object parts

Specialization is the opposite of generalization and after discussing the latter it may appear redundant to discuss problems of specialized part names. However, as already introduced earlier, one of the most fundamental problems of managing descriptive data is the reciprocal dependency between recognition of object properties and parts (compare Figs. 8-9, p.37). In the context of object identification, therefore, an important difference exists between

■ parts that can be recognized (arms and legs) and for which also one or several generalization perspectives exist, and

■ parts that are initially recognized on a generalized level, but specific names or concepts are normally used in descriptions.

Descriptive terminology often uses terms for object parts that encapsulate knowledge that is difficult or impossible to obtain during routine identifications of biological organisms. In most cases the knowledge is not customarily obtained in a prescribed analytical process, but supplied in retrospect, after the organism itself has been recognized (“post recognition problem”). Until then, the specific terms for organism parts are used in the mind of the identifying person as “potential terms” that are used operationally only on a generalized level, i.e., only the properties of the generalized terms are used when testing concept hypotheses. The following situations may be distinguished:

■ The correct term for an object part (or “structure”) depends on studying properties that are difficult to observe, either in the part itself (e.g., anatomical properties), or in other parts of the organism. Examples:

□ Spines, prickles, thorns (see Table 39, p.136), or cladodes (i.e., phylloclades) and leaves are morphological concepts that may require anatomical or developmental studies for differentiation.

□ The difference between a leaf and a leaflet is recognized after the compound nature of the leaf is recognized first. In plants without stipules and axillary buds, this distinction often poses a substantial problem for the general public when identifying a plant.

□ A name is occasionally modified if multiple similar structures occur on the same organism. For example, asexual spores in fungi are normally called “conidia”, but are called “microconidia”, “mesoconidia”, and “macroconidia” if two or three kinds of conidia of strongly different size are produced. This can only be recognized if both conidial types are produced concurrently (which is not necessarily the case) or, more commonly, if the species is already tentatively pre-identified to genus level and the existence of multiple spore types in that genus is known.

■ Highly similar object parts may have different names in different generations of a life cycle or sexual stage. Examples:

□ In most fungal groups sexual and asexual spores (conidia) are named differently. Ascomycetes often have multiple spore types. The spores resulting from the sexual process are called “ascospores”, the asexual (“vegetative”) propagules are variously called “conidia”, “conidiospores”, or “mitospores”. Sometimes a third class of spores, the “spermatia”, can be observed that are assumed to transfer the haploid genome to another individual for the purpose of mating. Occasionally, a single species may even have “synanamorphs”: multiple, differently shaped asexual propagules, often generated in different conidiomata. These object parts can be classified under different aspects (Table 43).

□ Probably, the “world record” in stage-specific naming is held by the rust fungi (Uredinales, a group of obligate plant pathogens). According to their developmental stages, spermatia, aeciospores, urediniospores, teliospores, and basidiospores are distinguished. A secondary problem is that synonymous names are in current use (e.g., spermatia/pycno-/pycniospores/spore state 0, aeciospores/spore state I, uredinio-/uredospores/spore state II, telio-/teleutospores/spore state III, and basidiospores/spore state IV). As long as they use comparable concepts, they can be easily synonymized and standardized (e.g., following Kirk & al. 2001 as above); the concepts may, however, differ in complex ways (M. Scholler, pers. comm.). The principal problem of different spore states, however, expresses biological knowledge. Only some of these spore types are relatively easily distinguished morphologically (spermatia, basidiospores); others may be difficult to distinguish prior to identification (especially aeciospores vs. urediniospores). Interestingly, this problem can in practice often be avoided by using another character first: Many rust fungi form the different life cycle stages on different host plants. While this knowledge can be included into an authored branching key, it is difficult to express this in a general information model that allows machines to recognize and handle these and similar situations of context-dependency.

□ In some groups of heterobasidiomycetes producing repetitive ballistospores, sexual and asexual spores are fully identical and can be distinguished only by observing their origin.

■ The name of object parts may be specific to a higher taxonomic group. Examples:

□ The sexual spores are called ascospores, basidiospores, and zygospores in Ascomycetes, Basidiomycetes, and Zygomycetes, respectively. Mycologists will readily recognize an ascospore if it is still embedded in an ascus (a typical feature only found in ascomycetes) and a basidiospore if it is connected to a typical holo- or phragmobasidium.

□ Insect larvae are called caterpillars in butterflies, larvae in other groups.

□ The term haltera is specific for the reduced hind-wings of Diptera. (Note: The problem of taxon-dependent part names is a general one. The relevant question in the context of the present discussion is whether objects with different names can be recognized in an identification context or not. For example, the difference between a silique of Brassicaceae and a pod of Fabaceae has compositional foundations that can be studied even in the field.)

These three situations may be summarized as a character, life cycle, and taxonomy dependency of the correct name for the object part. They require studying the entire organism to increasing depths: neighboring parts, recognizing life cycle stage, and recognizing the taxonomic group. (A related problem that often leads to similar problems in identification is that the correct name for an object part may depend on perspective and customs; compare “Competing classifications of object parts”, p.136).
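
The taxonomy dependency summarized above can be sketched as a lookup that falls back to the generalized term while the context is still unknown. This is a minimal Python sketch under stated assumptions: the specific terms come from the examples above, but the generalized labels (“sexual spore”, “insect larva”, “reduced hind-wing”), the table `SPECIFIC_NAMES`, and both functions are hypothetical.

```python
# (generalized term, taxonomic context) -> specific term
SPECIFIC_NAMES = {
    ("sexual spore", "Ascomycetes"): "ascospore",
    ("sexual spore", "Basidiomycetes"): "basidiospore",
    ("sexual spore", "Zygomycetes"): "zygospore",
    ("insect larva", "butterflies"): "caterpillar",
    ("reduced hind-wing", "Diptera"): "haltera",
}


def name_for(general_term, taxon=None):
    """Return the specific name if the context is known, else the general term."""
    return SPECIFIC_NAMES.get((general_term, taxon), general_term)


def candidates(general_term):
    """All 'potential terms' still possible before the taxon is recognized."""
    return sorted(
        specific
        for (general, _taxon), specific in SPECIFIC_NAMES.items()
        if general == general_term
    )
```

Before the taxonomic group is recognized, `name_for("sexual spore")` yields only the generalized term, and `candidates("sexual spore")` lists the specific names that remain possible. Once the group is known, the specific name can be supplied in retrospect.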

It would in principle be possible to replace all the terms given in the above examples with generalized terms. However:

■ In most cases, specific terms that include morpho-anatomical, developmental, and phylogenetic knowledge are the accepted consensus in biology. They incorporate a high amount of knowledge (are most expressive) when used in descriptions and work in most cases when used in conventional, printed identification keys (p.242) or diagnostic descriptions (p.39).

■ As discussed above, different potential generalization perspectives exist. For example, in the case of asco- and basidiospores the generalized term “meiospores” was proposed. This term is based on a combination of phylogenetic and ontogenetic perspectives. In an identification context, however, this term is operationally meaningless, since it is not practical to decide on the meiotic or mitotic status of a cell division. Indeed, the term “meiospore” was accepted only in the 8th edition of the “Dictionary of Fungi” and has been removed again in the following version (Kirk & al. 2001). Other functional generalizations like “ballistospore” (which may refer to sexual basidiospores or asexual conidia) are more suitable, because they are correlated with structural features that make them operationally useful in identification.

■ A conflict of interest exists between data storage and specimen identification. The different spore generations of rust fungi are different objects and have properties that need to be recorded separately. Abandoning the specific terms would only lead to descriptive replacement phrases or data structures (e.g., “spores of monokaryotic aecial generation of spermatial function”, etc.), which do not solve the problem. Ultimately, generalization always addresses the tension between uniqueness (individual recognition) and comparison.

The importance of the perspective by which a generalization is informed can clearly be seen in the Plant Structure Ontology (PSO, Ilic & al. 2006). This ontology tries to integrate the different taxon-specific vocabularies created for model organisms in molecular studies (like Arabidopsis, Zea, Oryza). PSO recognizes that silique and caryopsis are types (i.e., specializations) of fruit, but nevertheless treats all of “silique”, “caryopsis”, “grain”, and “kernel” as synonyms of “fruit”. This approach is a deliberate simplification guided by the gene-annotation use case of the PSO. Unfortunately, it implies that PSO is not usable for descriptive data, and a separate version has to be created for these purposes. When creating such an ontology, it may be desirable to annotate whether a specialization is easily recognizable in identification scenarios or not.
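
The suggested annotation could, for instance, be a flag on each specialization that tells an identification tool whether to show the specific term or fall back to the generalization. The following Python sketch is only an assumption about how such an annotation might look; the record type, the field name `recognizable_in_identification`, and the flag values (chosen to match the examples in this section) are not part of the PSO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialization:
    term: str
    parent: str
    recognizable_in_identification: bool  # hypothetical annotation


TERMS = [
    Specialization("silique", "fruit", True),
    Specialization("pod", "fruit", True),
    Specialization("spine", "spiny structure", False),
    Specialization("thorn", "spiny structure", False),
    Specialization("prickle", "spiny structure", False),
]


def term_for_key(term, terms):
    """Term to present in a key: generalize if not easily recognizable."""
    for t in terms:
        if t.term == term:
            return t.term if t.recognizable_in_identification else t.parent
    return term  # unknown terms are passed through unchanged
```

In this sketch `term_for_key("thorn", TERMS)` yields “spiny structure”, whereas “silique” is kept because the text treats its distinction from a pod as observable in the field. The specific terms remain available for data storage.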

Some important differences exist between conventional printed keys and computer-aided identification tools using multi-access keys with respect to the names of object parts:

■ Printed keys (branching keys, e.g., dichotomous) occasionally use appearance generalizations (e.g., “spiny structures present”), but often they do not. However, care for the problem is usually implied. It is highly unlikely that a question in an identification key is “do you have prickles, spines, or thorns?”, or that, when asking “spines present/spines absent”, the user is expected to first study whether it indeed is a spine and, having found a thorn, follow the lead under “spines absent”. Instead, the question “spiny or not” would normally be interpreted to imply that only spines may occur at this place in the key. Ultimately, humans performing the identification rely on this convention and provide generalization knowledge themselves, i.e., when asked for a spine they compare the object primarily against the generalized and not the specific description of a spine.

■ In a multi-access key, identification can start with any character and no sequential context can be designed into the key. The need to explicitly use terms generalizing for appearance

Source document: “Structuring Descriptive Data of Organisms: Requirement Analysis and Information Models” (pages 153-162)