

4.1.2 MAGE

4.1. Standardization and Specification 55

Figure 4.1: Overview of the three base classes of the MAGE-OM hierarchy and all packages. All other classes are derived from one of Extendable, Describable or Identifiable. The folder icons denote the 15 packages into which all classes except the base classes are divided. Diagram taken from http://www.ebi.ac.uk/arrayexpress/Schema/MAGE/MAGE.htm

for providing bibliographic references, the Description package for providing free text and structured descriptions including database and bibliographic references.

The Measurement package provides a class hierarchy for measurement units. The Protocol package makes it possible to define protocols for laboratory, hardware and software applications. The protocols function as prototypes: each individual application of a protocol requires values for the protocol's specific parameters. The Audit and Security package is intended to provide means to specify who has access to specific objects and to record object creation and changes. The object model uses a deep inheritance structure for many classes, rooted in a hierarchy of three base classes depicted in Figure 4.1.
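The prototype relationship between a protocol and its applications can be sketched in a few lines of Python. In this minimal sketch, a protocol only declares its parameters, and each application must supply a value for every one of them. The class names echo the MAGE-OM terms used above, but the fields, the `apply` method and the validation rule are simplifying assumptions for illustration, not the MAGE-OM definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Parameter:
    # Hypothetical simplification: a named parameter with a unit string.
    name: str
    unit: str = ""


@dataclass
class Protocol:
    # The protocol acts as a prototype: it declares parameters but holds no values.
    name: str
    text: str
    parameters: list[Parameter] = field(default_factory=list)

    def apply(self, **values: float) -> "ProtocolApplication":
        # Assumed rule for this sketch: every declared parameter needs a value,
        # and no undeclared parameters are accepted.
        missing = [p.name for p in self.parameters if p.name not in values]
        if missing:
            raise ValueError(f"missing parameter values: {missing}")
        unknown = set(values) - {p.name for p in self.parameters}
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return ProtocolApplication(protocol=self, parameter_values=dict(values))


@dataclass
class ProtocolApplication:
    # One concrete use of a protocol, carrying the values actually used.
    protocol: Protocol
    parameter_values: dict[str, float]


hyb = Protocol(
    name="hybridization",
    text="Hybridize labelled extract at the given temperature.",
    parameters=[Parameter("temperature", "degC"), Parameter("duration", "h")],
)
run = hyb.apply(temperature=42.0, duration=16.0)
print(run.parameter_values)  # {'temperature': 42.0, 'duration': 16.0}
```

Calling `hyb.apply(temperature=42.0)` raises a `ValueError`, because the `duration` value required by the prototype is missing.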

Extendable objects allow arbitrary annotation in a ‘Name-Value-Type’ format. The specification notes that this type of annotation should not be used for standardized information that also fits into other classes. Describable objects can carry additional free-text descriptions and bibliographic references. Identifiable objects are used for objects that need a unique identifier. Identifiable classes subsume content that has to be identified and referenced uniquely within a database or document, for example experiments, arrays or sequences. The Identifiable class provides name and identifier attributes. The identifier has to be unambiguous, which means it must be unique within a document or within a repository, but not necessarily worldwide, whereas the name attribute can be any, possibly ambiguous, name.
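The three base classes can be pictured as a small inheritance chain. The Python sketch below assumes a single chain (Identifiable extends Describable, which extends Extendable), which matches the usual reading of MAGE-OM. The snake_case attribute names and the `Registry` helper that enforces identifier uniqueness within one document are hypothetical simplifications, not part of the model.

```python
from dataclasses import dataclass, field


@dataclass
class NameValueType:
    # Free-form annotation triple in the 'Name-Value-Type' format.
    name: str
    value: str
    type: str = ""


@dataclass
class Extendable:
    # Root class: arbitrary Name-Value-Type annotations (not for standardized data).
    property_sets: list[NameValueType] = field(default_factory=list)


@dataclass
class Describable(Extendable):
    # Adds free-text descriptions; bibliographic references are left out for brevity.
    descriptions: list[str] = field(default_factory=list)


@dataclass
class Identifiable(Describable):
    # identifier: unique per document or repository; name: may be ambiguous.
    identifier: str = ""
    name: str = ""


class Registry:
    """Hypothetical helper enforcing identifier uniqueness within one document."""

    def __init__(self) -> None:
        self._objects: dict[str, Identifiable] = {}

    def add(self, obj: Identifiable) -> None:
        if obj.identifier in self._objects:
            raise ValueError(f"duplicate identifier: {obj.identifier}")
        self._objects[obj.identifier] = obj


doc = Registry()
array = Identifiable(identifier="Array:001", name="slide 1")
array.property_sets.append(NameValueType("scanner_gain", "600", "int"))
array.descriptions.append("Spotted cDNA array, first print run.")
doc.add(array)
```

Two objects may share the name "slide 1", but adding a second object with identifier `Array:001` to the same registry is rejected.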

Although MAGE-ML is intended to capture MIAME-compliant annotations, most of its associations and attributes are optional. This seems to contradict the MIAME specification but is in fact consistent, because during the analysis process not all data are available yet; for example, while the layout of a microarray is being designed, the actual sequences to be spotted on it are not yet available.

Moreover, the MIAME specification might change over time, and adapting which fields are required or optional by changing the object model, and thus the language syntax, might render existing documents or databases invalid.
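The design argument can be illustrated with a small sketch: the data model keeps every field optional, so incomplete records stay valid while an experiment is still being designed, and completeness is checked against a separate, versioned list of required fields. The record fields and both checklists below are hypothetical; they are not the actual MIAME requirements.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArrayDesignRecord:
    # All fields optional: the model accepts records that are still incomplete.
    name: Optional[str] = None
    layout: Optional[str] = None
    reporter_sequences: Optional[list[str]] = None


# Hypothetical requirement checklists, versioned independently of the model.
REQUIRED_FIELDS = {
    "v1": {"name", "layout"},
    "v2": {"name", "layout", "reporter_sequences"},
}


def missing_fields(record: ArrayDesignRecord, version: str) -> set[str]:
    """Return the required fields that are still unset under the given checklist."""
    present = {f.name for f in fields(record) if getattr(record, f.name) is not None}
    return REQUIRED_FIELDS[version] - present


# During layout design the sequences are not yet known; the record is still valid.
draft = ArrayDesignRecord(name="demo array", layout="48 blocks")
print(sorted(missing_fields(draft, "v1")))  # []
print(sorted(missing_fields(draft, "v2")))  # ['reporter_sequences']
```

Switching from checklist `v1` to `v2` changes which records count as complete, but no stored record becomes syntactically invalid.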

The MAGE object model is not restricted to DNA microarrays but has been designed to suit protein arrays and other array types as well. In summary, the MAGE object model has been designed with flexibility in mind: it can capture far more detail than MIAME specifies.

This flexibility, however, is also one of the biggest trade-offs of MAGE-OM, because increasing the flexibility of a model also increases its complexity.

From a database design perspective, this inherent complexity has the disadvantage of introducing ambiguity: there are several possible ways to encode a MIAME-compliant annotation in MAGE. MGED has addressed this issue in a document describing a standard way of encoding annotations in MAGE. The fact that this document appeared only some time after MAGE had been standardized may also point to a flaw in the design process of the object model.
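A hypothetical example of this ambiguity: the same fact, here the organism of a sample, could be recorded either through a dedicated attribute or as a free Name-Value-Type annotation on an Extendable object, so every consumer has to check both places. The dictionary layout and key names below are invented for illustration and are not MAGE-ML element names.

```python
from typing import Optional

# Two hypothetical encodings of the same annotation.
encoding_a = {"organism": "Homo sapiens", "property_sets": []}
encoding_b = {
    "organism": None,
    "property_sets": [{"name": "organism", "value": "Homo sapiens", "type": "string"}],
}


def get_organism(obj: dict) -> Optional[str]:
    """Normalize both encodings: prefer the dedicated field, fall back to NVT pairs."""
    if obj.get("organism"):
        return obj["organism"]
    for nvt in obj.get("property_sets", []):
        if nvt.get("name") == "organism":
            return nvt.get("value")
    return None


print(get_organism(encoding_a) == get_organism(encoding_b))  # True
```

A shared encoding convention is meant to make such fallback logic unnecessary in each consuming application.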

Another drawback resulting from the model's flexibility is its use of uncommon names. Class names are often based on abstract terms that differ from the concrete technical terms used in the laboratory. This approach aims at names that are correct across different techniques. For example, the Hybridization class, representing the process of concurrent hybridization of labelled extract with the reporter molecules on the microarray, is a subclass of BioAssayCreation. The latter is a more generally valid term, because binding labelled protein extract to a protein microarray does not involve hybridization of complementary DNA. The class is described as “The process by which an array and one or more biomaterials are combined to create a BioAssayCreation”. So far, Hybridization is the only subclass of BioAssayCreation and has no attributes of its own, which can be seen as unnecessary complexity in the model.
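Structurally, the situation described above amounts to a subclass that adds nothing to its parent. The sketch below shows this in Python; the attributes given to `BioAssayCreation` are a simplified, assumed subset chosen for illustration, not the complete MAGE-OM definition.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BioAssayCreation:
    # Assumed, simplified attributes: the array used and the biomaterials applied to it.
    array_id: str
    biomaterial_ids: list[str] = field(default_factory=list)


@dataclass
class Hybridization(BioAssayCreation):
    # No attributes of its own: the subclass only specializes the name.
    pass


parent = {f.name for f in fields(BioAssayCreation)}
child = {f.name for f in fields(Hybridization)}
print(child - parent)  # set()
```

A protein-array experiment could use `BioAssayCreation` directly, which is the generality the naming aims for.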

An additional disadvantage of MAGE is its weak support for data mining techniques. The HigherLevelAnalysis package contains classes that can represent the tree structure of a hierarchical clustering and a set of distinct clusters, e.g. from k-means, but these classes are restricted to hard cluster assignments. Moreover, unlike for a transformation event, neither the algorithm nor the parameters used to generate these clusters can be stored.
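The limitation of hard cluster assignments can be made concrete. A soft (fuzzy) clustering gives each gene a membership degree for every cluster, while a hard assignment keeps only one label per gene. The membership values below are made up for illustration, and reducing them by taking the most likely cluster is one common choice, assumed here.

```python
# Hypothetical soft memberships: one row per gene, one column per cluster (rows sum to 1).
memberships = {
    "geneA": [0.90, 0.05, 0.05],
    "geneB": [0.48, 0.47, 0.05],
    "geneC": [0.10, 0.20, 0.70],
}


def harden(rows: dict[str, list[float]]) -> dict[str, int]:
    """Reduce soft memberships to hard labels by taking the most likely cluster."""
    return {gene: max(range(len(m)), key=m.__getitem__) for gene, m in rows.items()}


hard = harden(memberships)
print(hard)  # {'geneA': 0, 'geneB': 0, 'geneC': 2}
```

After hardening, geneB looks as firmly assigned to cluster 0 as geneA, although its membership is split almost evenly between clusters 0 and 1. A model that stores only hard assignments cannot represent this difference.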

Despite these disadvantages, the success of the MAGE-ML standard is evident. All three major public microarray databases support it, it is supported and distributed by a standardization organization, and it is reasonably stable: since its first publication, the model has undergone only one minor revision, from version 1.0 to 1.1. Furthermore, the need for MAGE's expressive power seems to outweigh the need for simplicity, because the model is intended primarily not for direct human editing but for automated data interchange and for use with software that provides a simplified view of the data structures. The need to interchange the results of higher-level analyses can be neglected, too, because it should be possible to reconstruct them from the data given the parameters. In summary, MAGE-ML seems to be sufficient to annotate current and future microarray experiments.
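The argument that higher-level results can be reconstructed assumes the analysis is deterministic given the data and its parameters. The sketch below illustrates this with a plain k-means whose only source of randomness is a seeded initialization; the function names, the stored parameter set and the toy data are assumptions for illustration. Rerunning it with the same stored parameters yields identical cluster assignments.

```python
import random


def _sq_dist(a: tuple[float, float], b: tuple[float, float]) -> float:
    # Squared Euclidean distance between two 2-D points.
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: list[tuple[float, float]], k: int, seed: int, iterations: int = 20) -> list[int]:
    """Plain k-means on 2-D points; deterministic for a fixed seed."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # seeded choice of initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its points (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


data = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
params = {"k": 2, "seed": 7, "iterations": 20}  # what would need to be stored

first = kmeans(data, **params)
second = kmeans(data, **params)
print(first == second)  # True
```

The same reasoning only holds if the algorithm, its implementation and every parameter, including the seed, are recorded, which is exactly the information the text notes MAGE cannot store for clusters.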

Source: EMMA2: a MAGE-compliant system for the analysis of microarray data in integrated functional genomics (pages 72-75)