Existing Systems Revisited - EMMA2 : a MAGE-compliant system for the analysis of microarray dat

expression data not only to genes, but to a defined set of experimental conditions and data types. Although applications often require a single measurement per gene, one has to keep in mind that this is a very reductionist view of gene-expression data. This problem needs to be solved in the design of an appropriate query interface.

In conclusion, it is necessary to provide at least two views on data-interoperability:

• A simplified access model, by which simple queries for a single data point can be processed, and

• a complex access model, by which complex queries can be implemented; such queries are not specified at this time and may use all information in the repository.
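The two views can be illustrated with a minimal Python sketch. It is not taken from EMMA2: the names `Measurement`, `ExpressionRepository`, `single_value` and `query`, the field layout, and the choice to collapse replicates by their mean are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Measurement:
    """One expression value in its experimental context (hypothetical fields)."""
    gene: str
    condition: str
    data_type: str
    value: float


class ExpressionRepository:
    """Hypothetical in-memory repository offering both access models."""

    def __init__(self, measurements: Iterable[Measurement]) -> None:
        self._measurements = list(measurements)

    def single_value(self, gene: str, condition: str,
                     data_type: str = "log_ratio") -> float:
        """Simplified access: one number for a gene under one condition."""
        values = [m.value for m in self._measurements
                  if m.gene == gene
                  and m.condition == condition
                  and m.data_type == data_type]
        if not values:
            raise KeyError((gene, condition, data_type))
        # Assumption: replicates are collapsed to their mean. This is exactly
        # the reductionist step the simplified view has to make explicit.
        return mean(values)

    def query(self, predicate: Callable[[Measurement], bool]) -> List[Measurement]:
        """Complex access: an arbitrary predicate over the full records."""
        return [m for m in self._measurements if predicate(m)]
```

A caller that needs only one number uses `single_value`, while `query` accepts any predicate and returns the full records, so conditions and data types remain visible to the caller.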

capabilities, as well as LIMS functionality. A plug-in architecture that is rather easy to extend is found only in BASE. The other applications either contain a fixed set of tools or rely on other analysis tools installed on the user’s local computer.

Most of the systems do not follow basic paradigms of software development and lack object-oriented architectures and structured programming interfaces. Without such interfaces, software is hard to extend with required functionality. As a result, communication interfaces for interchanging expression data and annotations with other systems are hard to implement. Adding supplementary interfaces requires restructuring the software completely.

The same holds true for full support of the complete MAGE-ML format. Some systems provide partial implementations restricted to data export. Enhancing the MAGE-ML capabilities of such software to the level of full support is impossible without a major re-implementation of the storage backend, because not all annotation data encoded in MAGE-ML can be persistently stored in their database models. Changing the database model of a software system is a very deep intervention and may require changes throughout many other components of the software.

The proposition to implement a MAGE-compliant system from scratch appears sound, given the large extent of the required changes and the consequences of having to restructure large portions of the software. Provided that existing code, such as R and Bioconductor, and existing communication infrastructure can be reused, the ratio of human labor expended to the adequacy of the resulting system can be much lower when building a well-designed architecture from scratch than when refurbishing an inappropriate system.

CHAPTER 6

Object-Oriented System Design

Requirement analysis delivers a set of specifications of a desired system. The specifications consist of UML use-case diagrams, lists of required analysis functions, and natural-language descriptions of the desired functionality. As a look at the existing systems has shown, there are many alternative ways to design a microarray storage and analysis system.

The main development paradigms were chosen from the point of view of the developer of the software: the functionality should be modular, which makes it possible to start out with a small set of functions for a prototype and to add further functionality afterwards. A modular system has advantages over a monolithic system in being easier to test and maintain.

The system was therefore designed using an object-oriented approach. This involves using object-oriented methods during each step of the design process. Classes of the application logic, persistent classes, and the software component architecture were defined using UML-based design tools.

6.1 Architecture

A so-called three-tier architecture is commonly used in software development today (Shaw and Garlan, 1996; see also Section 4.7). This approach applies mainly to large distributed software systems that include a database component for data storage.

The three-tier approach divides a system into three layers or tiers:

The database tier or backend layer provides mechanisms for storage and retrieval of the data. This layer is often implemented using a database management system, but could also be built using files, especially XML documents. Often an abstraction layer like ODBC is used to encapsulate the database.

The application tier or business layer contains the actual application logic and also distributes data between the database and the clients.

The presentation layer is responsible for interacting with the user of the software. It receives user input, interacts with the application tier and presents the results to the user.

An important principle of general multi-tier architectures is that all communication passes linearly through the layers. There are no shortcuts for accessing the database directly from the presentation layer. Every layer has to provide structured interfaces for communication with the other layers.
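The linear communication principle can be sketched in a few lines of Python. The class and method names (`DatabaseTier`, `ApplicationTier`, `PresentationTier`, `record_expression`, `show`) are hypothetical and serve only as an illustration; a dictionary stands in for a real database management system.

```python
from typing import Dict


class DatabaseTier:
    """Backend layer: storage and retrieval only (a dict stands in for a DBMS)."""

    def __init__(self) -> None:
        self._rows: Dict[str, float] = {}

    def store(self, key: str, value: float) -> None:
        self._rows[key] = value

    def fetch(self, key: str) -> float:
        return self._rows[key]


class ApplicationTier:
    """Business layer: application logic, the only client of the database tier."""

    def __init__(self, database: DatabaseTier) -> None:
        self._database = database

    def record_expression(self, gene: str, value: float) -> None:
        if not gene:
            raise ValueError("gene name must not be empty")
        self._database.store(gene, float(value))

    def expression_of(self, gene: str) -> float:
        return self._database.fetch(gene)


class PresentationTier:
    """User-facing layer: holds a reference to the application tier only."""

    def __init__(self, application: ApplicationTier) -> None:
        self._application = application

    def show(self, gene: str) -> str:
        return f"{gene}: {self._application.expression_of(gene):.2f}"
```

Because `PresentationTier` never receives a `DatabaseTier` instance, it cannot take a shortcut to the stored data; every request has to pass through the application logic.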

The three-tier approach, as presented here, has some limitations and thus has to be modified and extended into a multi-tier approach. Not only does the information on the centralized server have to be sent to distributed clients; the specification of EMMA2 also states that bidirectional communication with other software is required. Therefore, it was decided to provide the complete functionality of the business logic to external applications via an extra layer. This intermediate layer communicates with the application tier and passes objects and messages to other applications.
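One way to picture such an intermediate layer is a thin facade that translates messages from external applications into calls on the application tier. The following is only a sketch under assumptions: the JSON message format, the class names `ExpressionService` and `ExternalInterface`, and the exposed method names are invented for illustration and do not describe the interface actually used by EMMA2.

```python
import json


class ExpressionService:
    """Stand-in for the application tier (hypothetical)."""

    def __init__(self) -> None:
        self._values = {"geneA": 1.5}

    def get_expression(self, gene: str) -> float:
        return self._values[gene]

    def set_expression(self, gene: str, value: float) -> None:
        self._values[gene] = float(value)


class ExternalInterface:
    """Hypothetical intermediate layer between the service and other software."""

    # Only explicitly listed methods are reachable from outside.
    _EXPOSED = {"get_expression", "set_expression"}

    def __init__(self, service: ExpressionService) -> None:
        self._service = service

    def handle(self, message: str) -> str:
        """Decode a JSON request, call the application tier, encode the reply."""
        request = json.loads(message)
        method = request.get("method")
        if method not in self._EXPOSED:
            return json.dumps({"error": f"unknown method: {method}"})
        result = getattr(self._service, method)(**request.get("params", {}))
        return json.dumps({"result": result})
```

Communication is bidirectional in the sense that external software can both read from and write to the application tier, yet it never touches the backend directly.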

Database invocation is based on queries and not on message passing. Direct invocation of an underlying relational database can pose problems within an object-oriented approach. The table structures might be very different from the application logic. Moreover, additional documentation of the database structures is required.

Thus, using a simple relational database management system (RDBMS) would mean abandoning the object-oriented design at this point. Object-oriented database systems do not have these drawbacks. However, other difficulties could arise for time-critical applications, which would benefit from SQL-like queries. A suitable combination consists of an object-oriented database management system based on an RDBMS (Alagic, 1989). To provide object-oriented features, an RDBMS can be encapsulated in an object-oriented abstraction layer, while still allowing low-level access via SQL queries for a small number of time-critical operations (Blaha and Premerlani, 1998).
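The idea of encapsulating an RDBMS in an object-oriented layer while keeping a low-level SQL path can be sketched as follows. This is an illustrative assumption, not the mapping used in EMMA2: the sketch uses Python's standard `sqlite3` module, and the `Gene` class, the `GeneStore` class, and the table layout are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass
class Gene:
    """Persistent class as seen by the application logic (hypothetical)."""
    name: str
    expression: float


class GeneStore:
    """Object-oriented abstraction layer over a relational table."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS gene ("
            "name TEXT PRIMARY KEY, expression REAL NOT NULL)"
        )

    def save(self, gene: Gene) -> None:
        """Map an object to a table row."""
        self._conn.execute(
            "INSERT OR REPLACE INTO gene (name, expression) VALUES (?, ?)",
            (gene.name, gene.expression),
        )

    def load(self, name: str) -> Optional[Gene]:
        """Map a table row back to an object, or return None if absent."""
        row = self._conn.execute(
            "SELECT name, expression FROM gene WHERE name = ?", (name,)
        ).fetchone()
        return Gene(*row) if row else None

    def raw_query(self, sql: str, params: Sequence[Any] = ()) -> List[tuple]:
        """Low-level access, reserved for a few time-critical operations."""
        return self._conn.execute(sql, params).fetchall()
```

Most callers would work with `Gene` objects through `save` and `load`; only a small number of performance-sensitive routines would use `raw_query`, which confines knowledge of the table structure to a few well-documented places.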

The previous considerations result in the following modified multi-tier architecture for the specified system (see Figure 6.1 on page 90):

The backend layer consists of a relational database management system together with an object-relational mapping, which should provide a structured interface that is accessed by the other layers.

The application layer provides the necessary application logic. It also provides interfaces to other applications which are used as embedded applications for the purpose of performing computational tasks.

