in this table. The selection can then be used as input for further analyses such as clustering. This feature allows very flexible and intuitive data analysis; on the other hand, there is a substantial risk of losing objective criteria for data analysis and filtering.

Rosetta Resolver is a product certified by the United States Food and Drug Administration (FDA). The certification states that the software is in accordance with FDA regulations for the deposition of experimental data from a drug-design process. Hence, it is practical to apply Rosetta Resolver in life-science companies for large-scale screening projects for potential drugs for the United States market.

The certificate does not state the correctness of the software, the accuracy of its results, or the absence of software errors (which is, of course, infeasible to prove).

In summary, the Resolver software offers promising features, such as interactivity and the broadcasting of selected items, which support the intuitive selection of expressed genes. The downside is an intuitive workflow that may divert the user from objective criteria such as p-values toward subjective criteria such as visible spots. Hardware and software requirements, incomplete handling of normalized data, and a lack of data integration and extensibility with respect to data analysis make the software unfavorable in an academic environment.

4.8 Extensions to Existing Programming Environments

additional functionality from a large collection of general-purpose R packages or even implement their own analysis algorithms in R. R can be extended further, as there are bindings to many other programming languages, for example C, C++, Perl, and Fortran. Bioconductor also contains repository functions such as interfaces to database management systems.

Users cannot avoid learning the syntax and data structures of a programming language that has a very rich type system and combines contradictory programming paradigms¹², but lacks type safety. Basic functionality is often hard to find among all the R packages, even with a firm background in statistics and experience in using the R language, its manuals, and its online help.

In summary, R and Bioconductor provide extremely flexible, function-rich, extensible, and computationally efficient tools. The set of available microarray methods covers almost all published methods. However, the tools are hard to learn even for computer scientists and are therefore inappropriate for inexperienced users as well as for centralized repositories and data management.

¹² The most relevant and complicating aspect is that R is both a functional and an object-oriented language, with two incompatible approaches to object orientation.

CHAPTER 5

Requirements and Specification

The existing software systems described in the previous chapter all have specific limitations. These lie mainly in the field of software engineering, such as the failure to employ design patterns and to provide defined interfaces. The systems also lack completeness and extensibility of analysis functionality and data models, data integration with other applications, and seamless analysis pipelines with automated annotation of transformations.

The goal is not to maximize the sheer amount of functionality in the user interface but to provide the functions that are required. Otherwise, unnecessary and thus unused functions could distract the user’s attention, make the interface overly complex, and finally leave the programmer with unused and untested code fragments.

A lack of important functionality, combined with no way to extend the software later on, can lead to low acceptance of the whole effort. As a consequence, requirements analysis and specification are the first steps in the software development process. This analysis can also be seen as the attempt to achieve the largest possible intersection between three sets of functions: the functions the users require, the functions the software finally has, and the functionality that is most beneficial to accomplish the tasks the users really need to perform¹.
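The three-set view can be illustrated with a minimal sketch. The function names in the sets below are hypothetical placeholders chosen only for illustration; they are not taken from the requirements gathered for this project.

```python
# Hypothetical function names, used purely to illustrate the three sets.
required_by_users = {"import data", "normalize", "cluster", "export report"}
implemented = {"import data", "normalize", "cluster", "3D animation"}
beneficial_for_tasks = {"import data", "normalize", "cluster",
                        "annotate transformations"}


def requirement_overlap(required, implemented, beneficial):
    """Return the common core of the three sets and the gaps between them."""
    return {
        # Functions that are wanted, present, and actually useful.
        "core": required & implemented & beneficial,
        # Implemented, but neither requested nor beneficial.
        "unused": implemented - required - beneficial,
        # Requested or beneficial, but not implemented.
        "missing": (required | beneficial) - implemented,
    }


overlap = requirement_overlap(required_by_users, implemented,
                              beneficial_for_tasks)
for label, functions in overlap.items():
    print(label, sorted(functions))
```

In this toy example, the "unused" entry corresponds to the untested code fragments mentioned above, while the "missing" entry lists candidates for later extension.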

5.1 Use Cases

During a use case analysis, the tasks a software system has to perform are defined from the users’ perspective. A use case model can be described using the Unified Modeling Language (UML) (Rumbaugh et al., 1999).

¹ In case of conflict, one might want to concentrate on the latter two.

To gain a rough impression of the system, the functions it will have to perform are collected in an informal way. The required functionality of the software was gathered from two sources. First, the basic requirements from the point of view of regular standard analyses were acquired in multiple discussion sessions with potential users who have no specific background in programming. Second, it was also important to gather requirements from software developers. This is unusual for business-oriented software, which tends to follow mostly requirements or aspects of merchantability, but it seems reasonable for academic software.

For this project, two aspects are important: the software shall serve as a tool for biologists to perform routine data analysis tasks efficiently, and it shall be used by computer scientists as a tool to develop, evaluate, and deploy new methods.

The second aspect is very important, as it provides the potential to be creative and to provide functionality that is unparalleled in other software.

The functional requirements were classified into distinct groups to gain a more structured overview:

User interaction: The user interface has to support the user in handling the software and must be robust against errors. It also has to be complete in that it provides all functionality needed to perform advanced analysis tasks.

Data handling: An application for microarray data has to deal with the large number of high-dimensional datasets these techniques create. To design a consistent and efficient system, good insight into the structure of that data is of primary importance.

Data integration functionality: This class contains functionality responsible for any communication, linking, and data exchange with other software. The decision to follow established open standards, namely MIAME and MAGE-ML, is the most important requirement in this class.

Data analysis: Data analysis functionality is directly associated with the required analysis methods, the successive organization of methods, and their modular extension. Analysis functionality was kept separate from user interface functionality, backend functionality, and data integration functionality, as it relies on many aspects of all of the other fields and consequently would have been present in all of them.

Administrative functions: These functions subsume tasks such as user and account management and database maintenance.

Technical aspects: These describe general technical requirements resulting from the existing software environment, such as the need for an easy-to-install version of the software or the expected software environment of potential users.

5.2 User Interface