Figure 4.2: UML activity diagram of the BioDWH configuration wizard.

proxy can easily be configured via a dialog, if necessary. The different update strategies for monitoring are described in section 4.2.3. Finally, either a list of files or a complete directory can be defined for download.

For experienced users who are familiar with the BioDWH software, it is also possible to configure a project or DWH configuration via XML. An example XML configuration is shown in figure 4.3 and is explained in the following. A project has a name attribute, which is the public name of the project. Furthermore, a project contains a general description of the project, an email address of the owner or administrator, the database-configuration, and a parser-list. The database configuration includes a manufacturer, i.e. the SQL dialect of the database management system (DBMS). In addition, it includes the hostname or IP address of the server on which the database is located and the port on which the DBMS is listening. Optionally, a connection-url can be used to establish the connection to the database server. A username and password are obligatory. The different parsers for the integration are described in the parser-list. Each parser has a unique id that is defined by the parser, as described in table 4.1. Furthermore, the download of the source files and data recovery (rename-old-tables), as described in the paragraph above, can be enabled or disabled. The monitor configuration includes the update cycle, the date as a long number, the URL of the source files, and the files to download.

If the file-list is empty, the whole directory is downloaded from the file server at the given URL. Otherwise, only the listed files are downloaded. Moreover, it is also possible to manage several projects in one configuration file.
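The XML layout described above can be illustrated with a small sketch. The element and attribute names used here (`project`, `database-configuration`, `parser-list`, `monitor`, `file-list`, and so on) are assumptions modelled on the prose, not the actual BioDWH schema, and the parser ids, update-cycle values, URLs, and credentials are placeholders. The helper applies the rule that an empty file list means the whole directory is downloaded.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; tag and attribute names follow the prose
# description only and are NOT the real BioDWH schema.
EXAMPLE_CONFIG = """
<projects>
  <project name="Example DWH">
    <description>Demo integration project</description>
    <email>admin@example.org</email>
    <database-configuration>
      <manufacturer>MySQL</manufacturer>
      <hostname>localhost</hostname>
      <port>3306</port>
      <username>dwh</username>
      <password>secret</password>
    </database-configuration>
    <parser-list>
      <parser id="parser-a" download="true" rename-old-tables="false">
        <monitor update-cycle="monthly" date="0"
                 url="ftp://ftp.example.org/data/">
          <file-list>
            <file>entries.dat.gz</file>
          </file-list>
        </monitor>
      </parser>
      <parser id="parser-b" download="true" rename-old-tables="true">
        <monitor update-cycle="weekly" date="0"
                 url="ftp://ftp.example.org/other/">
          <file-list/>
        </monitor>
      </parser>
    </parser-list>
  </project>
</projects>
"""


def download_targets(parser_element):
    """Return the listed files, or None if the whole directory is meant."""
    files = [f.text.strip()
             for f in parser_element.findall("./monitor/file-list/file")]
    return files or None


def summarize(config_text):
    """Map each project name to {parser id: download targets}."""
    root = ET.fromstring(config_text)
    summary = {}
    for project in root.findall("project"):
        parsers = {}
        for parser in project.findall("./parser-list/parser"):
            parsers[parser.get("id")] = download_targets(parser)
        summary[project.get("name")] = parsers
    return summary


if __name__ == "__main__":
    for name, parsers in summarize(EXAMPLE_CONFIG).items():
        for parser_id, files in parsers.items():
            target = "whole directory" if files is None else ", ".join(files)
            print(f"{name} / {parser_id}: {target}")
```

Because the root element may hold several `project` children, the same loop also covers the case of several projects in one configuration file.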

In summary, an easy-to-use Project Wizard is implemented that supports the user or administrator in configuring a DWH integration process in four steps. No additional knowledge of database systems or computer science is necessary. The whole configuration, from database connection settings via parser configuration to monitor configuration, is supported by the graphical user interface. Furthermore, an experienced user can create and edit projects via XML configuration files.

Figure 4.3: BioDWH XML example configuration file.

4.3.2 Project Management

The BioDWH main window contains all information necessary to manage several integration projects. Multiple DWH configurations can be opened via Open in the File menu. In addition, single or multiple existing project configurations can be imported into the current BioDWH system. Conversely, it is possible to Save the current configuration of BioDWH or to export single projects. Figure 4.4 shows the main window of the BioDWH software infrastructure with a project tree on the left. Every node in the tree represents a project that is managed by the software. Each project contains one or more parsers, each of which handles the integration process of a database. The leaves below each parser represent the individual steps of the integration process. Additionally, the status of the integration process can be followed via the symbols of the leaves.
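The project tree can be pictured as a simple nested structure. The sketch below is illustrative only: the class names, the three step names (download, uncompress, integration), and the status values are assumptions, not BioDWH's internal model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the GUI shows them as symbols.
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"


@dataclass
class Step:
    name: str
    status: Status = Status.WAITING


@dataclass
class Parser:
    parser_id: str
    # Assumed default steps: one leaf per stage of the integration process.
    steps: list = field(default_factory=lambda: [
        Step("download"), Step("uncompress"), Step("integration")])


@dataclass
class Project:
    name: str
    parsers: list = field(default_factory=list)


def render(projects):
    """Return the tree as indented text lines, root node first."""
    lines = ["Projects"]
    for project in projects:
        lines.append(f"  {project.name}")
        for parser in project.parsers:
            lines.append(f"    {parser.parser_id}")
            for step in parser.steps:
                lines.append(f"      {step.name} [{step.status.value}]")
    return lines


if __name__ == "__main__":
    parser = Parser("parser-a")
    parser.steps[0].status = Status.DONE
    print("\n".join(render([Project("Example DWH", [parser])])))
```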

Manufacturer                         Dialect                     URL
IBM                                  DB2, DB2 AS 400, DB2 OS390  http://www.ibm.com/db2/
Apache                               Derby                       http://db.apache.org/derby/
PostgreSQL Global Development Group  PostgreSQL                  http://www.postgresql.org/
SUN Microsystems                     MySQL InnoDB, MyISAM        http://www.mysql.de/
Oracle                               Oracle 9i, Oracle 10g       http://www.oracle.com/
Sybase                               Sybase Anywhere             http://www.sybase.de/
Microsoft                            Microsoft SQL               http://www.microsoft.com/germany/sql/
SAP                                  SAP                         http://www.sap.com
IBM                                  Informix                    http://www.ibm.com/software/data/informix/
hsql Development Group               HSQL                        http://hsqldb.org/
Ingres VectorWise project            Ingres                      http://community.ingres.com/
Progress Software                    Progress                    http://web.progress.com/
Diehl and Associates                 Mckoi                       http://www.mckoi.com/
Embarcadero Technologies             Interbase                   http://www.embarcadero.com/
IBM                                  Pointbase                   http://www.ibm.com/software/data/integration/dm/
FrontBase                            Frontbase                   http://www.frontbase.com/
Firebird Project                     Firebird                    http://www.firebirdsql.org/

Table 4.2: Supported databases of BioDWH infrastructure.

An overview of the important information and a detailed status of the integration process are given in the right panel of the main window. Starting from the root node Projects, a list of the different projects that are managed by the DWH infrastructure appears.

Detailed information about the database connection and a list of the involved parsers are displayed. The project node shows the general description of the project as well as its overall status, which is illustrated by a progress bar. More detailed information for each parser is displayed by selecting a particular parser node. Each integration step has its own task pane, as illustrated in figure 4.4. First, the general status of the parser is given, followed by the Parser information, including the Download/Update settings described in section 4.3.1. Furthermore, Download Information on queued and active downloads is presented in a task pane if this step is enabled. The task pane for file uncompression is similar. Finally, the status of the data integration process, i.e. extraction, transformation, and loading, is illustrated by a progress bar.

The BioDWH infrastructure is multi-threaded, which means that several download, uncompress, or integration processes can run in parallel. Integration processes are volatile tasks whose performance depends strongly on the hardware of the server or computer. Therefore, the BioDWH software is currently restricted to two integration processes running in parallel. An overview of the running processes can be found in the Process queue window, which is accessible via the Properties menu. The same functionality is implemented for the file downloads, which are listed in the Download queue. From an implementation point of view, more parallel processes are possible. This feature will be made available via configuration settings in the near future.

Figure 4.4: Screenshot of the BioDWH graphical user interface.