
3. Results

3.1. GraHMMer

The application developed and implemented in this work is called GraHMMer. The name derives from the words “graphic” and “HMMER”. The programming language used is C#, with Visual C# Express as the IDE.

GraHMMer has some essential prerequisites to run properly. First, HMMER for Windows must be installed; second, the .NET runtime is required. So far GraHMMer runs on Microsoft Windows™ systems. It was tested successfully on Windows XP and Windows Vista Home Edition.

The intended cross-platform functionality is not fully implemented because of some incompatibilities between .NET Framework 3 and the current implementation of the Mono framework. Porting the graphical components caused difficulties: some of the controls used are not available for Mono or behave slightly differently.

A graphical user interface was implemented that gives a practical overview of the features HMMER for Windows provides. Besides the HMMER subprograms, HMMER for Windows ships with some additional helpful tools. One of these tools, afetch, is also included in GraHMMer. The other additional programs are prepared for later implementation.

The results of HMMER were to be processed into a graphical presentation. The use of HMM-Logos was intended for this feature, but compiling the HMM-Logo Perl package on a Microsoft Windows™ system turned out to be very difficult. The problem is that the HMM-Logo package itself and some of its prerequisite packages contain C code, which could not be compiled without errors on a Windows™ system. Furthermore, the ActivePerl software requires a certain format for added Perl packages, and the HMM-Logo package is not available in that format either. To substitute for the planned embedded graphical output, a web interface was integrated into GraHMMer. This interface gives access to the online platform of the Sanger Institute, where HMM-Logos can be created33. Internet access is needed to use this online platform inside GraHMMer.

Additionally, the text output of hmmsearch and hmmpfam is processed so that its parts are displayed separately for a better overview. Those parts are ‘Sequence Scores’, ‘Domain Scores’, ‘Alignments’ and the produced ‘Histogram’. The full output is displayed as ‘raw Result’.

Test results can be edited inside GraHMMer and exported to other files. GraHMMer is fundamentally project-based: a treeview on the left side of the application shows folders, which represent projects (see Figure 14).

Figure 14: Screenshot of GraHMMer, selected project is marked green in the treeview, selected HMM subprogram is hmmbuild

33 http://www.sanger.ac.uk/cgi-bin/software/analysis/logomat-m.cgi

All files produced by GraHMMer are stored in the currently selected project folder. The files in this folder are screened by extension, e.g. *.hmm, which activates certain switches in the call of the appropriate subprogram.
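
The screening by extension described above could look roughly like the following minimal Python sketch. GraHMMer itself is written in C#, and its actual screening code is not shown in this text, so the function name `group_by_extension` and the example file names are hypothetical and serve only as illustration.

```python
import os


def group_by_extension(file_names):
    """Group file names by their lower-cased extension (hypothetical helper)."""
    groups = {}
    for name in file_names:
        extension = os.path.splitext(name)[1].lower()
        groups.setdefault(extension, []).append(name)
    return groups


# Example with made-up file names from an imaginary project folder.
example = group_by_extension(["rrm.hmm", "rrm.sto", "globin.HMM"])
hmm_files = example.get(".hmm", [])  # candidates for subprograms that read profile HMMs
```

In a real project folder the list of names could come from `os.listdir(project_folder)`; which extensions enable which switches is not specified in the text and would have to be defined separately.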

The menu of GraHMMer is divided into four sections: ‘GraHMMer Project’, ‘HMMer Subprograms’, ‘Options’ and ‘Help’.

The ‘GraHMMer Project’ menu provides submenus for handling project folders. New projects, in the form of folders, can be added to the main project path, which is explained below. An existing project can be selected as the current working project, files can be added to the working project, and the application can be closed from here.

The HMMER subprograms are found under the ‘HMMer Subprograms’ menu, grouped into two submenu categories called ‘Search’ and ‘Build’. The ‘Search’ menu (see Figure 15) contains all subprograms that can be used for searching or ‘reading’ with, in or from profile HMMs.

Figure 15: Screenshot - the GraHMMer Search menu contains HMMER subprograms with search functionality

Those subprograms are hmmsearch, hmmpfam, hmmalign, hmmemit and hmmfetch as can be seen in Figure 15.

Figure 16: Screenshot - GraHMMer Build menu contains HMMER subprograms for building profile HMMs and profile HMM databases

The ‘Build’ menu (see Figure 16) contains all HMMER subprograms that are used for creating or optimizing a profile HMM or for building a profile HMM database. These programs are hmmbuild, hmmconvert and, in a further submenu called ‘Optimize’, hmmcalibrate and hmmindex. The menu also contains another entry, ‘create profile hmm database’. This entry opens a dialog box that lists all available profile HMMs in the working project folder. Already existing profile HMM databases can optionally be added, too.

The third submenu of the ‘HMMer Subprograms’-menu is ‘Tools’. This menu is designed to carry the additional HMMER for Windows programs. In this version only one of these programs, afetch, is implemented in GraHMMer.

As can be seen in Figure 17, in the ‘create profile hmm database’ dialog box the profile HMMs can be selected and a name for the new pHMM database has to be entered. The created database is saved into the working project folder.
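
In HMMER 2 a profile HMM database is a plain concatenation of individual profile HMM files, so the database creation step can be illustrated with a short Python sketch. How GraHMMer performs this step internally is not described here; the function names `concatenate_hmms` and `create_hmm_database` are hypothetical, and the sketch simply assumes that the selected files are joined in the order given.

```python
from pathlib import Path


def concatenate_hmms(hmm_texts):
    """Join the contents of several profile HMM files into one database text."""
    parts = []
    for text in hmm_texts:
        # Make sure each model ends with a newline before the next one starts.
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


def create_hmm_database(hmm_paths, database_path):
    """Read the selected profile HMM files and write them as one database file."""
    texts = [Path(path).read_text() for path in hmm_paths]
    Path(database_path).write_text(concatenate_hmms(texts))
```

An already existing database can be passed in the same way as a single profile HMM file, since it is itself a concatenation of models.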

Figure 17: Screenshot - dialog box for the creation of profile HMM databases

Under the ‘Options’ menu some essential settings can be entered or changed.

Under the ‘Database Paths’ menu the path to a sequence database, e.g. SWISSPROT, and the path to a pHMM database, e.g. Pfam, can be entered. Those entries are not essential but can be very helpful, because they are added to the appropriate controls of the application.

Furthermore, this menu lets the user change the path to the HMMER installation in the submenu ‘HMMER Path’.

Last, the path to the main project folder can be set in ‘Set Project-Folder’. The main project path is the path of the folder which contains all project folders. The main project folder itself is not displayed in the project treeview.

The ‘Help’-menu contains an ‘About’- and a ‘Help’-section. The ‘About’ menu provides information about HMMER, GraHMMer and HMMER for Windows.

The ‘Help’ submenu is divided into two subsections. The first is ‘GraHMMer User guide (PDF)’, which opens a PDF file with usage information for GraHMMer itself. The second, ‘HMMER2’, opens either the HMMER 2 user guide as a PDF or an online help resource34 for HMMER 2. Internet access is needed to reach the online manual pages for HMMER 2.

34 http://www.psc.edu/general/software/packages/hmmer/hmmer.html

The GraHMMer window itself is divided into the project treeview on the left side and a tab page on the right side.

The treeview displays all available project folders and their contained files. The current working project folder is marked green and unfolded. With a double click on a file inside the current working project a dialog box opens, displaying the content of the file. This dialog box (see Figure 18) provides the ability to edit the selected file and save the changes.

Figure 18: Screenshot - display a file by double clicking it in the treeview, displayed is the file nucleic.null from the HMMER2 tutorial

The main part of the application is occupied by a tab page. This tab page consists of five tabs, which are explained in the following paragraphs.

The first tab, ‘Settings’, shows some workflow charts when the program starts. The charts explain the essential workflows when working with HMMER2.

When an HMMER subprogram is selected, the ‘Settings’ tab changes its appearance. Depending on the chosen HMMER program, the corresponding input and selection controls are displayed. Besides the required input files, options can be chosen and the corresponding values entered. Some HMMER programs provide additional expert options, which are only displayed in GraHMMer when the ‘show more options’ checkbox is selected. This can be seen in Figure 19. When the cursor is moved over an option control, a tooltip pops up with some short information about that option.

The user’s option selection is saved in the local configuration file, and the same options are selected again on the next start of the application.

Figure 19: Screenshot - the input- and selection options for hmmsearch, expert options are displayed

In the left corner of the tab page an information button is displayed. Depending on which subprogram has been chosen, a click on that button opens a dialog box with general information about the selected program (see Figure 20). This information text is taken from the HMMER manual pages on the Pittsburgh Supercomputing Center website35. Clicking the ‘Execute’ button executes the program with the entered options and files as input. The ‘Cancel’ button resets the selection of the HMMER subprogram and clears the ‘Settings’ tab.

35 http://www.psc.edu/general/software/packages/hmmer/hmmer.html

When the ‘Execute’ button is clicked and the program executes properly, the whole result is saved into a new file in the working project folder. This feature is only available for hmmsearch and hmmpfam; additionally, the program output is displayed on the ‘Result’ tab page.
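
The execute step can be pictured as assembling a command line from the chosen options and input files, running it, and storing the text output. The following Python sketch illustrates this idea under stated assumptions; it is not GraHMMer's C# implementation. The helper names `build_command` and `run_and_save`, the file names and the result path are hypothetical. The argument order (options first, then the input files) follows the HMMER 2 command-line convention, and `-E` is the E-value cutoff option of hmmsearch.

```python
import os
import subprocess


def build_command(hmmer_dir, program, options, input_files):
    """Assemble the argument list for one HMMER subprogram call."""
    command = [os.path.join(hmmer_dir, program)]
    for flag, value in options.items():
        command.append(flag)
        if value is not None:  # flags without a value are passed on their own
            command.append(str(value))
    command.extend(input_files)
    return command


def run_and_save(command, result_path):
    """Run the command, save its complete text output, and return it."""
    completed = subprocess.run(command, capture_output=True, text=True)
    output = completed.stdout + completed.stderr  # may also be an error message
    with open(result_path, "w") as handle:
        handle.write(output)
    return output


# Hypothetical example: hmmsearch with an E-value cutoff of 10.
command = build_command("", "hmmsearch", {"-E": 10}, ["rrm.hmm", "seqs.fa"])
```

Passing the arguments as a list rather than one string avoids quoting problems with paths that contain spaces, which are common on Windows systems.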

Figure 20: Screenshot - hmmsearch information box, opens after clicking the blue 'I'-button

The ‘Result’ tab itself contains another tab page, which is divided into five sections: ‘Sequence Scores’, ‘Domain Scores’, ‘Alignments’, ‘Histogram’ and ‘raw Result’. The first four tabs are only filled when hmmsearch or hmmpfam is executed. The ‘raw Result’ tab displays the whole command output, which can also be an error message.

The ‘Sequence Scores’ tab shows the content of the output section which starts with ‘Scores for complete sequences (score includes all domains):’. The content is a list of top hits sorted by expectation value36 (E-Value), with the sequence with the most significant E-Value at the top of the list [Eddy, 2003].

This section is shown for hmmsearch and hmmpfam results only.

In the ‘Domain Scores’ section the part of the output beginning with ‘Parsed for domains:’ can be seen. It contains the domain top hits. Every domain with a raw score above 0 from each sequence with an E-Value below 10 is listed [Eddy, 2003]. The ‘Domain Scores’ are also available for hmmsearch and hmmpfam.

36 Number of hits expected by chance with a certain score or above in a sequence database of the given size [S. Eddy 2003]

The next tab is ‘Alignments’. Its content starts in the raw output with ‘Alignments of top-scoring domains:’. Here each domain from the previous list is displayed in a BLAST-like alignment [Eddy, 2003], which can be seen in Figure 21.

The last section is the ‘Histogram’, which begins with the text ‘Histogram of all scores:’. The histogram contains increasing raw scores along the Y axis and the number of sequence hits along the X axis.

The ‘Alignments’ and ‘Histogram’ tabs are only filled for hmmsearch. Both hmmpfam and hmmsearch have two additional sections in the raw output which are not displayed separately: the header and the statistical details.
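
Splitting the raw output into the tabs described above amounts to cutting the text at the four section headings quoted in this chapter. The following Python sketch shows one way to do this; it is an illustration, not GraHMMer's actual C# code, and the function name `split_sections` is hypothetical. In this simplified version everything before the first heading (the header) is ignored, and anything after the last heading, such as the statistical details, stays attached to the last section.

```python
# Section headings as quoted in the text, mapped to the tab names.
SECTION_HEADERS = {
    "Scores for complete sequences (score includes all domains):": "Sequence Scores",
    "Parsed for domains:": "Domain Scores",
    "Alignments of top-scoring domains:": "Alignments",
    "Histogram of all scores:": "Histogram",
}


def split_sections(raw_output):
    """Cut raw hmmsearch/hmmpfam text output into named sections."""
    sections = {}
    current = None
    for line in raw_output.splitlines():
        title = SECTION_HEADERS.get(line.strip())
        if title is not None:
            current = title          # a new section starts at this heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

Sections whose heading does not occur in the output are simply absent from the returned dictionary, so the corresponding tabs can be left empty.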

The next of the main tab pages is the ‘Results Graph’. It includes a browser-like web interface which automatically loads the web portal of the Sanger Institute37, which provides LogoMat-M, an HMM-Logo creation tool. Internet access is needed for this function. On this portal a *.hmm file can be uploaded and an HMM-Logo is created. This graphical output can be saved to a file via the context menu of the web interface.

The next tab page is ‘Source File 1’. This tab displays the first input file for the chosen HMMER subprogram; ‘Source File 2’ displays the second input file. One restriction applies to both tabs: a file that exceeds a certain size cannot be displayed. In that case a message box with this information pops up. An example of such a file would be the complete swissprot database file. Source files 1 and 2 can be edited and saved in their respective tab pages.

37 http://www.sanger.ac.uk/Software/analysis/logomat-m/

Figure 21: Screenshot - Result tab page, detail view on the domain alignments from the raw result
