
3. Results

3.3. Experiments

3.3.1. Used Data

On May 18, 2008, a Pfam-A keyword search was performed on http://pfam.janelia.org/ using the keywords ‘tumor necrosis factor tnfa’. Among the results (see Figure 26) was the family PF00229 – TNF (Tumor Necrosis Factor) family (ID: TNF).

Figure 26: Screenshot - Search result from Pfam-A; search keywords: tumor necrosis factor tnfa; May 18, 2008;

The full alignment of PF00229 was downloaded in MSF format into the folder “C:\[…]\Experiments\TNF_Alpha” on the working machine (see Figure 27 and Figure 28).

Figure 27: Screenshot - Download of PF00229 alignments from Pfam-A in MSF format; May 18, 2008

On the same day, a keyword search for “tumor necrosis factor” was performed on NCBI – Entrez Protein38. The search returned 8987 results in total. Three hits without an integrated link to the Swissprot database were downloaded in FASTA format:

• NP_001009385 (Felis catus)

• CAA40591 (Sus scrofa)

• NP_001117846 (Oncorhynchus mykiss)

38 http://www.ncbi.nlm.nih.gov/sites/entrez


Q4W899_FUGRU/81-191 rswqefhsgg scsfvhhegs ihcrkns--- ---..--- ---
Q7QIB9_ANOGA/430-450 ... ... ...--- ---..--- ---DG
Q3TBX2_MOUSE/212-329 ... ... ...SGW EETKI..NSS SPLRYDRQIG
Q5F2A0_MOUSE/131-248 ... ... ...SGW EETKI..NSS SPLRYDRQIG
Q5F2A1_MOUSE/131-210 ... ... ...SGW EETKI..NSS SPLRYDRQIG
Q8BXS2_MOUSE/132-211 ... ... ...SGW EETKI..NSS SPLRYDRQIG
Q4ACW9_HUMAN/131-248 ... ... ...SGW EEARI..NSS SPLRYNRQIG
TNF12_HUMAN/131-248 ... ... ...SGW EEARI..NSS SPLRYNRQIG
[…]

Figure 28: Excerpt from PF00229_full.msf

3.3.2. Sequence Analysis with HMMER for Windows on the command line

The downloaded files were copied into the folder […]\Experiments\TNF_Alpha_cmd.

A command line window was opened via Start – Run – cmd. In this window, the working directory had to be changed to the installation path of HMMER for Windows.

First, a pHMM was generated with hmmbuild.exe from PF00229_full.msf. To find out the correct call syntax, hmmbuild.exe -h was executed beforehand (see Figure 29).

Figure 29: Screenshot - before the execution of hmmbuild.exe on PF00229.hmm

Full paths had to be entered for the parameters, the output file and the input file. The command is started by pressing the Enter key, after which its output is displayed on the screen.

Once generated, the pHMM must be calibrated with hmmcalibrate.exe. Again, hmmcalibrate.exe -h was called to find out how to invoke the program, and the full paths of the input and output files had to be entered. A faulty entry of the parameter --histfile with only one dash merely displayed the help text of hmmcalibrate.exe -h again, without any hint as to what had been entered incorrectly.
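A pre-flight check of this kind could catch such a mistake before HMMER is called. The following Python sketch is hypothetical (it is not part of HMMER or GraHMMer): it flags arguments like -histfile that match a known long option once a second dash is added. Apart from --histfile, the option names in the set are recalled from hmmcalibrate -h of HMMER 2.3.2 and should be checked against the installed version.

```python
# Long ("expert") options of hmmcalibrate; assumed from `hmmcalibrate -h`
# (HMMER 2.3.2), verify against the local installation.
HMMCALIBRATE_LONG_OPTIONS = {
    "--cpu", "--fixed", "--histfile", "--mean",
    "--num", "--pvm", "--sd", "--seed",
}


def find_single_dash_long_options(args, long_options):
    """Return (given, suggested) pairs for arguments such as '-histfile'
    that look like a long option typed with only one dash."""
    suspects = []
    for arg in args:
        if arg.startswith("-") and not arg.startswith("--"):
            candidate = "-" + arg  # '-histfile' -> '--histfile'
            if candidate in long_options:
                suspects.append((arg, candidate))
    return suspects


args = ["-histfile", "hist.txt", "PF00229.hmm"]
for given, suggested in find_single_dash_long_options(args, HMMCALIBRATE_LONG_OPTIONS):
    print(f"{given!r}: did you mean {suggested!r}?")
```

Run as shown, it prints `'-histfile': did you mean '--histfile'?`, which is the kind of hint the command line version did not give.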

To create an HMM-Logo, a browser was opened and the URL of LogoMat-M was entered. The generated pHMM was uploaded, and the resulting graphic was saved into the desired folder via the context menu.

Next, the three downloaded protein sequences in FASTA format were searched against PF00229.hmm, and the output was redirected into a file to save the result. Before this could be done, the help function again had to be called.
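The three searches and the output redirection can be sketched in Python as follows. This is an illustration under assumptions, not the procedure used in the experiment: it assumes the hmmsearch executable is on the PATH (otherwise pass its full path as `exe`), uses the HMMER 2 argument order `hmmsearch [-options] <hmmfile> <sequence file>`, and names each result file `<accession>.hmmsearch`, as GraHMMer does. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def plan_searches(hmm_file, fasta_files, out_dir, exe="hmmsearch"):
    """Build one hmmsearch call per FASTA file, with absolute paths,
    and the file that should receive its text output."""
    hmm = Path(hmm_file).resolve()
    jobs = []
    for fasta in fasta_files:
        fasta = Path(fasta).resolve()
        argv = [exe, str(hmm), str(fasta)]
        out_file = Path(out_dir).resolve() / (fasta.stem + ".hmmsearch")
        jobs.append((argv, out_file))
    return jobs


def run_searches(jobs):
    """Run each job and write its standard output into the result file,
    the same effect as `hmmsearch ... > result.hmmsearch` in cmd.exe."""
    for argv, out_file in jobs:
        with open(out_file, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


jobs = plan_searches(
    "PF00229.hmm",
    ["NP_001009385.fa", "CAA40591.fa", "NP_001117846.fa"],
    out_dir=".",
)
```

Separating planning from running makes the calls easy to inspect; `run_searches(jobs)` then performs the actual searches.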

A search against the downloaded Swissprot database39 was performed in the same way as the searches with the sequence files; it took 29 minutes and used on average 99% of the CPU resources. Again, the full paths had to be entered. While the search was running, nothing could be entered into the command line window.

hmmconvert was executed with the parameter -p to convert the original pHMM into a pHMM in GCG profile format. Again, the help function had to be called.

With hmmemit, sequences were generated from the pHMM as a multiple alignment and saved into another file.

To use any of the HMMER subprograms on the command line, the full paths of the input and output files had to be entered, and the help function had to be called to find out exactly how each subprogram must be invoked.
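Much of what the help function had to be consulted for is the order of the positional arguments, which differs between subprograms; hmmbuild, for instance, expects the output file before the input alignment. The Python sketch below records these orders in a table and builds an argument list with absolute paths. The orders are taken from the "Usage:" lines of HMMER 2 as we recall them; the table and `build_call` are hypothetical helpers, not GraHMMer code.

```python
from pathlib import Path

# Positional-argument order of some HMMER 2 subprograms, following their
# "Usage:" lines, e.g. "hmmbuild [-options] <hmmfile output> <alignment file>".
# Options always precede the positional arguments.
USAGE = {
    "hmmbuild": ("hmmfile_out", "alignment_file"),
    "hmmcalibrate": ("hmmfile",),
    "hmmsearch": ("hmmfile", "seqfile"),
    "hmmpfam": ("hmm_database", "seqfile"),
    "hmmconvert": ("old_hmmfile", "new_hmmfile"),
    "hmmemit": ("hmmfile",),
}


def build_call(program, options=(), **files):
    """Return an argument list for `program`, with every file given as an
    absolute path so the call does not depend on the current directory."""
    try:
        slots = USAGE[program]
    except KeyError:
        raise ValueError(f"unknown subprogram: {program}") from None
    if set(files) != set(slots):
        raise ValueError(f"{program} expects {slots}, got {tuple(files)}")
    positional = [str(Path(files[slot]).resolve()) for slot in slots]
    return [program, *options, *positional]


build_cmd = build_call("hmmbuild", hmmfile_out="PF00229.hmm",
                       alignment_file="PF00229_full.msf")
convert_cmd = build_call("hmmconvert", ["-p"], old_hmmfile="PF00229.hmm",
                         new_hmmfile="PF00229_2.hmm")
```

Because file roles are passed by name, a swapped input and output file is caught as a naming error instead of silently overwriting the alignment.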

39 http://www.ebi.ac.uk/uniprot/database/download.html, downloaded on April 10, 2008

3.3.3. Sequence Analysis with GraHMMer

In GraHMMer, the folder “C:\[…]\Experiments” was selected as the main project folder and the folder TNF_Alpha as the current working project.

As a first step, a profile HMM named PF00229.hmm was generated with hmmbuild from the downloaded multiple alignment of PF00229 (TNFA family) (see Figure 30 and Figure 31).

Figure 30: Screenshot - creation of PF00229.hmm with hmmbuild in GraHMMer and PF00229.msf

Figure 31: Screenshot - PF00229.hmm viewed with the FileViewer feature of GraHMMer

From the created pHMM file, an HMM-Logo was generated using the built-in web interface to LogoMat-M at the Sanger Institute. The resulting picture is shown in Figure 39.

The three protein sequence files were searched against the newly generated pHMM PF00229.hmm with the default option settings of hmmsearch. This search created three new files in the project folder: NP_001009385.hmmsearch, CAA40591.hmmsearch and NP_001117846.hmmsearch. The full result text of NP_001009385.hmmsearch is shown in Figure 40 in the Appendix. This raw output was split into four sections, as shown in Figure 32, Figure 33, Figure 34 and Figure 35.

Figure 32: Screenshot - Search with NP_001009385.fa against PF00229.hmm; Sequence Scores

Figure 33: Screenshot - Search with NP_001009385.fa against PF00229.hmm; Domain Scores

Figure 34: Screenshot - Search with NP_001009385.fa against PF00229.hmm; Alignments

Figure 35: Screenshot - Search with NP_001009385.fa against PF00229.hmm; Histogram
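The division of the raw hmmsearch output into four sections (Figures 32 to 35) can be sketched as a split at known header lines. The following Python sketch is a minimal illustration, not GraHMMer's implementation. The header texts are assumed to be those printed by HMMER 2 hmmsearch and should be checked against a real result file such as NP_001009385.hmmsearch. For hmmpfam, a different tuple of three headers would be passed.

```python
# Section headers assumed from HMMER 2 hmmsearch output.
HMMSEARCH_HEADERS = (
    "Scores for complete sequences",
    "Parsed for domains:",
    "Alignments of top-scoring domains:",
    "Histogram of all scores:",
)


def split_sections(text, headers=HMMSEARCH_HEADERS):
    """Split raw HMMER text output into {header: section_text}.
    Lines before the first header (program banner, query information)
    are collected under the key 'preamble'."""
    sections = {"preamble": []}
    current = "preamble"
    for line in text.splitlines():
        match = next((h for h in headers if line.startswith(h)), None)
        if match is not None:
            current = match
            sections[current] = []
        else:
            sections[current].append(line)
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}
```

Matching with `startswith` tolerates trailing text after a header, such as the "(score includes all domains)" remark that follows the sequence score header.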

Next, PF00229.hmm was searched against a local Swissprot database file in FASTA format (downloaded40 on April 10, 2008). This took 32 minutes and used on average 99% of the CPU resources of the Microsoft Windows XP™ notebook described in Materials and Methods. During this search, the application did not respond. The raw result text was exported into the file PF00229.hmmsearch in the working project folder.

40 http://www.ebi.ac.uk/uniprot/database/download.html
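The unresponsiveness during the Swissprot search is what happens when a long external process runs on the same thread that draws the window. The Python sketch below shows the general remedy of moving such a call to a worker thread. It is not GraHMMer's C# code; `run_in_background` and its callback convention are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# One worker thread: long hmmsearch/hmmpfam runs are queued here instead of
# blocking the thread that handles the user interface.
executor = ThreadPoolExecutor(max_workers=1)


def run_in_background(argv, on_done):
    """Start `argv` on the worker thread and return a Future.
    `on_done(returncode, output)` is called on the worker thread when the
    process finishes; a real GUI would have to pass the result back to its
    UI thread (e.g. via the toolkit's invoke mechanism) before updating controls."""
    def task():
        result = subprocess.run(argv, capture_output=True, text=True)
        return result.returncode, result.stdout

    future = executor.submit(task)
    future.add_done_callback(lambda f: on_done(*f.result()))
    return future
```

A single worker keeps long searches in order while the interface stays usable. More workers would allow parallel searches, but each hmmsearch run already used nearly all of the CPU in the measurement above.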

The calibration of PF00229.hmm with hmmcalibrate (see Figure 36) took about 1 minute.

hmmcalibrate -- calibrate HMM search statistics
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - -
HMM file: C:\Dokumente und Einstellungen\Nadine\Desktop\Diplomarbeit_Text\v4\Experiments\TNF_Alpha\PF00229.hmm
Length distribution mean: 325
Length distribution s.d.: 200
Number of samples: 5000
random seed: 1211131777
histogram(s) saved to: 0
- - -
HMM : tnfa_PF00229
mu : -67.941818
lambda : 0.220730
max : -31.461000
//

Figure 36: Text output of hmmcalibrate on PF00229.hmm
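The values mu and lambda in Figure 36 are the parameters of the extreme value (Gumbel) distribution that hmmcalibrate fits to the scores of random sequences. Assuming the standard Gumbel form, a score x has the P-value P(S ≥ x) = 1 − exp(−exp(−λ(x − μ))), and the E-value for a database of N sequences is N · P. The Python sketch below reads both parameters from the text output and evaluates this formula. The function names are hypothetical.

```python
import math
import re


def parse_calibration(text):
    """Extract mu and lambda from the hmmcalibrate summary block
    ('mu : ...', 'lambda : ...')."""
    values = {}
    for key in ("mu", "lambda"):
        m = re.search(rf"\b{key}\s*:\s*(-?\d+(?:\.\d+)?)", text)
        if m is None:
            raise ValueError(f"{key} not found in hmmcalibrate output")
        values[key] = float(m.group(1))
    return values["mu"], values["lambda"]


def evd_pvalue(score, mu, lam):
    """P(S >= score) under a Gumbel distribution with location mu and slope lam."""
    return 1.0 - math.exp(-math.exp(-lam * (score - mu)))


mu, lam = parse_calibration(
    "HMM : tnfa_PF00229 mu : -67.941818 lambda : 0.220730 max : -31.461000 //"
)
```

As a sanity check, a score equal to mu has P = 1 − 1/e ≈ 0.632. Higher scores give smaller P-values.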

With hmmconvert PF00229.hmm was converted into GCG format and the output saved into PF00229_2.hmm (see Figure 37).

Figure 37: Screenshot - hmmconvert on PF00229.hmm into GCG Profile

The result (see Figure 38) of a run of hmmemit on PF00229.hmm was saved into tnfa.txt. The selected options were ‘Output as alignment’, ‘Single consensus sequence’ and ‘Save to file’.

Figure 38: Screenshot - result of hmmemit on PF00229.hmm
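GUI options of this kind probably map to command line flags. The Python sketch below shows one possible mapping. It assumes the HMMER 2 hmmemit flags -a (write the generated sequences as an alignment), -c (emit a single consensus sequence) and -o <file> (save to a file), as recalled from hmmemit -h. These flags should be checked against the installed version, and the function is hypothetical, not GraHMMer code.

```python
def hmmemit_args(hmm_file, as_alignment=False, consensus=False, out_file=None):
    """Translate three checkbox-style options into an hmmemit argument list."""
    args = ["hmmemit"]
    if as_alignment:
        args.append("-a")            # 'Output as alignment'
    if consensus:
        args.append("-c")            # 'Single consensus sequence'
    if out_file is not None:
        args += ["-o", out_file]     # 'Save to file'
    args.append(hmm_file)            # the HMM file is the only positional argument
    return args


print(hmmemit_args("PF00229.hmm", as_alignment=True, consensus=True, out_file="tnfa.txt"))
```

With the settings used in this run, the call prints `['hmmemit', '-a', '-c', '-o', 'tnfa.txt', 'PF00229.hmm']`.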

Figure 39: HMM-Logo of PF00229.hmm; edited to fit on an A4 page; [Schuster-Böckler, et al., 2004]


4. Discussion

In the following, the results are discussed: both the problems that occurred during implementation and what can be done to improve this version of GraHMMer.

The usage of GraHMMer is also compared with the usage of HMMER on the command line.

4.1. Occurred Problems

Two of the announced basic aspects of the application could not be implemented as intended.

The first one is cross-platform functionality. The application was supposed to run on Linux systems as well as on Microsoft Windows™ systems. In the end, the software runs smoothly only on Microsoft Windows™ systems. Although porting between the two systems should not have caused severe problems, it did.

One reason for these problems is certainly the clear difference between the implementation of Mono on Linux and .NET Framework version 3 on Windows™.

The application was developed using Microsoft’s Visual C# Express (VC# Express), because this IDE provides a much better interface and handling for both the implementation of the GUI and the writing of the source code. Its documentation and help files are also more readily available and more exhaustive. In comparison, the MonoDevelop IDE was not an option, because GUI development with it turned out to be much less precise. With Visual C# Express, controls and windows can be moved by drag and drop, and the size and behavior of the controls can be changed by double-clicking them. Methods and functions are added automatically by double-clicking or by selection in the corresponding menu.

Although MonoDevelop also provides an interface for GUI development, this interface is far less flexible and is more reminiscent of GUI implementation in Java. The windows can only be divided into inflexible segments, which then absorb the controls. These controls adopt the size of the respective segment and cannot easily be moved inside the window. Adding source code to the created graphical interface was also not as easy as with VC# Express.

The use of the Microsoft™ IDE was surely one reason for the problems encountered while migrating the application to Mono. First, the IDE automatically adds code to the source which cannot be processed by Mono or MonoDevelop. Second, VC# Express provides all controls and properties that are available on Microsoft™ systems, but many of those controls do not work on a Linux system. This became apparent late in development, when not enough resources were left to change the GUI again.

Because of the different implementations of Mono and .NET 3, many classes and functions that are provided in .NET 3 and used with VC# Express cannot be accessed in Mono. One example is the user-specific saving of control settings.

The application ‘remembers’ which option the user chose the last time she or he used it. On Windows™ this works without problems using the class ConfigurationManager, but this class and the corresponding functions did not work under Mono.

The second one is the graphical output of the obtained HMMER results. At the beginning, an entirely new graphical output was considered. One solution would have been to display the alignments of found domains with colored letters and additional information, as Jalview41 does for sequence alignments. Another, already established method would have been the use of HMM-Logos. The Perl source code for HMM-Logos (version 0.7.7) can be downloaded for free42. Calling the corresponding method via the runCommandTool function would certainly have posed no problem. The difficulties occurred in preparing the runtime environment for the Perl package: some libraries must be installed before the HMM-Logo package itself can be installed. One of those, PDL 2.3.4 (or a higher version), did not compile without errors on a Windows system, and the logo package itself caused similar problems.

The reason is that both libraries contain C code, which made compiling them very difficult. Had more resources been invested, this problem could surely have been solved. But besides the lack of resources, one has to think of the user: every user who wants to use this function would need to install this package on his or her own runtime system.

The main aim of GraHMMer is to make it easier for users who are not computer administration professionals to work with HMMER. Requiring this package would defeat that aim. For this reason, the decision was made to integrate the online platform HMM-LogoMat-M.

In addition, the text output of hmmsearch and hmmpfam is split into four and three subsections, respectively. Each of those sections displays a different interesting and important part of the output. Displaying the parts separately makes each of them much easier to survey. Furthermore, this may serve as preparation for a later color-coded illustration of the results.

41 http://www.jalview.org/, last verified on May 18, 2008

42 http://logos.molgen.mpg.de/ or http://www.sanger.ac.uk/Software/analysis/logomat-p/HMM-current.tar.gz, both last verified on May 18, 2008


4.2. Usage

4.2.1. Command Line

All paths have to be entered in full by hand. Additionally, on Windows™ paths must be enclosed in double quotes (“), because blanks in the paths otherwise lead to errors in the execution of the HMMER subprograms.
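The quoting rule described above is easy to automate. The Python sketch below is a simplified, hypothetical helper that wraps any argument containing blanks in double quotes before the command line is assembled. It assumes that arguments contain no double quotes of their own and do not end in a backslash. When a program is started from Python with an argument list instead of a string, the subprocess module applies the full Windows quoting rules itself.

```python
def quote_for_cmd(arg):
    """Wrap an argument in double quotes if it contains blanks, as cmd.exe
    requires for paths such as those under 'Dokumente und Einstellungen'.
    Simplified: assumes no embedded double quotes and no trailing backslash."""
    already_quoted = arg.startswith('"') and arg.endswith('"')
    if any(ch in arg for ch in " \t") and not already_quoted:
        return f'"{arg}"'
    return arg


def command_line(args):
    """Join arguments into a single cmd.exe command line."""
    return " ".join(quote_for_cmd(a) for a in args)


print(command_line(["hmmcalibrate", r"C:\Dokumente und Einstellungen\Nadine\PF00229.hmm"]))
```

This prints `hmmcalibrate "C:\Dokumente und Einstellungen\Nadine\PF00229.hmm"`, which is the form that had to be typed by hand.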

If the parameter options are entered incorrectly, the HMMER subprograms do not provide a helpful error message.

To view or edit the created files, the user has to open an Explorer window, navigate to the right folder and open the file with an appropriate application.

If the user wants to save the output of a HMMER subprogram that does not provide this function itself, she or he has to know how to redirect output into a file on the Windows™ command line.

To search a database, the user has to enter the full path to the database or needs to know how to add its location to the Windows™ environment variables.
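A lookup through an environment variable can be sketched as follows. The HMMER 2 user guide names the variables BLASTDB (sequence databases) and HMMERDB (HMM databases). However, the search order used here, first the name as given and then each listed directory, and the use of `os.pathsep` (`;` on Windows) as separator are assumptions of this Python sketch. The function `resolve_database` is hypothetical.

```python
import os
from pathlib import Path


def resolve_database(name, env_var="BLASTDB", environ=None):
    """Return the absolute path of database `name`: use it directly if the
    file exists, otherwise try each directory listed in `env_var`."""
    environ = os.environ if environ is None else environ
    candidate = Path(name)
    if candidate.is_file():
        return candidate.resolve()
    for directory in environ.get(env_var, "").split(os.pathsep):
        if directory:
            path = Path(directory) / name
            if path.is_file():
                return path.resolve()
    raise FileNotFoundError(f"{name} not found directly or via %{env_var}%")
```

A GUI can perform the same lookup once, from the paths stored in its options, so that the user never has to type them.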

4.2.2. GraHMMer

The clearly arranged menu makes it easy for the user to select the desired function.

Thanks to the arrangement of the selectable options, the user does not need to know all possible alternatives, nor does she or he need to call the help function on the command line to get this information. The functions are explained via tooltips, which give a quick overview.

The parameter options are stored inside the application, so the user does not need to know how they must be typed to work properly.

Created files can be viewed inside the application, and for many of the integrated HMMER subprograms the result is automatically saved into a file in the working project folder. Where this feature is not implemented, the user can export the result from within the application into the desired folder using an explorer window.

If database paths have been entered by the user in the Options menu, those databases are included in the corresponding controls, so the user does not need to enter or select the database paths.


4.3. Improvements and Future View

This version of GraHMMer can of course be extended and improved. Some features which were planned at the beginning did not make it into this first usable and running version. Other features can be thought of to extend the functionality of the tool.

The two features which were not or not fully implemented are:

• cross platform portability

• conversion of the result into a graphical representation

Functions that might make the application more comfortable to use include:

• Reporting / Mailing Function:

A built-in function to mail projects or parts of projects from the application to someone else and/or the possibility to mail application errors to the developer.

• Batch - Mode:

An interface with which the user could start a batch job, for example to search a number of files one after another against a profile HMM database without any user intervention.

• Alignment Tools (Smith-Waterman, Needleman-Wunsch, ClustalW):

The integration of one or more tools with which the user can create pairwise or multiple alignments inside the application and add the resulting files to the project folder. This extension could be further enhanced by integrating Jalview or a comparable display function.

• Multilanguage:

At the moment the application is only available in English. With .NET and C#, however, localization of an application is easy to implement. This potential should be used to provide additional languages such as German or French.

• Automatic updates:

Many applications provide automatic updates via download from the internet. Since this application can still be improved in many ways, this function would make a lot of sense, and the user would never need to look for updates her- or himself. Bugs could be corrected before the user is confronted with them.

• Additional tools from HMMER for Windows:

Only one of the additional HMMER for Windows tools is integrated into GraHMMer. The remaining tools could also be added, since the calls and classes are already rudimentarily prepared; the interfaces and functionality still have to be built and added.

• Performance improvement:

Wistrand, et al., 2005 showed that the performance of pHMM-based searches could be improved by combining features of SAM and HMMER. By adopting the suggestions made in their article, the performance of GraHMMer could be improved as well.

• Threading:

Some tasks with hmmsearch or hmmpfam can take a considerable amount of time, during which the application cannot be used. This limitation could be eliminated by using threads: the task could run in a separate thread, and while this thread is running the user could start another task or examine an existing result.

• Saving:

All results should automatically be saved into an extra file in the project folder.

• Incompatible options:

Some HMMER subprogram options are incompatible with other options. Such combinations should be prevented, or a message box should warn the user.
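The last suggestion, detecting incompatible options, can be sketched as a table of mutually exclusive option groups that is checked before a call is started. In the Python sketch below, both the rule table and the function are hypothetical. The hmmbuild entries (-f, -g and -s as alternative alignment styles; --fast and --hand as alternative ways of constructing the model) are examples recalled from hmmbuild -h in HMMER 2 and should be verified before use.

```python
from itertools import combinations

# Hypothetical rule table: within each group, at most one option may be used.
EXCLUSIVE_GROUPS = {
    "hmmbuild": [{"-f", "-g", "-s"}, {"--fast", "--hand"}],
}


def conflicting_options(program, options):
    """Return all pairs of selected options that must not be combined."""
    selected = set(options)
    conflicts = []
    for group in EXCLUSIVE_GROUPS.get(program, []):
        chosen = sorted(selected & group)
        conflicts.extend(combinations(chosen, 2))
    return conflicts
```

A GUI could call such a check whenever an option is toggled, and either disable the conflicting controls or show the warning message box described above.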


5. Conclusion

The introductory question was:

Would the usage of HMMER 2 be much easier with a graphical user interface (GUI) that also provides a graphical interpretation of the text-based output for a better and easier understanding of the results?

This work shows that it is no longer absolutely essential to be a specialist in command line based sequence analysis tools. A user-friendly interface was implemented for HMMER for Windows, which already provides many simplifications in handling HMMER. Some additional functions were also added, such as the automatic saving of some results into files, the display of some text results divided into more readable sections, and the easy creation of HMM databases. The user no longer needs to know exactly how the parameters have to be passed to HMMER, because the application takes care of this for her or him. The way the results are displayed makes them clearer and easier to interpret. People who are used to Microsoft Windows™ and its graphical applications now have a tool available that they will quickly get used to.

The planned features have largely been implemented: the project-based handling, simple access to all HMMER subprograms, a clear display of the available options, which are easy to choose and explained in tooltips, the display of the text-based results, and the integration of an online tool43 for creating graphical output.

My personal impression and experience is that realizing such powerful tools, with their large and confusing number of input options, in a user-friendly GUI is not as easy as I had expected.

Furthermore, I was disappointed by the difficulties that arose when trying to port the .NET source code to Mono.

As a personal lesson for future projects, I would no longer plan such a large number of features for a single developer to implement.

Nevertheless, I am glad to have had this experience and to have gained so much knowledge by writing this diploma thesis.

43 LogoMat-M http://www.sanger.ac.uk/cgi-bin/software/analysis/logomat-m.cgi [Schuster-Böckler, et al., 2004]


6. Appendix

6.1. Literature
