5.4 Package Build

The package is built in a modular way: each algorithm is compiled separately into its own static library. The algorithms are built in the following order: MUSCLE, ClustalW, and ClustalΩ; the garbage collector library they depend on is built before them. Finally, the libraries are merged into one shared object, which is installed as part of the R package. The main make file is simple, see listing 5.16.

Listing 5.16: msa/src/Makevars – for Linux and Mac OS X

PKG_LIBS=`Rscript -e "if (Sys.info()['sysname'] == 'Darwin') cat('-Wl,-all_load ./libgc.a ./libClustalW.a ./libClustalOmega.a ./libMuscle.a') else cat('-Wl,--whole-archive ./libgc.a ./libClustalW.a ./libClustalOmega.a ./libMuscle.a -Wl,--no-whole-archive')"`
PKG_CXXFLAGS=-I"./gc-7.2/include" -I"./Muscle/" -I"./ClustalW/src" -I"./ClustalOmega/src" `Rscript -e "Rcpp:::CxxFlags()"`

.PHONY: all mylibs

all: $(SHLIB)
$(SHLIB): mylibs

mylibs: build_gc build_muscle build_clustalw build_clustalomega

build_gc:
	make --file=msaMakefile --directory=gc-7.2
	@echo "---"
	@echo "--- GC ---"
	@echo "---"
	@echo "--- Compilation finished ---"
	@echo "---"

build_muscle:
	make --file=msaMakefile --directory=Muscle
	@echo "---"
	@echo "--- MUSCLE ---"
	@echo "---"
	@echo "--- Compilation finished ---"
	@echo "---"

build_clustalw:
	make --file=msaMakefile --directory=ClustalW
	@echo "---"
	@echo "--- ClustalW ---"
	@echo "---"
	@echo "--- Compilation finished ---"
	@echo "---"

build_clustalomega:
	make --file=msaMakefile --directory=ClustalOmega
	@echo "---"
	@echo "--- ClustalOmega ---"
	@echo "---"
	@echo "--- Compilation finished ---"
	@echo "---"
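The platform switch in PKG_LIBS of listing 5.16 is easier to follow when written with make conditionals. The following is only a sketch for reading purposes, assuming GNU make is available (ifeq and $(shell ...) are GNU extensions, which is presumably why the listing delegates the test to Rscript); the helper variable STATIC_LIBS is hypothetical and does not occur in msa. Both branches ask the linker to pull every object file of the static archives into the shared object: -all_load on Mac OS X, and the --whole-archive/--no-whole-archive pair for the GNU linker.

```make
# Sketch only, assuming GNU make; STATIC_LIBS is a hypothetical helper variable.
STATIC_LIBS = ./libgc.a ./libClustalW.a ./libClustalOmega.a ./libMuscle.a

# uname -s reports "Darwin" on Mac OS X.
ifeq ($(shell uname -s),Darwin)
PKG_LIBS = -Wl,-all_load $(STATIC_LIBS)
else
PKG_LIBS = -Wl,--whole-archive $(STATIC_LIBS) -Wl,--no-whole-archive
endif
```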

R automatically picks up the variables PKG_LIBS and PKG_CXXFLAGS defined in the file Makevars. Therefore, all dependent libraries are declared in Makevars. The shared object msa is built automatically when R CMD build msa is called on the command line. The file depicted in listing 5.16 is usable on Linux and Mac OS X. For building the package on Windows, the file Makevars.win is needed instead. It is similar to Makevars; see the relevant code snippet in listing 5.17.

Listing 5.17: msa/src/Makevars.win – for Windows

.PHONY: all ./libGC.a ./libMuscle.a ./libClustalW.a ./libClustalOmega.a

PKG_LIBS=-Wl,--whole-archive ./libGC.a ./libMuscle.a ./libClustalW.a ./libClustalOmega.a -Wl,--no-whole-archive
PKG_CXXFLAGS=-I"./gc-7.2/include" -I"./Muscle/" -I"./ClustalW/src" -I"./ClustalOmega/src" `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`

all: $(SHLIB)
$(SHLIB): ./libGC.a ./libMuscle.a ./libClustalW.a ./libClustalOmega.a

clean:
	cd ..
	sh cleanup.win
	cd src

Each library has its own make file. For consistency, these files are named msaMakefile on Linux and Mac OS X and msaMakefile.win on Windows.
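Because every library directory provides a make file with the same name, the per-library targets of the top-level file could in principle be collapsed into a single loop. The following is a sketch of that idea, not the Makevars shipped with msa; the variable LIBDIRS is hypothetical, and the long options --file and --directory assume GNU make, as in listing 5.16.

```make
# Sketch only; LIBDIRS is a hypothetical variable, not part of msa.
LIBDIRS = gc-7.2 Muscle ClustalW ClustalOmega

.PHONY: mylibs

# Run the sub-make in each directory in order and stop at the first failure.
mylibs:
	for d in $(LIBDIRS); do \
		$(MAKE) --file=msaMakefile --directory=$$d || exit 1; \
	done
```

The explicit targets of listing 5.16 are longer, but they print a banner per library, which makes the build log easier to scan.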

5.4.1 gc Library

When testing large input sequences, memory leaks showed up in the implementations of the algorithms. Originally, the multiple sequence alignment algorithms were implemented as command line programs that run once in a terminal, so all memory is freed when the process terminates. Since msa runs inside an R session, which stays alive after an algorithm has finished, leaked memory is not freed. Memory leaks were analysed using Valgrind (Nethercote & Seward, 2007). With the help of this tool many memory leaks were removed, but due to the complexity of the algorithms not all of them could be eliminated. Because C and C++, unlike Java, do not provide garbage collection by default, a garbage collector was added. For this purpose, the Boehm Garbage Collector 7.2 (Boehm & Weiser, 1988) is used. Both libraries of the collector are built: the C library (gc) and the C++ interface (gccpp).

Building the garbage collector is simple on Linux and Mac OS X (see listing 5.18), but rather complicated on Windows (see listing 5.19):

Listing 5.18: msa/src/gc-7.2/msaMakefile – for Linux and Mac OS X

all: build_gc

build_gc:
	./configure --enable-cplusplus --enable-threads=pthreads --enable-shared --with-pic
	make
	cp .libs/libgc.a ../
	cp .libs/libgccpp.a ../

Listing 5.19: msa/src/gc-7.2/msaMakefile.win – for Windows

INCLUDE_DIR=include
PRIVATE_INCLUDE_DIR=$(INCLUDE_DIR)/private
AO_INCLUDE_DIR=libatomic_ops/src

OBJNamesGC=allchblk.o alloc.o backgraph.o blacklst.o checksums.o \
	darwin_stop_world.o dbg_mlc.o dyn_load.o finalize.o gcj_mlc.o \
	gc_dlopen.o headers.o mach_dep.o malloc.o mallocx.o mark.o mark_rts.o \
	misc.o new_hblk.o obj_map.o os_dep.o pcr_interface.o pthread_start.o \
	pthread_stop_world.o pthread_support.o ptr_chck.o real_malloc.o \
	reclaim.o specific.o stubborn.o thread_local_alloc.o typd_mlc.o \
	win32_threads.o
OBJNamesGCCPP=gc_cpp.o

all: gc72

gc72:
	export PKG_LIBS="$(PKG_LIBS) -I$(AO_INCLUDE_DIR) -I$(INCLUDE_DIR) -I$(PRIVATE_INCLUDE_DIR)"; \
	export PKG_CFLAGS="$(PKG_CFLAGS) -DALL_INTERIOR_POINTERS -DGC_DLL -DGC_THREADS -D_CRT_SECURE_NO_DEPRECATE -I$(AO_INCLUDE_DIR) -I$(INCLUDE_DIR) -I$(PRIVATE_INCLUDE_DIR) `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`"; \
	export PKG_CXXFLAGS="$(PKG_CXXFLAGS) -DALL_INTERIOR_POINTERS -DALL_INTERIOR_POINTERS -DGC_DLL -DGC_THREADS -D_CRT_SECURE_NO_DEPRECATE -I$(AO_INCLUDE_DIR) -I$(INCLUDE_DIR) -I$(PRIVATE_INCLUDE_DIR) `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`"; \
	${R_HOME}/bin${R_ARCH_BIN}/R.exe CMD SHLIB -o gc72.dll *.c; \
	${R_HOME}/bin${R_ARCH_BIN}/R.exe CMD SHLIB -o gccpp72.dll -L. -lgc72 *.cpp; \
	ar rcs libGC.a $(OBJNamesGC); \
	ar rcs libGCcpp.a $(OBJNamesGCCPP); \
	cp libGC*.a ../

5.4.2 MUSCLE Library

Of the three alignment algorithms, MUSCLE is the simplest to build. On Linux and Mac OS X, its make file essentially relies on R's default mechanism for building shared libraries, R CMD SHLIB, see listing 5.20:

Listing 5.20: msa/src/Muscle/msaMakefile – for Linux and Mac OS X

CPPNames = aligngivenpath.cpp aligngivenpathsw.cpp aligntwomsas.cpp aligntwoprofs.cpp \
	aln.cpp alpha.cpp anchors.cpp bittraceback.cpp blosum62.cpp blosumla.cpp \
	clust.cpp cluster.cpp clwwt.cpp color.cpp cons.cpp diaglist.cpp \
	diffobjscore.cpp diffpaths.cpp difftrees.cpp difftreese.cpp distcalc.cpp \
	distfunc.cpp distpwkimura.cpp domuscle.cpp dosp.cpp dpreglist.cpp \
	drawtree.cpp edgelist.cpp enumopts.cpp enumtostr.cpp estring.cpp \
	fasta.cpp fasta2.cpp fastclust.cpp fastdist.cpp fastdistjones.cpp \
	fastdistkbit.cpp fastdistkmer.cpp fastdistmafft.cpp fastdistnuc.cpp \
	fastscorepath2.cpp finddiags.cpp finddiagsn.cpp glbalign.cpp \
	glbalign352.cpp glbaligndiag.cpp glbalignle.cpp glbalignsimple.cpp \
	glbalignsp.cpp glbalignspn.cpp glbalignss.cpp glbalndimer.cpp \
	globals.cpp globalslinux.cpp globalsosx.cpp globalsother.cpp \
	globalswin32.cpp gonnet.cpp henikoffweight.cpp henikoffweightpb.cpp \
	html.cpp hydro.cpp intmath.cpp local.cpp main.cpp makerootmsa.cpp \
	makerootmsab.cpp maketree.cpp mhack.cpp mpam200.cpp msa.cpp msa2.cpp \
	msadistkimura.cpp msf.cpp muscle.cpp muscleout.cpp nucmx.cpp \
	nwdasimple.cpp nwdasimple2.cpp nwdasmall.cpp nwrec.cpp nwsmall.cpp \
	objscore.cpp objscore2.cpp objscoreda.cpp onexception.cpp options.cpp \
	outweights.cpp pam200mafft.cpp params.cpp phy.cpp phy2.cpp phy3.cpp \
	phy4.cpp phyfromclust.cpp phyfromfile.cpp physeq.cpp phytofile.cpp \
	posgap.cpp ppscore.cpp profdb.cpp profile.cpp profilefrommsa.cpp \
	progalign.cpp progress.cpp progressivealign.cpp pwpath.cpp readmx.cpp \
	realigndiffs.cpp realigndiffse.cpp refine.cpp refinehoriz.cpp \
	refinesubfams.cpp refinetree.cpp refinetreee.cpp refinevert.cpp \
	refinew.cpp savebest.cpp scoredist.cpp scoregaps.cpp scorehistory.cpp \
	scorepp.cpp seq.cpp seqvect.cpp setblosumweights.cpp setgscweights.cpp \
	setnewhandler.cpp spfast.cpp sptest.cpp stabilize.cpp subfam.cpp \
	subfams.cpp sw.cpp termgaps.cpp textfile.cpp threewaywt.cpp tomhydro.cpp \
	traceback.cpp tracebackopt.cpp tracebacksw.cpp treefrommsa.cpp \
	typetostr.cpp upgma2.cpp usage.cpp validateids.cpp vtml2.cpp \
	writescorefile.cpp RMuscle.cpp

OBJNames = aligngivenpath.o aligngivenpathsw.o aligntwomsas.o aligntwoprofs.o \
	aln.o alpha.o anchors.o bittraceback.o blosum62.o blosumla.o \
	clust.o cluster.o clwwt.o color.o cons.o diaglist.o \
	diffobjscore.o diffpaths.o difftrees.o difftreese.o distcalc.o \
	distfunc.o distpwkimura.o domuscle.o dosp.o dpreglist.o \
	drawtree.o edgelist.o enumopts.o enumtostr.o estring.o \
	fasta.o fasta2.o fastclust.o fastdist.o fastdistjones.o \
	fastdistkbit.o fastdistkmer.o fastdistmafft.o fastdistnuc.o \
	fastscorepath2.o finddiags.o finddiagsn.o glbalign.o \
	glbalign352.o glbaligndiag.o glbalignle.o glbalignsimple.o \
	glbalignsp.o glbalignspn.o glbalignss.o glbalndimer.o \
	globals.o globalslinux.o globalsosx.o globalsother.o \
	globalswin32.o gonnet.o henikoffweight.o henikoffweightpb.o \
	html.o hydro.o intmath.o local.o main.o makerootmsa.o \
	makerootmsab.o maketree.o mhack.o mpam200.o msa.o msa2.o \
	msadistkimura.o msf.o muscle.o muscleout.o nucmx.o \
	nwdasimple.o nwdasimple2.o nwdasmall.o nwrec.o nwsmall.o \
	objscore.o objscore2.o objscoreda.o onexception.o options.o \
	outweights.o pam200mafft.o params.o phy.o phy2.o phy3.o \
	phy4.o phyfromclust.o phyfromfile.o physeq.o phytofile.o \
	posgap.o ppscore.o profdb.o profile.o profilefrommsa.o \
	progalign.o progress.o progressivealign.o pwpath.o readmx.o \
	realigndiffs.o realigndiffse.o refine.o refinehoriz.o \
	refinesubfams.o refinetree.o refinetreee.o refinevert.o \
	refinew.o savebest.o scoredist.o scoregaps.o scorehistory.o \
	scorepp.o seq.o seqvect.o setblosumweights.o setgscweights.o \
	setnewhandler.o spfast.o sptest.o stabilize.o subfam.o \
	subfams.o sw.o termgaps.o textfile.o threewaywt.o tomhydro.o \
	traceback.o tracebackopt.o tracebacksw.o treefrommsa.o \
	typetostr.o upgma2.o usage.o validateids.o vtml2.o \
	writescorefile.o RMuscle.o

all: muscle

muscle:
	export PKG_CXXFLAGS="$(PKG_CXXFLAGS) -I"../gc-7.2/include" `Rscript -e "Rcpp:::CxxFlags()"`"
	R CMD SHLIB -o libMuscle.so $(CPPNames)
	ar rcs libMuscle.a $(OBJNames)
	cp libMuscle.a ../
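The list OBJNames in listing 5.20 repeats CPPNames with a different suffix. As a sketch of an alternative, not what msa does, a suffix substitution reference lets make derive the object list from the source list. The file names below are a shortened, illustrative subset, and the target show is a hypothetical helper.

```make
# Shortened subset of the sources from listing 5.20, for illustration.
CPPNames = aln.cpp alpha.cpp muscle.cpp RMuscle.cpp

# Replace the suffix .cpp with .o in every word of CPPNames.
OBJNames = $(CPPNames:.cpp=.o)

# Hypothetical helper target that prints the derived list.
.PHONY: show
show:
	@echo $(OBJNames)
```

Running make show on this fragment prints aln.o alpha.o muscle.o RMuscle.o, so the two lists cannot drift apart when a source file is added.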

The make file for Windows is similar to the one for Linux and Mac OS X, apart from some platform-specific options, see listing 5.21.

Listing 5.21: msa/src/Muscle/msaMakefile.win – for Windows

all: muscle

muscle:
	export PKG_LIBS="$(PKG_LIBS) -L"../gc-7.2/" -lgc72 -lgccpp72"
	export PKG_CXXFLAGS="-c -O3 -msse2 -mfpmath=sse -D_FILE_OFFSET_BITS=64 -DNDEBUG=1 $(PKG_CXXFLAGS) -I"../gc-7.2/include/" `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`"
	${R_HOME}/bin${R_ARCH_BIN}/R.exe CMD SHLIB -o Muscle.dll $(CPPNames)
	$(AR) rcs libMuscle.a $(OBJNames)
	cp libMuscle.a ../

5.4.3 ClustalW Library

As for MUSCLE, the ClustalW make file is simple. The one notable difference is that configure has to be called first to set some variables correctly, see listing 5.22.

Listing 5.22: msa/src/ClustalW/msaMakefile – for Linux and Mac OS X

CPPNames=pairwise/FullPairwiseAlign.cpp pairwise/FastPairwiseAlign.cpp \
	fileInput/MSFFileParser.cpp fileInput/FileReader.cpp \
	fileInput/PIRFileParser.cpp fileInput/RSFFileParser.cpp \
	fileInput/GDEFileParser.cpp fileInput/InFileStream.cpp \
	fileInput/ClustalFileParser.cpp fileInput/PearsonFileParser.cpp \
	fileInput/FileParser.cpp fileInput/EMBLFileParser.cpp \
	tree/UPGMA/RootedClusterTree.cpp tree/UPGMA/UPGMAAlgorithm.cpp \
	tree/UPGMA/Node.cpp tree/UPGMA/RootedGuideTree.cpp \
	tree/UPGMA/RootedTreeOutput.cpp tree/Tree.cpp tree/ClusterTree.cpp \
	tree/TreeInterface.cpp tree/UnRootedClusterTree.cpp \
	tree/ClusterTreeOutput.cpp tree/RandomGenerator.cpp tree/NJTree.cpp \
	tree/AlignmentSteps.cpp interface/CommandLineParser.cpp \
	substitutionMatrix/SubMatrix.cpp multipleAlign/Iteration.cpp \
	multipleAlign/MSA.cpp multipleAlign/MyersMillerProfileAlign.cpp \
	multipleAlign/ProfileStandard.cpp multipleAlign/ProfileWithSub.cpp \
	multipleAlign/ProfileBase.cpp multipleAlign/LowScoreSegProfile.cpp \
	general/OutputFile.cpp general/UserParameters.cpp general/Utility.cpp \
	general/InvalidCombination.cpp general/DebugLog.cpp \
	general/ClustalWResources.cpp general/VectorOutOfRange.cpp \
	general/SymMatrix.cpp general/Stats.cpp Help.cpp \
	alignment/Alignment.cpp alignment/AlignmentOutput.cpp \
	alignment/ObjectiveScore.cpp alignment/Sequence.cpp Clustal.cpp \
	RClustalWMain.cpp RClustalW.cpp

OBJNames=pairwise/FullPairwiseAlign.o pairwise/FastPairwiseAlign.o \
	fileInput/MSFFileParser.o fileInput/FileReader.o \
	fileInput/PIRFileParser.o fileInput/RSFFileParser.o \
	fileInput/GDEFileParser.o fileInput/InFileStream.o \
	fileInput/ClustalFileParser.o fileInput/PearsonFileParser.o \
	fileInput/FileParser.o fileInput/EMBLFileParser.o \
	tree/UPGMA/RootedClusterTree.o tree/UPGMA/UPGMAAlgorithm.o \
	tree/UPGMA/Node.o tree/UPGMA/RootedGuideTree.o \
	tree/UPGMA/RootedTreeOutput.o tree/Tree.o tree/ClusterTree.o \
	tree/TreeInterface.o tree/UnRootedClusterTree.o \
	tree/ClusterTreeOutput.o tree/RandomGenerator.o tree/NJTree.o \
	tree/AlignmentSteps.o interface/CommandLineParser.o \
	substitutionMatrix/SubMatrix.o multipleAlign/Iteration.o \
	multipleAlign/MSA.o multipleAlign/MyersMillerProfileAlign.o \
	multipleAlign/ProfileStandard.o multipleAlign/ProfileWithSub.o \
	multipleAlign/ProfileBase.o multipleAlign/LowScoreSegProfile.o \
	general/OutputFile.o general/UserParameters.o general/Utility.o \
	general/InvalidCombination.o general/DebugLog.o \
	general/ClustalWResources.o general/VectorOutOfRange.o \
	general/SymMatrix.o general/Stats.o Help.o \
	alignment/Alignment.o alignment/AlignmentOutput.o \
	alignment/ObjectiveScore.o alignment/Sequence.o Clustal.o \
	RClustalWMain.o RClustalW.o

all: clustalw

clustalw:
	./configure; \
	cd src; \
	export PKG_CXXFLAGS="-DHAVE_CONFIG_H -I. $(PKG_CXXFLAGS) `Rscript -e "Rcpp:::CxxFlags()"`"; \
	R CMD SHLIB -o libClustalW.so $(CPPNames) && \
	ar rcs libClustalW.a $(OBJNames) && \
	cp libClustalW.a ../../

When building the package on Windows, configure cannot be called, because the build environment of Bioconductor does not support all necessary commands. To work around this problem, configure was executed on a local Windows machine and the generated files were stored for later use. When msa is built, these prepared files are copied into the build folder. They are located in the folder msa/src/ClustalW/windows.

5.4.4 ClustalΩ Library

Building the ClustalΩ library is more complicated than building the other algorithms.

By default, ClustalΩ assumes that argtable2 (Heitmann, 2011) is preinstalled. To avoid a dependency on this external library, its source files have been included in msa, see listing 5.23.

Listing 5.23: msa/src/ClustalOmega/msaMakefile – argtable2

	argtable2/argtable2.c argtable2/arg_end.c argtable2/arg_rem.c argtable2/arg_lit.c argtable2/arg_int.c \
	argtable2/arg_dbl.c argtable2/arg_str.c argtable2/arg_file.c \

Additionally, ClustalΩ contains many exit statements, each of which would terminate the whole R session. To avoid this, the exceptions4c library (Calvo, 2013) is included, and all exit statements were replaced with corresponding throw statements, see listing 5.24.

Listing 5.24: msa/src/ClustalOmega/msaMakefile – exceptions4c

CPPNames=\
	exceptions4c/e4c_lite.c \

Because of some memory leaks, ClustalΩ uses the Boehm Garbage Collector as well.

On Mac OS X and Linux, the configure script is called, see listing 5.25.

Listing 5.25: msa/src/ClustalOmega/msaMakefile – for Linux and Mac OS X

clustalomega:
	./configure $(CONFIGURE_FLAGS); \
	export PKG_LIBS="$(PKG_LIBS) $(SHLIB_OPENMP_CFLAGS)"; \
	export PKG_CXXFLAGS="$(PKG_CXXFLAGS) $(SHLIB_OPENMP_CXXFLAGS) -fPIC -DHAVE_CONFIG_H -I. -DCLUSTALO -DCLUSTALO_NOFILE -DDEFAULT_FILTER=90 -I"../../gc-7.2/include" `Rscript -e "Rcpp:::CxxFlags()"`"; \
	export PKG_CFLAGS="$(PKG_CFLAGS) $(SHLIB_OPENMP_CFLAGS) -fPIC -DHAVE_CONFIG_H -I. -DCLUSTALO -DCLUSTALO_NOFILE -DDEFAULT_FILTER=90 -I"../../gc-7.2/include" `Rscript -e "Rcpp:::CxxFlags()"`"; \
	cd src; \
	R CMD SHLIB -o libClustalOmega.so $(CPPNames) && \
	ar rcs libClustalOmega.a $(OBJNames) && \
	cp libClustalOmega.a ../../

On Windows, in contrast, preconfigured files stored in the subfolder windows are used, see listing 5.26.

Listing 5.26: msa/src/ClustalOmega/msaMakefile.win – for Windows

all: clustalomega

clustalomega:
	cp windows/src/config.h src/; \
	cp windows/src/clustal-omega-config.h src/; \
	export PKG_LIBS="$(PKG_LIBS) -L"../../gc-7.2" -lgccpp72 -lgc72"; \
	export PKG_CXXFLAGS="$(PKG_CXXFLAGS) -DHAVE_CONFIG_H -I. -DCLUSTALO -DCLUSTALO_NOFILE -DDEFAULT_FILTER=90 -I"../../gc-7.2/include" `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`"; \
	export PKG_CFLAGS="$(PKG_CFLAGS) -DHAVE_CONFIG_H -I. -DCLUSTALO -DCLUSTALO_NOFILE -DDEFAULT_FILTER=90 -I"../../gc-7.2/include" -lgccpp -lgc `${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe -e "Rcpp:::CxxFlags()"`"; \
	cd src; \
	${R_HOME}/bin${R_ARCH_BIN}/R.exe CMD SHLIB -o ClustalOmega.dll $(CPPNames) && \
	$(AR) rcs libClustalOmega.a $(OBJNames) && \
	cp libClustalOmega.a ../../