
7.2 Results and Discussion

7.2.7 Bioinformatic analysis

7.2.7.1 Occurrence of che and fla genes in archaeal genomes

To provide a reference for such co-occurrence comparisons, an exhaustive search for orthologs of Che and Fla proteins was carried out in all completely sequenced archaeal genomes published until October 2007. Homology searches using PSI-BLAST alone were not sufficient: they failed to identify homologs of some proteins comprehensively (small proteins with rather low conservation, such as FlaC, D, E, F, and G, were especially problematic), and for other proteins they did not allow orthologs to be discriminated from other homologs (e.g. CheY and other response regulators). A combination of different methods was therefore used for ortholog identification (see 2.6.7 for details). The resulting table of orthologs is shown in Supplementary Table S4.

As observed before (Klein, 2005), no che genes were detected in any archaeal genome lacking fla and flagellin genes. In contrast, several archaeal species contain fla genes and flagellins but no che genes, leading to the conclusion that these species are motile but that their motility is not controlled by a Che system. che genes have not been detected in any crenarchaeal genome. Where che genes were found, the whole set consisting of cheA, cheB, cheC, cheD, cheR, cheW, and cheY is always present. The one exception is Methanosarcina barkeri, which has lost the cheC gene (the genomic position where cheC is located in the other Methanosarcina species still contains remnants of the N-terminus of cheC). Several archaeal species contain multiple copies of various che genes; the front-runner is Methanospirillum hungatei, with 35 genes classified as che orthologs by the method used. A noteworthy finding is a CheA-CheC fusion protein detected in the genome of M. hungatei.

If fla genes were identified in a crenarchaeal genome, at least one flagellin gene, flaG, flaH, flaI, and flaJ were present, and usually also flaF (except in Aeropyrum pernix). The flaG gene in the sequenced strain of Sulfolobus solfataricus is interrupted by a transposase, but this insertion is neither stable under laboratory conditions nor found in a closely related strain (Szabó et al., 2007). Euryarchaeota that possess fla genes additionally always contain a flaD/E gene. flaD and flaE genes could not be discriminated by the applied method, so they were merged into one ortholog group. Two versions of flaD/E genes can be distinguished: the species of the classes Methanomicrobia and Archaeoglobi code for a version of the FlaD/E protein (referred to as FlaD/EM in the following) with only low homology to the FlaD and FlaE proteins found in other euryarchaeal genomes (Ng et al., 2006; Desmond et al., 2007). A special case is Methanococcoides burtonii (class Methanomicrobia), which possesses both versions of FlaD/E proteins, each in a complete fla gene region. This indicates lateral gene transfer (LGT) of a whole fla gene region. Such an LGT has also been proposed in a detailed study of the phylogenomics of the archaeal flagellum (Desmond et al., 2007).

In genomes with flaD/EM (or, in the case of Methanococcoides burtonii, in the genome region with flaD/EM), neither a flaC gene nor a flaC domain fused to a flaE gene was found. In all other euryarchaeota with fla genes, FlaC is encoded either as a separate protein or as a domain fused to a FlaD/E domain.

As with the che genes, the fla genes and flagellins are present in multiple copies in some genomes.

7.2.7.2 Only few findings for OE2401F

OE2401F was formerly annotated as “phycocyanin alpha phycocyanobilin lyase homolog” due to homology with this protein from cyanobacteria and red algae. The annotation has now been changed to “conserved che operon protein”. Pfam classifies it as a HEAT_PBS or HEAT family protein. These proteins are predicted to contain short bi-helical repeats. Several, but not all, members of this family are thought to have lyase activity (Finn et al., 2008).

For OE2401F, the homology search proved difficult because the repeats led to a high number of non-significant matches. It was therefore not possible to identify a reliable set of orthologs from other organisms, and no conclusions about co-occurrence of this protein family with che or fla genes could be drawn. Close homologs were identified in the che and fla gene regions of the halophilic archaea N. pharaonis and H. marismortui. The idea that OE2401F and OE2402F act cooperatively to perform their function is also supported by the genomic location of OE2401F and its homologs in the haloarchaeal che gene regions, where it is always adjacent to a DUF439 protein. Additionally, HEAT-like repeat proteins are present in other genomic contexts in all sequenced haloarchaeal genomes (those mentioned above, H. walsbyi, and H. salinarum). No functional information could be obtained for any of these proteins. In the chemotaxis gene regions of other archaeal species, no homologs of OE2401F were found.

Hence it remains to be investigated whether these proteins are restricted to haloarchaea, or whether similar proteins, coded elsewhere in the genome, also play a role in taxis signalling in other archaeal species.

Although the homology search revealed no correlation between OE2401F homologs and che or fla genes, the examination of the archaeal fla gene regions resulted in a noteworthy finding. In several archaeal genomes (several Methanococci and Thermococci), a protein belonging to the Adaptin_N family is located adjacent to the flagellin genes. Adaptin_N belongs to the same superfamily as HEAT/HEAT_PBS, the Armadillo repeat superfamily (Finn et al., 2008). Whether the Adaptin_N proteins adjacent to the flagellins fulfil a function similar to that of OE2401F and its homologs and, if so, what this function is, remains elusive.

7.2.7.3 OE2402F and OE2404R belong to a family of unique archaeal Che proteins

OE2402F and OE2404R, both annotated as conserved hypothetical proteins, are homologous to each other and belong to the protein family DUF439 (Finn et al., 2008) and the cluster of orthologous groups COG2469. DUF439 is described as “archaeal protein of unknown function”, COG2469 as “uncharacterized conserved protein”.

[Figure 7.9: gene-order diagram of the archaeal chemotaxis gene regions; graphic not reproduced. Species shown: Archaeoglobus fulgidus, Candidatus Methanoregula boonei, Haloarcula marismortui, Halobacterium salinarum, Methanococcoides burtonii, Methanococcus maripaludis, Methanococcus vannielii, Methanoculleus marisnigri, Methanosarcina acetivorans, Methanosarcina barkeri, Methanosarcina mazei, Methanospirillum hungatei, Natronomonas pharaonis, Pyrococcus abyssi, Pyrococcus horikoshii, Thermococcus kodakaraensis, and uncultured methanogenic archaeon RC-I. Labels appearing in the diagram: A, A/C, B, C, D, R, W, Y, MCP, 439, HEAT, flaB, S6.]

Figure 7.9: Organisation of chemotaxis genes in known archaeal genomes. Known chemotaxis genes are shown in blue. Genes coding for proteins of the family DUF439 are shown in light blue, genes coding for HEAT domain proteins in cyan. Grey indicates that, where no name is given, the function of the coded protein is unknown, or that the protein is probably unrelated to chemotaxis (S6: 30S ribosomal protein S6e). A // sign indicates separated genome regions. The asterisk indicates that this protein is interrupted by a frame-shift mutation.

[Figure 7.10: alignment graphic not reproduced. Sequences aligned: OE2402F, NP2166A, rrnAC2209, rrnAC3231, rrnAC3221, OE2404R, rrnAC2213, NP2162A, Memar0946, Mhun0107, Mboo1339, MA3069, MM0331, Mbur0358, AF1043, PH0494, PAB1338, TK0641, MMP0934, MmarC70181, MmarC50741, Mevan0255, MJ1615, LRC568.]

Figure 7.10: Multiple alignment of the members of the protein family DUF439. The species are: OE Halobacterium salinarum, NP Natronomonas pharaonis, rrn Haloarcula marismortui, Memar Methanoculleus marisnigri, Mhun Methanospirillum hungatei, Mboo Candidatus Methanoregula boonei, MA Methanosarcina acetivorans, MM Methanosarcina mazei, Mbur Methanococcoides burtonii, AF Archaeoglobus fulgidus, PH Pyrococcus horikoshii, PAB Pyrococcus abyssi, TK Thermococcus kodakaraensis, MMP Methanococcus maripaludis S2, MmarC7 Methanococcus maripaludis C7, MmarC5 Methanococcus maripaludis C5, Mevan Methanococcus vannielii, MJ Methanococcus jannaschii, LRC uncultured methanogenic archaeon RC-I. Colours are according to the ClustalX colouring scheme. The boxes point to peculiarities of the second DUF439 protein of the haloarchaea.

Homology searches showed that no members of the family DUF439 are found outside the domain Archaea and that, among the archaea, the presence of genes coding for such a protein strictly correlates with the presence of che genes (see Supplementary Table S4). The only exceptions are Methanocaldococcus jannaschii, which possesses a DUF439 homolog but no che genes, and Methanosarcina barkeri, which has che genes but no DUF439.
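The co-occurrence check described here amounts to listing the genomes in which exactly one of two features is present. The following is a minimal sketch only: the three presence/absence calls are the ones stated in the text, whereas the dictionary layout and the function name `cooccurrence_exceptions` are assumptions made for illustration and do not reflect the format of Supplementary Table S4.

```python
# Presence/absence calls as stated in the text; the table layout is an
# assumption for illustration, not the format of Supplementary Table S4.
presence = {
    "Halobacterium salinarum": {"che": True, "DUF439": True},
    "Methanocaldococcus jannaschii": {"che": False, "DUF439": True},
    "Methanosarcina barkeri": {"che": True, "DUF439": False},
}


def cooccurrence_exceptions(table, feature_a, feature_b):
    """Return the genomes in which exactly one of the two features is present."""
    return sorted(
        name for name, row in table.items() if row[feature_a] != row[feature_b]
    )


exceptions = cooccurrence_exceptions(presence, "che", "DUF439")
print(exceptions)
```

Applied to a complete presence/absence table, an empty result would indicate strict co-occurrence of the two features.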

Examination of the genomic context revealed that the genes coding for DUF439 proteins are always located in the chemotaxis gene regions (Figure 7.9). The exceptions are, of course, Methanocaldococcus jannaschii, and two of the four paralogs in H. marismortui. In 10 of 17 species the DUF439 protein is adjacent to CheY, which supports the interaction found between these proteins (Dandekar et al., 1998).
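An adjacency count of this kind can be derived from ordered gene lists of each chemotaxis gene region. The sketch below is illustrative only: the gene orders and the names `genome_1` and `genome_2` are hypothetical and are not taken from Figure 7.9, and the helper `is_adjacent` is an assumed name.

```python
def is_adjacent(gene_order, gene_a, gene_b):
    """True if gene_a and gene_b occupy neighbouring positions in gene_order."""
    return any(
        {left, right} == {gene_a, gene_b}
        for left, right in zip(gene_order, gene_order[1:])
    )


# Hypothetical gene orders, for illustration only (not data from Figure 7.9).
regions = {
    "genome_1": ["cheW", "DUF439", "cheY", "cheB", "cheA"],
    "genome_2": ["cheY", "cheB", "DUF439", "cheA"],
}

adjacent = [
    name for name, order in regions.items() if is_adjacent(order, "DUF439", "cheY")
]
print(f"DUF439 adjacent to cheY in {len(adjacent)} of {len(regions)} regions")
```

The comparison uses an unordered pair, so the result does not depend on strand orientation or on which of the two genes comes first.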

The only archaeal che gene regions without a DUF439 homolog are the che2 regions of the Methanosarcina species. In Methanosarcina barkeri this is the only che region, as this species does not contain the part of the genome where the che1 region in M. mazei and M. acetivorans is located (Galagan et al., 2002; Deppenmeier et al., 2002; Maeder et al., 2006). The che region of M. barkeri is special in that it has lost cheC, which is present in all other archaeal che regions, so it might not be functional at all. Flagellar motility has not been observed for any of the Methanosarcina species (Garrity et al., 2001), although they probably have this capability, since their genomes contain flagellins and a complete set of fla genes (see Supplementary Table S4). Whether the Methanosarcina che2 region plays a role in controlling flagellar motility remains to be elucidated.

A multiple alignment of all members of the family DUF439 revealed only few conserved residues and several weakly conserved regions (Figure 7.10). Since no conserved motif could be detected, the multiple alignment gave no hint about the function of the proteins. It is noteworthy that the protein from Methanocaldococcus jannaschii (no Che proteins) is less conserved and truncated at the C-terminus, whereas the C-terminus is well conserved in all other species. Hence it is likely that this protein is either non-functional or fulfils a different function. The presence of a DUF439 protein in the genome of M. jannaschii, while che genes are absent, can be explained by two scenarios. Either this gene is the remnant of a former che gene region that was lost: since the DUF439 protein is located at the boundary of the che gene region in the other Methanococcus species (Figure 7.9), such an incomplete gene loss would have been possible and could also explain the C-terminal truncation. Alternatively, this gene could have been gained by horizontal gene transfer.
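Identifying strictly conserved residues in a multiple alignment reduces to finding the columns in which every sequence carries the same residue and none carries a gap. The following minimal sketch illustrates this; the three short sequences are invented toy data, not rows of the alignment in Figure 7.10, and the function name `conserved_columns` is an assumption.

```python
def conserved_columns(alignment, gap="-"):
    """Return 0-based indices of columns where all sequences share one residue."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        residues = set(column)
        # A column counts as conserved only if it holds a single residue
        # type and that residue is not the gap character.
        if len(residues) == 1 and gap not in residues:
            conserved.append(index)
    return conserved


# Invented toy alignment, for illustration only.
toy_alignment = ["MSE-KVHL", "MTE-KVHL", "MSDGERVL"]
print(conserved_columns(toy_alignment))
```

Weakly conserved regions would require a similarity-based score per column rather than strict identity, which this sketch does not attempt.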

[Figure 7.11: tree graphic not reproduced. Leaf labels: rrnAC3221, rrnAC3231, NP2166A, rrnAC2209, NP2162A, rrnAC2213, OE2404R, PH0494, PAB1338, TK0641, Mevan0255, MMP0934, MmarC50741, MmarC70181, MJ1615, Mbur0358, MM0331, MA3069, AF1043, Mboo1339, Mhun0107, Memar0946, LRC568, OE2402F.]

Figure 7.11: Phylogenetic analysis of DUF439 proteins. Unrooted phylogenetic tree by neighbour-joining, calculated from the multiple alignment shown in Figure 7.10. Species can be derived from the prefix of the protein name as explained in the legend of Figure 7.10.
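Neighbour-joining operates on a matrix of pairwise distances derived from the alignment. The text does not state which distance measure was used, so the sketch below assumes the uncorrected p-distance (fraction of differing residues over positions that are ungapped in both sequences). The sequence names and residues are invented toy data, and the function names are assumptions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over positions ungapped in both sequences."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("sequences share no ungapped positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Invented toy alignment, for illustration only.
toy = {"protA": "MSEKV", "protB": "MSDKV", "protC": "MT-RV"}
matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, dist)
```

Such a matrix is the input to a neighbour-joining implementation; building the tree itself is left to an established phylogenetics package.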

Two or more copies of DUF439 proteins were found only in the motile haloarchaea H. salinarum, N. pharaonis, and H. marismortui. All three species contain a second homolog in or adjacent to the che gene region. These second homologs lack several residues conserved in all other proteins of the family DUF439 (see boxes in Figure 7.10). Hence they probably fulfil a different function than the “main” DUF439 protein. This is consistent with the phenotypic results obtained for the deletions: unlike the deletion of OE2402F, the deletion of OE2404R resulted in only a weak phenotype. Phylogenetic analysis (Figure 7.11) revealed that the second homologs in the che gene region of the haloarchaea (OE2404R, NP2162A, rrnAC2213) form a separate branch in the phylogenetic tree. This means that they arose either by a gene duplication prior to the divergence of the haloarchaea, or later, followed by distribution through lateral gene transfer. However, the second explanation seems unlikely because it cannot explain the conserved localisation in the che gene region. H. marismortui contains two additional DUF439 homologs located elsewhere in the genome. These two paralogs resemble the “main” DUF439 proteins more closely than the second homolog of the haloarchaea, as can be seen in the multiple alignment and the phylogenetic tree. Whether they also fulfil a function in taxis signalling remains elusive.

Overall, the presence of a DUF439 protein in (almost) all archaeal che gene regions indicates that these proteins are not only essential for chemo- and phototaxis in H. salinarum but constitute a hitherto unrecognised class of archaeal chemotaxis proteins. The Che proteins in archaea were identified by homology to their bacterial counterparts (Rudolph and Oesterhelt, 1995; Rudolph et al., 1995; Szurmant and Ordal, 2004, and references therein). The absence of DUF439 in bacteria might explain why these proteins were not recognised earlier. Since these proteins connect the chemotaxis system to the archaeal flagellum, we propose the name CheF for this protein family.