Primary structure of the gene for the murine Ia antigen-associated invariant chains (Ii). An alternatively spliced exon encodes a cysteine-rich domain highly homologous to a repetitive sequence of thyroglobulin
Norbert Koch, Wolfgang Lauer¹, Juri Habicht and Bernhard Dobberstein¹
Institute of Immunology and Genetics, German Cancer Research Center, Im Neuenheimer Feld 280, D-6900 Heidelberg, and ¹European Molecular Biology Laboratory, Postfach 10.2209, D-6900 Heidelberg, FRG
Communicated by B.Dobberstein
The gene for murine Ia-associated invariant (Ii) chains (Ii31 and Ii41) was characterized by sequence analysis. The gene extends over ~9 kb and is organized in nine exons. Exon 1 encodes the 5' untranslated region and the cytoplasmic segment, exon 2 the membrane-spanning segment and adjacent amino acids, and exons 3-8 the extracytoplasmic portion of Ii31. Putative promoter sequences were found upstream of the start of the coding sequence. Between exons 6 and 7 an additional, alternatively spliced exon 6b has been identified. This exon is spliced into the mRNA coding for the Ii-related Ii41 protein. Exon 6b encodes a cysteine-rich domain of 64 amino acids. It shows a remarkably high homology to the repetitive elements in thyroglobulin, a precursor for thyroid hormone. Based on this homology, it is suggested that this domain (TgR) in Tg and in Ii41 may play a role either in hormone formation or as a carrier in the transport of molecules (thyroid hormone or processed antigen respectively) between intracellular compartments.
Key words: invariant chain gene/class II histocompatibility antigens/alternative splicing/thyroglobulin/antigen processing
Introduction
Invariant chain (Ii) is a protein which is associated intracellularly with the murine and human class II histocompatibility (MHC) antigens, Ia and HLA-D respectively (Jones et al., 1978; Charron et al., 1983; for review see Long, 1985). This assembly occurs shortly after insertion into the membrane of the rough endoplasmic reticulum (ER) (Kvist et al., 1982). After transport to the Golgi complex and the addition of sialic acid residues, Ii dissociates from the class II MHC antigens (Machamer and Cresswell, 1982; Rudd et al., 1985). Class II MHC antigens are transported to the cell surface where they are involved in the presentation of foreign antigens to T cells (Unanue, 1984). The fate and function of Ii are unclear.
The expression of Ia antigens and Ii is coregulated even though the respective genes are located on different chromosomes (Claesson-Welsh et al., 1984; Koch and Harris, 1984; Yamamoto et al., 1985a; Momburg et al., 1986). Interferon-γ (IFN-γ) and B-cell stimulating factor (BSF-1) induce expression of both Ia antigens and Ii (Collins et al., 1984; Koch et al., 1984; Polla et al., 1986). Proteins homologous to Ii are found in all species which are known to express class II MHC antigens (Sung et al., 1982; Quill and Schwartz, 1983).
Murine Ii chain is a 31-kd type II membrane glycoprotein which spans the membrane once and exposes the 29 amino-terminal residues on the cytoplasmic and the glycosylated carboxy-terminal portion on the extracytoplasmic side of the membrane (Lipp and Dobberstein, 1986). The single cysteine residue on the cytoplasmic side is fatty acylated (Koch and Hammerling, 1986).
Ii shares the membrane orientation and the site of fatty acylation with the transferrin receptor (TR), which internalizes transferrin and recycles it from an intracellular compartment to the plasma membrane (Omary and Trowbridge, 1981; Schneider et al., 1984). Because of its association and co-regulation with class II MHC antigens, and its structural similarity to receptors, it has been proposed that invariant chain may be involved in intracellular transport or recycling of Ia and/or processed antigen complexes (Kvist et al., 1982; Claesson and Peterson, 1983; Cresswell, 1985; Koch and Hammerling, 1986; Miller and Germain, 1986; Sekaly et al., 1986).
Immunochemical analysis has demonstrated that several forms of murine Ii chain are associated with Ia antigens (Zecher et al., 1984). Recently two mRNA species were described coding for 31-kd and 41-kd Ii-related proteins, Ii31 and Ii41 respectively (Yamamoto et al., 1985b; Strubin et al., 1986b). Ii31 is expressed in amounts 5-10 times higher than Ii41. After transfection of the Ii gene into rat fibroblasts, both Ii31 and Ii41 were expressed (Yamamoto et al., 1985b). Proteins with similar mol. wts (In33
[Figure 1 graphic. Panel A: restriction map of the cos 10.7 insert. Panel B: EcoRI, HindIII and BamHI sites around the Ii gene, with fragments of 13, 10, 2.9, 2.3, 3.7, 0.75 and 0.8 kb.]
Fig. 1. Physical map of the mouse Ii gene. (A) The 40-kb insert in cos 10.7 containing the mouse Ii gene was partially digested with six restriction enzymes and a restriction map constructed. The Ii gene was located by using mouse or human cDNA probes on two adjacent EcoRI fragments of 2.9 and 10 kb in length. They are indicated by slanted lines. (B) Sequencing strategy for the Ii gene. Fragments as indicated in the figure were subcloned into plasmid vectors, deletions were introduced by DNase I treatment and selected plasmids sequenced. The 5' end of the gene is to the left and the 3' to the right.
-249 CCTGATGAATCCAGAAGTCTGCCTAGAAACAAGTGATGATAGCCCTGGCCAGCCAATGGGATCATGCAGGCCTTTCTACCTGTTTAGGGAACTCCC*CTTCATCCTGCCCAGGGAGGCAGCTTTG (15 mer)
-124 AGTGAGTGGGGAATTTCCAGATTTGTGGCTTTCAGTTCCACATCTACCATGTGGGCGGAGTGACCTGCTGTGGGCGAATCAGATTCCTTCCAGTATCAGCTTTAGAGGTGATCTTGGGGCTCAA
(CAAT-BOX) (Spl binding-site) (TATA-BOX)
Exon 1
2 GGGTC_CCAGACACACAGCAGCAGCAGCAGCAGCAGCAGCAGCAACAGCAGCAGCAGCAGCAGCGCCTGTGGGAAAAACTAGAGGCTAGAGCCATGGATGACCAACGCGACCTCATCTCTAACCAT
(OPA-sequence) MetAspAspGlnArgAspLeuIleSerAsnHis
127 GAACAGTTGCCCATACTGGGCAgACCGCCCTAGAGAGCCAGAAAGgtatgtgtgaataccagcagagagcccttacctctggaggacacagaatgcaggcctggggagggacacagagctctgttg GluGlnLeuProIleLeuGlyAsnArgProArgGluProGluAr
251 caggaaggttgccttgagtgaccttgagcgctgattttctgagtgaatTTTATTTATTTATTTATTTATTTATTTATTTATTTAT -- 3.5 kb -- Exon 2
1 tccgtcc caacagGTGCAGCCGTGGAGCTCTGTACACCGGTGTCTCTGTCCTGGTGGCTCTGCTCTTGGCTGGGCAGGCCACCACTGCTTACTTCCTGTACCAGCAACAGGGCCGCCTAGACALAG gCys SerArgG lyAl aLeuTyrThrG lyVal SerValLeuValAlaLeuLeuLeuAlaGl1yGlnAlaThrThrAlaTyrPheLeuTyrGlnG lnG lnGlyArgLeuAspLys
0
12 6 CTGACCATCACCTCCCAGAACCTGCAACTGGAGAGCCTTCGCATGAAGCTTCCGAAATgtgcgtgctccacctgtccc tcac ctcacagacatca tttctc cat ttagcccctcccga tctgcct LeuThrIleThrSerGlnAspLeuGlnLeuGluSerLeuArgMetLysLeuProLysS
Exon 3
251 tcctcccccgcaccggtttcaaatcttaacccctgggttccttactgccttggacctggactcatactgtcctgc ctgccccacagCTGCCAAACCTGTGAGCCAGATGCGGATGGCTACTCCCT erAlaLysProValSerGlnMetArgMetAlaThrProL 376 TGCTGATGCGTCCAATGTCCATGGATAACATGCTCCTTGGGgtaaggaagg -- 200 bp --
euLeuMetArgProMetSerMetAspAsnMetLeuLeuGly
1 cttcctccaagtctttggctgcaaaaatgctgttactcataagattactattggtaccttatccagggcagagcacatcaatcgaacggtgccaaatggtcagtcc tgaaaacatacataagtta 12 6 gcatgatacagacggggcaggtttaattaggactagtggtgt atatatatatagacatataatatatgcaagcaatgacaatgaatgaaaaaagaggccatgaatttgaagtgaagcaggaagga 2 51 aca tatgggagggcttgggaggaggaagcaaactctgctattgggaaggc tctgacaagc ttctgcccgcctgaccagagagggcatagaggcaggaagggcgtgaggggctggcacttccatgt
Exon 4
376 taggaggtggcagatttgagctgttgagtgcaagcacctgactcgtactagactatagctgctgatccctgcaatgctggtaaccctgttcccttccccacagCCTGTGAAGAACGTTACCAAGT ProValLysAsnValThrLysT 501 ACGGCAACATGACCCAGGACCATGTGATGCATCTGCTCACGgtgagtatcagagcg agctctgggtcacgtgggacccggccctcactcatgggttaggctcactataaactcaaca
yrGlyAsnMetThrGlnAspHisValMetHisLeuLeuThr
6 26 catgcttagtccaaggaatacaaggtggtccttaactgttgcgtacagtccatcccctacccaccttgagataagagtctatgtagtcctggaaccgactatgtagaacagttggctttgaactt
751 acaatttcggctttgaactc acaatttgcctgcctctcgtcccagtgctaggattaaaggcgtgcatcac taggccaggttccagccacctcacttttgaggagttaaaaa ttatggtcca ttga 876 gactggaaatataactcaaagattaaaaacaccacctactcttccaCaaatcctqaattcaattcccaaccacctcataactcacaaccatctgtaatataaatccaatqtcctcttctagta
(Alu-sequence)
1001 tt__agacaaacat tcqcatacataAAATAAATAAATAAATc_ttttaaaaaaatcaaattaaggtccaatggattgacttggccacaataccatgttctccttcccaaattaatatt
112 6 gcacatgc ttgcttcgtgtcagcacagtgcatggcaatggctcttggcctacagc agggaacactggttgtgtgaggacaggcagaggacccagac agagggaaaaaactggaqggtgctggttca 12 51 ctcctgacscctgactggagttccatagctgggtgcsccctcaccgctgctctccaacatggggaccaggggccaggcttggtgtggctaatgtccattcctcagaacgaaggcctgggaacatggg
Exon 5
1376 gtgcacatctccccttatttattccgggggttctctataacttccccttgccctgccgctctgcagAGGTCTGGACCCCTGGAGTACCCGCAGCTGkAGGGGACCTTCCCAGAGAATCTGkAGC ArgSerGlyProLeuGluTyrProGlnLeuLysGlyThrPheProGluAsnLeuLysH 1501 ATCTTAAGAACTCCATGGATGGCGTGAACTGGAAGgtaaacagcccctgttggaatctcttcttcttcccacagtagcttcaggactagaaagaggcaaagggaggactagggctgctgttctct
isLeuLysAsnSerMetAspGlyValAsnTrpLys
1626 tgaagctactgagggccttctaacattcacgacacccctgtggtctttaagaggcactgaggctgaagctggaccctccaagtttgtagtcaaggcagagtccagaagggtaggcggttgactc
Exon 6
1751 ctgaccctgaccatccatccacctctgatctccgttagATCTTCGAGAGCTGGATGAAGCAGTGGCTCTTGTTTGAGATGAGCAAGAACTCCCTGGAGGAGAAGAAGCCCACCGAGGCTCCACCT IlePheGluSerTrpMetLysGlnTrpLeuLeuPheGluMetSerLysAsnSerLeuGluGluLysLysProThrGluAlaProPro
1876 AAAGgtaccaggacgggagcttcggcctgccacagtgacc tactctctcagctcagtcttttctcgcctgttgttccttcaggttcggaaacccttatatcctatccgtgggtctgttctttcac LysG ==> Exon 7
V ==> Exon 6b
20 01 acacatgtccagagtacagaggcc tctaggtctctgtcttggagcaaaaccaactgaaaaggtacccggcagagtcsccccagcagtagataaagatacaaggaggagacggaatcatgat tgtc 212 6 cagatagagccctgggacctc tcac ttc acggcaggccagcatacaggaggctgaggcataaagcattggggaacgcc tctgctttaccttggtgctgcctc taacttgctgtgagtgc tggacg 2251 agcccatcccactggcttctgtgtcccacaagtggaaggctagaggcccatactgctctggacctaagtitctctgatatataaatcsccttacatgctatctcaaccagcttctagtaatgtttg
2 37 6 aatcccctctccagcgtccccaacaagtggcagcccccatttctactttatcccagtaaaggaaggtcagattcccagaggtgtccccattactgtccccctgaaacagaataatgtaacaaatg 2 501 cagaataattattactggggcaaatgacacctataagtcatgctttctctttttggatcttactggtttcagac tcaggcctcagttttctcatctaggc tatgggtc taccccataaggaaagc
Exon 6 b 26 26 aatttggataggcattaagccatggagtgGTGTGTGTGTGTGTGTgggtctctgctctgctgagatcctggcccagagtccacccac tcactgtcttcatcatgeccttcagTACTGACCAAGTGC
alLeuThrLysCys 2751 CAGGAAGAAGTCAGCCACATCCCTGCCGTCTACCCGGGTGCGTTCCGTCCCAAGTGCGACGAGAACGGTAkCTATTTGCCACTCCkGTGCCACGGGAGGCACTGCTkCTGCTGGTGTGTGTTCCC GlnGluGluVa lSerHisIleProAlaValTyrProGlyAlaPheArgProLysCysAspGluAsnGlyAsnTyrLeuProLeuGlnCysHisGlyArgHi sCysTyrCysTrpCysValPhePr 2 8 76 CAACGGCACTGAGGTTCCTCACACCAkGAGCCGCGGGCGCCATAACTGCAGTGgtaagc aggggacaccgtgtcac ataatctaaaggactagga agctctaggaaggccaagggtccaaaggtc
oAsnGlvThrGluValProHisThrLysSerArgGlyArgHisAsnCySerG
3001 cctcccagtatacatggggtcaggacatgggttggc tgtgcctgggagcatccaccatcagtcacacacacaccagcsccctttctccacacaggaactccttgtgtccttctaatc tttgtctc t 3126 tcttctccgcacgctgtcctgcactggagtcccacagacccacccgtctcactgtatccccaaccacactgtatgctctggtgcctcctttgccaagttcaaggcacagtggcaacaGTGTGTGT
Exon 7
3251 GTGTGTGTGTGTtggggtaagaa taacagtgagecctga tgcttcccttgcagAGCCACTGGACATGGAAGACCTATCTTCTGGCCTGGGAGTGACCAGGCAGGAACTGGGTCAAGgtaagggggg luProLeuAspMetGluAspLeuSerSerGlyLeuGlyValThrArgGlnGluLeuGlyGlnV
337 6 gatcacagagagggccacccacatgcagactctggtgacactgggacattcatcactatcactgggatggtttttccagaatggatggtggctgggagcacagtactacc tgagc ttagcagggg Exon 8
3501 gac agactgatctgcag ttccaagatgttgggcagaggaggc agggacaacaaactggtggcccagctgtatcaacctgtgcttctc t tcccagTCACCCTGTGAAGACAGAGGCCAGCTCTGCA alThrLeu
36 26 CAGC AGC AGCGC CC CCTGCTC TC CTGTGCC TCAGC C CTTCTTATGTTC CCTGATGTCAC AC CCCAC TTC CCGTCTC CC TGC AC CCTGGGGC TTGAGACTGGTGTCTGTTTCATCGTC C CAGGAC A 3751 CGGCAAATGAAGTCAGAACAGAAGGAGGACGCTGGAGGGCCTTGCTGGCTACCGCTATCTAAAGGGAACCCCCATTTCTGACCCATTAGTAGTCTTGAATGTGGGGCTCTGAGATAAAGGCCCGC 3 8 76 AGACAGGGACAAGGGATGCCCTACCCTTAACCTAGGCTGGACACATTTGCTGCCTTCTCCTCAAGGAAGAAGAACCCAAGCCCCTC CTCCCAGTAACCCCTCCTCACATC CTGCCACCCCCCCTC 4001 AAGCCCCAC CCCCTTTCAGGTTCCTTGCTCAGCCAAGCTTGTCAGCAGCCTGTAGGATCATGGTTCAAGTGACAATAAAGGAAGAAAGTAGAacactcttgcttctgcctctt 4113
(Poly-adenylation signal )
Fig. 2. Sequence of the mouse Ii gene. The nucleotide sequence is shown together with the predicted amino acid sequence of nine exons. Exon 6b is an alternatively spliced exon used to generate the Ii41 protein. Potential regulatory sequences, the CAAT box, one consensus Sp1 protein binding site (GGGCGG), and the TATA box are indicated at the 5' end of the gene. A 15-mer segment at -220 is indicated which is highly homologous to a 15-mer segment found in mouse and human class II MHC antigen genes. Downstream of the putative transcription start site, an alternating CAG structure (OPA sequence) is found. Alternating TTTA, AAAT and GT nucleotides are indicated by capital letters. An Alu-type repeat element in the fourth intron and the polyadenylation signal AATAAA in the 3' non-coding region are underlined. In exon 6b the sequence highly homologous to the repetitive element in thyroglobulin (TgR) is also underlined. Potential glycosylation sites in the deduced amino acid sequence are indicated by an * and the cysteine residue to which palmitic acid is bound (exon 2) by a dot.
[Figure 3 schematic: exon maps of Ii31 and Ii41 with the C, M and TgR segments marked; scale bar 50 aa.]
Fig. 3. Outline of the exon-intron structure of the murine Ii gene. Eight exons encode Ii31. Closed boxes indicate exons for coding sequences, open boxes for non-coding sequences. One exon encodes the cytoplasmic and one the membrane-spanning segment. Six exons encode the extracytoplasmic part. An additional exon, 6b, is used in mRNA coding for Ii41. This exon encodes a domain highly homologous to a repetitive domain in thyroglobulin (TgR). The site for fatty acid acylation is indicated by a dot, sites for potential N-glycosylation by x. C: cytoplasmic; M: membrane-spanning; EC: extracytoplasmic segment.
and In41) were also found in human lymphoblastoid cell lines (Quaranta et al., 1984; Strubin et al., 1986b). Strubin et al. concluded from sequence analysis of In41 cDNA that the 41-kd protein results from differential splicing of an invariant chain gene transcript (Strubin et al., 1986b). Other forms of invariant chain (In35 and In43) were found in human cells. These were shown to be translated from an AUG initiation codon upstream of that used for the production of the 33-kd and 41-kd major forms of human invariant chain (Strubin et al., 1986a). The structure of the human invariant chain gene has recently been determined (Kudo et al., 1985; O'Sullivan et al., 1986). It was shown to be organized in nine exons. One exon codes for the 5' untranslated region and the cytoplasmic segment, one for the membrane-spanning segment and seven for the extracytoplasmic portion.
We describe here the nucleotide sequence of the murine Ii gene. Several consensus sequences with possible regulatory functions are found in the 5' untranslated region. Comparison with the analogous sequence of the human gene reveals a strong homology in all exons including the exon 6b, which by alternative splicing gives rise to the 41-kd Ii41 protein.
Results
Structure of the Ii gene
The isolation and expression of a genomic clone coding for Ii chains has recently been described (Yamamoto et al., 1985b). The 40-kb genomic clone cos 10.7 was shown to contain the complete gene coding for Ii31 and Ii41. We mapped the genomic clone cos 10.7 by restriction analysis. A restriction map of the Ii gene and its flanking regions is shown in Figure 1A. Hybridizations with 5' and 3' invariant chain cDNA probes revealed that the entire Ii gene is contained on two EcoRI fragments, a 2.9-kb fragment with the 5' end and a 10-kb EcoRI fragment with the 3' end of the gene. EcoRI or HindIII fragments as shown in Figure 1B were subcloned and all the exons and several of the introns were sequenced. The sequences are shown in Figure 2. Comparison of the Ii31 cDNA sequence with the genomic sequence revealed that the Ii gene is composed of eight exons (Figure 3). Exon 1 encodes the 5' untranslated region and the amino-terminal portion of the cytoplasmic segment. Exon 2 encodes the three amino acids located on the cytoplasmic side, the membrane-spanning segment and 23 amino acid residues on the extracytoplasmic side of the membrane. Six exons (3-8) encode the extracytoplasmic portion. The two sites for the addition of N-linked carbohydrate side chains are encoded by exon 4. Comparison of the murine Ii gene sequence with the human one revealed the same exon-intron structure. The homology between the exons was found to be 72-84% and between the introns ~50% (Table I).
Table I. Percent homology between murine and human invariant chain exons and introns
% homology   Number of exon
             5'NC   1    2    3    4    5    6    6b   7    8    3'NC
Exon         84     76   84   72   81   77   80   88   83   75   53
Intron       56, 50, 50, 52, 56, 55
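The exon figures quoted from Table I can be checked with a few lines of Python. The sketch below is only an illustration: the dictionary is transcribed by hand from the exon row of Table I, and the helper names `most_conserved` and `coding_range` are hypothetical.

```python
# Percent homology (mouse vs human) transcribed from the exon row of Table I.
EXON_HOMOLOGY = {
    "5'NC": 84, "1": 76, "2": 84, "3": 72, "4": 81, "5": 77,
    "6": 80, "6b": 88, "7": 83, "8": 75, "3'NC": 53,
}


def most_conserved(table):
    """Return the column label with the highest percent homology."""
    return max(table, key=table.get)


def coding_range(table, exclude=("5'NC", "3'NC", "6b")):
    """Return (min, max) homology over the columns not listed in `exclude`."""
    values = [value for label, value in table.items() if label not in exclude]
    return min(values), max(values)


if __name__ == "__main__":
    print(most_conserved(EXON_HOMOLOGY))
    print(coding_range(EXON_HOMOLOGY))
```

With the non-coding columns and exon 6b set aside, the remaining exons span the 72-84% range quoted in the text, and exon 6b is the highest single value.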
Potential regulatory sequences in the 5' and 3' non-coding regions
Of the 5' flanking region, 342 bp were sequenced (Figure 2). The sequences CATCT and TTTAA were found upstream of the ATG initiation codon (underlined in Figure 2). They show strong homology to the 'CAAT' and 'TATA' consensus sequences which are indispensable for specific initiation of transcription (Breathnach and Chambon, 1981). A consensus Sp1 protein binding site (GGGCGG) was found upstream of the TATA box (Gidoni et al., 1984). The cap site of Ii gene transcription has not yet been determined. In the human Ii gene the start of transcription has been determined to be located 22 bp downstream of the proposed TATA box (Strubin et al., 1984a). The analogous position in the murine gene is arbitrarily assigned +1. The sequence between the TATA box and the ATG initiation codon shows a repetitive CAG sequence characteristic for so-called OPA elements previously found in homoeotic and other genes (Wharton et al., 1985). Its functional relevance remains to be shown. As the expression of the Ii gene is induced by IFN-γ we compared its 5' sequence with those of other IFN-γ inducible genes such as the class II histocompatibility antigens. Two elements, a 15-mer and an 8-mer, were previously suggested to be involved in the transcriptional regulation by IFN-γ (O'Sullivan et al., 1986). Only the 15-mer sequence could be identified in the Ii gene (-228 to -213) (Figure 2).
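A motif scan of this kind is easy to reproduce. The following is a minimal sketch, not the procedure used by the authors: `find_motif` is a hypothetical helper, and `EXCERPT` is a short stretch copied from the 5' flanking sequence as it appears in Figure 2 of this text, so it may carry transcription errors.

```python
def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        # Advance by one base so overlapping occurrences are also reported.
        start = seq.find(motif, start + 1)
    return hits


# Short excerpt of the 5' flanking region (assumption: copied from Figure 2).
EXCERPT = "TTCCACATCTACCATGTGGGCGGAGTGACCTGCTGTGGGCGAATCAGATTCC"

if __name__ == "__main__":
    for name, motif in [("CAAT-like", "CATCT"), ("Sp1 site", "GGGCGG")]:
        print(name, motif, find_motif(EXCERPT, motif))
```

Exact string matching is the simplest choice here; a position weight matrix would be needed to score degenerate matches to the consensus sequences.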
Control of Ii gene expression is also coupled to non-proliferation. The Ii gene and the metallothionein gene are inducible by the arrest of proliferation (Rahmsdorf et al., 1983; Angel et al., 1986). Between the TATA and the ATG start site for translation a region of homology was found.
[Figure 4 schematic: Ii41 (C, M and EC segments) aligned with thyroglobulin (signal sequence S, TgR repeats 1-10, TgACE); scale bar 500 aa.]
Fig. 4. Alignment of the homologous regions between Ii41 and thyroglobulin. The segment in Ii41 encoded by exon 6b and the 10 times repeated homologous regions in thyroglobulin (TgR) are indicated by boxes with slanted lines. Segments with the highest homology are connected by lines. Potential N-glycosylation sites are indicated by x. C indicates the cytoplasmic segment; M, membrane-spanning segment; EC, extracytoplasmic segment; S, signal sequence; TgACE, the segment homologous to acetylcholinesterase (Swillens et al., 1986); f, hormonogenic tyrosines.
[Figure 5 alignment: rows m6b, h6b, hTg and CS, centred on the conserved Cys-Trp-Cys-Val motif.]
Fig. 5. Alignment of part of the amino acid sequence encoded by exon 6b of mouse (m) and human (h) invariant chain gene with the second repetitive unit of human thyroglobulin (TgR) and a consensus TgR sequence. The consensus sequence (CS) has been established from a comparison of five of the human Tg repetitive units (Malthiery and Lissitzky, 1985). Dots indicate a gap, dashes indicate variable amino acid residues.
Exon 6b encodes a cysteine-rich domain homologous to part of thyroglobulin
After transfer of the murine Ii gene into rat fibroblasts, two related invariant chain proteins were identified, one of 31 kd, Ii31, and one of 41 kd, Ii41 (Yamamoto et al., 1985b). It has been suggested that the mRNAs coding for these two proteins are produced by differential splicing. In order to search for the sequence in the Ii gene that encodes Ii41 protein we used its known biochemical properties and the homology to the recently sequenced cDNA for the human 41-kd form of the invariant chain, In41 (Strubin et al., 1986b). Ii41 should contain at least three sites for N-linked glycosylation. Ii31 and Ii41 should have common amino- and carboxy-terminal portions; and the additional segment in Ii41 should have a mol. wt of ~5 kd and be rich in cysteines (Yamamoto et al., 1985b; Lipp and Dobberstein, 1986).
Based on this information, a sequence between the sixth and seventh exons was found which fulfilled all the criteria for an additional exon used in the mRNA coding for the 41-kd protein. This exon 6b contains 192 bp between consensus splice sites. The resulting reading frame and the deduced amino acid sequence are shown in Figure 2. Exon 6b encodes 64 amino acid residues of which seven are cysteines. It has two potential sites for the addition of N-linked carbohydrate side-chains. Exon 6b shows 88% homology to the human p41-1 cDNA (Table I and Strubin et al., 1986b).
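The figures given for exon 6b (192 bp, 64 residues, seven cysteines, two N-glycosylation sites) can be cross-checked against the deduced sequence. The sketch below is a hedged illustration: `EXON_6B_PEPTIDE` is a one-letter transcription made by hand from the exon 6b translation printed in Figure 2 of this text (so it is an assumption and may contain reading errors), the helper names are hypothetical, and the sequon rule used is the standard Asn-X-Ser/Thr pattern with X not proline.

```python
import re

EXON_6B_BP = 192  # exon length stated in the text

# Assumption: hand transcription of the exon 6b translation shown in Figure 2.
EXON_6B_PEPTIDE = (
    "VLTKCQEEVSHIPAVYPGAFRPKCDENGNYLPLQCHGRHCYCWCVFPNGTEVPHTKSRGRHNCS"
)


def cysteine_positions(peptide):
    """Return 1-based positions of all cysteine residues."""
    return [i + 1 for i, residue in enumerate(peptide) if residue == "C"]


def sequon_positions(peptide):
    """Return 1-based positions of Asn in N-X-S/T sequons (X is not Pro)."""
    # The lookahead keeps overlapping candidate sites from being skipped.
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", peptide)]


if __name__ == "__main__":
    print("codons:", EXON_6B_BP // 3, "residues:", len(EXON_6B_PEPTIDE))
    print("cysteines:", len(cysteine_positions(EXON_6B_PEPTIDE)))
    print("N-glycosylation sequons at:", sequon_positions(EXON_6B_PEPTIDE))
```

Residue numbering in this sketch starts at the first residue of the transcription and need not coincide with the numbering used in Figure 5.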
The deduced protein sequence from exon 6b was compared to the sequences in the protein data base maintained by the National Biomedical Research Foundation, Washington, DC by using the search program FASTP (Lipman and Pearson, 1985). A significant homology of 38% was found by Patrick Argos (EMBL) to the repetitive sequence close to the amino terminus of thyroglobulin (Tg). This sequence centres around the tetrapeptide Cys-Trp-Cys-Val and is 10 times repeated in Tg (Figure 4) (Malthiery and Lissitzky, 1985; Mercken et al., 1985a,b). A consensus sequence for the Tg repetitive elements (TgR) has been derived showing conserved positions for Cys, Pro and Gly residues (Malthiery and Lissitzky, 1985). When the amino acid sequence deduced from exon 6b was compared with this consensus sequence, nearly all positions were found to be conserved (Figure 5). Mouse and human 6b sequences were identical in all positions to the TgR consensus sequence (Figure 5). It is interesting to note that the deduced cysteine at position 39 of the mouse 6b exon is not conserved in the human 6b exon and is also not part of the TgR consensus sequence. In contrast, all the other cysteine residues are conserved between the TgR and the 6b segment.
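A homology percentage of this kind is, at its simplest, the fraction of identical residues over the aligned positions. The sketch below illustrates that calculation only; it is not FASTP, which also searches for the alignment and scores it differently. The function name `percent_identity` and the two eight-residue fragments around the Cys-Trp-Cys-Val motif are illustrative assumptions, read from the extracted figures rather than from a verified alignment.

```python
def percent_identity(a, b, gap="."):
    """Percent identical residues between two pre-aligned, equal-length strings.

    Columns in which either sequence has a gap character are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    mouse_6b_fragment = "CWCVFPNG"  # illustrative fragment (assumption)
    human_tg_fragment = "CWCVDAEG"  # illustrative fragment (assumption)
    print(percent_identity(mouse_6b_fragment, human_tg_fragment))
```

Because gapped columns are skipped, the denominator is the number of compared positions rather than the full alignment length; other conventions exist and give slightly different percentages.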
Discussion
The sequence and exon-intron structure of the murine Ii chain gene were determined. When compared to the human invariant chain gene, a high similarity in the overall structural organization, and particularly between the exons, was found. However, one significant difference was observed in the 5' untranslated regions. The human In33 and In35 proteins result from alternative initiation at two in-phase AUG codons (Strubin et al., 1986a). Only the second initiation site was found in the 5' untranslated region of the mouse Ii mRNA (Figure 2). It thus appears that the 35-kd form in man might not have an essential function distinct from the 33-kd one.
The 41-kd form of invariant chain, in contrast, is found in all species which have been screened for its presence. This form is the result of alternative splicing (Yamamoto et al., 1985b; Strubin et al., 1986b). The exon used in this event is located between exons 6 and 7 and is therefore named 6b. Interestingly, when exons of the human and murine invariant chain genes were compared, exons 6b showed the highest homology: 88% homology was found between exons 6b, whereas 72-84% was found between the others (Table I).
The mechanism of alternative splicing is not yet understood. Examination of introns between the sixth and seventh exons did not reveal any obvious sequence motifs which might affect efficiency of splicing. Several other genes have been found to employ alternative splicing. These include the T3δ gene in human T cells (Tunnacliffe et al., 1986) and the H-2 class I genes (Kress et al., 1983; Transy et al., 1984). In the T3δ gene a stretch of 44 bp of alternating GT was found to flank the alternatively spliced exon (Tunnacliffe et al., 1986). Stretches of alternating GT are also found in the Ii gene between exons 6 and 7. Their significance for alternative splicing remains to be shown.
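Runs of alternating GT such as those mentioned above can be located with a regular expression. This is a minimal sketch under stated assumptions: `gt_runs` is a hypothetical helper, the threshold of four repeats is arbitrary, and the test sequence is modelled on the GT run in the intron upstream of exon 6b in Figure 2, with an illustrative repeat count.

```python
import re


def gt_runs(seq, min_repeats=4):
    """Return (start, repeats) for each run of at least `min_repeats` GT units.

    `start` is the 0-based position of the run in the sequence.
    """
    pattern = re.compile(r"(?:GT){%d,}" % min_repeats)
    return [
        (match.start(), len(match.group()) // 2)
        for match in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Flanking bases and repeat count are illustrative, not measured values.
    intron_fragment = "catggagtg" + "GT" * 8 + "gggtctctg"
    print(gt_runs(intron_fragment))
```

The search is case-insensitive because the figure prints introns in lower case and highlights repeats in capitals.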
Exon-intron organization of the Ii gene is very similar to that found for the asialoglycoprotein receptor (ASGR) gene (Leung et al., 1985). ASGR, like the Ii31 and Ii41 proteins, is a type II membrane protein. The cytoplasmic and membrane-spanning segments of ASGR are each encoded by separate exons. Five exons encode the extracytoplasmic segment. The carbohydrate binding site in ASGR has been localized in the carboxy-terminal segment encoded by exons 7-9. The functional domain in Ii chains has not yet been identified.
Comparison of the biochemical properties of the Ii31 and Ii41 chains had revealed an extensive similarity. An additional cysteine-rich domain has been postulated for the Ii41 chain (Lipp and Dobberstein, 1986). Exon 6b codes for 64 amino acid residues, seven of which are cysteines. This segment in the Ii41 protein is located close to the carboxy terminus on the extracytoplasmic side of the membrane (Figure 3). Cysteine residues in secretory and membrane proteins are often found to organize structurally and functionally distinct domains. The best examples are protein domains of the class I and II histocompatibility antigens (Nathenson et al., 1981), immunoglobulins (Sakano et al., 1979) and the low density lipoprotein (LDL) receptor (Yamamoto et al., 1984). A structural motif of repeated, cysteine-rich sequences was, for instance, demonstrated for the human EGF and LDL receptors (Rail et al., 1985). When a protein data base was searched for sequences homologous to the segment encoded by exon 6b, a striking homology of 38% was found to a cysteine-rich segment in thyroglobulin (Tg). This segment is 10 times repeated in the amino-terminal half of Tg (Figure 4). A consensus sequence for the cysteine-rich TgR has been proposed. It centres around the sequence motif Cys-Trp-Cys-Val (Malthiery and Lissitzky, 1985). The entire TgR consensus sequence is found conserved in the sequence of mouse and human exon 6b (Figure 5).
What could be the structural and functional significance of such an extensive homology? Tg is an iodinated precursor protein for the production of thyroid hormone (Wollman, 1969). Bovine Tg is a glycosylated, phosphorylated and sulfated protein of 2750 amino acid residues (for review see Herzog, 1984). It is a dimeric glycoprotein of 660 kd which is secreted by the thyrocytes and stored in the lumen of the thyroid follicle. Here the protein becomes iodinated at tyrosyl residues and at 3-4 of these residues thyroxine (T4) and triiodothyronine (T3) are formed. These are located at the extreme ends of Tg on amino acid residues 5, 2555, 2569 and 2748 (Figure 4 and Mercken et al., 1985b). Active hormone is released after endocytosis or phagocytosis of thyroglobulin and its hydrolysis, most likely in lysosomes. Some Tg, however, seems to escape lysosomal degradation and thus appears intact in the serum (Van Herle et al., 1979; Herzog, 1984). It has been suggested that the large thyroglobulin structure has evolved for efficient and regulated iodination and coupling of the hormonogenic tyrosines.
The exon 6b of the Ii gene codes for 64 amino acid residues. Its amino acid sequence between residues 20 and 64 is as homologous to the second TgR as the 10 Tg repeats are among each other (Figures 4 and 5). It has been proposed that the TgRs have arisen by gene duplication of a primordial gene coding for a 60 amino acid long building block (Musti et al., 1986). This suggestion is further supported by the location of exon-intron boundaries within the Tg gene. Most of the TgR units are encoded by separate exons (Musti et al., 1986; R. Di Lauro, personal communication). Therefore, the exon 6b of the Ii gene might be derived from the same primordial building block as the 10 TgR units in the Tg gene. As the homology is high between the TgR segments in Ii41 and in Tg, these two segments might perform similar functions.
The function of the 10 TgR elements in Tg is not known. It has been suggested that the unusually large Tg protein structure supports the efficiency of iodination and the formation of the hormonogenic tyrosines. It is conceivable that TgR segments function in the formation of iodinated hormones. Hormone formation on the TgR molecule occurs outside the cell in the thyroid follicle. If Ii41 were a hormone precursor similar to Tg, one should find it on the cell surface. No clear evidence for a cell surface location has, however, been found for Ii41 protein. Clearly, more detailed studies are required to elucidate a possible hormone function of Ii41.
Tg undergoes extensive intracellular transport. It is a typical secretory protein which is secreted into the thyroid follicle, iodinated, endocytosed and degraded in the lysosomes, and the hormone is finally released into the circulation (Herzog, 1984; Vassart et al., 1985). There must be some structural elements in Tg that direct this molecule to the different stations. The TgR element could function in the transport to the lysosomes or in the transport of T4 and T3 out of the lysosomes to the basal cell surface. A function involving a lysosomal or acidic compartment has also been suggested for invariant chains. This was largely based on the transient association of Ii31 and Ii41 with class II MHC antigens and their dissociation in an acid compartment (Machamer and Cresswell, 1984; Nowell and Quaranta, 1985).
Class II MHC antigens are involved in the presentation of foreign antigens to T cells. Most antigens have to be processed before they can efficiently be presented. Processing appears to occur in an intracellular acidic compartment and involves in most cases proteolytic degradation (for review see Unanue, 1984). The two best-characterized acidic compartments in the cell are the endosomes and the lysosomes. Digestive enzymes are well characterized in lysosomes. It has always been an enigma how processed antigen is retrieved from the processing compartments, associates with class II antigens and appears in association with class II antigens on the cell surface. Could Ii31 and Ii41 perform functions in the retrieval of antigen from different processing compartments? Each of these molecules might then serve a different route, one an endosome-like compartment (Ii31) and one the digestive lysosome compartment (Ii41).
It is not yet known in which form thyroid hormone reaches the cell surface and enters the circulation. Three to four hormonogenic peptides are released from one molecule of thyroglobulin. It is conceivable that the 10 TgRs are involved in the transport of hormone or hormonogenic peptides from the lysosomes to the cell surface where the hormone is released into the circulation. If this assumption is correct, then the TgR elements in Ii41 would be a carrier for processed antigen and in Tg for hormonogenic peptides or hormone between the lysosomes and an endosome-like compartment or the cell surface.
Materials and methods
Cosmid and plasmids
The genomic Ii chain clone cos 10.7 containing the entire gene for Ii chain was selected from a cosmid library made from AKR mouse DNA. It was obtained from M.Steinmetz, Basel (Yamamoto et al., 1985b). Plasmid pIi-5 containing most of the coding sequence of Ii chain and the 3' non-coding region has been described previously (Singer et al., 1984). It lacks the sequences coding for the cytoplasmic segment of Ii and part of the membrane-spanning region (Singer et al., 1984). Plasmid pγ2 was obtained from P.A.Peterson, Sweden. It encodes the entire human invariant chain (Claesson et al., 1983). Plasmid pIi-5 was used to locate the 3' end of the Ii gene and the 5' 320-bp PstI fragment was used to identify its 5' end.
DNA mapping, subcloning and sequencing
A restriction map of the 40-kb insert in clone cos 10.7 was established by using the methods of Rackwitz et al. (1985) and Zehetner and Lehrach (1986). By Southern blotting, using murine and human cDNAs as probes, two adjacent EcoRI fragments of 2.9 and 10 kb were identified to contain the Ii gene (Southern, 1975) (Figure 1). The 2.9-kb EcoRI fragment was further deletion subcloned by the method of Frischauf et al. (1980). Size-selected subclones were sequenced either by the Sanger dideoxy chain termination method (Sanger et al., 1977) or by the method of Labeit using α-phosphorothioates (Labeit et al., 1986). In the latter method the α-thiotriphosphate analogs of deoxynucleoside triphosphates are used to incorporate exonuclease III-resistant residues into DNA (Labeit et al., 1987).
The 10-kb EcoRI fragment was digested with HindIII and the resulting fragments subcloned into pUC8 or pBR322 (see Figure 1 for details). The 3' end of the 2.3-kb EcoRI-HindIII fragment was sequenced after cloning into pBR322. The 0.75-, 3.7- and 0.8-kb fragments were cloned into pUC8 and their 5' and 3' ends sequenced. The 3.7-kb fragment was deletion subcloned (Frischauf et al., 1980) and selected fragments with deletions in their 5' end were sequenced. The localization of the fragments is shown in Figure 1B.
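The relationship between restriction sites and fragment sizes that underlies this mapping can be illustrated in silico. The sketch below is a toy model, not the mapping procedure cited above: `digest` is a hypothetical helper, the input is treated as a linear molecule, and the demonstration sequence is invented. The recognition sites and cut positions (EcoRI G^AATTC, HindIII A^AGCTT) are the standard ones for these enzymes.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
SITES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
}


def digest(seq, enzyme):
    """Return fragment lengths after complete digestion of a linear sequence."""
    site, offset = SITES[enzyme]
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


if __name__ == "__main__":
    # Invented 26-bp sequence carrying two EcoRI sites and no HindIII site.
    toy = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
    print(digest(toy, "EcoRI"))
    print(digest(toy, "HindIII"))
```

Fragment lengths always sum to the input length, which is a convenient sanity check when comparing predicted and observed band sizes.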
Acknowledgements
We thank M.Steinmetz for the Ii genomic clone cos 10.7, P.Argos for search of the protein data base, M.Burmeister, S.Labeit, A.Nordheim, W.Rowekamp and K.Seedorf for many helpful discussions, M.-T.Haeuptle, V.Herzog, W.Huttner, S.Kvist and K.Simons for critical comments on the manuscript, A.Steiner for expert typing of the manuscript and P.Riedinger for the drawings. This work was supported by grants from the Deutsche Forschungsgemeinschaft, Ko 810/2-2 and Do 199/4-3.
References
Angel,P., Poting,A., Mallick,U., Rahmsdorf,H.J., Schorpp,M. and Herrlich,P. (1986) Mol. Cell. Biol., 6, 1760-1766.
Breathnach,R. and Chambon,P. (1981) Annu. Rev. Biochem., 50, 349-383.
Charron,D.J., Aellen-Schulz,M.-F., St Geme,J.,III, Erlich,H.A. and McDevitt,H.O. (1983) Mol. Immunol., 20, 21-32.
Claesson,L. and Peterson,P.A. (1983) Biochemistry, 22, 3206-3213.
Claesson,L., Larhammar,D., Rask,L. and Peterson,P.A. (1983) Proc. Natl. Acad. Sci. USA, 80, 7395-7399.
Claesson-Welsh,L., Barker,P.E., Larhammar,D., Rask,L., Ruddle,F.H. and Peterson,P.A. (1984) Immunogenetics, 20, 89-93.
Collins,T., Korman,A.J., Wake,C.T., Boss,J.M., Kappes,D.J., Fiers,W., Ault,K.A., Gimbrone,M.A., Strominger,J.L. and Pober,J.S. (1984) Proc. Natl. Acad. Sci. USA, 81, 4917-4921.
Cresswell,P. (1985) Proc. Natl. Acad. Sci. USA, 82, 8188-8192.
Frischauf,A.-M., Garoff,H. and Lehrach,H. (1980) Nucleic Acids Res., 8, 5541-5549.
Gidoni,D., Dynan,W.S. and Tjian,R. (1984) Nature, 312, 409-413.
Herzog,V. (1984) Int. Rev. Cytol., 91, 107-139.
Jones,P.P., Murphy,D.B., Hewgill,D. and McDevitt,H.O. (1978) Immunochemistry, 16, 51-60.
Koch,N. and Harris,A.W. (1984) J. Immunol., 132, 12-15.
Koch,N., Wong,G.H.W. and Schrader,J.W. (1984) J. Immunol., 132, 1361-1369.
Koch,N. and Hämmerling,G.J. (1986) J. Biol. Chem., 261, 3434-3440.
Kress,M., Glaros,D., Khoury,G. and Jay,G. (1983) Nature, 306, 602-604.
Kudo,J., Chao,L.-Y., Narni,F. and Saunders,G.F. (1985) Nucleic Acids Res., 13, 8827-8841.
Kvist,S., Wiman,K., Claesson,L., Peterson,P.A. and Dobberstein,B. (1982) Cell, 29, 61-69.
Labeit,S., Lehrach,H. and Goody,R.S. (1986) DNA, 5, 173-177.
Labeit,S., Lehrach,H. and Goody,R.S. (1987) Anal. Biochem., in press.
Leung,J.O., Holland,E.C. and Drickamer,K. (1985) J. Biol. Chem., 260, 12523-12527.
Lipman,D.J. and Pearson,W.R. (1985) Science, 227, 1435-1441.
Lipp,J. and Dobberstein,B. (1986) J. Cell Biol., 102, 2169-2175.
Long,E.O. (1985) Survey Immunol. Res., 4, 27-34.
Machamer,C.E. and Cresswell,P. (1982) J. Immunol., 129, 2564-2569.
Machamer,C.E. and Cresswell,P. (1984) Proc. Natl. Acad. Sci. USA, 81, 1287-1291.
Malthiery,Y. and Lissitzky,S. (1985) Eur. J. Biochem., 147, 53-58.
Mercken,L., Simons,M.-J., De Martynoff,G., Swillens,S. and Vassart,G. (1985a) Eur. J. Biochem., 147, 59-64.
Mercken,L., Simons,M.-J., Swillens,S., Massaer,M. and Vassart,G. (1985b) Nature, 316, 647-651.
Miller,J. and Germain,R.N. (1986) J. Exp. Med., 164, 1478-1489.
Momburg,F., Koch,N., Möller,P., Moldenhauer,G., Butcher,G.W. and Hämmerling,G.J. (1986) J. Immunol., 136, 940-948.
Musti,A.M., Avvedimento,E.V., Polistina,C., Ursini,V.M., Obici,S., Nitsch,L., Cocozza,S. and Di Lauro,R. (1986) Proc. Natl. Acad. Sci. USA, 83, 323-327.
Nathenson,S.G., Uehara,H., Ewenstein,B.M., Kindt,T.J. and Coligan,J.E. (1981) Annu. Rev. Biochem., 50, 1025-1052.
Nowell,J. and Quaranta,V. (1985) J. Exp. Med., 162, 1371-1376.
Omary,M.B. and Trowbridge,I.S. (1981) J. Biol. Chem., 256, 12888-12892.
O'Sullivan,D., Larhammar,D., Wilson,M.C., Peterson,P.A. and Quaranta,V. (1986) Proc. Natl. Acad. Sci. USA, 83, 4484-4488.
Polla,B.S., Poljak,A., Geier,S.G., Nathenson,S.G., Ohara,J., Paul,W.E. and Glimcher,L.H. (1986) Proc. Natl. Acad. Sci. USA, 83, 4878-4882.
Quaranta,V., Majdic,O., Stingl,G., Liszka,K., Honigsmann,H. and Knapp,W. (1984) J. Immunol., 132, 1900-1905.
Quill,H. and Schwartz,B.D. (1983) Mol. Immunol., 12, 1333-1345.
Rall,L.B., Scott,J., Bell,G.J., Crawford,R.J., Penschow,J.D., Niall,H.O. and Coghlan,J.P. (1985) Nature, 313, 228-231.
Rackwitz,H.R., Zehetner,G., Murialdo,H., Delius,H., Chai,J.-H., Poustka,A., Frischauf,A. and Lehrach,H. (1985) Gene, 40, 259-266.
Rahmsdorf,H.J., Koch,N., Mallick,U. and Herrlich,P. (1983) EMBO J., 2, 811-816.
Rudd,C.E., Bodmer,J.G., Bodmer,W.F. and Crumpton,M.J. (1985) J. Biol. Chem., 260, 1927-1936.
Sakano,H., Rogers,J.H., Huppi,K., Brack,C., Traunecker,A., Maki,R., Wall,R. and Tonegawa,S. (1979) Nature, 277, 627-633.
Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
Schneider,C., Owen,M.J., Banville,D. and Williams,J.G. (1984) Nature, 311, 675-678.
Sekaly,R.P., Tonnelle,C., Strubin,M., Mach,B. and Long,E.O. (1986) J. Exp. Med., 164, 1490-1504.
Singer,P.A., Lauer,W., Dembic,Z., Mayer,W.E., Lipp,J., Koch,N., Hämmerling,G.J., Klein,J. and Dobberstein,B. (1984) EMBO J., 3, 873-877.
Southern,E.M. (1975) J. Mol. Biol., 98, 503-517.
Strubin,M., Long,E.O. and Mach,B. (1986a) Cell, 47, 619-625.
Strubin,M., Berte,C. and Mach,B. (1986b) EMBO J., 5, 3483-3488.
Sung,E., Duncan,W.R., Streilein,J.W. and Jones,P.P. (1982) Immunogenetics, 16, 425-433.
Swillens,S., Ludgate,M., Mercken,L., Dumont,J.E. and Vassart,G. (1986) Biochem. Biophys. Res. Commun., 137, 142-148.
Transy,C., Lalanne,J.-L. and Kourilsky,P. (1984) EMBO J., 3, 2383-2386.
Tunnacliffe,A., Sims,J.E. and Rabbitts,T.H. (1986) EMBO J., 5, 1245-1252.
Unanue,E.R. (1984) Annu. Rev. Immunol., 2, 395-428.
Van Herle,A.J., Vassart,G. and Dumont,J.E. (1979) New Engl. J. Med., 301, 239-249.
Vassart,G., Bacolla,A., Brocas,H., Christophe,D., Martynoff,G.de, Lerich,A., Mercken,L., Parma,J., Pohl,V., Targovnik,H. and van Heuverswyn,B. (1985) Mol. Cell. Endocrinol., 40, 89-97.
Wharton,K.A., Yedvobnick,B., Finnerty,V.G. and Artavanis-Tsakonas,S. (1985) Cell, 40, 55-62.
Wollman,S.H. (1969) In Dingle,J.T. and Fell,H.B. (eds), Lysosomes in Biology and Pathology. North-Holland, Amsterdam, pp. 483-512.
Yamamoto,K., Floyd-Smith,G., Francke,U., Koch,N., Lauer,W., Dobberstein,B., Schäfer,R. and Hämmerling,G.J. (1985a) Immunogenetics, 21, 83-90.
Yamamoto,K., Koch,N., Steinmetz,M. and Hämmerling,G.J. (1985b) J. Immunol., 134, 3461-3467.
Yamamoto,T., Davis,C.G., Brown,M.S., Schneider,W.J., Casey,M.L., Goldstein,J.L. and Russell,D.W. (1984) Cell, 39, 27-38.
Zecher,R., Ballhausen,W., Reske,K., Linder,D., Schluter,M. and Stirm,S. (1984) Eur. J. Immunol., 14, 511-517.
Zehetner,G. and Lehrach,H. (1986) Nucleic Acids Res., 14, 335-349.
Received on February 19, 1987; revised on April 2, 1987