Adaptive Fuzzy Clustering

Nicolas Cebron and Michael R. Berthold
Department of Computer and Information Science, University of Konstanz
78457 Konstanz, Germany
{cebron,berthold}@inf.uni-konstanz.de
Abstract: Classifying large datasets without any a-priori information poses a problem, especially in the field of bioinformatics. In this work, we explore the task of classifying hundreds of thousands of cell assay images obtained by a high-throughput screening camera. The goal is to label a few selected examples by hand and to automatically label the rest of the images afterwards. Up to now, such images are classified by scripts and classification techniques that are designed to tackle a specific problem. We propose a new adaptive active clustering scheme, based on an initial Fuzzy c-means clustering and Learning Vector Quantization. This scheme can initially cluster large datasets unsupervised and then allows for adjustment of the classification by the user. Motivated by the concept of active learning, the learner tries to query the most useful examples in the learning process and therefore keeps the costs for supervision at a low level. A framework for the classification of cell assay images based on this technique is introduced. We compare our approach to other related techniques in this field based on several datasets.
I. INTRODUCTION

The development of high-throughput imaging instruments, e.g. fluorescence microscope cameras, resulted in them becoming a promising tool to study the effect of agents on different cell types. [...] classified by a biological expert who writes a script to analyze a cell assay. As the appearance of the cells in different assays changes, the scripts must be adapted individually. Finding the relevant features to classify the cell types correctly can be [...] interpret. As we are dealing with non-computer experts, we need models that [...] easily. We use the concept of clustering to reduce the complexity of our image dataset. Cluster analysis techniques have been widely used in the area [...] clustering may not be satisfactory, thus we need to adapt the clustering so that it reflects the desired classification of the user.
As we are dealing with a large amount of unlabeled data, the user should label only a small subset to train the classifier. Choosing randomly drawn examples from the dataset helps to improve the classification accuracy but needs a large number of iterations to converge. Instead of picking redundant examples, it would be better to pick those that can "help" to train the classifier. This is why we try to apply the concept of active learning to this task, where our learning algorithm has control over which parts of the input domain it receives information about from the user. This concept is very similar to the human form of learning, whereby problem domains are examined in an active manner.
After introducing the Cell Assay Image Miner in Section II, we propose a new clustering scheme that uses the fuzzy c-means algorithm with noise detection, which is described in Section III. A sampling scheme that makes use of the fuzzy memberships is proposed in Section IV. We show results in Section V and discuss related work in Section VI, before drawing conclusions in Section VII.

II. CELL ASSAY IMAGE MINING

In this section we introduce the Cell Assay Image Miner, a software to explore and categorize cell assay images. A typical cell assay image is shown in Fig. 1.

To identify interesting substructures in the image, the original image must be segmented in order to calculate the features
Published in: NAFIPS 2006: 2006 Annual Meeting of the North American Fuzzy Information Processing Society; Montreal, Canada, 3-6 June 2006 / IEEE. Piscataway, NJ: IEEE Operations Center, 2006, pp. 188-193. ISBN 1-4244-0363-4
Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-244036
for each cell individually. Unfortunately, the appearance of different cell types can vary dramatically. Therefore, different methods for segmentation have to be applied according to the different cell types. However, the individual cells in one image tend to look similar.
Currently, good results are obtained by an approach that detects a cell nucleus in an image based on a trained neural network. After this step, a region growing is performed in a similar manner to the approach described in [1]. The result of such a segmentation step is shown in Figure 2.
After the image has been segmented, we can calculate [...] that samples points [...] along a vector. The histogram features comprise the mean, variance, skewness, kurtosis, and entropy [...]. The Haralick texture features [3] are different statistics of the co-occurrence matrix of the gray level image. Four co-occurrence matrices from horizontal, vertical, diagonal, and antidiagonal [...] to form a combined feature vector. One possibility might be to assign weights to each feature in order to control its influence [...] these images constitute our feature vectors. As we can see from these preprocessing steps, the number of datapoints may become very large; as we segment thousands of images into small subimages (approximately 200 small cell images per original image), we reach an order of millions of images. Our goal is to classify the original images by classifying each individual cell within.
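To make the histogram features named above concrete, the following is a minimal sketch of the five statistics (mean, variance, skewness, kurtosis, entropy) computed from the gray-level histogram of one segmented cell image. The function name `histogram_features` and the flat list of integer gray values as input are assumptions made for illustration; the paper does not specify an implementation.

```python
import math
from collections import Counter

def histogram_features(pixels):
    """Return (mean, variance, skewness, kurtosis, entropy) of gray levels.

    `pixels` is a flat iterable of integer gray values (hypothetical input
    format chosen for this sketch).
    """
    counts = Counter(pixels)
    n = sum(counts.values())
    probs = {g: c / n for g, c in counts.items()}  # normalized histogram
    mean = sum(g * p for g, p in probs.items())
    var = sum(p * (g - mean) ** 2 for g, p in probs.items())
    std = math.sqrt(var)
    if std > 0:
        skew = sum(p * (g - mean) ** 3 for g, p in probs.items()) / std ** 3
        kurt = sum(p * (g - mean) ** 4 for g, p in probs.items()) / std ** 4
    else:
        # Constant region: the standardized moments are undefined, report 0.
        skew = kurt = 0.0
    entropy = -sum(p * math.log2(p) for p in probs.values())
    return mean, var, skew, kurt, entropy
```

Such a tuple would be one part of the combined feature vector of a cell image; how the parts are weighted against each other is left open here, as in the text.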
At the beginning, we do not have any labeled instances, but we can make use of a biological expert who is able to provide a class label for each cell image that is shown to him. The problem is to classify the whole dataset with as few labeling steps as possible. We have a certain degree of freedom considering the misclassification, as the whole image is classified by a majority decision over the small cell images. If a clear majority decision can be made, the image is not considered further. Borderline cases with equal distributions of classes are sorted into a special container to be assessed manually by the biological expert. This approach allows for a rather high fault tolerance [...]
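The majority decision described above can be sketched as follows. This is only an illustration: the function name `classify_image` is hypothetical, and treating an exact tie between the two most frequent classes as the "borderline" condition is an assumption, since the text does not state the exact criterion.

```python
from collections import Counter

def classify_image(cell_labels):
    """Majority decision over the predicted labels of an image's cells.

    Returns the winning label, or None when the most frequent classes tie,
    which marks the image for manual assessment by the expert.
    """
    ranked = Counter(cell_labels).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # borderline case: equal class distribution (assumed rule)
    return ranked[0][0]
```

Because a single image contains many cells, a few misclassified cells do not change the label of the whole image, which is the fault tolerance mentioned above.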
In the next section we propose a scheme that tackles this special setting by first clustering the whole unlabeled dataset unsupervised and then assigning class labels to the cluster prototypes. This classification can then be adjusted by the user; we propose a query function that tries to select the most useful examples by taking into account the fuzzy memberships.

III. FUZZY C-MEANS WITH NOISE DETECTION

The fuzzy c-means (FCM) algorithm [4] is a well-known unsupervised learning technique that can be used to reveal the underlying structure of the data based on a similarity measure. [...] We use the extended version from [5] for the added detection of noise. Let $x_i$, $i = 1, \dots, |T|$, denote the feature vectors of the dataset $T$ and $w_k$, $k = 1, \dots, c$, the prototypes of the $c$ clusters. $V$ is the matrix with coefficients where $v_{i,k}$ denotes the membership of $x_i$ to cluster $k$. Given a distance function $d$, the fuzzy c-means algorithm with noise detection iteratively minimizes the following objective function:

$$J_m = \sum_{i=1}^{|T|} \sum_{k=1}^{c} v_{i,k}^{m} \, d(w_k, x_i)^2 + \delta^2 \sum_{i=1}^{|T|} \Bigl(1 - \sum_{k=1}^{c} v_{i,k}\Bigr)^{m} \tag{1}$$

$m \in (1, \infty)$ is the fuzzification parameter and indicates how much the clusters are allowed to overlap each other. The first term corresponds to the normal fuzzy c-means objective function, whereas the second term arises from the noise cluster. Here $\delta$ denotes the distance of every datapoint to the noise cluster [...]. Datapoints that are far away from all cluster prototypes are thereby detected as having a high membership to the noise cluster. $J_m$ is subject to minimization under the constraint

$$\forall i: \quad 0 \le \sum_{k=1}^{c} v_{i,k} \le 1 \tag{2}$$
Fig. 3. Table showing each cell with its corresponding mask and numerical features.
FCM is often used when there is no a-priori information available and thus can serve as an overview technique.
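As an illustration of fuzzy c-means with a noise cluster, the sketch below uses the alternating update rules commonly given for noise clustering in the style of [5]: memberships are proportional to inverse squared distances, with the noise cluster treated as a prototype at constant distance `delta` from every point. The function names, the explicit `init_protos` argument, the squared Euclidean distance, and the fixed iteration count are assumptions of this sketch, not details taken from the paper.

```python
def _sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def _memberships(points, protos, delta, m):
    """v[i][k] for the regular clusters; 1 - sum(v[i]) is the noise membership."""
    expo = 1.0 / (m - 1.0)
    noise = (1.0 / delta ** 2) ** expo
    result = []
    for x in points:
        # Inverse squared distances; the floor avoids division by zero.
        inv = [(1.0 / max(_sq_dist(x, w), 1e-12)) ** expo for w in protos]
        total = sum(inv) + noise
        result.append([u / total for u in inv])
    return result

def noise_fcm(points, init_protos, delta, m=2.0, iters=50):
    """Alternate membership and prototype updates for a fixed number of steps."""
    protos = [list(w) for w in init_protos]
    dim = len(points[0])
    for _ in range(iters):
        v = _memberships(points, protos, delta, m)
        for k in range(len(protos)):
            weights = [row[k] ** m for row in v]
            norm = sum(weights)
            # Prototype = mean of the points, weighted by v^m.
            protos[k] = [sum(wt * x[d] for wt, x in zip(weights, points)) / norm
                         for d in range(dim)]
    return protos, _memberships(points, protos, delta, m)
```

A point whose memberships sum to much less than 1 belongs mostly to the noise cluster and can be filtered out before any labels are requested.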
IV. FROM CLUSTERING TO CLASSIFICATION

Based on the prototypes obtained from the FCM algorithm, we can classify the dataset by first providing the class label for each cluster prototype and then by assigning the class label of the closest prototype to each datapoint.
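The nearest-prototype rule described above can be written in a few lines. The function name `classify` and the use of squared Euclidean distance are assumptions for this sketch.

```python
def classify(points, prototypes, proto_labels):
    """Assign each datapoint the class label of its closest prototype."""
    labels = []
    for x in points:
        dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
        labels.append(proto_labels[dists.index(min(dists))])
    return labels
```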
Datapoints that are detected as noise are removed because they do not help to enhance the classification. We will give reasons for doing so later.
In order to have enough information about the general class label of the cluster itself that represents our current hypothesis, we perform a technique known as Cluster Mean selection [6].
Each cluster is split into subclusters; subsequently, the nearest neighbor of each cluster prototype is selected for the query procedure. If the class distribution within the current cluster is not homogeneous, we replace the prototype with the prototypes of the subclusters. We call this the exploration phase, as we are trying to get an overview of which kinds of categories exist in the dataset.
A common problem is that the cluster structure does not necessarily correspond to the distribution of the classes in the dataset. The redefinition of cluster prototypes could increase the classification accuracy. We make use of the Learning Vector Quantization algorithm for this task, which is described in the following section.
A. Learning Vector Quantization
Learning Vector Quantization [7] is a so-called competitive learning method. The detailed steps are given in Algorithm 1.
The algorithm works as follows: for each training pattern, the nearest prototype is identified and updated. The update depends on the class label of the prototype and the training pattern. If they possess the same class label, the prototype is moved closer to the pattern, otherwise it is moved away.
The learning rate ε controls the movement of the prototypes.
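The update rule just described can be sketched in Python as follows. This is a minimal illustration of basic LVQ, not the authors' implementation: the function name `lvq1`, the linear decay of the learning rate, and the fixed number of steps are assumptions of the sketch.

```python
import random

def lvq1(train, protos, proto_labels, eps=0.1, steps=200, seed=0):
    """Basic LVQ: `train` is a list of (x, label) pairs; returns moved prototypes."""
    rng = random.Random(seed)
    protos = [list(w) for w in protos]
    for t in range(steps):
        rate = eps * (1.0 - t / steps)   # assumed schedule: linear decay
        x, label = rng.choice(train)     # sampling with replacement
        # Index of the prototype closest to x (squared Euclidean distance).
        j = min(range(len(protos)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, protos[k])))
        # Attract if the labels agree, repel otherwise.
        sign = 1.0 if proto_labels[j] == label else -1.0
        protos[j] = [w + sign * rate * (a - w) for a, w in zip(x, protos[j])]
    return protos
```

With a single step, a prototype at (1, 0) labeled "A" and a training point (0, 0) labeled "B", the prototype is pushed away from the point, to (1.5, 0) for `eps=0.5`.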
The learning rate is decreased during the learning phase, a technique known as simulated annealing [8]. The LVQ algorithm terminates if the prototypes stop changing significantly. One basic requirement in the LVQ algorithm is
Algorithm 1 LVQ algorithm
1: Choose R initial prototypes for each class: m_1(k), m_2(k), ..., m_R(k), k = 1, 2, ..., K, e.g. by sampling R training points at random from each class.
2: Sample a training point x_i randomly (with replacement) and let m_j(k) denote the closest prototype to x_i. Let g_i denote the class label of x_i and g_j the class label of the prototype.
3: if g_i = g_j then {that is, they belong to the same class}
4:    move the prototype toward the training point: m_j(k) ← m_j(k) + ε(x_i - m_j(k)), where ε is the learning rate.
5: end if
6: if g_i ≠ g_j then {that is, they belong to different classes}
7:    move the prototype away from the training point: m_j(k) ← m_j(k) - ε(x_i - m_j(k)).
8: end if
9: Repeat step 2, decreasing the learning rate ε to zero with each iteration.
that we can provide a class label for every training point x_i that is randomly sampled. We assume that the training set is unlabeled; however, an expert can provide us with class labels for some selected examples. As we can only label a small set [...] it influences the performance of the classification. Assuming access to a noiseless oracle, it is vital to gain as much information as possible from the smallest possible number of examples. If we act on the assumption that the underlying structure found by the FCM algorithm already inheres an [...]
The ranking is based on the fuzzy memberships and can be expressed for each datapoint $x_i$ as follows:

$$\mathrm{Rank}(x_i) = 1 - \min_{k \neq l} \lvert v_{i,k} - v_{i,l} \rvert, \quad k, l = 1, \dots, c \tag{3}$$

Note that we also take into account the class label of each cluster. Only if the clusters correspond to different classes is the rank computed.
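A small sketch of the ranking function may help. It evaluates one minus the smallest absolute membership difference over all pairs of clusters that carry different class labels. The function name `rank`, the argument layout, and returning `None` when no such pair exists are assumptions of this sketch.

```python
from itertools import combinations

def rank(memberships, cluster_classes):
    """Rank of one datapoint from its memberships v_{i,1..c}.

    Only cluster pairs (k, l) with different class labels are considered;
    returns None when no such pair exists (the rank is then not computed).
    """
    diffs = [
        abs(memberships[k] - memberships[l])
        for k, l in combinations(range(len(memberships)), 2)
        if cluster_classes[k] != cluster_classes[l]
    ]
    return 1.0 - min(diffs) if diffs else None
```

A point with memberships (0.5, 0.5) to two clusters of different classes obtains the maximum rank 1, while a point with memberships (0.9, 0.1) obtains a rank of about 0.2.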
After all datapoints are ranked, we can select a subset with high ranks to perform the next step: diversity selection. This prevents the active clustering scheme from choosing points that are too close to each other (and therefore are together not that interesting). We refer to the farthest-first traversal [10] usually used in clustering. It selects the most diverse examples by choosing the first point at random and the next points as farthest away from the current set of selected instances. The distance $d$ from a datapoint $x$ to the set $S$ is defined as $d(S, x) = \min_{y \in S} d(x, y)$, known as the min-max distance.
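The farthest-first traversal can be sketched as below. The function name `farthest_first`, the Euclidean distance, and the `seed` argument for the random first pick are assumptions made for this illustration.

```python
import math
import random

def farthest_first(points, n, seed=0):
    """Return indices of n diverse points.

    The first point is chosen at random; each further point maximizes
    d(S, x) = min over y in S of d(x, y) with respect to the selected set S.
    """
    rng = random.Random(seed)
    selected = [rng.randrange(len(points))]
    # dist_to_set[i] holds d(S, x_i) for the current selection S.
    dist_to_set = [math.dist(p, points[selected[0]]) for p in points]
    while len(selected) < min(n, len(points)):
        candidates = [i for i in range(len(points)) if i not in selected]
        nxt = max(candidates, key=lambda i: dist_to_set[i])
        selected.append(nxt)
        dist_to_set = [min(d, math.dist(p, points[nxt]))
                       for d, p in zip(dist_to_set, points)]
    return selected
```

Applied to the highly ranked datapoints only, this yields a batch that covers the decision boundary instead of concentrating on one spot.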
While taking into account samples at the decision boundaries between clusters, the current hypothesis should also be verified. A cluster mean selection step as mentioned in the exploration phase helps to consolidate the classification.

We summarize the procedure we have developed so far in the following section.

[...] space is small, we can label datapoints close to the prototype with a high confidence, whereas the confidence is lower for points lying between different clusters.

C. Adaptive Active Classification

Our adaptive active classification procedure is based on a combination of the techniques that have been mentioned above. All steps are listed in Algorithm 2. We start to cluster our dataset with the fuzzy c-means algorithm, because we expect dense regions in the feature space that are likely to bear the same class label. Therefore, the fuzzy c-means algorithm gives us a good initialization and prevents us from labeling unnecessary instances.

The noise detection in the clustering procedure serves the same purpose: datapoints that represent borderline cases should not be selected, as these noise labels would influence the classification in a negative way. Furthermore, these samples would be useless for the classification. However, note that in this manner, we are able to present unusual and [...] cases to the user that could be interesting to him.

Fig. 4. Two clusters that overlap.
[...] applicable to the fuzzy setting. Rather than dynamically choosing one example for the labeling procedure (which would slow down the process), we focus on a selection technique that selects a small batch of N samples to be labeled. Note that a data item $x_i$ is considered as belonging to cluster $k$ if $v_{i,k}$ is the highest among all the membership values. If we consider the datapoints between two clusters, they must have an almost equal membership to both of them. The selection is performed in two steps: initially, all datapoints are ranked according to their memberships to cluster prototypes. Subsequently, the most diverse examples are chosen from this pool of examples to avoid choosing points that are too close to each other. After a batch of N examples has been selected from within each cluster and from the borders of the clusters, the user interaction takes place: the expert has to label each example. The newly labeled samples are then added to the current set of labeled samples L. After this step, the cluster prototypes can be moved based on the training set L.

The question is when to stop the movement of the prototypes. The simulated annealing in the LVQ algorithm will stop the movement after a certain number of iterations. However, an acceptable solution may be found earlier, which is why we propose further stopping criteria:

1) Validity Measures: Validity measures can give us information about the quality of the clustering [11]. We employ the within-cluster variation and the between-cluster variation as an indicator. This descriptor can be useful for the initial selection of attributes.
Algorithm 2 Adaptive Active Clustering Procedure
1: L ← ∅
2: Perform the fuzzy c-means algorithm with noise detection (unsupervised).
3: Filter out noise datapoints.
4: while classification accuracy needs improvement do
5:    Select N training examples within the clusters and from the borders.
6:    Ask the user for the labels of these samples, add them to L.
7:    Move the prototypes according to L.
8:    Decrease the learning rate ε.
9: end while
Naturally, the significance of this method decreases with the subsequent steps of labeling and adaptation of the cluster prototypes.
2) Classification Gradient: We can make use of the already labeled examples to compare the previous to the newly obtained results. After the labels of the samples inside and between the clusters have been obtained, the cluster prototypes are moved. The new classification of the dataset is derived by assigning to each datapoint the class of its closest cluster prototype. By comparing the labels given by the user to the newly obtained labels from the classification, we can calculate the ratio of the number of correctly labeled samples to the number of falsely labeled examples.
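The ratio used by this stopping criterion can be computed directly from the two label lists. The function name `label_agreement_ratio` and the choice to return infinity when no sample is mislabeled are assumptions of this sketch.

```python
def label_agreement_ratio(user_labels, predicted_labels):
    """Ratio of correctly to falsely labeled samples among the queried examples.

    Both lists are assumed to have the same length and order.
    """
    correct = sum(u == p for u, p in zip(user_labels, predicted_labels))
    wrong = len(user_labels) - correct
    return float("inf") if wrong == 0 else correct / wrong
```

Tracking this value from one labeling round to the next indicates whether moving the prototypes still improves the classification.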
Fig. 6. [...] selection (left) and diversity [...]
3) Tracking: Another indicator for acceptable classification accuracy is to track the movement of the cluster prototypes. If [...]

[...] presenting the numerical features [...] corresponding image of the data tuple [...] the nearest members to the actual cluster and the samples at the boundary between two clusters if they are in different clusters.

V. EXPERIMENTAL RESULTS

As we do not have a completely classified dataset of cell assay images, we demonstrate the effectiveness of our adaptive clustering scheme first on an artificial dataset, and then on the satimage dataset from the UCI Machine Learning Repository [12].

Figure 5 shows the artificial two-dimensional test data in a scatterplot. The distribution of the classes is skewed. Figure 6 clarifies the difference between random selection on the left side and the examples chosen with ranking and diversity selection on the right side. The latter helps the LVQ algorithm to improve the classification accuracy more quickly, as can be seen in Figure 7.

Fig. 7. Active vs. Random Selection

As can be clearly seen, the active selection of datapoints in the learning process of the LVQ algorithm leads to a [...]

[...] with random selection against our sampling scheme on the [...] satimage [...], a set that contains [...] cases [...] classes in a 36-dimensional feature space. Although this dataset does not [...] for which our scheme has been developed, the [...] the normal random selection, see Figure 8.

Fig. 8. Active vs. Random Selection on the Satimage Dataset

VI. RELATED WORK

[...] constraints; in the second, consolidate phase, the initial neighborhoods are [...] by picking new examples randomly [...] by providing constraints for [...] points.

In [14], an approach for active semi-supervised clustering for image database categorization is investigated. It includes a cost-factor for violating pairwise constraints in the objective function of the Fuzzy c-means algorithm. The active selection of constraints looks for samples at the border of the least well-defined cluster in the current iteration. In the work of [15], labeled patterns are incorporated in the objective function of the Fuzzy ISODATA algorithm. All these approaches take a set of labeled patterns or constraints as input before the clustering process starts. These samples are selected randomly.

In [16], a very similar approach has been proposed that selects the points to query based on the Voronoi diagram that is induced by the reference vectors. The datapoints to query are selected from the set of Voronoi vertices with different strategies.

The novelty of our approach is in the way that data is preclustered before supervision takes place, which enhances the classification accuracy.
VII. CONCLUSION

In this work, we have addressed the problem of classifying a large dataset when only a few labeled examples can be provided by the user. We have shown that the fuzzy c-means algorithm is well-suited to be applied to stable initial clustering and that it has the advantage that datapoints on the border can easily be detected by scanning through their memberships to the cluster prototypes. Based on the labels of the selected examples from the borders between clusters and the labeled cluster prototypes, we have proposed to move the cluster prototypes, similar to the Learning Vector Quantization (LVQ) method. We have shown that the misclassification rate can be improved faster than in the normal LVQ method.

ACKNOWLEDGMENT

This work was supported by the DFG Research Training Group GK-1042 "Explorative Analysis and Visualization of Large Information Spaces".
REFERENCES

[1] T. Jones, A. Carpenter, and P. Golland, "Voronoi-based segmentation of cells on image manifolds," Computer Vision for Biomedical Image Applications, LNCS, vol. 3765, 2005.
[2] F. Zernike, "Diffraction theory of the cut procedure and its improved form, the phase contrast method," Physica, vol. 1, pp. 689-704, 1934.
[3] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, 1973.
[4] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[5] R. N. Dave, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12, no. 11, pp. 657-664, 1991.
[6] B. Gabrys and L. Petrakieva, "Combining labelled and unlabelled data in the design of pattern classification systems," International Journal of Approximate Reasoning, 2004.
[7] T. Kohonen, Self-Organizing Maps. Heidelberg: Springer Verlag, 1995.
[8] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
[9] H. Nguyen and A. Smeulders, "Active learning using pre-clustering," ICML 2004.
[10] D. Hochbaum and D. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.
[11] M. Windham, "Cluster validity for fuzzy clustering algorithms," Fuzzy Sets and Systems, vol. 5, pp. 177-185, 1981.
[12] C. B. D. J. Newman, S. Hettich, and C. Merz, "UCI repository of machine learning databases," 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[13] S. Basu, A. Banerjee, and R. J. Mooney, "Active semi-supervision for pairwise constrained clustering," Proceedings of the SIAM International Conference on Data Mining (SDM-2004), 2004.
[14] N. Grira, M. Crucianu, and N. Boujemaa, "Active semi-supervised clustering for image database categorization," Content-Based Multimedia [...]
[15] W. Pedrycz and J. Waletzky, "Fuzzy clustering with partial supervision," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 27, pp. 787-795, 1997.
[16] M. Hasenjäger and H. Ritter, "Active learning with local models," Neural Processing Letters, vol. 7, pp. 107-117, 1998.