
Adaptive Fuzzy Clustering

Nicolas Cebron and Michael R. Berthold
Department of Computer and Information Science, University of Konstanz
78457 Konstanz, Germany
{cebron,berthold}@inf.uni-konstanz.de

Published in: NAFIPS 2006: 2006 Annual Meeting of the North American Fuzzy Information Processing Society; Montreal, Canada, 3-6 June 2006. Piscataway, NJ: IEEE Operations Center, 2006, pp. 188-193. ISBN 1-4244-0363-4.
Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-244036

Abstract—Classifying large datasets without any a-priori information poses a problem, especially in the field of bioinformatics. In this work, we explore the task of classifying hundreds of thousands of cell assay images obtained by a high-throughput screening camera. The goal is to label a few selected examples by hand and to automatically label the rest of the images afterwards. Up to now, such images are classified by scripts and classification techniques that are designed to fit a specific problem. We propose a new adaptive active clustering scheme, based on an initial Fuzzy c-means clustering and Learning Vector Quantization. This scheme can initially cluster large datasets unsupervised and then allows for adjustment of the classification by the user. Furthermore, by making use of the concept of active learning, the learner tries to query the most useful examples in the learning process and therefore keeps the costs for supervision at a low level. A framework for the classification of cell assay images based on this technique is introduced. We compare our approach to other related techniques in this field based on several datasets.

I. INTRODUCTION

The development of high-throughput imaging instruments, e.g. fluorescence microscope cameras, has resulted in these becoming a promising tool to study the effect of agents on different cell types. Thousands of images are produced, which are currently classified by a biological expert who writes a script to analyze the cell assay. As the appearance of the cells in different assays changes, the scripts have to be adapted accordingly; manually adjusting the relevant features to classify the cell types correctly can be difficult, and the resulting scripts are hard to interpret. As we are dealing with non-computer experts, we need models that are intuitively easy to grasp. We use the concept of clustering to reduce the complexity of our image dataset. Cluster analysis techniques have been widely used in this area; however, an unsupervised clustering may not be satisfactory, thus we need to adapt the clustering so that it reflects the desired classification of the user.

As we are dealing with a large amount of unlabeled data, the user should label only a small subset to train the classifier. Choosing randomly drawn examples from the dataset helps to improve the classification accuracy but needs a large number of iterations to converge. Instead of picking redundant examples, it would be better to pick those that can "help" to train the classifier.

That is why we try to apply the concept of active learning to this task, where our learning algorithm has control over which parts of the input domain it receives information about from the user. This concept is very similar to the human form of learning, whereby problem domains are examined in an active manner.

After introducing the Cell Assay Image Miner in Section II, we propose a new clustering scheme that uses the Fuzzy c-means algorithm with noise detection, which is described in Section III. A sampling scheme that makes use of the fuzzy memberships is proposed in Section IV. We show results in Section V and discuss related work in Section VI, before drawing conclusions in Section VII.

II. CELL ASSAY IMAGE MINING

In this section we introduce the Cell Assay Image Miner, a software tool to explore and categorize cell assay images. A typical cell assay image is shown in Figure 1. To identify interesting substructures in the image, the original image must be segmented in order to calculate the features

for each cell individually. Unfortunately, the appearance of different cell types can vary dramatically. Therefore, different methods for segmentation have to be applied according to the different cell types. However, the individual cells in one image tend to look similar.

Currently, good results are obtained by an approach that detects a cell nucleus in an image based on a trained neural network. After this step, a region growing is performed in a similar manner to the approach described in [1]. The result of such a segmentation step is shown in Figure 2.

After the image has been segmented, we can calculate the features for each cell. Histogram features are computed from a histogram that samples points in an image along a vector; they comprise the mean, variance, skewness, kurtosis, and entropy of the grey values. Haralick texture features [3] represent statistics of the co-occurrence matrix of the grey level image; four co-occurrence matrices from the horizontal, vertical, diagonal, and anti-diagonal directions are combined to form a feature vector. One possibility might be to assign weights to each feature in order to control its influence. All the features computed on these images constitute our feature vectors. As we can see from these preprocessing steps, the number of datapoints may become very large: as we segment thousands of images into small subimages (approximately 200 small cell images per original image), we reach an order of millions of images.
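As a concrete illustration of the first-order histogram statistics mentioned above, the following minimal NumPy sketch computes mean, variance, skewness, kurtosis, and histogram entropy for one cell subimage. It is not the authors' implementation; the function name, the bin count, and the use of excess kurtosis are our own choices.

```python
import numpy as np

def histogram_features(pixels, bins=64):
    """First-order statistics of a cell subimage's grey values:
    mean, variance, skewness, excess kurtosis, and histogram entropy."""
    x = np.asarray(pixels, dtype=float).ravel()
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var) + 1e-12                      # guard against flat patches
    skew = np.mean(((x - mean) / std) ** 3)
    kurt = np.mean(((x - mean) / std) ** 4) - 3.0   # excess kurtosis
    hist, _ = np.histogram(x, bins=bins)
    p = hist / (hist.sum() + 1e-12)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([mean, var, skew, kurt, entropy])
```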

Our goal is to classify the original images by classifying each individual cell within them.

At the beginning, we do not have any labeled instances, but we can make use of a biological expert who is able to provide a class label for each cell image that is shown to him. The problem is to classify the whole dataset with as few labeling steps as possible. We have a certain degree of freedom considering the misclassification of single cells, as the whole image is classified by a majority decision over the small cell images. If a clear majority decision can be made, the image is not considered further. Borderline cases with an almost equal distribution of classes are sorted into a special container to be assessed manually by the biological expert. In this way, the approach allows for a rather high fault tolerance: a certain number of misclassified cells in an image can be tolerated.
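The majority decision over the cells of one image can be sketched as follows; the `margin` threshold that routes borderline images to the manual-review container is a hypothetical parameter, since the paper does not specify how "almost equal" is decided.

```python
from collections import Counter

def classify_image(cell_labels, margin=0.2):
    """Aggregate per-cell class labels into one label for the whole image.
    Returns the majority class, or None if the vote is too close to call
    (borderline images go into a container for manual review)."""
    counts = Counter(cell_labels).most_common(2)
    if len(counts) == 1:
        return counts[0][0]
    (top, n_top), (_, n_second) = counts
    if (n_top - n_second) / len(cell_labels) < margin:
        return None          # borderline case: almost equal distribution
    return top
```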

In the next section we propose a scheme that tackles this special setting by first clustering the whole unlabeled dataset unsupervised and then assigning class labels to the cluster prototypes. This classification can then be adjusted by the user; we propose a query function that tries to select the most useful examples by taking into account the fuzzy memberships.

III. FUZZY C-MEANS WITH NOISE DETECTION

The fuzzy c-means (FCM) algorithm [4] is a well-known unsupervised learning technique that can be used to reveal the underlying structure of the data based on a similarity measure. The dataset X = \{x_1, \ldots, x_n\} is partitioned into c clusters; V is the matrix of coefficients, where v_{i,k} denotes the membership of x_i to cluster k. Given a distance function d and the cluster prototypes w_1, \ldots, w_c, we use the extended version from [5] with an added detection of noise, whose objective function is

J_m = \sum_{k=1}^{c} \sum_{i=1}^{n} v_{i,k}^{m} \, d^2(w_k, x_i) + \delta^2 \sum_{i=1}^{n} \Bigl(1 - \sum_{k=1}^{c} v_{i,k}\Bigr)^{m}. \quad (1)

Here m \in (1, \infty) is the fuzzification parameter and indicates how much the clusters are allowed to overlap each other. The first term corresponds to the normal fuzzy c-means objective function, whereas the second originates from the noise cluster, a virtual cluster that lies at the constant distance \delta from every datapoint: datapoints that are not close to any of the regular prototypes are detected as noise by having a high membership 1 - \sum_{k} v_{i,k} to the noise cluster. J_m is subject to minimization under the constraint

\sum_{k=1}^{c} v_{i,k} \le 1 \quad \text{for all } i = 1, \ldots, n. \quad (2)


Fig. 3. Table showing each cell with its corresponding mask and numerical features.

FCM is often used when there is no a-priori information available and thus can serve as an overview technique.
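The following NumPy sketch illustrates the alternating updates that minimize an objective of this form, treating the noise cluster as one additional cluster at the constant distance delta. It is not the authors' implementation; the function name, defaults, and the random initialization are illustrative only.

```python
import numpy as np

def fcm_noise(X, c, m=2.0, delta=1.0, n_iter=100, seed=0):
    """Minimal sketch of fuzzy c-means with a noise cluster in the spirit of [5].
    X: (n, d) data; c: number of regular clusters; delta: fixed noise distance.
    Returns prototypes W (c, d), memberships V (n, c), and noise memberships (n,)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=c, replace=False)].astype(float)     # initial prototypes
    for _ in range(n_iter):
        # squared distances to the c prototypes, plus the constant noise distance
        d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
        d2 = np.hstack([d2, np.full((len(X), 1), delta ** 2)]) + 1e-12  # (n, c+1)
        inv = d2 ** (-1.0 / (m - 1.0))
        V = inv / inv.sum(axis=1, keepdims=True)                        # membership update
        fuzz = V[:, :c] ** m                                            # prototype update uses
        W = (fuzz.T @ X) / fuzz.sum(axis=0)[:, None]                    # regular clusters only
    return W, V[:, :c], V[:, c]
```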

IV. FROM CLUSTERING TO CLASSIFICATION

Based on the prototypes obtained from the FCM algorithm, we can classify the dataset by first providing the class label for each cluster prototype and then by assigning the class label of the closest prototype to each datapoint.

Datapoints that are detected as noise are removed because they do not help to enhance the classification. We will give reasons for doing so later.

In order to have enough information about the general class label of the cluster itself that represents our current hypothesis, we perform a technique known as Cluster Mean selection [6]. Each cluster is split into subclusters; subsequently, the nearest neighbor of each cluster prototype is selected for the query procedure. If the class distribution within the current cluster is not homogeneous, we replace the prototype with the prototypes of the subclusters. We call this the exploration phase, as we are trying to get an overview of which kinds of categories exist in the dataset.

A common problem is that the cluster structure does not necessarily correspond to the distribution of the classes in the dataset. The redefinition of cluster prototypes could increase the classification accuracy. We make use of the Learning Vector Quantization algorithm for this task, which is described in the following section.

A. Learning Vector Quantization

Learning Vector Quantization [7] is a so-called competitive learning method. The detailed steps are given in Algorithm 1. The algorithm works as follows: for each training pattern, the nearest prototype is identified and updated. The update depends on the class label of the prototype and the training pattern. If they possess the same class label, the prototype is moved closer to the pattern, otherwise it is moved away.

The learning rate ε controls the movement of the prototypes. The learning rate is decreased during the learning phase, a technique known as simulated annealing [8]. The LVQ algorithm terminates when the prototypes stop changing significantly.

Algorithm 1 LVQ algorithm
1: Choose R initial prototypes for each class, m_1(k), m_2(k), ..., m_R(k), k = 1, 2, ..., K, e.g. by sampling R training points at random from each class.
2: Sample a training point x_i randomly (with replacement) and let m_j(k) denote the closest prototype to x_i. Let g_i denote the class label of x_i and g_j the class label of the prototype.
3: if g_i = g_j then {that is, they belong to the same class}
4:   move the prototype toward the training point: m_j(k) ← m_j(k) + ε(x_i − m_j(k)), where ε is the learning rate.
5: end if
6: if g_i ≠ g_j then {that is, they belong to different classes}
7:   move the prototype away from the training point: m_j(k) ← m_j(k) − ε(x_i − m_j(k)).
8: end if
9: Repeat step 2, decreasing the learning rate ε to zero with each iteration.
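A compact sketch of Algorithm 1 in NumPy is given below (an LVQ1-style update with a linearly annealed learning rate); the prototype initialization and the annealing schedule are simplifications of our own.

```python
import numpy as np

def lvq(X, y, prototypes, proto_labels, epsilon=0.1, n_steps=1000, seed=0):
    """Minimal sketch of Algorithm 1: prototypes are attracted to training
    points of the same class and repelled by points of other classes,
    with a learning rate annealed towards zero."""
    rng = np.random.default_rng(seed)
    P = np.asarray(prototypes, dtype=float).copy()
    for t in range(n_steps):
        i = rng.integers(len(X))                       # sample a training point
        j = np.argmin(((P - X[i]) ** 2).sum(axis=1))   # closest prototype
        lr = epsilon * (1.0 - t / n_steps)             # decrease learning rate to zero
        if proto_labels[j] == y[i]:
            P[j] += lr * (X[i] - P[j])                 # move toward the point
        else:
            P[j] -= lr * (X[i] - P[j])                 # move away from the point
    return P
```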


One basic requirement of the LVQ algorithm is that we can provide a class label for each training point x_i that is randomly sampled, which is difficult since the entire training set is unlabeled. However, an expert can provide us with class labels for some selected examples. As we can only afford to label a small set of examples, the way in which these examples are chosen influences the performance of the classification. Assuming access to a noiseless oracle, it is vital to gain as much information as possible from the smallest possible number of examples. If we act on the assumption that the underlying structure found by the FCM algorithm already approximates the class distribution, the most informative datapoints are those whose cluster assignment is ambiguous; we therefore rank all datapoints.

The ranking is based on the fuzzy memberships and can be expressed for each datapoint x_i as follows:

\mathrm{Rank}(x_i) = 1 - \min_{k \neq l} \bigl| v_{i,k} - v_{i,l} \bigr|, \quad k, l = 1, \ldots, c. \quad (3)

Note that we also take into account the class label of each cluster: only if the clusters correspond to different classes is the rank computed.

After all datapoints are ranked, we can select a subset with high ranks to perform the next step: diversity selection. This prevents the active clustering scheme from choosing points that are too close to each other (and therefore are together not that interesting). We refer to the farthest-first traversal [10] usually used in clustering. It selects the most diverse examples by choosing the first point at random and the next points as farthest away from the current set of selected instances. The distance d from a datapoint x to the set S is defined as d(S, x) = \min_{y \in S} d(x, y), known as the min-max distance.
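The two selection steps can be sketched as follows. The ranking follows Equation (3), restricted (under one reading of the text) to pairs of clusters that carry different class labels, and the diversity selection is a plain farthest-first traversal [10] with Euclidean distances; all function names are illustrative.

```python
import numpy as np

def rank_datapoints(V, cluster_classes):
    """Rank(x_i) = 1 - min_{k != l} |v_ik - v_il|, computed only over pairs of
    clusters that carry different class labels (V is the n x c membership matrix)."""
    n, c = V.shape
    ranks = np.zeros(n)
    for k in range(c):
        for l in range(k + 1, c):
            if cluster_classes[k] != cluster_classes[l]:
                ranks = np.maximum(ranks, 1.0 - np.abs(V[:, k] - V[:, l]))
    return ranks

def farthest_first(X, n_select, seed=0):
    """Diversity selection by farthest-first traversal: start at a random point,
    then repeatedly add the point with the largest min-max distance
    d(S, x) = min_{y in S} d(x, y) to the already selected set S."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    d_to_S = np.linalg.norm(X - X[selected[0]], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(d_to_S))
        selected.append(nxt)
        d_to_S = np.minimum(d_to_S, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```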

While taking into account samples at the decision boundaries between clusters, the current hypothesis should also be verified. A cluster mean selection step as mentioned in the exploration phase helps to consolidate the classification. We summarize the procedure we have developed so far in the following section.

When the extent of a cluster in feature space is small, we can label datapoints close to the prototype with a high confidence, whereas the confidence is lower for points lying between different clusters.

C. Adaptive Active Classification

Our adaptive active classification procedure is based on a combination of the techniques that have been mentioned above. All steps are listed in Algorithm 2. We start by clustering our dataset with the fuzzy c-means algorithm, because we expect dense regions in the feature space that are likely to bear the same class label; therefore, the fuzzy c-means algorithm gives us a good initialization and prevents us from labeling unnecessary instances.

The noise detection in the clustering procedure serves the same purpose: noisy datapoints that represent borderline cases should not be selected, as these noise labels would influence the classification in a negative way. Furthermore, these samples would be useless for the classification. However, note that in this manner we are able to present unusual and/or outlier cases to the user that could be interesting to him.

Fig. 4. Two clusters that overlap.

Rather than greedily choosing one example at a time for the labeling procedure (which would slow down the process), we focus on a selection technique that selects a small batch of N samples to be labeled. Note that a data item x_i is considered as belonging to cluster k if v_{i,k} is the highest among its membership values. If we consider the datapoints between two clusters, they must have an almost equal membership to both of them. The selection is performed in two steps: initially, all datapoints are ranked according to their memberships to the cluster prototypes; subsequently, the most diverse examples are chosen from this pool of examples to avoid choosing points that are too close to each other.

After a batch of N examples has been selected from within each cluster and from the borders of the clusters, the user interaction takes place: the expert is asked to label each example. The newly labeled samples are then added to the training set L of labeled samples. After this step, the cluster prototypes can be moved based on the training set L.

iriovedbiised orrthil 't1afrrtng

i

set

L;;';

,!.;-.

"::P.;:·< ... "

The .. quesli~niS' wlle~ to s~op the Il)ove~e~t of th~ PI~t'(). : types;Thesimu1ateilanl1ealiog'ili'Jhe'WQalgQriChntiW{listoi1 th:e',ri1ovemenfaJ'rera certiiiil' humber of'iterations;'.Howevet;' ; an'1!cl:eplable'SOlutton· ma}h'be, fOilifd:earlier,·which-is'.why'We", prtJposei.fu.rtMr:·sto)Jpintrctiteria: .. .. . ,.~. ,"'.~'" ",·i~"

J)' Vdlitiiiy' Me-al'urts:':Cari' :give'us il'lfotmal\on-·t>f·lhe quqljW of the elilsteriillr (11 }:We employ the withjn

·duster ·

vatititidn'a'nd th.e befWeenCl~tet vailation'as 'an,indfctlttlr:·TlUsJ . descriptor' can ba useful for.the-initial :selection QflUtrib\l.tes.

Algorithm 2 Adaptive Active Clustering Procedure
1: L ← ∅
2: Perform the fuzzy c-means algorithm with noise detection (unsupervised).
3: Filter out noise datapoints.
4: while classification accuracy needs improvement do
5:   Select k training examples within the clusters and from the borders.
6:   Ask the user for the labels of these samples, add them to L.
7:   Move the prototypes according to L.
8:   Decrease the learning rate ε.
9: end while
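To show how the pieces fit together, here is a hypothetical driver for Algorithm 2 that reuses the sketch functions from the previous sections (fcm_noise, rank_datapoints, farthest_first, lvq). The oracle callback stands in for the biological expert; the noise-filtering rule, pool size, round count, and learning-rate schedule are placeholder choices, not taken from the paper.

```python
import numpy as np

def adaptive_active_clustering(X, c, oracle, n_rounds=10, batch=10, epsilon=0.1):
    """Illustrative driver for Algorithm 2. `oracle(x)` returns the expert's label for x."""
    W, V, noise = fcm_noise(X, c)                                  # steps 1-2
    keep = noise < V.max(axis=1)                                   # step 3: drop noise points
    X, V = X[keep], V[keep]
    # label each cluster prototype via its nearest datapoint (cluster mean selection)
    proto_labels = np.array(
        [oracle(X[np.argmin(((X - w) ** 2).sum(axis=1))]) for w in W])
    L_x, L_y = list(W), list(proto_labels)                         # labeled training set L
    for t in range(n_rounds):                                      # step 4
        ranks = rank_datapoints(V, proto_labels)                   # step 5: border candidates,
        pool = np.argsort(ranks)[-5 * batch:]                      # thinned out for diversity
        chosen = [int(pool[i]) for i in farthest_first(X[pool], batch)]
        L_x += [X[i] for i in chosen]                              # step 6: query the expert
        L_y += [oracle(X[i]) for i in chosen]
        lr = epsilon * (1.0 - t / n_rounds)                        # step 8: decrease learning rate
        W = lvq(np.array(L_x), np.array(L_y), W, proto_labels,     # step 7: move prototypes
                epsilon=lr, n_steps=200)
        # (the memberships V could be recomputed from the moved prototypes here)
    return W, proto_labels
```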

Naturally, the significance of this method decreases with the subsequent steps of labeling and adaptation of the cluster prototypes.

2) Classification Gradient: We can make use of the already labeled examples to compare the previous to the newly obtained results. After the labels of the samples in and between the clusters have been obtained, the cluster prototypes are moved. The new classification of the dataset is derived by assigning to each datapoint the class of its closest cluster prototype. By comparing the labels given by the user to the newly obtained labels from the classification, we can calculate the ratio of the number of correctly labeled samples to the number of falsely labeled examples.

Fig. 6. Datapoints chosen by random selection (left) and by ranking with diversity selection (right).
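A minimal sketch of this indicator is given below; the function name and the guard against division by zero are our own, and the ratio follows the wording above (correctly labeled over falsely labeled).

```python
def classification_gradient(user_labels, predicted_labels):
    """Ratio of correctly to falsely classified labeled samples after an update."""
    correct = sum(u == p for u, p in zip(user_labels, predicted_labels))
    wrong = len(user_labels) - correct
    return correct / max(wrong, 1)   # avoid division by zero if nothing is wrong
```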

3) Tracking: Another indicator for acceptable classification accuracy is to track the movement of the cluster prototypes. If the prototypes hardly move anymore, the classification can be considered stable.

V. EXPERIMENTAL RESULTS

We compare random selection against our sampling scheme in the learning process of the LVQ algorithm: with random selection, the training points are drawn uniformly from the dataset, whereas our scheme queries the nearest neighbors of the cluster prototypes and the samples at the boundary between two clusters if these clusters belong to different classes. As we do not have a completely labeled dataset of cell assay images, we demonstrate the effectiveness of our adaptive clustering scheme first on an artificial dataset and then on the satimage dataset from the UCI Machine Learning Repository [12].

Figure 5 shows the two-dimensional artificial test data in a scatterplot; the distribution of the classes is skewed. Figure 6 clarifies the difference between random selection on the left side and examples chosen with ranking and diversity selection on the right side. The latter helps the LVQ algorithm to improve the classification accuracy more quickly, as can be seen in Figure 7.

The satimage dataset contains 6435 cases from 6 classes in a 36-dimensional feature space. Although this dataset does not stem from the domain for which our scheme has been developed, the benefit of the active selection over normal random selection can be observed as well, see Figure 8. As can clearly be seen, the active selection of datapoints in the learning process of the LVQ algorithm leads to a faster improvement of the classification accuracy.

Fig. 7. Active vs. Random Selection

Fig. 8. Active vs. Random Selection on the Satimage Dataset

VI. RELATED WORK

In [13], an approach for active semi-supervised clustering is proposed in which neighborhoods are formed by querying examples and by providing constraints for pairs of datapoints. In [14], active semi-supervised clustering for image database categorization is investigated with a cost factor for violating pairwise constraints in the objective function of the Fuzzy c-means algorithm; the active selection of constraints looks for samples at the border of the least well-defined cluster in the current iteration. In the work of [15], labeled patterns are incorporated in the objective function of the Fuzzy ISODATA algorithm. All these approaches take a set of labeled patterns or constraints as input before the clustering process starts; these samples are selected randomly. In [16], a very similar approach has been proposed that selects the points to label based on the Voronoi diagram that is induced by the reference vectors; the datapoints to query are selected from the set of Voronoi vertices with different strategies.

The novelty of our approach lies in the way that the data is pre-clustered before supervision takes place, which enhances the classification accuracy.

VII. CONCLUSION

In this work, we have addressed the problem of classifying a large dataset when only a few labeled examples can be provided by the user. We have shown that the fuzzy c-means algorithm is well suited to provide a stable initial clustering and that it has the advantage that datapoints on the border between clusters can easily be detected by scanning through their memberships to the cluster prototypes. Based on the labels of the selected examples from the borders between clusters and the labeled cluster prototypes, we have proposed to move the cluster prototypes, similar to the Learning Vector Quantization (LVQ) method. We have shown that the misclassification rate can be improved faster than in the normal LVQ method.

ACKNOWLEDGMENT

This work was supported by the DFG Research Training Group GK-1042 "Explorative Analysis and Visualization of Large Information Spaces".

REFERENCES

[1] T. R. Jones, A. Carpenter, and P. Golland, "Voronoi-based segmentation of cells on image manifolds," Computer Vision for Biomedical Image Applications, LNCS, vol. 3765, 2005.
[2] F. Zernike, "Diffraction theory of the cut procedure and its improved form, the phase contrast method," Physica, vol. 1, pp. 689-704, 1934.
[3] R. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, 1973.
[4] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[5] R. N. Dave, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12, no. 11, pp. 657-664, 1991.
[6] B. Gabrys and L. Petrakieva, "Combining labelled and unlabelled data in the design of pattern classification systems," International Journal of Approximate Reasoning, 2004.
[7] T. Kohonen, Self-Organizing Maps. Heidelberg: Springer Verlag, 1995.
[8] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
[9] H. T. Nguyen and A. Smeulders, "Active learning using pre-clustering," ICML 2004.
[10] D. Hochbaum and D. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.
[11] M. Windham, "Cluster validity for fuzzy clustering algorithms," Fuzzy Sets and Systems, vol. 5, pp. 177-185, 1981.
[12] C. L. Blake, D. J. Newman, S. Hettich, and C. J. Merz, "UCI repository of machine learning databases," 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html
[13] S. Basu, A. Banerjee, and R. J. Mooney, "Active semi-supervision for pairwise constrained clustering," Proceedings of the SIAM International Conference on Data Mining (SDM-2004), 2004.
[14] N. Grira, M. Crucianu, and N. Boujemaa, "Active semi-supervised clustering for image database categorization," Content-Based Multimedia Indexing.
[15] W. Pedrycz and J. Waletzky, "Fuzzy clustering with partial supervision," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 27, 1997.
[16] M. Hasenjäger and H. Ritter, "Active learning with local models," Neural Processing Letters, vol. 7, pp. 107-117, 1998.
