
Goldschmidt, den Hartog, Leijten, Coomans and Massart: Classification of subjects with joint complaints 739

J. Clin. Chem. Clin. Biochem.

Vol. 23, 1985, pp. 739-752

The Classification of Subjects with Joint Complaints on Incomplete Biochemical and Haematological Datasets

By H. M. J. Goldschmidt, J. den Hartog¹), J. F. Leijten

Department of Clinical Chemistry and Haematology, Maria Hospital, Tilburg, The Netherlands

D. Coomans and D. L. Massart

Pharmaceutic Institute, Free University of Brussels, Brussels, Belgium

(Received October 4, 1982/February 25, 1985)

Summary: We performed a retrospective study on 163 subjects suffering from rheumatic fever (16), rheumatoid arthritis (36), lupus erythematosus (17), gout (21), arthrosis (50) and osteomyelitis (23).

The number of variables evaluated was 39. These were all of a general biochemical and haematological nature. A feature reduction resulted in sixteen variables that matched well with those known from the literature.

Linear discriminant analysis yielded poor results in classifying the six disease categories (61.8% correct with 18 variables).

A reduction to three disease categories improved the classification results remarkably. This, and the excellent discriminating power between patients and the reference group, shows that the selected variables are illustrative only for general clinical pictures, such as infection, and not for the desired differential diagnosis.


¹) Present address: Pharmacia Nederland B.V., Ohmweg 12, NL-3442 AA Woerden.

J. Clin. Chem. Clin. Biochem. / Vol. 23,1985 / No. 11


Introduction

In recent years an increasing number of articles have been written on the use of multivariate analysis for clinical chemical problems. The review of Goldberg & Ellis (1) gives many examples in the different application areas. It shows that multivariate analysis can be applied successfully for detecting common features, for instance, in the search for risk factors.

It also demonstrates the possibilities of multivariate analysis in data reduction, by quantitating the extra information provided by an additional analysis. Finally, it reveals various examples of how to classify patients in different disease categories. This study deals with the classification of patients with joint complaints. These subjects were put into six classes based upon their complaints, radiographic findings and external appearance.

Usually many physical, chemical and haematological data are gathered, and this raised the following questions:

a) are physical, chemical and haematological data useful in discriminating between these patients, and is the complete dataset needed or is part of the data sufficient?

b) are these data or a subset operative in the distinction between a normal population and patients with joint disease?

c) which variables are pathognomonic for which disease?

The answers to these questions would lead to a reduction in the number and the variety of tests. Any good strategy will reveal the importance of serological tests but, since we are only interested in the relevance of higher order variables, we excluded serological tests from our dataset. Also, when dealing with descriptive variables, it is not surprising that much potential data for each patient is not obtained.

An earlier study of Wilding et al. (2) focused on changes in chemical and haematological data caused by drugs or by disease activity, but solely in cases of rheumatoid arthritis. The present paper deals with more variables, a broader range of diseases and the problem of missing data.

Our purpose was not only to distinguish the patient groups from a group of apparently healthy individuals, but also to separate the different disease categories from each other.

This study deals with a set of patients (objects), each of which generates the values of a certain number of test results (variables or features). Therefore a multivariate approach is necessary. Classification of objects and grouping of variables are the main purposes of this work.

The first problem is to reduce the number of variables to a manageable set. Assuming that tests not usually asked for are less meaningful, we left out all those variables lacking in 25% or more of the patients.

The remaining variables were ordered and reduced, using two feature selection procedures: variance weighting and the multivariate F-ratio. We used the selected set of variables in a linear discriminant approach to the classification of the patients in the different disease categories.

Materials

The study involves six disease categories as shown in table 1.

The reference group is a group of apparently healthy blood donors, varying adequately in sex and age.

Tab. 1. Outline of selected groups of patients.

Group no.  Group                  Male (♂)  Female (♀)  Total
1          Rheumatic fever               8           8     16
2          Rheumatoid arthritis         18          18     36
3          Lupus erythematosus           3          14     17
4          Gout                         14           7     21
5          Osteoarthrosis               21          29     50
6          Osteomyelitis                11          12     23
           All patients (1-6)           75          88    163
7          Healthy blood donors         57          29     86

In the record offices of the St. Elisabeth and Maria Hospitals in Tilburg, The Netherlands, we looked for those patients whose main disease state corresponded to one of the six mentioned disease classes. All patients were classified according to the ICD-9-CM index. An independent physician read the medical records and checked the diagnosis. In addition, we interviewed the second consultant with regard to the reliability of the diagnosis. From the responses of these three (the first attending physician, the independent physician and the second consultant) we concluded that the diagnoses for the patients in question were all well established.

Variables

From the medical records of the 163 patients, all those data were taken which had been obtained within 24 hours (48 hours at weekends) after admission. There proved to be 150 different variables. The 24 hour time limit was chosen to minimize time, therapeutic and drug effects.

It seemed reasonable to suppose, and it was admitted by the consultant physicians, that data taken only incidentally were not gathered in relation to the joint complaints. As a first selection criterion, we therefore limited the variables to those found in at least 120 patients (= 75%). By doing this the 39 variables of table 2 remained. These 39 variables correspond with the variables normally gathered in an admission and screening profile.
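The first selection criterion can be illustrated with a minimal sketch. It assumes each patient record is a mapping from variable name to value, with missing results stored as None or simply absent; the function name, the record layout and the toy values are ours and do not come from the original study.

```python
from typing import Dict, List, Optional


def select_available_variables(
    records: List[Dict[str, Optional[float]]],
    min_fraction: float = 0.75,
) -> List[str]:
    """Return the variables recorded in at least `min_fraction` of the records.

    A value of None, or an absent key, counts as missing.
    """
    counts: Dict[str, int] = {}
    for record in records:
        for name, value in record.items():
            if value is not None:
                counts[name] = counts.get(name, 0) + 1
    needed = min_fraction * len(records)
    return sorted(name for name, n in counts.items() if n >= needed)


# Hypothetical toy records: four patients, three variables.
patients = [
    {"haemoglobin": 8.1, "serum_urea": 5.2, "thymol_turbidity": None},
    {"haemoglobin": 7.4, "serum_urea": 6.0, "thymol_turbidity": 2.0},
    {"haemoglobin": 8.9, "serum_urea": None},
    {"haemoglobin": 9.0, "serum_urea": 4.8, "thymol_turbidity": None},
]

print(select_available_variables(patients))
```

With the threshold at 0.75, a variable measured in three of the four toy patients is kept and one measured in a single patient is dropped.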

To determine haematocrit, haemoglobin, erythrocytes, leukocytes and platelets a Hemalog 8 (Technicon Instruments B. V., Gorinchem, The Netherlands) was used. Differential counts (neutrophil segmented granulocytes, neutrophil band form granulocytes, lymphocytes, monocytes, eosinophil granulocytes, basophil granulocytes) were determined by manual microscopy. The erythrocyte sedimentation rate was determined after 1 hour of standing in a 20 cm sedimentation tube.

The SMA-C (Sequential Multi Analyzer plus Computer, Technicon Instruments B. V., Gorinchem, The Netherlands) was used in the assay of the analytes in serum. Standard bicarbonate was determined using a Radiometer pH meter (Model PH M, Radiometer Corp., Copenhagen, Denmark) according to the method of Jørgensen (3). Conjugated bilirubin in serum was determined on an AutoAnalyzer II (Technicon Instruments B. V., Gorinchem, The Netherlands) using the method of Borst (4).

The thymol turbidity analysis was performed in serum by the method of Kingsley & Getchell (5). Pulse rate, systolic and diastolic blood pressure, body weight, body temperature and age were all determined on admission as part of the routine physical examination.

It should be noted again that the rheumatoid serology data were not included, because their diagnostic efficiency is known and our object was to study the value of the other tests in relation to the kind and severity of the disease.

Table 2 gives the range and the mean of the different variables for the six disease classes and the reference group. The numerical value of each test result for the appropriate number of variables was used to calculate, by linear discriminant analysis, the allocation of each patient to a particular disease category.

By feature reduction the total number of variables necessary to discriminate all the disease categories from the reference population was determined to be 10. The decision to allocate any patient to a disease or non-disease category is based on the statistical analysis of all included variables and not on the absolute value of any individual test result.

Statistical Methods

We were left with 39 variables. In relation to the number of patients in the various disease classes, this number is too large for multivariate techniques like linear discriminant analysis. We therefore introduced an intermediate step for further reduction of the number of variables, which uses a combination of two feature selection procedures: feature selection by variance weighting and stepwise feature selection on the basis of the multivariate F-ratio.

Feature selection by variance weighting (6, 7) is a statistical technique in which each variable is weighted on the basis of its individual importance in the differentiation between each pair of diagnostic classes (no correlation is taken into account). The variance weight for two classes is the ratio of the interclass variance of the two classes to the intraclass variances of these classes (univariate F-ratio). The higher the variance weight, the higher the individual importance of the test. This procedure was performed using the software package ARTHUR (7).
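As an illustration of the idea, the following sketch computes a univariate F-ratio for one variable and two classes. It assumes the usual one-way analysis-of-variance form (between-class mean square divided by pooled within-class mean square); the exact formula implemented in ARTHUR is not given in the text and may differ. Missing results are represented as None and skipped, and the function name is ours.

```python
from statistics import mean
from typing import Optional, Sequence


def univariate_f_ratio(
    class_a: Sequence[Optional[float]],
    class_b: Sequence[Optional[float]],
) -> float:
    """Between-class to within-class variance ratio for one variable.

    Assumed form: one-way ANOVA F statistic for two groups.
    Missing values (None) are ignored.
    """
    a = [x for x in class_a if x is not None]
    b = [x for x in class_b if x is not None]
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each class needs at least two observed values")
    mean_a, mean_b = mean(a), mean(b)
    grand_mean = mean(a + b)
    # Between-class sum of squares; one degree of freedom for two classes.
    between = len(a) * (mean_a - grand_mean) ** 2 + len(b) * (mean_b - grand_mean) ** 2
    # Pooled within-class sum of squares and its mean square.
    within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    within_ms = within / (len(a) + len(b) - 2)
    if within_ms == 0:
        return float("inf")
    return between / within_ms


# Hypothetical values: a variable that separates the two classes well ...
print(univariate_f_ratio([10, 12, 14], [40, 42, 44]))
# ... and one that hardly separates them.
print(univariate_f_ratio([1, 2, 3], [2, 3, 4]))
```

Ranking the variables by this ratio for a given pair of classes gives an ordering of the kind described above, with correlations between variables ignored.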

Stepwise feature selection was performed on the basis of the multivariate F-ratio (8). The method was applied using the software package SPSS (9). In the stepwise procedure, combinations instead of individual variables are considered. Initially, the single test which has the best value for the selection criterion is chosen; in the present case it is the F-ratio (the univariate one in the first step of the procedure and the multivariate one in the further steps) for the separation of the centroids of the diagnostic classes. This initial variable is then paired sequentially with each of the other available variables and the F-ratio is computed again. The variable which produces the most significant increase of the F-ratio is selected as the second variable. The procedure continues in this way until all variables are included or the best variable in a given step of the procedure does not permit a significant increase of the multivariate F-ratio. For the multivariate F-ratio, correlations between the variables are taken into account.
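The stepwise logic can be sketched as a greedy forward selection. This is not the SPSS procedure itself: the selection criterion is left as a callable that scores a set of variables, and the significance test for the increase is replaced by a fixed minimum gain, both of which are simplifying assumptions. The variable names and toy scores are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List


def forward_select(
    candidates: Iterable[str],
    criterion: Callable[[FrozenSet[str]], float],
    min_gain: float,
) -> List[str]:
    """Greedy forward selection.

    At each step the variable whose addition gives the highest criterion
    value is added; the procedure stops when the best addition improves
    the criterion by less than `min_gain` or no candidates remain.
    The criterion is assumed to be non-negative.
    """
    remaining = list(candidates)
    selected: List[str] = []
    best_score = 0.0
    while remaining:
        # Score every one-variable extension of the current set.
        scored = [(criterion(frozenset(selected + [v])), v) for v in remaining]
        score, variable = max(scored)
        if score - best_score < min_gain:
            break
        selected.append(variable)
        remaining.remove(variable)
        best_score = score
    return selected


# Hypothetical criterion values for sets of variables. "leukocytes" is
# strong on its own but adds little once "esr" is already in the set.
toy_scores = {
    frozenset({"esr"}): 5.0,
    frozenset({"leukocytes"}): 4.0,
    frozenset({"age"}): 1.0,
    frozenset({"esr", "leukocytes"}): 5.2,
    frozenset({"esr", "age"}): 7.0,
    frozenset({"esr", "age", "leukocytes"}): 7.1,
}

print(forward_select(["esr", "leukocytes", "age"], toy_scores.__getitem__, min_gain=0.5))
```

The toy scores show why a combination-based criterion can prefer a variable that is weak alone over one that is strong alone but redundant with a variable already chosen.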

In multivariate techniques there is a relation between the maximum permissible number of variables and the number of patients in a disease class, because too many variables will result in unstable estimates of the differences between the class centroids. Since the smallest diagnostic class contains only 16 patients (rheumatic fever), not more than 16 laboratory tests were selected in every paired weighting. The number of variables was thus reduced to 16.

The discriminatory performance of the laboratory tests was further investigated by means of linear discriminant analysis (8). In general terms, linear discriminant analysis distinguishes diagnostic classes on the basis of a set of linear functions of the variables. The weight coefficients in these functions are calculated in such a way that the ratio of between-class to within-class variation is maximized, considering also the correlation between the variables. Linear discriminant analysis was performed using the program package SPSS. The method was not applied to pairs of classes, as in the selection procedures, but directly to the seven diagnostic classes. This was done in order to obtain a better estimate of the pooled variance-covariance matrix with more patients, especially when a large number of variables is being used (for instance 20 laboratory tests).
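The principle can be shown on a deliberately reduced case: two classes and two variables, for which the pooled within-class scatter matrix can be inverted in closed form. The study itself used SPSS on seven classes and up to 20 variables; the sketch below is only Fisher's two-class discriminant with equal priors, and the function names and data points are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def _scatter(points: Sequence[Point], centre: Point) -> Tuple[float, float, float]:
    """Sums of squares and cross-products around the class mean."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fisher_direction(class_a: Sequence[Point], class_b: Sequence[Point]) -> Point:
    """Weights w = S_w^{-1} (m_a - m_b) maximizing between/within variation."""
    ma, mb = _mean(class_a), _mean(class_b)
    axx, axy, ayy = _scatter(class_a, ma)
    bxx, bxy, byy = _scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy  # pooled within-class scatter
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Closed-form inverse of the 2x2 matrix applied to the mean difference.
    return ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)


def classify(point: Point, class_a: Sequence[Point], class_b: Sequence[Point]) -> str:
    """Allocate to 'a' or 'b' by comparing the score with the midpoint score."""
    w = fisher_direction(class_a, class_b)
    ma, mb = _mean(class_a), _mean(class_b)
    threshold = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2
    score = w[0] * point[0] + w[1] * point[1]
    return "a" if score > threshold else "b"


# Hypothetical, well separated classes.
group_a = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
group_b = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)]

print(fisher_direction(group_a, group_b))
print(classify((1.5, 2.0), group_a, group_b), classify((5.0, 4.5), group_a, group_b))
```

Because the weights are computed from the pooled scatter matrix, correlated variables are not counted twice, which is the property the paragraph above refers to.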

Results

Feature selection for discriminating between different disease categories

With the two feature selection methods described, the data in the six disease classes were compared mutually, one class with another. In each of the resulting 15 binary discriminating problems, only those variables were used which were present in at least 75% of the patients involved in the particular paired classification problem. In this way we made use, in the comparison between gout and arthrosis, of only 10 variables, whereas in the comparison between rheumatic fever and lupus erythematosus, 33 variables were used.
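The count of 15 binary problems follows from taking all pairs out of six classes, and the per-pair variable sets can be sketched in the same way. The sketch assumes that "patients involved" means the pooled patients of both classes, which the text does not state explicitly; the record layout and toy values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional

DISEASE_CLASSES = [
    "rheumatic fever",
    "rheumatoid arthritis",
    "lupus erythematosus",
    "gout",
    "arthrosis",
    "osteomyelitis",
]

# All unordered pairs of the six disease classes.
pairs = list(combinations(DISEASE_CLASSES, 2))
print(len(pairs))  # 15

Record = Dict[str, Optional[float]]


def variables_for_pair(
    records_a: List[Record],
    records_b: List[Record],
    min_fraction: float = 0.75,
) -> List[str]:
    """Variables observed in at least `min_fraction` of the pooled patients."""
    pooled = list(records_a) + list(records_b)
    names = {name for record in pooled for name in record}
    needed = min_fraction * len(pooled)
    return sorted(
        name
        for name in names
        if sum(record.get(name) is not None for record in pooled) >= needed
    )


# Hypothetical toy records for one pair of classes.
gout = [{"uric_acid": 0.50, "esr": None}, {"uric_acid": 0.60, "esr": 12.0}]
arthrosis = [{"uric_acid": 0.30}, {"uric_acid": 0.35, "esr": 8.0}]
print(variables_for_pair(gout, arthrosis))
```

Applying the availability rule per pair rather than once for the whole dataset is what lets the number of usable variables differ between comparisons.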

Tables 3 and 4 give the results.

For reasons mentioned above, only the six variables with the highest score are given, and they are ranked in order of decreasing value. At this stage we were left with 25 variables, which are mentioned at least once in tables 3 and 4.

[Tab. 2. Range and mean of the variables for the six disease classes and the reference group; table body not recoverable.]

[Table: feature selection by the variance weight criterion for the pairwise comparisons of the disease classes; table body not recoverable.]

Now each of the six variables was given points according to its ranking order in tables 3 and 4. The points in each of the two comparisons were summed, and the three variables with the highest score for each binary comparison are given in table 5. The number of variables used in every comparison is given in table 5 at the bottom right of each square.
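The combination of the two rankings can be sketched as follows. The point scale is not stated in the text; the sketch assumes 6 points for the first rank down to 1 point for the sixth, and breaks ties alphabetically. The variable names and rankings are hypothetical and do not reproduce tables 3 and 4.

```python
from collections import Counter
from typing import List, Sequence


def combine_rankings(
    ranking_a: Sequence[str],
    ranking_b: Sequence[str],
    top: int = 3,
    depth: int = 6,
) -> List[str]:
    """Sum rank points from two rankings and return the `top` variables.

    Assumed scale: `depth` points for rank 1, one point less per rank.
    """
    points: Counter = Counter()
    for ranking in (ranking_a, ranking_b):
        for position, variable in enumerate(ranking[:depth]):
            points[variable] += depth - position
    # Highest total first; alphabetical order makes ties deterministic.
    ordered = sorted(points.items(), key=lambda item: (-item[1], item[0]))
    return [variable for variable, _ in ordered[:top]]


# Hypothetical rankings for one binary comparison.
variance_weight_rank = ["esr", "haemoglobin", "leukocytes", "age", "serum_albumin", "platelets"]
f_ratio_rank = ["haemoglobin", "serum_uric_acid", "esr", "age", "monocytes", "serum_calcium"]

print(combine_rankings(variance_weight_rank, f_ratio_rank))
```

A variable that appears in both rankings accumulates points from each, so agreement between the two selection methods is rewarded.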

After this operation there remained 16 variables that are mentioned at least once. Textbooks of pathology (10-12) also mention laboratory findings that more or less regularly accompany the various diseases. A compilation of these observations shows that the following are usually proposed as a monitor:

Rheumatic fever: body temperature, age, leukocytes, erythrocyte sedimentation rate, haemoglobin.

Rheumatoid arthritis: haemoglobin, erythrocyte sedimentation rate, sex, leukocytes, platelets.

Lupus erythematosus: haemoglobin, age, platelets, serum albumin, erythrocyte sedimentation rate, leukocytes, serum total protein.

Gout: sex, serum uric acid, serum glucose, systolic blood pressure, diastolic blood pressure.

Arthrosis: age, sex.

Osteomyelitis: neutrophil segmented granulocytes, leukocytes, erythrocyte sedimentation rate.

Table 6 shows the 16 variables found by our operation and mentioned in table 5, in relation to the 13 'textbook variables', and shows that there is a strong similarity between the variables we calculated and the findings mentioned in textbooks.

[Table: feature selection for the pairwise comparisons of the disease classes taking correlations into account; table body not recoverable.]

Feature reduction while discriminating between different disease categories and a reference group

After the selection of the most meaningful variables, the classification of patients was studied using linear discriminant analysis. This method is well known in clinical chemistry (1, 2).

When a patient is characterized by 10 variables, the patient may be represented as a point in 10-dimensional space. Patients situated near to each other have similar patterns and, when the variables are sufficiently relevant, suffer from the same disease (or are both healthy).

Unfortunately, one cannot directly view a 10-dimensional space, but methods are available to represent this hyperspace in an optimal way in only two dimensions. The linear discriminant functions developed in linear discriminant analysis permit this.

Figure 1a gives such a discriminant plot for the utilization of 20 variables; 1b is the same plot but includes only 10 variables, and 1c shows the plot when only 9 variables are used. They demonstrate, as denoted in the legends to these figures, that it is possible to distinguish more or less between the seven classes, because about 60% of the subjects are classified correctly. These plots reveal that the reference group can be separated easily, but only when more than 9 variables are included. When the variable age is eliminated in going from 10 to 9 variables, the distinction disappears.

Variables pathognomonic for each disease category

In this study the number of patients is so small that it is impossible to split the database to create an independent test set. When the learning set is also used as the test set in linear discriminant analysis, over-optimistic results can be obtained. The leave-one-out procedure is the solution in situations like these.

Statistically this is comparable with an independent dataset. A discriminant analysis is performed on the total number of patients minus one. Each time a different patient is omitted and is then classified by means of his discriminant scores in the accompanying analysis.
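A minimal sketch of the leave-one-out procedure is given below. To keep it short and self-contained, the classifier is a nearest-centroid rule and not the linear discriminant analysis used in the study; only the resampling loop is the point of the example. The data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_centroid(train: List[Sample], x: Sequence[float]) -> str:
    """Allocate x to the class whose training centroid is closest (stand-in classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label: str) -> float:
        return sum((s / counts[label] - xi) ** 2 for s, xi in zip(sums[label], x))

    return min(sorted(sums), key=squared_distance)


def leave_one_out_accuracy(samples: List[Sample]) -> float:
    """Fraction of samples classified correctly when each is held out in turn."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # all samples except the i-th
        if nearest_centroid(train, features) == label:
            hits += 1
    return hits / len(samples)


# Hypothetical data: two classes, with one class "A" sample lying among class "B".
samples = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
    ((5.0, 5.5), "A"),
]

print(leave_one_out_accuracy(samples))
```

Each sample is classified by a model that has never seen it, which is why the resulting score behaves like one from an independent test set.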

The discrimination between the six disease categories on the basis of chemical and haematological variables is poor. When using the leave-one-out procedure, only 40% of the patients are classified correctly with 18 variables, while with 10 variables, only 31% are classified correctly.

[Table residue: pairwise comparisons of the disease classes listing the selected variables and the number of variables used per comparison; table body not recoverable.]

Fig. 1a. Plot of two discriminant scores, resulting from a discriminant analysis on 7 classes and 20 variables: haemoglobin, leukocytes, serum uric acid, serum calcium, serum inorganic phosphate, serum glucose, serum total protein, serum albumin, serum total bilirubin, serum alkaline phosphatase, serum lactate dehydrogenase, serum aspartate aminotransferase, serum total cholesterol, erythrocyte sedimentation rate, neutrophil segmented granulocytes, neutrophil band form granulocytes, lymphocytes, monocytes, eosinophil granulocytes, age.

59.6% of the cases were correctly classified. The patients are indicated by a number indicating the disease and the centroid of each group is indicated by a letter: rheumatic fever (1, A), rheumatoid arthritis (2, B), lupus erythematosus (3, C), gout (4, D), arthrosis (5, E), osteomyelitis (6, F), reference group (7, G).

By means of linear discriminant analysis these values are increased to 62% and 51%, respectively.

Much better results are gained by reducing the six disease categories to three by combining the infectious diseases (rheumatic fever + osteomyelitis), the autoimmune diseases (rheumatoid arthritis + lupus erythematosus) and 'mechanical' defects (arthrosis + gout), as shown in figure 2 for 21 variables. Table 7 gives the result obtained by linear discriminant analysis. Thus, 71% of the patients can be classified correctly, and this score does not alter when the number of variables is reduced to 8.


Fig. 1b. Plot of two discriminant scores, resulting from a discriminant analysis on 7 classes and 10 variables: haemoglobin, serum uric acid, serum calcium, serum inorganic phosphate, serum albumin, serum alkaline phosphatase, serum total cholesterol, erythrocyte sedimentation rate, neutrophil band form granulocytes, age.

57.6% of the cases were correctly classified. The patients are indicated by a number indicating the disease and the centroid of each group is indicated by a letter: rheumatic fever (1, A), rheumatoid arthritis (2, B), lupus erythematosus (3, C), gout (4, D), arthrosis (5, E), osteomyelitis (6, F), reference group (7, G).

Fig. 1c. Plot of two discriminant scores, resulting from a discriminant analysis on 7 classes and 9 variables (same as in figure 1b minus age).

48.5% of the cases were correctly classified. The patients are indicated by a number indicating the disease and the centroid of each group is indicated by a letter: rheumatic fever (1, A), rheumatoid arthritis (2, B), lupus erythematosus (3, C), gout (4, D), arthrosis (5, E), osteomyelitis (6, F), reference group (7, G).

Fig. 2. Plot of two discriminant scores, resulting from a discriminant analysis on 3 classes and 21 variables: serum uric acid, erythrocyte sedimentation rate, leukocytes, serum urea, serum lactate dehydrogenase, neutrophil band form granulocytes, age, serum total bilirubin, serum aspartate aminotransferase, serum albumin, monocytes, serum alkaline phosphatase, serum inorganic phosphate, serum total protein, serum glucose, eosinophil granulocytes, serum calcium, haemoglobin, neutrophil segmented granulocytes, lymphocytes, serum total cholesterol.

70.6% of the cases were correctly classified. The patients are indicated by a number indicating the disease class and the class is indicated by a letter: infections = rheumatic fever + osteomyelitis (1, A); auto-immune diseases = rheumatoid arthritis + lupus erythematosus (2, B); 'mechanical' problems = arthrosis + gout (3, C).

Tab. 7. Prediction results of the analysis shown in figure 2.

                                             Predicted group membership
Actual group                                 Rheumatic fever    Rheumatoid arthritis     Arthrosis
                                             + Osteomyelitis    + Lupus erythematosus    + Gout
Rheumatic fever + Osteomyelitis              24 (61%)           5 (13%)                  10 (26%)
Rheumatoid arthritis + Lupus erythematosus   5 (10%)            34 (64%)                 14 (26%)
Arthrosis + Gout                             4 (6%)             10 (14%)                 57 (80%)
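The overall score quoted for the three combined classes can be checked directly from the counts in Tab. 7. The short sketch below sums the diagonal of the prediction table; the shorthand group labels are ours.

```python
# Counts from Tab. 7: rows are actual groups, columns are predicted groups.
GROUPS = ["infectious", "autoimmune", "mechanical"]  # shorthand labels (ours)
confusion = [
    [24, 5, 10],   # rheumatic fever + osteomyelitis
    [5, 34, 14],   # rheumatoid arthritis + lupus erythematosus
    [4, 10, 57],   # arthrosis + gout
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(GROUPS)))
accuracy = correct / total

print(total, correct, round(100 * accuracy, 1))  # 163 115 70.6

# Share of each actual group that was allocated correctly.
for i, name in enumerate(GROUPS):
    row = confusion[i]
    print(name, sum(row), round(100 * row[i] / sum(row), 1))
```

The row totals (39, 53 and 71) agree with the group sizes in Tab. 1, and 115 correct allocations out of 163 give the 70.6% stated in the legend to figure 2.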

Discussion

The first problem to be studied was the importance of the variables for the discrimination between disease categories. As usual in retrospective studies, various data are missing for each patient.

Normally, the resulting problem is solved by substituting the mean value of each variable in the subgroup concerned for the missing data. The importance of each variable in the achieved classification can then be estimated.

However, when there are many missing data, the classification and the estimation become unreliable.
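The conventional substitution mentioned above can be sketched for a single variable as follows. Missing results are represented as None and replaced by the mean of the observed values in the same subgroup; the function name and the toy values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def impute_class_means(values: List[Optional[float]], labels: List[str]) -> List[float]:
    """Replace each missing value (None) by the mean of its own class.

    Raises KeyError if a class with a missing value has no observed
    value at all, since there is then nothing to substitute.
    """
    observed: Dict[str, List[float]] = {}
    for value, label in zip(values, labels):
        if value is not None:
            observed.setdefault(label, []).append(value)
    class_mean = {label: mean(vals) for label, vals in observed.items()}
    return [
        class_mean[label] if value is None else value
        for value, label in zip(values, labels)
    ]


# Hypothetical sedimentation rates for two subgroups, two results missing.
esr = [30.0, None, 50.0, 10.0, None, 14.0]
groups = ["RA", "RA", "RA", "gout", "gout", "gout"]

print(impute_class_means(esr, groups))
```

Every substituted value sits exactly on its class mean, so the within-class spread shrinks as more values are filled in; this is one way to see why the approach becomes unreliable when many data are missing.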

Therefore, we applied two different selection methods. Variable selection by two methods with no and partial correlation correction yielded strongly differing results. The weak similarity between the two selections is an expression of the mathematical differences between the two methods but also an indication of dealing with derived, secondary vari- ables that are indicative for a Status of being ill rather than for a specific disease.

The two methods together lead to the selection of a set of variables that strongly resembles the variable set selected on clinical grounds. The nature of the selected variables and the much higher classification score after reduction of the six disease categories to three combined classes clearly show that the measured chemical and haematological effects accompanying joint diseases are not pathognomonic, but probably only the result of an infection or autoimmunity. The importance of the age and sex related variable once again makes it clear that predisposition to suffer from the disease in question plays a substantial role.

The applied mathematical techniques, as sophisticated as they are, cannot provide more information than is stored in the datasets supplied.

Therefore, our final conclusion is that these variables are not relevant to the diagnosis. They may, however, be helpful in objectifying the complaints and therapy results, since it is clearly possible to separate the disease groups from the normal subjects (reference group).

Acknowledgement

The authors wish to thank Mrs. A. J. L. M. Veuger, M. D., for her helpful cooperation; Prof. Dr. J. B. J. Soons and Prof. Dr. A. Dijkstra for their encouraging discussions; Prof. Dr. R. W. Lent (Albert Einstein College of Medicine, New York) for kindly reviewing the manuscript; Drs. L. van Norel for performing some exploratory statistical studies; Miss A. M. J. van Bommel, Mrs. J. W. A. M. van Ingen-Berkel and Miss M. H. van Osch for their skilful assistance.

References

1. Goldberg, D. M. & Ellis, G. (1978) Adv. Clin. Chem. 20, 49-128.

2. Wilding, P., Kendall, M. J., Holder, R., Grimes, J. A. & Farr, M. (1975) Clin. Chim. Acta 64, 185-194.

3. Jørgensen, K. & Astrup, P. (1957) Scand. J. Lab. Invest. 9, 122-132.

4. Borst, A., Hanssen, C. J. M. & de Jong, E. B. M. (1974) Clin. Chim. Acta 55, 121-128.

5. Kingsley, G. R. & Getchell, G. (1953) Stand. Meth. Clin. Chem. 1, 113-117.

6. Harper, A. M., Duewer, D. L. & Kowalski, B. R. (1977) In: Chemometrics, Theory and Practice (Kowalski, B. R., ed.), Am. Chem. Soc. Symp. Ser. Nr. 52, 14-52.

7. Duewer, D. L., Koskinen, J. R. & Kowalski, B. R. (1975) "ARTHUR" (available from B. R. Kowalski, Laboratory for Chemometrics, Department of Chemistry BG-10, University of Washington, Seattle, Washington 98195).



8. Solberg, H. E. (1978) Discriminant Analysis in Clinical Chemistry. CRC Crit. Rev. Clin. Lab. Sci., 209.

9. Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K. & Bent, D. H. (1975) Statistical Package for the Social Sciences (SPSS), McGraw-Hill, New York, 2nd ed., 434-467.

10. Holvey, D. N. (ed.) & Talbott, J. H. (cons. ed.) (1972) The Merck Manual, 12th ed., Merck Sharp and Dohme Research Laboratories, Rahway.

11. Eastman, R. D. (1975) Biochemical values in clinical medicine: The results following pathological or physiological change, Wright Ltd., Bristol.

12. Bondy, P. K. & Rosenberg, L. E. (1980) Metabolic control and disease, 8th ed., W. B. Saunders Company, Philadelphia, London, Toronto.

Drs. H. M. J. Goldschmidt
Director
Department of Clinical Chemistry and Haematology
Maria Ziekenhuis
Dr. Deelenlaan 5
Postbus 90107
NL-5042 AD Tilburg
