
UNIVERSITÄT ZU KÖLN

WORKING PAPERS ON HEALTHCARE MANAGEMENT

EDITOR: PROF. DR. LUDWIG KUNTZ

Capacity Utilisation and Quality of Care in German Hospitals

by Roman Mennicken

Working Paper No. 8

Cologne 2007

Chair for Healthcare Management, University of Cologne, Albertus-Magnus-Platz, 50923 Cologne, Germany,

phone: +49-221-470-5417, fax: +49-221-470-5418, www.mig.uni-koeln.de


Abstract


The main objective of this study is to empirically assess the relationship between capacity utilisation and quality of care in German hospitals. Many international studies show a relationship between staff to patient ratios or bed utilisation and the quality of care provided in hospitals, but evidence for Germany has been lacking so far. Over the last decade the number of hospital beds decreased and the number of hospital cases increased, accompanied by a fair increase in staffing levels for physicians and a moderate decrease for nurses. In times of high demand for hospital services, it is sensible to assume that maintaining a sufficient level of quality of care may be difficult. Additionally, the recent change in hospital reimbursement from a retrospective system to the prospective DRG system sets stronger incentives to increase patient cases and hence capacity utilisation, possibly at the expense of quality of care. The consequences of this change are assessed using German data. The empirical analysis uses around 135 000 patient cases from 8 German hospitals in 2004. The data are a full sample of all patients admitted to these hospitals in that year. The outcome variable is the daily mortality ratio. The results of multivariate regression models, including panel models, suggest a deteriorating effect of increasing bed utilisation on quality of care. Meaningful conclusions about the staff to patient ratio are not possible due to insufficient data.


Author

About the author:

Roman Mennicken worked as a Registered General Nurse in Germany and England from 2000 to 2001. Afterwards he studied Health Economics at the University of Cologne (Germany) and Public Health at the University of Auckland (New Zealand). Currently he is working as a Research Associate at the Chair for Healthcare Management (University of Cologne, Germany) and at the Rheinisch-Westfälisches Institut für Wirtschaftsforschung in Essen (Germany). E-mail: mennicken@wiso.uni-koeln.de


Contents

List of Tables

List of Figures

Abbreviations

1 Introduction - Measuring Quality in Health Care

1.1 Main Objectives of this Study

1.2 Development of Capacity Utilisation in German Hospitals

1.3 Evidence for the Effects of Capacity Utilisation on Quality of Care

1.4 Necessity of Risk-Adjustment for Comparisons of Quality of Care

2 Data for the Analysis

2.1 Exclusion of Patient Data

2.2 Recoding of Hospital Departments

2.3 Pooling of the DRG-Data

2.4 Structural Data Set

2.5 Full Data for the Analysis

2.5.1 Definition of dependent Variable

2.5.2 Definition of Variables measuring Capacity Utilisation

2.5.3 Variables used for Risk-Adjustment

2.5.4 Other control Variables

3 Basic econometric Model

3.1 The Panel Regression Models

3.2 Regression Diagnostics

3.2.1 Testing for Multicollinearity

3.2.2 Testing for Normality of the Error Term

3.2.3 Testing for Heteroskedasticity

3.2.4 Testing for Serial Correlation

3.2.5 Conclusion of the Regression Diagnostics

4 Regression Model

4.1 Regression Results for Bed Occupancy

4.2 Regression Results for Staff to Patient Ratio

5 Discussion

A Tables

A.1 Descriptive Tables

A.2 Additional Estimation Results

References


List of Tables

1.1 Development of Capacity Utilisation from 1995 to 2004

2.1 Illustration of Data stored in wide Form

2.2 Illustration of Data stored in long Form

2.3 Illustration of collapsed Data

2.4 Descriptive Statistics for Mortality and Capacity Utilisation

2.5 Descriptive Statistics for Risk-Adjustment and control Variables

3.1 Variance Inflation Factor

3.2 Pearson Correlation Matrix

3.3 Pearson Correlation Matrix continued

4.1 Variable Explanation in the Regression Model

4.2 Estimation Results for In-Hospital Mortality on Bed Occupancy

4.3 Estimation Results for In-Hospital Mortality on Staff-Ratio

A.1 Total Number of Nurses and Doctors per Hospital

A.2 Average Bed Occupancy per Department

A.3 Estimation Results for In-Hospital Mortality on Bed Occupancy II

A.4 Estimation Results for In-Hospital Mortality on Bed Occupancy III

A.5 Estimation Results for aggregated Staffing Ratios


List of Figures

2.1 Average Bed Occupancy per Hospital 2004

2.2 Average weekly Bed Occupancy for all Hospitals

3.1 Quantile Normality Plot

3.2 Residuals for Bed Occupancy plotted against fitted Values

3.3 Residuals for Staffing Ratios plotted against fitted Values


Abbreviations

CLT Central Limit Theorem

DMP Disease Management Programme

DRG Diagnosis Related Groups

FD First Differences

FE Fixed-Effects

GLM Generalised Linear Model

GLS Generalised Least Squares

ICU Intensive Care Unit

IOM Institute of Medicine

LOS Length of Stay

KHEntgG Krankenhausentgeltgesetz

MDC Major Diagnostic Category

NICU Neonatal Intensive Care Unit

OLS Ordinary Least Squares

PCCL Patient Clinical Complexity Level

RE Random Effects

SD Standard Deviation

SGB Sozialgesetzbuch

US United States

VIF Variance Inflation Factor


Chapter 1

Introduction - Measuring Quality in Health Care

Quality1 in health care is a topic of growing awareness for both the public and policy makers. Hence, in recent years quite a few regulations regarding quality in health care have been introduced in the German health care sector2. However, these developments pose a challenge. If quality is becoming one of the main emphases in health care, how can it be measured?

Donabedian (1986) introduced three dimensions for quality in health care: Structural, process and outcome quality3. Structural measures of quality refer to characteristics of health care, e.g. available medical technology in a hospital, while process measures deal with the question of completeness and appropriateness of the care patients receive. Outcome measures are defined as changes in health status, whether favourable or adverse due to the received health services4.

Potential outcomes in health care analyses include mortality, readmission and complication rates, physical and psychosocial functioning, quality of life, costs and use of resources and services. As Iezzoni (1994) demanded5, "to be meaningful, the outcome must be important to patients [...], relatively common, and linked temporally and causally to the services under scrutiny".

1Quality of care can be defined as "the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge." See Institute of Medicine (2007).

2For example, the publication of quality reports by German hospitals, the possibility for sickness funds to give patients recommendations regarding hospital treatment, or volume-outcome regulations (see SGB V, § 137).

3See Donabedian (1986), p. 102ff.

4See Brook et al. (2000), p. 283ff.

5See Iezzoni (1994), p. 1822.

When using administrative data for health care analyses, choosing patient outcomes to measure quality is a common approach6. One of the most frequently used outcome measures is in-hospital mortality (see sections 1.3 and 1.4). International evidence suggests that quality of care in hospitals is influenced by capacity utilisation, assessed e.g. by bed occupancy, staffing levels and skill mix, as well as by other factors. The following sections discuss these issues in more detail.

1.1 Main Objectives of this Study

The main objective of this study is to empirically assess the relationship between capacity utilisation and quality of care in German hospitals using in-hospital mortality as the outcome measure. Three hypotheses will be tested using multivariate regression models.

It will be assessed how in-hospital mortality during a calendar year is affected by capacity utilisation measured as (1) daily bed occupancy and (2) daily staff to patient ratio. As suggested by Lankshear et al. (2005), the staff to patient ratio also includes doctors, to control for a possible confounding effect of omitting hospital doctors7. It is expected that in-hospital mortality rises with increasing bed occupancy, while the staff to patient ratio is expected to have an inverse relationship with the outcome.

Additionally, this study (3) investigates a possible "weekend effect" in German hospitals, i.e. in-hospital mortality is supposed to be higher over weekends and for patients admitted on weekends.
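The "weekend effect" hypothesis requires flagging, for every case and hospital day, whether a date falls on a weekend. The following is a minimal Python sketch of such indicators, not the author's code; the function names are hypothetical, and public holidays, which some of the cited studies also consider, are deliberately ignored here.

```python
from datetime import date


def is_weekend(day: date) -> bool:
    """True for Saturday or Sunday (date.weekday(): Monday = 0, Sunday = 6)."""
    return day.weekday() >= 5


def weekend_flags(admitted: date, observed: date) -> dict:
    """Hypothetical indicators: weekend admission and weekend hospital day."""
    return {
        "admitted_on_weekend": is_weekend(admitted),
        "weekend_day": is_weekend(observed),
    }


# A patient admitted on Friday, 2 January 2004, observed on Saturday, 3 January 2004.
flags = weekend_flags(date(2004, 1, 2), date(2004, 1, 3))
```

Keeping the two indicators separate reflects the two parts of the hypothesis: mortality on weekend days and mortality of patients admitted on weekends.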

1.2 Development of Capacity Utilisation in German Hospitals

Over the last decade there has been a considerable change in the capacity utilisation of German hospitals. From 1995 to 2004 the number of hospital cases increased by more than 5%. The number of hospital beds decreased by roughly 13%, and the average occupancy of all German hospital beds dropped by about 6 percentage points, due to the reduction in average length of stay (LOS) per case from more than 11 days to less than 9 days (see table 1.1 for details).

6See Iezzoni (1994), p. 1825.

7See Lankshear et al. (2005), p. 171.

Table 1.1: Development of Capacity Utilisation from 1995 to 2004

| | 1995 | 2000 | 2001 | 2002 | 2003 | 2004 |
|---|---|---|---|---|---|---|
| Hospital cases | 15 931 168 | 17 262 929 | 17 325 083 | 17 432 272 | 17 295 910 | 16 801 649 |
| LOS per case | 11.5 | 9.7 | 9.4 | 9.2 | 8.9 | 8.7 |
| Hospital beds | 609 123 | 559 651 | 552 680 | 547 284 | 541 901 | 531 333 |
| Bed occupancy in % | 81.7 | 81.5 | 80.7 | 80.1 | 77.6 | 75.5 |
| Doctors | 101 590 | 108 696 | 110 152 | 112 763 | 114 105 | 117 681 |
| Nurses | 350 571 | 332 269 | 331 472 | 327 384 | 320 158 | 309 510 |

Notes: Table derived from www.gbe-bund.de

There was also a mixed development in the medical and non-medical work force in German hospitals. In 2004 about 15% more doctors were working in hospitals than in 1995. On the other hand, the number of employed nurses decreased by about 12%. Hence, the average number of cases per physician decreased by about 10%, while, on average, the number of patients nurses had to care for increased by about 19.5% between 1995 and 2004.
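The percentage changes quoted in this section can be recomputed from the 1995 and 2004 columns of table 1.1. The following Python sketch does so; the variable and function names are illustrative, and the recomputed values may differ slightly from the rounded figures given in the text.

```python
# 1995 and 2004 values copied from table 1.1.
table_1_1 = {
    "cases": (15_931_168, 16_801_649),
    "beds": (609_123, 531_333),
    "doctors": (101_590, 117_681),
    "nurses": (350_571, 309_510),
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


changes = {name: pct_change(old, new) for name, (old, new) in table_1_1.items()}


def cases_per_head(staff: str, column: int) -> float:
    """Hospital cases per member of the given staff group (0 = 1995, 1 = 2004)."""
    return table_1_1["cases"][column] / table_1_1[staff][column]


changes["cases_per_doctor"] = pct_change(
    cases_per_head("doctors", 0), cases_per_head("doctors", 1)
)
changes["cases_per_nurse"] = pct_change(
    cases_per_head("nurses", 0), cases_per_head("nurses", 1)
)

for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```

The cases-per-head ratios use annual case counts, so they describe workload per year rather than the number of patients cared for on a given day.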

No German studies could be identified that investigate the consequences of these developments for quality of care, even though the international literature suggests a deteriorating effect (see section 1.3).

Additionally, recent policy changes in the hospital sector, e.g. the change in reimbursement from a retrospective system to prospective diagnosis related groups (DRGs) or the introduction of disease management programmes (DMPs), set incentives to increase patient cases and therefore raise capacity utilisation, possibly at the expense of quality of care8. DRGs as such are expected to have a deteriorating effect on the quality of care provided: because payment is per case, DRGs set incentives to minimise the services provided per patient, reduce average length of stay, and select patients who require fewer resources9.

1.3 Evidence for the Effects of Capacity Utilisation on Quality of Care

Bed occupancy and staff to patient ratio have been identified in the international literature as determinants of quality in hospital services. One study has explored the effects of nurse to patient ratio and bed occupancy on in-hospital mortality and other outcomes in neonatal intensive care units (NICUs). The study used data from 13 515 infants in 54 randomly chosen NICUs. Occupancy was calculated by dividing the number of infants in the unit by the maximum number of infants in the same unit during the study period of about one year (March 1998 to April 1999). Nurse to patient ratio was computed by dividing the number of working nurses by the number of recommended nurses. For both measures of capacity utilisation the average was calculated twice for the LOS of every respective infant10. A multivariate logistic regression approach showed that the odds of in-hospital mortality for infants admitted to NICUs at maximum occupancy were 50% higher than for infants admitted at 50% occupancy11.

8See Schrappe (2004), p. 267 or Breyer, Zweifel, Kifmann (2003), p. 354.

9See Lüngen (2004), p. 258 for more details.

Another study also assessed the impact of capacity utilisation on in-hospital mortality12. The retrospective study used data from 1 050 cases admitted to one intensive care unit (ICU) between 1992 and 1995. Nursing staff was measured as the highest number of nurses per shift as recommended by the UK Intensive Care Society. Two variables were calculated for bed occupancy: occupancy per shift (highest number of occupied beds per shift) and peak occupancy (highest occupancy per shift during LOS for every patient). Using a multivariate logistic regression model, an interaction between nursing requirements and bed occupancy resulted in a more than two times higher in-hospital mortality rate for patients staying during times of high capacity utilisation than for patients staying during times of low capacity utilisation13.

One longitudinal study evaluated the relationship between nurse staffing and in-hospital mortality as well as other outcomes14. Mark et al. (2004) used data from 412 hospitals located in 11 states in the United States (US) over five years. Controlling for hospital, market and financial characteristics in different regression models, they found that increased nurse staffing reduced risk-adjusted in-hospital mortality15. However, the results suggested a diminishing effect of increased staffing, i.e. employing more nurses in an already well staffed department led to progressively smaller further reductions in in-hospital mortality or complication rates16.

10See the UK Neonatal Staffing Study Group (2002), p. 101.

11See the UK Neonatal Staffing Study Group (2002), p. 103f.

12See Tarnow-Mordi, Hau & Shearer (2000).

13See Tarnow-Mordi, Hau & Shearer (2000), p. 187f.

14See Mark et al. (2004).

15Mark et al. used an OLS, fixed effect and a dynamic model for estimating in-hospital mortality and complication rates. See Mark et al. (2004), p. 284ff.

Two recent systematic reviews17 conclude that the nurse staffing ratio and the nursing skill mix have an impact on patient outcomes. Lankshear et al. (2005) included a total of 20 studies in their review. In 12 studies in-hospital mortality was one of the outcomes of interest18. The presented results suggest that nurse staffing has an inverse relationship with in-hospital mortality19. The earlier review by Lang et al. (2004) came to a similar conclusion, even though two of the 14 studies assessing in-hospital mortality showed a negative impact of nurse staffing on mortality rates20. About two thirds of the studies were included in both systematic reviews; hence the similar conclusions are not surprising.

Recent studies support the idea of a time factor in the quality of care, i.e. a variance of the quality of care provided at weekends or public holidays compared to weekdays. Bell and Redelmeier (2001) have shown in a ten-year retrospective study with over 3 700 000 admissions to acute care hospitals in Ontario that patients admitted on weekends have significantly higher risk-adjusted in-hospital mortality than weekday admissions. However, mortality rates were only higher for specific conditions and for more than 20% of the most common causes of death and not for aggregated mortality rates21.

Cram et al. (2004) conducted a similar study in California. The retrospective study with data from 3 700 000 admissions to all acute care hospitals studied the "weekend effect" with the 50 most common diagnoses. Analyses were performed with progressively restrictive criteria: first all admissions, then unscheduled admissions, and finally only unscheduled admissions from the emergency department were analysed. The authors suspected a bias from scheduled admissions (most likely during weekdays) having better outcomes than unscheduled admissions22. Using the most restrictive criteria, aggregated data (all 50 diagnoses) showed that risk-adjusted in-hospital mortality was 3% higher for weekend admissions than for patients admitted during weekdays23.

This short international literature review has shown that (adverse) health outcomes like in-hospital mortality seem to be affected by specific conditions in hospitals. Bed occupancy and staffing ratios are not only intuitive determinants of health; the international literature has also assessed them empirically as factors influencing health outcomes.

16See Mark et al. (2004), p. 293f.

17See Lankshear et al. (2005) & Lang et al. (2004).

18See Lankshear et al. (2005), p. 166.

19See Lankshear et al. (2005), p. 171.

20See Lang et al. (2004), p. 330.

21See Bell and Redelmeier (2001), p. 664ff.

22See Cram et al. (2004), p. 152.

23See Cram et al. (2004), p. 153.

1.4 Necessity of Risk-Adjustment for Comparisons of Quality of Care

Especially in outcome assessment, risk-adjustment is an absolutely necessary part of the methodology to ensure a fair comparison among health care providers. Brook et al. (2000) illustrate this issue with the following example24: A patient is admitted to an emergency department with a heart attack. If the patient received nothing but pain relief, most likely (60-70% probability) he or she would leave the hospital alive with negligible restrictions in daily life. Obviously, the quality of care this patient received would be very poor, but in terms of outcome measurement the hospital would be performing well. The underlying concept of risk-adjustment is that patient outcomes are attributable to the health care service received, the quality of the service, random effects and the patient's own characteristics, e.g. age, sex, severity of illness and other risk factors25.

The clinical and other characteristics are likely to differ in patients or other units of analysis (departments, hospitals). In observational studies it is not possible to control for these differences in case mix from the outset with randomisation, therefore, the retrospective approach of risk-adjustment has to be taken26. Without risk-stratification, differences in chosen outcomes could indicate more a difference in case mix than an actual divergence in quality of service provided27. Hence risk-adjustment has the objective to assess the risk of an individual patient to experience an (un)desired outcome regardless of the quality of care or the services of care received28.
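The effect of case mix on unadjusted outcomes can be illustrated with a small numerical example. The following Python sketch uses entirely hypothetical numbers: two hospitals have identical mortality within each risk group but different shares of high-risk patients. The observed-to-expected ratio shown is one simple form of risk-adjustment (indirect standardisation) and is used here only for illustration; it is not the regression-based approach taken in this study.

```python
# Hypothetical reference mortality risk per risk group.
REFERENCE_RISK = {"low": 0.01, "high": 0.10}

# Hypothetical hospitals: risk group -> (number of cases, number of deaths).
hospitals = {
    "A": {"low": (900, 9), "high": (100, 10)},
    "B": {"low": (500, 5), "high": (500, 50)},
}


def crude_rate(strata: dict) -> float:
    """Unadjusted mortality: all deaths divided by all cases."""
    cases = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return deaths / cases


def observed_to_expected(strata: dict) -> float:
    """Observed deaths divided by deaths expected given the hospital's case mix."""
    observed = sum(d for _, d in strata.values())
    expected = sum(n * REFERENCE_RISK[group] for group, (n, _) in strata.items())
    return observed / expected


for name, strata in hospitals.items():
    print(name, round(crude_rate(strata), 3), round(observed_to_expected(strata), 2))
```

Hospital B has almost three times the crude mortality of hospital A although both perform identically within each risk group; once the expected deaths for each case mix are taken into account, the two ratios coincide.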

Depending on the data and risk-stratification method used, a variance in hospital performance can be observed. However, risk-stratification with administrative data performed quite well in comparison with risk-adjustment using clinical data29. Studies have shown that risk-adjusted indices for mortality, readmission, and complication rates, developed from administrative data comparable to the data used here, have face and construct validity, are stable over time, and account for the variation between hospitals with regard to these outcomes30, and thus can be used as indicators for hospital performance regardless of the principal diagnosis. Other studies, however, provided evidence that the validity of risk-adjusted indices for different diagnoses has to be established before they are used as performance indicators31.

24See Brook et al. (2000), p. 284.

25See Iezzoni (1994), p. 1823 and Iezzoni (2003), p. 4.

26See Grol et al. (2004), p. 35.

27See Davies (2005), p. 14.

28See Iezzoni (1994), p. 1823.

29See Iezzoni (1995), p. 768f. & Iezzoni (1996), p. 1384.

When using administrative data for risk-adjustment, certain aspects have to be kept in mind. In comparison to clinical data sets, all discharge diagnoses are coded and not only the ones present at admission. Accordingly, also events taking place during the hospital stay are used for risk-stratification (which could be the result of poorer quality of service).

Administrative data used for reimbursement purposes also have a potential for "DRG creep", i.e. patients are coded sicker than they actually are32. However, the same arguments can be brought forward to clinical data as well. Once the attention is focussed on certain kinds of risk factors which are used for adjusting, the possibility of upgrading patients is always given33.

Risk-adjusted mortality rates are a widely used indicator for quality of care. Numerous studies use risk-adjusted in-hospital mortality as an indicator of quality either for the whole hospital34, for certain departments35, specific conditions36 or even individual performance in hospitals37. However, all measures of risk-adjustment are potentially biased by confounding. Therefore, objections to comparing hospital performance can never be fully dispelled, even with excellent risk-adjusted outcomes38. Jencks et al. (1988), for example, investigated four conditions and came to the conclusion that risk-adjusted mortality rates could produce misleading results39. Only sufficient and accurate data allow reliable and valid inferences. As pointed out by Zhan and Miller (2003, p. 61), the size of administrative data "coupled with missing important confounding variables and difficulty in choosing correct statistical models that fit the data, clinically insignificant but statistically significant results could lead to biased inferences and erroneous conclusions".

30See DesHarnais et al. (1990), p. 1139f. & DesHarnais et al. (2000), p. 22.

31See Thomas, Holloway & Guire (1993), p. 19.

32See Psaty et al. (1999), p. 109 or Carter et al. (1990), p. 426f.

33See Davies (2005), p. 14f.

34In addition to the already mentioned, see e.g. McKay & Deily (2005), Prytherch et al. (2005), Devereux et al. (2002) or Cutler (1995).

35See in addition to UK Neonatal Staffing Study Group (2002) and Tarnow-Mordi, Hau & Shearer (2000) also Dara et al. (2005) or Shortell et al. (1994).

36See Mason et al. (2006) or Hamilton & Hamilton (1997).

37See Huckman & Pisano (2006).

38See Brook et al. (2000), p. 287, McKee et al. (1997), p. 187 or Halm & Chassin (2001), p. 693.

39See Jencks et al. (1988), p. 3615.


Chapter 2

Data for the Analysis

The data (§ 21 KHEntgG) are composed of DRG1 data (134 536 cases from 8 German hospitals) for the year 2004. The DRG data are a full sample of all cases and patients, giving the reasons for admission and discharge with the respective dates. These cross-sectional data were transformed into a panel2 for further analysis.

In sum, the data comprise the following variables: patient’s age, insurance type3, gender, length of stay, the DRG and the major diagnostic group which categorises DRGs. In 2004 there were 824 different DRGs in 23 major diagnostic groups (MDCs)4. Moreover, the data provide the level of illness severity with the patient clinical complexity level (PCCL)5, effective DRG weight, (main) diagnoses and secondary diagnoses. Information from this data was used for risk-adjustment.

The DRG-data were merged with a second dataset. This structural dataset includes information about the bed capacity in the different departments of the hospital, the staffing levels for nurses and physicians with information about their level of medical qualification in the respective department.

Before being transformed into longitudinal data, variables were either recoded or newly generated6. The process of generating and transforming the data will be explained in the following sections.

As these are highly sensitive data, all analyses have been performed anonymously. Individual patient identification was at no time possible, because the identity was numerically coded before the author had access to the DRG-data. After finishing the data transformation, the already recoded patient identification number was eliminated as well7.

1Certified grouper by 3M.

2Panel data is defined as having a cross-sectional and a time series dimension (see Wooldridge, p. 448).

3Insurance type is either private or public insurance.

4See Roeder et al. (2004), p. 909.

5PCCL is measured by five categories, from PCCL 0 to PCCL 4.

6All transformations and analyses were performed with STATA 9.2 for Windows.

2.1 Exclusion of Patient Data

In a first step 1 112 cases were excluded because they were not admitted for treatment but were only staying in hospital as accompanying persons (Begleitpersonen). Six departments with 2 280 cases were excluded because structural data (see section 2.4) could not be obtained for the respective departments. Another 7 587 cases were excluded due to missing data: these cases were not coded completely, so important information (e.g. gender, PCCL or other patient characteristics) was missing. However, these 7 587 cases were included in computing the number of patients per day for every department.

2.2 Recoding of Hospital Departments

Capacity utilisation is measured as bed occupancy and staff-to-patient ratio. To compute both variables per day, it was necessary to calculate the number of patients per day in every department for each hospital. Therefore, the cross-sectional data had to be transformed into a longitudinal data set. It was also necessary to adjust the DRG-data for merging with the structural data, i.e. departments8 in five hospitals were grouped together. These adjustments were required because the structural data were derived from quality reports (see section 2.4), in which information about the respective departments was presented in an aggregated form as well.

For one hospital the obstetric department (Code 2500) was merged with the gynaecological department (Code 2400) and the neonatology departments (Code 1200) were included in the paediatric departments (Code 1000) in another two hospitals. In another hospital the medical departments with different focuses (Codes 105 and 106; haematology and endocrinology, respectively) were combined with the main medical department (Code 100). The same approach was taken with accident surgery (Code 1516), vascular surgery (Code 1518) as well as with the medical intensive (Code 3601) and the surgical intensive care unit (Code 3618). These departments were included in either the surgical department (Code 1500) or the intensive care unit (Code 3600). The stroke unit (Code 2856) was merged with the neurology department (Code 2800) in the last hospital.

7The identification of hospitals has been necessary, because otherwise structural data could not have been obtained.

8In the DRG data and in the structural data set departments were coded according to §301 SGB V.
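The department recoding described above amounts to a lookup table applied per hospital. The following Python sketch is an illustration only: the department codes are those named in the text, but the hospital numbers 1 to 5 are hypothetical labels, and it is assumed that the surgical and intensive care recodings took place in the same hospital as the recoding of the medical departments.

```python
# (hypothetical hospital label, original department code) -> aggregated code
RECODE = {
    (1, 2500): 2400,  # obstetrics -> gynaecology
    (2, 1200): 1000,  # neonatology -> paediatrics
    (3, 1200): 1000,  # neonatology -> paediatrics
    (4, 105): 100,    # haematology -> main medical department
    (4, 106): 100,    # endocrinology -> main medical department
    (4, 1516): 1500,  # accident surgery -> surgery
    (4, 1518): 1500,  # vascular surgery -> surgery
    (4, 3601): 3600,  # medical intensive care -> intensive care unit
    (4, 3618): 3600,  # surgical intensive care -> intensive care unit
    (5, 2856): 2800,  # stroke unit -> neurology
}


def recode_department(hospital: int, code: int) -> int:
    """Return the aggregated department code, or the original code if no rule applies."""
    return RECODE.get((hospital, code), code)
```

Keying the table on the hospital as well as the code keeps a recoding from spilling over into hospitals whose quality reports list the department separately.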

2.3 Pooling of the DRG-Data

After adjusting the DRG-data, the shape of the data was changed. Originally, information was stored in wide form. For changing the data into long form, it was necessary to identify the exact days the patient stayed in hospital. Therefore, a loop was programmed to identify all days of hospitalisation for every patient. For every respective day, a new variable with the value "day of the year"9 [DOY] was added to the wide form of the data (see table 2.1). The dot in table 2.1 denotes a missing day for this patient, i.e. he or she was discharged after two days.

Table 2.1: Illustration of Data stored in wide Form

| Patient ID | Variables | DOY 0 | DOY 1 | DOY 2 |
|---|---|---|---|---|
| 1 | 0 | 16071 | 16072 | 16073 |
| 2 | 1 | 16071 | 16072 | . |
| 3 | 0 | 16071 | 16072 | 16073 |

Notes: Variables include department, hospital and other variables.

The command "reshape" in STATA converts the data from wide to long form10, i.e. data for each patient is repeated for every day of his or her stay in hospital (see table 2.2).

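Table 2.2: Illustration of Data stored in long Form

| Patient ID | DOY | Variables | Department | Hospital |
|---|---|---|---|---|
| 1 | 16071 | 0 | 100 | 1 |
| 1 | 16072 | 0 | 100 | 1 |
| 1 | 16073 | 0 | 100 | 1 |
| 2 | 16071 | 1 | 100 | 1 |
| 2 | 16072 | 1 | 100 | 1 |
| 3 | 16071 | 0 | 200 | 2 |
| 3 | 16072 | 0 | 200 | 2 |
| 3 | 16073 | 0 | 200 | 2 |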
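The step from one record per case to one record per hospital day can be sketched as follows. This is a Python analogue of the transformation illustrated in tables 2.1 and 2.2, not the STATA code used for the study. The field names and the three stays are hypothetical, and it is assumed that the last day in hospital counts as a hospital day. Dates are converted to day numbers counted from 1 January 1960, which is the convention STATA uses for daily dates.

```python
from datetime import date

STATA_EPOCH = date(1960, 1, 1)  # STATA daily dates count days since this date


def stata_day(d: date) -> int:
    """Return the STATA-style day number of a calendar date."""
    return (d - STATA_EPOCH).days


# Hypothetical stays mirroring tables 2.1 and 2.2: one record per case.
stays = [
    {"patient": 1, "variable": 0, "department": 100, "hospital": 1,
     "first_day": date(2004, 1, 1), "last_day": date(2004, 1, 3)},
    {"patient": 2, "variable": 1, "department": 100, "hospital": 1,
     "first_day": date(2004, 1, 1), "last_day": date(2004, 1, 2)},
    {"patient": 3, "variable": 0, "department": 200, "hospital": 2,
     "first_day": date(2004, 1, 1), "last_day": date(2004, 1, 3)},
]


def to_long(records: list) -> list:
    """Repeat each case once for every day of its stay (long form)."""
    rows = []
    for rec in records:
        first, last = stata_day(rec["first_day"]), stata_day(rec["last_day"])
        for doy in range(first, last + 1):
            rows.append({
                "patient": rec["patient"],
                "doy": doy,
                "variable": rec["variable"],
                "department": rec["department"],
                "hospital": rec["hospital"],
            })
    return rows


long_rows = to_long(stays)
```

With these three stays, `long_rows` contains the eight patient-day rows shown in table 2.2.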

For computing the number of cases per day in every department for each of the eight hospitals, the data in long form were collapsed by day of the year [DOY], department and hospital11. After collapsing (see table 2.3 for an illustration of the collapsed data) the observational units are the mean or median patient characteristics per day and department, e.g. the mean age of patients in an obstetric department of a hospital on 1st January 2004. Binary variables result in proportions, while metric variables result in the mentioned means or medians (see section 2.5.1 and following for detailed information).

9In STATA "day of the year" is given as a numeric expression. 1 January 2004 (01.01.2004) has the value 16 071.

10See N.N. (2005), p. 407ff. in STATA Data management reference manual release 9.

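Table 2.3: Illustration of collapsed Data

| DOY | Variables | Patients | Department | Hospital |
|---|---|---|---|---|
| 16071 | 0.5 | 2 | 100 | 1 |
| 16072 | 0.5 | 2 | 100 | 1 |
| 16073 | 0 | 1 | 100 | 1 |
| 16071 | 0 | 1 | 200 | 2 |
| 16072 | 0 | 1 | 200 | 2 |
| 16073 | 0 | 1 | 200 | 2 |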
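The collapse step can be sketched in the same way. The following Python example groups the patient-day rows of table 2.2 by day, department and hospital, counts the patients and averages a binary variable; it is an illustration of the idea, not the STATA "collapse" call used for the study, and the names are hypothetical.

```python
from collections import defaultdict

# Patient-day rows from table 2.2: (patient, DOY, variable, department, hospital).
long_rows = [
    (1, 16071, 0, 100, 1), (1, 16072, 0, 100, 1), (1, 16073, 0, 100, 1),
    (2, 16071, 1, 100, 1), (2, 16072, 1, 100, 1),
    (3, 16071, 0, 200, 2), (3, 16072, 0, 200, 2), (3, 16073, 0, 200, 2),
]


def collapse(rows: list) -> dict:
    """Aggregate patient-days to one record per (DOY, department, hospital)."""
    groups = defaultdict(list)
    for _patient, doy, variable, department, hospital in rows:
        groups[(doy, department, hospital)].append(variable)
    return {
        key: {"patients": len(values), "mean_variable": sum(values) / len(values)}
        for key, values in groups.items()
    }


collapsed = collapse(long_rows)
```

For a binary variable the group mean is the proportion of patients with the characteristic, which is why department 100 shows 0.5 on the first two days in table 2.3.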

2.4 Structural Data Set

Structural data was derived from the quality reports (Qualitätsberichte12) of the respective hospitals13. They include information about number of beds, most common DRGs, frequently performed procedures and the staffing levels for the hospital as well as for every department. However, for this study only number of beds and information about staff were used.

The number of beds given per department refers to so-called "Planbetten", i.e. the number of beds according to the "Krankenhausplan"14, which may differ from the actual number of beds (Ist-Betten) in the respective department. Only staff of departments with beds were included, i.e. doctors or nurses working in functional departments (e.g. radiology) of the hospital were not considered in this study.

11Data were collapsed by DOY, department and hospital, so that all three variables were unchanged in the "collapse" command.

12Data refers to the due date 31.12.2004, i.e. information reflects only employment levels and number of beds on that particular day, which are assumed to be fixed over the year 2004.

13It was not possible to obtain the actual number of beds from the hospitals.

14The "Krankenhausplan" is designed by the Bundesländer according to KHG §6.

The quality reports also provide information about the educational level of the hospital employees. Therefore, it is possible to distinguish four levels of qualification for nurses:

1. Nurse assistants with no formal qualification [NURSE 1]

2. Nurses with an apprenticeship of one year [NURSE 2]

3. Nurses with an apprenticeship of three years [NURSE 3]

4. Nurses with an apprenticeship of three years and a completed advanced training of two years [NURSE 4]

However, only three of eight hospitals employed nurse assistants, and on average only about one nurse with a one-year qualification was working in every hospital. Therefore, the first three levels of qualification for nurses were combined, leaving two levels of qualification for nurses15:

I. Nurse assistants with no formal qualification or nurses with either a one-year degree or a three-year degree [STAFF-I]

II. Nurses with a three-year degree and a completed advanced training of two years (equal to point 4 above) [STAFF-II]

The levels of qualification for doctors can be differentiated as follows:

III. Doctors with a medical degree [STAFF-III]

IV. Doctors with a completed degree as a medical specialist [STAFF-IV]

The structural data had to be changed as well (compare section 2.2) to match the DRG-data. Where necessary, similar transformations were performed, i.e. in one hospital the staffing levels and number of beds of the obstetric department (Code 2500) were included in the gynaecological department (Code 2400), and in another hospital accident surgery (Code 1600) as well as neurosurgery (Code 1700) were merged with the general surgical department (Code 1500).

2.5 Full Data for the Analysis

The collapsed DRG-data were then merged with the structural data set. Therefore, the data comprise for every day of the year in 2004 inter alia the total number of patients and beds per department as well as all staff working in the respective department.

15Table with staffing data before summarising is shown in table A.1 in the appendix.


As mentioned in section 2.1, the 7 587 cases with missing values were included when counting the total number of patients per day in the respective departments. Therefore, transformations equivalent to those applied to the DRG-data set were performed on the data of these patients, and they were added back to the full data set at this point to give the exact number of patients per department. With these transformations completed, the data now form a panel. However, in some departments time periods (days) are missing, because no patients were in the respective department during these times, as can easily happen with "Belegabteilungen"16. Overall, the full data consist of 58 departments in 8 hospitals covering 20 561 days, i.e. on average 354 days per department are included in further analysis.

2.5.1 Definition of dependent Variable

The outcome variable is in-hospital mortality [MORTALITY]17. In-hospital mortality was identified by reason for discharge ("entlgrund_vorne"18 coded with "07"19). After collapsing the DRG-data, the variable indicates the proportion of patients deceased on day i in department j of hospital k:

$$\mathrm{MORTALITY}_{ijk} = \frac{\mathrm{Deceased}_{ijk}}{\mathrm{Cases}_{ijk}}$$

The "logit" transformation log[MORTALITY/(1−MORTALITY)] has not been used, because in-hospital mortality is a rare event, i.e. a large percentage is at one extreme, namely zero and the "logit" transformation is only defined for values strictly between zero and one20.

2.5.2 Definition of Variables measuring Capacity Utilisation

Bed occupancy [OCCUPANCY] is measured as the ratio of cases on day i in department j of hospital k to the number of beds of department j in hospital k, with patients admitted or discharged on that day counting only as half a day in hospital. This adjustment is necessary, because otherwise admissions and discharges would both be counted fully in modelling bed occupancy, i.e. two

16Beds in "Belegabteilungen" are booked for patients from consultants with medical practices outside the hospitals. The consultants are only using the hospital facilities.

17In brackets the names of the defined variables are given.

18Name of the 3M grouper-variable

19See Schlüssel 5 nach Anlage 2 zur §301-Vereinbarung.

20See Papke and Wooldridge (1996), p. 620.


patients (the one admitted and the one discharged on that day) occupy the same bed.

$$\mathrm{OCCUPANCY}_{ijk} = \frac{(\mathrm{Admissions}_{ijk} + \mathrm{Discharges}_{ijk})/2 + \mathrm{Cases}_{ijk}}{\mathrm{Beds}_{jk}}$$
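A small worked example may make the half-day adjustment concrete. The sketch below assumes that the Cases term counts only patients who were neither admitted nor discharged on the day in question; this reading, the function name and the numbers are assumptions for illustration.

```python
def bed_occupancy(admissions, discharges, cases, beds):
    """Occupancy for one day, department and hospital.

    `cases` is taken here to mean patients who were neither admitted nor
    discharged on that day (an assumption about the formula's Cases term).
    Admitted and discharged patients each count as half a day.
    """
    if beds <= 0:
        raise ValueError("beds must be positive")
    return ((admissions + discharges) / 2 + cases) / beds


# Hypothetical department-day: 4 admissions, 6 discharges, 15 other
# patients and 25 planned beds give (5 + 15) / 25 = 0.8.
occupancy = bed_occupancy(admissions=4, discharges=6, cases=15, beds=25)
```

Without the adjustment the same day would be counted as 25 occupied beds out of 25, although a discharged and a newly admitted patient can share one bed on that day.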

Staff to patient ratio [STAFF-RATIO] is measured as the ratio of staff with level of qualification l working in department j of hospital k to cases on day i in department j of hospital k.

$$\mathrm{STAFFRATIO}_{ijkl} = \frac{\mathrm{STAFF}_{jkl}}{\mathrm{Cases}_{ijk}}$$

Realistic modelling of the staff to patient ratio was not possible, as it would require the daily work schedules for each shift of nurses and doctors, which could not be obtained.

Hence, certain assumptions are made due to lack of detailed data about staffing:

1. Staffing levels were fixed for the whole year.

2. There is no reduced staffing during weekends and public holidays.

3. The whole staff per department was working every day.

It is plausible to assume that OCCUPANCY and STAFF-RATIO have non-linear effects on the mortality rates, e.g. the incremental effect could diminish with increasing levels. Thus, for further analyses the squared values of STAFF-RATIO and OCCUPANCY will be used as well to capture these supposed effects. Interaction terms of bed occupancy and staffing ratios with weekends will also be considered in the different regression models.
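How such squared and interaction terms are built can be sketched as follows. The dictionary keys and the suffixes `_sq` and `_x_weekend` are hypothetical names chosen for this illustration, not variable names from the study.

```python
def add_nonlinear_terms(row):
    """Return a copy of `row` with squared and weekend-interaction terms."""
    out = dict(row)
    for name in ("occupancy", "staff_ratio"):
        out[name + "_sq"] = row[name] ** 2                    # non-linear effect
        out[name + "_x_weekend"] = row[name] * row["weekend"]  # interaction
    return out


# Hypothetical department-day on a weekend (weekend dummy = 1).
example = add_nonlinear_terms(
    {"occupancy": 0.8, "staff_ratio": 1.5, "weekend": 1}
)
```

With the weekend dummy equal to zero, both interaction terms vanish, so their coefficients measure how the effect of capacity utilisation differs on weekends and public holidays.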

Figure 2.1 shows the average daily bed occupancy for each hospital throughout the year 2004, while figure 2.2 presents the average weekly bed occupancy for the year 2004. Figure 2.1 shows that daily bed occupancy hardly exceeds 100% in any hospital, while figure 2.2 shows that bed occupancy decreases over the weekend, which is consistent with admission and discharge behaviour in German hospitals21. Both figures suggest that bed occupancy is modelled correctly after the applied transformations.

Table A.2 in the appendix shows the average occupancy per department for each hospital in 2004. Three departments in three different hospitals show an average bed occupancy of more than 100%, casting doubt on the accuracy of the available structural data.

21See Nüssler et al. (2006), p. 927ff.


Figure 2.1: Average Bed Occupancy per Hospital 2004 (one panel per hospital, 1 to 8; x-axis: date, with ticks at 1 Apr, 1 Jul and 1 Oct; y-axis: average occupancy per hospital in percent, 0 to 120)

Table 2.4: Descriptive Statistics for Mortality and Capacity Utilisation

Hospital   Mortality   Staff 1   Staff 2   Staff 3   Staff 4   Occupancy   Beds      Cases
1          0.75%       1.45      0.90      0.29      0.35      79%         79.43     62.46
           (3.41%)     (2.17)    (2.70)    (0.58)    (0.80)    (45%)       (42.70)   (39.18)
2          0.29%       1.00      0.03      0.14      0.16      66%         71.14     45.26
           (1.13%)     (0.89)    (0.09)    (0.11)    (0.19)    (17%)       (33.55)   (20.96)
3          0.17%       0.70      0.12      0.09      0.16      77%         60.17     45.50
           (0.68%)     (0.67)    (0.10)    (0.07)    (0.09)    (19%)       (13.61)   (13.31)
4          0.22%       0.78      0.04      0.15      0.08      82%         57.00     42.93
           (0.69%)     (0.46)    (0.05)    (0.09)    (0.05)    (28%)       (29.82)   (22.34)
5          0.28%       0.88      0.11      0.11      0.12      75%         49.66     37.02
           (2.04%)     (0.91)    (0.13)    (0.12)    (0.13)    (22%)       (18.91)   (18.49)
6          0.36%       1.35      0.14      0.14      0.13      62%         85.00     52.95
           (0.82%)     (0.70)    (0.14)    (0.07)    (0.07)    (14%)       (0.00)    (12.12)
7          0.28%       1.17      0.42      0.43      0.41      83%         75.33     56.27
           (1.94%)     (0.95)    (1.26)    (0.83)    (0.68)    (36%)       (56.36)   (42.45)
8          0.05%       0.85      0.01      0.07      0.72      71%         21.25     15.54
           (1.28%)     (1.29)    (0.02)    (0.18)    (0.97)    (36%)       (17.78)   (14.74)
Total      0.31%       1.04      0.28      0.22      0.29      76%         63.87     46.63
           (1.91%)     (1.17)    (1.20)    (0.51)    (0.58)    (31%)       (41.72)   (32.50)

Note: SD in parentheses


Figure 2.2: Average weekly Bed Occupancy for all Hospitals (one line per hospital, 1 to 8; x-axis: day of the week, Sunday to Saturday; y-axis: occupancy in percent, 50 to 90)

As already mentioned, the number of beds used for computing bed occupancy refers to planned beds (see section 2.4), which may differ from the actual beds in the respective departments. With an average daily bed occupancy of more than 100% throughout the year 2004, it has to be concluded that the structural data for the total number of beds are inaccurate, at least for the respective departments. Therefore, a new dummy variable [DUM_OCCU] was generated to control for these three departments, which was used for a subgroup analysis.

Table 2.4 shows overall summary statistics for in-hospital mortality and the variables measuring capacity utilisation, including the average number of beds and patients for each hospital. On average, the staff to patient ratio is highest for nurses. Surprisingly, more medical specialists than doctors with a medical degree are working in the eight hospitals. Bed occupancy is nearly identical to the federal average bed occupancy for 2004 (compare table 1.1: 76% and 75.5%, respectively).


2.5.3 Variables used for Risk-Adjustment

In this study the following variables, all referring to the total number of patients per day in each department, were used for risk-adjustment: mean age [AGE], mean patient clinical complexity level [PCCL], proportion of males [MALE], and proportion of patients who were admitted as an emergency [EMERG]22. All these variables are supposed to affect the outcome in-hospital mortality independently of bed occupancy or staff to patient ratio, i.e. with increasing levels of e.g. PCCL an increase in mortality rates is expected. Additionally, the proportion of patients who are privately insured [PRIVATE]23 is included in the model.

The median effective DRG-weight24 [MEFFW], the median number of diagnoses [MDDX], and the median LOS [MLOS] are assumed to be correlated with in-hospital mortality as well.

In addition to these variables generated from the DRG-data set, one other variable used for risk-stratification was defined: the proportion of patients [TOP50] whose main diagnosis was one of the 50 diagnoses with the highest in-hospital mortality rates in Germany. It is expected that mortality rates will increase with a higher proportion of these patients. The diagnoses were obtained from the Federal Statistical Office (Statistisches Bundesamt Deutschland) and were matched with the ICD-10 code25 of the main diagnosis for each patient from the DRG-data.

2.5.4 Other control Variables

To control for effects of an admission during weekends26 and public holidays, a binary variable was generated at patient level: admission on weekends or public holidays27 [ADM_WEEKE]. After collapsing, the variable measures the proportion of patients staying in hospital who were admitted on weekends or public holidays. As discussed in chapter 1, international evidence suggests that patients admitted on weekends have a higher risk of dying. Hence, it is suspected that in-hospital mortality rises with an increase in the proportion of patients admitted on weekends.
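The construction of ADM_WEEKE can be sketched as below. The holiday set is a hypothetical subset for one Bundesland (the study obtained holidays by merging with another data set), and the function name is illustrative.

```python
from datetime import date

# Hypothetical public holidays for one Bundesland in 2004 (illustrative subset).
PUBLIC_HOLIDAYS = {date(2004, 1, 1), date(2004, 4, 9), date(2004, 4, 12)}


def admitted_on_weekend_or_holiday(admission_date, holidays=PUBLIC_HOLIDAYS):
    """Return 1 if admitted on a Saturday, Sunday or public holiday, else 0."""
    is_weekend = admission_date.weekday() >= 5  # Monday is 0, Saturday is 5
    return int(is_weekend or admission_date in holidays)


# Admission dates of four hypothetical patients present on one department-day.
admissions = [date(2004, 1, 1), date(2004, 1, 3), date(2004, 1, 5), date(2004, 1, 6)]
flags = [admitted_on_weekend_or_holiday(d) for d in admissions]

# After collapsing, the binary flag becomes a proportion per department-day.
adm_weeke = sum(flags) / len(flags)
```

Here the first patient was admitted on a public holiday and the second on a Saturday, so half of the patients present count as weekend or holiday admissions.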

Dummy variables for the eight different hospitals [DUM_HOSP], for the different departments [DUM_DEP] and for weekends including public holidays [WEEKENDS] were generated as well. These account for unobserved differences between hospital departments, as fixed differences in mortality levels between the departments are suspected, e.g. an ICU compared with an ear, nose and throat department. Table 2.5 shows summary statistics of the variables used for risk-stratification and the percentages of weekend (including public holiday) admissions for each hospital.

22In line with Cram et al. (2004), better outcomes for scheduled admissions are assumed.

23The variable PRIVATE is in the model to determine whether the insurance status has an effect on the outcome. It is assumed that privately insured patients may have better access to medical specialists.

24Effective DRG weights are used for calculating revenue per case. Deviations from average LOS influence the effective DRG weights. See Friedrich & Günster (2006), p. 160f.

25ICD stands for International Classification of Diseases.

26Weekend is defined as the period from midnight on Friday until midnight on Sunday.

27Identification of public holidays in the different Bundesländer was possible after merging with another data set.

The deviation of median LOS from the average LOS in table 1.1 is not surprising given the way LOS is computed in this study. The total LOS of each patient was used on every day of his or her stay in hospital for calculating the median LOS of the total number of patients in each department, i.e. patients with an exceptionally long stay have more weight in calculating LOS for each day. Hence, to decrease the effects of outliers, the median was chosen for LOS as well as for the total number of diagnoses [MDDX] and for the effective DRG-weight [MEFFW].
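The weighting effect described here can be shown with a small sketch. All stays are hypothetical: one patient stays ten days while five patients with two-day stays follow one another, so that a long-stay patient is present on every day.

```python
from statistics import mean, median

# Hypothetical stays: (admission day, total LOS in days); values illustrative.
stays = [(0, 10), (0, 2), (2, 2), (4, 2), (6, 2), (8, 2)]


def los_of_patients_present(day):
    """Total LOS of every patient who is in the department on `day`."""
    return [los for admit, los in stays if admit <= day < admit + los]


# Average LOS counted once per patient.
per_patient_mean = mean(los for _, los in stays)

# Median LOS of the patients present, computed separately for each day.
daily_median_los = [median(los_of_patients_present(day)) for day in range(10)]
```

Counted per patient the mean LOS is about 3.3 days, but on every single day the patients present have a median LOS of 6 days, because the long-stay patient enters the daily calculation ten times and each short-stay patient only twice.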


Table 2.5: Descriptive Statistics for Risk-Adjustment and control Variables

Hospital   Age       Male     Emergencies   Private Insurances   Top50
1          64.11     0.54     0.40          0.09                 0.50
           (6.55)    (0.11)   (0.14)        (0.06)               (0.19)
2          48.23     0.48     0.34          0.08                 0.34
           (28.46)   (0.20)   (0.17)        (0.05)               (0.21)
3          46.98     0.43     0.28          0.10                 0.26
           (24.22)   (0.17)   (0.15)        (0.05)               (0.20)
4          54.88     0.32     0.38          0.07                 0.24
           (17.70)   (0.16)   (0.17)        (0.05)               (0.18)
5          60.60     0.49     0.39          0.06                 0.36
           (12.68)   (0.19)   (0.19)        (0.04)               (0.19)
6          66.69     0.42     0.51          0.06                 0.33
           (6.25)    (0.07)   (0.15)        (0.03)               (0.19)
7          49.85     0.53     0.22          0.06                 0.20
           (19.14)   (0.18)   (0.18)        (0.05)               (0.19)
8          56.81     0.40     0.00          0.04                 0.07
           (13.97)   (0.31)   (0.05)        (0.11)               (0.13)
Total      54.32     0.48     0.29          0.07                 0.28
           (19.76)   (0.20)   (0.20)        (0.06)               (0.22)

Hospital   Pccl     Median LOS   Median number of diagnoses   Median effective DRG-weight
1          2.63     17.59        9.04                         2.47
           (0.69)   (11.07)      (4.85)                       (2.64)
2          2.07     11.72        5.97                         1.24
           (0.83)   (5.45)       (2.05)                       (0.53)
3          1.91     9.96         4.92                         1.08
           (0.66)   (3.08)       (1.71)                       (0.37)
4          1.24     10.77        3.13                         1.02
           (0.60)   (3.31)       (1.25)                       (0.33)
5          2.25     12.63        5.81                         1.33
           (0.71)   (5.51)       (1.69)                       (0.59)
6          2.43     11.44        7.50                         1.21
           (0.63)   (2.91)       (1.71)                       (0.27)
7          1.93     11.68        5.55                         2.13
           (0.85)   (7.93)       (3.13)                       (2.74)
8          0.87     7.64         4.45                         1.01
           (0.79)   (5.40)       (1.88)                       (0.62)
Total      1.97     11.96        5.86                         1.61
           (0.89)   (7.29)       (3.11)                       (1.86)

Note: SD in parentheses


Chapter 3

Basic econometric Model

This part of the study focuses on the formulation of an appropriate statistical model.

The basic econometric model used in this study for estimating the quality of care in German hospitals is given in equation (3.1)1. Equation (3.1) is a linear multivariate regression model, which allows controlling for multiple factors affecting the dependent variable $Y_{ijk}$. When simultaneously controlling for these factors, a relationship between capacity utilisation and quality of care can be inferred with greater confidence. Here, the notion of ceteris paribus2 is of great importance. Only if all (known) other factors are held constant is it possible to estimate whether and to what extent capacity utilisation is correlated with quality of care3.

$$Y_{ijk} = \beta_0 + \beta_1 C_{ijk} + \beta_2 X_{ijk} + \beta_3 D_i + \beta_4 D_j + \beta_5 D_k + \nu_{ijk} \qquad (3.1)$$

The subscripts of equation (3.1) are a time index for days $i = 1, \ldots, 366$, a department index $j = 1, \ldots, 24$, and finally a hospital index $k = 1, \ldots, 8$, with $\beta$ being the coefficients of interest.

The term $Y_{ijk}$ is the estimated quality of care. $C_{ijk}$ is the main explanatory variable for capacity utilisation, referring either to bed occupancy or to staff to patient ratio. The variable $X_{ijk}$ controls for the case-mix. $D_i$, $D_j$, and $D_k$ are indicator (or dummy) variables controlling for unobservable effects of weekends/public holidays, department and hospital characteristics, respectively. The composite error (or disturbance) term $\nu_{ijk}$ is assumed to be independently and identically distributed over i, j, and k with a mean of zero and a variance of $\sigma^2_\varepsilon$.

1The model will be specified in the following chapters.

2Ceteris paribus means "holding other factors fixed" (see Wooldridge (2006), p. 13).

3See Wooldridge (2006), p. 73.

The units of observation are the different departments in each hospital (in total 58 departments in the eight hospitals) between 01.01.2004 and 31.12.2004. For estimating the effects of capacity utilisation on in-hospital mortality [MORTALITY], different multivariate regression models are used. Equation (3.1) will be estimated with a pooled ordinary least squares (OLS) regression model. This approach ignores the panel structure of the data by simply combining the time series and cross-sectional data (pooled regression model). In an OLS regression model, the composite error has to be uncorrelated with the independent variables, otherwise the estimated coefficients are not consistent. The resulting bias of the pooled OLS regression is called heterogeneity bias, i.e. a bias resulting from time-constant omitted variables4. Another approach uses the whole structure of the panel data.

Panel data has the advantage of observing changes in the dependent variable over time as well as the variation in different units at a given point in time. Additionally, panel data can diminish problems which arise from omitted variables given these are constant over time5. Hence, in addition to the OLS regression model, different multivariate regression models explicitly used for panel data are discussed in the following section.

3.1 The Panel Regression Models

The general panel model is shown in equation (3.2):

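$$Y_{ijk} = \beta_0 + \beta_1 C_{ijk} + \beta_2 X_{ijk} + \beta_3 D_i + \beta_4 D_j + \beta_5 D_k + \alpha_{jk} + \upsilon_{ijk} \qquad (3.2)$$

The difference between equations (3.1) and (3.2) lies only in the error terms. The error term of equation (3.1), $\nu_{ijk}$, which is often called the composite error6, is the sum of two error terms:

$$\nu_{ijk} = \alpha_{jk} + \upsilon_{ijk} \qquad (3.3)$$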

4See Wooldridge (2006), p. 462.

5See Pindyck & Rubinfeld (1998), p. 250f.

6See Wooldridge (2006), p. 462.


$\alpha_{jk}$ is the so-called unobserved or fixed effect. This term (Wooldridge, 2006, p. 461) "captures all unobserved, time constant factors that affect" $Y_{ijk}$, while $\upsilon_{ijk}$ is often called the idiosyncratic error or time-varying error7. The time-constant error term $\alpha_{jk}$ captures all omitted variables that do not vary over time, i.e. all time-constant unknown variables affecting the quality of care in each department. One possible way of eliminating the fixed effect $\alpha_{jk}$ is a time-demeaning transformation. From every variable in equation (3.2) its average over time is subtracted, and because $\alpha_{jk}$ is constant over time, it is eliminated from the equation like any other time-constant parameter8. This so-called "fixed-effects" (FE) model allows for correlation between $\alpha_{jk}$ and the independent variables. Each $\alpha_{jk}$ can be interpreted as a separate intercept for the cross-sectional units (departments), and therefore using a FE model means allowing a different intercept for each department9. The FE model is shown in equation (3.4):

$$Y_{ijk} = \beta_0 + \beta_1 C_{ijk} + \beta_2 X_{ijk} + \beta_3 D_i + \upsilon_{ijk} \qquad (3.4)$$

Panel data can also be analysed using the "random-effects" (RE) model. A random-effects model is chosen when $\alpha_{jk}$ is assumed to be uncorrelated10 with each explanatory variable in each time period (in contrast to the "fixed-effects" model11). If this is the case, fixed effects and first-differencing, which will be introduced in the last section of this chapter, would result in inefficient estimators. On the other hand, a RE model implies a serially correlated composite error term, because $\alpha_{jk}$ is part of the error term in every time period. Normally, the generalised least squares (GLS) estimator used in the random-effects model allows the standard errors as well as the test statistics to be corrected, but for GLS to have good properties it is necessary to have more units (departments) than time points (days), which is not the case in the data used for analysis12. However, the FE model struggles with the same problems; any violations of the assumptions for applying fixed effects, such as serial correlation of the error terms (see section 3.2.4), can cause inefficient estimators, especially in data with more time units than cross-sectional units13.
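The time-demeaning idea behind the FE model can be illustrated with a minimal sketch. The two departments, their fixed effects and the single regressor are hypothetical, and the data contain no noise so that the result is easy to check; this is not the estimation routine used in the study.

```python
from statistics import mean

# Hypothetical panel: two departments with different fixed effects (alpha)
# and a common slope of 2.0. x is higher where alpha is higher, so the
# fixed effect is correlated with the regressor.
true_slope = 2.0
panel = {
    "dept_A": {"alpha": 1.0, "x": [0.5, 0.6, 0.7, 0.8]},
    "dept_B": {"alpha": 5.0, "x": [0.8, 0.9, 1.0, 1.1]},
}
for unit in panel.values():
    unit["y"] = [unit["alpha"] + true_slope * x for x in unit["x"]]


def ols_slope(xs, ys):
    """Slope of a simple regression of ys on xs (with intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores alpha and attributes its effect to x.
pooled_x = [x for u in panel.values() for x in u["x"]]
pooled_y = [y for u in panel.values() for y in u["y"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Within (time-demeaning) transformation: subtract each unit's own mean,
# which removes the time-constant alpha.
demeaned_x, demeaned_y = [], []
for u in panel.values():
    x_bar, y_bar = mean(u["x"]), mean(u["y"])
    demeaned_x += [x - x_bar for x in u["x"]]
    demeaned_y += [y - y_bar for y in u["y"]]
fe_slope = ols_slope(demeaned_x, demeaned_y)
```

In this constructed example the pooled slope is far above the true value of 2.0 (heterogeneity bias), while the slope from the demeaned data recovers it.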

Choosing the appropriate model is therefore difficult. It is common practice to perform both regression models with the time-varying variables and then use a test proposed by Hausman14. The Hausman test examines the key assumption of the random-effects model that $\alpha_{jk}$ is uncorrelated with each explanatory variable15. With p < 0.0001 the Hausman test was highly significant, and therefore the hypothesis of no correlation had to be rejected.

7Hence, there is no time subscript for $\alpha_{jk}$, while there is a time subscript for $\upsilon_{ijk}$. See Wooldridge (2006), p. 461.

8See Wooldridge (2006), p. 485f. for a more thorough explanation of time demeaning.

9See Wooldridge (2006), p. 492f.

10Hence, the random effects equation does not differ from equation (3.2). See Wooldridge (2006), p. 494.

11See Wooldridge (2002), p. 252.

12See Wooldridge (2006), p. 494.

13See Wooldridge (2006), p. 492.

Hence, for further analyses, the FE model was used.
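For orientation, the Hausman statistic is commonly written as below. The notation ($\hat{\beta}_{FE}$, $\hat{\beta}_{RE}$, $q$) is not taken from this study and is introduced here only as an assumption for illustration; the vectors contain the coefficients of the time-varying variables that both models estimate.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'
    \left[\widehat{\mathrm{Var}}\left(\hat{\beta}_{FE}\right)
        - \widehat{\mathrm{Var}}\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_q
```

Under the null hypothesis that the unobserved effect is uncorrelated with the explanatory variables, both estimators are consistent and their difference should be small; $q$ denotes the number of coefficients compared. A large value of $H$ therefore speaks against the RE assumption.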

It is also possible to eliminate the fixed effect with a so-called "first-differencing" (FD) transformation. The data are differenced across two successive days16, i.e. equation (3.2) in time period $i = 1$ is subtracted from equation (3.2) in time period $i = 2$. Because $\alpha_{jk}$ is fixed in both time periods, the unobserved effect disappears in equation (3.5):

$$\Delta Y_{ijk} = \delta_0 + \beta_1 \Delta C_{ijk} + \beta_2 \Delta X_{ijk} + \beta_3 \Delta D_i + \Delta\upsilon_{ijk} \qquad (3.5)$$

The first-differenced equation (3.5) can be estimated by OLS, assuming the idiosyncratic error $\Delta\upsilon_{ijk}$ is uncorrelated with the explanatory variables in the two time periods. When using FD estimation, the first time period is lost17, because nothing can be subtracted from $i = 1$, and time-constant terms cannot be used in the regression model, because the effect of $\alpha_{jk}$ on the dependent variable cannot be separated from the effect of any other time-constant variable (as in the FE model18). The intercept $\delta_0$ in equation (3.5) is the difference of the intercepts in two successive time periods. The dummy variable used for controlling the effects of weekends or public holidays remains in the model, because it changes over time (just as in the FE model). There is one main difference between the FD model and the other models discussed in this section: the FD model estimates (Wooldridge 2006, p. 464) "how changes in the explanatory variables over time affect the change in y over the same time period".

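First-differencing can be sketched in the same spirit. The single department, its fixed effect and the regressor values are hypothetical and noise-free, so the differenced regression returns the true slope exactly; the intercept of the differenced equation is zero by construction here.

```python
def first_difference(values):
    """Differences between successive days; the first day is lost."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical department with a fixed effect alpha and a slope of 2.0.
alpha, slope = 5.0, 2.0
x = [0.8, 0.9, 0.7, 1.1, 1.0]
y = [alpha + slope * value for value in x]

dx, dy = first_difference(x), first_difference(y)

# alpha cancels in every difference, so a regression of dy on dx through
# the origin is sufficient for this illustration.
fd_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
```

Five days yield only four differences, which is the loss of the first time period mentioned above.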
3.2 Regression Diagnostics

Linear regression models rely on certain assumptions, normally called the Gauss-Markov assumptions19. Certain assumptions are tested in order to adjust the models and to allow valid inferences from the obtained coefficients. Regression diagnostics have been performed for both bed occupancy and staff to patient ratio. Unless stated otherwise, the differences between the models were not substantial.

14See Hausman (1978), p. 330ff.

15See Wooldridge (2006), p. 498.

16See Wooldridge (2006), p. 463.

17See Wooldridge (2006), p. 471.

18See Wooldridge (2006), p. 463.

19See Wooldridge (2006), p. 64f.

3.2.1 Testing for Multicollinearity

Multicollinearity does not truly violate any of the Gauss-Markov assumptions. It is defined as (Wooldridge 2002, p. 102) "high (but not perfect) correlation between two or more independent variables". However, with multicollinearity certain issues can arise20:

The coefficient estimates are strongly affected by small changes in the data.

Coefficients may have higher standard errors as well as low significance levels.

The sign of the coefficient may be affected or even the magnitude can be implausible.

Relatively high correlations between explanatory variables could indicate a problem with multicollinearity in the regression models. Table 3.2 shows Pearson's correlation coefficients for the variables used in the models21.

As table 3.2 shows, multiple variables have relatively high, statistically significant correlations. Hence, multicollinearity was tested in the regression models with the variance inflation factor (VIF)22. A VIF of more than 10 could raise concerns about multicollinearity23. The VIF was computed after running the OLS regression including dummies for weekends, hospitals and departments, but without interactions. Table 3.1 shows the results.

There is evidently high correlation between the dummies for departments and AGE. However, both are only in the model to control for (un)known factors affecting MORTALITY, so the high collinearity poses no problem for the model.
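The VIF and the 1/VIF column (often called tolerance) can be illustrated for the simplest case of two regressors, where the R² of the auxiliary regression equals the squared Pearson correlation. The data and function names are hypothetical; with more regressors, R² would come from regressing each variable on all others.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - R^2); with two regressors R^2 is the squared correlation."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical department-day values of two strongly correlated regressors.
age = [50, 55, 60, 65, 70, 75]
pccl = [1.0, 1.3, 1.4, 1.9, 2.0, 2.4]
vif = vif_two_regressors(age, pccl)
tolerance = 1.0 / vif  # corresponds to the 1/VIF column
```

In this constructed example the VIF is well above the threshold of 10, whereas two uncorrelated regressors would give a VIF of exactly 1.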

3.2.2 Testing for Normality of the Error Term

A normal distribution of the error term is necessary for valid inference based on the estimators $\hat{\beta}_{ijk}$. If the error term is not normally distributed, then the $\hat{\beta}_{ijk}$ will also not be normally distributed, i.e. the t and F statistics will not have the necessary t or F distributions, respectively24. Most

20See Greene (2000), p. 256.

21See Pindyck & Rubinfeld, p. 96ff. or Wooldridge, p. 102ff.

22See Greene (2000), p. 257f.

23See Chen et al. (2007), chapter 2.

24See Wooldridge (2006), p. 181.


Table 3.1: Variance Inflation Factor

Variable    VIF      1/VIF
Age         25.5     0.039208
Departm6    12.97    0.07709
Departm15   12.39    0.080692
Departm7    8.66     0.1155
Meffgew     8.14     0.12285
Pccl        8.02     0.124649
Mddx        7.9      0.126606
Mean VIF    4.32

Note: Variables with smaller VIF omitted

likely the error terms are not normally distributed if the dependent variable does not have a normal distribution. Given that mortality is a rare event even in hospitals, this is most likely the case. Hence, after running an OLS regression, a visual inspection of the predicted residuals was performed (see figure 3.1) using a graph plotting the quantiles of the error term against the quantiles of a normal distribution. This graph is especially sensitive to deviations from normality near the tails of the normal distribution25. As expected, the residuals show a substantial deviation from the normal distribution.

One approach to correct for a non-normal distribution of the error term is to transform the dependent variable. However, a rare event cannot be transformed into having a normal distribution; hence it is necessary to rely on the central limit theorem (CLT), i.e. for large enough sample sizes asymptotic26 normality can be assumed27 and therefore the t and F statistics of the OLS model remain applicable.

3.2.3 Testing for Heteroskedasticity

As stated, the variance of the composite error term $\nu_{ijk}$ is assumed to be constant (homoskedasticity assumption). However, this assumption may be unreasonable for error terms of units of different sizes28. It seems plausible that the error terms of departments with more beds have a larger variance than those of smaller departments. Therefore, a first OLS-regression was

25See Chen et al. (2007), chapter 2.

26See Wooldridge (2002), p. 40f. for a discussion of asymptotic normality.

27See Wooldridge (2006), p. 182.

28See Pindyck & Rubinfeld (1998), p. 146f.


Figure 3.1: Quantile Normality Plot (x-axis: inverse normal, −10 to 10; y-axis: residuals, 0 to 100)

performed29 with a graphical method for detecting heteroskedasticity30. In figure 3.2 the residuals of the model using bed occupancy are plotted against the fitted values, while figure 3.3 plots the residuals of the model using staffing ratios against the fitted values. In a well-fitted model, there should be no recognisable pattern in this graphic. Two points are apparent: (1) given the typical triangular pattern31, the assumption of homoskedasticity cannot be maintained, and (2) the regression model of mortality on staffing ratios is more strongly affected by heteroskedasticity. Hence, the Breusch-Pagan (BP) test32 was performed for both models to test the hypothesis of homoskedasticity in the disturbance term.
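The logic of the BP test can be sketched with its LM variant for a single regressor: the squared OLS residuals are regressed on the regressor, and n times the R² of this auxiliary regression is compared with a chi-square critical value. The data are hypothetical and heteroskedastic by construction; the study itself applied the test to its full models.

```python
from statistics import mean


def simple_ols(xs, ys):
    """Intercept, slope and R^2 of a simple regression of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return intercept, slope, 1.0 - ssr / sst


# Hypothetical data whose error spread grows with x (heteroskedastic by design).
x = [float(i) for i in range(1, 41)]
y = [1.0 + 0.5 * xi + (0.1 * xi if i % 2 == 0 else -0.1 * xi)
     for i, xi in enumerate(x, start=1)]

# Step 1: residuals of the original regression.
b0, b1, _ = simple_ols(x, y)
squared_residuals = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]

# Step 2: regress the squared residuals on the regressor; LM = n * R^2.
_, _, r2_aux = simple_ols(x, squared_residuals)
lm_statistic = len(x) * r2_aux

CHI2_CRIT_5PCT_DF1 = 3.841  # 5% critical value, chi-square with 1 df
reject_homoskedasticity = lm_statistic > CHI2_CRIT_5PCT_DF1
```

Because the squared residuals rise systematically with x in this example, the auxiliary R² is high and the null hypothesis of homoskedasticity is rejected.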

The BP tests were significant, i.e. the null hypothesis of a homoskedastic variance of the error term $\nu_{ijk}$ had to be rejected in both models. The diagnosed heteroskedasticity affects neither the consistency or unbiasedness of the OLS estimators $\hat{\beta}_{ijk}$, nor the goodness-of-fit measure $R^2$. However, the estimators of the variance of the estimated $\hat{\beta}_{ijk}$ are biased under heteroskedasticity. The variances of $\hat{\beta}_{ijk}$ are used for computing the OLS standard errors, which

29The regression was performed without any interactions or non-linear variables.

30See Chen et al. (2007), chapter 2.

31See Backhaus et al. (2003), p. 85.

32See Wooldridge (2006), p. 280ff.


Figure 3.2: Residuals for Bed Occupancy plotted against fitted Values (x-axis: fitted values, −0.02 to 0.08; y-axis: residuals, 0 to 1)

are not valid under heteroskedasticity for constructing the respective confidence intervals and t statistics. Hence, the only difference between the usual OLS model and the one accounting for heteroskedasticity is the way the standard errors are computed33. Therefore, a heteroskedasticity-robust procedure had to be used for the different models34. A correction for heteroskedasticity using GLS35 did not appear sensible due to the small number of departments and the large number of time points (see section 3.1).
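The difference between usual and heteroskedasticity-robust standard errors can be sketched for a single regressor. The sketch uses the White (HC0) form as one common robust procedure; which variant the study applied is not stated here, and the data are hypothetical.

```python
from statistics import mean

# Hypothetical data whose error spread grows with x.
x = [float(i) for i in range(1, 41)]
y = [1.0 + 0.5 * xi + (0.1 * xi if i % 2 == 0 else -0.1 * xi)
     for i, xi in enumerate(x, start=1)]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)

# The coefficient estimates are the same under both procedures.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Usual OLS standard error: assumes one common error variance.
sigma2 = sum(u ** 2 for u in residuals) / (n - 2)
se_usual = (sigma2 / sxx) ** 0.5

# Heteroskedasticity-robust (White, HC0) standard error: each squared
# residual is weighted by that observation's squared deviation in x.
se_robust = (sum(((xi - x_bar) ** 2) * u ** 2
                 for xi, u in zip(x, residuals)) / sxx ** 2) ** 0.5
```

Only the standard error changes between the two procedures; the slope estimate itself is untouched, which matches the point made above.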

Figure 3.2 also shows a possible slight deviation of the mean of the error term from its expected value of zero. However, this poses no serious problem for the model, because the estimated slope parameters $\hat{\beta}_{ijk}$ are not changed by a non-zero error mean. The intercept will pick up this effect36.

33See Wooldridge (2006), p. 274.

34See Wooldridge (2006), p. 271f.

35See Wooldridge (2006), p. 472.

36See Pindyck & Rubinfeld, p. 146.
