Multiple testing procedures for identifying desirable dose combinations in bifactorial designs


Hung, Chi, and Lipicky proposed the AVE and MAX tests to analyse in a bifactorial design whether combinations of two drugs at several doses fulfil the desirable property of superiority to both their single drug components. These are global tests and do not identify the special combinations which are more effective than their respective single components. Here multiple testing procedures based on linear contrast tests and on the closed testing principle will be presented. They will be compared with simultaneous Min tests of Laska and Meisner. The performance of these approaches is investigated by simulation studies.

Keywords: drug combination, Min test, bifactorial design, AVE and MAX test, closed testing procedure, linear contrast test, experimentwise error rate

¹Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, Cologne, Germany

OriginalArticle


Introduction

According to the guidelines of the U.S. Food and Drug Administration (FDA, 21 CFR 300.50), one of the requirements for approving the use of a drug combination is that each component must make a contribution to the claimed effects. Analogously, the guideline on combination drugs of the European Agency for the Evaluation of Medicinal Products (EMEA, CPMP/EWP/240/95) requires that the benefit/risk assessment of the fixed combination equals or exceeds that of each of its substances taken alone. This means that the combination needs to be more effective than each of its single components simultaneously.

This property of superiority may be tested with the Min test of Laska and Meisner [1], [2] in (2x2) factorial trials in which each component of the combination is given at a fixed dose level chosen on the basis of prior information. Because of unknown potential interactions between the components, preselecting the dose combination is often difficult. Therefore multilevel factorial designs that study several dose combinations simultaneously are needed.

With more than one combination drug, however, a multiple testing problem arises and two questions are posed: (1) Globally: Is there any combination that is superior to both of its single components? Here one global hypothesis is tested. (2) Locally: Which specific combinations have this property? In this case test procedures controlling the experimentwise error rate α are required; that is, the probability of at least one false rejection must not exceed α.

Neither dose-response analyses nor a general analysis of variance with interactions are suited to answering these two questions. Of interest are approaches that solely compare the effects of the combinations with the effects of their components. Hung, Chi, and Lipicky [3] and Hung [4] developed two global test procedures which protect the overall type I error rate. For the local problem they only recommended α-adjusted simultaneous Min tests according to the Hochberg [5] procedure.

In this article new procedures based on the closed testing principle are presented. They are built on different families of elementary hypotheses and allow the hypotheses to be tested in a step-down manner.

Specific linear contrast tests are used. Furthermore, a local maximum test based on the global MAX test of Hung et al. [3] is developed and a modification of simultaneous Min tests is suggested. All procedures control the experimentwise error rate α.

The performance of the proposed test procedures is investigated by simulation studies, and suggestions for practical applications are formulated.

• Notation

Consider a two-factorial design with I dose levels of drug A and J dose levels of drug B. Let µ_ij, i=0, …, I and j=0, …, J, denote the true mean response of the dose combination (i, j), where high values of µ_ij indicate benefit. (i, 0) and (0, j) denote the single drug components (Table 1).

Table 1: Scheme of a bifactorial dose combination design

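Let µ_ij be estimated by the group mean

$$\bar X_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} X_{ijk},$$

where X_ijk, k=1, …, n_ij, is the observed effect of the k-th subject in the (i, j)-th dose combination group and n_ij is the sample size of that group. Assuming variance homogeneity, the pooled estimator of σ² is given by

$$S^2 = \frac{\sum_{i,j}\sum_{k=1}^{n_{ij}} \left(X_{ijk} - \bar X_{ij}\right)^2}{\sum_{i,j}\left(n_{ij} - 1\right)},$$

where the outer sums run over all treatment groups (i, j).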

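There are 2·IJ marginal hypotheses, each comparing a combination (i, j), i=1, …, I and j=1, …, J, with one of its single components:

$$H^A_{ij}: \mu_{ij} \le \mu_{i0} \quad\text{versus}\quad K^A_{ij}: \mu_{ij} > \mu_{i0},$$

$$H^B_{ij}: \mu_{ij} \le \mu_{0j} \quad\text{versus}\quad K^B_{ij}: \mu_{ij} > \mu_{0j}.$$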

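To examine the claimed superiority of a combination, IJ local combination hypotheses can be formulated as unions of the two corresponding marginal hypotheses:

$$H^{AB}_{ij} = H^A_{ij} \cup H^B_{ij}: \mu_{ij} \le \max(\mu_{i0}, \mu_{0j}) \quad\text{versus}\quad K^{AB}_{ij}: \mu_{ij} > \max(\mu_{i0}, \mu_{0j}).$$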

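These IJ local combination hypotheses should be tested while controlling the experimentwise error rate α. If the global testing problem is considered, the global hypothesis is

$$H_0 = \bigcap_{i=1}^{I}\bigcap_{j=1}^{J} H^{AB}_{ij}: \mu_{ij} \le \max(\mu_{i0}, \mu_{0j}) \text{ for all } i, j.$$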

Previous approaches

Up to now, two approaches have been published. In case of only one combination drug Laska and Meisner [1], [2] suggest the Min test. For the general (I+1) x (J+1) design, two global tests are proposed by Hung, Chi, Lipicky [3].

• Laska-Meisner Min test

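In the simple case (I = J = 1) only one combination drug is studied and the hypothesis of interest is the following union hypothesis:

$$H^{AB} = H^A \cup H^B: \mu_{11} \le \max(\mu_{10}, \mu_{01}) \quad\text{versus}\quad K^{AB}: \mu_{11} > \max(\mu_{10}, \mu_{01}).$$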

H^AB is rejected if both marginal hypotheses H^A and H^B are rejected at level α by appropriate test statistics. This so-called Min test is a test for the simple combination drug problem with experimentwise level α.

Under rather mild conditions Laska and Meisner [1], [2] showed that this test is uniformly most powerful within the class of monotone level α tests. A generalization to union hypotheses with more than two partial hypotheses is possible: by the extended Min test procedure a union hypothesis is rejected if each partial hypothesis can be rejected at level α.
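The Min test for a single combination can be sketched in a few lines of Python. This is only an illustration under assumptions that the text does not fix: the marginal tests are taken to be one-sided two-sample t-tests, the variance is pooled over the three groups involved, and the function name `min_test` and its arguments are hypothetical.

```python
import numpy as np
from scipy import stats

def min_test(x_ab, x_a, x_b, alpha=0.05):
    """One-sided Min test for one combination (I = J = 1).

    x_ab, x_a, x_b: observations under the combination, drug A alone
    and drug B alone. Assumes normal data with a common variance
    (assumption of this sketch, pooled over the three groups).
    """
    groups = [np.asarray(g, dtype=float) for g in (x_ab, x_a, x_b)]
    n = [len(g) for g in groups]
    df = sum(n) - len(groups)
    s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / df
    # t statistics for H^A (combination vs. A) and H^B (combination vs. B)
    t_a = (groups[0].mean() - groups[1].mean()) / np.sqrt(s2 * (1 / n[0] + 1 / n[1]))
    t_b = (groups[0].mean() - groups[2].mean()) / np.sqrt(s2 * (1 / n[0] + 1 / n[2]))
    t_min = min(t_a, t_b)
    # Both marginal tests share df, so the p-value of the smaller
    # statistic is the larger of the two marginal p-values.
    p_value = float(stats.t.sf(t_min, df))
    return float(t_min), p_value, p_value <= alpha
```

The union hypothesis is rejected only when the smaller of the two t statistics is significant, which is the same as requiring both marginal hypotheses to be rejected at level α.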

• Global tests from Hung, Chi and Lipicky

In the general (I + 1) x (J + 1) case, a multiple testing problem arises. Two global tests are presented by Hung, Chi, and Lipicky [3]. Their test statistics are based on the minimum gains d_ij = µ_ij − max(µ_i0, µ_0j) over all dose combinations (i, j). The tested global hypothesis

$$H: d_{ij} \le 0 \text{ for all } (i,j) \quad\text{versus}\quad K: d_{ij} > 0 \text{ for at least one } (i,j)$$

is equivalent to H₀.

The "AVErage" global test statistic T_AVE is defined as the average of the observed minimum gains, and the "MAXimum" global test statistic T_MAX is the maximum of the observed minimum gains. That is,

$$T_{AVE} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} T_{ij}, \qquad T_{MAX} = \max_{i,j} T_{ij}, \qquad T_{ij} = \frac{\bar X_{ij} - \max(\bar X_{i0}, \bar X_{0j})}{S\sqrt{2/n}},$$

where S is the pooled estimator of σ, X̄_ij the observed mean effect of the combination (i, j), and n the common group size. Both tests are one-sided level α tests, requiring a balanced design and normally distributed data with homogeneous variances. The distributions of T_AVE and T_MAX are derived by Hung et al. [3]. A more precise and more extensive table of critical values than that given by Hung et al. [3] is presented in Table 2. The tests, however, do not address the multiplicity of the testing problem of the IJ local combination union hypotheses.

Note that these two global tests were developed for balanced designs; two modified global tests for unequal sample sizes are provided by Hung [6].
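As a rough illustration of how the two global statistics are formed from a table of group means, consider the following Python sketch for a balanced design. The standardization of the observed minimum gains by S·sqrt(2/n) is an assumption of this sketch, and the function name `ave_max_statistics` is hypothetical; the critical values have to be taken from Table 2 or [3] and are not computed here.

```python
import numpy as np

def ave_max_statistics(means, s, n):
    """Standardized minimum gains and the AVE and MAX statistics.

    means: (I+1) x (J+1) array of group means; row 0 and column 0
           hold the single-drug groups (0, j) and (i, 0).
    s:     pooled standard deviation.
    n:     common sample size per group (balanced design).
    """
    means = np.asarray(means, dtype=float)
    comb = means[1:, 1:]            # combinations (i, j) with i, j >= 1
    a_alone = means[1:, [0]]        # (i, 0), broadcast over columns
    b_alone = means[[0], 1:]        # (0, j), broadcast over rows
    gains = comb - np.maximum(a_alone, b_alone)   # observed minimum gains
    t = gains / (s * np.sqrt(2.0 / n))            # assumed standardization
    return t, float(t.mean()), float(t.max())
```

For example, `ave_max_statistics([[0, 1, 1], [1, 2, 3]], s=1.0, n=2)` describes a design with I = 1 and J = 2 and returns the two standardized gains together with their average and maximum.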

New approaches

In practice, one is often interested in the local question: Which dose combination(s) are superior to their respective components? Therefore, multiple procedures that identify desirable combinations while controlling the experimentwise error rate are required.

• Closed testing procedure of IJ local combination hypotheses

Consider the IJ local combination hypotheses as elementary hypotheses. In the closed system of hypotheses they generate, the intersection of all IJ local combination hypotheses is the global hypothesis H₀. Therefore each global test for H₀ is a competitor to the AVE and MAX global tests. The family of all intersections of the IJ local combination hypotheses can be tested at level α by a step-down procedure.

Two examples of closed systems of hypotheses with local combination hypotheses are given in Figure 1 and Figure 2. In the latter only the hierarchy of hypotheses is presented; arrows are omitted for the sake of clarity.


Table 2: Level α critical values of AVE and MAX tests in a balanced design

Figure 1: Closed system of hypotheses with local combination hypotheses, (2x3)-design

Figure 2: Closed system of hypotheses with local combination hypotheses, (3x3)-design

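Notice that all hypotheses in these systems are intersections of union hypotheses, whereas most of the commonly used level α test procedures are constructed for unions of intersection hypotheses. Thus each intersection of union hypotheses

$$\bigcap_{(i,j)\in\pi} \left(H^A_{ij} \cup H^B_{ij}\right), \qquad \pi \subseteq \{(i,j) \mid i=1,\dots,I;\ j=1,\dots,J\},$$

must first be transformed into a union of intersection hypotheses by the rules of elementary set algebra (distributive law). Afterwards generalized Min tests may be used: the union is rejected if each of its intersection hypotheses is rejected at level α.

Specific level α tests for the intersection hypotheses based on linear contrast tests will be specified later on.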
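The transformation by the distributive law can be made concrete with a short Python sketch. It expands an intersection of local combination hypotheses into all intersections of marginal hypotheses and then applies the generalized Min test. The labels ("A", (i, j)) and ("B", (i, j)) for the marginal hypotheses, the function names and the callable `p_value_of_term` are hypothetical; any level α test of an intersection of marginal hypotheses could be plugged in.

```python
from itertools import product

def union_of_intersections(combos):
    """Expand the intersection over (i, j) in combos of (H^A_ij or H^B_ij).

    Returns one frozenset per term of the resulting union; each term
    picks either the A- or the B-marginal hypothesis per combination.
    """
    terms = []
    for choice in product("AB", repeat=len(combos)):
        terms.append(frozenset(zip(choice, combos)))
    return terms

def reject_intersection_union(combos, p_value_of_term, alpha=0.05):
    """Generalized Min test: reject only if every term is rejected."""
    return all(p_value_of_term(term) <= alpha
               for term in union_of_intersections(combos))
```

An intersection of m local combination hypotheses expands into 2^m terms, which illustrates why this approach becomes laborious in large designs.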

• Closed testing procedure of 2·IJ marginal hypotheses

Consider the 2·IJ marginal hypotheses H^A_ij: µ_ij ≤ µ_i0 and H^B_ij: µ_ij ≤ µ_0j, which compare the effect of a combination with the effect of one of its components, as elementary hypotheses. Then a system of hypotheses closed under intersection can be constructed (e.g. Figure 3). This system contains 2^(2·IJ) − 1 hypotheses and is substantially larger than the system constructed from the IJ elementary local combination hypotheses (cf. Figure 3). However, it contains only intersection hypotheses without unions, which are easier to test. The family of all intersections of the 2·IJ marginal hypotheses is tested by a step-down procedure. Subsequently a local combination hypothesis can be rejected by the Min test principle if both of its marginal hypotheses H^A_ij and H^B_ij are rejected by the step-down procedure using level α tests.

Figure 3: Closed system of hypotheses with marginal hypotheses, (2x3)-design

The global hypothesis of this closed system is the intersection of all marginal hypotheses and differs from H₀:

$$\bigcap_{i=1}^{I}\bigcap_{j=1}^{J}\left(H^A_{ij} \cap H^B_{ij}\right) \ne H_0 .$$

Accordingly, a global test for this intersection is not a competitor to the AVE and MAX global tests. This test procedure only answers the local question.
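The closure principle behind this procedure can be sketched generically in Python: an elementary hypothesis is rejected if every intersection hypothesis containing it is rejected at level α. The sketch enumerates all intersections instead of using a step-down shortcut, and the test of an intersection is passed in as a callable. The function name `closed_test`, the labels of the hypotheses and the callable `intersection_p_value` are hypothetical.

```python
from itertools import combinations

def closed_test(elementary, intersection_p_value, alpha=0.05):
    """Return the elementary hypotheses rejected by the closure principle.

    elementary: list of hashable labels, e.g. ("A", 1, 2) for H^A_12.
    intersection_p_value: callable mapping a frozenset of labels to the
        p-value of a level-alpha test of their intersection.
    """
    # All non-empty intersections of elementary hypotheses
    subsets = [frozenset(c)
               for r in range(1, len(elementary) + 1)
               for c in combinations(elementary, r)]
    rejected = {s for s in subsets if intersection_p_value(s) <= alpha}
    # Reject h only if every intersection that contains h is rejected
    return [h for h in elementary
            if all(s in rejected for s in subsets if h in s)]
```

With 2·IJ marginal hypotheses as `elementary`, a local combination hypothesis is then rejected by the Min test principle when both of its marginal labels appear in the returned list. The enumeration grows exponentially and is only practical for small designs.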

• Two simultaneous closed testing procedures for each drug

Consider two simultaneous closed testing procedures, one generated by the IJ marginal hypotheses H^A_ij: µ_ij ≤ µ_i0 and one by the IJ marginal hypotheses H^B_ij: µ_ij ≤ µ_0j (e.g. Figure 4).

Figure 4: Closed systems of hypotheses for each drug with marginal hypotheses, (3x3)-design

This procedure combines the advantages of both approaches mentioned above: both systems of hypotheses are as small as in the first approach, and there are only intersection hypotheses without unions as in the second approach. The disadvantage is that an α adjustment is required in order to control the overall error rate α. The two families of hypotheses are tested separately for drug A and for drug B by step-down procedures at level α/2. Finally, Min tests can be applied to test the local combination hypotheses.

As in the preceding approach, both global hypotheses, the intersection of all H^A_ij and the intersection of all H^B_ij, as well as their union or intersection differ from H₀. This approach does not address the global question.

• Simultaneous Min tests and a modification

A common way to answer the local question while controlling the overall error rate is to use simultaneous Min tests. Each local combination hypothesis is tested simultaneously with the Min test of Laska and Meisner [2] at an adjusted level α* ≤ α. Several α adjustments are described in the literature (e.g. Bonferroni, Holm [7], Hochberg [5] or Hommel [8], [9]). Simultaneous Min tests also belong to the class of closed testing procedures. The α adjustment by Holm, for example, is a closed testing procedure using the Bonferroni inequality at each step.

Lehmacher [10] and Lehmacher, Wassmer, and Reitmeir [11] propose a modification which is a shortcut version of a closed testing procedure. Their approach is a two-step procedure in which both the global and the local combination hypotheses are tested. A combination drug is declared superior to its components if the global hypothesis can be rejected at level α and the corresponding local combination hypothesis can be rejected by the modified Bonferroni-Holm procedure with modified levels α/(IJ-1), α/(IJ-1), α/(IJ-2), …, α/2, α.
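The two-step procedure can be illustrated by the following Python sketch. It reads the sequence of modified levels literally, so the k-th smallest p-value is compared with α divided by min(IJ−1, IJ−k+1), and it assumes the usual step-down rule that testing stops at the first non-rejection. The function name `modified_holm` and the input format are hypothetical; the p-values are meant to be those of the local Min tests.

```python
def modified_holm(p_values, global_rejected, alpha=0.05):
    """Modified Bonferroni-Holm step after a global test.

    p_values: dict mapping a combination (i, j) to its Min-test p-value.
    global_rejected: outcome of the level-alpha global test of H0.
    Returns the combinations whose local hypotheses are rejected.
    """
    if not global_rejected:
        return []                      # nothing is rejected locally
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    rejected = []
    for k, (combo, p) in enumerate(ordered, start=1):
        # Levels alpha/(m-1), alpha/(m-1), alpha/(m-2), ..., alpha/2, alpha
        denom = max(1, min(m - 1, m - k + 1))
        if p > alpha / denom:
            break                      # stop at the first non-rejection
        rejected.append(combo)
    return rejected
```

Compared with the ordinary Holm procedure, only the first level changes (α/(IJ−1) instead of α/IJ), which reflects that the global hypothesis has already been rejected.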

• Special linear contrast tests

When testing superiority of combination drugs in a multilevel two-factorial design, not all pairwise comparisons of treatments are considered, but only the 2·IJ comparisons of combination drugs with their components. Therefore, in the closed testing procedures described above, partition hypotheses are tested in which two or more disjunctive hypotheses are intersected (e.g. H^A_ij ∩ H^B_kl). Usual test statistics such as F-tests do not apply. In order to control the experimentwise error rate, partition hypotheses can be tested by multiple tests with α adjustment. Another possibility is to use a special linear contrast test, which could be less conservative and more widely applicable.

Thus, each hypothesis in the closed test procedures is tested by a specific linear contrast statistic. For an intersection of union hypotheses, a transformation into a union of intersection hypotheses is required first. The test statistics of the partition hypotheses are constructed by averaging over the corresponding marginal hypotheses. That is, the contrast coefficients c_ij are obtained from the sum of the differences between the effects of the combinations and the effects of their single components. The test statistic is given by

$$T = \frac{\sum_{i=0}^{I}\sum_{j=0}^{J} c_{ij}\bar X_{ij}}{S\sqrt{\sum_{i=0}^{I}\sum_{j=0}^{J} c_{ij}^2/n_{ij}}},$$

where

$$\sum_{i=0}^{I}\sum_{j=0}^{J} c_{ij}\bar X_{ij} = \frac{1}{|\pi_1|+|\pi_2|}\left[\sum_{(i,j)\in\pi_1}\left(\bar X_{ij}-\bar X_{i0}\right) + \sum_{(i,j)\in\pi_2}\left(\bar X_{ij}-\bar X_{0j}\right)\right].$$

The index sets π₁ and π₂ ⊆ {(i, j) | i=1,…, I; j=1,…,J} contain the combinations that are compared with drug A alone and with drug B alone, respectively; |π₁| and |π₂| denote the number of elements in π₁ and π₂, respectively. S is the pooled estimator of σ over all treatment groups with c_ij ≠ 0.

In case of normally distributed data with homogeneous variances T is t-distributed with

$$\nu = \sum_{(i,j):\, c_{ij}\ne 0} \left(n_{ij} - 1\right)$$

degrees of freedom.
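A linear contrast test of this kind might be coded as in the following Python sketch. The contrast is built as the average of the differences between combinations and single components, the variance is pooled over the groups with non-zero coefficient, and a one-sided p-value is taken from the t distribution. The exact form of the coefficients is an assumption of this sketch, and the function name `contrast_test` and the data layout are hypothetical.

```python
import numpy as np
from scipy import stats

def contrast_test(data, pi1, pi2):
    """One-sided linear contrast test of an intersection hypothesis.

    data: dict mapping (i, j) to a 1-D array of observations;
          (i, 0) and (0, j) are the single-drug groups.
    pi1:  combinations compared with drug A alone, i.e. with (i, 0).
    pi2:  combinations compared with drug B alone, i.e. with (0, j).
    """
    c = {}
    for (i, j) in pi1:
        c[(i, j)] = c.get((i, j), 0.0) + 1.0
        c[(i, 0)] = c.get((i, 0), 0.0) - 1.0
    for (i, j) in pi2:
        c[(i, j)] = c.get((i, j), 0.0) + 1.0
        c[(0, j)] = c.get((0, j), 0.0) - 1.0
    scale = len(pi1) + len(pi2)            # average over all comparisons
    c = {g: v / scale for g, v in c.items() if v != 0.0}
    groups = {g: np.asarray(data[g], dtype=float) for g in c}
    # Pooled variance over the groups with non-zero coefficient
    df = sum(len(x) - 1 for x in groups.values())
    s2 = sum(((x - x.mean()) ** 2).sum() for x in groups.values()) / df
    estimate = sum(c[g] * groups[g].mean() for g in c)
    se = np.sqrt(s2 * sum(c[g] ** 2 / len(groups[g]) for g in c))
    t = estimate / se
    return float(t), df, float(stats.t.sf(t, df))
```

The scaling by the number of comparisons does not change T, because it cancels between numerator and denominator; it only makes the contrast an average gain.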

• Local MAX test

The MAX test of Hung, Chi, and Lipicky [3] is a global test. Here an extension to the local question, the local MAX test, is developed. The test statistic T_MAX of the global MAX test is based on the combination drug with the maximum observed minimum gain over its components (in the following called the "MAX-combination"). If the global hypothesis is rejected, i.e. T_MAX exceeds the level α critical value of the global MAX test (cf. Table 2), at least the MAX-combination is superior to its respective components. But there could be other combinations with this property too.

The idea of the local MAX test is to test all local combination hypotheses against this same critical value of the global MAX test. Thus, each combination drug whose local test statistic is greater than or equal to this critical value is declared superior to its components.


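The local MAX test is a step-up procedure (cf. [12]) based on the ordered local test statistics T₍₁₎ ≤ … ≤ T_(IJ) = T_MAX, the correspondingly ordered local combination hypotheses H₍₁₎, …, H_(IJ), and a fixed critical value C:

step 1: test the local combination hypothesis H₍₁₎ by the critical value C:
T₍₁₎ < C: go to step 2.
T₍₁₎ ≥ C: Stop! Reject all local combination hypotheses H₍₁₎, …, H_(IJ).

step 2: test the local combination hypothesis H₍₂₎ by the critical value C:
T₍₂₎ < C: go to step 3.
T₍₂₎ ≥ C: Stop! Retain H₍₁₎ and reject all H_(i), i = 2, ..., IJ.

step n (n = 3, ..., IJ): test the local combination hypothesis H_(n) by the critical value C:
T_(n) < C: go to step n + 1 (for n = IJ: stop and retain all hypotheses).
T_(n) ≥ C: Stop! Retain H₍₁₎, …, H_(n-1) and reject all H_(i), i = n, ..., IJ.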

The local MAX test controls the experimentwise error rate α when the critical value C is the level α critical value of the global MAX test. Consider any true hypothesis H_(i); then

$$\text{error rate} = P(\text{Reject } H_{(i)}) = 1 - P(\text{Retain } H_{(i)}) = 1 - P(T_{(1)} < C, \dots, T_{(i)} < C).$$

Under H₀ this error rate is clearly maximal, and from this it follows that α = 1 − P(T₍₁₎ < C, …, T_MAX < C) = P(T_MAX ≥ C); consequently C is the level α critical value of the global MAX test.
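The stepwise scheme reduces to a simple rule that a few lines of Python can illustrate: order the local statistics, find the first one that reaches C, and reject that hypothesis together with all hypotheses with larger statistics. The function name `local_max_test` and the input format are hypothetical, and the critical value must be supplied by the user (from Table 2 or [3]); it is not computed here.

```python
def local_max_test(t_local, critical_value):
    """Local MAX test with a fixed critical value C.

    t_local: dict mapping a combination (i, j) to its local test
             statistic (standardized observed minimum gain).
    Returns the combinations whose local hypotheses are rejected,
    in increasing order of their statistics.
    """
    # Ordered statistics T_(1) <= ... <= T_(IJ)
    ordered = sorted(t_local.items(), key=lambda kv: kv[1])
    for n, (combo, t) in enumerate(ordered):
        if t >= critical_value:
            # Stop: retain the first n hypotheses, reject the rest
            return [c for c, _ in ordered[n:]]
    return []                          # no statistic reaches C
```

For instance, with the purely illustrative value C = 2.5 and local statistics 0.5, 2.6, 1.0 and 3.1, the two combinations with statistics 2.6 and 3.1 are declared superior.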

Simulation studies of statistical power

All approaches described above control the experimentwise error rate α. The power of the following methods for the two posed questions is compared by simulation.

The global hypothesis H₀ is tested by:

• Average (AVE) and maximum (MAX) test,

• Linear contrast test for the transformed global hypothesis H₀ (GCo) and

• Multiple test of Simes [13] using Min tests (SIM).

The IJ local combination hypotheses are tested by:

• Simultaneous Min tests according to the Hochberg procedure (simMin),

• Simultaneous Min tests according to the modified Holm procedure after rejection of the global hypothesis (MinAVE, MinMAX, MinGCo, MinSIM),

• Closed test procedure of IJ local combination hypotheses (CTPloc),

• Closed test procedures of 2·IJ marginal hypotheses (CTPmar),

• Two simultaneous closed test procedures at level α/2 (TwoCTP) and

• Local MAX test (loMAX).

The simulations are based on normally distributed data with different means, homogeneous variances (σ = 1), a balanced design with n_ij = 30 and significance level α = 0.05. All approaches are very conservative. There is no procedure which is uniformly more powerful than the others. Nevertheless, depending on the kind of design and on a priori information, suggestions for practical applications can be formulated.
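A power simulation under these settings can be sketched for one of the listed methods, the global Simes test based on Min-test p-values (SIM). The following Python code is a minimal illustration and not the simulation program used for the results reported here: the function names, the number of replications, the seed and the pooling of the variance over all groups are assumptions of the sketch.

```python
import numpy as np
from scipy import stats

def simes_rejects(p_values, alpha=0.05):
    """Simes test: reject if p_(k) <= k * alpha / m for some k."""
    p = np.sort(np.asarray(p_values, dtype=float))
    m = len(p)
    return bool(np.any(p <= alpha * np.arange(1, m + 1) / m))

def min_test_p_values(sample, n):
    """sample: array of shape (I+1, J+1, n); returns IJ Min-test p-values."""
    means = sample.mean(axis=2)
    df = means.size * (n - 1)
    s = np.sqrt(((sample - means[..., None]) ** 2).sum() / df)
    gains = means[1:, 1:] - np.maximum(means[1:, [0]], means[[0], 1:])
    t = gains / (s * np.sqrt(2.0 / n))
    return stats.t.sf(t, df).ravel()

def simulate_power(mu, n=30, sigma=1.0, alpha=0.05, reps=500, seed=1):
    """Monte Carlo rejection rate of the global Simes test (SIM)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)     # (I+1) x (J+1) table of true means
    hits = 0
    for _ in range(reps):
        sample = rng.normal(mu[..., None], sigma, size=mu.shape + (n,))
        hits += simes_rejects(min_test_p_values(sample, n), alpha)
    return hits / reps
```

Calling `simulate_power([[0, 0, 0], [0, 1, 1]])` estimates the power in a (2x3)-design in which both combinations exceed their components by one standard deviation, while a table of equal means estimates the rejection rate under the global hypothesis.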

In this paper the simulation results are mainly described qualitatively. Some quantitative results for a (2x3)- as well as for a (3x3)-design (Table 3) are presented as examples. Detailed power analyses are given by Buchheister [14].

For the global question the results clearly show that the global contrast test (GCo) may substitute the global AVE test. GCo is the most powerful test in those situations where, so far, the global AVE test was more powerful than the global MAX test (see Figure 5).

A simultaneous comparison of power in different situations for the (2x3)-design is given in Figure 6. There the black areas describe situations where the global contrast test is more powerful than the global MAX test, and in the grey areas power(GCo) < power(MAX).

Table 3: (2x3) and (3x3)-design (examples)

Figure 5: Power of global tests in (3x3)-design of Table 3 when all combinations are similarly more effective than their components (δ₁₁=δ₁₂=δ₂₁=δ₂₂=δ)

The global contrast test (GCo) is recommended when all combination drugs are superior to their components, or when only a few are not and the others are more effective than their components to a similar degree. Otherwise the global MAX test of Hung, Chi, and Lipicky [3] is suggested. For very large multilevel two-factorial designs, when the tables of critical values for the global MAX test (cf. Table 2 or [3]) do not suffice, the multiple test of Simes (SIM) is recommended for practical reasons. The loss of power is negligible.

The results of the simulations concerning the local question are quite difficult to summarize. Comparing the three new closed testing procedures with each other shows that the closed testing procedure of IJ local combination hypotheses (CTPloc) is often the most powerful one. In large multilevel designs, however, this procedure is hardly usable in practice because of the many transformations of intersections of union hypotheses into unions of intersection hypotheses. In contrast, the practical application of the closed testing procedures CTPmar and TwoCTP is much easier.

Figure 6: Difference of power of global MAX test (MAX) and global contrast test (GCo) in (2x3)-design of Table 3

Therefore the closed testing procedure of 2·IJ marginal hypotheses is preferable, provided that there is no interest in the global hypothesis. The loss of power in relevant situations is negligible too. The approach of two simultaneous closed testing procedures at level α/2 is very conservative in small designs. However, it becomes interesting in larger designs because of its manageable systems of hypotheses.

Depending on the number of dose combinations which fulfil the property of superiority and the size of their effectiveness, different most powerful procedures can be specified:

1) If all or nearly all combinations are similarly more effective than their components, the three new closed testing procedures have the largest power. In this situation the closed testing procedure of the 2·IJ marginal hypotheses is suggested for small designs and the two simultaneous closed testing procedures at level α/2 are suggested for large designs.

2) If the effect sizes of the dose combinations differ considerably, but only a few combinations are not simultaneously more effective than all of their single components, simultaneous Min tests like simMin, MinMAX or MinSIM are more powerful. The power of these three procedures is similar.

3) Otherwise, if only few dose combinations fulfil the property of superiority the local maximum test has the largest power. Using the tabulated level α critical value (cf. Table 2 or [3]) it is easy to apply.

These suggestions are simplified and summarized in Table 4 and an example is given in Figure 7.

If no a priori information about the effects of the combinations is available, it will be difficult to assume that all dose combinations are similarly superior to their components, especially in larger multilevel two-factorial designs. Therefore, with regard to simplicity, robustness and power, the new global linear contrast test with subsequent α-adjusted simultaneous Min tests according to the modified Holm procedure (MinGCo) is a good compromise and thus the procedure of choice in this situation. The loss of power from using simultaneous Min tests in case of similarly strong effects is small.


Table 4: Suggestions for approaches depending on design and asked question

Figure 7: Power of local tests in (3x3)-design of Table 3 with (δ₁₁=0, δ₁₂=0.5δ, δ₂₁=0, δ₂₂=δ)

Discussion

When more than one dose combination is to be tested for being simultaneously more effective than all of its single components, a multiple testing problem arises. Different test procedures addressing the global or the local question can be used. Due to the complexity of the problem, an optimal procedure cannot be recommended in general (cf. [14]).

Retaining the global tests of Hung et al. [3], the most widely used so far, is not a bad choice. In addition, the local MAX test presented here is a quick and very easy extension of the global MAX test to the local question. Nevertheless, the closed testing procedures have some advantages. They place fewer restrictions on the data than the two global tests of Hung et al. [3], which require normally distributed data with homogeneous variances.

Each intersection hypothesis may be tested by a suitable level α test that takes the nature of the data into account. Because of the numerous partition hypotheses in the closed testing procedures for testing the property of combination superiority, it is recommended to use special contrast tests. In case of variance heterogeneity, Welch-type modifications of the tests can easily be applied. In case of binary data, Gauss tests can be used. Note also that, when closed testing procedures are applied, a placebo dose combination need not be included in the study. This can be important when there are ethical or medical problems with administering a placebo.


In view of the complexity and the multiplicity of the problem (cf. the application in [15]) a sequential design as presented by Lehmacher, Kieser, Hothorn [16] could be more advantageous, because during the conduct of the study drug combinations can be skipped.

This paper focuses on the multiple identification of desirable dose combinations. The proposed methods are multiple decision procedures; related simultaneous confidence intervals are not available. Another related topic is the identification of minimum effective doses; multiple testing procedures for this problem are proposed by Hellmich and Lehmacher [17].

Corresponding author:

• Walter Lehmacher, Institute for Medical Statistics, Informatics and Epidemiology, University of Cologne, 50924 Cologne, Germany

Walter.Lehmacher@Uni-Koeln.de

References:

[1] Laska EM, Meisner MJ. Testing whether an identified treatment is best: The combination problem. Proceedings of the Biopharmaceutical Section of the American Statistical Association. 1986;163-70.

[2] Laska EM, Meisner MJ. Testing whether an identified treatment is best. Biometrics. 1989;45:1139-51.

[3] Hung HMJ, Chi GYH, Lipicky RJ. Testing for the existence of a desirable dose combination. Biometrics. 1993;49:85-94.

[4] Hung HMJ. Testing for existence of desirable dose combination (Correspondence). Biometrics. 1994;50:307-8.

[5] Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika. 1988;75:800-2.

[6] Hung HMJ. Evaluation of a combination drug with multiple doses in unbalanced factorial design clinical trials. Statist Med. 2000;19:2079-87.

[7] Holm S. A simple sequentially rejective multiple test procedure. Scand J Statist. 1979;6:65-70.

[8] Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika. 1988;75:383-6.

[9] Hommel G. A comparison of two Bonferroni procedures. Biometrika. 1989;76:624-5.

[10] Lehmacher W. Verlaufskurven und Crossover. Heidelberg: Springer; 1987.

[11] Lehmacher W, Wassmer G, Reitmeir P. Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics. 1991;47:511-21.

[12] Tamhane AC, Hochberg Y, Dunnett CW. Multiple test procedures for dose finding. Biometrics. 1996;52:21-37.

[13] Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika. 1986;73:751-4.

[14] Buchheister B. Statistische Methoden zum Nachweis der Effektivität von Kombinationspräparaten [Dissertation]. Köln: Medizinische Fakultät der Universität zu Köln; 2001.

[15] Letzel H, Blümner E. Bivariate Dosis-Wirkungs-Beziehungen für ein Kombinationsantihypertensivum: Biometrische Erfahrungen mit einem komplexen Studienmodell. In: Baur MP et al.: Medizinische Informatik, Biometrie und Epidemiologie, 41. Jahrestagung der GMDS, Bonn. München: Urban und Vogel; 1997. p. 382-6.

[16] Lehmacher W, Kieser M, Hothorn L. Sequential and multiple testing for dose-response analysis. Drug Inf J. 2000;34:591-7.

[17] Hellmich M, Lehmacher W. Closure Procedures for Monotone Bi-Factorial Dose-Response Designs. Biometrics. 2005;61:270-7.
