
Sequential Analysis

Design Methods and Applications

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/lsqa20

High-confidence nonparametric fixed-width uncertainty intervals and applications to projected high-dimensional data and common mean estimation

Ansgar Steland & Yuan-Tsung Chang

To cite this article: Ansgar Steland & Yuan-Tsung Chang (2021) High-confidence nonparametric fixed-width uncertainty intervals and applications to projected high-dimensional data and common mean estimation, Sequential Analysis, 40:1, 97-124, DOI: 10.1080/07474946.2021.1847966

To link to this article: https://doi.org/10.1080/07474946.2021.1847966

© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC. Published online: 11 Mar 2021.



High-confidence nonparametric fixed-width uncertainty intervals and applications to projected high-dimensional data and common mean estimation

Ansgar Steland^a and Yuan-Tsung Chang^b

^a Institute of Statistics, RWTH Aachen University, Aachen, Germany; ^b Department of Social Information, Mejiro University, Tokyo, Japan

ABSTRACT

Nonparametric two-stage procedures to construct fixed-width confidence intervals are studied to quantify uncertainty. It is shown that the validity of the random central limit theorem (RCLT) accompanied by a consistent and asymptotically unbiased estimator of the asymptotic variance already guarantees consistency and first-order as well as second-order efficiency of the two-stage procedures. This holds under the common asymptotics where the length of the confidence interval tends toward 0 as well as under the novel proposed high-confidence asymptotics where the confidence level tends toward 1. The approach is motivated by and applicable to data analysis from distributed big data with nonnegligible costs of data queries. The following problems are discussed: fixed-width intervals for the mean, for a projection when observing high-dimensional data, and for the common mean when using nonlinear common mean estimators under order constraints. The procedures are investigated by simulations and illustrated by a real data analysis.

ARTICLE HISTORY: Received 16 October 2019; Revised 29 June 2020, 5 September 2020; Accepted 9 September 2020

KEYWORDS: Big data; data science; high-dimensional data; jackknife; sequential analysis; sequential sampling

SUBJECT CLASSIFICATIONS: 62L15; 60G40; 62F12; 62F15

1. INTRODUCTION

In this article, we study fully nonparametric two-stage procedures to construct a fixed-width interval for a parameter to quantify uncertainty. Both the common high-accuracy framework, where the asymptotics assumes that the width of the interval shrinks, and a novel high-confidence framework are studied. Under high-confidence asymptotics the required uncertainty in terms of the width of the interval is fixed and the asymptotics assumes that the confidence level increases. General sufficient conditions are derived that yield consistency and efficiency for both frameworks. We study three statistical problems: nonparametric fixed-width intervals for the mean of univariate data, which may be the most common setting; for the mean projection of high-dimensional data, to illustrate the application to big data; and for the common mean of two samples, as a classic statistical problem leading to a nonlinear estimator, which has not yet been

CONTACT: Ansgar Steland, steland@stochastik.rwth-aachen.de, Institute of Statistics, RWTH Aachen University, Wüllnerstr. 3, D-52065 Aachen, Germany.

Recommended by Bhargab Chattopadhyay.

© 2021 The Author(s). Published with license by Taylor & Francis Group, LLC

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

2021, VOL. 40, NO. 1, 97–124

https://doi.org/10.1080/07474946.2021.1847966


treated in the literature from a two-stage sampling perspective. The focus is on two-stage procedures because they provide a good compromise between the conflicting goals of a minimal sample size, which requires purely sequential sampling, and feasibility in applications in terms of required computing resources and logistic simplicity, which is better matched by one- or two-stage sampling procedures.

Two-stage sequential sampling is a well-established approach motivated by the aim to make statistical statements with minimal sample sizes without relying on purely sequential sampling. Instead, the data are sampled in two batches, a first-stage sample and a second-stage sample, if required. In the second stage, the final sample size is determined using the information contained in the first-stage (pilot) sample. The development of such procedures was mainly motivated by the need to base statistical inference on small samples in a world where large samples are not available. But this technique is also of interest in various areas, including emerging ones such as data science and big data, where massive amounts of variables are collected and need to be processed and analyzed: When analyzing big data distributed over many nodes of a network, each single query may be associated with a high response time and substantial data transmission costs, ruling out a purely sequential sampling strategy, because the benefit of fewer required observations on average is overcompensated by the high costs for each query. In contrast, the two-stage methods proposed in this article allow efficient estimation of the means of the variables and their projections with preassigned accuracy and confidence. The general construction of the sample size rules mainly follows the established approach. However, compared to the existing literature, we use a slightly modified first-stage sample size rule that takes into account prior knowledge and historical estimates, respectively, of the data uncertainty. Our studies indicate that even if we use only three data points to get a rough guess of variability, the resulting first-stage sample size is much closer to the actually required sample size, thus avoiding oversampling at this stage. In the context of distributed data, the proposed methods with this three-observations rule need at most three database queries.

This article contributes to the existing literature on two-stage procedures (see Stein 1945; Mukhopadhyay 1980; Ghosh, Mukhopadhyay, and Sen 1997; Mukhopadhyay and Silva 2009; Steland 2015, 2017; and references therein) by proposing a concrete nonparametric procedure with the following properties: For any estimator of the mean (or a parameter $\theta$) that satisfies the random central limit theorem (RCLT) and whose asymptotic variance can be estimated by a consistent and asymptotically unbiased estimator, the random sample size leading to the proposed fixed-width confidence interval is consistent and asymptotically unbiased for the optimal sample size. Further, the procedure yields the right asymptotic coverage probability and exhibits first-order as well as second-order efficiency.

Further, and more important, we go beyond the classic framework that establishes the above properties when the width of the confidence interval tends toward 0. We argue that this is to some extent counterintuitive in view of the posed problem to construct a fixed-width confidence interval. It also limits the approximate validity of the results to cases where one aims at high-accuracy estimation. But in many applications it is more appropriate to fix the width of the confidence interval and to assume that a larger number of observations is due to a higher confidence level. Therefore, we


propose a novel framework and study the construction of a fixed-width interval under high-confidence asymptotics (equivalently: low error probability asymptotics). This is also motivated by the fact that in many areas, such as high-quality, high-throughput production engineering or statistical genetics and brain research, where high-dimensional data are collected, large confidence levels and small significance levels, respectively, are in order and used in practice. For example, in production engineering the accuracy is fixed by the technical specifications and not by the statistician, and in genetics as well as in brain research small error probabilities are required to reach scientific relevance and to take multiple testing into account.

It is shown that the proposed two-stage procedure is valid under high-confidence asymptotics and exhibits first- and second-order efficiency properties, as long as the parameter estimator satisfies the random central limit theorem (RCLT) and a consistent and asymptotically unbiased estimator of the asymptotic variance is at our disposal.

Having in mind big data sets with a large number of variables, we then apply the general results to projections of high-dimensional data. It is assumed that the observations are given by a data stream of (possibly) increasing dimension, which is sampled in batches by our two-stage procedure. Two-stage procedures for high-dimensional data have been studied in depth in Aoshima and Yata (2011), assuming that the dimension $p$ tends toward $\infty$ and the sample size is either fixed or tends toward $\infty$ as well. Here we consider a projection of high-dimensional data where, when having sampled $n$ observations, the projection may depend on the sample size $n$. The asymptotic properties (consistency and efficiency) of the fixed-width confidence interval for the mean projection hold for high-accuracy asymptotics as well as high-confidence asymptotics. The dimension $p$ may be increasing with $n$ in an unconstrained way.

As an interesting and nontrivial classical application, we consider the problem of common mean estimation. Here one aims at estimating the mean from two samples assuming that they have the same mean but possibly different or ordered variances.

Many of the estimators proposed and studied in the literature are given by a convex combination of the sample means with convex weights depending on the sample means and the sample variances.

This article is organized as follows. Section 2 studies nonparametric two-stage fixed-width confidence intervals for the mean under both asymptotic frameworks, starting with the usual high-accuracy approach and then discussing the novel high-confidence asymptotics. Section 3 provides the results when dealing with a projection of high-dimensional data. Common mean estimation is treated in Section 4. Results from simulations and a data example are provided in Section 5.

2. NONPARAMETRIC TWO-STAGE FIXED-WIDTH CONFIDENCE INTERVAL

Let $Y_1, Y_2, \ldots$ be independent and identically distributed observations with common distribution function $F$, mean $\mu$, and finite variance $\sigma^2 \in (0, \infty)$. Further, let $\hat\mu_n$ be an estimator for $\mu$ using the first $n$ observations $Y_1, \ldots, Y_n$. We focus on the mean as the parameter of interest, but it is easy to see that all results remain true for any univariate parameter $\theta = \theta(F)$ and an estimator $\hat\theta_n$.


The classical approach to the construction of a confidence interval is based on a sample of fixed (but large) sample size $N$ and determines a random interval $[U_N, V_N]$ depending on the sample(s), such that its coverage probability equals the given confidence level $1-\alpha \in (0, 1)$ for each $N \ge 1$ or has asymptotic coverage $1-\alpha$, as $N$ tends toward $\infty$. As a consequence, the length $L = V_N - U_N$ of the interval, which represents the reported accuracy, is random.

There are, however, situations where we want to report an interval of a fixed, preassigned accuracy $d$, symmetric around the point estimator of $\mu$, so that $L = 2d$. Then the coverage probability of the resulting interval $[\hat\mu_N - d, \hat\mu_N + d]$ depends on the distribution of $\hat\mu_N$, and the sample size $N$ becomes the parameter we may select to achieve a certain confidence level. In mathematical terms, we wish to find some $N$ so that the interval around the estimator $\hat\mu_N$ based on a sample of size $N$ has coverage probability

$$P\big( [\hat\mu_N - d, \hat\mu_N + d] \ni \mu \big) = 1 - \alpha + o(1), \tag{2.1}$$

as the precision parameter $d$ tends toward 0. The $o(1)$ term is required as the CLT (respectively, RCLT) for $\hat\mu_N$ is used to construct a solution.

Usually, $d$ is small and $N$ increases when $d$ decreases. Thus, it is reasonable to consider asymptotic properties as $d$ tends toward 0. We shall, however, also consider the case of a fixed accuracy $d$, not necessarily small, and study asymptotic properties when the confidence level tends toward 1.

Suppose that the estimator $\hat\mu_n$ satisfies the CLT; that is,

$$\sqrt{n}\,(\hat\mu_n - \mu) \xrightarrow{d} N(0, \sigma^2_\mu), \tag{2.2}$$

as $n \to \infty$, for some positive constant $\sigma^2_\mu$, the asymptotic variance of our estimator for $\mu$. Throughout the article we shall assume that we have an estimator for $\sigma^2_\mu$ at our disposal, which we denote by $\hat\sigma^2_n = \hat\sigma^2_{\mu,n}$ if it is based on the first $n$ observations. The most common choice for $\hat\mu_n$ is, of course, the sample average $\bar Y_n = \frac{1}{n}\sum_{i=1}^n Y_i$, which satisfies (2.2) with $\sigma^2_\mu = \sigma^2$. The canonical estimator for $\sigma^2_\mu$ is $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar Y_n)^2$. When considering a parameter $\theta$ estimated by $\hat\theta_n$ such that the analogue of (2.2) holds, that is, $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, \sigma^2_\theta)$, one formally replaces $\sigma^2_\mu$ by $\sigma^2_\theta$ and needs an estimator $\hat\sigma^2_{\theta,n}$ having the properties required in Assumption (E) in the next section.

For simplicity of presentation and proofs, we stick to the case of the mean, however.

Invoking the CLT for $\hat\mu_N$, it is easy to see that the problem is solved by the asymptotically optimal sample size $N_{opt} = \lceil N^*_{opt} \rceil$, where

$$N^*_{opt} = \frac{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}{d^2}, \tag{2.3}$$

because the left-hand side of (2.1) is equal to

$$P\big( |\sqrt{N}\,(\hat\mu_N - \mu)| \le \sqrt{N}\,d \big) = 2\,\Phi\big( \sqrt{N}\,d / \sigma_\mu \big) - 1 + o(1).$$

Observe that $N^*_{opt} \to \infty$ if $d \downarrow 0$, which in turn justifies the application of the CLT.
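As a quick numerical illustration of formula (2.3) (ours, not from the article): with $\sigma_\mu = 1$, confidence level $1-\alpha = 0.95$, and half-width $d = 0.1$, the rule reproduces the familiar sample size of 385. A minimal sketch using only the Python standard library:

```python
from math import ceil
from statistics import NormalDist

def n_opt(sigma_mu: float, alpha: float, d: float) -> int:
    """Asymptotically optimal sample size ceil(sigma^2 z^2 / d^2), cf. Eq. (2.3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Phi^{-1}(1 - alpha/2)
    return ceil(sigma_mu**2 * z**2 / d**2)

print(n_opt(sigma_mu=1.0, alpha=0.05, d=0.1))  # -> 385, since z ~ 1.96 and z^2/d^2 ~ 384.15
```

Note how the sample size scales quadratically in both $\sigma_\mu/d$ and the quantile, which is the driver of both asymptotic regimes discussed below.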

If $\sigma^2_\mu$ were known, then $\lceil N^*_{opt} \rceil$ would solve the posed problem. The proposed two-stage procedure draws a random sample of size $N_0$ at the first stage, which is larger than or equal to a given minimal sample size $N_{min}$. The first-stage sample size $N_0$ will be larger if the required precision gets smaller. The first-stage sample is used to estimate the uncertainty of the estimator, and that (random) estimate is then used to specify the final sample size $\hat N_{opt}$ used in the second stage. Before discussing how one should specify $N_0$ and $\hat N_{opt}$, let us summarize the basic algorithm:

Preparations: Specify the minimal sample size $N_{min}$, the confidence level $1-\alpha$, and the precision $d$.

Stage I: Draw an initial sample of size $N_0 \ge N_{min}$ in order to estimate unknowns (in our case $\sigma^2_\mu$) based on that data, yielding an estimator $\hat N_{opt}$ for $N^*_{opt}$; that is, a random sample size.

Stage II: Draw additional $\hat N_{opt} - N_0$ observations to obtain a sample of size $\hat N_{opt}$. Estimate $\mu$ by $\hat\mu_{\hat N_{opt}}$.

Solution: Output the fixed-width confidence interval $\big[ \hat\mu_{\hat N_{opt}} - d, \, \hat\mu_{\hat N_{opt}} + d \big]$.
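The outline above can be sketched in a few lines of Python (our illustration, not code from the article; the callable `draw`, the use of the sample variance $S_n^2$ as variance estimator, and the Gaussian test stream are illustrative assumptions):

```python
import random
from statistics import NormalDist, mean, stdev, variance

def two_stage_interval(draw, alpha=0.05, d=0.1, n_min=10):
    """Two-stage fixed-width interval for the mean; `draw(n)` returns n fresh points."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Three-observations rule, cf. (2.4): rough scale guess from 3 extra points.
    sigma3 = stdev(draw(3))
    n0 = max(n_min, int(z * sigma3 / d) + 1)                   # first-stage size N_0
    stage1 = draw(n0)
    # Final sample size, cf. (2.5), with S_n^2 as the variance estimator.
    n_hat = max(n0, int(variance(stage1) * z**2 / d**2) + 1)
    sample = stage1 + draw(n_hat - n0)                         # Stage II observations
    m = mean(sample)
    return (m - d, m + d), n_hat

random.seed(1)
draw = lambda n: [random.gauss(0.0, 1.0) for _ in range(n)]
(lo, hi), n_hat = two_stage_interval(draw, alpha=0.05, d=0.2)
```

By construction the reported interval always has width exactly $2d$; only the sample size `n_hat` is random.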

2.1. Fixed-Width Interval under High-Accuracy Asymptotics

Let us first study the classical approach to fix the confidence level $1-\alpha$ and to assume that the accuracy is small, suggesting to investigate approximations for $d \downarrow 0$. This framework can be called high-accuracy asymptotics.

In the sequel, we review part of the literature that focused on normal data and the associated optimal estimators. We follow the arguments developed for the Gaussian case to motivate our fully nonparametric proposal, where $\hat\mu_n$ may be an arbitrary estimator satisfying the required regularity assumptions stated below in detail.

The original Stein procedure (see Stein 1945) addresses Gaussian i.i.d. observations and estimates $\mu$ by the sample mean, such that $\sigma^2_\mu = \sigma^2$ and a natural estimator for $\sigma^2_\mu$ based on $Y_1, \ldots, Y_n$ is $S_n^2$. Stein uses the rule

$$N = \max\left\{ N_0, \, \left\lfloor \frac{t(N_0-1)_{1-\alpha/2}^2\, S_{N_0}^2}{d^2} \right\rfloor + 1 \right\},$$

where $t(n)_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the $t$ distribution with $n$ degrees of freedom. Because $N_0$ is fixed, the procedure turns out to be inconsistent. To overcome this issue, Chow and Robbins (1965) proposed a purely sequential rule, namely,

$$N = \inf\big\{ n \ge N_0 : n \ge t(n-1)_{1-\alpha/2}^2\, S_n^2 / d^2 \big\}.$$

Mukhopadhyay (1980) noted that one gets for small $d$ the lower bound $N \ge \Phi^{-1}(1-\alpha/2)$ and proposed to increase the variance estimate slightly by $1/n$. Indeed, for small enough $d$ we have the lower bound $N \ge t(N_0-1)_{1-\alpha/2}/d$ if one replaces the estimate $S_n^2$ by $S_n^2 + 1/n$, because for $d \le 1$,

$$\frac{t(n-1)_{1-\alpha/2}^2\,(S_n^2 + 1/n)}{d^2} = \frac{t(n-1)_{1-\alpha/2}^2\, S_n^2}{d^2} + \frac{t(n-1)_{1-\alpha/2}^2}{n d^2} \ge \frac{t(n-1)_{1-\alpha/2}^2}{n d^2},$$

such that any $n \ge N_0$ with $n \ge t(n-1)_{1-\alpha/2}^2\,(S_n^2 + 1/n)/d^2$ satisfies $n \ge t(n-1)_{1-\alpha/2}^2/(n d^2)$ and hence $n^2 \ge t(n-1)_{1-\alpha/2}^2/d^2$. This leads to $n \ge t(N_0-1)_{1-\alpha/2}/d$ for any such $n$, such that we obtain the lower bound $N \ge t(N_0-1)_{1-\alpha/2}/d$. Therefore, the purely sequential rule

$$N' = \inf\big\{ n \ge N_0 : n \ge t(n-1)_{1-\alpha/2}^2\,(S_n^2 + 1/n)/d^2 \big\}$$

satisfies $N' \ge \max\{ N_0, \, t(N_0-1)_{1-\alpha/2}/d \}$.

The idea is now to use this lower bound, $\max\{ N_0, \, \lfloor t(N_0-1)_{1-\alpha/2}/d \rfloor + 1 \}$, as the first-stage sample size for Gaussian data; for nonnormal samples one replaces $t(N_0-1)_{1-\alpha/2}$ by the corresponding quantile, $\Phi^{-1}(1-\alpha/2)$, of the standard normal distribution. However, this rule does not take into account the scale of the data and can lead to unrealistically large sample sizes; see the data example in Section 5.

Mukhopadhyay (1980) has proposed the modified rule $\max\{ N_0, \, \lfloor (t(N_0-1)_{1-\alpha/2}/d)^{2/(1+c)} \rfloor + 1 \}$, for some $0 < c < 1$; $c$ can be selected to obtain a reasonable first-stage sample size; see the discussion and example in Mukhopadhyay and Silva (2009, p. 115).

But, indeed, Mukhopadhyay's argument also applies when using $S_{n_0}^2 + f^2/n_0$, for some $f > 0$, and then one gets the lower bounds $\max\{ N_0, \, t(N_0-1)_{1-\alpha/2}\, f/d \}$ and $\max\{ N_0, \, \Phi^{-1}(1-\alpha/2)\, f/d \}$, respectively. It is easy to check that all of the above arguments go through as well if we replace $S_n^2$ by any guess or pilot estimate using a (very) small sample.

Three-Observations Rule: Because frequently in applications it is possible to sample at least three observations, we propose to choose $f$ as an estimate $\hat\sigma_{\mu,3}$ of $\sigma_\mu$, using three additional observations, on which we condition in what follows. This leads to our proposal for the first-stage sample size, namely,

$$N_0 = \max\left\{ N_{min}, \, \left\lfloor \frac{\Phi^{-1}(1-\alpha/2)\,\hat\sigma_{\mu,3}}{d} \right\rfloor + 1 \right\}. \tag{2.4}$$

Note that $N_0$ depends on the preassigned precision $d$ and satisfies $N_0 \to \infty$, as $d \downarrow 0$. It is natural to estimate $N^*_{opt}$ by $\hat N^*_{opt} = \hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2/d^2$, and this leads to the final sample size of the procedure,

$$\hat N_{opt} = \max\left\{ N_0, \, \left\lfloor \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \right\rfloor + 1 \right\}, \tag{2.5}$$

which is a random variable (depending on the first-stage data), because $\hat\sigma^2_{N_0}$ estimates the asymptotic variance of the estimator $\hat\mu_{N_0}$ using the first-stage sample of size $N_0$. Observe that we continue to add a $*$ in notation to indicate quantities that may not be integer valued.

Let us briefly review the following facts and considerations leading to the notions of consistency and unbiasedness. Note that $N^*_{opt} \to \infty$ and $\hat N_{opt} \to \infty$, in probability, for any (arbitrary) weakly consistent estimator $\hat\sigma^2_{N_0}$ of the asymptotic variance $\sigma^2_\mu$ (based on the first-stage sample of size $N_0$), which slightly complicates their comparison. If we only know that $|\hat\sigma^2_{N_0} - \sigma^2_\mu| = o_P(1)$, which follows from ratio consistency $|\hat\sigma^2_{N_0}/\sigma^2_\mu - 1| = o_P(1)$, then the difference $\hat N_{opt} - N^*_{opt}$ is not guaranteed to be bounded, because

$$|\hat N^*_{opt} - N^*_{opt}| = |\hat\sigma^2_{N_0} - \sigma^2_\mu| \, \frac{\Phi^{-1}(1-\alpha/2)^2}{d^2},$$

where the first factor is $o_P(1)$ but the second one diverges, as $d \downarrow 0$. For this reason, $\hat N_{opt}$ is called consistent for the asymptotically optimal sample size $N^*_{opt}$ if the ratio approaches 1 in probability; that is, if


$$\frac{\hat N_{opt}}{N^*_{opt}} = 1 + o_P(1),$$

as $d \downarrow 0$. Having in mind that the second-stage (final) sample size $\hat N_{opt}$ is random whereas the unknown optimal value, $N^*_{opt}$, is nonrandom, the question arises as to whether $\hat N_{opt}$ is at least close to $N^*_{opt}$ on average. Therefore, to address this question and going beyond consistency, $\hat N_{opt}$ is called asymptotically first-order efficient in the sense of Chow and Robbins (1965) if

$$\frac{E(\hat N_{opt})}{N^*_{opt}} = 1 + o(1), \quad \text{as } d \downarrow 0.$$

Observe that the last property allows for the case that the difference $\hat N_{opt} - N^*_{opt}$ tends toward $\infty$ even in the mean, as $d$ tends toward 0. This typically indicates that the procedure is substantially oversampling the optimal sample size. A procedure for which the estimated optimal sample size remains on average in a bounded vicinity of the optimal truth is, of course, preferable. $\hat N_{opt}$ is called second-order asymptotically efficient if

$$E\big( \hat N_{opt} - N^*_{opt} \big) = O(1), \quad \text{as } d \downarrow 0.$$

The regularity assumptions we need to impose are as follows:

Assumption (E). The estimator $\hat\sigma^2_{N_0}$ is consistent and asymptotically unbiased for $\sigma^2_\mu$; that is,

$$\frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} = 1 + o_P(1), \qquad \frac{E(\hat\sigma^2_{N_0})}{\sigma^2_\mu} = 1 + o(1), \quad \text{as } d \downarrow 0.$$

This assumption is not restrictive and is satisfied by many estimators. For example, the jackknife variance estimator studied in Shao (1993), Shao and Wu (1989), and Steland and Chang (2019) provides an example satisfying Assumption (E).
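For the sample mean, the jackknife variance estimator mentioned above, rescaled by $n$ so that it targets the asymptotic variance $\sigma^2_\mu$, reduces exactly to the sample variance $S_n^2$. A small sketch verifying this identity numerically (our illustration, not code from the article; the scaling convention is our assumption):

```python
import numpy as np

def jackknife_asymptotic_variance(y):
    """n * jackknife variance of the sample mean; estimates sigma^2_mu.

    Leave-one-out means: mu_(i) = (sum(y) - y_i) / (n - 1).
    Jackknife variance: (n-1)/n * sum_i (mu_(i) - mean of the mu_(i))^2.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    loo = (y.sum() - y) / (n - 1)                      # leave-one-out means
    jack_var = (n - 1) / n * ((loo - loo.mean()) ** 2).sum()
    return n * jack_var

rng = np.random.default_rng(0)
y = rng.normal(size=50)
# For the mean, n * jackknife variance equals the sample variance S_n^2.
print(abs(jackknife_asymptotic_variance(y) - y.var(ddof=1)) < 1e-10)  # True
```

For nonlinear estimators such as the common mean estimators of Section 4, the same leave-one-out construction applies with $\hat\mu_{(i)}$ in place of the leave-one-out means, which is where the jackknife becomes genuinely useful.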

Further, we require the following strengthening of the CLT to hold.

Assumption (A). $\hat\mu_n$ satisfies the RCLT; that is, for any family $\{ N_a : a > 0 \}$ of stopping times for which $N_a/a \xrightarrow{P} k$, $0 < k < \infty$, it holds that

$$\sqrt{N_a}\,\frac{\hat\mu_{N_a} - \mu}{\sigma_\mu} \xrightarrow{d} N(0, 1), \quad \text{as } a \to \infty.$$

The validity of the RCLT is required because we have to employ a normal approximation with the first-stage sample size, which is random by construction. Clearly, however, for i.i.d. observations following an arbitrary distribution with finite second moment and $\hat\mu_n$ the arithmetic mean, Assumption (A) is well known (see, e.g., Ghosh, Mukhopadhyay, and Sen 1997, theorem 2.7.2).

The following theorem summarizes the main asymptotic first-order properties of the proposed two-stage approach to construct a fixed-width confidence interval.


Theorem 2.1. Suppose that Assumption (E) is satisfied. Then the following assertions hold true:

i. The estimated optimal sample size $\hat N_{opt}$ is consistent for $N^*_{opt}$; that is,

$$\frac{\hat N_{opt}}{N^*_{opt}} = 1 + o_P(1), \quad \text{as } d \downarrow 0.$$

ii. $\hat N_{opt}$ is asymptotically first-order efficient for $N^*_{opt}$; that is,

$$\frac{E(\hat N_{opt})}{N^*_{opt}} = 1 + o(1), \quad \text{as } d \downarrow 0.$$

If, in addition, Assumption (A) holds, then we have:

iii. The fixed-width confidence interval $I_{\hat N_{opt}} = [\hat\mu_{\hat N_{opt}} - d, \hat\mu_{\hat N_{opt}} + d]$ has asymptotic coverage $1-\alpha$; that is,

$$P\big( I_{\hat N_{opt}} \ni \mu \big) = 1 - \alpha + o(1), \quad \text{as } d \downarrow 0.$$

Remark 2.1. It is worth mentioning that the proof of Theorem 2.1 (i)–(ii) shows the following stronger properties:

(i) $\hat N_{opt}$ is consistent for $N^*_{opt}$ if and only if $\hat\sigma^2_{N_0}$ is consistent for $\sigma^2_\mu$.

(ii) $\hat N_{opt}$ is asymptotically unbiased for $N^*_{opt}$ if and only if $\hat\sigma^2_{N_0}$ is asymptotically unbiased for $\sigma^2_\mu$.

Let us now discuss the second-order properties of the fully nonparametric procedure.

In the literature, second-order efficiency for the problem at hand has so far been studied for parametric (Gaussian) models (see Mukhopadhyay and Duggan 1999), leading to a known distribution of $\hat N_{opt}$: a chi-square distribution induced by the fact that the sample variance follows a chi-square law, which converges to the normal law as $d \downarrow 0$. To achieve second-order efficiency, the probability $P(\hat N_{opt} = N_0)$ that the sample size is not increased in the second stage needs to decrease at least as fast as the inverse of the first-stage sample size $N_0$. In a parametric setting that probability can be handled and estimated by means of appropriate Taylor expansions using properties of the known distribution function.

In a fully nonparametric framework, the exact distribution is unknown to us, and estimating the probability under the limiting law is not sufficient, because we have to take into account the error of approximation. But due to the Berry–Esseen bound, the error is of the order $O(N_0^{-1/2})$. Therefore, the following result proceeds in a different way than the proofs for parametric settings and bounds the probability $P(\hat N_{opt} = N_0)$ using nonparametric techniques.

Theorem 2.2. Assume that Assumption (E) holds and $Y_1, Y_2, \ldots$ are i.i.d. with $E(Y_1^8) < \infty$. Then the two-stage procedure given by $\hat N_{opt}$ is second-order efficient; that is,

$$E(\hat N_{opt}) - N^*_{opt} = O(1), \quad \text{as } d \downarrow 0.$$

2.2. Fixed-Width Interval under High-Confidence Asymptotics

Theorem 2.1 establishes the validity of the proposed sampling strategy for small accuracy $d$; that is, in a high-accuracy framework: the (asymptotic) first-order properties hold if $d$ tends toward zero. To some extent, this is counterintuitive, because we aim at constructing a fixed-width confidence interval, and, for applications, we are then essentially limited to confidence statements when $d$ is small.

In some applications, however, $d$ may not be small (enough), but one aims to ensure the confidence statement that the interval covers the true parameter with high confidence. This suggests considering the case in which $d$ is fixed and $1-\alpha$ tends toward 1 (or, equivalently, $\alpha \to 0$). That type of asymptotics may be of particular importance in fields such as statistical genetics or brain research, where it is common to use very small significance levels $\alpha$.

Recalling formula (2.3) for the asymptotically optimal (unknown) sample size $N^*_{opt}$ and noticing that, for fixed $d$, $N^*_{opt} \to \infty$ holds if $1-\alpha/2 \to 1$, again justifying the application of the CLT, the question arises as to whether consistency and efficiency can be established under this different asymptotic regime.

To begin, let us notice that the notions of consistency, asymptotic first- and second-order efficiency, and asymptotic coverage under high-confidence asymptotics can be defined analogously as under high-accuracy asymptotics by replacing the limits $\lim_{d \downarrow 0}$ by $\lim_{1-\alpha \to 1}$.

The following theorem asserts that the proposed methodology is valid without any modification of $\hat N_{opt}$ under high-confidence asymptotics, although the proof differs.

Theorem 2.3.

i. Assume that (E) holds. Then $\hat N_{opt}$ is consistent for $N^*_{opt}$; that is,

$$\frac{\hat N_{opt}}{N^*_{opt}} = 1 + o_P(1), \quad \text{as } 1-\alpha \to 1,$$

and asymptotically first-order efficient; that is,

$$\frac{E(\hat N_{opt})}{N^*_{opt}} = 1 + o(1), \quad \text{as } 1-\alpha \to 1.$$

ii. If Assumptions (E) and (A) hold, then $I_{\hat N_{opt}}$ has asymptotic coverage $1-\alpha$; that is,

$$\lim_{1-\alpha \to 1} \Big| P\big( I_{\hat N_{opt}} \ni \mu \big) - (1-\alpha) \Big| = 0.$$
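To make the high-confidence regime concrete, here is a small numerical illustration (ours, not from the article): holding the half-width $d$ fixed, the sample size $N^*_{opt}$ of (2.3) grows without bound purely through the quantile $\Phi^{-1}(1-\alpha/2)$ as $\alpha \downarrow 0$.

```python
from math import ceil
from statistics import NormalDist

def n_opt_star(sigma_mu: float, alpha: float, d: float) -> float:
    """Real-valued N*_opt of Eq. (2.3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return sigma_mu**2 * z**2 / d**2

# Fixed half-width d = 0.5 and sigma_mu = 1: only the quantile drives growth.
sizes = [ceil(n_opt_star(1.0, a, 0.5)) for a in (0.05, 1e-3, 1e-6, 1e-9)]
print(sizes)  # strictly increasing; the first entry is ceil(4 * 1.96**2) = 16
```

In this regime the interval width stays at a practically meaningful $2d = 1$, and the price of more confidence is paid entirely in observations, which is the setting Theorem 2.3 addresses.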

Remark 2.2. The assertions of Theorem 2.3 also hold true if the first-stage sample size $N_0$ is defined as

$$N_0 = \max\left\{ N_{min}, \, \left\lfloor \left( \frac{f\,\Phi^{-1}(1-\alpha/2)}{d} \right)^{2/(1+c)} \right\rfloor \right\}$$

for some $0 < c < 1$ and an arbitrary given constant $f > 0$, as proposed in Mukhopadhyay (1980; with $f = 1$).

The question arises as to whether the procedure exhibits second-order efficiency under the high-confidence regime as well. The answer is positive.

Theorem 2.4. Assume that $Y_1, Y_2, \ldots$ are i.i.d. with $E(Y_1^8) < \infty$. Then the two-stage procedure given by $\hat N_{opt}$ is second-order efficient. Precisely, we have

$$\big| E(\hat N_{opt}) - N^*_{opt} \big| = O(1), \quad \text{as } 1-\alpha \to 1.$$

2.3. Proofs

Proof of Theorem 2.1. First, notice that, by definition of $\hat N_{opt}$,

$$\hat N_{opt} \ge \left\lfloor \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \right\rfloor + 1 \ge \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2}. \tag{2.6}$$

It is easy to see that $\max\{ \lfloor z \rfloor, a \} \le z + a$ for all nonnegative real $z$ and any positive constant $a$. Therefore, we have

$$\hat N_{opt} = \max\left\{ N_0, \, \left\lfloor \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \right\rfloor + 1 \right\} \le N_0 + \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + 1.$$

Plugging in the definition of $N_0$, we further obtain

$$\hat N_{opt} \le \max\left\{ \frac{\hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)}{d}, \, N_{min} \right\} + \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + 1 \le \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + \frac{\hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)}{d} + N_{min} + 1.$$

Combining the last estimate with (2.6), we arrive at

$$\frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \le \hat N_{opt} \le \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + \frac{\hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)}{d} + N_{min} + 1,$$

which implies, due to (2.3),

$$\frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} \le \frac{\hat N_{opt}}{N^*_{opt}} \le \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} + \frac{d\,\hat\sigma_{\mu,3}}{\Phi^{-1}(1-\alpha/2)\,\sigma^2_\mu} + \frac{(N_{min}+1)\,d^2}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}. \tag{2.7}$$


We are led to

$$\left| \frac{\hat N_{opt}}{N^*_{opt}} - 1 \right| \le \left| \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} - 1 \right| + \frac{d\,\hat\sigma_{\mu,3}}{\Phi^{-1}(1-\alpha/2)\,\sigma^2_\mu} + \frac{(N_{min}+1)\,d^2}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}. \tag{2.8}$$

Recalling that $A_n \le X_n \le B_n$ with $A_n, B_n = o_P(1)$ implies $X_n = o_P(1)$, because for any $\delta > 0$ we have $P(|X_n| > \delta) \le P(A_n \le -\delta) + P(B_n > \delta) = o(1)$, as $n \to \infty$, the first assertion follows: by (2.8), $\hat N_{opt}$ is consistent for $N^*_{opt}$ if $\hat\sigma^2_{N_0}$ is consistent for $\sigma^2_\mu$, as $d \downarrow 0$. (2.7) also immediately yields

$$\left| \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} - 1 \right| \le \left| \frac{\hat N_{opt}}{N^*_{opt}} - 1 \right|,$$

which shows that "only if" holds as well. Next, taking expectations in (2.7) implies the result on the asymptotic unbiasedness. It remains to show that the fixed-width confidence interval has asymptotic coverage $1-\alpha$. First, observe that

$$P\big( I_{\hat N_{opt}} \ni \mu \big) = P\left( -\frac{d\sqrt{\hat N_{opt}}}{\sigma_\mu} \le \sqrt{\hat N_{opt}}\,\frac{\hat\mu_{\hat N_{opt}} - \mu}{\sigma_\mu} \le \frac{d\sqrt{\hat N_{opt}}}{\sigma_\mu} \right) = P\left( -\frac{d\sqrt{N^*_{opt}}}{\sigma_\mu} \le \sqrt{\frac{N^*_{opt}}{\hat N_{opt}}}\,\sqrt{\hat N_{opt}}\,\frac{\hat\mu_{\hat N_{opt}} - \mu}{\sigma_\mu} \le \frac{d\sqrt{N^*_{opt}}}{\sigma_\mu} \right).$$

Define

$$H_d(x) = P\left( \sqrt{\frac{N^*_{opt}}{\hat N_{opt}}}\,\sqrt{\hat N_{opt}}\,\frac{\hat\mu_{\hat N_{opt}} - \mu}{\sigma_\mu} \le x \right), \quad x \in \mathbb{R}.$$

By Assumption (A) and Slutzky's lemma, we have

$$\sup_{x \in \mathbb{R}} |H_d(x) - \Phi(x)| \to 0,$$

as $d \downarrow 0$. Now, by the definition of $H_d$ and $N^*_{opt}$ (note that $d\sqrt{N^*_{opt}}/\sigma_\mu = \Phi^{-1}(1-\alpha/2)$) and the linearity of the function evaluation $f \mapsto f|_a^b = f(b) - f(a)$ (for fixed reals $a, b \in D$) for a function defined on $D \subseteq \mathbb{R}$,

$$P\big( I_{\hat N_{opt}} \ni \mu \big) = H_d\Big|_{-\Phi^{-1}(1-\alpha/2)}^{\Phi^{-1}(1-\alpha/2)} = (H_d - \Phi + \Phi)\Big|_{-\Phi^{-1}(1-\alpha/2)}^{\Phi^{-1}(1-\alpha/2)} = (H_d - \Phi)\Big|_{-\Phi^{-1}(1-\alpha/2)}^{\Phi^{-1}(1-\alpha/2)} + (1-\alpha).$$

Therefore, we obtain

$$\Big| P\big( I_{\hat N_{opt}} \ni \mu \big) - (1-\alpha) \Big| \le 2 \sup_{x \in \mathbb{R}} |H_d(x) - \Phi(x)| \to 0,$$

as $d \downarrow 0$, which completes the proof. ∎

Proof of Theorem 2.2. First observe that $N_0 \to \infty$ if and only if $d \to 0$, so that we can show the result for $N_0 \to \infty$. We use the refined basic inequality

$$\frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \le \hat N_{opt} \le N_0\,\mathbf{1}(\hat N_{opt} = N_0) + \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + 1,$$

which implies, in view of Assumption (E),

$$N^*_{opt} + o(1) \le E(\hat N_{opt}) \le N_0\,P(\hat N_{opt} = N_0) + N^*_{opt} + 1 + o(1).$$

This means

$$o(1) \le E(\hat N_{opt}) - N^*_{opt} \le N_0\,P(\hat N_{opt} = N_0) + 1 + o(1),$$

and the result follows if we show that

$$P(\hat N_{opt} = N_0) = O(N_0^{-1}).$$

We may assume that $N_0$ is large enough to ensure that $N_0 = \lfloor \hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)/d \rfloor + 1$. Because, by definition of $\hat N_{opt}$,

$$\hat N_{opt} = N_0 \iff \left\lfloor \frac{\hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)}{d} \right\rfloor + 1 \ge \left\lfloor \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \right\rfloor + 1,$$

and using the elementary estimates $\lfloor x \rfloor \ge x - 1$ and $\lfloor x \rfloor + 1 \le x + 1$, if $x \ge 0$, we obtain

$$P(\hat N_{opt} = N_0) \le P\left( \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \le \frac{\hat\sigma_{\mu,3}\,\Phi^{-1}(1-\alpha/2)}{d} + 1 \right) = P\left( \hat\sigma^2_{N_0} - \sigma^2 \le -\sigma^2 + \left( \frac{\hat\sigma_{\mu,3}}{\Phi^{-1}(1-\alpha/2)} + \frac{d}{\Phi^{-1}(1-\alpha/2)^2} \right) d \right).$$

Because $\left( \frac{\hat\sigma_{\mu,3}}{\Phi^{-1}(1-\alpha/2)} + \frac{d}{\Phi^{-1}(1-\alpha/2)^2} \right) d = o(1)$, as $d \downarrow 0$, we obtain

$$P(\hat N_{opt} = N_0) = P\big( \hat\sigma^2_{N_0} - \sigma^2 \le -\sigma^2 + o(1) \big) \le P\left( \sigma^2 - \hat\sigma^2_{N_0} \ge \frac{\sigma^2}{2} \right).$$

But if $E(Y_1^8) < \infty$, then

$$P\left( \sigma^2 - \hat\sigma^2_{N_0} \ge \frac{\sigma^2}{2} \right) \le \frac{E\big( |\hat\sigma^2_{N_0} - \sigma^2|^4 \big)}{(\sigma^2/2)^4} = O(N_0^{-2});$$

see, for example, lemmas A.1 and A.2 in Steland and Chang (2019). Hence, the assertion follows. ∎

Let us now prove the corresponding results under the high-confidence asymptotic framework.

Proof of Theorem 2.3. Repeating the purely algebraic calculations from above, we again obtain (2.8) for any $d$:

$$\left| \frac{\hat N_{opt}}{N^*_{opt}} - 1 \right| \le \left| \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} - 1 \right| + \frac{d\,\hat\sigma_{\mu,3}}{\Phi^{-1}(1-\alpha/2)\,\sigma^2_\mu} + \frac{(N_{min}+1)\,d^2}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}.$$

Clearly, the second and third terms are $o(1)$ if $d$ is fixed and $1-\alpha \to 1 \iff \alpha \downarrow 0$. Further, noting that $N_0 \to \infty$ if $d$ is fixed and $\alpha \downarrow 0$, the first term is $o_P(1)$ if $d$ is fixed and $\alpha \downarrow 0$. Taking expectations, similar arguments apply. Hence, (i) is shown. To establish (ii), first observe that if $N_a$, $a > 0$, is an arbitrary family of stopping times with $N_a/a \xrightarrow{P} k \in (0, \infty)$, then $N_a/(ak) \xrightarrow{P} 1$, as $a \to \infty$, and therefore, by Slutzky's lemma and the RCLT,

$$\sqrt{ak}\,\frac{\hat\mu_{N_a} - \mu}{\sigma_\mu} \xrightarrow{d} N(0, 1),$$

as $a \to \infty$. Now consider $\hat N_{opt}$ as a family of stopping times parameterized by $a = \Phi^{-1}(1-\alpha/2)^2$. Obviously, $a \to \infty \iff \alpha \downarrow 0$. Then, using (i) and recalling (2.3), $\hat N_{opt}/a \xrightarrow{P} k = \sigma^2_\mu/d^2$, as $a \to \infty$. Because $d$ is fixed, we have $0 < k < \infty$. Consequently,

$$T_{\hat N_{opt}} = \sqrt{\frac{\Phi^{-1}(1-\alpha/2)^2\,\sigma^2_\mu}{d^2}}\,\frac{\hat\mu_{\hat N_{opt}} - \mu}{\sigma_\mu} \xrightarrow{d} N(0, 1), \tag{2.9}$$

as $a \to \infty$. Because $T_{\hat N_{opt}} = \frac{\Phi^{-1}(1-\alpha/2)}{d}\,(\hat\mu_{\hat N_{opt}} - \mu)$, it follows that

$$P\big( I_{\hat N_{opt}} \ni \mu \big) = P\big( -d \le \hat\mu_{\hat N_{opt}} - \mu \le d \big) = P\Big( -\Phi^{-1}(1-\alpha/2) \le T_{\hat N_{opt}} \le \Phi^{-1}(1-\alpha/2) \Big) = H_a\Big|_{-\Phi^{-1}(1-\alpha/2)}^{\Phi^{-1}(1-\alpha/2)},$$

where $H_a$ denotes the distribution function of $T_{\hat N_{opt}}$. By (2.9) and Polya's theorem, we obtain

$$\lim_{a \to \infty} \sup_{x \in \mathbb{R}} |H_a(x) - \Phi(x)| = 0.$$

Now we can conclude that

$$\Big| P\big( I_{\hat N_{opt}} \ni \mu \big) - (1-\alpha) \Big| = \Big| (H_a - \Phi)\Big|_{-\Phi^{-1}(1-\alpha/2)}^{\Phi^{-1}(1-\alpha/2)} \Big| \le 2 \sup_{x} |H_a(x) - \Phi(x)| \to 0,$$

as $a \to \infty$, which completes the proof. ∎

Proof of Remark 2.2. The proof of the consistency needs a minor modification. Plugging in the new definition of $N_0$, we obtain

$$\hat N_{opt} \le \max\left\{ \left( \frac{f\,\Phi^{-1}(1-\alpha/2)}{d} \right)^{2/(1+c)}, \, N_{min} \right\} + \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + 1 \le \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + \left( \frac{f\,\Phi^{-1}(1-\alpha/2)}{d} \right)^{2/(1+c)} + N_{min} + 1.$$

Combining the last estimate with (2.6), we arrive at

$$\frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} \le \hat N_{opt} \le \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{d^2} + \left( \frac{f\,\Phi^{-1}(1-\alpha/2)}{d} \right)^{2/(1+c)} + N_{min} + 1,$$

which implies, due to (2.3),

$$\frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} \le \frac{\hat N_{opt}}{N^*_{opt}} \le \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} + \frac{f^{2/(1+c)}\,d^{2c/(1+c)}}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^{2c/(1+c)}} + \frac{(N_{min}+1)\,d^2}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}.$$

We are led to

$$\left| \frac{\hat N_{opt}}{N^*_{opt}} - 1 \right| \le \left| \frac{\hat\sigma^2_{N_0}}{\sigma^2_\mu} - 1 \right| + \frac{f^{2/(1+c)}\,d^{2c/(1+c)}}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^{2c/(1+c)}} + \frac{(N_{min}+1)\,d^2}{\sigma^2_\mu\,\Phi^{-1}(1-\alpha/2)^2}.$$

Because $c > 0$, the second term is $o(1)$ if $1-\alpha \to 1$. ∎

Proof of Theorem 2.4. Observing that $N_0 \to \infty$ if and only if $1-\alpha \to 1$, and that $O(N_0^{-1}) = O(\Phi^{-1}(1-\alpha/2)^{-1})$, the proof of Theorem 2.2 carries over and provides the bound

$$\big| E(\hat N_{opt}) - N^*_{opt} \big| \le N_0\,O(N_0^{-2}) + 1 = O(N_0^{-1}) + 1. \qquad ∎$$

3. APPLICATION TO A PROJECTION OF HIGH-DIMENSIONAL DATA

An interesting application is the construction of a fixed-width confidence interval for the mean projection of high-dimensional data. As an example, we may ask how long we need to observe the asset returns of the stocks in a portfolio given by a portfolio vector $w_n$ in order to set up a $(1-\alpha)$-confidence interval for the mean portfolio return $w_n'\mu$ with precision $\delta$. Such uncertainty quantification of a mean projection also arises when projecting a data vector to reduce dimensionality: calculating a projection $w_n'X_n$ of a $p_n$-dimensional data vector $X_n$ is a common approach to handling multivariate data when the dimension of the observed vectors is large. Widely used methods are principal component analysis, where one projects onto eigenvectors of the (estimated) covariance matrix of the data; sparse principal component analysis, which yields sparse projection vectors; and LASSO (least absolute shrinkage and selection operator) regression. For the latter approach, recall that a LASSO regression determines a sparse weighting vector, the regression coefficients, such that the associated projection of the regressors provides a good explanation of the response variable.

Of particular interest is the situation in which the dimension grows as the sample size increases, in order to mimic the case of large dimension while relying on asymptotics in which the sample size increases. Then the weighting vectors depend on the sample size. We show that the general assumptions established in the previous section apply under mild uniform integrability conditions on $w_n'X_n$, $n \ge 1$.

3.1. Procedure

Suppose that we are given a series of random vectors of increasing dimension, such that at time instant $n$ we have at our disposal $n$ i.i.d. random vectors
\[
X_{n1}, \ldots, X_{nn} \sim (\mu_n, \Sigma_n)
\]
of dimension $p = p_n$, where $p_n \to \infty$, as $n \to \infty$, is allowed. Here the notation $\xi_n \sim (\mu_n, \Sigma_n)$ means that the random vector $\xi_n$ follows a distribution with mean vector $\mu_n$ and covariance matrix $\Sigma_n$. We are interested in an asymptotic fixed-width confidence interval for the asymptotic projected mean
\[
\theta = \lim_{n\to\infty} \theta_n, \qquad \theta_n = w_n'\mu_n;
\]
that is, for a preassigned accuracy $\delta > 0$ and a given confidence level $1-\alpha$ we aim at finding a sample size $N$ such that the confidence interval $[T_N - \delta, T_N + \delta]$ has asymptotic confidence $1-\alpha$,
\[
P(T_N - \delta \le w_N'\mu_N \le T_N + \delta) = 1 - \alpha + o(1),
\]
as $\delta \downarrow 0$. Here, for any (generic) sample size $n$, $T_n$ is an estimator of $w_n'\mu_n$. For simplicity, we consider the unbiased estimator $T_n = w_n'\bar X_n$, where $\bar X_n = n^{-1}\sum_{i=1}^n X_{ni}$.

The proposed two-stage procedure is as in the previous section: In the first stage, draw
\[
N_0 = \max\Bigl\{ N_{\min},\, \Bigl\lfloor \frac{\Phi^{-1}(1-\alpha/2)\,\hat\sigma_{\mathrm{in}}}{\delta} \Bigr\rfloor + 1 \Bigr\} \qquad (3.1)
\]
observations, where $N_{\min}$ is a given minimal sample size and $\hat\sigma_{\mathrm{in}}$ is an initial estimate of the standard deviation of the projections using independent pilot data, such as the three-observations rule estimator. Next, we estimate the variance of the projections from the first-stage sample by $\hat\sigma^2_{N_0}$ and calculate the final sample size
\[
\hat N_{\mathrm{opt}} = \max\Bigl\{ N_0,\, \Bigl\lfloor \frac{\hat\sigma^2_{N_0}\,\Phi^{-1}(1-\alpha/2)^2}{\delta^2} \Bigr\rfloor + 1 \Bigr\}. \qquad (3.2)
\]
In practice, one considers a certain number of variables, so we assume that these variables are observable for the relevant sample sizes $N_0$ and $\hat N_{\mathrm{opt}}$, and that the projection vectors have nonzero entries only for those variables of interest. The mathematical framework allowing for an increasing dimension is used to justify the procedure when the number of variables (respectively, the dimension) is large compared to the sample sizes in use.
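As a minimal sketch of the two-stage rule (3.1)–(3.2) for projected data, consider the following Python code. The helper names, the data-generating closure `draw`, and the pilot-sample construction are assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from statistics import NormalDist

def two_stage_n(draw, w, delta, alpha, n_min=10):
    """Two-stage sample size for a fixed-width interval for w'mu.

    draw(n) must return an (n, p) array of fresh observations; the
    pilot SD comes from a projected sample of three observations.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sigma_pilot = (draw(3) @ w).std(ddof=1)             # three-observations rule
    n0 = max(n_min, int(z * sigma_pilot / delta) + 1)   # first stage, cf. (3.1)
    sigma2_hat = (draw(n0) @ w).var(ddof=0)             # projection variance, cf. (3.7)
    n_opt = max(n0, int(sigma2_hat * z**2 / delta**2) + 1)  # final size, cf. (3.2)
    return n0, n_opt

# toy example: i.i.d. N(mu, I_5) data projected onto a unit vector
rng = np.random.default_rng(1)
mu = np.ones(5)
w = np.ones(5) / np.sqrt(5.0)
draw = lambda n: rng.normal(mu, 1.0, size=(n, 5))
n0, n_opt = two_stage_n(draw, w, delta=0.2, alpha=0.05)
```

With $w'\Sigma w = 1$, $\delta = 0.2$, and $\alpha = 0.05$, the final size fluctuates around $\Phi^{-1}(0.975)^2/\delta^2 \approx 96$, depending on the first-stage variance estimate.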

Let us assume that
\[
\lim_{n\to\infty} w_n'\Sigma_n w_n = \sigma_w^2 > 0, \qquad |\theta - \theta_n| = o(n^{-1/2}), \qquad (3.3)
\]
and, for $r \in \{1, 2\}$,
\[
\sup_{n\ge 1} E\bigl(|w_n'X_{n1}|^r\, 1(|w_n'X_{n1}|^k > c)\bigr) \to 0, \quad c \to \infty, \quad k = 1, \ldots, r. \qquad (3.4)
\]
Assumption (3.3) is mild and rules out cases where the variance of the projection vanishes asymptotically. The uniform integrability required in (3.4) is a crucial technical condition to ensure that the weak law of large numbers and the CLT apply. In many cases the condition can be formulated in terms of the $X_n$: For that purpose, suppose additionally that the norms $W_n = \|w_n\|_2$ are bounded by a constant $W < \infty$. Then, using the simple fact that $|w_n'X_n| \le W\|X_n\|_2$, condition (3.4) holds if the $X_n$ satisfy
\[
\sup_{n\ge 1} E\bigl(\|X_{n1}\|_2^r\, 1(\|X_{n1}\|_2^k > c)\bigr) \to 0, \quad c \to \infty, \quad k = 1, \ldots, r. \qquad (3.5)
\]
A simple sufficient condition for (3.5) is the moment condition
\[
\sup_{n\ge 1} E\|X_n\|_2^{r+\delta} < \infty \qquad (3.6)
\]
for some $\delta > 0$.

We need to verify Assumptions (A) and (E). The verification of the RCLT is somewhat involved, because the projection statistic of interest is a weighted sum with weights depending on the sample size. Denote $T_n^* = \sqrt{n}\,(T_n - \theta)/\sigma_w$.

Theorem 3.1. Suppose that (3.3) and (3.4) hold. If $\tau_a$, $a > 0$, is a sequence of integer-valued random variables with $\tau_a/a \to \lambda$, in probability, for some $0 < \lambda < \infty$, then the statistic $T_n = w_n'\bar X_n$ satisfies the RCLT; that is,
\[
T_{\tau_a}^* \to_d N(0,1), \quad \text{as } a \to \infty.
\]
Define the estimator
\[
\hat\sigma_n^2 = \frac{1}{n} \sum_{i=1}^n \bigl( w_n'X_{ni} - w_n'\bar X_n \bigr)^2. \qquad (3.7)
\]
The following result verifies that Assumption (E) holds for this estimator under weak technical conditions.

Theorem 3.2. If (3.3) holds, then

(i) $\hat\sigma_n^2 - w_n'\Sigma_n w_n \to_P 0$ and $\hat\sigma_n^2 \to_P \sigma_w^2$, as $n \to \infty$;

(ii) $E(\hat\sigma_n^2) - w_n'\Sigma_n w_n \to 0$ and $E(\hat\sigma_n^2) \to \sigma_w^2$, as $n \to \infty$.

The above two theorems imply that the proposed two-stage procedure is asymptotically consistent as well as first-order and second-order efficient.
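Theorem 3.2(i) can be checked numerically. The sketch below uses an assumed setup of my own choosing, $\Sigma_n = I_{p_n}$ with unit-norm weights so that $w_n'\Sigma_n w_n = 1$, and lets the dimension grow with the sample size:

```python
import numpy as np

rng = np.random.default_rng(7)

def sigma2_hat(X, w):
    """Estimator (3.7): 1/n * sum_i (w'X_i - w'Xbar)^2, via the projected sample."""
    return (X @ w).var(ddof=0)

errs = []
for n in (100, 1000, 10000):
    p = int(np.sqrt(n))              # dimension p_n growing with n
    w = np.ones(p) / np.sqrt(p)      # ||w_n||_2 = 1, so the norms stay bounded
    X = rng.normal(size=(n, p))      # Sigma_n = I_p, hence w'Sigma_n w = 1
    errs.append(abs(sigma2_hat(X, w) - 1.0))
```

The absolute errors shrink at the usual $n^{-1/2}$ rate, illustrating the ratio consistency despite the increasing dimension.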

3.2. Proofs

3.2.1. Preliminaries on the RCLT. According to the general theoretical results of the previous section, it suffices to establish Assumption (A), the validity of the RCLT, and Assumption (E) for the statistic of interest; that is, the projected data. As a preparation, let us briefly review Anscombe's RCLT (see, e.g., Ghosh, Mukhopadhyay, and Sen 1997) and sufficient conditions. Consider a sequence $X_1, X_2, \ldots$ of i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$ and a family $\tau_a$, $a > 0$, of integer-valued random variables, often but not necessarily stopping times, with $\tau_a/a \to c$, as $a \to \infty$, for a finite constant $c$. The RCLT asserts that
\[
S_{\tau_a}^* = \frac{S_{\tau_a} - \tau_a \mu}{\sqrt{\tau_a}\,\sigma} \to_d N(0,1),
\]
as $a \to \infty$, where for $n \in \mathbb{N}$
\[
S_n = \sum_{i=1}^n X_i, \qquad S_n^* = \frac{S_n - n\mu}{\sqrt{n}\,\sigma}.
\]
This means that the sample size $n$ can be replaced by $\tau_a$. The basic idea why this holds is as follows (cf. Ghosh, Mukhopadhyay, and Sen 1997): The approximation $\tau_a \approx ac$ suggests that $S_{\tau_a}^*$ should have the same limiting distribution as $S_{n_a}^*$, where $n_a = \lfloor ac \rfloor$. The CLT gives $S_{n_a}^* \to N(0,1)$, as $a \to \infty$, in distribution. Now, for any $\varepsilon > 0$ and $\delta > 0$,
\[
P(|S_{\tau_a}^* - S_{n_a}^*| \ge \varepsilon) \le P\Bigl( \Bigl|\frac{\tau_a}{n_a} - 1\Bigr| > \delta \Bigr) + P\Bigl( \max_{|n - n_a| \le \delta n_a} |S_n^* - S_{n_a}^*| \ge \varepsilon \Bigr), \qquad (3.8)
\]
because on the event $|\tau_a/n_a - 1| \le \delta$ we have $|S_{\tau_a}^* - S_{n_a}^*| \le \max_{|n-n_a| \le \delta n_a} |S_n^* - S_{n_a}^*|$. The second term on the right-hand side of (3.8) can be made arbitrarily small if the sequence $\{S_n^* : n \ge 1\}$ is uniformly continuous in probability (u.c.i.p.). In general, a sequence $\{Y_n : n \ge 1\}$ is called u.c.i.p. if for any $\varepsilon > 0$ there exists some $\delta > 0$ such that, for large enough $n$,
\[
P\Bigl( \max_{0 \le k \le n\delta} |Y_{n+k} - Y_n| \ge \varepsilon \Bigr) < \varepsilon.
\]
The u.c.i.p. property of $S_n^*$ can be shown using Kolmogorov's maximal inequality, because $S_n$ is a sum of i.i.d. random variables.
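Anscombe's RCLT can be illustrated by a short simulation. The setup below is a toy example of my own choosing (exponential summands and Poisson random sample sizes, so that $\tau_a/a \to c = 1$ in probability); the standardized random-size sums are then approximately standard normal:

```python
import numpy as np

rng = np.random.default_rng(42)
a, c, reps = 2000, 1.0, 4000

vals = np.empty(reps)
for r in range(reps):
    tau = max(1, rng.poisson(a * c))           # tau_a with tau_a / a -> c
    x = rng.exponential(1.0, size=tau)         # i.i.d., mean 1, variance 1
    vals[r] = (x.sum() - tau * 1.0) / np.sqrt(tau * 1.0)   # S*_{tau_a}
# vals is approximately N(0, 1) for large a
```

Despite the random sample sizes, the empirical mean and standard deviation of `vals` are close to 0 and 1, as the RCLT predicts.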

3.2.2. Proofs of the theorems

Let us first show the sufficiency of (3.6) for (3.5). Observe that
\[
E\bigl(\|X_{n1}\|_2^r\, 1(\|X_{n1}\|_2^k > c)\bigr) = E\bigl( \|X_{n1}\|_2^{r+\delta}\, \|X_{n1}\|_2^{-\delta}\, 1(\|X_{n1}\|_2^{-\delta} < c^{-\delta/k}) \bigr) \le c^{-\delta/k}\, E\|X_{n1}\|_2^{r+\delta}.
\]
As a preparation for the following proofs, observe that we may choose $c$ large enough to ensure that
\[
E(w_n'X_{n1})^2 \le c^2 + \sup_{n\ge 1} E\bigl( (w_n'X_{n1})^2\, 1(|w_n'X_{n1}| > c) \bigr) \le 2c^2 \qquad (3.9)
\]
holds for all $n \ge 1$. Therefore, the sequence of second moments of $w_n'X_{n1}$ is bounded. This implies
\[
w_n'\Sigma_n w_n = \mathrm{Var}(w_n'X_{n1}) = O(1), \qquad (3.10)
\]
\[
E|w_n'X_{n1}| = O(1). \qquad (3.11)
\]
For technical reasons, we first show Theorem 3.2.

Proof of Theorem 3.2. Observe that the random variables $Z_{ni} = (w_n'X_{ni})^2$, $i = 1, \ldots, n$, are independent and nonnegative with mean $E(Z_{n1}) = w_n'M_n w_n$, where $M_n = E(X_{n1}X_{n1}')$. Hence, by the strong law of large numbers for arrays due to Gut (1992) under the uniform integrability condition (e) given there,
