Generalized Runs Tests for the IID Hypothesis

Generalized Runs Tests for the IID Hypothesis JIN SEO CHO School of Economics Yonsei University Email: [email protected] HALBERT …...

0 downloads 191 Views 451KB Size
Generalized Runs Tests for the IID Hypothesis∗ JIN SEO CHO

HALBERT WHITE

School of Economics

Department of Economics

Yonsei University

University of California, San Diego

Email: [email protected]

Email: [email protected]

First version: March, 2004. This version: July, 2010.

Abstract We provide a family of tests for the IID hypothesis based on generalized runs, powerful against unspecified alternatives, providing a useful complement to tests designed for specific alternatives, such as serial correlation, GARCH, or structural breaks. Our tests have appealing computational simplicity in that they do not require kernel density estimation, with the associated challenge of bandwidth selection. Simulations show levels close to nominal asymptotic levels. Our tests have power against both dependent and heterogeneous alternatives, as both theory and simulations demonstrate. Key Words IID condition, Runs test, Geometric Distribution, Gaussian process, Dependence, Structural Break. JEL Classification C12, C23, C80.

1

Introduction

The assumption that data are independent and identically distributed (IID) plays a central role in the analysis of economic data. In cross-section settings, the IID assumption holds under pure random sampling. As Heckman (2001) notes, violation of the IID property, therefore random sampling, can indicate the presence of sample selection bias. The IID assumption is also important in time-series settings, as processes driving time series of interest are often assumed to be IID. Moreover, transformations of certain time series can be shown to be IID under specific null hypotheses. For example Diebold, Gunther, and Tay (1998) show that ∗

Acknowledgements The authors are grateful to the Co-editor, Ronald Gallant, and two anonymous referees for their very

helpful comments. Also, they have benefited from discussions with Robert Davies, Ai Deng, Juan Carlos Escanciano, Chirok Han, Yongmiao Hong, Estate Khmaladze, Jin Lee, Tae-Hwy Lee, Leigh Roberts, Peter Robinson, Peter Thomson, David VereJones and participants at FEMES07, NZESG, NZSA, SRA, UC-Riverside, SEF and SMCS of Victoria University of Wellington, and Joint Economics Conference (Seoul National University, 2010).

1

to test density forecast optimality, one can test whether the series of probability integral transforms of the forecast errors are IID uniform (U[0,1]). There is a large number of tests designed to test the IID assumption against specific alternatives, such as structural breaks, serial correlation, or autoregressive conditional heteroskedasticity. Such special purpose tests may lack power in other directions, however, so it is useful to have available broader diagnostics that may alert researchers to otherwise unsuspected properties of their data. Thus, as a complement to special purpose tests, we consider tests for the IID hypothesis that are sensitive to general alternatives. Here we exploit runs statistics to obtain necessary and sufficient conditions for data to be IID. In particular, we show that if the underlying data are IID, then suitably defined runs are IID with the geometric distribution. By testing whether the runs have the requisite geometric distribution, we obtain a new family of tests, the generalized runs tests, suitable for testing the IID property. An appealing aspect of our tests is their computational convenience relative to other tests sensitive to general alternatives to IID. For example, Hong and White’s (2005) entropy-based IID tests require kernel density estimation, with its associated challenge of bandwidth selection. Our tests do not require kernel estimation and, as we show, have power against dependent alternatives. Our tests also have power against structural break alternatives, without exhibiting the non-monotonicities apparent in certain tests based on kernel estimators (Crainiceanu and Vogelsang, 2007; Deng and Perron, 2008). Runs have formed an effective means for understanding data properties since the early 1940’s. Wald and Wolfowitz (1940), Mood (1940), Dodd (1942), and Goodman (1958) first studied runs to test for randomness of data with a fixed percentile p used in defining the runs. Granger (1963) and Dufour (1981) propose using runs as a nonparametric diagnostic for serial correlation, noting that the choice of p is important for the power of the test. Fama (1965) extensively exploits a runs test to examine stylized facts of asset returns in US industries, with a particular focus on testing for serial correlation of asset returns. Heckman (2001) observes that runs tests can be exploited to detect sample selection bias in cross-sectional data; such biases can be understood to arise from a form of structural break in the underlying distributions. Earlier runs tests compared the mean or other moments of the runs to those of the geometric distribution for fixed p, say 0.5 (in which case the associated runs can be computed alternatively using the median instead of the mean). Here we develop runs tests based on the probability generating function (PGF) of the geometric distribution. Previously, Kocherlakota and Kocherlakota (KK, 1986) have used the PGF to devise tests for discrete random variables having a given distribution under the null hypothesis. Using fixed values of the PGF parameter s, KK develop tests for the Poisson, Pascal-Poisson, bivariate Poisson, or bivariate Neyman type A distributions. More recently, Rueda, P´erez-Abreu, and O’Reilly (1991) study PGF-based tests for the Poisson null hypothesis, constructing test statistics as functionals of stochastic

2

processes indexed by the PGF parameter s. Here we develop PGF-based tests for the geometric distribution with parameter p, applied to the runs for a sample of continuously distributed random variables. We construct our test statistics as functionals of stochastic processes indexed by both the runs percentile p and the PGF parameter s. By not restricting ourselves to fixed values for p and/or s, we create the opportunity to construct tests with superior power. Further, we obtain weak limits for our statistics in situations where the distribution of the raw data from which the runs are constructed may or may not be known and where there may or may not be estimated parameters. As pointed out by Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996), among others, goodness-of-fit (GOF) based statistics such as ours may have limiting distributions affected by parameter estimation. As we show, however, our test statistics have asymptotic null distributions that are not affected by parameter estimation under mild conditions. We also provide straightforward simulation methods to consistently estimate asymptotic critical values for our test statistics. We analyze the asymptotic local power of our tests, and we conduct Monte Carlo experiments to explore the properties of our tests in settings relevant for economic applications. In studying power, we give particular attention to dependent alternatives and to alternatives containing an unknown number of structural breaks. To analyze the asymptotic local power of our tests against dependent alternatives, we assume a first-order Markov process converging to an IID process in probability at the rate n−1/2 , where n is the sample size, and we find that our tests have nontrivial local power. We work with first-order Markov processes for conciseness. Our results generalize to higher-order Markov processes, but that analysis is sufficiently involved that we leave it for subsequent work. Our Monte Carlo experiments corroborate our theoretical results and also show that our tests exhibit useful finite sample behavior. For dependent alternatives, we compare our generalized runs tests to the entropy-based tests of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White (2005). Our tests perform respectably, showing good level behavior and useful, and in some cases superior, power against dependent alternatives. For structural break alternatives, we compare our generalized runs tests to Feller’s (1951) and Kuan and Hornik’s (1995) RR test, Brown, Durbin and Evans’s (1975) RE-CUSUM test, Sen’s (1980) and Ploberger, Kr¨amer and Kontrus’s (1989) RE test, Ploberger and Kr¨amer’s (1992) OLS-CUSUM test, Andrews’s (1993) Sup-W test, Andrews and Ploberger’s (1994) Exp-W and Avg-W tests, and Bai’s (1996) M-test. These prior tests are all designed to detect a finite number of structural breaks at unknown locations. We find good level behavior for our tests and superior power against multiple breaks. An innovation is that we consider alternatives where the number of breaks grows with sample size. Our new tests perform well against such structural break alternatives, whereas the prior tests do not. This paper is organized as follows. In Section 2, we introduce our new family of generalized runs

3

statistics and derive their asymptotic null distributions. These involve Gaussian stochastic processes. Section 3 provides methods for consistently estimating critical values for the test statistics of Section 2. This permits us to compute valid asymptotic critical values even when the associated Gaussian processes are transformed by continuous mappings designed to yield particular test statistics of interest. We achieve this using other easily simulated Gaussian processes whose distributions are identical to those of Section 2. Section 4 studies aspects of local power for our tests. Section 5 contains Monte Carlo simulations; this also illustrates use of the simulation methods developed for obtaining the asymptotic critical values in Section 2. Section 6 contains concluding remarks. All mathematical proofs are collected in the Appendix. Before proceeding, we introduce mathematical notation used throughout. We let 1{ · } stand for the indicator function such that 1{A} = 1 if the event A is true, and 0 otherwise. ⇒ and → denote ‘converge(s) d

weakly’ and ‘converge(s) to’, respectively, and = denotes equality in distribution. Further, k · k and k · k∞ denote the Euclidean and uniform metrics respectively. We let C(A) and D(A) be the spaces of continuous and cadlag mappings from a set A to R respectively, and we endow these spaces with Billingsley’s (1968, 1999) or Bickel and Wichura’s (1971) metric. We denote the unit interval as I := [0, 1].

2 2.1

Testing the IID Hypothesis Maintained Assumptions

We begin by collecting together assumptions maintained throughout and proceed with our discussion based on these. We first specify the data generating process (DGP) and a parameterized function whose behavior is of interest. A1 (DGP): Let (Ω, F, P) be a complete probability space. For m ∈ N, {Xt : Ω 7→ Rm , t = 1, 2, ...} is a stochastic process on (Ω, F, P). A2 (PARAMETERIZATION): For d ∈ N, let Θ be a non-empty convex compact subset of Rd . Let h : Rm × Θ 7→ R be a function such that (i) for each θ ∈ Θ, h(Xt ( · ), θ) is measurable; and (ii) for each ω ∈ Ω, h(Xt (ω), · ) is such that for each θ, θ 0 ∈ Θ, |h(Xt (ω), θ) − h(Xt (ω), θ 0 )| ≤ Mt (ω)kθ − θ 0 k, where Mt is measurable and is OP (1), uniformly in t. Assumption A2 specifies that Xt is transformed via h. The Lipschitz condition of A2(ii) is mild and typically holds in applications involving estimation. Our next assumption restricts attention to continuously distributed random variables. A3 (C ONTINUOUS R ANDOM VARIABLES): For given θ ∗ ∈ Θ, the random variables Yt := h(Xt , θ ∗ ) have continuous cumulative distribution functions (CDFs) Ft : R 7→ I, t = 1, 2, ....

4

Our main interest attaches to distinguishing the following hypotheses: H0 : {Yt : t = 1, 2, ...} is an IID sequence; vs. H1 : {Yt : t = 1, 2, ...} is not an IID sequence. Under H0 , Ft ≡ F (say), t = 1, 2, ... . We separately treat the cases in which F is known or unknown. In the latter case, we estimate F using the empirical distribution function. We also separately consider cases in which θ ∗ is known or unknown. In the latter case, we assume θ ∗ ˆ n . Formally, we impose is consistently estimated by θ ˆ n : Ω 7→ Θ} such that √n(θ ˆn − A4 (E STIMATOR): There exists a sequence of measurable functions {θ θ ∗ ) = OP (1). Thus, the sequence of transformed observations {Yt : t = 1, 2, ...} need not be observable. Instead, it will suffice that these can be estimated, as occurs when regression errors are of interest. In this case, h(Xt , θ ∗ ) can be regarded as a representation of regression errors X1t − E[X1t |X2t , ..., Xmt ], say. Estimated residuˆ n ). We pay particular attention to the effect of parameter estimation als then have the representation h(Xt , θ on the asymptotic null distribution of our test statistics.

2.2

Generalized Runs (GR) Tests

Our first result justifies popular uses of runs in the literature. For this, we provide a characterization of the runs distribution, new to the best of our knowledge, that can be exploited to yield a variety of runs-based tests consistent against departures from the IID null hypothesis. We begin by analyzing the case in which θ ∗ and F are known. We define runs in the following two steps: first, for each p ∈ I, we let Tn (p) := {t ∈ {1, ..., n} : F (Yt ) < p}, n = 1, 2, ... . This set contains those indices whose percentiles F (Yt ) are less than the given number p. That is, we first employ the probability integral transform of Rosenblatt (1952). Next, let Mn (p) denote the (random) number of elements of Tn (p), let tn,i (p) denote the ith smallest element of Tn (p), i = 1, ..., Mn (p), and define the p-runs Rn,i (p) as Rn,i (p) :=

 

tn,i (p),

i = 1;

 t (p) − t n,i n,i−1 (p), i = 2, ..., Mn (p). Thus, a p-run Rn,i (p) is a number of observations separating data values whose percentiles are less than the given value p. This is the conventional definition of runs found in the literature, except that F is assumed known for the moment. Thus, if the population median is known, then the conventional runs given by WaldWolfowitz (1940) are identical to ours with p = 0.5. The only difference is that we apply the probability integral transform; this enables us to later accommodate the influence of parameter estimation error on the 5

asymptotic distribution. In Section 2.3 we relax the assumption that F is known and examine how this affects the results obtained in this section. Note that Mn (p)/n = p + oP (1). Conventional runs are known to embody the IID hypothesis nonparametrically; this feature is exploited in the literature to test for the IID hypothesis. For example, the Wald-Wolfowitz (1940) runs test considers the standardized number of runs, whose distribution differs asymptotically from the standard normal if the data are not IID, giving the test its power. It is important to note that for a given n and p, n need not be an element of Tn (p). That is, there may be an ”incomplete” or ”censored” run at the end of the data that arises because F (Yn ) ≥ p. We omit this censored run from consideration to ensure that all the runs we analyze have an identical distribution. To see why this is important, consider the first run, Rn,1 (p), and, for the moment, suppose that we admit censored runs (i.e., we include the last run, even if F (Yn ) ≥ p). When a run is censored, we denote its length by k = ∅. When the original data {Yt } are IID, the marginal distribution of Rn,1 (p) is then   (1 − p)pk , if k ≤ n; P(Rn,1 (p) = k) = .  pn , if k = ∅, Thus, when censored runs are admitted, the unconditional distribution of Rn,1 (p) is a mixture distribution. The same is true for runs other than the first, but the mixture distributions differ due to the censoring. On the other hand, the uncensored run Rn,1 (p) is distributed as Gp , the geometric distribution with parameter p. The same is also true for uncensored runs other than the first. That is, {Rn,i (p), i = 1, 2, ..., Mn (p)} is the set of runs with identical distribution Gp , as every run indexed by i = 1, 2, . . . , Mn (p) is uncensored. (The censored run, when it exists, is indexed by i = Mn (p) + 1. When the first run is censored, Mn (p) = 0.) Moreover, as we show, the uncensored runs are independent when {Yt } is IID. Thus, in what follows, we consider only the uncensored runs, as formally defined above. Further, we construct and analyze our statistics in such a way that values of p for which Mn (p) = 0 have no adverse impact on our results. We now formally state our characterization result. For this, we let Kn,i stand as a shorthand notation PKn,i (p,p0 ) for Kn,i (p, p0 ), with p0 ≤ p, satisfying Kn,0 (p, p0 ) = 0, and j=K Rn,j (p) = Rn,i (p0 ). The 0 n,i−1 (p,p )+1 desired characterization is as follows. L EMMA 1: Suppose Assumptions A1, A2(i), and A3 hold. (a) Then for each n = 1, 2, ..., {Yt , t = 1, ..., n} is IID only if the following regularity conditions (R) hold: 1. for every p ∈ I such that Mn (p) > 0, {Rn,i (p), i = 1, ..., Mn (p)} is IID with distribution Gp , the geometric distribution with parameter p; and 2. for every p, p0 ∈ I with p0 ≤ p such that Mn (p0 ) > 0, (i) Rn,j (p) is independent of Rn,i (p0 ) if j ∈ / {Kn,i−1 + 1, Kn,i−1 + 2, ..., Kn,i }; 6

(ii) otherwise, for w = 1, ..., Mn (p0 ), m = 1, ..., w, and ` = m, ..., w, m+Kn,i−1

P(

X

Rn,j (p) = `, Rn,i (p0 ) = w|Kn,i−1 , Kn,i )

j=1+Kn,i−1

  ( `−1 )(1 − p)`−m (p − p0 )m (1 − p0 )w−(`+1) p0 , if ` = m, · · · , w − 1; m−1 =  ( `−1 )(1 − p)`−m (p − p0 )m−1 p0 , if ` = w, m−1 (b) If R holds, then Yt is identically distributed and pairwise independent. Conditions (1) and (2) of Lemma 1(a) enable us to detect violations of IID {Yt } in directions that differ from the conventional parametric approaches. Specifically, by Lemma 1, alternatives to IID {Yt } may manifest as p-runs with the following alternative (A) properties: A(i) : the p-runs have distribution Gq , q 6= p; A(ii) : the p-runs have non-geometric distribution; A(iii) : the p-runs have heterogeneous distributions; A(iv): the p-runs and p0 -runs have dependence between Rn,i (p) and Rn,j (p0 ) (i 6= j, p0 ≤ p); A(v) : any combination of (i) − (iv). Popularly assumed alternatives to IID data can be related to the alternatives in A. For example, stationary autoregressive processes yield runs with geometric distribution, but for a given p, {Rn,i (p)} has a geometric distribution different from Gp and may exhibit serial correlation. Thus stationary autoregressive processes exhibit A(i) or A(iv). Alternatively, if the original data are independent but heterogeneously distributed, then for some p, {Rn,i (p)} is non-geometric or has heterogeneous distributions. This case thus belongs to A(ii) or A(iii). To keep our analysis manageable, we focus on detecting A(i) − A(iii) by testing the p-runs for distribution Gp . That is, the hypotheses considered here are as follows: H00 : {Rn,i (p), i = 1, ..., Mn (p)} is IID with distribution Gp for each p ∈ I such that Mn (p) > 0; vs. H01 : {Rn,i (p), i = 1, ..., Mn (p)} manifests A(i), A(ii), or A(iii) for some p ∈ I such that Mn (p) > 0. Stated more primitively, the alternative DGPs aimed at here include serially correlated and/or heterogeneous alternatives. Alternatives that violate A(iv) without violating A(i) − A(iii) will generally not be detectable. Thus, our goal is different from the rank-based white noise test of Hallin, Ingenbleek, and Puri (1985) and the distribution-function based serial independence test of Delgado (1996). Certainly, it is of interest to devise statistics specifically directed at A(iv) in order to test H0 fully against the alternatives of H1 . Such statistics are not as simple to compute and require analysis different than those motivated by H01 ; moreover, the Monte Carlo simulations in Section 5 show that even with

7

attention restricted to H01 , we obtain well-behaved tests with power against both commonly assumed dependent and heterogeneous alternatives to IID. We thus leave consideration of tests designed specifically to detect A(iv) to other work. Lemma 1(b) is a partial converse of Lemma 1(a). It appears possible to extend this to a full converse (establishing {Yt } is IID) using results of Jogdeo (1968), but we leave this aside here for brevity. There are numerous ways to construct statistics for detecting A(i) − A(iii). For example, as for conventional runs statistics, we can compare the first two runs moments with those implied by the geometric distribution. Nevertheless, this approach may fail to detect differences in higher moments. To avoid difficulties of this sort, we exploit a GOF statistic based on the PGF to test the Gp hypothesis. For this, let −1 < s < 0 < s¯ < 1; for each s ∈ S := [s, s¯], define  Mn (p)  1 X sp Gn (p, s) := √ sRn,i (p) − , {1 − s(1 − p)} n

(1)

i=1

if p ∈ (pmin,n , 1), and Gn (p, s) := 0 otherwise, where pmin,n := min[F (Y1 ), F (Y2 ), . . . , F (Yn )]. This is a scaled difference between the p-runs sample PGF and the Gp PGF. Two types of GOF statistics are popular in the literature: those exploiting the empirical distribution function (e.g., Darling, 1955; Sukhatme, 1972; Durbin, 1973; and Henze, 1996) and those comparing empirical characteristic or moment generating functions (MGFs) with their sample estimates (e.g., Bierens, 1990; Brett and Pinkse, 1997; Stinchcombe and White 1998; Hong, 1999; and Pinkse, 1998). The statistic in (1) belongs to the latter type, as the PGF for discrete random variables plays the same role as the MGF, as noted by Karlin and Taylor (1975). The PGF is especially convenient because it is a rational polynomial in s, enabling us to easily handle the weak limit of the process Gn . Specifically, the rational polynomial structure permits us to represent this weak limit as an infinite sum of independent Gaussian processes, enabling us to straightforwardly estimate critical values by simulation, as examined in detail in Section 3. GOF tests using (1) are diagnostic, as are standard MGF-based GOF tests; thus, tests based on (1) do not tell us in which direction the null is violated. Also, like standard MGF-based GOF tests, they are not consistent against all departures from the IID hypothesis. Section 4 examines local alternatives to the null; we provide further discussion there. Our use of Gn builds on work of Kocherlakota and Kocherlakota (KK, 1986), who consider tests for a number of discrete null distributions, based on a comparison of sample and theoretical PGFs for a given finite set of s’s. To test their null distributions, KK recommend choosing s’s close to zero. Subsequently, Rueda, P´erez-Abreu, and O’Reilly (1991) examined the weak limit of an analog of Gn ( p, · ) to test the IID Poisson null hypothesis. Here we show that if {Rn,i (p)} is a sequence of IID p-runs distributed as Gp then Gn ( p, · ) obeys the functional central limit theorem; test statistics can be constructed accordingly.

8

Specifically, for each p, Gn ( p, · ) ⇒ G(p, · ), where G(p, · ) is a Gaussian process such that for each s, s0 ∈ S, E[G(p, s)] = 0, and   E G(p, s)G(p, s0 ) =

ss0 p2 (1 − s)(1 − s0 )(1 − p) . {1 − s(1 − p)}{1 − s0 (1 − p)}{1 − ss0 (1 − p)}

(2)

This mainly follows by showing that {Gn (p, · ) : n = 1, 2, ...} is tight (see Billingsley, 1999); the given covariance structure (2) is derived from E [Gn (p, s)Gn (p, s0 )] under the null. Let f : C(S) 7→ R be a continuous mapping. Then by the continuous mapping theorem, under the null any test statistic f [Gn (p, · )] obeys f [Gn (p, · )] ⇒ f [G(p, · )]. As Granger (1963) and Dufour (1981) emphasize, the power of runs tests may depend critically on the specific choice of p. For example, if the original data set is a sequence of independent normal variables with population mean zero and variance dependent upon index t, then selecting p = 0.5 yields no power, as the runs for p = 0.5 follow G0.5 despite the heterogeneity. Nevertheless, useful power can be delivered by selecting p different from 0.5. This also suggests that better powered runs tests may be obtained by considering numerous p’s at the same time. To fully exploit Gn , we consider Gn as a random function of both p and s, and not just Gn (p, · ) for given p. Specifically, under the null, a functional central limit theorem ensures that Gn ⇒ G

(3)

on J × S, where J := [p, 1] with p > 0, and G is a Gaussian process such that for each (p, s) and (p0 , s0 ) with p0 ≤ p, E [G(p, s)] = 0, and   E G(p, s)G(p0 , s0 ) =

ss0 p0 2 (1 − s)(1 − s0 )(1 − p){1 − s0 (1 − p)} . {1 − s(1 − p)}{1 − s0 (1 − p0 )}2 {1 − ss0 (1 − p)}

(4)

When p = p0 then the covariance structure is as in (2). Note also that the covariance structure in (4) is symmetric in both s and p, as we specify that p0 ≤ p. Without this latter restriction, the symmetry is easily seen, as the covariance then has the form ss0 min[p, p0 ]2 (1 − s)(1 − s0 )(1 − max[p, p0 ]){1 − s0 (1 − max[p, p0 ])} . {1 − s(1 − max[p, p0 ])}{1 − s0 (1 − min[p, p0 ])}2 {1 − ss0 (1 − max[p, p0 ])} To obtain (3) and (4), we exploit the joint probability distribution of runs associated with different percentiles p and p0 . Although our statistic Gn is not devised to test for dependence between Rn,j (p) and Rn,j (p0 ), verifying eq.(4) nevertheless makes particular use of the dependence structure implied by A(iv). This structure also makes it straightforward to devise statistics specifically directed at A(iv); we leave this aside here to maintain a focused presentation. For each s, Gn ( · , s) is cadlag, so the tightness of {Gn } must be proved differently from that of {Gn (p, · )}. Further, although G is continuous in p, it is not differentiable almost surely. This is because 9

Rn,i is a discrete random function of p. As n increases, the discreteness of Gn disappears, but its limit is not smooth enough to deliver differentiability in p. The weak convergence given in (3) is proved by applying the convergence criterion of Bickel and Wichura (1971, theorem 3). We verify this by showing that the modulus of continuity based on the fourthorder moment is uniformly bounded on J × S. By taking p > 0, we are not sacrificing much, as Mn (p) decreases as p tends to zero, so that Gn (p, · ) converges to zero uniformly on S. For practical purposes, we can thus let p be quite small. We examine the behavior of the relevant test statistics in our Monte Carlo experiments of Section 5 by examining what happens when p is zero. As before, the continuous mapping theorem ensures that, given a continuous mapping f : D(J × S) 7→ R, under the null the test statistic f [Gn ] obeys f [Gn ] ⇒ f [G]. Another approach uses the process Gn (· , s) on J. Under the null, we have Gn ( · , s) ⇒ G( · , s), where G( · , s) is a Gaussian process such that for each p and p0 in J with p0 ≤ p, E [G( · , s)] = 0, and   E G(p, s)G(p0 , s) =

s2 p0 2 (1 − s)2 (1 − p){1 − s(1 − p)} . {1 − s(1 − p)}{1 − s(1 − p0 )}2 {1 − s2 (1 − p)}

(5)

Given a continuous mapping f : D(J) 7→ R, under the null we have f [Gn ( · , s)] ⇒ f [G( · , s)]. We call tests based on f [Gn (p, · )], f [Gn (· , s)], or f [Gn ] generalized runs tests (GR tests) to emphasize their lack of dependence on specific values of p and/or s. We summarize our discussion as T HEOREM 1: Given conditions A1, A2(i), A3, and H0 , (i) for each p ∈ I, Gn ( p, · ) ⇒ G(p, · ), and if f : C(S) 7→ R is continuous, then f [Gn (p, · )] ⇒ f [G(p, · )]; (ii) for each s ∈ S, Gn ( · , s) ⇒ G( · , s), and if f : D(J) 7→ R is continuous, then f [Gn ( · , s)] ⇒ f [G( · , s)]; (iii) Gn ⇒ G, and if f : D(J × S) 7→ R is continuous, then f [Gn ] ⇒ f [G]. The proofs of Theorem 1(i, ii, and iii) are given in the Appendix. Although Theorem 1(i and ii) follow as corollaries of Theorem 1(iii), we prove Theorem 1(i and ii) first and use these properties as lemmas in proving Theorem 1(iii). Note that Theorem 1(i) holds even when p = 0, because Gn ( 0, · ) ≡ 0, and for every s, G(0, s) ∼ N (0, 0) = 0. We cannot allow p = 0 in Theorem 1(iii), however, because however large n is, there is always some p close to 0 for which the asymptotics break down. This necessitates our consideration of J instead of I in (iii). We remark that we do not specify f in order to allow researchers to form their own statistics based upon their particular interests. There are a number of popular mappings and justifications for these in the literature, especially those motivated by Bayesian interpretations. For example, Davies (1977) considers the mapping that selects the maximum of the random functions generated by nuisance parameters 10

present only under the alternative. The motivation for this is analogous to that for the Kolmogorov (K) goodness-of-fit statistic, namely, to test non-spurious peaks of the random functions. Bierens (1990) also proposes this choice for his consistent conditional moment statistic. Andrews and Ploberger (1994) study this mapping together with others, and propose a mapping that is optimal in a well defined sense. Alternatively, Bierens (1982) and Bierens and Ploberger (1997) consider integrating the associated random functions with respect to the nuisance parameters, similar to the Smirnov (S) statistic. This is motivated by the desire to test for a zero constant mean function of the associated random functions. Below, we examine K- and S-type mappings for our Monte Carlo simulations. A main motivation for this is that the goodness-of-fit aspects of the transformed data tested via the PGF have interpretations parallel to those for the mappings used in Kolmogorov’s and Smirnov’s goodness-of-fit statistics.

2.3

Empirical Generalized Runs (EGR) Tests

We now consider the case in which θ ∗ is known, but the null CDF of Yt is unknown. This is a common situation when interest attaches to the behavior of raw data. As the null CDF is unknown, Gn cannot be computed. Nevertheless, we can proceed by replacing the unknown F with a suitable estimator. The empirical distribution function is especially convenient here. Specifically, for each y ∈ R, we define P Fen (y) := n1 nt=1 1{Yt ≤y} . This estimation requires modifying our prior definition of p-runs as follows: fn (p) denote the (random) number of First, for each p ∈ I, let Ten (p) := {t ∈ N : Fen (Yt ) < p}, let M fn (p). (Note that elements of Ten (p), and let e tn,i (p) denote the ith smallest element of Ten (p), i = 1, ..., M fn (p)/nc = p.) We define the empirical p-runs as bM   e tn,i (p), e Rn,i (p) :=  e tn,i (p) − e tn,i−1 (p),

i = 1; fn (p). i = 2, ..., M

For each s ∈ S, define  Mn (p)  1 X sp en,i (p) R e Gn (p, s) := √ s − {1 − s(1 − p)} n f

(6)

i=1

e n (p, s) := 0 otherwise. if p ∈ ( n1 , 1), and G e n different from that for Gn . We The presence of Fen leads to an asymptotic null distribution for G now examine this in detail. For convenience, for each p ∈ I, let qen (p) := inf{x ∈ R : Fen (x) ≥ p}, let pen (p) := F (e qn (p)), and abbreviate pen (p) as pen . Then (6) can be decomposed into two pieces as fn (p) R en,i (p) e n = Wn + Hn , where for each (p, s), Wn (p, s) := n−1/2 PM G − se pn /{1 − s(1 − pen )}), and i=1 (s fn (p)(se Hn (p, s) := n−1/2 M pn /{1 − s(1 − pen )} − sp/{1 − s(1 − p)}). Our next result relates Wn to the random function Gn , revealing Hn to be the contribution of the CDF estimation error.

11

L EMMA 2: Given conditions A1, A2(i), A3, and H0 , (i) sup(p,s) ∈ J×S |Wn (p, s) − Gn (p, s)| = oP (1); (ii) Hn ⇒ H, where H is a Gaussian process on J × S such that for each (p, s) and (p0 , s0 ) with p0 ≤ p, E[H(p, s)] = 0, and   E H(p, s)H(p0 , s0 ) =

ss0 pp0 2 (1 − s)(1 − s0 )(1 − p) . {1 − s(1 − p)}2 {1 − s0 (1 − p0 )}2

(7)

(iii) (Wn , Hn ) ⇒ (G, H), and for each (p, s) and (p0 , s0 ), E[G(p, s)H(p0 , s0 )] = −E[H(p, s)H(p0 , s0 )]. Lemma 2 relates the results of Theorem 1 to the unknown distribution function case. As Wn is asymptotically equivalent to Gn (as defined in the known F case), Hn must be the additional component incurred by estimating the empirical distribution function. e n , we let Ge be a Gaussian process on J × S such To state our result for the asymptotic distribution of G e s)] = 0, and that for each (p, s) and (p0 , s0 ) with p0 ≤ p, E[G(p, e s)G(p e 0 , s0 )] = E[G(p,

ss0 p0 2 (1 − s)2 (1 − s0 )2 (1 − p)2 . {1 − s(1 − p)}2 {1 − s0 (1 − p0 )}2 {1 − ss0 (1 − p)}

(8)

The analog of Theorem 1 can now be given as follows. T HEOREM 2: Given conditions A1, A2(i), A3, and H0 , e n (p, · ) ⇒ G(p, e · ), and if f : C(S) 7→ R is continuous, then f [G e n (p, · )] ⇒ (i) for each p ∈ I, G e · )]; f [G(p, e n ( · , s) ⇒ G( e · , s), and if f : D(J) 7→ R is continuous, then f [G e n ( · , s)] ⇒ (ii) for each s ∈ S, G e · , s)]; f [G( e n ⇒ G, e and if f: D(J × S) 7→ R is continuous, then f [G e n ] ⇒ f [G]. e (iii) G e n (p, · )], f [G e n ( · , s)], or f [G e n ] empirical generalized runs tests (EGR tests) to We call tests based on f [G highlight their use of the empirical distribution function. We emphasize that the distributions of the GR and EGR tests differ, as the CDF estimation error survives in the limit, a consequence of the presence of the component Hn .

2.4

EGR Tests with Nuisance Parameter Estimation

Now we consider the consequences of estimating θ ∗ by θˆn satisfying A4. As noted by Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996), estimation can affect the asymptotic null distribution of GOF-based test statistics. Nevertheless, as we now show in detail, this turns out not to be the case here. We elaborate our notation to handle parameter estimation. Let Yˆn,t := h(Xt , θˆn ) and let Fˆn (y) := 1 Pn ˆ ˆ ˆ t 1{Yˆn,t ≤y} , so that Fn is the empirical CDF of Yn,t . Note that we replace θ ∗ with its estimate θ n to n

12

accommodate the fact that θ ∗ is unknown in this case. Thus, Fˆn contains two sorts of estimation errors: that arising from the empirical distribution and the estimation error for θ ∗ . Next, we define the associated runs using the estimates Yˆn,t and Fˆn . For each p in I, we now let ˆ n (p) denote the (random) number of elements of Tˆn (p), and let Tˆn (p) := {t ∈ N : Fˆn (Yˆn,t ) < p}, let M ˆ n (p). (Note that bM ˆ n (p)/nc = p.) We define tˆn,i (p) denote the ith smallest element of Tˆn (p), i = 1, ..., M the parametric empirical p-runs as ˆ n,i (p) := R

 

tˆn,i (p),

 tˆ (p) − tˆ n,i n,i−1 (p),

i = 1; ˆ n (p). i = 2, ..., M

ˆ n (p, s) := n−1/2 PMˆ n (p) (sRˆ n,i (p) − sp/{1 − s(1 − p)}) if p ∈ ( 1 , 1), and For each s ∈ S, define G i=1 n ˆ n (p, s) := 0 otherwise. Note that these definitions are parallel to those previously given. The only G difference is that we are using {Yˆn,t : t = 1, 2, ..., n} instead of {Yt : t = 1, 2, ..., n}. ˆ n as G ˆn = G ¨n + H ¨ n, To see why estimating θ ∗ has no asymptotic impact, we begin by decomposing G ¨ n (p, s) := where, letting qen (p) := inf{y ∈ R : Fen (y) ≥ p} and pen := F (e qn (p)) as above, we define G P ˆ n (p) R ˆ M ¨ n (p, s) := n−1/2 M ˆ n (p)(se (s n,i (p) − se pn /{1 − s(1 − pen )}), and H pn /{1 − s(1 − pen )} − n−1/2 j=1 en = sp/{1 − s(1 − p)}). Note that this decomposition is also parallel to the previous decomposition, G Wn + Hn . Our next result extends Lemma 2. L EMMA 3: Given conditions A1− A4 and H0 , ¨ n (p, s) − Gn (p, s)| = oP (1); (i) sup(p,s) ∈ J×S |G ¨ n (p, s) − Hn (p, s)| = oP (1). (ii) sup(p,s) ∈ I×S |H e n + oP (1), so the asymptotic ˆ n = Wn + Hn + oP (1) = G Given Lemma 2(i), it becomes evident that G ˆ n coincides with that of G e n , implying that the asymptotic runs distribution is primarily distribution of G determined by the estimation error associated with the empirical distribution Fˆn and not by the estimation of θ ∗ . The intuition behind this result is straightforward. As Darling (1955), Sukhatme (1972), Durbin (1973), and Henze (1996) note, the asymptotic distribution of an empirical process, say p 7→ Zˆn (p) := n1/2 {F (ˆ qn (p)) − p}, p ∈ I, where qˆn (p) := inf{y ∈ R : Fˆn (y) ≥ p}, is affected by parameter estimation error primarily because the empirical process Zˆn is constructed using the Yˆn,t := h(Xt , θˆn ) and the ˆ n , the parameter estimation error embodied in differentiable function F . Because h contains not θ ∗ but θ ˆ n is transmitted to the asymptotic distribution of Zˆn through qˆn and F. Thus, if we were to define runs as θ T¨n (p) := {t ∈ N : F (Yˆn,t ) < p}, then their asymptotic distribution would be affected by the parameter esˆ n,i } are constructed using Tˆn (p) := {t ∈ N : Fˆn (Yˆn,t ) < p}, timation error. Instead, however, our runs {R ˆ n is less important in this case, whereas the which replaces F with Fˆn , a step function. Variation in θ

13

estimation of F plays the primary role in determining the asymptotic runs distribution. This also implies ˆ n is estimated and F is known, it may be computationally convenient to construct the runs that when θ using Fˆn instead of F . The analog of Theorems 1 and 2 is: T HEOREM 3: Given conditions A1− A4 and H0 , ˆ n (p, · ) ⇒ G(p, e · ), and if f : C(S) 7→ R is continuous, then f [G ˆ n (p, · )] ⇒ (i) for each p ∈ I, G e · )]; f [G(p, ˆ n ( · , s) ⇒ G( e · , s), and if f : D(J) 7→ R is continuous, then f [G ˆ n ( · , s)] ⇒ (ii) for each s ∈ S, G e · , s)]; f [G( ˆ n ⇒ G, e and if f: D(J × S) 7→ R is continuous, then f [G ˆ n ] ⇒ f [G]. e (iii) G ˆ n ( p, · )], f [G ˆ n (·, s)], or f [G ˆ n ] parametric empirical generalized runs tests We call tests based on f [G (PEGR tests) to highlight their use of estimated parameters. By Theorem 3, the asymptotic null distribution ˆ n ] is identical to that of f [G e n ], which takes θ ∗ as known. We remark that I appears in Lemma of f [G ¨ n and Hn only involve the empirical distribution and not the 3(ii) and Theorem 3(i) rather than J, as H distribution of runs. This is parallel to results of Chen and Fan (2006) and Chan, Chen, Chen, Fan, and Peng (2009). They study semiparametric copula-based multivariate dynamic models and show that their pseudo-likelihood ratio statistic has an asymptotic distribution that depends on estimating the empirical ¨ n in Lemma 3 reflects the distribution but not other nuisance parameters. The asymptotically surviving H asymptotic influence of estimating the empirical distribution, whereas estimating the nuisance parameters has no asymptotic impact, as seen in Theorem 3.

3

Simulating Asymptotic Critical Values

Obtaining critical values for test statistics constructed as functions of Gaussian processes can be challenging. Nevertheless, the rational polynomial structure of our statistics permits us to construct representations of G and Ge as infinite sums of independent Gaussian random functions. Straightforward simulations then deliver the desired critical values. Given that Theorems 1, 2, and 3 do not specify the continuous mapping f , it is of interest to have methods yielding the asymptotic distributions of G and Ge rather than f [G] and e for a particular mapping f , as the latter distributions are easily obtained from the methods provided f [G] here once f is specified. e we use the Karhunen-Lo`eve (K-L) representation (Lo`eve, 1978, ch.11) of a To represent G and G, stochastic process. This represents Brownian motion as an infinite sum of sine functions multiplied by independent Gaussian random coefficients. Grenander (1981) describes this representation as a complete

14

orthogonal system (CONS) and provides many examples. For example, Krivyakov, Matynov, and Tyurin (1977) obtain the asymptotic critical values of von Mises’s ω 2 statistic in the multi-dimensional case by applying this method. In econometrics, Phillips (1998) has used the K-L representation to obtain asymptotic critical values for testing cointegration. Andrews’s (2001) analysis of test statistics for a GARCH(1,1) model with nuisance parameter not identified under the null also exploits a CONS representation. By theorem 2 of Jain and Kallianpur (1970), Gaussian processes with almost surely continuous paths have a CONS representation and can be approximated uniformly. We apply this result to our GR and (P)EGR test statistics; this straightforwardly delivers reliable asymptotic critical values.

3.1

Generalized Runs Tests

A fundamental property of Gaussian processes is that two Gaussian processes have identical distributions if their covariance structures are the same. We use this fact to represent G(p, · ), G( · , s), and G as infinite sums of independent Gaussian processes that can be straightforwardly simulated. To obtain critical values for GR tests, we can use the Gaussian process Z ∗ defined by Z ∗ (p, s) :=

∞ X  sp(1 − s)B00 (p) (1 − s)2 + sj Bjs p2 , (1 − p)1+j , 2 2 {1 − s(1 − p)} {1 − s(1 − p)}

(9)

j=1

where B00 is a Brownian bridge, and {Bjs : j = 1, 2, ...} is a sequence of independent Brownian sheets, whose covariance structure is given by E[Bjs (p, q)Bis (p0 , q 0 )] = 1{i=j} min[p, p0 ] · min[q, q 0 ]. The arguments of Bjs lie only in the unit interval, and it is readily verified that E[Z ∗ (p, s)Z ∗ (p0 , s0 )] is identical to (4), so Z has the same distribution as G. An inconvenient computational aspect of Z ∗ is that the terms Bjs require evaluation on a two dimensional square, which is computationally demanding. More convenient in this regard is the Gaussian process Z defined by   ∞ X sp(1 − s)B00 (p) (1 − s)2 p2 j 1+j Z(p, s) := + s (1 − p) Bj , {1 − s(1 − p)}2 {1 − s(1 − p)}2 (1 − p)1+j

(10)

j=1

where {Bj : j = 1, 2, ...} is a sequence of independent standard Brownian motions independent of the Brownian bridge B00 . It is straightforward to compute E[Z(p, s)Z(p0 , s0 )]. Specifically, if p0 ≤ p then E[Z(p, s)Z(p0 , s0 )] =

ss0 p0 2 (1 − s)(1 − s0 )(1 − p){1 − s0 (1 − p)} . {1 − s(1 − p)}{1 − s0 (1 − p0 )}2 {1 − ss0 (1 − p)}

This covariance structure is also identical to (4), so Z has the same distribution as G. The processes B00 and {Bj } are readily simulated as a consequence of Donsker’s (1951) theorem or the K-L representation (Lo`eve, 1978, ch.11), ensuring that critical values for any statistic f [Gn ] can be straightforwardly found by Monte Carlo methods. 15

Although one can obtain asymptotic critical values for p-runs test statistics f [Gn (p, · )] by fixing p in (9) or (10), there is a much simpler representation for G(p, · ). Specifically, consider the process Zp defined 1/2 P ∞ j j/2 Z , where {Z } is a sequence of IID standard normals. It by Zp (s) := sp(1−s)(1−p) j j j=0 s (1 − p) {1−s(1−p)} is readily verified that for each p, E[Zp (s)Zp ( s0 )] is identical to (2). Because Zp does not involve the Brownian bridge, Brownian motions, or Brownian sheets, it is more efficient to simulate than Z(p, · ). This convenient representation arises from the symmetry of equation (4) in s and s0 when p = p0 . The fact that equation (4) is asymmetric in p and p0 when s = s0 implies that a similar convenient representation for G(· , s) is not available. Instead, we obtain asymptotic critical values for test statistics f [Gn ( · , s)], by fixing s in (9) or (10). We summarize these results as follows. d

d

T HEOREM 4: (i) For each p ∈ I, G(p, · ) = Zp , and if f: C(S) 7→ R is continuous, then f [G(p, · )] = f [Zp ]; d

d

d

d

(ii) G = Z ∗ = Z, and if f: D(J × S) 7→ R is continuous, then f [G] = f [Z ∗ ] = f [Z]. As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 4 from the Appendix.

3.2

(P)EGR Tests

For the EGR statistics, we can similarly provide a Gaussian process whose covariance structure is the same as (8) and that can be straightforwardly simulated. By Theorem 3, this Gaussian process also yields critical values for PEGR test statistics. We begin with a representation for H. Specifically, consider the Gaussian process X defined by sp(1−s) 0 0 X (p, s) := − {1−s(1−p)} 2 B0 (p), where B0 is a Brownian bridge as before. It is straightforward to show

that when p0 ≤ p, E[X (p, s)X (p0 , s0 )] is the same as (7), implying that this captures the asymptotic distribution of the empirical distribution estimation error Hn , which survives to the limit. The representation Z for G in Theorem 4(ii) and the covariance structure for G and H required by Lemma 2(iii) together suggest representing Ge as Ze∗ or Ze defined by ∞ X  (1 − s)2 j s 2 1+j s B p , (1 − p) j {1 − s(1 − p)}2

(11)

  ∞ X p2 (1 − s)2 j 1+j s (1 − p) Bj {1 − s(1 − p)}2 (1 − p)1+j

(12)

Ze∗ (p, s) :=

j=1

and e s) := Z(p,

j=1

e is the sum of Z ∗ (resp. Z) and X with the identical B 0 in each. As is respectively, so that Ze∗ (resp. Z) 0 e 0 , s0 )Z(p, e s)]. Thus, simulating (11) readily verified, (8) is the same as E[Ze∗ (p0 , s0 )Ze∗ (p, s)] and E[Z(p 16

e n and G ˆ n. or (12) can deliver the asymptotic null distribution of G Similar to the previous case, the following representation is convenient when p is fixed:  ∞  sp(1 − s)(1 − p)1/2 X sj (1 − s) − p(1 − sj+1 ) e (1 − p)j/2 Zj . Zp (s) := {1 − s(1 − p)} {1 − s(1 − p)} j=0

e s) or Ze∗ (·, s). For fixed s, we use the representation provided by Z(·, We summarize these results as follows. d

T HEOREM 5 (i) H = X ; d e d e ·) = e · )] = (ii) For each p ∈ I, G(p, Zp , and if f : C(S) 7→ R is continuous, then f [G(p, f [Zep ]; d

d

d

d

e and if f : D(J × S) 7→ R is continuous, then f [G] e = f [Ze∗ ] = f [Z]. e (iii) Ge = Ze∗ = Z, As deriving the covariance structures of the relevant processes is straightforward, we omit the proof of Theorem 5 from the Appendix.

4

Asymptotic Local Power

Generalized runs tests target serially correlated autoregressive processes and/or independent heterogeneous processes violating A(i) − A(iii), as stated in Section 3. Nevertheless, runs tests are not always consistent against these processes, because just as for MGF-based GOF tests, PGF-based GOF tests cannot handle certain measure zero alternatives. We therefore examine whether the given (P)EGR test statistics have nontrivial power under specific local alternatives. To study this, we consider a first-order Markov process under which (P)EGR test statistics have nontrivial power when the convergence rate of the local alternative to the null is n−1/2 . Another motivation for considering this local alternative is to show that (P)EGR test statistics can have local power directly comparable to that of standard parametric methods. We consider first-order Markov processes for conciseness. The test can also be shown to have local power against higher-order Markov processes. Our results for first-order processes provide heuristic support for this claim, as higher-order Markov processes will generally exhibit first order dependence. A test capable of detecting true first-order Markov structure will generally be able to detect apparent first-order structure, as well. The situation is analogous to the case of autoregression, where tests for AR(1) structure are generally also sensitive to AR(p) structures, p > 1. We provide some additional discussion below in the simulation section. To keep our presentation succinct, we focus on EGR test statistics in this section. We saw above that the distribution theory for EGR statistics applies to PEGR statistics. This also holds for local power analysis. For brevity, we omit a formal demonstration of this fact here.

17

We consider a double array of processes {Yn,t }, and we let Fn,t denote the smallest σ-algebra generated by {Yn,t , Yn,t−1 , ..., }. We suppose that for each n, {Yn,1 , Yn,2 , ..., Yn,n } is a strictly stationary and geometric ergodic first-order Markov process having transition probability distributions P(Yn,t+1 ≤ y|Fn,t ) with the following Lebesgue-Stieltjes differential: H`1 : dFn (y|Fn,t ) = dF (y) + n−1/2 dD(y, Yn,t )

(13)

under the local alternative, where we construct the remainder term to be oP (n−1/2 ) uniformly in y. For this, we suppose that D(·, Yn,t ) is a signed measure with properties specified in A5, and that for a suitable signed measure Q with Lebesgue-Stieltjes differential dQ, Yn,t has marginal Lebesgue-Stieltjes differential dFn (y) = dF (y) + n−1/2 {dQ(y) + o(1)}.

(14)

We impose the following formal condition. A5 (LOCAL ALTERNATIVE): (i) For each n = 1, 2, ..., {Yn,1 , Yn,2 , ..., Yn,n } is a strictly stationary and geometric ergodic first-order Markov process with transition probability distributions given by eq. (13) and marginal distributions given by eq. (14), where (ii) D : R × R 7→ R is a continuous function such that D(·, z) defines a signed measure for each z ∈ R; (iii) supx |D(x, Yn,t )| ≤ Mn,t such that E[Mn,t ] ≤ ∆ < ∞ uniformly in t and n, and limy→±∞ D(y, Yn,t ) = 0 a.s.−P uniformly in t and n; (iv) R∞ R∞ supy −∞ |D(y, x)|dF (x) ≤ ∆ and supy | y D(y, x)dD(x, Yn,t )| ≤ Mn,t for all t and n. Thus, as n tends to infinity, {Yn,1 , Yn,2 , ...} converges in distribution to an IID sequence of random variables with marginal distribution F . Note that the marginal distribution given in eq. (14) is obtained by substituting the conditional distribution of Yn,t−j+1 |Fn,t−j (j = 1, 2, ...) into (13) and integrating with reR∞ spect to the random variables other than Yn,t . For example, −∞ D(y, z)dF (z) = Q(y). This implies that the properties of Q are determined by those of D. For example, limy→∞ Q(y) = 0 and supy |Q(y)| ≤ ∆. Our motivations for condition A5 are as follows. We impose the first-order Markov condition for conciseness. Higher-order Markov processes can be handled similarly. Assumption A5(i) also implies that {Yn,t } is an ergodic β–mixing process by theorem 1 of Davydov (1973). Next, assumptions A5(ii, iii) ensure that Fn ( · |Fn,t ) is a proper distribution for all n almost surely, corresponding to A3. Finally, assumptions A5(iii, iv) asymptotically control certain remainder terms of probabilities relevant to runs. Specifically, applying an induction argument yields that for each k = 1, 2, ..., 1 P(Yn,t+1 ≥ y, ..., Yn,t+k−1 ≥ y, Yn,t+k < y|Fn,t ) = p(1 − p)k−1 + √ hk (p, Yn,t ) + rk (p, Yn,t ), (15) n where p is a short-hand notation for F (y); for each p, h1 (p, Yn,t ) := C(p, Yn,t ) := D(F −1 (p), Yn,t ); h2 (p, Yn,t ) := w(p) − p C(p, Yn,t ); and for k = 3, 4, ..., hk (p, Yn,t ) := w(p)(1 − p)k−3 (1 − (k − 18

1)p) − p(1 − p)k−2 C(p, Yn,t ), where w(p) := α(F −1 (p)) :=

R∞ y

D(y, x)dF (x). Here, the remainder

term rk (p, Yn,t ) is sequentially computed using previous remainder terms and hk (p, Yn,t ). For example, R∞ for given p, r1 (p, Yn,t ) = 0, r2 (p, Yn,t ) := n−1 F −1 (p) D(F −1 (p), x)dD(x, Yn,t ), and so forth. These remainder terms turn out to be OP (n−1 ), mainly due to assumptions A5(iii, iv). Runs distributions can also be derived from (15), with asymptotic behavior controlled by assumptions A5(iii, iv). That is, if Yn,t < y, then the distribution of a run starting from Yn,t+1 , say Rn,i (p), can be obtained from (15) as P(Rn,i (p) = k) = P(Yn,t+1 ≥ y, Yn,t+2 ≥ y, ..., Yn,t+k < y|Yn,t < y) = p(1 − p)k−1 + n−1/2 Fn (F −1 (p))−1 hn,k (p) + Fn (F −1 (p))−1 rn,k (p),

(16)

R F −1 (p) R F −1 (p) where, as n tends to infinity, for each k, hn,k (p) := −∞ hk (p, x)dFn (x) → hk (p) := −∞ hk (p, x) R F −1 (p) R F −1 (p) rk (p, x)dF (x); and for each p, rk (p, x)dFn (x) → rk (p) := −∞ dF (x) and rn,k (p) := −∞ Fn (F −1 (p)) → p from assumptions A5(iii, iv). Further, the remainder term rk (p) is OP (n−1 ), uniformly in p. The local power of EGR test statistics stems from the difference between the distribution of runs given in eq. (16) and that obtained under the null. Specifically, the second component on the right-hand side (RHS) of (16) makes the population mean of Gn different from zero, so that the limiting distribution of Gn corresponding to that obtained under the null can be derived when its population mean is appropriately adjusted. This non-zero population mean yields local power for n−1/2 local alternatives for the EGR test statistics as follows. e n − µ ⇒ G, e where for each (p, s) ∈ J × S, T HEOREM 6: Given conditions A1, A2(i), A3, A5, and H`1 , G R F −1 (p) s(1−s) µ(p, s) := ps(1 − s){sw(p) − Q(F −1 (p))}/{1 − s(1 − p)}2 + {1−s(1−p)} C(p, z)dF (z). −∞ It is not difficult to specify DGPs satisfying the condition A5. For example, an AR(1) process can be constructed so as to belong to this case. That is, if for each t, Yn,t := n−1/2 Yn,t−1 + εt and εt ∼ IID N (0, 1), then we can let C(p, Yn,t ) = −ξ(p)Yn,t + oP (1) and w(p) = −ξ(p)2 , where ξ(p) := φ[Φ−1 (p)], and φ( · ) and Φ( · ) are the probability density function (PDF) and CDF of a standard normal random variable. This gives µ(p, s) = {ξ(p)2 s(1 − s)2 }/{1 − s(1 − p)}2 , with Q ≡ 0. Because we have convergence rate n−1/2 , the associated EGR test statistics have the same convergence rate as the parametric local alternative. We point out several implications of Theorem 6. First, if the convergence rate in (13) is lower than 1/2, the EGR test may not have useful power; EGR tests are not powerful against every alternative to H00 . For EGR tests to be consistent against first-order Markov processes, the rate must be at least 1/2. Second, the statement for first-order Markov process can be extended to further higher-order Markov 19

processes, although we do not pursue this here for brevity. Theorem 6 therefore should be understood as a starting point for identifying Markov processes as a class of n−1/2 -alternatives. Finally, the result of Theorem 6 does not hold for every local alternative specification. Our examination of a variety of other local alternative specifications reveals cases in which EGR tests have nontrivial power at the rate n−1/4 . For example, certain independent and non-identically distributed (INID) DGPs can yield EGR test statistics exhibiting n−1/4 rates. This rate arises because analysis of these cases requires an expansion of the conditional distribution of runs of order higher than that considered in Theorem 6. For brevity, we do not examine this further here.

5

Monte Carlo Simulations

In this section, we use Monte Carlo simulation to obtain critical values for test statistics constructed with f delivering the L1 (S-type) and uniform (K-type) norms of its argument. We also examine level and power properties of tests based on these critical values.

5.1

Critical Values

R p p (S1 ) := sups∈S1 |Gn (p, s)|, We consider the following statistics: T1,n (S1 ) := S1 |Gn (p, s)|ds, T∞,n R p p e n (p, s)|, where S1 := [−0.99, 0.99], and p ∈ e n (p, s)|ds, Te∞,n (S1 ) := sups∈S1 |G (S1 ) := S1 |G Te1,n R R s := s e s := |G e n (p, s)|dp, {0.1, 0.3, 0.5, 0.7, 0.9}; T1,n |G (p, s)|dp, T := sup |G (p, s)|, T n n p∈I ∞,n 1,n I I R R s e n (p, s)|, where s ∈ {−0.5, −0.3, −0.1, 0.1, 0.3, 0.5}; and T1,n (S) := Te∞,n := supp∈I |G I S |Gn (p, s)| R R e n (p, s)| dsdp, Te∞,n (S) := sup(p,s)∈I×S |G en dsdp, T∞,n (S) := sup(p,s)∈I×S |Gn (p, s)|, Te1,n (S) := I S |G (p, s)|, where we consider S1 := [−0.99, 0.99] and S2 := [−0.50, 0.50] for S. As discussed above, these Sand K-type statistics are relevant for researchers interested in testing for non-zero constant mean function and non-spurious peaks of Gn on I × S in terms of T1,n (S) and T∞,n (S) respectively. Note that these test statistics are constructed using I instead of J. There are two reasons for doing this. First, we want to examine the sensitivity of these test statistics to p. We have chosen the extreme case to examine the levels of the test statistics. Second, as pointed out by Granger (1963) and Dufour (1981), more alternatives can be handled by specifying a larger space for p. Theorems 4 and 5 ensure that the asymptotic null distributions of these statistics can be generated by simulating Zp , Z (or Z ∗ ), Zep , and Ze (or Ze∗ ), as suitably transformed. We approximate these using Wp (s) :=

50 sp(1 − s)(1 − p)1/2 X j s (1 − p)j/2 Zj , {1 − s(1 − p)} j=0

20

  40 X (1 − s)2 p2 sp(1 − s) 0 j 1+j e e B (p) + , W(p, s) := s (1 − p) Bj {1 − s(1 − p)}2 0 {1 − s(1 − p)}2 (1 − p)1+j j=1

 50  sp(1 − s)(1 − p)1/2 X j p f Wp (s) := s − (1 − p)j/2 Zj , {1 − s(1 − p)} {1 − s(1 − p)}

and

j=0

  40 2 2 X (1 − s) p j 1+j f s) := W(p, , s (1 − p) Bej {1 − s(1 − p)}2 (1 − p)1+j j=1

P respectively, where Be00 (p) := W0 (p) − pW0 (1), Bej (x + p) := Wx+1 (p) + xk=1 Wk (1) (with x ∈ N, and p ∈ I), and {Wk : k = 0, 1, 2, ...} is a set of independent processes approximating Brownian motion √ P (k) using the K-L representation, defined as Wk (p) := 2 100 `=1 {sin[(`−1/2)πp]}Z` /{(`−1/2)π}, where (j)

Z`

∼ IID N (0, 1) with respect to ` and j. We evaluate these functions for I, S1 , and S2 on the grids

{0.01, 0.02, ..., 1.00}, {−0.99, −0.98, ..., 0.98, 0.99}, and {−0.50, −0.49, ..., 0.49, 0.50}, respectively. Concerning these approximations, several comments are in order. First, the domains for p and s are approximated using a relatively fine grid. Second, we truncate the sum of the independent Brownian motions at 40 terms. The jth term contributes a random component with a standard deviation of sj p(1 − p)(1+j)/2 , which vanishes quickly as j tends to infinity. Third, we approximate Bej on the positive Euclidean line by the Brownian motion on [0, 10, 000]. Preliminary experiments showed the impact of these approximations to be small when S is appropriately chosen; we briefly discuss certain aspects of these experiments below. Table I contains the critical values generated by 10,000 replications of these processes.

5.2

Level and Power of the Test Statistics

In this section, we compare the level and power of generalized runs tests with other tests in the literature. We conduct two sets of experiments. The first examines power against dependent alternatives. The second examines power against structural break alternatives. To examine power against dependent alternatives, we follow Hong and White (2005) and consider the following DGPs: • DGP 1.1: Xt := εt ; • DGP 1.2: Xt := 0.3Xt−1 + εt ; 1/2

2 ; • DGP 1.3: Xt := ht εt , and ht = 1 + 0.8Xt−1 1/2

2 1 2 • DGP 1.4: Xt := ht εt , and ht = 0.25 + 0.6ht−1 + 0.5Xt−1 {εt−1 <0} + 0.2Xt−1 1{εt−1 ≥0} ;

• DGP 1.5: Xt := 0.8Xt−1 εt−1 + εt ; • DGP 1.6: Xt := 0.8ε2t−1 + εt ; • DGP 1.7: Xt := 0.4Xt−1 1{Xt−1 >1} − 0.5Xt−1 1{Xt−1 ≤1} + εt ; • DGP 1.8: Xt := 0.8|Xt−1 |0.5 + εt ; 21

• DGP 1.9: Xt := sgn(Xt−1 ) + 0.43εt ; where εt ∼ IID N (0, 1). Note that DGP 1.1 satisfies the null hypothesis, whereas the other DGPs represent interesting dependent alternatives. As there is no parameter estimation, we apply our EGR statistics and compare these to the entropy-based nonparametric statistics of Robinson (1991), Skaug and Tjøstheim (1996), and Hong and White (2005), denoted as Rn , STn , and HWn , respectively. We present the results in Tables II to IV. To summarize, the EGR test statistics generally show approxs exhibits level distortion imately correct levels, even using I instead of J. We notice, however, that Te1,n

when s gets close to one. This is mainly because the number of Brownian motions in the approximation is finite, and these are defined on the finite Euclidean positive real line, [0, 10,000]. If s and p are close to one and zero respectively, then the approximation can be coarse. Specifically, the given finite number of Brownian motions may not enough to adequately approximate the desired infinite sum of Brownian motions, and the given finite domain [0, 10,000] may be too small to adequately approximate the positive Euclidean real line. For the other tests, we do not observe similar level distortions. For the DGPs generating alternatives to H0 , the EGR tests generally gain power as n increases. As noted by Granger (1963) and Dufour (1981), a particular selection of p or, more generally, the choice of p s )-based mapping f can yield tests with better or worse power. Generally, we see that the Te1,n (S1 ) (resp. Te1,n p s )-based tests. Similarly, the T e1,n (S)-based tests outperform the (S1 ) (resp. Te∞,n tests outperform the Te∞,n p (S1 )-based tests, more extreme choices for p Te∞,n (S)-based tests for both S1 and S2 . Among the Te1,n

often yield better power. Also, in general, the power performances of the Te1,n (S2 )-based tests are midway s -based tests. Apart from these observations, there between those of the best and worst cases for the Te1,n p is no clear-cut relation between the Te1,n (S1 )-based tests and the Te1,n (S1 )-based tests. The more powerful p Te1,n (S1 )-based tests dominate the Te1,n (S1 )-based tests for DGPs 1.3-1.5, but the Te1,n (S1 )-based tests

dominate for DGPs 1.2, and 1.6-1.9. Comparing EGR tests to the entropy-based tests, we observe three notable features. First, Te1,n (S)s (S )-based tests generally dominate entropy-based tests for DGP 1.2 and 1.8. Second, based tests or Te1,n 2

for DGPs 1.3, 1.6, and 1.7, entropy-based tests are more powerful than the EGR tests. Finally, for the other DGPs, the best powered EGR tests exhibit power roughly similar to that of the best powered entropy-based tests. Such mixed results are common in the model specification testing literature, especially in non-parametric contexts where there is no generally optimal test. For example, Fan and Li (2000) compare the power properties of specification tests using kernel-based nonparametric statistics with Bierens and Ploberger’s (1997) integrated conditional moment (ICM) tests. They find that these tests are complementary, with differing power depending on the type of local alternative. Similarly, the entropy-based and EGR tests can also be

22

used as complements. In addition, we conducted Monte Carlo simulations for higher-order Markov processes. As the results are quite similar to those in Tables III and IV, we omit them for brevity. For structural break alternatives, we compare our PEGR tests to a variety of well-known tests. These include Feller’s (1951) and Kuan and Hornik’s (1995) RR test, Brown, Durbin, and Evans’s (1975) RECUSUM test, Sen’s (1980) and Ploberger, Kr¨amer and Kontrus’s (1989) RE test, Ploberger and Kr¨amer’s (1992) OLS-CUSUM test, Andrews’s (1993) Sup-W test, Andrews and Ploberger’s (1994) Exp-W and Avg-W tests, and Bai’s (1996) M-test.1 As these are all designed to test for a single structural break at an unknown point, they may not perform well when there are multiple breaks. In contrast, our PEGR statistics are designed to detect general alternatives to IID, so we expect these may perform well in such situations. We consider the following DGPs for our Monte Carlo simulations. These have been chosen to provide a test bed in which the behaviors of the various tests can be clearly contrasted. • DGP 2.1: Yt := Zt + εt ; • DGP 2.2: Yt := exp(Zt ) + εt ; • DGP 2.3: Yt := 1{t>b0.5·nc} + εt ; • DGP 2.4: Yt := Zt 1{t≤b0.5·nc} − Zt 1{t>b0.5·nc} + εt ; • DGP 2.5: Yt := Zt 1{t≤b0.3nc} − Zt 1{t>b0.3nc} + εt ; • DGP 2.6: Yt := Zt 1{t≤b0.1nc} − Zt 1{t>b0.1nc} + εt ; • DGP 2.7: Yt := exp(Zt )1{t≤b0.5·nc} + exp(−Zt )1{t>b0.5·nc} + εt ; • DGP 2.8: Yt := Zt 1{t∈Kn (0.02)} − Zt 1{t∈K / n (0.02)} + εt ; • DGP 2.9: Yt := Zt 1{t∈Kn (0.05)} − Zt 1{t∈K / n (0.05)} + εt ; • DGP 2.10: Yt := Zt 1{t∈Kn (0.1)} − Zt 1{t∈K / n (0.1)} + εt ; • DGP 2.11: Yt := Zt 1{t=odd} − Zt 1{t=even} + εt ; • DGP 2.12: Yt := exp(0.1 · Zt )1{t=odd} + exp(Zt )1{t=even} + εt , where Zt = 0.5Zt−1 + ut ; (εt , ut )0 ∼ IID N (0, I2 ); and Kn (r) := {t = 1, ..., n : (k − 1)/r + 1 ≤ t ≤ k/r, k = 1, 3, 5, ...}. For DGPs 2.1, 2.4–2.6, and 2.8–2.11, we use ordinary least squares (OLS) to estimate the parameters of a linear model Yt = α + βZt + vt , and we apply our PEGR statistics to the prediction errors vˆt := ˆ t . For DGP 2.3, we specify the model Yt = α + vt , and we apply our PEGR statistic to Yt − Yt − α ˆ − βZ P n−1 nt=1 Yt . The linear model is correctly specified for DGP 2.1, but is misspecified for DGPs 2.3–2.6 1

We also examined Chu, Hornik, and Kuan’s (1995a) ME test and Chu, Hornik, and Kuan’s (1995b) RE-MOSUM and OLS-

MOSUM tests. Their performance is comparable to that of the other prior tests, so for brevity we do not report those results here.

23

and 2.8–2.11. Thus, when DGP 2.1 is considered the null hypothesis holds, permitting an examination of the level of the tests. As the model is misspecified for DGPs 2.3–2.6 and 2.8–2.11, the alternative holds for vˆt , permitting an examination of power. DGPs 2.3–2.6 exhibit a single structural break at different break points, permitting us to see how the PEGR tests compare to standard structural break tests specifically designed to detect such alternatives. DGPs 2.8 through 2.11 are deterministic mixtures in which the true coefficient of Zt depends on whether or not t belongs to a particular structural regime. The number of structural breaks increases as the sample size increases, but the proportion of breaks to the sample size is constant. Also, the break points are equally spaced. Thus, for example, there are four break points in DGP 2.8 when the sample size is 100 and and nine break points when the sample size is 200. The extreme case is DGP 2.11, in which the proportion of breaks is equal to one, and the coefficient of Zt depends on whether or not t is even. Given the regular pattern of these breaks, this may be hard to distinguish from a DGP without a structural break. For DGPs 2.2, 2.7, and 2.12, we use nonlinear least squares (NLS) to estimate the parameters of a nonlinear model Yt = exp(βZt ) + vt , and we apply our PEGR statistics to the prediction errors ˆ t ). The situation is analogous to that for the linear model, in that the null holds for vˆt := Yt − exp(βZ DGP 2.2, whereas the alternative holds for 2.7 and 2.12. Examining these alternatives permits an interesting comparison of the PEGR tests, designed for general use, to the RR, RE, M, OLS-CUSUM and RE-CUSUM statistics, which are expressly designed for use with linear models. Our simulation results are presented in Tables V to VII. To summarize, the levels of the PEGR tests are approximately correct for both linear and nonlinear cases and generally improve as the sample size increases. On the other hand, there are evident level distortions for some of the other statistics, especially, as expected, for the linear model statistics with nonlinear DGP 2.2. The PEGR statistics also have respectable power. They appear consistent against our structural break alternatives, although the PEGR tests are not as powerful as the other (properly sized) break tests when there is a single structural break. This is as expected, as the other tests are specifically designed to detect a single break, whereas the PEGR test is not. As one might also expect, the power of the PEGR tests diminishes notably as the break moves away from the center of the sample. Nevertheless, the relative performance of the tests reverses when there are multiple breaks. All test statistics lose power as the proportion of breaks increases, but the loss of power for the non-PEGR tests is much faster than for the PEGR tests. For the extreme alternative DGP 2.11, the PEGR tests appear to be the only consistent tests. We also note that, as for the dependent alternatives, the integral norm-based tests outperform the supremum norm-based tests. Finally, we briefly summarize the results of other interesting experiments omitted from our tabulations

24

for the sake of brevity. To further examine level properties, we applied our EGR tests (i) with Yt ∼ IID C(0, 1), where C(`, s) denotes the Cauchy distribution with location and scale parameters ` and s respectively, and (ii) with Yt = (ut + 1)1{εt ≥0} + (ut − 1)1{εt <0} , where (εt , ut ) ∼ IID N (0, I2 ). We consider the Cauchy process to examine whether the absence of moments in the raw data matters, and we consider the normal random mixture to compare the results with the deterministic mixture, DGP 2.10. Our experiments yielded results very similar to those reported for DGP 1.1. This affirms the claims for the asymptotic null distributions of the (P)EGR test statistics in the previous sections. To further examine the power of our (P)EGR tests, we also considered the mean shift processes analyzed by Crainiceanu and Vogelsang (2007), based on DGP 2.3. Our main motivation for this arises from the caveat in the literature that CUSUM and CUSQ tests may exhibit power functions non-monotonic in α. (See Deng and Perron (2008) for further details.) In contrast, we find that the (P)EGR test statistics do not exhibit this non-monotonicity.

6

Conclusion

The IID assumption plays a central role in economics and econometrics. Here we provide a family of tests based on generalized runs that are powerful against unspecified alternatives, providing a useful complement to tests designed to have power against specific alternatives, such as serial correlation, GARCH, or structural breaks. Relative to other tests of this sort, for example the entropy-based tests of Hong and White (2005), our tests have an appealing computational simplicity, in that they do not require kernel density estimation, with the associated challenge of bandwidth selection. Our simulation studies show that our tests have empirical levels close to their nominal asymptotic levels. They also have encouraging power against a variety of important alternatives. In particular, they have power against dependent alternatives and heterogeneous alternatives, including those involving a number of structural breaks increasing with the sample size.

7

Appendix: Proofs

To prove our main results, we first state some preliminary lemmas. The proofs of these and other lemmas below can be found at url: http://web.yonsei.ac.kr/jinseocho/. Recall that J := [p, 1], p > 0, and for notational simplicity for every p, p0 ∈ I with p0 ≤ p and Mn (p0 ) > 0, we let Kn,i denote Kn,i (p, p0 ) such PKn,i (p,p0 ) that Kn,0 (p, p0 ) = 0 and j=K Rn,j (p) = Rn,i (p0 ). 0 n,i−1 (p,p )+1 L EMMA A1: Given A1, A2(i), A3, and H0 , if s ∈ S, p0 , p ∈ I, and p0 ≤ p such that Mn (p0 ) > 0, then PKn,1 Rn,i (p) E[ i=1 s ] = E[sRn,1 (p) ] · E[Kn,1 ]. 25

L EMMA A2: Given A1, A2(i), A3, and H0 , if s ∈ S, p0 , p ∈ J and p0 ≤ p such that Mn (p0 ) > 0, then 0

E[Kn,1 sRn,1 (p ) ] =

sp0 {1 − s(1 − p)} . {1 − s(1 − p0 )}2

(17)

In the special case in which p = p0 , we have Kn,1 = 1 and E(sRn,1 (p) ) = sp/(1 − s(1 − p)). L EMMA A3: Given A1, A2(i), A3, and H0 , if s ∈ S, p0 , p ∈ J, and p0 ≤ p such that Mn (p0 ) > 0, then PKn,1 Rn,i (p)+Rn,1 (p0 ) E[ i=1 s ] = s2 p0 /{1 − s2 (1 − p)} · [{1 − s(1 − p)}/{1 − s(1 − p0 )}]2 . L EMMA A4: Given A1, A2(i), A3, and H0 , if p0 , p ∈ J and p0 ≤ p such that Mn (p0 ) > 0, then P(Kn,1 = k) = (p/p0 )[(p − p0 )/p]k−1 and E[Kn,1 ] = p/p0 . M (p)

n L EMMA A5: Let p ∈ I such that Mn (p) > 0. If {Rn,i (p)}i=1 is IID with distribution Gp then, for Pm `−1 )(1 − p)`−m pm . m = 1, 2, ..., and ` = m, m + 1, m + 2, ..., P( i=1 Rn,i (p) = `) = (m−1

L EMMA A6: Let p, p0 ∈ I such that Mn (p0 ) > 0. Given condition R of Lemma 1, then for i, k = 1, 2, ..., P S 0 0 { m (i) if ` = i, i + 1, ..., i + k − 1, P( i+k+1−` j=2 Rn,j (p ) = i + k − `}) = p ; m=2 (ii) when p > p0 ,  m i  p0 (1 − p0 )i−1 , X [ if ` = i; { Rn,j (p) = i}, Rn,1 (p0 ) = `) = P(  p0 (p − p0 )(1 − p0 )`−2 , if ` = i + 1, · · · , i + k. m=1 j=1 L EMMA A7: Let p, p0 ∈ I such that Mn (p0 ) > 0. Given condition R of Lemma 1 and p > p0 , then Si+k+1−` Pm P S P 0 { j=2 Rn,j (p0 ) = P( im=1 { m for i, k = 1, 2, ..., i+k−1 j=1 Rn,j (p) = i}, Rn,1 (p ) = `, m=2 `=i P S 0 0 0 i−1 . i + k − `}) + P( im=1 { m j=1 Rn,j (p) = i}, Rn,1 (p ) = i + k) = pp (1 − p ) L EMMA A8: Let K be a random positive integer, and let {Xt } be a sequence of random variables such PK P that for each i = 1, 2, ..., E(Xi ) < ∞. Then E( K i=1 Xi ) = E( i=1 E(Xi |K)). Before proving Lemma 1, we define several relevant notions. First, for p ∈ I with Mn (p) > 0, we PUn,i (p) define the building time to i by Bn,i (p) := i − j=1 Rn,j (p), where Un,i (p) is the maximum number Pw P of runs such that j=1 Rn,j (p) < i; i.e., Un,i (p) := max{w ∈ N : w j=1 Rn,j (p) < i}. Now Bn,i (p) ∈ {1, 2, ..., i − 1, i}; and if Bn,i (p) = i then Rn,1 (p) ≥ i. For p, p0 ∈ I, p0 < p, with Mn (p0 ) > 0, we also PUn,i (p) PWn,i (p,p0 ) let Wn,i (p, p0 ) be the number of runs {Rn,i (p0 )} such that j=1 Rn,j (p) = j=1 Rn,j (p0 ). Proof of Lemma 1: As part (a) is easy, we prove only part (b). We first show that R implies that the original data {Yt } are independent. For this, we show that for any pair of two variables, (Yi , Yi+k ) (i, k ≥ 1), say, P(Fi (Yi ) ≤ p, Fi+k (Yi+k ) ≤ p0 ) = pp0 . We partition our consideration into three cases: (a) p = p0 ; (b) p < p0 ; and (c) p > p0 and obtain the given equality for each case. (a) Let p = p0 . We S P Sk+m Ph have P(Fi (Yi ) ≤ p, Fi+k (Yi+k ) ≤ p0 ) = P( im=1 { m j=1 Rn,j (p) = i}, h=m+1 { j=m+1 Rn,j (p) =

26

P P Sk Ph Pi Pm k}) = im=1 P( { m j=1 Rn,j (p) = i})P( h=1 { j=1 Rn,j (p) = k}) = m=1 P({ j=1 Rn,j (p) = P P i}) kh=1 P( hj=1 Rn,j ( p) = k) = p · p = p2 , where the second and third equalities follow from the condition R and Lemma A5 respectively. (b) Next, suppose p < p0 . We have P(Fi (Yi ) ≤ p, Fi+k (Yi+k ) ≤ Pm Sk+h Ph P S P 0 0 p0 ) = ih=1 P( im=1 { m j=1 Rn,j (p ) = i, m=h+1 { j=h+1 Rn,j (p ) = k}) = j=1 Rn,j (p) = i}, Pi Si Pm Ph Pk Pm 0 0 h=1 P( m=1 { j=1 Rn,j (p) = i}, j=1 Rn,j (p ) = i) m=1 P( j=1 Rn,j (p ) = k), where the P S P second equality follows from R. Further, Lemma A5 implies that ih=1 P( im=1 { m j=1 Rn,j (p) = Pm P P S P S Ph i i}, j=1 Rn,j (p0 ) = i) = P( ih=1 { hj=1 Rn,j (p0 ) = i}, im=1 m m=1 P( j=1 j=1 Rn,j (p) = i) = P P 0 0 Rn,j (p) = i) = p, and km=1 P( m j=1 Rn,j (p ) = k) = p . Thus, P(Fi (Yi ) ≤ p, Fi+k (Yi+k ) ≤ Pi p0 ) = pp0 . (c) Finally, let p0 < p. We have P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 ) = b=1 P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 , Bn,i (p) = b) and derive each piece constituting this sum separately. We first examine Pi+k−1 Si P the case b = i. Then P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 , Bn,i (p) = i) = `=i P( m=1 { m j=1 Rn,j (p) = P S P S m i i+k+1−` 0 0 { m i}, Rn,1 (p0 ) = `, m=2 j=2 Rn,j (p ) = i + k − `}) + P( m=1 { j=1 Rn,j (p) = i}, Rn,1 (p ) = i + k) = pp0 (1 − p0 )i−1 , where the last equality follows from Lemma A7. Next, we consider the cases b = 1, 2, ..., i − 1. Then it follows that P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 , Bn,i (p) = b) = Sb+k+u+1−` Pm Pm S P 0 [ b+k−1 { j=u+2 Rn,j (p0 ) = b+k− P( b+w m=1+w { j=1+w Rn,j (p) = b}, Rn,u+1 (p ) = `, m=u+2 `=b Sb+w P Si−b Pm 0 0 `}) + P( m=1+w { m j=1+w Rn,j (p) = b}, Rn,u+1 (p ) = b + k)] × P( m=1 { j=1 Rn,j (p ) = i − b}), where w and u are short-hand notations for Wn,i (p, p0 ) and Un,i (p). Given this, we further note that Pm Pi−b Pm S 0 0 0 P( i−b m=1 P( j=1 Rn,j (p ) = i−b) = p by Lemma A5; the condition m=1 { j=1 Rn,j (p ) = i−b}) = Pm Sb Pm S 0 R implies that P( b+w m=1+w { j=1+w Rn,j (p) = b}, Rn,u+1 (p ) = b + k) = P( m=1 { j=1 Rn,j (p) = Pm S 0 b}, Rn,u+1 (p0 ) = b+k) and for ` = b, b+1, ..., b+k −1, P( b+w m=1+w { j=1+w Rn,j (p) = b}, Rn,u+1 (p ) P S Sb+k+u+1−` Pm 0 { j=u+2 Rn,j (p0 ) = b + k − `}) = P( bm=1 { m = `, m=u+2 j=1 Rn,j (p) = b}, Rn,1 (p ) = Sb+k+1−` Pm `, m=2 { j=2 Rn,j (p0 ) = b + k − `}), so that P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 , Bn,i (p) = b) = P(Fb (Yi ) < p, Fb+k (Yb+k ) < p0 , Bn,b (p) = b)p0 = pp0 2 (1 − p0 )b−1 . Hence, P(Fi (Yi ) < p, Fi+k (Yi+k ) < P P 02 0 b−1 +pp0 (1−p0 )i−1 = pp0 . p0 ) = ib=1 P(Fi (Yi ) < p, Fi+k (Yi+k ) < p0 , Bn,i (p) = b) = i−1 b=1 pp (1−p ) Thus, Yi and Yi+k are independent. Next, suppose that {Yt } is not identically distributed. Then there is a pair, say (Yi , Yj ), such that for some y ∈ R, pi := Fi (y) 6= pj := Fj (y). Further, for the same y, P(Rn,(j) (pj ) = 1) = P(Fj (Yj ) ≤ Fj (y)|Fj−1 (Yj−1 ) ≤ Fj−1 (y)) = P(Fj (Yj ) ≤ Fj (y)) = pj , where the subscript (j) denotes the (j)th run of {Rn,i (pj )} corresponding to Fj (y), and the second equality follows from the independence property just shown. Similarly, P(Rn,(i) (pj ) = 1) = P(Fj (Yi ) ≤ Fj (y)) = P(Yi ≤ y) = pi . That is, P(Rn,(j) (pj ) = 1) 6= P(Rn,(i) (pj ) = 1). This contradicts the assumption that {Rn,i (p)} is identically distributed for all p ∈ I. Hence, {Yt } must be identically distributed. This completes the proof.



Proof of Theorem 1: (i) We separate the proof into three parts. In (a), we prove weak convergence of

27

Gn (p, · ). In (b), we show E [G(p, s)] = 0 for each s ∈ S. Finally, (c) derives E [G(p, s)G(p, s0 )]. ¯ 1 > 0, E[{Gn (p, s)−Gn (p, s0 )}4 ] ≤ ∆ ¯ 1 |s−s0 |4 . Note that for each (a) First, we show that for some ∆ PMn (p) Rn,i (p) 0 Rn,i (p) p, E[{Gn (p, s)−Gn (p, s0 )}4 ] = (1−n−1 )E[{G(p, s)−G(p, s0 )}4 ]+n−2 E[ i=1 {s −s + sp s0 p 4 0 4 {1−s(1−p)} − {1−s0 (1−p)} } ] ≤ 2E[{G(p, s) − G(p, s )} ] s0 p 4 {1−s0 (1−p)} } ], a consequence of finite dimensional weak

+ n−1 E[{sRn,i (p) − s0 Rn,i (p) +

sp {1−s(1−p)}



convergence, which follows from Lindeberg-

Levy’s central limit theorem (CLT) and the Cram´er-Wold device. We examine each piece on the RHS d

separately. It is independently shown in Theorem 4(i) that G(p, · ) = Zp . Thus, E[{G(p, s)−G(p, s0 )}4 ] = E[{Zp (s) − Zp (s0 )}4 ] uniformly in p. If we let mp (s) := sp(1 − s)(1 − p)1/2 {1 − s(1 P j −p)}−1 and Bj (p) := (1 − p)j/2 Zj for notational simplicity, then Zp (s) = mp (s) ∞ j=0 s Bj (p), and it follows that {Zp (s) − Zp (s0 )}4 = {Ap (s)[mp (s) − mp (s0 )] + mp (s0 )Bp (s)(s − s0 )}4 , where Ap (s) := P∞ j P∞ Pj 0 j−k sk B (p). We can also use the mean-value theorem to j j=0 s Bj (p) and Bp (s) := j=0 k=0 s obtain that for some s00 between s and s0 , mp (s) − mp (s0 ) = m0p (s00 )(s − s0 ). Therefore, if we let ∆1 := (1 − s)2 (1 − se)−2 with se := max[|s|, s¯], E[{Zp (s)−Zp (s0 )}4 ] = E[{Ap (s)m0p (s00 )+mp (s0 )Bp (s)}4 ]|s− s0 |4 ≤ ∆41 {|E[Ap (s)4 ]|+4|E[Ap (s)3 Bp (s)]|+6|E[Ap (s)2 Bp (s)2 ]|+4|E[Ap (s)Bp (s)2 ]|+|E[Bp (s)4 ]|}|s− s0 |4 , because supp,s |m0p (s)| ≤ ∆1 and supp,s |mp (s)| ≤ ∆1 . Some tedious algebra shows that sup(p,s)∈I×S 6 , (1−e s4 )2

E[Ap (s)4 ] ≤ ∆2 :=

and sup(p,s)∈I×S E[Bp (s)4 ] ≤ ∆3 :=

72 , (1−e s4 ) 5

so that E[{Zp (s) −

Zp (s0 )}4 ] ≤ 16∆41 ∆3 |s − s0 |4 . Using H¨older’s inequality, we obtain |E[Ap (s)3 Bp (s)]| ≤ |E[Ap (s)4 ]3/4 3/4

1/4

E[Bp (s)4 ]1/4 ≤ ∆2 ∆3

2/4

2/4

≤ ∆3 , |E[Ap (s)2 Bp (s)2 ]| ≤ |E[Ap (s)4 ]2/4 E[Bp (s)4 ]2/4 ≤ ∆2 ∆3 3/4

1/4

and |E[Ap (s)Bp (s)3 ]| ≤ |E[Ap (s)4 ]3/4 E[Bp (s)4 ]1/4 ≤ ∆2 ∆3

≤ ∆3 ,

≤ ∆3 , where the final inequalities

follow from the fact that ∆2 ≤ ∆3 . Next, we note that |sRn,i (p) − s0 Rn,i (p) | ≤ Rn,i (p)e sRn,i (p) |s − s0 | and |sp/{1 − s(1 − p)} − s0 p/{1 − s0 (1 − p)}| ≤

1 |s (1−e s)2

− s0 | with E[Rn,i (p)4 se4Rn,i (p) ] ≤ 24(1 − se4 )−5 .

e 1 := 384 × (1 − Thus, when we let Qn,i := Rn,i (p)e sRn,i (p) + (1 − se)−2 , it follows that E[Q4n,i ] ≤ ∆ e 1 |s − s0 |4 . se4 )−5 (1 − se)−8 , and E[{sRn,i (p) − s0 Rn,i (p) + sp/{1 − s(1 − p)} − s0 p/{1 − s0 (1 − p)}}4 ] ≤ ∆ ¯ 1 is defined by ∆ ¯ 1 := (32∆4 ∆3 + ∆ e 1 ) then E[|Gn (p, s) − Gn (p, s0 )|4 ] ≤ ∆ ¯ 1 |s − s0 |4 . Given this, if ∆ 1 Second, therefore, if we let s00 ≤ s0 ≤ s, E[|Gn (p, s) − Gn (p, s0 )|2 |Gn (p, s0 ) − Gn (p, s00 )|2 ] ≤ ¯ 1 |s − s00 |4 , where the first inequality E[|Gn (p, s) −Gn (p, s0 )|4 ]1/2 E[|Gn (p, s0 ) − Gn (p, s00 )|4 ]1/2 ≤ ∆ follows from Cauchy-Schwarz’s inequality. This verifies condition (13.14) of Billingsley (1999). The desired result follows from these, theorem 13.5 of Billingsley (1999) and the finite dimensional weak convergence, which obtains by applying the Cram´er-Wold device. PMn (p) Rn,i (p) PMn (p) (b) Under the given conditions and the null, E[ i=1 s − sp/{1 − s(1 − p)}] = E[ i=1 PMn (p) E[sRn,i (p) −sp/{1 − s(1 − p)}|Mn (p)]] = E[ i=1 sp/{1 − s(1 − p)} − sp/{1 − s(1 − p)}] = 0, where the first equality follows from Lemma A.8, and the second equality follows from the fact that given Mn (p), Rn,i (p) is IID under the null.

28

PMn (p) (c) Under the given the conditions and the null, E[Gn (p, s)Gn (p, s0 )] = n−1 E[ i=1 E[[sRn,i (p) − sp/{1 − s(1 − p)}][s0 Rn,i (p) − s0 p/{1 − s0 (1 − p)}]|Mn (p)]] = n−1 E[Mn (p)[ss0 p/{1 − ss0 (1 − p)} − ss0 p2 /{1 − s(1 − p)}{1 − s0 (1 − p)}]] = ss0 p2 (1 − s)(1 − s0 )(1 − p)/[{1 − ss0 (1 − p)}{1 − s(1 − p)}{1 − s0 (1 − p)}], where the first equality follows from Lemma A.8, and the last equality follows because n−1 E[Mn (p)] = p. Finally, it follows from the continuous mapping theorem that f [Gn (p, · )] ⇒ f [G(p, · )]. This is the desired result. (ii) This can be proved in numerous ways. We verify the conditions of theorem 13.5 of Billingsley (1999). Our proof is separated into three parts: (a), (b), and (c). In (a), we show the weak convergence of Gn ( · , s). In (b), we prove that for each p, E [G(p, s)] = 0. Finally, in (c), we show that E [G(p, s)G(p0 , s)] = s2 p02 (1 − s)2 (1 − p)/{1 − s(1 − p0 )}2 {1 − s2 (1 − p)}. (a) First, for each s, we have G(1, s) ≡ 0 as Gn (1, s) ≡ 0, and for any δ > 0, limp→1 P(|G(p, s)| > δ) ≤ limp→1 E(|G(p, s)|2 )/δ 2 = limp→1 s2 p2 (1 − s)2 (1 − p)/δ 2 {1 − s(1 − p)}2 {1 − s2 (1 − p)} = 0 uniformly on S, where the inequality and equality follow from Markov’s inequality and the result in (c) respectively. Thus, for each s, G(p, s) − G(1, s) ⇒ 0 as p → 1. Second, it’s not hard to show that E[{Gn (p, s) − Gn (p0 , s)}4 ] = E[{G(p, s) − G(p0 , s)}4 ] − n−1 p0 −1 E[{G(p, s) − G(p0 , s)}4 ] + PKn,1 Ri 0 0 n−1 p0 E[{ i=1 (s − E[sRi ]) − (sR1 − E[sR1 ])}4 ] using the finite dimensional weak convergence result. We examine each piece on the RHS separately. From some tedious algebra, it follows that E[{G(p, s) − G(p0 , s)}4 ] = 3s4 (1 − s)4 {ks (p)ms (p) − 2ks (p0 )ms (p) + ks (p0 )ms (p0 )}2 ≤ 3{|{ks (p) − ks (p0 )}ms (p)| + |ks (p0 ){ms (p0 ) − ms (p)}|}2 , where for each p, ks (p) := 1−p . {1−s2 (1−p)}

p2 , {1−s(1−p)}2

and ms (p) :=

Note that |ks |, |ms |, |ks0 | and |m0s | are bounded by ∆4 := max[∆1 , ∆2 , ∆3 ] uniformly in

e 2 > 0 such that if n is sufficiently large enough, then E[{G(p, s) − (p, s). This implies that there exists ∆ e 2 |p − p0 |2 . Some algebra implemented using Mathematicar shows that for some G(p0 , s)}4 ] ≤ ∆ PKn,1 Ri 0 0 ∆5 > 0, p0 E[{ i=1 (s − E[sRi ]) − (sR1 − E[sR1 ])}4 ] ≤ ∆5 p0 −1 |p − p0 |, so that given that p0 ≥ p > 0, ¯ 2 |p − p0 |2 for sufficiently large n, where if n−1 is less than |p − p0 | then E[{Gn (p, s) − Gn (p0 , s)}4 ] ≤ ∆ ¯ 2 := ∆ e 2 (1 + p−1 ) + ∆5 p−1 . Finally, for each p00 ≤ p0 ≤ p, E[{Gn (p, s) − Gn (p0 , s)}2 {Gn (p0 , s) − ∆ ¯ 2 |p−p00 |2 by the CauchyGn (p00 , s)}2 ] ≤ E[|Gn (p, s)−Gn (p0 , s)|4 ]1/2 E[|Gn (p0 , s)−Gn (p00 , s)|4 ]1/2 ≤ ∆ Schwarz inequality. The weak convergence of {Gn ( · , s)} holds by theorem 13.5 of Billingsley (1999) and finite dimensional weak convergence, which can be obtained by the Cram´er-Wold device. (b) For each p, E[G(p, · )] = 0 follows from the proof of Theorem 1(i, b). (c) First, for convenience, for each p and p0 , we let M and M 0 denote Mn (p) and Mn (p0 ) respectively, and let Ri and Ri0 stand for Rn,i (p) and Rn,i (p0 ). Then from the definition of Kn,j , E[Gn (p, s)Gn (p0 , s)] = PKn,1 Ri PKn,1 Ri 0 0 0 n−1 E[M 0 E[ i=1 (s − E[sRi ])(sR1 − E[sR1 ])|M, M 0 ]] = n−1 E[M 0 ]E[ i=1 (s −E[sRi ])(sR1 − PKn,1 Ri +R0 PKn,1 Ri 0 R10 R10 Ri Ri R10 1 − E[sR1 ])] = p0 E[ i=1 s i=1 s E[s ] − Kn,1 s E[s ] + Kn,1 E[s ]E[s ]], where the

29

first equality follows from Lemma A8 since {Kn,j } is IID under the null and Rj0 is independent of R` , if ` ≤ Kn,j−1 or ` ≥ Kn,j + 1. The second equality follows, as {M, M 0 } is independent of {Ri , R1 : P nm1 Ri Ri R10 0 0 i = 1, 2, ..., Kn,1 }. Further, E[ K i=1 s ] = E[s ] · E[Kn,1 ], E[Kn,1 s ] = sp /{1 − s(1 − p )} · PKn,1 Ri +R0 1 ] = s2 p0 /{1 − s2 (1 − p)} · [{1 − s(1 − p)}/{1− {1 − s(1 − p)}/{1 − s(1 − p0 )}, and E[ i=1 s s(1 − p0 )}]2 by Lemmas A1 to A4. Substituting these into the above equation yields the desired result. (iii) We separate the proof into two parts, (a) and (b). In (a), we prove the weak convergence of Gn , and in (b) we derive its covariance structure. (a) In order to show the weak convergence of Gn , we exploit the moment condition in theorem 3 of Bickel and Wichura (1971, p. 1665). For this, we first let B and C be neighbors in J × S such that B := (p1 , p2 ] × (s1 , s2 ] and C := (p1 , p2 ] × (s2 , s3 ]. Without loss of generality, we suppose that |s2 −s1 | ≤ |s3 −s2 |. Second, we define |Gn (B)| := |Gn (p1 , s1 )−Gn (p1 , s2 )−Gn (p2 , s1 )+Gn (p2 , s2 )|, then |Gn (B)| ≤ |Gn (p1 , s1 )−Gn (p1 , s2 )|+|Gn (p2 , s2 )−Gn (p2 , s1 )|, so that E[|Gn (B)|4 ] = E[|A1 |4 ]+ 4E[|A1 |3 |A2 |] + 6E[|A1 |2 |A2 |2 ] + 4E[|A1 ||A2 |3 ] + E[|A2 |4 ] ≤ E[|A1 |4 ] + 4E[|A1 |4 ]3/4 E[|A2 |4 ]1/4 + 6E[|A1 |4 ]2/4 E[|A2 |4 ]2/4 + 4E[|A1 |4 ]1/4 E[|A2 |4 ]3/4 + E[|A2 |4 ] using H¨older’s inequality, where we let A1 := |Gn (p1 , s1 ) − Gn (p1 , s2 )| and A2 := |Gn (p2 , s2 ) − Gn (p2 , s1 )| for notational simplicity. We ¯ 1 |s1 −s2 |4 and E[|A2 |4 ] ≤ ∆ ¯ 1 |s1 −s2 |4 in the proof of Theorem 1(i). Thus, already saw that E[|A1 |4 ] ≤ ∆ ¯ 1 |s1 − s2 |4 . Third, we define |Gn (C)| := |Gn (p2 , s2 ) − Gn (p2 , s3 ) − Gn (p3 , s2 ) + E[|Gn (B)|4 ] ≤ 16∆ Gn (p3 , s3 )|; then |Gn (C)| ≤ |Gn (p2 , s2 ) − Gn (p3 , s2 )| + |Gn (p3 , s3 ) − Gn (p2 , s3 )|. Using the same logic as above, H¨older’s inequality, and the result in the proof of Theorem 1(ii), we obtain E[|Gn (C)|4 ] ≤ ¯ 2 |p2 − p1 |2 for sufficiently large n. Fourth, therefore, using H¨older’s inequality, we obtain that for all 16∆ ¯ 2 − s1 |2 · |p2 − p1 |2 }2/3 ≤ ∆{|s ¯ 2− sufficiently large n, E[|B|4/3 |C|8/3 ] ≤ E[|B|4 ]1/3 E[|C|4 ]2/3 ≤ ∆{|s ¯ 3/4 λ(B)}2/3 {∆ ¯ 3/4 λ(C)}2/3 , where ∆ ¯ := 16∆ ¯ 1/3 ∆ ¯ 2/3 , s1 | · |p2 − p1 |}2/3 {|s3 − s2 | · |p2 − p1 |}2/3 = {∆ 1 2 and λ( · ) denotes the Lebesgue measure of the given argument. This verifies the moment condition (3) in theorem 3 of Bickel and Wichura (1971, p. 1665). Fifth, it trivially holds from the definition of Gn that G = 0 on {(p, s) ∈ J × S : s = 0}. Finally, the continuity of G on the edge of J × S was verified in the proof of Theorem 1(ii). Therefore, the weak convergence of {Gn } follows from the corollary in Bickel and Wichura (1971, p. 1664) and the finite dimensional weak convergence obtained by Lindeberg-Levy’s CLT and the Cram´er-Wold device. (b) As before, for convenience, for each p and p0 , we let M and M 0 denote Mn (p) and Mn (p0 ) respectively, and we let Ri and Ri0 be short-hand notations for Rn,i (p) and Rn,i (p0 ). Also, we let Kn,j be as

30

previously defined. Then, under the given conditions and the null, 0

0

0

E[Gn (p, s)Gn (p , s )] = n

−1

M X M X R0 R0 E[ (sRi − E[sRi ])(s0 j − E[s0 j ])]

(18)

j=1 i=1 Kn,1 0

= p E[

X i=1

0

Ri 0 R1

s s

Kn,1



X

E[s0

R10

]sRi − Kn,1 s0

R10

E[sRi ] + Kn,1 E[sRi ]E[s0

R10

]],

i=1

where the first equality follows from the definition of Gn , and the second equality holds for the same reason PKn,1 Ri as in the proof of Theorem 1(ii). From Lemmas A1 to A4, we have that E[ i=1 s ] = E[sRi ] · E[Kn,1 ], PKn,1 Ri 0 R10 0 and E[Kn,1 s0 R1 ] = s0 p0 /{1 − s0 (1 − p0 )} · {1 − s0 (1 − p)}/{1 − s0 (1 − p0 )}, and E[ i=1 s s ]= {s s0 p0 }/{1 − ss0 (1 − p)} · [{1 − s0 (1 − p)}/{1 − s0 (1 − p0 )}]2 . Thus, substituting these into (18) gives E[Gn (p, s)Gn (p0 , s0 )] = ss0 p02 (1 − s)(1 − s0 )(1 − p){1 − s0 (1 − p)}/[{1−s(1−p)}{1−s0 (1−p0 )}2 {1− ss0 (1 − p)}]. Finally, it follows from the continuous mapping theorem that f [Gn ] ⇒ f [G].



pn (p) − p| → 0 almost surely by Glivenko-Cantelli. Second, Gn ⇒ G Proof of Lemma 2: (i) First, supp∈I |e by Theorem 1(ii). Third, (D(J × S) × D(J)) is a separable space. Thus, (Gn , pen ( · )) ⇒ (G, · ) by theorem 3.9 of Billingsley(1999). Fourth, |G(p, s) − G(p0 , s0 )| ≤ |G(p, s) − G(p0 , s)| + |G(p0 , s) − G(p0 , s0 )|, and each term of the RHS can be made as small as desired by letting |p − p0 | and |s − s0 | tend to zero, as G ∈ C(J × S) a.s. Finally, note that for each (p, s), Wn (p, s) = Gn (e pn (p), s). Therefore, Wn − Gn = Gn (e pn ( · ), · ) − Gn ( · , · ) ⇒ G − G = 0 by a lemma of Billingsley (1999, p. 151) and the four facts just shown. This implies that sup(p,s)∈J×S |Wn (p, s) − Gn (p, s)| → 0 in probability, as desired. (ii) We write pen (p) as pen for convenience. By the mean value theorem, for some p∗n (p) (in I) between p ∈ J and pen , |[{se pn }/{1 − s(1 − pen )} − sp/{1 − s(1 − p)}] − {s(1 − s)(e pn − p)}/{1 − s(1 − p)}2 | = 2s2 (1 − s)(e pn − p)2 /{1 − s(1 − p∗n (p))}3 , where supp∈J |p∗n (p) − p| → 0 a.s. by Glivenko-Cantelli. Also, fn (p) fn (p)s2 (1 − s)(e 1 M s2 (1 − s) M pn − p)2 2 sup n(e pn − p) , ≤ √ sup sup √ sup ∗ 3 ∗ 3 n p,s {1 − s(1 − pn (p))} n{1 − s(1 − pn (p))} n p p p,s fn (p) and s2 (1 − s){1 − s(1 − p∗n (p))}−3 are uniformly bounded by 1 and 1/(1 − se)3 respecwhere n−1 M tively, with se := max[|s|, s¯]; and n(e pn − p)2 = OP (1) uniformly in p. Thus,  fn (p)  M se pn sp s(1 − s) (e pn − p) √ = oP (1). sup − − {1 − s(1 − p)}2 n {1 − s(1 − pen )} {1 − s(1 − p)} p,s fn (p) − p| = oP (1), and the Given these, the weak convergence of Hn follows immediately, as supp |n−1 M √ function of p defined by n(e pn − p) weakly converges to a Brownian bridge, permitting application of the lemma of Billingsley (1999, p. 151). These facts also suffice for the tightness of Hn .

31

Next, the covariance structure of H follows from the fact that for each (p, s) and (p0 , s0 ) with p0 ≤ p, √ fn (p)s(1 − s)(e E[{M pn − p)}/ n{1 − s(1 − p)}2 ] = 0, and " # fn (p)s(1 − s) (e fn (p0 )s0 (1 − s0 )(e M pn − p) M p0n − p0 ) ss0 pp0 2 (1 − s)(1 − s0 )(1 − p) √ √ E = , {1 − s(1 − p)}2 {1 − s0 (1 − p0 )}2 n{1 − s(1 − p)}2 n{1 − s0 (1 − p0 )}2 which is identical to E[H(p, s)H(p0 , s0 )]. (iii) To show the given claim, we first derive the given covariance structure. For each (p, s) and (p0 , s0 ), E[Wn (p, s)Hn (p0 , s0 )] = E[E[Wn (p, s)|X1 , ..., Xn ]Hn (p0 , s0 )], where the equality follows because Hn is measurable with respect to the smallest σ–algebra generated by {X1 , ..., Xn }. Given this, PM fn (p) e we have E[Wn (p, s)|X1 , ..., Xn ] = n−1/2 i=1 pn /{1 − s(1 − pen )}] = [E[sRn,i (p) |X1 , ..., Xn ] − se P fn (p) M n−1/2 i=1 [sp/ {1 − s(1 − p)} − se pn /{1 − s(1 − pen )}] = −Hn (p, s). Thus, E[E[Wn (p, s)|X1 , ..., Xn ]Hn (p0 , s0 )] = − E[Hn (p, s)Hn (p0 , s0 )]. Next, we have that E[G(p, s)H(p0 , s0 )] = limn→∞ E[Wn (p, s)Hn (p0 , s0 )] by Lemma 2(i). Further, E[H(p, s)H(p0 , s0 )] = limn→∞ E[Hn (p, s)Hn (p0 , s0 )]. It follows that E[G(p, s)H(p0 , s0 )] = −E[H(p, s)H(p0 , s0 )]. e n , Hn )0 and apply example 1.4.6 of van der Vaart and Wellner (1996, p. 31) to Next, we consider (G e n = Wn +Hn = Gn +Hn +oP (1), and that Gn and Hn are each tight, show weak convergence. Note that G e n is tight, too. Further, Gn and Hn have continuous limits by Theorem 1(ii) and Lemma 2(ii). Thus, so G e n must weakly converge to if the finite-dimensional distributions of Gn + Hn have weak limits, then G the Gaussian process Ge with the covariance structure (8). We may apply the Lindeberg-Levy CLT to show this unless Gn + Hn ≡ 0 almost surely. That is, for each (p, s) with s 6= 0, E[Gn (p, s) + Hn (p, s)] = 0, and E[{Gn (p, s) + Hn (p, s)}2 ] = E[Gn (p, s)2 ] − E[Hn (p, s)2 ] + o(1) ≤ E[Gn (p, s)2 ] + o(1) = s2 p2 (1 − s)2 (1 − p)/ {1 − s2 (1 − p)}{1 − s(1 − p)}2 + o(1), which is uniformly bounded, so that for each (p, s) with s 6= 0, the sufficiency conditions for the Lindeberg-Levy CLT hold. The first equality above follows by applying Lemma 2(ii). If s = 0, then Gn ( ·, 0) + Hn ( · , 0) ≡ 0, so that the probability e n now limit of Gn ( ·, 0) + Hn ( · , 0) is zero. Given these, the finite-dimensional weak convergence of G follows from the Cram´er-Wold device. e n is asymptotically independent of Hn because E[G e n (p, s)Hn (p0 , s0 )] = E[Wn (p, s) Next, note that G e n , Hn )0 Hn (p0 , s0 )]+E[Hn (p, s)Hn (p0 , s0 )] = 0 by the covariance structure given above. It follows that (G e H)0 by example 1.4.6 of van der Vaart and Wellner (1996, p. 31). To complete the proof, take ⇒ (G, e n − Hn , Hn )0 = (Wn , Hn )0 , and apply the continuous mapping theorem. (G



e n ⇒ G. e This and the Proof of Theorem 2: (i, ii, and iii) The proof of Lemma 2(iii) establishes that G continuous mapping theorem imply the given claims.



Proof of Lemma 3: (i) First, as shown in the proof of Lemma 3(ii) below, Fˆn (y( · )) converges to F (y( · )) in probability uniformly on I, where for each p, y(p) := inf{x ∈ R : F (x) ≥ p}. Second, Gn ⇒ G by 32

Theorem 1(ii). Third, (D(J×S)×D(J)) is a separable space. Therefore, it follows that (Gn , Fˆn (y( · ))) ⇒ ¨ n ( · , · ) = Gn (Fˆn (y (G, F (y( · ))) by theorem 3.9 of Billingsley (1999). Fourth, G ∈ C(J × S). Finally, G ¨ n ( · , · ) − Gn ( · , · ) = Gn (Fˆn (y( · )), · ) − Gn ( · , · ) ⇒ G − G = 0, where the weak ( · )), · ), so that G ¨ n (p, s) − Gn (p, s)| = convergence follows from the lemma of Billingsley (1999, p. 151). Thus, sup(p,s) |G oP (1).

¨ n (p, s) permits the representation H ¨ n (p, s) = {M ˆ n (p)/n}{√n[se (ii) First, the definition of H pn /{1− √ pn /{1 − s(1 − pen )}−sp/{1 − s(1 − p)}] s(1−e pn )}−sp/{1 − s(1 − p)}]}. Second, it follows that n[se ⇒ −s(1 − s)B00 (p)/{1 − s(1 − p)}2 by Lemma 2(ii) and Theorem 5(i) below. Third, if Fˆn ( · ) converges ˆ n (p) converges to p in probability uniformly in p, because for each p, to F ( · ) in probability, then n−1 M P ˆ n (p) is defined as n 1 ˆ ˆ ¨ M t=1 {Fn (Yt ) 0, we have {ω ∈ ˆ n ) < y} ⊂ {ω ∈ Ω : ht (θ ∗ ) < y + |ht (θ ˆ n ) − ht (θ ∗ )|} = {ω ∈ Ω : ht (θ ∗ ) < y + Ω : ht (θ ˆ n ) − ht (θ ∗ )|, |ht (θ ˆ n ) − ht (θ ∗ )| < ε1 } ∪ {ω ∈ Ω : ht (θ ∗ ) < y + |ht (θ ˆ n ) − ht (θ ∗ )|, |ht (θ ˆn) − |ht (θ ˆ n ) − ht (θ ∗ )| ≥ ε1 }. Second, for the ht (θ ∗ )| ≥ ε1 } ⊂ {ω ∈ Ω : ht (θ ∗ ) < y + ε1 } ∪ {|ht (θ ˆ n ) < y} ⊃ {ω ∈ Ω : ht (θ ∗ ) < y − ε1 } \ {ω ∈ Ω : |ht (θ ∗ ) − same y and ε1 , {ω ∈ Ω : ht (θ ˆ n )| > ε1 }. These two facts imply that n−1 Pn 1{h (θ ) 0 and ε2 > 0, there is an n∗ such that if n > n∗ , P P P(n−1 nt=1 1{|ht (θ∗ )−ht (θˆ n )|>ε1 } ≥ δ) ≤ ε2 . This follows because P(n−1 nt=1 1{|ht (θ∗ )−ht (θˆ n )|>ε1 } ≤ P P ˆ n )| > ε1 ) ≤ ε2 , where δ) ≤ (nδ)−1 nt=1 E(1{|ht (θ∗ )−ht (θˆ n )|>ε1 } ) = (δn)−1 nt=1 P(|ht (θ ∗ ) − ht (θ the first inequality follows from Markov’s inequality, and the last inequality follows from the fact that ˆ n )| ≤ Mt kθ ˆ n − θ ∗ k = oP (1) uniformly in t by A2 and A3(ii). It follows that for any |ht (θ ∗ ) − ht (θ ε1 > 0, F (y − ε1 ) + oP (1) ≤ Fˆn (y) ≤ F (y + ε1 ) + oP (1). As ε1 may be chosen arbitrarily small, it follows that Fˆn (y) converges to F (y) in probability as desired.



ˆn = G ¨n + H ¨ n = Gn + Hn + oP (1) by Lemmas 3(i) and 3(ii). Proof of Theorem 3: (i, ii, and iii) G e n ⇒ Ge by Theorem 2(ii). Thus, G ˆ n ⇒ G, e which, together with the continuous Further, Gn + Hn = G 

mapping theorem, implies the desired result.

33

The following Lemmas collect together further supplementary claims needed to prove the weak convergence of the EGR test statistics under the local alternative. As before, we use the notation p = F (y) for brevity and suppose that Rn,i (p) is defined by observations starting from Yn,t+1 , unless otherwise noted. L EMMA B1: Given conditions A1, A2(i), A3, A5, and H`1 , P − 2j (i) for each y and k = 2, 3, ..., E[1{Yn,t+k
{1 − F (y)}). In what follows, we assume that p ∈ J, unless otherwise noted. L EMMA B3: Given conditions A1, A2(i), A3, A5, and H`1 , R∞ (i) supy |α(y)| ≤ ∆, where α(y) := y D(y, x)dF (x); (ii) for each p and k = 1, 2, ..., (15) holds, and rk (p, Yn,t ) = OP (n−1 ) uniformly in p; (iii) for each p and k = 1, 2, ..., (16) holds, and rn,k (p) = O(n−1 ) uniformly in p; (iv) for each k = 1, 2, ..., hn,k (p) → hk (p) and rn,k (p) → rk (p) uniformly in p a.s. −P, qn (p)); (v) for each p ∈ I such that p > n1 , if we let pen := F (e en,i (p) = k|e P(R pn ) = (1 − pen )k−1 pen + n−1/2

rn,k (e pn ) hn,k (e pn ) + ; −1 −1 Fn (F (e pn )) Fn (F (e pn ))

(vi) for each (p, s) ∈ I × S such that p > n1 ,   √ se pn ν(e pn , s) en,i (p) R n E[s |e pn ] − = + oP (1), 1 − s(1 − pen ) Fn (F −1 (e pn )) where ν(p, s) :=

ps2 (1−s)w(p) {1−s(1−p)}2

+

s(1−s) {1−s(1−p)}

R F −1 (p) −∞

(19)

(20)

C(p, y)dF (y).

L EMMA B4: Given conditions A1, A2(i), A3, A5, and H`1 , if Rn,i (p) is defined by observations starting from Yn,t+1 and p > n1 , then (i) for each p and k = 1, 2, ..., P(Rn,i (p) = k|Fn,t ) = p(1 − p)k−1 + n−1/2 p−1 hk (p) + p−1 rk (p) + n−1 bk (p, Yn,t−1 ) + oP (n−1 ), (21)

34

where bk (p, Yn,t−1 ) := p−1

Ry

−∞ hk (p, x)dD(x, Yn,t−1 )

− p−2 hk (p)D(y, Yn,t−1 );

(ii) for each p and k = 1, 2, ..., P(Rn,i (p) = k) = p(1 − p)k−1 + n−1/2 p−1 hk (p) + p−1 rk (p) + R n−1 bk (p) + o(n−1 ), where bk (p) := bk (p, z)dF (z). L EMMA B5: Given conditions A1, A2(i), A3, A5, and H`1 , if Rn,i (p) is defined by observations starting from Yn,t+1 and p > n1 , then (i) for each p and k, m = 1, 2, ..., P(Rn,i (p) = k|Fn,t−m )−P(Rn,i (p) = k) = n−

m+1 2

Bk,m (p, Yn,t−m )

− m+1 2

), where Bk,1 (p, Yn,t−1 ) := bk (p, Yn,t−1 ) − bk (p), and for m = 2, 3, ..., Bk,m (p, Yt−m ) := +oP (n R R R m+1 R ... bk (p, z)dD(z, x1 )...dD(xm−2 , Yn,t−m ) − n− 2 ... bk (p, z)dD(z, x1 )...dD(xm−2 , x)dF (x); (ii) for each p and k, `, m = 1, 2, ..., P(Rn,i (p) = k|Rn,i−m (p) = `) = P(Rn,i (p) = k) + OP (n−(m+1)/2 ); (iii) for each p and k, `, m = 1, 2, ..., P(Rn,i (p) = k, Rn,i−m (p) = `) = P(Rn,i (p) = k)P(Rn,i−m (p) = `) + O(n−(m+1)/2 ). L EMMA B6: Given conditions A1, A2(i), A3, A5, and H`1 , for each (p, s), (i) E[Wn (p, s)|e pn ] = ν(p, s) + oP (1); (ii) E[Hn (p, s)|e pn ] = −ps(1 − s)Q(F −1 (p))/{1 − s(1 − p)}2 + oP (1); (iii) E[Wn (p, s)2 |e pn ] = p2 s2 (1 − p)(1 − s)2 /{1 − s(1 − p)}2 {1 − s2 (1 − p)} + ν(p, s)2 + oP (1); (iv) E[Hn (p, s)2 |e pn ] = p3 s2 (1 − p)(1 − s)2 /{1 − s(1 − p)}4 + p2 s2 (1 − s)2 Q(F −1 (p))2 / {1 − s(1− p)}4 + oP (1). L EMMA B7: Given conditions A1, A2(i), A3, A5, and H`1 , for each (p, s), fn (p) R en,i (p) e fn (p, s) = ν(p, s)+oP (1), where W fn (p, s) := n−1/2 PM −E[sRn,i (p) |e pn ]); (i) Wn (p, s)−W i=1 (s (ii) E[Sn,i (p, s)|Fn,t−m ] = OP (n−

m+1 2

), where Sn,i (p, s) := sRn,i (p) − E[sRn,i (p) ], and Rn,i (p) is

the run defined by observations starting from Yn,t+1 . A

e n (p, s) ∼ N (µ(p, s), s2 p2 (1− L EMMA B8: Given conditions A1, A2(i), A3, A5, and H`1 , for each (p, s), G s)4 (1 − p)2 /[{1 − s(1 − p)}4 {1 − s2 (1 − p)}]). L EMMA B9: Given conditions A1, A2(i), A3, A5, and H`1 , (i) Wn − ν ⇒ G; and (ii) Hn − (µ − ν) ⇒ H. Remark: (a) For brevity, we omit deriving the asymptotic covariance structure of Wn and Hn under H1` as e n (p, s). this can be obtained in a manner similar to that obtaining the asymptotic variance of G (b) Given the fact that G and H are in C(J × S), they are tight, so that lemma 1.3.8(ii) of van der Vaart and Wellner (1996, p. 21) implies that Wn and Hn are tight.

35

Proof of Theorem 6: Given the weak convergence in Lemma B8, the desired result follows by the tightness implied by Lemma B9(i) (see Remark(b)) and the fact that (Wn , Hn )0 is tight by lemma 1.4.3 of van der 

Vaart and Wellner (1996, p. 30).

References A NDREWS , D. (1993): “Tests for Parameter Instability and Structural Change with Unknown Change Point,” Econometrica, 61, 821–856. A NDREWS , D. (2001): “Testing When a Parameter is on the Boundary of the Maintained Hypothesis,” Econometrica, 69, 683–734. A NDREWS , D. AND P LOBERGER ,W. (1994): “Optimal Tests When a Nuisance Parameter is Present only under the Alternative,” Econometrica, 62, 1383–1414. BAI , J. (1996): “Testing for Parameter Constancy in Linear Regressions: an Empirical Distribution Function Approach,” Econometrica, 64, 597–622. B ICKEL , P. AND W ICHURA , M. (1971): “Convergence Criteria for Multiparameter Stochastic Processes and Some Applications,” The Annals of Mathematical Statistics, 42, 1656–1670. B IERENS , H. (1982): “Consistent Model Specification Tests,” Journal of Econometrics, 20, 105–134. B IERENS , H. (1990): “A Consistent Conditional Moment Test of Functional Form,” Econometrica, 58, 1443–1458. B IERENS , H. AND P LOBERGER , W. (1997): “Asymptotic Theory of Integrated Conditional Moment Tests,” Econometrica, 65, 1129–1151. B ILLINGSLEY, P. (1968, 1999): Convergence of Probability Measures. New York: Wiley. B RETT, C. AND P INKSE , J. (1997): “Those Taxes Are All over the Map! A Test for Spatial Independence of Municipal Tax Rates in British Columbia,” International Regional Science Review, 20, 131–151. B ROWN , R., D URBIN , J. AND E VANS , J. (1975): “Techniques for Testing the Constancy of Regression Relationships over Time,” Journal of the Royal Statistical Society, Series B, 37, 149–163. C HAN , N.-H., C HEN , J., C HEN , X., FAN , Y., AND P ENG , L. (2009): “Statistical Inference for Multivariate Residual Copula of GARCH Models,” Statistica Sinica, 10, 53–70.

36

C HEN , X. AND FAN , Y. (2006): “Estimation and Model Selection of Semiparametric Copula-Based Multivariate Dynamic Model under Copula Misspecification,” Journal of Econometrics, 135, 125–154. C HU , J., H ORNIK , K., AND K UAN , C.-M. (1995a): “MOSUM Tests for Parameter Constancy,” Biometrika, 82, 603–617. C HU , J., H ORNIK , K., AND K UAN , C.-M. (1995b): “The Moving-Estimates Test for Parameter Stability,” Econometric Theory, 11, 699–720. C RAINICEANU , C. AND VOGELSANG , T. (2007): “Nonmonotonic Power for Tests of a Mean Shift in a Time Series,” Journal of Statistical Computation and Simulation, 77, 457-476. DAVIES , R. (1977): “Hypothesis Testing When a Nuisance Parameter is Present Only under the Alternative,” Biometrika, 64, 247–254. DAVYDOV, Y. (1973): “Mixing Conditions for Markov Chains,” Theory of Probability and Its Applications, 18, 312–328. DARLING , D. (1955): “The Cram´er–Smirnov Test in the Parametric Case,” The Annals of Mathematical Statistics, 28, 823–838. D ELGADO , M. (1996): “Testing Serial Independence Based on the Empirical Distribution Function,” Journal of Time Series Analysis, 17, 271–286. D ENG , A. AND P ERRON , P. (2008): “A Non-local Perspective on the Power Properties of the CUSUM and CUSUM of Squares Tests for Structural Change,” Journal of Econometrics, 142, 212-240. D IEBOLD , F., G UNTHER , T., AND TAY, A. (1998): “Evaluating Density Forecasts with Applications to Financial Risk Management,” International Economic Review, 76, 967–974. D ODD , E. (1942): “Certain Tests for Randomness Applied to Data Grouped into Small Sets,” Econometrica, 10, 249–257. D ONSKER , M. (1951): “An Invariance Principle for Certain Probability Limit Theorems,” Memoirs of American Mathematics Society, 6. D UFOUR , J.-M. (1981): “Rank Tests for Serial Dependence,” Journal of Time Series Analysis, 2, 117–128. D URBIN , J. (1973): “Weak Convergence of the Sample Distribution Function When Parameters are Estimated,” The Annals of Statistics, 1, 279–290. FAMA , E. (1965): “The Behavior of Stock Market Prices,” The Journal of Business, 38, 34–105. 37

FAN , Y. AND L I , Q. (2000): “Kernel-Based Tests Versus Bierens’ ICM Tests,” Econometric Theory, 16, 1016–1041. F ELLER , W. (1951): “The Asymptotic Distribution of the Range of Sums of Independent Random Variables,” The Annals of Mathematical Statistics, 22, 427–432. G OODMAN , L. (1958): “Simplified Runs Tests and Likelihood Ratio Tests for Markov Chains,” Biometrika, 45, 181–97. G RANGER , C. (1963): “A Quick Test for Serial Correlation Suitable for Use with Nonstationary Time Series,” Journal of the American Statistical Association, 58, 728–736. G RENANDER , U. (1981): Abstract Inference. New York: Wiley. H ALLIN , M., I NGENBLEEK , J.-F., AND P URI , M. (1985): “Linear Serial Rank Tests for Randomness against ARMA Alternatives,” Annals of Statistics, 13, 1156–1181. H ECKMAN , J. (2001): “Micro Data, Heterogeneity, and the Evaluation of Public Policy: Nobel Lecture,” The Journal of Political Economy, 109, 673–746. H ENZE , N. (1996): “Empirical Distribution Function Goodness-of-Fit Tests for Discrete Models,” Canadian Journal of Statistics, 24, 81–93. H ONG , Y. (1999): “Hypothesis Testing in Time Series via the Empirical Characteristic Function: A Generalized Spectral Density Approach,” Journal of the American Statistical Association, 84, 1201–1220. H ONG , Y. AND W HITE , H. (2005): “Asymptotic Distribution Theory for Nonparametric Entropy Measures of Serial Dependence,” Econometrica, 73, 837–901. JAIN , N. AND K ALLIANPUR , G. (1970): “Norm Convergent Expansions for Gaussian Processes in Banach Spaces,” Proceedings of the American Mathematical Society, 25, 890–895. J OGDEO , K. (1968): “Characterizations of Independence in Certain Families of Bivariate and Multivariate Distributions,” Annals of Mathematical Statistics, 39, 433-441. K ARLIN , S. AND TAYLOR , H. (1975): A First Course in Stochastic Processes. San Diego: Academic Press. KOCHERLAKOTA , S. AND KOCHERLAKOTA , K. (1986): “Goodness-of-Fit Tests for Discrete Distributions,” Communications in Statistics–Theory and Methods, 15, 815–829.

38

K RIVYAKOV, E., M ARTYNOV, G., AND T YURIN , Y. (1977): “On the Distribution of the ω 2 Statistics in the Multi-Dimensional Case,” Theory of Probability and Its Applications, 22, 406–410. K UAN , C.-M. AND H ORNIK , K. (1995): “The Generalized Fluctuation Test: a Unifying View,” Econometric Review, 14, 135–161. L O E` VE , M. (1978): Probability Theory II. New York: Springer-Verlarg. M OOD , A. (1940): “The Distribution Theory of Runs,” The Annals of Mathematical Statistics, 11, 367– 392. P HILLIPS , P. (1998): “New Tools for Understanding Spurious Regressions,” Econometrica, 66, 1299– 1326. P INKSE , J. (1998): “Consistent Nonparametric Testing for Serial Independence,” Journal of Econometrics, 84, 205–231. ¨ P LOBERGER , W. AND K R AMER , W. (1992): “The CUSUM Test with OLS Residuals,” Econometrica, 60, 271–285. ¨ P LOBERGER , W., K R AMER , W., AND KONTRUS , K. (1989): “A New Test for Structural Stability in the Linear Regression Model,” Journal of Econometrics, 40, 307–318. ROBINSON , P. (1991) “Consistent Nonparametric Entropy-Based Testing,” Review of Economic Studies, 58, 437–453. ROSENBLATT, M. (1952) “Remarks on a Multivariate Transform,” Annals of Mathematical Statistics, 23, 470–472. RUEDA , R., P E´ REZ -A BREU , V. AND O’R EILY, F. (1991): “Goodness-of-Fit Test for the Poisson Distribution Based on the Probability Generating Function,” Communications in Statistics–Theory and Methods , 20, 3093–3110. S EN , P. (1980): “Asymptotic Theory of Some Tests for a Possible Change in the Regressional Slope Occurring at an Unknown Time Point,” Zeitschrift f¨ur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 52, 203–218. S KAUG , H. AND T JØSTHEIM , D. (1996): “Measures of Distance between Densities with Application to Testing for Serial Independence,” Time Series Analysis in Memory of E. J. Hannan, ed. P. Robinson and M. Rosenblatt. New York: Springer-Verlag, 363–377.

39

S TINCHCOMBE , M. AND W HITE , H. (1998): “Consistent Specification Testing with Nuisance Parameters Present Only under the Alternative,” Econometric Theory, 14, 295–324. S UKHATME , P. (1972): “Fredholm Determinant of a Positive Definite Kernel of a Special Type and Its Applications,” The Annals of Mathematical Statistics, 43, 1914–1926. VAN DER

VAART, A. AND W ELLNER , J. (1996): Weak Convergence and Empirical Processes with Ap-

plications to Statistics. New York: Springer-Verlag. WALD , A. AND W OLFOWITZ , J. (1940): “On a Test Whether Two Samples are from the Same Population,” Annals of Mathematical Statistics, 2, 147–162.

40

Table I: A SYMPTOTIC C RITICAL VALUES OF THE T EST S TATISTICS Statistics \ Level p T1,n (S1 )

p T∞,n (S1 )

1%

5%

10%

p = 0.1

0.2474

0.1914

0.1639

0.3

0.5892

0.4512

0.5

0.8124

0.7 0.9

Statistics \ Level

1%

5%

10%

p = 0.1

0.2230

0.1727

0.1420

0.3828

0.3

0.5004

0.3842

0.3225

0.6207

0.5239

0.5

0.6413

0.4886

0.4092

0.9007

0.6841

0.5763

0.7

0.6065

0.4632

0.3889

0.7052

0.5329

0.4478

0.9

0.3066

0.2356

0.1973

p =0.1

0.7454

0.5677

0.4750

p Te1,n (S1 )

p Te∞,n (S1 )

p =0.1

0.7483

0.5683

0.4750

0.3

1.3517

1.0237

0.8582

0.3

1.3239

1.0069

0.8441

0.5

1.6846

1.2818

1.0728

0.5

1.5909

1.1990

1.0091

0.7

1.7834

1.3590

1.4000

0.7

1.5019

1.1360

0.9617

0.9

1.3791

1.0486

0.8839

0.9

0.7912

0.6028

0.5060

s = -0.5

0.3114

0.2439

0.2152

s = -0.5

0.2197

0.1785

0.1587

-0.3

0.1698

0.1330

0.1164

-0.3

0.1124

0.0905

0.0803

-0.1

0.0514

0.0402

0.0351

-0.1

0.0313

0.0254

0.0225

0.1

0.0466

0.0361

0.0315

0.1

0.0262

0.0210

0.0187

0.3

0.1246

0.0957

0.0836

0.3

0.0631

0.0504

0.0446

0.5

0.1769

0.1356

0.1183

0.5

0.0780

0.0625

0.0552

s = -0.5

0.8091

0.6885

0.6160

s = -0.5

0.6229

0.5319

0.4799

-0.3

0.4349

0.3650

0.3267

-0.3

0.3107

0.2668

0.2412

-0.1

0.1282

0.1072

0.0960

-0.1

0.0864

0.0734

0.0670

0.1

0.1124

0.0939

0.0840

0.1

0.0714

0.0604

0.0547

0.3

0.2898

0.2416

0.2154

0.3

0.1718

0.1454

0.1306

0.5

0.3962

0.3304

0.2948

0.5

0.2175

0.1869

0.1684

T1,n (S1 )

0.4571

0.3547

0.3124

Te1,n (S1 )

0.3080

0.2440

0.2187

T∞,n (S1 )

2.2956

1.9130

1.7331

Te∞,n (S1 )

2.0411

1.7101

1.5615

T1,n (S2 )

0.1219

0.0955

0.0836

Te1,n (S2 )

0.0725

0.0590

0.0523

T∞,n (S2 )

0.8091

0.6885

0.6160

Te∞,n (S2 )

0.6229

0.5319

0.4799

s T1,n

s T∞,n

s Te1,n

s Te∞,n

41

Table II: L EVEL S IMULATION AT 5% L EVEL ( IN PERCENT, 10,000 ITERATIONS ) DGP 1.1 Statistics \ n

100

300

500

p = 0.1

4.05

4.71

5.09

0.3

5.02

4.95

4.61

0.5

5.22

4.74

4.87

0.7

5.34

4.90

5.31

0.9

4.35

6.63

6.07

p = 0.1

3.86

4.44

4.49

0.3

3.54

4.65

4.59

0.5

5.42

4.40

5.05

0.7

7.00

5.03

5.08

0.9

4.21

6.27

4.71

s = -0.5

4.36

4.72

4.26

-0.3

4.32

4.77

4.08

-0.1

4.23

4.71

3.83

0.1

4.19

4.64

3.68

0.3

4.01

4.08

3.25

0.5

3.85

3.63

2.55

s = -0.5

4.90

6.07

5.95

-0.3

4.92

6.11

5.86

-0.1

5.15

6.29

6.13

0.1

4.89

6.22

5.94

0.3

5.11

5.76

6.06

0.5

5.05

5.18

5.57

Te1,n (S1 ) e T∞,n (S1 ) Te1,n (S2 )

3.71

4.21

3.60

4.91

6.40

6.65

4.06

4.65

4.12

Te∞,n (S2 )

4.77

5.94

5.74

p (S1 ) Te1,n

p Te∞,n (S1 )

s Te1,n

s Te∞,n

Rn

a

STn

HWn a

4.7

a

5.1

a

6.5

These results are those given in Hong and White (2005). Their number of replications is 1,000.

42

Table III: P OWER S IMULATION AT 5% L EVEL ( IN PERCENT, 3,000 ITERATIONS ) DGP 1.2

DGP 1.3

DGP 1.4

DGP 1.5

Statistics \ n p e p = 0.1 T1,n (S1 )

100

200

100

200

100

200

100

200

20.67

33.17

29.50

45.00

27.53

47.47

5.80

9.27

0.3

33.27

59.37

8.43

10.57

8.77

12.07

9.77

15.40

0.5

36.60

65.17

5.33

5.47

5.20

5.80

18.77

32.93

0.7

34.13

60.30

9.33

11.37

5.37

6.10

76.17

97.33

0.9

23.77

38.07

30.30

51.57

15.40

23.97

75.37

97.10

p Te∞,n (S1 )

s Te1,n

s Te∞,n

6.60

6.67

7.10

6.33

5.03

5.37

3.63

4.73

0.3

10.57

23.97

3.60

5.40

3.70

5.40

5.70

11.00

0.5

24.07

42.87

5.77

6.13

4.90

6.33

16.43

26.70

0.7

29.53

41.67

9.80

6.87

6.93

4.20

72.97

92.77

0.9

22.47

22.63

27.30

32.67

14.43

13.30

72.30

89.87

s = -0.5

62.56

99.83

13.53

90.90

8.40

11.63

81.73

99.30

-0.3

67.70

99.86

14.66

92.06

9.33

12.96

80.30

99.33

-0.1

70.06

99.73

15.66

92.73

10.53

15.30

76.86

98.90

0.1

71.30

99.70

16.46

93.40

11.86

18.10

69.66

97.73

0.3

70.30

99.60

17.30

93.46

13.93

23.50

58.50

94.00

0.5

67.36

99.46

19.93

93.33

18.63

32.80

44.90

82.73

s = -0.5

43.80

79.50

7.83

13.46

5.70

6.10

74.46

98.73

-0.3

49.66

83.96

8.50

13.66

6.60

6.76

71.40

98.03

-0.1

55.63

87.80

9.83

15.33

8.03

9.16

66.43

96.53

0.1

59.70

89.06

11.20

17.66

10.10

14.43

55.06

91.33

0.3

61.40

88.30

14.33

24.86

15.46

25.60

40.76

79.13

0.5

58.40

84.83

22.36

37.23

23.60

46.33

22.43

50.30

Te1,n (S1 ) Te∞,n (S1 )

59.73 27.30

90.93 61.56

11.96 5.83

23.13 10.70

9.40 4.70

13.73 6.46

14.60 18.13

27.33 34.33

Te1,n (S2 ) Te∞,n (S2 )

97.03

99.86

16.36

26.36

75.60

86.06

82.30

92.40

44.03

80.50

7.20

13.60

5.56

7.26

13.76

18.56

13.8

25.4

26.4

52.2

15.0

7.2

59.8

75.4

12.4

22.0

61.2

90.0

27.8

52.0

81.6

98.4

14.0

27.0

37.6

67.6

20.6

35.2

69.6

95.6

Rn

a

STn a HWn a

p = 0.1

a

These results are those given in Hong and White (2005). Their number of replications is 500.

43

Table IV: P OWER S IMULATION AT 5% L EVEL ( IN PERCENT, 3,000 ITERATIONS ) DGP 1.6 Statistics \ n p (S1 ) Te1,n

p Te∞,n (S1 )

DGP 1.8

DGP 1.9

100

200

100

200

100

200

100

200

p = 0.1

1.30

6.83

1.00

5.33

2.77

3.73

27.83

47.70

0.3

9.93

16.33

18.90

33.67

17.73

32.57

39.47

64.23

0.5

8.33

10.87

9.27

11.37

31.77

59.80

43.13

68.00

0.7

30.87

57.17

18.07

28.73

33.60

58.43

39.97

62.13

0.9

26.43

41.27

24.00

39.03

22.37

34.63

26.30

37.50

p = 0.1

3.73

5.73

4.33

4.30

3.77

3.67

9.50

14.53

0.3

6.00

10.47

8.93

16.23

5.83

12.80

22.23

44.33

0.5

8.47

9.90

8.10

9.83

20.47

36.93

34.60

59.23

0.7

33.63

47.20

16.67

18.17

30.03

39.93

37.83

54.90

0.9

25.80

29.60

22.53

22.70

20.63

19.97

24.83

29.80

s = -0.5

25.13

60.93

21.96

51.46

50.13

83.03

56.36

80.73

-0.3

25.13

60.10

24.13

55.93

54.26

86.16

57.43

81.10

-0.1

22.40

56.20

25.16

58.93

55.93

87.53

57.80

81.46

0.1

20.13

52.10

25.10

59.96

56.20

87.26

58.46

81.96

0.3

16.90

43.73

23.10

56.46

53.33

84.90

59.13

82.13

0.5

12.36

32.30

17.53

47.60

48.93

80.23

59.50

82.20

s = -0.5

23.13

54.23

20.80

43.36

40.46

75.83

55.70

80.06

-0.3

22.03

50.46

21.50

46.80

45.56

79.86

57.20

80.83

-0.1

20.50

45.50

23.03

49.93

49.60

82.36

57.83

81.53

0.1

17.43

38.26

21.96

48.36

51.20

82.73

58.56

82.00

0.3

14.16

30.10

20.50

45.70

49.70

80.60

59.33

82.40

0.5

9.83

21.40

14.90

38.26

43.86

71.43

59.53

82.56

Te1,n (S1 ) e T∞,n (S1 ) Te1,n (S2 )

25.90

59.36

19.43

46.10

47.96

81.50

56.33

79.96

22.36

50.56

14.50

29.10

26.36

55.80

50.26

75.60

22.80

56.40

23.63

57.23

54.90

86.40

58.70

81.40

Te∞,n (S2 )

22.80

53.70

19.66

41.50

39.63

76.36

54.60

79.73

31.8

65.2

24.6

80.8

14.2

34.6

60.2

84.0

34.8

72.8

25.8

86.8

13.4

23.8

55.8

79.8

34.0

74.0

25.6

85.4

17.0

26.2

60.8

84.6

s Te1,n

s Te∞,n

Rn

a

STn

a

HWn a

DGP 1.7

a

These results are those given in Hong and White (2005). Their number of replications is 500.


Table V: Level Simulation at 5% Level (in percent, 10,000 iterations)

Statistics \ n              DGP 2.1                  DGP 2.2
                      100     300     500      100     300     500
T̃p1,n(S1)  p = 0.1    4.12    4.88    5.02     4.25    5.01    5.02
            0.3       5.12    5.22    5.07     5.20    5.12    5.23
            0.5       4.76    5.51    5.33     5.02    5.04    5.38
            0.7       5.81    5.63    5.19     5.14    4.51    5.32
            0.9       3.71    6.63    5.92     4.35    6.87    6.02
T̃p∞,n(S1)  p = 0.1    3.86    6.38    6.04     4.33    6.44    6.32
            0.3       5.56    4.56    6.10     5.68    4.82    6.38
            0.5       5.17    5.34    5.60     5.60    5.06    5.77
            0.7       7.44    5.70    4.88     6.89    4.54    5.15
            0.9       3.58    6.25    7.43     4.14    6.47    7.49
T̃s1,n  s = -0.5       4.37    4.66    4.32     4.18    4.71    4.21
           -0.3       4.27    4.82    4.40     4.29    4.75    4.13
           -0.1       3.98    4.57    4.06     4.18    4.47    3.97
            0.1       3.90    4.46    3.78     4.02    4.29    3.67
            0.3       3.46    3.95    3.30     3.86    3.96    3.04
            0.5       3.34    3.31    2.72     3.50    3.52    2.57
T̃s∞,n  s = -0.5       4.42    5.96    6.13     4.63    6.16    6.12
           -0.3       4.49    5.73    6.12     4.64    6.07    6.06
           -0.1       4.64    5.87    6.27     4.62    5.94    6.19
            0.1       4.41    5.66    6.28     4.53    5.65    6.04
            0.3       4.47    5.53    5.84     4.79    5.27    5.77
            0.5       4.78    4.95    5.21     4.78    4.95    5.22
T̃1,n(S1)              3.69    4.34    3.50     3.62    3.90    3.46
T̃∞,n(S1)              5.22    6.65    6.50     5.00    5.98    6.97
T̃1,n(S2)              3.94    4.33    3.98     3.71    4.41    4.09
T̃∞,n(S2)              4.87    5.53    6.02     4.89    5.62    6.25
Mn                    2.89    4.43    4.75     3.98    5.33    5.62
REn                   7.89    4.14    3.11    71.90   70.70   31.01
RRn                   9.28    4.30    2.74    64.39   60.78   66.15
SupWn                 4.77    4.41    4.57     2.93    0.94    1.31
AvgWn                 5.81    5.29    5.10     2.60    1.69    2.22
ExpWn                 5.34    5.35    5.04     1.78    2.35    3.13
RE-CUSUMn             1.65    3.07    3.68     3.41    3.86    3.64
OLS-CUSUMn            2.68    4.20    4.01    28.91   30.22   55.50
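Each cell in the level and power tables is an empirical rejection rate: the percentage of simulated samples on which the test rejects at its 5% asymptotic critical value, using 10,000 iterations for the level designs and 3,000 for the power designs. A minimal sketch of this Monte Carlo convention follows; the `dgp`, `statistic`, and `cv` arguments are illustrative placeholders (here a textbook z-test under an IID standard normal null), not the generalized runs statistics themselves.

```python
import numpy as np

def rejection_rate(dgp, statistic, cv, n=100, reps=10000, seed=0):
    """Empirical rejection rate, in percent, of a test with critical value cv."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = dgp(n, rng)                    # one simulated sample of length n
        rejections += statistic(x) > cv    # reject when the statistic exceeds cv
    return 100.0 * rejections / reps

# Under an IID N(0,1) null, a correctly sized 5%-level test should print a
# value near 5. The statistic below is a placeholder two-sided z-test for a
# zero mean with known unit variance.
iid_normal = lambda n, rng: rng.standard_normal(n)
z_stat = lambda x: abs(x.mean()) * np.sqrt(len(x))
print(rejection_rate(iid_normal, z_stat, cv=1.96))
```

A table entry such as 4.12 then simply reports this quantity for the corresponding statistic, sample size, and data generating process.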

Table VI: Power Simulation at 5% Level (in percent, 3,000 iterations)

Statistics \ n          DGP 2.3          DGP 2.4          DGP 2.5          DGP 2.6          DGP 2.7
                      100     200      100     200      100     200      100     200      100     200
T̃p1,n(S1)  p = 0.1   14.20   22.73    18.13   29.60    22.20   39.93    16.70   24.70    35.67   59.00
            0.3      19.06   31.86    27.47   48.43    20.53   37.60     7.80   10.37    22.87   39.23
            0.5      21.00   37.70    29.03   55.43    21.83   37.77     5.80    7.77    19.43   35.07
            0.7      19.23   31.03    29.03   51.70    22.33   38.20     7.67   10.17    27.37   48.37
            0.9      10.70   14.66    19.37   32.77    22.63   39.23    17.30   24.07    43.27   71.47
T̃p∞,n(S1)  p = 0.1    4.10    4.30     5.90    5.53     5.50    6.80     6.07    5.03     7.57    9.47
            0.3       4.43    8.70     7.70   21.80     6.87   16.43     4.77    8.40     7.20   18.10
            0.5      11.36   17.56    18.23   33.47    13.80   20.50     5.93    6.50    12.63   20.10
            0.7      15.56   16.53    25.27   44.40    18.83   32.30     7.30    9.83    22.47   41.37
            0.9      10.23    8.10    17.37   39.73    20.33   44.47    15.27   25.77    40.10   74.80
T̃s1,n  s = -0.5      28.76   51.46    47.36   83.30    37.53   67.60     9.20   17.23    47.83   76.43
           -0.3      33.46   59.10    53.60   86.73    42.40   73.60    10.60   19.63    51.56   81.03
           -0.1      39.40   66.50    57.76   89.13    46.46   77.33    11.96   21.56    54.83   82.93
            0.1      46.20   73.23    60.73   90.73    48.80   80.13    13.36   23.63    57.36   84.66
            0.3      50.73   79.46    61.06   91.00    49.83   81.00    14.53   24.16    58.26   85.16
            0.5      58.43   85.13    60.23   90.00    50.03   80.10    15.66   26.00    59.20   84.03
T̃s∞,n  s = -0.5      22.60   43.50    33.90   68.30    22.96   49.46     6.10   10.56    31.66   59.56
           -0.3      28.70   53.10    41.16   75.53    28.20   56.63     7.06   12.56    36.50   64.93
           -0.1      37.13   62.40    47.10   80.76    33.66   63.53     8.93   14.90    40.60   69.56
            0.1      44.30   70.36    51.76   83.30    37.33   67.33     9.96   15.90    43.76   71.96
            0.3      51.06   77.03    55.50   84.86    41.23   70.16    11.73   18.43    47.63   72.53
            0.5      59.70   83.50    54.63   83.56    43.00   69.96    13.60   22.43    49.13   73.06
T̃1,n(S1)             32.73   56.70    46.53   80.13    36.06   66.73     9.46   14.96    45.93   46.86
T̃∞,n(S1)             12.76   24.50    22.13   46.10    15.63   30.06     5.50    7.70    19.06   38.26
T̃1,n(S2)             42.90   67.73    58.76   89.06    45.83   76.33    12.26   19.83    54.86   84.70
T̃∞,n(S2)             23.93   44.30    35.00   69.13    23.63   49.40     6.03    9.60    29.43   60.00
Mn                   97.13   100.0    100.0   100.0    12.93   23.03     1.97    2.70    35.96   61.43
REn                  99.16   100.0    100.0   100.0    100.0   100.0    94.96   100.0    79.50   89.13
RRn                  93.33   100.0    99.97   100.0    99.96   100.0    87.90   99.93    69.43   81.73
SupWn                98.26   100.0    100.0   100.0    100.0   100.0    85.63   97.86    94.89   99.03
AvgWn                98.26   99.96    100.0   100.0    100.0   100.0    73.76   92.60    94.87   99.02
ExpWn                98.76   100.0    100.0   100.0    100.0   100.0    85.73   97.70    90.19   98.14
RE-CUSUMn            87.13   99.40     6.77    9.92     9.30   11.43    14.73   19.10    15.60   20.20
OLS-CUSUMn           99.16   100.0    19.67   24.29    22.43   26.60    15.50   22.03    83.03   92.40


Table VII: Power Simulation at 5% Level (in percent, 3,000 iterations)

Statistics \ n          DGP 2.8          DGP 2.9          DGP 2.10         DGP 2.11         DGP 2.12
                      100     200      100     200      100     200      100     200      100     200
T̃p1,n(S1)  p = 0.1   18.13   29.13    18.13   27.06    14.86   23.66     4.10   12.33    24.03   45.53
            0.3      27.47   49.87    25.50   43.26    20.06   37.43    37.83   63.23    20.53   34.53
            0.5      29.03   55.77    26.33   47.90    21.56   41.06    46.40   74.57    18.93   29.27
            0.7      29.03   50.97    25.46   45.06    22.70   39.90    35.87   63.67    15.20   24.10
            0.9      19.37   32.33    21.10   27.73    16.06   26.13     0.43    0.03     0.93    0.20
T̃p∞,n(S1)  p = 0.1    5.90    5.27     5.43    4.93     5.10    4.23     4.23    7.13    24.27   46.60
            0.3       7.70   20.83     7.86   15.83     6.43   14.50    31.27   43.87    23.07   30.87
            0.5      18.23   32.73    16.40   28.16    13.10   23.96    37.17   65.13    16.87   26.87
            0.7      25.27   44.43    22.43   28.73    19.76   25.56    38.13   56.77    17.60   20.17
            0.9      17.37   39.57    19.90   15.06    14.80   14.23     0.40    0.07     0.90    0.53
T̃s1,n  s = -0.5      46.63   80.86    42.33   75.43    37.63   67.90    64.23   90.10    27.63   47.73
           -0.3      53.33   85.20    46.93   80.03    42.76   73.26    64.20   90.30    25.16   44.40
           -0.1      57.06   87.26    50.73   83.43    45.43   75.76    61.46   89.30    22.10   40.00
            0.1      59.73   88.63    53.53   85.30    47.36   77.46    55.23   86.33    17.76   33.00
            0.3      60.66   89.20    54.50   85.40    47.53   77.53    44.73   79.60    11.93   23.70
            0.5      60.13   88.20    53.70   83.86    46.26   75.40    31.13   64.43     7.06   13.96
T̃s∞,n  s = -0.5      34.30   66.83    29.60   62.03    26.00   52.93    55.86   84.30    22.80   36.33
           -0.3      40.60   73.86    34.66   69.00    31.70   60.06    55.60   84.50    21.10   35.70
           -0.1      46.56   80.56    40.30   74.23    37.00   65.90    54.46   83.53    19.46   33.33
            0.1      50.96   83.23    45.03   77.23    39.66   69.10    46.73   79.30    15.00   28.23
            0.3      53.66   84.20    47.76   78.23    42.50   70.40    37.33   70.43    10.70   21.10
            0.5      52.36   81.86    47.96   76.26    41.66   67.43    19.46   47.96     5.80   10.36
T̃1,n(S1)             46.26   81.20    40.16   75.83    35.63   66.20    58.73   88.66    26.10   47.30
T̃∞,n(S1)             21.13   47.66    17.56   41.93    16.06   35.43    47.30   77.06    21.93   39.50
T̃1,n(S2)             57.43   87.66    50.60   84.80    46.83   75.46    59.86   89.90    21.66   39.76
T̃∞,n(S2)             33.00   66.73    28.73   62.60    25.46   53.83    56.13   85.46    22.53   37.47
Mn                   100.0   13.67     9.80   12.20     8.10    9.40     0.66    0.86     2.36    2.73
REn                  100.0   100.0    78.06   60.00    36.23    3.12     7.77    4.91    80.70   89.16
RRn                  99.97   99.96    77.86   59.73    34.33   30.93     8.85    5.24    71.16   79.93
SupWn                100.0   100.0    98.16   63.26    41.86   46.06     3.73    3.46     3.82    3.64
AvgWn                100.0   100.0    92.80   50.16    34.16   32.63     4.35    3.63     3.34    3.15
ExpWn                100.0   100.0    98.86   62.66    43.50   42.40     3.56    3.19     1.76    1.87
RE-CUSUMn             6.77    9.20     7.03    9.50     5.46    8.50     0.23    0.39     1.26    3.10
OLS-CUSUMn           19.67   23.36    17.80   20.40    13.63   15.70     0.43    0.39    28.43   35.13
