On two Random Models in Data Analysis


Dissertation

for the award of the degree "Doctor rerum naturalium" of the Georg-August-Universität Göttingen within the doctoral program Mathematical Sciences of the Georg-August University School of Science (GAUSS)

submitted by

David James

from Fulda

Göttingen, 2016


Members of the examination board:

Referee: Prof. Dr. Felix Krahmer (1)
Co-referee: Prof. Dr. Matthias Hein (2)

Further members:
Prof. Dr. D. Russell Luke (NAM (3))
Prof. Dr. Gerlind Plonka-Hoch (NAM (3))
Prof. Dr. Anja Sturm (IMS (4))
Prof. Dr. Stephan Waack (IFI (5))

(1) Faculty of Mathematics (Chair M15, Technische Universität München)
(2) Faculty of Mathematics and Computer Science (Department of Computer Science, Universität des Saarlandes)
(3) Institute for Numerical and Applied Mathematics
(4) Institute for Mathematical Stochastics
(5) Institute of Computer Science
((3)-(5): each Faculty of Mathematics and Computer Science, Georg-August-Universität Göttingen)

Date of the oral examination: 12 January 2017


There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable.

Douglas Adams, The Restaurant at the End of the Universe


In this thesis, we study two random models with various applications in data analysis.

For our first model, we investigate subspaces spanned by biased random vectors. The underlying random model is motivated by applications in computational biology, where one aims at computing a low-rank matrix factorization involving a binary factor. In a random model with adjustable expected sparsity of the binary factor, we show for a large class of random binary factors that the corresponding factorization problem is uniquely solvable with high probability. In data analysis, such uniqueness results are of particular interest; ambiguous solutions often lack interpretability and do not give an insight into the structure of the underlying data. For proving uniqueness in this random model, small ball probability estimates are a key ingredient. Since, to the best of our knowledge, there is no such estimate suitable for our application, we prove an extension of the famous Lemma of Littlewood and Offord. Hereby, we also discover a connection between the matrix factorization problem at hand and the notion of Sperner families.

In the second part of this thesis, we will investigate a model for randomized ultrasonic data in nondestructive testing. Here, we aim at accelerating the data acquisition process by superposing ultrasonic measurements with random time shifts. To this end, we will first study the effects of randomized ultrasonic measurements in the context of the Synthetic Aperture Focusing Technique (SAFT), a widely used defect imaging method.

By adapting SAFT to our random data model, we will significantly improve its performance for randomized data. In this way, for sparse defects and with high probability, we achieve better defect reconstructions than with SAFT applied to deterministic ultrasonic data acquired in the same amount of time.


After three and a half years of working on this thesis, I want to thank everyone who made it possible.

First of all, I would like to thank my advisor Felix Krahmer for his genuine support and many inspiring discussions. Whenever I felt lost, he was there to help me back on my way. Thank you for your confidence and guidance. Moreover, I am indebted to my co-supervisor Russell Luke, who was very supportive and always had a sympathetic ear.

Many thanks to Matthias Hein for helpful discussions and challenging (but solvable) problems, and for taking the position as co-referee without hesitation.

Moreover, I gratefully acknowledge the financial support of the Federal Ministry of Education and Research (BMBF), and the Research Training Group GRK 2088 "Discovering structure in complex data: Statistics meets Optimization and Inverse Problems" for travel support and interesting workshops.

I owe my thanks to Gerlind Plonka-Hoch and all my (former) colleagues from the Mathematical Signal and Image Processing research group. You provided a warm and pleasant atmosphere, interesting talks and discussions in our research seminar, and a regular supply of tea and cake. Besides, I want to thank the research group for Applied Numerical Analysis and Optimization and Data Analysis at TU Munich. During my numerous visits, I always felt very welcome.

For providing experimental data and parts of their software, I would like to thank Oliver Nemitz and his colleagues from Salzgitter Mannesmann Forschung GmbH. In this context, I am also very thankful for fruitful and encouraging discussions with Martin Spies and Hans Rieder from the Fraunhofer Institute for Nondestructive Testing IZFP. You all contributed to my understanding of nondestructive testing.

I highly appreciate the proofreading by Robert Beinert, Sina Bittens, Renato Budinich, and Christian Kruschel. My special thanks go to Robert Beinert, who was also of great help for typesetting this thesis.

I want to thank my family, my parents, Marion and Peter, and my sister Lisa, for encouraging and supporting me, especially in the last months of my work on this thesis.

Last but not least, I thank my wife Steffi, and my children, Linus and Ella, for their love, support, and endless patience. You are my true source of inspiration and motivation.


Contents

Abstract
Acknowledgement

I. Subspaces Spanned by Biased Random Vectors
1. Introduction
2. From the Span of Binary Matrices to Sperner Families
3. The Span of Random Binary Matrices and the Lemma of Littlewood and Offord
4. Bounding $P_{k,n}$ and $P_{\pm,k,n}$ using Cardinality Estimates
5. Bounding $P_{k,n}$ and $P_{\pm,k,n}$ using the LYM-inequality

II. Ultrasonic Nondestructive Testing with Random Measurements
6. Introduction
7. Model
8. Synthetic Aperture Focusing Technique
9. Iterative Synthetic Aperture Focusing Technique
10. Numerical Results

Bibliography
Curriculum vitae


Subspaces Spanned by Biased Random Vectors

1. Introduction

The Span of Random Binary Matrices

Bernoulli random matrices have recently gained a lot of attention. Arguably, the most prominent problem in this field is to estimate the probability that an $n \times n$ Bernoulli random matrix is singular as $n$ goes to infinity. There has been tremendous progress in proving the conjecture that this probability is dominated by the probability that two columns or rows coincide [KKS95, BVW10, TV07]. A closely related problem concerns the investigation of the span of Bernoulli random matrices. Motivated by an application in neural networks [KS87], this problem was first investigated by Odlyzko in [Odl88]. He found that the probability that the linear span of the columns of a rectangular $N \times n$ Bernoulli matrix does not contain any $\{\pm 1\}$-vector besides its columns is dominated by the probability of the corresponding event for just three of its columns.

Theorem 1.1 (Odlyzko [Odl88]). Let $T$ be an $N \times n$ random matrix whose entries are independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = 1/2$ and $P[\epsilon = -1] = 1/2$. If

$$ n \le \left(1 - \frac{10}{\log(N)}\right) N, $$

then the probability $P$ that there exists a vector $x \in \mathbb{R}^n$ with at least two non-vanishing entries such that $Tx$ is a $\{\pm 1\}$-vector can be bounded from above by

$$ P \le 4 \binom{n}{3} \left(\frac{3}{4}\right)^N + O\!\left(\left(\frac{7}{10}\right)^N\right), $$

as $N$ goes to infinity.


The result, however, only treats the case where all entries of $T$ are unbiased, i.e., they attain the values $\pm 1$ with equal probability. Motivated by applications related to matrix factorization (see below), we aim to transfer this result to the case of biased Bernoulli random variables, that is, random variables where the values $\pm 1$ are attained with unequal probabilities. We will show that the asymptotic behavior, i.e., the dominance by the probability that there exists a linear combination of three columns resulting in a $\{\pm 1\}$-vector, carries over to biased Bernoulli random matrices. The main result of this chapter reads as follows:

Theorem 1.2. Let $T$ be an $N \times n$ random matrix whose entries are independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = p$ and $P[\epsilon = -1] = 1-p$. If there exists $\delta \in (0,1)$ with

$$ \min\{p, 1-p\} \ge N^{-(1-\delta)}, $$

then there exists a constant $C > 0$ depending only on $\delta$ such that for

$$ n \le \left(1 - \frac{C}{\log(N)}\right) N, $$

the probability $P$ that there exists a vector $x \in \mathbb{R}^n$ with at least two non-vanishing entries such that $Tx$ is a $\{\pm 1\}$-vector can be bounded from above by

$$ P \le 4 \binom{n}{3} \left(1 - p(1-p)\right)^N + o\!\left(\left(1 - p(1-p)\right)^N\right), $$

as $N$ goes to infinity.

Note that for $p = 1/2$, we recover the asymptotic behavior of Theorem 1.1. Our result also covers the observation that the probability that there exists a vector $x$ with two or more non-vanishing entries such that $Tx \in \{\pm 1\}^N$ is dominated by the corresponding event for just three columns of $T$. To see this, one can check that $(1 - p(1-p))^N$ is exactly the probability that $Tx \in \{\pm 1\}^N$ for a vector $x \in \mathbb{R}^n$ with the entries $1$, $1$ and $-1$ on its support of size $3$. We will later see that for vectors $x$ whose support set is of cardinality at least $4$ or equal to $2$, this probability is of higher order.
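The claim that a single row of $Tx$ lies in $\{\pm 1\}$ with probability exactly $1 - p(1-p)$ for $x = (1, 1, -1)$ can be verified by direct enumeration. The snippet below is an illustrative sanity check, not part of the thesis:

```python
from itertools import product

def prob_row_in_pm1(x, p):
    """Exact P[ sum_j eps_j * x_j  in {-1, +1} ], where each eps_j is an
    independent Bernoulli sign: +1 with probability p, -1 with probability 1-p."""
    prob = 0.0
    for signs in product([1, -1], repeat=len(x)):
        weight = 1.0
        for e in signs:
            weight *= p if e == 1 else (1 - p)
        if sum(e * xj for e, xj in zip(signs, x)) in (-1, 1):
            prob += weight
    return prob

# For support pattern (1, 1, -1) the row probability equals 1 - p(1-p):
for p in (0.5, 0.1, 0.9):
    assert abs(prob_row_in_pm1([1, 1, -1], p) - (1 - p * (1 - p))) < 1e-12
```

Since the $N$ rows of $T$ are independent, the probability that all of $Tx$ lands in $\{\pm 1\}^N$ is then the $N$th power of this row probability.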

A New Littlewood-Offord-type Inequality

The proof of Theorem 1.1 relied on estimating small ball probabilities of the form

$$ P\left[\left|\sum_{j=1}^{n} \epsilon_j x_j\right| = 1\right], $$ (1.1)

where $x$ is a vector of non-vanishing entries and $n \ge 2$. In the unbiased case, the probability in (1.1) can be treated via the Lemma of Littlewood and Offord, which was proven by Erdős in [Erd45].

Theorem 1.3 (Littlewood-Offord [Erd45]). Let $x \in \mathbb{R}^n$ be a vector with $\min_j |x_j| \ge c > 0$ and let $\epsilon_j$, $1 \le j \le n$, be independent copies of a Bernoulli random variable $\epsilon$ taking the values $+1$ and $-1$ with equal probability. Then for any open interval $I$ of length at most $2c$, it holds that

$$ P\left[\sum_{j=1}^{n} \epsilon_j x_j \in I\right] \le \binom{n}{\lfloor n/2 \rfloor} 2^{-n}. $$ (1.2)

In contrast to the original result of Littlewood and Offord [LO43], which was only optimal up to a logarithmic factor, the estimate in (1.2) is sharp; one can easily verify that the right hand side of (1.2) is indeed attained. For that, choose all entries of $x$ to have the same modulus $c > 0$ and $I$ to be the open interval of length $2c$ centered at $y = 0$ for $n$ even or $y = \pm c$ for $n$ odd. Erdős' proof is based on a connection between random sums as in (1.2) and Sperner families in combination with Sperner's Lemma [Spe28]. In the case where all entries of $x$ are positive, a generalization of the Lemma of Littlewood and Offord to biased random variables can be proven using the LYM-inequality [Bol65, Lub66, Meš63, Yam54] instead of Sperner's Lemma. The LYM-inequality was proven by Bollobás, Lubell, Meshalkin, and Yamamoto; it is named after the initials of the latter three.
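For small $n$, both the bound (1.2) and its sharpness can be checked by enumerating all $2^n$ sign patterns. The following standalone sketch (with a made-up test vector) does exactly that:

```python
from itertools import product
from math import comb

def prob_in_interval(x, a, b):
    """Exact P[ sum_j eps_j x_j in (a, b) ] for unbiased signs eps_j = +/-1,
    computed by enumerating all 2^n sign patterns."""
    hits = sum(1 for signs in product([1, -1], repeat=len(x))
               if a < sum(e * xj for e, xj in zip(signs, x)) < b)
    return hits / 2 ** len(x)

n, c = 6, 1.0
# Constant entries and the interval (-c, c) of length 2c attain the bound (n even):
assert prob_in_interval([c] * n, -c, c) == comb(n, n // 2) / 2 ** n
# For any x with min |x_j| >= c, the bound still holds, e.g. (arbitrary entries):
x = [1.0, -1.3, 2.7, 1.1, -4.2, 1.0]
assert prob_in_interval(x, 0.0, 2 * c) <= comb(n, n // 2) / 2 ** n
```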

Theorem 1.4 (Biased Littlewood-Offord [LL70]). Let $x \in \mathbb{R}^n$ be a real vector with $\min_j x_j \ge c > 0$, let further $\epsilon_j$, $1 \le j \le n$, be independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = p$ and $P[\epsilon = -1] = 1-p$, and let $I \subset \mathbb{R}$ be an open interval of length at most $2c$. Then

$$ P\left[\sum_{j=1}^{n} \epsilon_j x_j \in I\right] \le \max_{0 \le k \le n} \binom{n}{k} p^k (1-p)^{n-k}. $$ (1.3)

As in Theorem 1.3, the estimate in (1.3) is also sharp for vectors $x$ with constant entries, but it is not applicable if one aims to bound a probability like the one in (1.1). This is due to two reasons: While the assumption in Theorem 1.4 that all entries of $x$ have a positive sign can easily be dropped in the case of $p = 1/2$, the same is not true in the biased case. Additionally, the problem of estimating the absolute value of the sum in (1.1) can be solved for $p = 1/2$ using Theorem 1.4 via a union bound, but the same method does not give meaningful probability estimates when considering highly biased Bernoulli random variables with $p$ close to $0$ or $1$. In order to handle the first issue, the following lemma was proven by Costello et al. in [CV08] using Chebyshev's inequality.

Lemma 1.5 ([CV08]). Let $x \in \mathbb{R}^n$ be arbitrary with $\min_j |x_j| \ge c > 0$, let further $\epsilon_j$, $1 \le j \le n$, be independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = p$ and $P[\epsilon = -1] = 1-p$, and let $I$ be an open interval of length at most $2c$. Then there exists an absolute constant $C > 0$ such that

$$ P\left[\sum_{j=1}^{n} \epsilon_j x_j \in I\right] \le \frac{C}{\sqrt{n\mu}}, $$

where $\mu = \min\{p, 1-p\}$.

In contrast to Theorem 1.4, Lemma 1.5 also allows us to treat vectors $x$ with varying signs. Note, however, that this bound is not meaningful for small $n$ or values of $p$ which are close to $0$ or $1$, due to the constant factor $C$, which is not sharp. The following tighter variant of Lemma 1.5 can be derived from Theorem 1.4.

Corollary 1.6. Let $x \in \mathbb{R}^n$ with $\min_j |x_j| \ge c > 0$ have exactly $n_+$ strictly positive and $n_-$ strictly negative entries and let $\epsilon_j$, $1 \le j \le n$, be independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = p$ and $P[\epsilon = -1] = 1-p$. Let $I \subset \mathbb{R}$ be an arbitrary open interval of length at most $2c$ and let $\bar{n} \le \max\{n_-, n_+\}$. Then

$$ P\left[\sum_{j=1}^{n} \epsilon_j x_j \in I\right] \le \max_{0 \le k \le \bar{n}} \binom{\bar{n}}{k} p^k (1-p)^{\bar{n}-k}. $$

Corollary 1.6 follows from Theorem 1.4 mainly by conditioning on the random variables whose coefficients have the less frequent sign. Nevertheless, since we did not find this bound in the literature, a proof will be provided in Section 5. Note that Corollary 1.6 still does not give meaningful results when used together with a union bound to estimate (1.1) when $p$ is close to $0$ or $1$. A main result of this chapter is the following symmetric version of Corollary 1.6, which takes into account that, for biased Bernoulli random variables, the sum in (1.1) typically does not attain the values $\pm 1$ with equal probability. As we will see, the proof is considerably more involved than the proof of Corollary 1.6.

Theorem 1.7. Let $x \in \mathbb{R}^n$ with $\min_j |x_j| \ge c > 0$ have $n_+$ strictly positive and $n_-$ strictly negative entries and let $\epsilon_j$, $1 \le j \le n$, be independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 1] = p$ and $P[\epsilon = -1] = 1-p$. Let $I \subset \mathbb{R}$ be an open interval of length at most $2c$ and let $\bar{n} \le \max\{n_-, n_+\}$. Then

$$ P\left[\sum_{j=1}^{n} \epsilon_j x_j \in I\right] \le \max_{0 \le k \le \bar{n}} \binom{\bar{n}}{k} \left( p^k (1-p)^{\bar{n}-k} + p^{\bar{n}-k} (1-p)^k \right). $$

The inequality is tight, as equality is achieved for odd $n$ and vectors $x$ with constant entries. While equality no longer holds for even $n$ or for vectors $x$ with varying sign, the bound still gives meaningful estimates in these cases, even when $n$ is small or the Bernoulli random variables $\epsilon_j$ are extremely biased.
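To get a feeling for the symmetric bound, the snippet below compares the exact probability $P[|\sum_j \epsilon_j x_j| = 1]$ with the right hand side of Theorem 1.7 for a strongly biased $p$. This is an illustrative numerical experiment with made-up numbers, not a computation from the thesis:

```python
from itertools import product
from math import comb

def prob_abs_sum_is_one(x, p):
    """Exact P[ |sum_j eps_j x_j| = 1 ], eps_j = +1 w.p. p and -1 w.p. 1-p."""
    prob = 0.0
    for signs in product([1, -1], repeat=len(x)):
        w = 1.0
        for e in signs:
            w *= p if e == 1 else (1 - p)
        if abs(sum(e * xj for e, xj in zip(signs, x))) == 1:
            prob += w
    return prob

p, x = 0.05, [1, 1, -1, 1, 1]      # heavily biased signs, varying-sign vector
nbar = 4                           # max{n_-, n_+} = max{1, 4}
sym_bound = max(comb(nbar, k) * (p**k * (1 - p)**(nbar - k)
                                 + p**(nbar - k) * (1 - p)**k)
                for k in range(nbar + 1))
exact = prob_abs_sum_is_one(x, p)
# In this example the symmetric bound comfortably dominates the exact value:
assert exact <= sym_bound
```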

Matrix Factorization with Binary Components

Low rank matrix factorization is an important tool in data analysis, which allows us to represent data as linear combinations of a small number of building blocks, often referred to as components. In matrix factorization with binary components, one aims to factor a given data matrix $D \in \mathbb{R}^{N \times n}$ into the product $BA$, where $B \in \{0,1\}^{N \times r}$ is a binary matrix, and $A \in \mathbb{R}^{r \times n}$ is an arbitrary matrix whose columns sum to $1$ and $r \ll \min\{N, n\}$. To be more precise, matrix factorization with binary components considers the problem

$$ \text{find } B \in \{0,1\}^{N \times r} \text{ and } A \in \mathbb{R}^{r \times n},\ A^T \mathbf{1}_r = \mathbf{1}_n, \text{ such that } D = BA, $$ (1.4)

where $\mathbf{1}_r$, $\mathbf{1}_n$ denote the vectors of length $r$, $n$, respectively, with all entries equal to $1$. Motivated by numerous applications, such as blind source separation in wireless communications with binary source signals [Vee97], network inference from gene expression data [LBY+03, TCX12], unmixing of cell mixtures from DNA methylation signatures [HAK+12], or clustering with overlapping [BKG+05, SBK03], it gained a lot of attention in recent years. Similar factorization problems involving binary matrices have for example been studied in [SSU03, KB08, MGNR06, ZLDZ07, MMG+08]. Note that, if we additionally demand that all entries of $A$ are non-negative, the problem (1.4) is an instance of the non-negative matrix factorization problem, see, e.g., [PT94, LS99].
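A minimal toy instance of problem (1.4), with all dimensions and entries made up purely for illustration: $B$ holds binary components, the columns of $A$ sum to $1$, and each column of $D = BA$ is therefore an affine combination of the columns of $B$.

```python
# B: binary components (N = 4 observations, r = 2 components).
B = [[1, 0],
     [0, 1],
     [1, 1],
     [1, 0]]
# A: mixing weights (r = 2 rows, n = 3 data columns); each column sums to 1.
A = [[0.3, 1.0, 0.7],
     [0.7, 0.0, 0.3]]

# D = B A, computed entrywise.
D = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(3)]
     for i in range(4)]

# The affine constraint A^T 1_r = 1_n of (1.4):
assert all(abs(sum(A[k][j] for k in range(2)) - 1.0) < 1e-12 for j in range(3))
```

Given only $D$, recovering this pair $(B, A)$ is exactly the factorization problem whose uniqueness is discussed below.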


In [SHL13], Slawski, Hein and Lutsik proposed an algorithm to solve this problem by computing the intersection of the affine hull of the data matrix $D$, i.e.,

$$ \mathrm{aff}(D) = \left\{ Dx \,\middle|\, x \in \mathbb{R}^n,\ \sum_{j=1}^{n} x_j = 1 \right\}, $$ (1.5)

with the set of vertices $\{0,1\}^N$. Their algorithm provably finds the solution to (1.4) if $A$ has full rank, the columns of $B$ are affinely independent, i.e., for all $x \in \mathbb{R}^r$ with $\sum_j x_j = 0$, $Bx = 0$ implies that $x = 0$, and the uniqueness condition

$$ \mathrm{aff}(B) \cap \{0,1\}^N = \left\{ B_{:,1}, \dots, B_{:,r} \right\} $$ (1.6)

is satisfied. Here, $B_{:,j}$, $1 \le j \le r$, denotes the $j$th column of $B$. By a direct calculation, we can see that the combination of both conditions on $B$ is equivalent to

$$ \nexists x \in \mathbb{R}^r,\ \sum_{j=1}^{r} x_j = 1,\ \|x\|_0 \ge 2 :\ Bx \in \{0,1\}^N, $$ (1.7)

where $\|x\|_0$ denotes the number of nonzero entries of $x$. Combining this observation with properties of modulated symmetric Sperner-2 families, a notion that we will introduce in Definition 2.14 below, allows us to prove the following result.

Theorem 1.8. Let $B$ be an $N \times n$ random matrix whose entries are independent copies of a Bernoulli random variable $\epsilon$ with $P[\epsilon = 0] = p$ and $P[\epsilon = 1] = 1-p$. If there exists $\delta \in (0,1)$ with

$$ \min\{p, 1-p\} \ge N^{-(1-\delta)}, $$

then there exists a constant $C > 0$ depending only on $\delta$ such that for

$$ n \le \left(1 - \frac{C}{\log(N)}\right) N, $$

it holds that

$$ P\left[\exists x \in \mathbb{R}^n,\ \sum_{j=1}^{n} x_j = 1,\ \|x\|_0 \ge 2 :\ Bx \in \{0,1\}^N\right] \le 4 \binom{n}{3} \left(1 - p(1-p)\right)^N + o\!\left(\left(1 - p(1-p)\right)^N\right), $$ (1.8)

as $N$ goes to infinity.


As we will see later, Theorem 1.8 is a direct consequence of Theorem 1.2. Together with (1.7), it implies that the matrix factorization problem with binary components can be solved for a large class of random matrices. Note that the parameter $p$ in Theorem 1.8 now also allows us to model sparse matrices $B$ with just a few non-zero entries, which often occur in practice in the matrix factorization problem (1.4).

Organization of the Chapter

In Section 2, we will first consider deterministic versions of Theorem 1.2 and Theorem 1.8, and show a deep relation between both problems and the notion of Sperner-$k$ families, which we will also introduce in that section. In Section 3, we will pass on to the random setting and consider Theorem 1.2 in terms of probabilities involving Sperner families. Afterwards, we aim to bound these probabilities in Section 4 and Section 5, which also contains the proofs of Corollary 1.6 and Theorem 1.7.

Notation

Throughout this chapter,[n]will denote the integers from1ton, and2nwill denote the power set of[n]. The symmetric differenceA∆B of two setsA,B ⊂ [n]is defined by A∆B := (A\B)∪(B\A). For two setsA,B, we will writeA∪·B instead of the union A∪B if the two sets are disjoint. Similarly, forN setsAi, S

·

i[N]Ai will denote the the union of the setsAi if they are pairwise disjoint. SubsetsA ⊂ 2n will be referred to as families and will be denoted by calligraphic capital letters. When dealing with matrices, we denote, for anN ×nmatrixT and arbitrary setsR ⊂ [N]andC ⊂ [n], by TR,C the submatrix ofT which arises by restrictingT to the rows indexed byRand the columns indexed byC. Furthermore, we will writeT:,C instead ofT[N],C andTR,:instead ofTR,[n], and we will denote byTi,j the entry ofT with row indexi ∈ [N]and column indexj ∈[n]. The restriction of ann-dimensional vectorxto its entries indexed by a set J ⊂ [n]will be denoted byxJ. For an arbitraryn-dimensional vectorx, we will denote bysupp(x) ⊂ [n]the set containing all indicesj ∈[n]such that|xj| >0and refer to it as the support ofx. We call a vectorx ∈Rns-sparseif|supp(x)| =s; theℓ0-norm ofx is defined via kxk0 := |supp(x)|. The sign patternsgn(x)of an arbitraryx ∈ Rn with non-vanishing entries will refer to the vector in{±1}ndefined by

sgn(x)j = 

1 xj >0,

−1 xj <0.

For two sequences (an)nN and (bn)nN, we write an = o(bn) if an/bn → 0 asn →

∞. In order to to highlight that two random variablesX,Y have the same probability distribution, we will writeX ∼Y.


2. From the Span of Binary Matrices to Sperner Families

In this section, we will establish the connection between the deterministic version of Theorem 1.2 for $N \times n$ matrices $T$ with values in $\{\pm 1\}$ and a special class of families $\mathcal{A} \subset 2^{[n]}$, which was first studied by Sperner in [Spe28]. Our result generalizes the connection between Sperner families and random sums as in Theorem 1.3 and Theorem 1.4, which was discovered by Erdős in [Erd45], in multiple ways. Later, our generalization will allow us to prove our main results, Theorem 1.2 and Theorem 1.7, and also yields a uniqueness condition for matrix factorization with binary components. We first recall the definition of a Sperner-$k$ family.

Definition 2.1 (Sperner-$k$ family [Spe28]). We call any family $\mathcal{A} \subset 2^{[n]}$ a Sperner-$k$ family if there does not exist a chain of $k+1$ sets $A_1, \dots, A_{k+1} \in \mathcal{A}$ with

$$ A_1 \subsetneq A_2 \subsetneq \dots \subsetneq A_{k+1}. $$ (2.1)

For notational brevity and historical reasons, Sperner-$1$ families will simply be called Sperner families.

Example 2.2. The family $\mathcal{A}_1 = \{\{1,2\}, \{2,3\}, \{1,3\}\} \subset 2^{[3]}$ is a Sperner family, since no set contained in $\mathcal{A}_1$ is a proper subset of another set contained in $\mathcal{A}_1$. The family $\mathcal{A}_2 = \mathcal{A}_1 \cup \{\{1,2,3\}\}$ is not a Sperner family; it holds, for instance, that $\{1,2\} \subsetneq \{1,2,3\}$. However, $\mathcal{A}_2$ is a Sperner-$2$ family, as the longest chain of inclusions $A_1 \subsetneq \dots \subsetneq A_k$ with sets $A_1, \dots, A_k \in \mathcal{A}_2$ is of length $k = 2$.
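For small families, the chain condition of Definition 2.1 can be checked mechanically. The brute-force check below (illustrative code, not from the thesis) reproduces Example 2.2:

```python
from itertools import permutations

def is_sperner_k(family, k):
    """True iff no chain A_1 < A_2 < ... < A_{k+1} of k+1 sets exists in the
    family, where '<' denotes proper inclusion (Definition 2.1)."""
    sets = [frozenset(a) for a in family]
    for chain in permutations(sets, k + 1):
        if all(chain[i] < chain[i + 1] for i in range(k)):
            return False
    return True

A1 = [{1, 2}, {2, 3}, {1, 3}]
A2 = A1 + [{1, 2, 3}]
assert is_sperner_k(A1, 1)          # A1 is a Sperner family
assert not is_sperner_k(A2, 1)      # {1,2} is properly contained in {1,2,3}
assert is_sperner_k(A2, 2)          # but A2 is a Sperner-2 family
```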

While Sperner only considered Sperner families for $k = 1$, Sperner-$k$ families are well known to connect to this base case via the following lemma.

Lemma 2.3. A family $\mathcal{A} \subset 2^{[n]}$ is a Sperner-$k$ family if and only if it is the union of $k$ Sperner families.

Proof. If the family $\mathcal{A}$ is the union of $k$ Sperner families but not a Sperner-$k$ family, there must exist a chain of $k+1$ subsets $A_1, \dots, A_{k+1} \in \mathcal{A}$ with

$$ A_1 \subsetneq A_2 \subsetneq \dots \subsetneq A_{k+1}, $$

and the pigeonhole principle implies that at least two of them have to be contained in the same Sperner family, which yields a contradiction.

For the reverse implication, let $\mathcal{A} = \{A_1, \dots, A_\ell\}$ be a Sperner-$k$ family. Without loss of generality we may assume that $|A_j| \le |A_{j+1}|$ for $j \in [\ell - 1]$. We will now construct $k$ Sperner-$1$ families in an iterative way. Let $\mathcal{A}^1_0, \dots, \mathcal{A}^k_0$ be empty families. For each $j \in [\ell]$ and each $i \in [k]$ define iteratively

$$ \mathcal{A}^i_j = \begin{cases} \mathcal{A}^i_{j-1} & \text{if } \exists B \in \mathcal{A}^i_{j-1} \text{ s.th. } B \subsetneq A_j, \\ \mathcal{A}^i_{j-1} \cup \{A_j\} & \text{if } \nexists B \in \mathcal{A}^i_{j-1} \text{ s.th. } B \subsetneq A_j \text{ and } A_j \notin \mathcal{A}^{i'}_j \text{ for } 1 \le i' \le i-1. \end{cases} $$

By construction, each of the families $\mathcal{A}^i := \mathcal{A}^i_\ell$, $i \in [k]$, is a Sperner-$1$ family. We now claim that $\mathcal{A} = \bigcup_{i=1}^{k} \mathcal{A}^i$. Suppose for contradiction that there exists a set $B_{k+1} \in \mathcal{A}$ which is not assigned to any family $\mathcal{A}^i$, $i \in [k]$. Hence, there exists $B_k \in \mathcal{A}^k$ such that $B_k \subsetneq B_{k+1}$. Since for arbitrary $i \in [k]$, $i \ne 1$, and arbitrary $B_i \in \mathcal{A}^i$, there must exist a set $B_{i-1} \in \mathcal{A}^{i-1}$ with $B_{i-1} \subsetneq B_i$, we can therefore construct a chain of inclusions as in (2.1), contradicting the assumption that $\mathcal{A}$ is a Sperner-$k$ family. This completes the proof.
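The iterative construction in the proof can be implemented directly: process the sets by increasing cardinality and put each one into the first family containing no proper subset of it. A short sketch of this greedy assignment (illustrative, not thesis code):

```python
def decompose(family, k):
    """Greedily split a Sperner-k family into k Sperner families, following
    the construction in the proof of Lemma 2.3: sets are processed by
    increasing cardinality and placed into the first admissible layer."""
    layers = [[] for _ in range(k)]
    for A in sorted((frozenset(a) for a in family), key=len):
        for layer in layers:
            if not any(B < A for B in layer):   # no proper subset present
                layer.append(A)
                break
    return layers

A2 = [{1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]        # the Sperner-2 family of Example 2.2
L1, L2 = decompose(A2, 2)
assert {1, 2, 3} in [set(s) for s in L2]        # the top set lands in the second layer
```

Each layer is a Sperner family by construction, since a set is only added to a layer that contains none of its proper subsets.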

Note that Lemma 2.3 also implies that every Sperner-$k$ family is also a Sperner-$\ell$ family for $\ell \ge k$. To establish the connection between Sperner families and binary matrices, we will introduce the following operation.

Definition 2.4. For any $A \subset [n]$ and any $\xi \in \{\pm 1\}^n$, define the modulation

$$ A^\xi = \{ j \mid j \in A,\ \xi_j = 1 \} \cup \{ j \mid j \notin A,\ \xi_j = -1 \} \subset [n]; $$

for any family $\mathcal{A} \subset 2^{[n]}$, $\mathcal{A} = \{A_1, \dots, A_m\}$, denote by $\mathcal{A}^\xi$ the family given by

$$ \mathcal{A}^\xi = \{ A_1^\xi, \dots, A_m^\xi \}. $$

Remark 2.5. By definition, it holds for arbitrary $\xi \in \{\pm 1\}^n$ that

$$ \emptyset^\xi = \{ j \mid \xi_j = -1 \}. $$

Also note that for any $A \subset [n]$, it holds that $A^{-\mathbf{1}} = A^c$, where $-\mathbf{1}$ denotes the $n$-dimensional vector with constant entries equal to $-1$. For this reason, we will denote by $\mathcal{A}^{-\mathbf{1}}$ the family of all sets complementing the sets of $\mathcal{A}$.

Remark 2.6. By definition, the operation $(\cdot)^\xi$ for a sign pattern $\xi \in \{\pm 1\}^n$ is union compatible, i.e., for two families $\mathcal{A}, \mathcal{B} \subset 2^{[n]}$, it holds that

$$ (\mathcal{A} \cup \mathcal{B})^\xi = \mathcal{A}^\xi \cup \mathcal{B}^\xi $$

and

$$ (\mathcal{A} \setminus \mathcal{B})^\xi = \mathcal{A}^\xi \setminus \mathcal{B}^\xi. $$

Example 2.7. For the family $\mathcal{A}_1 \subset 2^{[3]}$ as in Example 2.2 and $\xi = (-1, 1, -1)^T$, it holds that

$$ \mathcal{A}_1^\xi = \{ \{1,2\}^\xi, \{2,3\}^\xi, \{1,3\}^\xi \} = \{ \{2,3\}, \{1,2\}, \emptyset \}, $$

which is not a Sperner family, since $\emptyset \subsetneq \{1,2\}$.
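Definition 2.4 translates into a few lines of code: $j$ belongs to $A^\xi$ exactly when $j \in A$ agrees with $\xi_j = 1$. The snippet below (illustrative, not from the thesis) reproduces Example 2.7:

```python
def modulate(A, xi):
    """A^xi = {j in A : xi_j = +1} union {j not in A : xi_j = -1}
    (Definition 2.4); j is kept iff membership in A agrees with xi_j = +1."""
    n = len(xi)
    return {j for j in range(1, n + 1) if (j in A) == (xi[j - 1] == 1)}

xi = (-1, 1, -1)
family = [{1, 2}, {2, 3}, {1, 3}]
assert [modulate(A, xi) for A in family] == [{2, 3}, {1, 2}, set()]
```

Equivalently, `modulate(A, xi)` is the symmetric difference of $A$ with $\{j \mid \xi_j = -1\}$, which is the first claim of Proposition 2.8.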

We now present some useful properties of the operation defined in Definition 2.4.

Proposition 2.8. Let $A \subset [n]$ and $\xi \in \{\pm 1\}^n$ be arbitrary. Then

$$ A^\xi = A \,\Delta\, \{ j \mid \xi_j = -1 \}. $$

Consequently, for $A, B \subset [n]$ and any $\xi \in \{\pm 1\}^n$, it holds that

$$ A^\xi \,\Delta\, B^\xi = A \,\Delta\, B $$

and

$$ (A^\xi)^\nu = A^{\xi\nu} = (A^\nu)^\xi, $$

where $\xi\nu \in \{\pm 1\}^n$ denotes the entrywise product of $\xi$ and $\nu$.

Remark 2.9. It is a direct consequence of Definition 2.4 that, for an arbitrary sign pattern $\xi \in \{\pm 1\}^n$, all properties in Proposition 2.8 concerning a set $A \subset [n]$ can be lifted to analogous properties of a family $\mathcal{A} \subset 2^{[n]}$.

Proof of Proposition 2.8. First, we observe that, for any $A \subset [n]$ and $\xi \in \{\pm 1\}^n$, the set $A^\xi$ can be written as

$$ A^\xi = \{ j \mid j \in A,\ \xi_j = 1 \} \cup \{ j \mid j \notin A,\ \xi_j = -1 \} = \bigl(A \setminus \{ j \mid \xi_j = -1 \}\bigr) \cup \bigl(\{ j \mid \xi_j = -1 \} \setminus A\bigr) = A \,\Delta\, \{ j \mid \xi_j = -1 \} = A \,\Delta\, N_\xi, $$

where

$$ N_\xi := \{ j \mid \xi_j = -1 \} $$

for arbitrary $\xi \in \{\pm 1\}^n$. This establishes the first claim of the proposition. By the associativity and commutativity of the symmetric difference, it follows for any $A, B \subset [n]$ and $\xi \in \{\pm 1\}^n$ that

$$ A^\xi \,\Delta\, B^\xi = (A \,\Delta\, N_\xi) \,\Delta\, (B \,\Delta\, N_\xi) = A \,\Delta\, B \,\Delta\, (N_\xi \,\Delta\, N_\xi) = A \,\Delta\, B \,\Delta\, \emptyset = A \,\Delta\, B, $$

which establishes the second claim of the proposition. Furthermore, we observe that for any $A \subset [n]$ and any $\xi, \nu \in \{\pm 1\}^n$ it holds that

$$ (A^\xi)^\nu = (A \,\Delta\, N_\xi) \,\Delta\, N_\nu = A \,\Delta\, (N_\xi \,\Delta\, N_\nu) = A \,\Delta\, \{ j \mid \text{exactly one of } \xi_j, \nu_j \text{ equals } -1 \} = A \,\Delta\, \{ j \mid (\xi\nu)_j = -1 \} = A \,\Delta\, N_{\xi\nu} = A^{\xi\nu}. $$

Since the entrywise product is commutative, the last claim now follows by interchanging the roles of $\xi$ and $\nu$.

Based on the notion of modulation, we now introduce a variant of Sperner-$k$ families, which will play an essential role in the proof of our main results.

Definition 2.10 (Symmetric Sperner-$k$ family). Let $\mathcal{A} \subset 2^{[n]}$ be arbitrary. For even $k$, we call $\mathcal{A}$ a symmetric Sperner-$k$ family if $\mathcal{A}$ admits a decomposition of the form

$$ \mathcal{A} = \bigcup_{l=1}^{k/2} \left( \mathcal{A}_l \cup \mathcal{A}_l^{-\mathbf{1}} \right), $$

where $\mathcal{A}_l \subset 2^{[n]}$ is a Sperner family for each $l \in [k/2]$.

For odd $k$, we call $\mathcal{A}$ a symmetric Sperner-$k$ family if

$$ \mathcal{A} = \mathcal{A}_0 \cup \bigcup_{l=1}^{\lfloor k/2 \rfloor} \left( \mathcal{A}_l \cup \mathcal{A}_l^{-\mathbf{1}} \right), $$

where $\mathcal{A}_l \subset 2^{[n]}$ is a Sperner family for each $0 \le l \le \lfloor k/2 \rfloor$ and $\mathcal{A}_0$ additionally satisfies $\mathcal{A}_0 = \mathcal{A}_0^{-\mathbf{1}}$.

Note that, by Lemma 2.3, every symmetric Sperner-$k$ family is indeed a Sperner-$k$ family.

Example 2.11. Considering again the family $\mathcal{A}_1 = \{\{1,2\}, \{2,3\}, \{1,3\}\} \subset 2^{[3]}$ defined in Example 2.2, it follows that

$$ \mathcal{A}_3 = \mathcal{A}_1 \cup \mathcal{A}_1^{-\mathbf{1}} = \{\{1,2\}, \{2,3\}, \{1,3\}\} \cup \{\{3\}, \{1\}, \{2\}\} $$

is a symmetric Sperner-$2$ family.
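Membership in these classes can again be checked by brute force on small ground sets. The check below (illustrative code, not from the thesis) confirms Example 2.11:

```python
from itertools import permutations

def is_sperner_k(family, k):
    """True iff no chain of k+1 properly nested sets exists in the family."""
    sets = [frozenset(a) for a in family]
    return not any(all(c[i] < c[i + 1] for i in range(k))
                   for c in permutations(sets, k + 1))

ground = {1, 2, 3}
A1 = [{1, 2}, {2, 3}, {1, 3}]
A3 = A1 + [ground - a for a in A1]     # A1 together with all complements
assert is_sperner_k(A3, 2)             # symmetric Sperner-2, hence Sperner-2
assert not is_sperner_k(A3, 1)         # but not a Sperner family: {1} < {1,2}
```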

The next lemma establishes a link between $\{\pm 1\}$-valued matrices and Sperner families. It generalizes the observations of Erdős in [Erd45].


Lemma 2.12. Let $T$ be an $N \times n$ binary matrix with values in $\{\pm 1\}$ and let $x \in \mathbb{R}^n$, $\min_j |x_j| \ge c > 0$, be a vector such that $Tx \in V^N$, where $V$ is the union of $k$ open intervals of length at most $2c$. Let $\mathcal{A} \subset 2^{[n]}$ be the family containing the sets

$$ A_i = \{ j \mid T_{i,j} = 1 \}, \quad i \in [N]. $$ (2.2)

Then $\mathcal{A}^{\mathrm{sgn}(x)}$ is a Sperner-$k$ family. If $V$ additionally is symmetric, i.e., $V = -V$, it follows that $\mathcal{A}^{\mathrm{sgn}(x)}$ is a subfamily of a symmetric Sperner-$k$ family.

Proof. Set $\xi := \mathrm{sgn}(x)$. We may assume that $k < N$, since otherwise the first assertion of the lemma is trivial. Suppose for contradiction that $\mathcal{A}^\xi = \{A_1^\xi, \dots, A_N^\xi\}$ is not a Sperner-$k$ family. Then, after a possible permutation of the indices, it must hold that $A_1^\xi \subsetneq A_2^\xi \subsetneq \dots \subsetneq A_{k+1}^\xi$. We define, for $1 \le i \le k+1$,

$$ y_i := (Tx)_i = \sum_{j \in A_i} x_j - \sum_{j \in A_i^c} x_j \in V. $$ (2.3)

Since all entries of $x$ which make a positive contribution to this sum are contained in $A_i^\xi$ and all entries of $x$ which make a negative contribution to this sum are contained in $(A_i^c)^\xi = (A_i^\xi)^c$, one can write

$$ y_i = \sum_{j \in A_i^\xi} |x_j| - \sum_{j \in (A_i^\xi)^c} |x_j| \in V. $$

Recall that $V$ is the union of $k$ intervals of length at most $2c$. Therefore, by the pigeonhole principle, there must be $v < w$ such that $y_v$, $y_w$ are contained in the same interval. This, in turn, implies that $|y_v - y_w| < 2c$. As $A_v^\xi \subsetneq A_w^\xi$, there exists a non-empty set $S \subset [n] \setminus A_v^\xi$ such that $A_v^\xi \cup S = A_w^\xi$, and thus $(A_w^\xi)^c \cup S = (A_v^\xi)^c$. It follows that

$$ y_w = \sum_{j \in A_w^\xi} |x_j| - \sum_{j \in (A_w^\xi)^c} |x_j| = \sum_{j \in A_v^\xi} |x_j| + \sum_{j \in S} |x_j| - \left( \sum_{j \in (A_v^\xi)^c} |x_j| - \sum_{j \in S} |x_j| \right) = \left( \sum_{j \in A_v^\xi} |x_j| - \sum_{j \in (A_v^\xi)^c} |x_j| \right) + 2 \sum_{j \in S} |x_j| = y_v + 2 \sum_{j \in S} |x_j|, $$

which translates to

$$ y_w - y_v = 2 \sum_{j \in S} |x_j| \ge 2 |S| \min_j |x_j| \ge 2c, $$

contradicting our finding that $|y_v - y_w| < 2c$. The family $\mathcal{A}^\xi$ therefore must be a Sperner-$k$ family, which proves the first part of the lemma.

It remains to show that, if $V$ is symmetric, then $\mathcal{A}^\xi$ is contained in a symmetric Sperner-$k$ family. To see this, let $Y = \{y_1, \dots, y_k\}$ be the set of the centers of the $k$ open intervals of $V$. We can assume without loss of generality that they are distinct and that $Y = -Y$ is a symmetric set. Therefore, there exists a permutation $\pi$ of $[k]$ such that $y_i = -y_{\pi(i)}$ and $\pi$ has at most one fixpoint, i.e., a $1$-cycle corresponding to $y_i = 0$. All remaining cycles have length $2$. Hence, if $k$ is even, one can write

$$ V = \bigcup_{\ell=1}^{k/2} V_\ell \cup (-V_\ell), $$ (2.4)

where the sets $V_\ell$, $\ell \in [k/2]$, are open intervals of length at most $2c$ whose centers are contained in the positive real axis. If $k$ is odd, one can write

$$ V = V_0 \cup \bigcup_{\ell=1}^{\lfloor k/2 \rfloor} V_\ell \cup (-V_\ell), $$ (2.5)

where $V_0$ is an open interval of length at most $2c$ centered at zero and the sets $V_\ell$, $\ell \in [\lfloor k/2 \rfloor]$, are intervals of length at most $2c$ whose centers are contained in the positive real axis.

The same decomposition can now be applied to the matrix $T$, and for $\ell$ as in (2.4) or (2.5), we denote by $T^{(\ell)}$ the submatrix of $T$ containing the maximum number of rows $t$ of $T$ such that $\langle t, x \rangle \in V_\ell \cup (-V_\ell)$. By permuting the rows of each of the matrices $T^{(\ell)}$ and possibly adding further rows, we may assume that $T^{(\ell)} = -T^{(\ell)}$. Denote by $\mathcal{A}^{(\ell)}$ the family which arises by applying the construction described in (2.2) to the matrix $T^{(\ell)}$. We now claim that $-T^{(\ell)} = T^{(\ell)}$ also implies that $\mathcal{A}^{(\ell)} = (\mathcal{A}^{(\ell)})^{-\mathbf{1}}$. This directly follows from the observation that, for arbitrary $U \subset \mathbb{R}$ and $B \subset [n]$, multiplying

$$ \left( \sum_{j \in B} x_j - \sum_{j \in B^c} x_j \right) \in U $$

by $(-1)$ corresponds to exchanging the roles of $B$ and $B^c = B^{-\mathbf{1}}$.

The first part of the proof now implies that $\mathcal{A}_\ell^\xi$ is a Sperner-$1$ family for each $\ell \in [\lfloor k/2 \rfloor]$ and $\xi = \mathrm{sgn}(x)$, where $\mathcal{A}_\ell$ denotes the family arising from the rows $t$ of $T^{(\ell)}$ with $\langle t, x \rangle \in V_\ell$. Since we only applied row permutations or added further rows in order to construct the matrices $T^{(\ell)}$ from $T$, the decomposition in (2.4), resp. (2.5), now translates to

$$ \mathcal{A} \subset \bigcup_{\ell=1}^{k/2} \mathcal{A}_\ell \cup (-\mathcal{A}_\ell) = \bigcup_{\ell=1}^{k/2} \mathcal{A}_\ell \cup \mathcal{A}_\ell^{-\mathbf{1}} $$

in the case where $k$ is even, and

$$ \mathcal{A} \subset \mathcal{A}_0 \cup \bigcup_{\ell=1}^{\lfloor k/2 \rfloor} \mathcal{A}_\ell \cup \mathcal{A}_\ell^{-\mathbf{1}} $$

in the case where $k$ is odd. As the operation $(\cdot)^\xi$ is union compatible (see Remark 2.6), this completes the proof.

The assertions of Lemma 2.12 can also be transferred to binary matrices with values in $\{0,1\}$.

Lemma 2.13. Let $B$ be an $N \times n$ binary matrix with values in $\{0,1\}$ and let $x \in \mathbb{R}^n$, $\min_j |x_j| \ge c > 0$, be a vector such that $Bx \in V^N$, where $V$ is the union of $k$ open intervals of length at most $c$. Let $\mathcal{A} \subset 2^{[n]}$ be the family containing the sets

$$ A_i = \{ j \mid B_{i,j} = 1 \}, \quad i \in [N]. $$ (2.6)

Then $\mathcal{A}^{\mathrm{sgn}(x)}$ is a Sperner-$k$ family. If, in addition, the set

$$ \tilde{V} := V - \frac{1}{2} \sum_{j=1}^{n} x_j $$ (2.7)

is symmetric, it follows that $\mathcal{A}^{\mathrm{sgn}(x)}$ is a subfamily of a symmetric Sperner-$k$ family.

Proof. Let $T$ be the $N \times n$ matrix defined by

$$ T_{i,j} := \begin{cases} -1 & B_{i,j} = 0, \\ 1 & B_{i,j} = 1. \end{cases} $$

Since, in matrix form,

$$ B = \frac{1}{2} \left( \mathbf{1}_{N \times n} + T \right), $$ (2.8)

where $\mathbf{1}_{N \times n}$ is the $N \times n$ matrix where all entries are equal to $1$, for arbitrary $x \in \mathbb{R}^n$ it follows that

$$ (Bx)_i = \frac{1}{2} \left( \sum_{j=1}^{n} x_j + T_{i,:} x \right). $$

With $2V := \{ 2v \mid v \in V \}$, which is the union of $k$ open intervals of length at most $2c$, it therefore holds for arbitrary $i \in [N]$ that

$$ (Tx)_i \in 2V \quad \Leftrightarrow \quad (Bx)_i \in V + \frac{1}{2} \sum_{j=1}^{n} x_j. $$ (2.9)

The result now directly follows from Lemma 2.12, applied to $T$ and the shifted set $2V - \sum_{j=1}^{n} x_j = 2\tilde{V}$, by noting that, by (2.9), $Bx \in V^N$ is equivalent to $Tx \in (2\tilde{V})^N$, and $2\tilde{V}$ is symmetric if and only if the set $\tilde{V}$ in (2.7) is symmetric.

We will now introduce a second variant of Sperner families.

Definition 2.14. A family $\mathcal{A} \subset 2^{[n]}$ is a modulated Sperner-$k$ family if there exist a sign pattern $\xi \in \{\pm 1\}^n$ and a Sperner-$k$ family $\mathcal{B} \subset 2^{[n]}$ such that $\mathcal{A} = \mathcal{B}^\xi$. If $\mathcal{B}$ is a symmetric Sperner-$k$ family, we call $\mathcal{A}$ a modulated symmetric Sperner-$k$ family.

We will now prove a result which lays the foundation for the proof of Theorem 1.2. It is based on the following definition.

Definition 2.15. For arbitrary $\mathcal{A} \subset 2^{[n]}$ and $J \subset [n]$, let $\mathcal{A} \sqcap J \subset 2^J$ be defined as

$$ \mathcal{A} \sqcap J = \{ A \cap J \mid A \in \mathcal{A} \}. $$

Corollary 2.16. Let $T$ be an $N \times n$ binary matrix with values in $\{\pm 1\}$ and let $\mathcal{A} \subset 2^{[n]}$ be the family defined in Lemma 2.12. If there exist an $s$-sparse vector $x \in \mathbb{R}^n$ with

$$ \min_{j \in \mathrm{supp}(x)} |x_j| \ge c $$

and a set $V \subset \mathbb{R}$ which is the union of $k$ open intervals of length at most $2c$ such that $Tx \in V^N$, then there exists a set $J \subset [n]$, $|J| = s$, such that $\mathcal{A} \sqcap J$ is a modulated Sperner-$k$ family. If $V$ is symmetric, it follows that $\mathcal{A} \sqcap J$ is a modulated symmetric Sperner-$k$ family.

Proof. We will present the proof only for the symmetric case; the general case is similar. Suppose that there exist an $s$-sparse $x \in \mathbb{R}^n$ with

$$ \min_{j \in \mathrm{supp}(x)} |x_j| \ge c $$

and a symmetric set $V \subset \mathbb{R}$ which is the union of $k$ open intervals of length at most $2c$ such that $Tx \in V^N$. Set $J = \mathrm{supp}(x)$. Then $T_{:,J} x_J \in V^N$ and Lemma 2.12 implies that $(\mathcal{A} \sqcap J)^{\mathrm{sgn}(x_J)}$ is a subfamily of a symmetric Sperner-$k$ family. Since $\left((\mathcal{A} \sqcap J)^{\mathrm{sgn}(x_J)}\right)^{\mathrm{sgn}(x_J)} = \mathcal{A} \sqcap J$ (Proposition 2.8), it follows that $\mathcal{A} \sqcap J$ is a modulated symmetric Sperner-$k$ family.

Remark 2.17. Note that for an arbitrary s-sparse x ∈ R^n, any discrete set V with |V| = k is a subset of

    ∪_{y ∈ V} (y − c, y + c),

where c = min_{j ∈ supp(x)} |x_j|. Therefore, the assertion of Corollary 2.16 also holds for s-sparse x ∈ R^n and discrete sets V with |V| = k.

With Corollary 2.16, we are able to derive the following condition implying (1.7). It can be read directly from the matrix B without considering the affine hull.

Theorem 2.18. Let B be an N × n binary matrix with values in {0,1} and let A ⊂ 2^n be the family containing the sets

    A_i = {j | B_{i,j} = 1},  i ∈ {1, . . . , N}.

If none of the families

    {A ⊓ J | J ⊂ {1, . . . , n}, 2 ≤ |J| ≤ n}

is a subfamily of a modulated symmetric Sperner-2 family, then

    ∄ x ∈ R^n with Σ_{j=1}^n x_j = 1 and ‖x‖_0 ≥ 2 such that Bx ∈ {0,1}^N.    (2.10)

Proof. We can write the affine hull aff(B) as

    aff(B) = { Bx | x ∈ R^n, Σ_{j=1}^n x_j = 1 } = ∪_{s=0}^n aff_s(B),

where

    aff_s(B) = { Bx | x ∈ R^n, ‖x‖_0 = s, Σ_{j=1}^n x_j = 1 }.

In order to show (2.10), it suffices to prove

    aff_s(B) ∩ {0,1}^N = ∅  for all s with 2 ≤ s ≤ n.    (2.11)

Let s ≥ 2 be arbitrary, and let T ∈ {±1}^{N×n} be as in the proof of Lemma 2.13. Corollary 2.16 together with Remark 2.17 now implies that there do not exist an s-sparse vector x ∈ R^n and a symmetric set V with |V| = 2 such that Tx ∈ V^N. In particular,

    ∅ = {Tx | x ∈ R^n, ‖x‖_0 = s} ∩ {±1}^N = aff_s(B) ∩ {0,1}^N;

the last equality holds by (2.9). As s was arbitrary, this now implies (2.11) and completes the proof.

Remark 2.19. From now on, we will only consider {±1}-valued binary matrices. By Lemma 2.13, all results for {±1}-valued binary matrices carry over to {0,1}-valued binary matrices by adjusting the right hand sides accordingly.

3. The Span of Random Binary Matrices and the Lemma of Littlewood and Offord

In this section, we pass from the setting of deterministic binary N × n matrices to Bernoulli random matrices. As in the previous section, they induce random sets, which we will call Bernoulli random sets.

Definition 3.1.
• A Bernoulli random vector ϵ^(n) of parameter p ∈ (0,1) is a random vector whose entries ϵ_j, j ∈ [n], are independent copies of a Bernoulli random variable taking the values 1 and −1 with probability p and 1 − p, respectively.

• A Bernoulli random matrix E^(N,n) of parameter p ∈ (0,1) is an N × n random matrix in which each row is an independent copy of a Bernoulli random vector ϵ^(n) of parameter p.

• For any finite set J, the Bernoulli random set S^(J) of parameter p ∈ (0,1) is a random subset of J such that for any A ⊂ J,

    P[ S^(J) = A ] = p^{|A|} (1 − p)^{|J|−|A|}.    (3.1)

That is, each element is included independently with probability p.


Remark 3.2. For all random variables described above, we will sometimes omit the upper indices if they are clear from the context. Furthermore, we write q instead of 1 − p and S^(n) instead of S^([n]).

The connection between Bernoulli random vectors and matrices and Bernoulli random sets is evident from their definitions.

Remark 3.3. Let ϵ^(n) be a Bernoulli random vector with parameter p. Then the random set

    S^(n) = {j | ϵ_j = 1} ⊂ [n]

is a Bernoulli random set with the same parameter; we will call S^(n) the Bernoulli random set corresponding to ϵ^(n). Also note that, since for an arbitrary finite set J and a Bernoulli random set S^(J) of parameter p it holds that

    Σ_{A ∈ 2^J} P[ S^(J) = A ] = Σ_{s=0}^{|J|} (|J| choose s) p^s q^{|J|−s} = (p + q)^{|J|} = 1,

it follows that (3.1) actually defines a probability distribution.
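As a small illustration, the normalization of (3.1) can be confirmed by direct enumeration. The Python sketch below (with arbitrary choices of J and p; not part of the thesis) sums p^|A| q^(|J|−|A|) over all subsets A of J.

```python
# Check that (3.1) defines a probability distribution: the weights
# p^|A| * q^(|J|-|A|) over all subsets A of a finite set J sum to 1.
from itertools import combinations

def total_mass(J, p):
    q = 1 - p
    return sum(p ** len(A) * q ** (len(J) - len(A))
               for s in range(len(J) + 1)
               for A in combinations(J, s))

for p in (0.1, 0.5, 0.9):
    assert abs(total_mass(range(5), p) - 1.0) < 1e-12
```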

With the next lemma we can transfer the problem of bounding small ball probabilities as in (1.1) to the domain of Bernoulli random sets and (symmetric) Sperner-k families; it generalizes the ideas used by Erdős to prove the Lemma of Littlewood and Offord in [Erd45].

Lemma 3.4. Let ϵ^(n) be a Bernoulli random vector with parameter p and let S^(n) be the corresponding Bernoulli random set. If V is the union of k open intervals of length at most 2c and x ∈ R^n is an arbitrary vector with min_{j ∈ [n]} |x_j| ≥ c > 0, then

    P[ ⟨ϵ^(n), x⟩ ∈ V ] ≤ max_{A ⊂ 2^n, A Sperner-k} P[ S^(n) ∈ A^{sgn(x)} ] ≤ P_{k,n}(p),

where we set

    P_{k,n}(p) := max_{ξ ∈ {±1}^n} max_{A ⊂ 2^n, A Sperner-k} P[ S^(n) ∈ A^ξ ].

If V is symmetric, we only need to consider symmetric Sperner-k families; we denote the corresponding probability by P_{±,k,n}(p).


Proof. We will only prove the theorem in the case where V is symmetric; the general case follows analogously. For x ∈ R^n with min_j |x_j| ≥ c, let E be the matrix whose rows are all vectors e ∈ {±1}^n with ⟨e, x⟩ ∈ V. For the family B = {B_i | i ∈ [N]} with

    B_i = {j | E_{i,j} = 1} ⊂ [n],

Lemma 2.12 now implies that B^{sgn(x)} is a subfamily of a symmetric Sperner-k family. Since, by construction, it holds that

    ⟨e, x⟩ ∈ V  ⟺  e is a row of E  ⟺  {j | e_j = 1} ∈ B,

it follows with Remark 3.3 that

    P[ ⟨ϵ^(n), x⟩ ∈ V ] = P[ S^(n) ∈ B ].    (3.2)

Bearing in mind that the family B^{sgn(x)} is a subfamily of a symmetric Sperner-k family and that (A^ξ)^ξ = A for A ⊂ 2^n and ξ ∈ {±1}^n, we can bound (3.2) from above by

    P[ S^(n) ∈ B ] = P[ S^(n) ∈ (B^{sgn(x)})^{sgn(x)} ]
                   ≤ max_{A ⊂ 2^n, A symmetric Sperner-k} P[ S^(n) ∈ A^{sgn(x)} ]
                   ≤ max_{ξ ∈ {±1}^n} max_{A ⊂ 2^n, A symmetric Sperner-k} P[ S^(n) ∈ A^ξ ].

This completes the proof.
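The identity (3.2) can also be checked by exact enumeration for small n. The Python sketch below (with arbitrary example choices of x, V, and p; not part of the thesis) computes both sides of (3.2) over all 2^n sign vectors.

```python
# Exact check of the identity (3.2) for small n: the probability that
# <eps, x> lands in V equals the probability that the corresponding
# Bernoulli random set lies in the family B of supports. The vector x,
# the interval V, and p are arbitrary example choices.
from itertools import product

p, q = 0.3, 0.7
n = 4
x = [0.5, 1.0, -0.75, 2.0]

def in_V(t):
    return -1.5 < t < 1.5  # V = (-1.5, 1.5), one open interval

hits = [e for e in product((1, -1), repeat=n)
        if in_V(sum(ei * xi for ei, xi in zip(e, x)))]

# Left-hand side of (3.2): sum the probabilities of the sign vectors.
lhs = sum(p ** e.count(1) * q ** e.count(-1) for e in hits)

# Right-hand side: the family B of supports {j : e_j = 1}, weighted by (3.1).
B = {frozenset(j for j in range(n) if e[j] == 1) for e in hits}
rhs = sum(p ** len(A) * q ** (n - len(A)) for A in B)

assert abs(lhs - rhs) < 1e-12
```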

Remark 3.5. As in Remark 2.17, Lemma 3.4 implies for a Bernoulli random vector ϵ^(n) of parameter p, an arbitrary s-sparse x ∈ R^n, and an arbitrary discrete set V ⊂ R with |V| = k that

    P[ ⟨ϵ^(n), x⟩ ∈ V ] ≤ max_{A ⊂ 2^J, A Sperner-k} P[ S^(J) ∈ A^{sgn(x)} ] ≤ P_{k,s}(p),

where J = supp(x). If V is symmetric, we similarly obtain

    P[ ⟨ϵ^(n), x⟩ ∈ V ] ≤ max_{A ⊂ 2^J, A symmetric Sperner-k} P[ S^(J) ∈ A^{sgn(x)} ] ≤ P_{±,k,s}(p).

The quantities P_{k,n}(p) and P_{±,k,n}(p) will play an important role in the remainder of the chapter. A key distinction between the general and the symmetric case is that for the latter, the following basic monotonicity property no longer holds in general; see Remark 3.11 below.


Lemma 3.6. For arbitrary integers k and m ≤ n and arbitrary p ∈ (0,1), it holds that

    P_{k,m}(p) ≥ P_{k,n}(p).

Proof. By induction, it is enough to prove the statement for m = n − 1. For arbitrary A ⊂ 2^n, define

    A_0 = {A ⊂ [n−1] | A ∈ A} ⊂ 2^{n−1}  and  A_1 = {A ⊂ [n−1] | A ∪ {n} ∈ A} ⊂ 2^{n−1}.

If A is a Sperner-k family, then both A_0 ⊂ 2^{n−1} and A_1 ⊂ 2^{n−1} are Sperner-k families. Now let S^(n) and S^(n−1) be Bernoulli random sets with parameter p, let ξ ∈ {±1}^n be a sign pattern, and let ν ∈ {±1}^{n−1} be the restriction of ξ to the first n − 1 entries. If ξ_n = 1, we have

    P[ S^(n) ∈ A^ξ ] = q · P[ S^(n−1) ∈ A_0^ν ] + p · P[ S^(n−1) ∈ A_1^ν ] ≤ q P_{k,n−1}(p) + p P_{k,n−1}(p) = P_{k,n−1}(p).    (3.3)

If ξ_n = −1, inequality (3.3) holds true with the roles of p and q interchanged. This completes the proof.
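For small n, the quantity P_{k,n}(p) can be computed by brute force, giving a quick numerical confirmation of the monotonicity in Lemma 3.6. The sketch below is not taken from the thesis; it assumes the standard characterization of a Sperner-k family as one containing no chain of k+1 nested sets, and models the modulation A^ξ as the symmetric difference of each member with N_ξ = {j : ξ_j = −1}; both are assumptions about definitions stated earlier in the thesis and not restated here.

```python
# Brute-force P_{k,n}(p) for small n, as a numerical check of Lemma 3.6.
# Assumptions: a Sperner-k family contains no chain of k+1 nested sets;
# modulation A^xi takes the symmetric difference of each member with
# N_xi = {j : xi_j = -1}.
from itertools import combinations

def p_kn(k, n, p):
    q = 1 - p
    subsets = [frozenset(c) for s in range(n + 1)
               for c in combinations(range(n), s)]

    def longest_chain(fam, A):
        # length of the longest nested chain in fam starting at A
        ext = [longest_chain(fam, B) for B in fam if A < B]
        return 1 + max(ext, default=0)

    best = 0.0
    for mask in range(1 << len(subsets)):
        fam = [subsets[i] for i in range(len(subsets)) if mask >> i & 1]
        if any(longest_chain(fam, A) > k for A in fam):
            continue  # contains a chain of k+1 sets: not Sperner-k
        for flip in subsets:  # all modulations xi
            prob = sum(p ** len(A ^ flip) * q ** (n - len(A ^ flip))
                       for A in fam)
            best = max(best, prob)
    return best

p = 0.3
assert abs(p_kn(1, 2, p) - (p ** 2 + (1 - p) ** 2)) < 1e-12  # Lemma 3.9
assert p_kn(1, 3, p) <= p_kn(1, 2, p)                        # Lemma 3.6
assert p_kn(2, 3, p) <= p_kn(2, 2, p)                        # Lemma 3.6
```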

The next definition is required to transfer the ideas of Corollary 2.16 to the random matrix case.

Definition 3.7. Let F_{k,n} ⊂ 2^{2^n} be the set of all maximal modulated Sperner-k families, that is, the set of all modulated Sperner-k families A ⊂ 2^n which are not a proper subfamily of any other modulated Sperner-k family.

Furthermore, denote by F_{±,k,n} ⊂ 2^{2^n} the set of all maximal modulated symmetric Sperner-k families.

Remark 3.8. Definition 3.7 also enables us to rewrite the probability P_{k,n}(p) in terms of the set F_{k,n}, since for a Bernoulli random set S^(n) with parameter p it holds that

    P_{k,n}(p) = max_{ξ ∈ {±1}^n} max_{A ⊂ 2^n, A Sperner-k} P[ S^(n) ∈ A^ξ ] = max_{A ∈ F_{k,n}} P[ S^(n) ∈ A ],

and similarly for P_{±,k,n}(p) and F_{±,k,n}.

For arbitrary p ∈ (0,1), we will now compute the probabilities P_{1,2}(p) and P_{±,2,2}(p), which we will need later.


Lemma 3.9. For arbitrary p ∈ (0,1), it holds that

    P_{±,2,2}(p) = P_{1,2}(p) = p² + q².

Proof. Let p ∈ (0,1) be arbitrary and let F be the set of all Sperner families A ⊂ 2^2. Then

    F = { ∅, {∅}, {{1}}, {{2}}, {{1,2}}, {{1},{2}} }.

Direct calculations yield that the set of all modulated Sperner families of subsets of {1,2} is given by

    F′ = F ∪ { {∅,{1,2}} }.

Consequently, the set of all maximal modulated Sperner families is given by

    F_{1,2} = { {{1},{2}}, {∅,{1,2}} }.

Next, we will compute F_{±,2,2}. By Definition 2.10, Definition 2.14 and Remark 2.6, every modulated symmetric Sperner-2 family of subsets of [n] is of the form

    (A ∪ A^{−1})^ξ = A^ξ ∪ A^{−ξ},

where A ⊂ 2^n is a Sperner-1 family and ξ ∈ {±1}^n is a sign pattern. Since for A ∈ F_{1,2} one has A = A^{−1}, and hence

    F_{±,2,2} = { A ∪ A^{−1} | A ∈ F_{1,2} } = F_{1,2},

we can conclude that P_{±,2,2}(p) = P_{1,2}(p). It remains to show that P_{1,2}(p) = p² + q². For that, note that

    P_{1,2}(p) = max_{A ∈ F_{1,2}} P[ S^(2) ∈ A ] = max{ 2pq, p² + q² } = p² + q²,

where the last equality is implied by

    p² + q² − 2pq = (p − q)² ≥ 0.

This completes the proof.
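The enumeration of F_{1,2} in the proof above can be reproduced mechanically. The following Python sketch (not part of the thesis; it assumes that a Sperner family is an antichain and that modulation acts by symmetric difference with the set of flipped coordinates) recovers exactly the two maximal modulated Sperner families {{1},{2}} and {∅,{1,2}}.

```python
# Enumerate all modulated Sperner families over {1,2} and pick out the
# maximal ones, as in the proof of Lemma 3.9. Assumptions: a Sperner
# family is an antichain; modulation A^xi takes the symmetric difference
# of each member with the set of flipped coordinates.
from itertools import combinations

subsets = [frozenset(c) for s in range(3) for c in combinations((1, 2), s)]

def is_antichain(fam):
    # no member is a proper subset of another
    return all(not (A < B) for A in fam for B in fam)

modulated = set()
for mask in range(1 << len(subsets)):
    fam = [subsets[i] for i in range(len(subsets)) if mask >> i & 1]
    if is_antichain(fam):
        for flip in subsets:
            modulated.add(frozenset(A ^ flip for A in fam))

maximal = [F for F in modulated if not any(F < G for G in modulated)]

expected = {
    frozenset({frozenset({1}), frozenset({2})}),
    frozenset({frozenset(), frozenset({1, 2})}),
}
assert set(maximal) == expected

# With p = 0.3, the larger of the two probabilities is p^2 + q^2.
p, q = 0.3, 0.7
probs = [sum(p ** len(A) * q ** (2 - len(A)) for A in F) for F in maximal]
assert abs(max(probs) - (p * p + q * q)) < 1e-12
```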


Lemma 3.10. We have

    |F_{±,2,3}| = 4,

and for arbitrary p ∈ (0,1) it holds that

    P_{±,2,3}(p) = 1 − pq.

Remark 3.11. While the probabilities P_{k,n}(p) are non-increasing with respect to n (Lemma 3.6), the same does not necessarily hold true for P_{±,k,n}(p). Lemma 3.9 and Lemma 3.10 imply for arbitrary p ∈ (0,1) that

    P_{±,2,3}(p) − P_{±,2,2}(p) = 1 − pq − (p² + q²) = (p + q)² − pq − (p² + q²) = pq > 0.
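Remark 3.11 can be verified numerically. The sketch below (not part of the thesis) uses the characterization from the proof of Lemma 3.9, namely that every modulated symmetric Sperner-2 family has the form (A ∪ A^{−1})^ξ with A a Sperner-1 (antichain) family and A^{−1} the family of complements; how the general definitions are stated earlier in the thesis may differ slightly.

```python
# Brute-force check of Remark 3.11: P_{+-,2,2}(p) = p^2 + q^2 while
# P_{+-,2,3}(p) = 1 - pq, so the symmetric quantity is not monotone in n.
# Assumption (from the proof of Lemma 3.9): every modulated symmetric
# Sperner-2 family has the form (A u A^{-1})^xi with A an antichain,
# A^{-1} the family of complements, and modulation by symmetric difference.
from itertools import combinations

def p_sym_2(p, n):
    q = 1 - p
    ground = frozenset(range(n))
    subsets = [frozenset(c) for s in range(n + 1)
               for c in combinations(range(n), s)]

    def is_antichain(fam):
        return all(not (A < B) for A in fam for B in fam)

    best = 0.0
    for mask in range(1 << len(subsets)):
        A = [subsets[i] for i in range(len(subsets)) if mask >> i & 1]
        if not is_antichain(A):
            continue
        fam = set(A) | {ground - S for S in A}   # A u A^{-1}
        for flip in subsets:                     # all modulations xi
            mod = {S ^ flip for S in fam}
            best = max(best,
                       sum(p ** len(S) * q ** (n - len(S)) for S in mod))
    return best

p, q = 0.3, 0.7
assert abs(p_sym_2(p, 2) - (p * p + q * q)) < 1e-12   # Lemma 3.9
assert abs(p_sym_2(p, 3) - (1 - p * q)) < 1e-12       # Lemma 3.10
assert p_sym_2(p, 3) > p_sym_2(p, 2)                  # non-monotone in n
```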

Proof of Lemma 3.10. Since the families

    A_1 = { {1}, {2}, {3} }  and  A_1^{−1} = { {2,3}, {1,3}, {1,2} }

are Sperner families, the set

    F = { A_1^ξ ∪ (A_1^{−1})^ξ | ξ ∈ {±1}^3 }

consists of modulated symmetric Sperner-2 families. We claim that F = F_{±,2,3}. To this end, observe that by union compatibility, it holds for arbitrary ξ ∈ {±1}^3 that

    (A_1 ∪ A_1^{−1})^ξ = (2^3 \ {∅,{1,2,3}})^ξ = (2^3)^ξ \ {∅,{1,2,3}}^ξ = 2^3 \ {∅,{1,2,3}}^ξ.

Since {∅^ξ | ξ ∈ {±1}^n} = 2^n, it therefore follows that

    F = { 2^3 \ {A, A^c} | A ∈ 2^3 } = { 2^3 \ {A, A^c} | A ∈ 2^3, |A| ≤ 1 },    (3.4)

where the last equality holds by the symmetry of the set {A, A^c}. Consequently, all the modulated symmetric Sperner-2 families contained in F are maximal. Indeed, for any A ∈ F, adding a single set B ∈ 2^3 \ A results in a family which is not symmetric, and adding both missing sets results in 2^3, which is not a modulated symmetric Sperner-2 family. The same arguments also show that there are no modulated symmetric Sperner-2 families of cardinality larger than 6. For a modulated symmetric Sperner-2 family A ⊂ 2^3 of cardinality smaller than 6, its complement must also be symmetric, which shows that A is a subfamily of some A′ ∈ F. Hence, A cannot be maximal, and one has F_{±,2,3} = F. Since each A ∈ 2^3 with |A| ≤ 1 yields a different set 2^3 \ {A, A^c}, (3.4)
