On Boolean Combinations forming Piecewise Testable Languages

Tomáš Masopust^{a,1,∗}, Michaël Thomazo^{b,2}

^a Institute of Theoretical Computer Science and Center of Advancing Electronics Dresden (cfaed), TU Dresden, Germany and Institute of Mathematics, Czech Academy of Sciences, Czech Republic

^b Inria, France

Abstract

A regular language is $k$-piecewise testable ($k$-PT) if it is a Boolean combination of languages of the form $L_{a_1 a_2 \cdots a_n} = \Sigma^* a_1 \Sigma^* a_2 \Sigma^* \cdots \Sigma^* a_n \Sigma^*$, where $a_i \in \Sigma$ and $0 \le n \le k$. Given a finite automaton $\mathcal{A}$, if the language $L(\mathcal{A})$ is piecewise testable, we want to express it as a Boolean combination of languages of the above form. The idea is as follows. If the language is $k$-PT, then there exists a congruence $\sim_k$ of finite index such that $L(\mathcal{A})$ is a finite union of $\sim_k$-classes. Every such class is characterized by an intersection of languages of the form $L_u$, for $|u| \le k$, and their complements. To represent the $\sim_k$-classes, we make use of the $\sim_k$-canonical DFA. We identify the states of the $\sim_k$-canonical DFA whose union forms the language $L(\mathcal{A})$ and use them to construct the required Boolean combination. We study the computational and descriptional complexity of related problems.

1. Introduction

A regular language $L$ over an alphabet $\Sigma$ is piecewise testable (PT) if it is a finite Boolean combination of languages of the form $L_{a_1 a_2 \cdots a_n} = \Sigma^* a_1 \Sigma^* a_2 \Sigma^* \cdots \Sigma^* a_n \Sigma^*$, where $a_i \in \Sigma$ and $n \ge 0$. If the language is piecewise testable, then it is a finite Boolean combination of languages of the form $L_u$, where the length of $u \in \Sigma^*$ is at most $k$. In this case, the language is called $k$-piecewise testable ($k$-PT).

In this paper, we study the problem of translating an automaton representing a piecewise testable language into a Boolean combination of languages of the form $L_u$. The motivation comes from the simplification of XML Schema, since such expressions resemble XPath-like expressions used in the BonXai schema language. The reader is referred to Martens et al. [21] for more details. Since every piecewise testable language is $k$-PT for some $k \ge 0$, and a $k$-PT language is also $(k+1)$-PT, we focus on the Boolean combination of languages $L_u$, where the length of $u$ is bounded by the minimal $k$ for which the language is $k$-PT. From this point of view, we are interested in translating an automaton to the form of a generalized regular expression (a regular expression allowing the operation of complement). Generalized regular expressions can be non-elementarily more succinct than classical regular expressions [6, 29, 9], and not much is known about these transformations [7]. There are many different Boolean combinations describing the same language, and it is not clear which of them is the best representation. The choice depends significantly on the application. We are interested in those Boolean combinations that resemble the disjunctive normal form of logical formulas rather than in the most concise representation.

The basic idea to perform this translation can be outlined as follows. Let $L$ be a language over $\Sigma$ (represented by its minimal DFA) and let the equivalence relation $\sim_k$ on $\Sigma^*$ be defined by $u \sim_k v$ if $u$ and $v$ have the same sets of (scattered) subwords up to length $k$, denoted by $\mathrm{sub}_k(u) = \mathrm{sub}_k(v)$. Then $L$ is piecewise testable if and only if there exists a nonnegative integer $k$ such that ${\sim_k} \subseteq {\sim_L}$, where $\sim_L$ is the Myhill congruence [24], that is, every $k$-PT language is a finite union of $\sim_k$-classes. As shown, e.g., by Klíma [17], the $\sim_k$-classes can be described by languages of the form

$$[w]_{\sim_k} = \bigcap_{u \in \mathrm{sub}_k(w)} L_u \;\cap \bigcap_{u \notin \mathrm{sub}_k(w),\ |u| \le k} \overline{L_u},$$

where $\overline{L_u}$ denotes the complement of $L_u$. The high-level approach is thus:

∗Corresponding author.

Email addresses: tomas.masopust@tu-dresden.de (Tomáš Masopust), michael.thomazo@inria.fr (Michaël Thomazo)

^1 Research supported by the German Research Foundation (DFG) in Emmy Noether grant KR 4381/1-1 (DIAMOND).

^2 Research supported by the Alexander von Humboldt Foundation.


1. Check whether the regular language $L$ is piecewise testable.

2. If so, compute the minimal $k \ge 0$ for which $L$ is $k$-piecewise testable.

3. Compute the finite number of representatives of the equivalence classes that form the union of the language $L$, express them as above and form their union.

We study the computational and descriptional complexity of this approach, provide an overview of related results, and formulate several open problems.

The complexity of the first step has been studied in the literature. Simon [26] proved that PT languages are exactly those regular languages whose syntactic monoid is $\mathcal{J}$-trivial, which gives decidability. Stern [28] showed that the problem is decidable in polynomial time for languages represented by DFAs, and Cho and Huynh [5] proved NL-completeness for DFAs. Later, Trahtman [31] showed that the problem is solvable in time quadratic with respect to the number of states of the DFA and linear with respect to the size of the alphabet, and Klíma and Polák [19] gave an algorithm that is quadratic in the size of the input alphabet and linear in the number of states of the DFA. For languages represented by NFAs, the problem is PSPACE-complete [11].

The second step gives rise to the $k$-piecewise testability problem formulated as follows:

INPUT: an automaton (DFA or NFA) $\mathcal{A}$

OUTPUT: YES if and only if $L(\mathcal{A})$ is $k$-piecewise testable

The problem is trivially decidable for any $k$ because there are only finitely many $k$-PT languages over the alphabet of $\mathcal{A}$. We investigate and overview the computational complexity of this problem. An upper bound on the complexity for DFAs has been independently established in [10, 18, 23]. The co-NP upper bound on the $k$-piecewise testability problem for DFAs first appeared in [10] without proof, formulated in terms of separability.³ In this paper, we recall (without proof) the result of [18] showing that the problem is co-NP-complete for DFAs if $k \ge 4$. We then focus on the complexity of the problem for $k < 4$. In particular, for the input given as the minimal DFA, the problem is trivial for $k = 0$, belongs to AC⁰ for $k = 1$ (Theorem 6), and is NL-complete for $k = 2, 3$ (Theorems 13 and 18). For NFAs, we show that the problem is PSPACE-complete for any $k \ge 0$ (Theorem 20).

There is an interesting observation by Klíma and Polák [19] that if the depth of a minimal DFA recognizing a PT language is $k$, then the language is $k$-PT. (Bounds for finite languages and upward and downward closures have recently been investigated by Karandikar and Schnoebelen [16].) The observation reduces Step 2 to solving a finite number of $k$-piecewise testability problems, since the upper bound on $k$ is given by the depth of the minimal DFA equivalent to $\mathcal{A}$. The opposite implication does not hold; therefore, we investigate the relationship between the depth of an NFA and $k$-piecewise testability of its language. We show that, for every $k \ge 0$, there exists a $k$-PT language with an NFA of depth $k - 1$ and with the minimal DFA of depth $2^k - 1$ (Theorem 27). Although it is well known that DFAs can be exponentially larger than NFAs, a by-product of our result is that the exponentially many states of the DFA all lie on a single simple path, which is, in our opinion, of independent interest. In addition, the reverse of the NFAs constructed in the proof is deterministic, partially ordered and locally confluent. Therefore, our result also provides further insight into the complexity of the reverse of piecewise testable languages previously studied in [4, 14].

The last step of the approach requires computing those $\sim_k$-classes whose union forms the language $L$, and expressing them as intersections of languages of the form $L_u$ or their complements. To identify these equivalence classes, we make use of the $\sim_k$-canonical DFA, whose states correspond to $\sim_k$-classes. We construct the $\sim_k$-canonical DFA and compute its accepting states by intersection with the input automaton. The accepting states then represent the $\sim_k$-classes forming the language $L$. The $\sim_k$-canonical DFA can be effectively constructed. Moreover, although the precise size of the $\sim_k$-canonical DFA is not known (see the estimations in [15]), we show that the tight upper bound on its depth is $\binom{k+n}{k} - 1$, where $n$ is the cardinality of the alphabet (Theorem 31).

This paper is an extended version of paper [23] presented at the DLT 2015 conference, containing full proofs and updated with the latest results and open problems. After introducing the necessary notions (Section 2), we introduce the approach on an example (Section 3), before studying the complexity of the k-piecewise testability problem for DFAs (Section 4) and NFAs (Section 5). We finish by investigating the depth of minimal DFAs (Section 6).

^3 The result is a consequence of a proof that is omitted in the conference version.


2. Preliminaries and Definitions

We assume that the reader is familiar with automata theory [20]. The cardinality of a set $A$ is denoted by $|A|$ and the power set of $A$ by $2^A$. An alphabet $\Sigma$ is a finite nonempty set. The free monoid generated by $\Sigma$ is denoted by $\Sigma^*$. A word over $\Sigma$ is any element of $\Sigma^*$; the empty word is denoted by $\varepsilon$. For a word $w \in \Sigma^*$, $\mathrm{alph}(w) \subseteq \Sigma$ denotes the set of all letters occurring in $w$, and $|w|_a$ denotes the number of occurrences of letter $a$ in $w$. A language over $\Sigma$ is a subset of $\Sigma^*$. For a language $L$ over $\Sigma$, let $\overline{L} = \Sigma^* \setminus L$ denote the complement of $L$.

A nondeterministic finite automaton (NFA) is a quintuple $\mathcal{A} = (Q, \Sigma, \cdot, I, F)$, where $Q$ is a finite nonempty set of states, $\Sigma$ is an input alphabet, $I \subseteq Q$ is a set of initial states, $F \subseteq Q$ is a set of accepting states, and $\cdot : Q \times \Sigma \to 2^Q$ is the transition function that can be extended to the domain $2^Q \times \Sigma^*$ by induction. The language accepted by $\mathcal{A}$ is the set $L(\mathcal{A}) = \{w \in \Sigma^* \mid I \cdot w \cap F \neq \emptyset\}$. We sometimes omit $\cdot$ and write simply $Iw$ instead of $I \cdot w$. A path $\pi$ from a state $q_0$ to a state $q_n$ under a word $a_1 a_2 \cdots a_n$, for some $n \ge 0$, is a sequence of states and input symbols $q_0 a_1 q_1 a_2 \ldots q_{n-1} a_n q_n$ such that $q_{i+1} \in q_i \cdot a_{i+1}$, for all $i = 0, 1, \ldots, n-1$. Path $\pi$ is accepting if $q_0 \in I$ and $q_n \in F$. We write $q_0 \xrightarrow{a_1 a_2 \cdots a_n} q_n$ to denote that there exists a path from $q_0$ to $q_n$ under the word $a_1 a_2 \cdots a_n$. A path is simple if all states of the path are pairwise different. The number of states on the longest simple path of $\mathcal{A}$, starting in the initial state, decreased by one (the number of transitions on the path) is called the depth of automaton $\mathcal{A}$, denoted by $\mathrm{depth}(\mathcal{A})$.

The NFA $\mathcal{A}$ is deterministic (DFA) if $|I| = 1$ and $|q \cdot a| = 1$ for every $q \in Q$ and $a \in \Sigma$. The transition function $\cdot$ is then a map from $Q \times \Sigma$ to $Q$ that can be extended to the domain $Q \times \Sigma^*$ by induction. Two states of a DFA are distinguishable if there exists a word $w$ that is accepted from one of them and rejected from the other. A DFA is minimal if all its states are reachable and pairwise distinguishable.

The reachability relation $\le$ on the set of states is defined by $p \le q$ if there is a word $w \in \Sigma^*$ such that $q \in p \cdot w$. The NFA $\mathcal{A}$ is partially ordered if the reachability relation $\le$ is a partial order. For two states $p$ and $q$ of $\mathcal{A}$, we write $p < q$ if $p \le q$ and $p \neq q$. A state $p$ is maximal if there is no state $q$ such that $p < q$. Partially ordered automata are sometimes called acyclic. In this terminology, a cycle is a nontrivial loop, since self-loops are allowed in partially ordered automata.

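Let $\mathcal{A} = (Q, \Sigma, \cdot, i, F)$ be a DFA, and let $\Gamma \subseteq \Sigma$. The DFA $\mathcal{A}$ is $\Gamma$-confluent if, for every state $q \in Q$ and every pair of words $u, v \in \Gamma^*$, there exists a word $w \in \Gamma^*$ such that $(qu)w = (qv)w$. The DFA $\mathcal{A}$ is confluent if it is $\Gamma$-confluent for every subalphabet $\Gamma$ of $\Sigma$. The DFA $\mathcal{A}$ is locally confluent if, for every state $q \in Q$ and every pair of letters $a, b \in \Sigma$, there exists a word $w \in \{a, b\}^*$ such that $(qa)w = (qb)w$.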

An NFA $\mathcal{A} = (Q, \Sigma, \cdot, I, F)$ can be turned into a directed graph $G(\mathcal{A})$ with the set of vertices $Q$, where a pair $(p, q) \in Q \times Q$ is an edge in $G(\mathcal{A})$ if there is a transition from $p$ to $q$ in $\mathcal{A}$. For $\Gamma \subseteq \Sigma$, we define the directed graph $G(\mathcal{A}, \Gamma)$ with the set of vertices $Q$ by considering all those transitions that correspond to letters in $\Gamma$. For a state $p$, let $\Sigma(p) = \{a \in \Sigma \mid p \in p \cdot a\}$ denote the set of all letters under which the NFA $\mathcal{A}$ has a self-loop in state $p$. Let $\mathcal{A}$ be a partially ordered NFA. If for every state $p$ of $\mathcal{A}$, state $p$ is the unique maximal state of the connected component of $G(\mathcal{A}, \Sigma(p))$ containing $p$, then we say that the NFA satisfies the unique maximal state (UMS) property.

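We adopt the notation $L_{a_1 a_2 \cdots a_n} = \Sigma^* a_1 \Sigma^* a_2 \Sigma^* \cdots \Sigma^* a_n \Sigma^*$ from [19]. Furthermore, for two words $v = a_1 a_2 \cdots a_n$ and $w \in L_v$, we say that $v$ is a subword of $w$ or that $v$ can be embedded into $w$, denoted by $v \preccurlyeq w$. For $k \ge 0$, let $\mathrm{sub}_k(v) = \{u \in \Sigma^* \mid u \preccurlyeq v,\ |u| \le k\}$. For two words $w_1, w_2$, we define $w_1 \sim_k w_2$ if and only if $\mathrm{sub}_k(w_1) = \mathrm{sub}_k(w_2)$. If $w_1 \sim_k w_2$, we say that $w_1$ and $w_2$ are $k$-equivalent. Note that $\sim_k$ is a congruence with finite index.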
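As an illustration of these definitions, the following minimal Python sketch computes $\mathrm{sub}_k(w)$ and tests $k$-equivalence. The function names `sub_k` and `k_equivalent` are chosen here for illustration only, and words are assumed to be plain Python strings over single-character letters.

```python
def sub_k(word: str, k: int) -> frozenset:
    """Return sub_k(word): all scattered subwords of `word` of length at most k."""
    subs = {""}
    for letter in word:
        # Each known subword shorter than k can be extended by the new letter.
        subs |= {u + letter for u in subs if len(u) < k}
    return frozenset(subs)


def k_equivalent(w1: str, w2: str, k: int) -> bool:
    """Two words are k-equivalent iff they have the same subwords up to length k."""
    return sub_k(w1, k) == sub_k(w2, k)
```

For instance, `sub_k("aab", 2)` is the set containing the empty word, `a`, `b`, `aa`, and `ab`.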

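The $\sim_k$-canonical DFA is the DFA $\mathcal{A} = (Q, \Sigma, \cdot, [\varepsilon], F)$, where $Q = \{[w] \mid w \in \Sigma^{\le k}\}$, $[w] = \{w' \mid w' \sim_k w\}$, and the transition function $\cdot$ is defined so that, for a state $[w]$ and a letter $a$, $[w] \cdot a = [wa]$.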
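The transition rule $[w] \cdot a = [wa]$ can be explored mechanically. The sketch below is one possible way to do it, not the construction used in the paper: it represents a class $[w]$ by the set $\mathrm{sub}_k(w)$, builds the states reachable from $[\varepsilon]$ by breadth-first search, and computes the depth of the resulting automaton. The names `canonical_dfa`, `depth`, and `depth_bound` are hypothetical, and letters are assumed to be single characters.

```python
from collections import deque
from math import comb


def canonical_dfa(alphabet, k):
    """Build the part of the ~_k-canonical DFA reachable from [epsilon].

    A state is the frozenset sub_k(w). The successor under `a` uses
    sub_k(wa) = sub_k(w) | {ua : u in sub_k(w), |u| < k}.
    """
    start = frozenset({""})
    delta = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in delta:
            continue
        delta[state] = {}
        for a in alphabet:
            target = frozenset(state | {u + a for u in state if len(u) < k})
            delta[state][a] = target
            queue.append(target)
    return start, delta


def depth(start, delta):
    """Longest simple path from `start`, counted in transitions.

    A transition can only add subwords, so apart from self-loops the
    automaton is acyclic and a memoised recursion is enough.
    """
    memo = {}

    def longest(state):
        if state not in memo:
            successors = {t for t in delta[state].values() if t != state}
            memo[state] = max((1 + longest(t) for t in successors), default=0)
        return memo[state]

    return longest(start)


def depth_bound(n, k):
    """The bound binom(k + n, k) - 1 for an alphabet of size n."""
    return comb(k + n, k) - 1
```

Over the binary alphabet with $k = 2$, this sketch produces 16 states and depth 5, which equals `depth_bound(2, 2)`.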

Let $\sim_L$ denote the Myhill congruence [24].

Fact 1 (Simon [26]). A regular language $L$ is $k$-PT if and only if ${\sim_k} \subseteq {\sim_L}$. Moreover, $L$ is then a finite union of $\sim_k$-classes.

Fact 1 says that if $L$ is $k$-PT, then any two $k$-equivalent words either both belong to $L$ or neither does. In terms of a minimal DFA, two $k$-equivalent words end up in the same state.
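The reformulation in terms of a minimal DFA suggests a direct, brute-force test, sketched below under stated assumptions. It is exponential in general and is not the algorithm whose complexity is analysed in this paper. The function `is_k_pt` is a hypothetical name; the input is assumed to be a complete minimal DFA given as a nested dictionary `delta[state][letter]`, with single-character letters. The sketch walks through the $\sim_k$-classes (represented by their sets of subwords) together with the DFA and reports failure as soon as one class is reached in two different DFA states.

```python
from collections import deque


def is_k_pt(delta, initial, alphabet, k):
    """Brute-force k-PT test for a complete *minimal* DFA.

    Returns True iff any two k-equivalent words reach the same DFA state.
    """
    start = frozenset({""})
    reached = {start: initial}  # ~_k-class (as sub_k set) -> DFA state
    queue = deque([start])
    while queue:
        cls = queue.popleft()
        q = reached[cls]
        for a in alphabet:
            next_cls = frozenset(cls | {u + a for u in cls if len(u) < k})
            next_q = delta[q][a]
            if next_cls not in reached:
                reached[next_cls] = next_q
                queue.append(next_cls)
            elif reached[next_cls] != next_q:
                # Two k-equivalent words end in different states.
                return False
    return True


# Hypothetical example: a minimal DFA for {epsilon} | a*b, state 3 is the sink.
EXAMPLE = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 3, "b": 3},
    3: {"a": 3, "b": 3},
}
```

With this example automaton, `is_k_pt(EXAMPLE, 0, "ab", 2)` holds while `is_k_pt(EXAMPLE, 0, "ab", 1)` does not, since `ab` and `ba` are 1-equivalent but reach different states.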

Fact 2. Let $L$ be a language recognized by the minimal DFA $\mathcal{A}$. The following are equivalent.

1. The language $L$ is PT.

2. The minimal DFA $\mathcal{A}$ is partially ordered and (locally) confluent [19].

3. The minimal DFA $\mathcal{A}$ is partially ordered and satisfies the UMS property [31].

Figure 1: NFA $\mathcal{A}$ recognizing the language $L$ (left) and its reverse with added sink state 0 (right)

3. Example

Before we investigate the individual steps of the approach, we provide a simple example demonstrating it. Let $L$ be the language recognized by the NFA depicted in Figure 1 (left). Since the automaton is nondeterministic, neither [31] nor [19] applies to decide whether $L$ is piecewise testable. In Theorem 25, we generalize Fact 2 to NFAs, which then gives that $L$ is piecewise testable. Another way to see this is to notice that the reverse of the NFA for $L$, depicted in Figure 1 (right), is deterministic. Since, by definition, $L$ is $k$-PT if and only if $L^R = \{a_n \cdots a_2 a_1 \mid a_1 a_2 \cdots a_n \in L\}$ is $k$-PT, the results of [19, 31] can be used to decide whether $L$ is piecewise testable. The upper bound on $k$ given by the depth of the DFA [19] gives that $L$ is 2-PT. It could be that the language is 1-PT. However, the characterization of 1-PT languages in Lemma 5 shows that this is not the case. Thus, the language $L$ is piecewise testable and the minimal $k$ for which it is $k$-PT is $k = 2$.

Figure 2: The $\sim_2$-canonical DFA $\mathcal{C}$ over the binary alphabet $\{a, b\}$

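Having this information, we can construct the $\sim_2$-canonical DFA $\mathcal{C}$ over the binary alphabet $\{a, b\}$ as depicted in Figure 2. The states of $\mathcal{C}$ correspond to the $\sim_2$-classes and are labeled by their maximal elements with respect to the relation of embedding $\preccurlyeq$. The initial state of $\mathcal{C}$ is the class $[\varepsilon]$ and all its states are accepting. The label of state $[w]$ is the set of maximal elements of the set $\mathrm{sub}_2(w)$. For instance, because $\mathrm{sub}_2(aab) = \{\varepsilon, a, b, aa, ab\}$, the label of state $[aab]$ is $\{aa, ab\}$. The choice to have all states accepting is made because we now compute the product of the automata $\mathcal{C}$ and $\mathcal{A}$. The result, depicted in Figure 3, says that language $L$ is a union of the following four $\sim_2$-classes: $[\varepsilon], [b], [ab], [aab]$. By [17], these classes are characterized by the intersections of the required languages or their complements. Namely, $L = [\varepsilon] \cup [b] \cup [ab] \cup [aab]$, where

$$\begin{aligned}
[\varepsilon] &= \overline{L_a} \cap \overline{L_b} \cap \overline{L_{aa}} \cap \overline{L_{ab}} \cap \overline{L_{ba}} \cap \overline{L_{bb}} = \overline{L_a} \cap \overline{L_b}, \\
[b] &= L_b \cap \overline{L_a} \cap \overline{L_{aa}} \cap \overline{L_{ab}} \cap \overline{L_{ba}} \cap \overline{L_{bb}} = L_b \cap \overline{L_a} \cap \overline{L_{bb}}, \\
[ab] &= L_a \cap L_b \cap L_{ab} \cap \overline{L_{aa}} \cap \overline{L_{ba}} \cap \overline{L_{bb}} = L_{ab} \cap \overline{L_{aa}} \cap \overline{L_{ba}} \cap \overline{L_{bb}}, \\
[aab] &= L_a \cap L_b \cap L_{aa} \cap L_{ab} \cap \overline{L_{ba}} \cap \overline{L_{bb}} = L_{aa} \cap L_{ab} \cap \overline{L_{ba}} \cap \overline{L_{bb}}.
\end{aligned}$$

We used a simple observation formulated below as Lemma 3 to reduce the number of elements in the intersection.

Figure 3: Product of $\mathcal{C}$ and $\mathcal{A}$ restricted to reachable and co-reachable states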

Lemma 3. If $u \preccurlyeq v$, then $L_v \subseteq L_u$ and $\overline{L_u} \subseteq \overline{L_v}$.

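The reader can notice that $[ab] \cup [aab] = L_{ab} \cap \overline{L_{ba}} \cap \overline{L_{bb}}$. Thus,

$$L = (\overline{L_a} \cap \overline{L_b}) \cup (L_b \cap \overline{L_a} \cap \overline{L_{bb}}) \cup (L_{ab} \cap \overline{L_{ba}} \cap \overline{L_{bb}}).$$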

To justify our choice of a 2-PT language, we point out that if we considered a 3-PT language, the $\sim_3$-canonical DFA over a binary alphabet would have 68 states [15], and it would not be possible to present it here in a reasonable form. It is an open question whether it is possible to avoid the use of the $\sim_k$-canonical DFA. One natural way to reduce the complexity is to build the canonical DFA on the fly during the computation of its product with the input automaton $\mathcal{A}$, rather than precomputing it and storing it in memory. In this way, the algorithm would construct only the relevant part of the canonical DFA; compare Figures 2 and 3.
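The on-the-fly idea can be sketched as follows. This is an illustrative sketch only, under several assumptions: the input is taken to be a complete DFA rather than an NFA, letters are single characters, and the example automaton is a hypothetical DFA for $\{\varepsilon\} \cup a^*b$, which is the language that the four classes listed in this section amount to. The helper names (`live_states`, `is_subword`, `label`, `classes_of_language`) are introduced here for illustration. Product states whose DFA component cannot reach an accepting state are never expanded, so only the relevant part of the canonical DFA is built.

```python
from collections import deque


def live_states(delta, accepting):
    """DFA states from which some accepting state is reachable."""
    live = set(accepting)
    changed = True
    while changed:
        changed = False
        for q, row in delta.items():
            if q not in live and any(t in live for t in row.values()):
                live.add(q)
                changed = True
    return live


def is_subword(u, v):
    """True iff u can be embedded into v as a scattered subword."""
    remaining = iter(v)
    return all(letter in remaining for letter in u)


def label(cls):
    """Maximal elements of a set of subwords with respect to embedding."""
    return frozenset(
        u for u in cls if not any(u != v and is_subword(u, v) for v in cls)
    )


def classes_of_language(delta, initial, accepting, alphabet, k):
    """Labels of the ~_k-classes met by accepted words, built on the fly."""
    live = live_states(delta, accepting)
    start = (frozenset({""}), initial)
    seen = {start} if initial in live else set()
    queue = deque(seen)
    while queue:
        cls, q = queue.popleft()
        for a in alphabet:
            target = delta[q][a]
            if target not in live:
                continue  # dead end: this part of the canonical DFA is irrelevant
            nxt = (frozenset(cls | {u + a for u in cls if len(u) < k}), target)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {label(cls) for cls, q in seen if q in accepting}


# Hypothetical complete DFA for {epsilon} | a*b; state 3 is the sink.
EXAMPLE = {
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 3, "b": 3},
    3: {"a": 3, "b": 3},
}
CLASSES = classes_of_language(EXAMPLE, 0, {0, 2}, "ab", 2)
```

For this example, `CLASSES` consists of the four labels $\{\varepsilon\}$, $\{b\}$, $\{ab\}$, and $\{aa, ab\}$, and the exploration visits far fewer class sets than the 16 states of the full $\sim_2$-canonical DFA.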

4. Complexity of $k$-Piecewise Testability for DFAs

The $k$-piecewise testability problem for DFAs asks whether, given a minimal DFA $\mathcal{A}$, the language $L(\mathcal{A})$ is $k$-PT. It has been independently proved to be in co-NP in [10, 18, 23]. We recall the result of [18] that also proves co-NP-completeness if $k \ge 4$.

Theorem 4 (Klíma, Kunc, Polák [18]). For $k \ge 4$, the $k$-piecewise testability problem for DFAs is co-NP-complete.

It is shown in [18] that the problem remains co-NP-complete even if the parameter $k \ge 4$ is given as part of the input, and that it is decidable in polynomial time if the alphabet is fixed.

We now study the complexity of the problem for $k \le 3$.

0-Piecewise Testability. Let $\mathcal{A}$ be a minimal DFA over an alphabet $\Sigma$. The language $L(\mathcal{A})$ is 0-PT if and only if $\mathcal{A}$ has a single state, that is, it recognizes either $\Sigma^*$ or $\emptyset$. Thus, it is decidable in $O(1)$ whether $L(\mathcal{A})$ is 0-PT.

1-Piecewise Testability. We show that the 1-PT problem belongs to AC⁰, which is a strict subset of LOGSPACE. There is an infinite hierarchy of classes $\Sigma_i$ ($\Pi_i$) in AC⁰ based on the number of alternating levels of disjunctions and conjunctions. Specifically, $\Sigma_i$ ($\Pi_i$) is the class of problems solvable by uniform families of unlimited fan-in circuits of constant depth and polynomial size with $i$ alternating levels of AND and OR gates (with NOT gates only in the input) and with the output gate being an OR gate (an AND gate) [1]. To prove the result, we make use of the following characterization lemma.

Lemma 5. Let $\mathcal{A} = (Q, \Sigma, \cdot, i, F)$ be a minimal DFA. Then $L(\mathcal{A})$ is 1-PT if and only if (i) for every $p \in Q$ and $a \in \Sigma$, $pa = q$ implies that $qa = q$, and (ii) for every $p \in Q$ and $a, b \in \Sigma$, $pab = pba$.


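Proof. Assume that $L(\mathcal{A})$ is 1-PT, and let $p$ be a state of $\mathcal{A}$. Since $\mathcal{A}$ is minimal, $p$ is reachable and there exists $w$ such that $iw = p$. It holds that $\mathrm{alph}(wa) = \mathrm{alph}(waa)$, i.e., $wa \sim_1 waa$, thus both $wa$ and $waa$ lead $\mathcal{A}$ to the same state, that is, $paa = pa$. Similarly, $\mathrm{alph}(wab) = \mathrm{alph}(wba)$ implies that $pab = pba$.

Conversely, assume that (i) and (ii) hold. We show that for any word $w$, $iw = i a_1 a_2 \cdots a_n$, where $\mathrm{alph}(w) = \{a_1, a_2, \ldots, a_n\}$. For any $a, b \in \Sigma$ and any state $q$, $qab = qba$ implies that $iw = i a_1^{k_1} a_2^{k_2} \cdots a_n^{k_n}$, where $k_i$ is the number of occurrences of $a_i$ in $w$. By (i), $iw = i a_1 a_2 \cdots a_n$. Thus, if $w_1 \sim_1 w_2$, then $iw_1 = iw_2$. By Fact 1, this shows that $L(\mathcal{A})$ is 1-PT.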
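Conditions (i) and (ii) of Lemma 5 are local and can be checked by looking at pairs of transitions. The following sketch does so directly; the name `is_1_pt` and the dictionary encoding `delta[state][letter]` of a complete minimal DFA are assumptions made for this illustration, and both example automata are hypothetical.

```python
def is_1_pt(delta, alphabet):
    """Check conditions (i) and (ii) of Lemma 5 on a complete minimal DFA."""
    for p in delta:
        for a in alphabet:
            q = delta[p][a]
            if delta[q][a] != q:
                return False  # (i) fails: pa = q but qa != q
            for b in alphabet:
                if delta[q][b] != delta[delta[p][b]][a]:
                    return False  # (ii) fails: pab != pba
    return True


# Hypothetical examples over {a, b}.
CONTAINS_A = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 1}}  # the language L_a
A_STAR_B = {  # {epsilon} | a*b with sink state 3
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 3, "b": 3},
    3: {"a": 3, "b": 3},
}
```

Here `is_1_pt(CONTAINS_A, "ab")` holds, whereas `is_1_pt(A_STAR_B, "ab")` fails, for example because $0 \cdot ab \neq 0 \cdot ba$.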

We can now prove the following theorem.

Theorem 6. To decide whether a minimal DFA recognizes a 1-PT language is in AC⁰.

Proof. To prove the theorem, consider Lemma 5 and notice that the properties can be expressed as a $\Pi_3$ formula

$$\bigwedge_{(p,a,q)} \bigl[\neg(p,a,q) \lor (q,a,q)\bigr] \;\land \bigwedge_{(p,a,r),\,(r,b,q)} \Bigl[\neg(p,a,r) \lor \neg(r,b,q) \lor \bigvee_{s} \bigl((p,b,s) \land (s,a,q)\bigr)\Bigr],$$

where $(p,a,q) \in Q \times \Sigma \times Q$ is true if and only if there is a transition from state $p$ to state $q$ under $a$ in the minimal DFA. The corresponding family of circuits is of polynomial size with respect to the size of the automaton, namely of size $O(|Q|^2 \cdot |\Sigma|)$, and of constant depth, which proves the theorem.

Open Problem 7. Is the 1-PT problem $\Pi_3$-hard in the AC⁰ hierarchy?

As a consequence of Lemma 5, we have that a minimal DFA of a 1-PT language has at most $2^{|\Sigma|}$ states, and this bound is tight as shown in Example 28 below.

Corollary 8. If a minimal DFA over $\Sigma$ has more than $2^{|\Sigma|}$ states, then its language is not 1-PT.

2-Piecewise Testability. We now show that to decide whether a minimal DFA recognizes a 2-PT language is NL-complete. This complexity coincides with the complexity of deciding whether a regular language is PT, that is, whether there exists a $k$ for which the language is $k$-PT.

We first need the following lemma stating that for any two $k$-equivalent words ending up in two different states, there exist two other equivalent words ending up in two different states, such that one word is a subword of the other and the words differ only by a single letter. Our proof follows the lines of Simon's original paper [27, Section 2].

Lemma 9. Let $\mathcal{A} = (Q, \Sigma, \cdot, i, F)$ be a minimal DFA. For every $k \ge 0$, if $w_1 \sim_k w_2$ and $iw_1 \neq iw_2$, then there exist two words $w$ and $w'$ such that $w \sim_k w'$, $w'$ is obtained from $w$ by adding a single letter at some place, and $iw \neq iw'$.

Proof. Let $w_1, w_2$ be two words such that $w_1 \sim_k w_2$ and $iw_1 \neq iw_2$. By Theorem 6.2.6 in [25], there is a word $w_3$ such that $w_1$ and $w_2$ are subwords of $w_3$ and $w_1 \sim_k w_2 \sim_k w_3$. Either $w_1$ and $w_3$, or $w_2$ and $w_3$, do not end up in the same state of the automaton. Let $v, v' \in \{w_1, w_2, w_3\}$ be such that $v$ is a subword of $v'$ and $iv \neq iv'$. Let $v = u_0, u_1, \ldots, u_n = v'$ be a sequence such that $u_{j+1}$ is obtained from $u_j$ by adding a letter at some place. Such a sequence exists since $v$ is a subword of $v'$. Thus, there is $j$ such that $u_j$ and $u_{j+1}$ end up in two different states. Setting $w = u_j$ and $w' = u_{j+1}$ completes the proof, since $\mathrm{sub}_k(v) \subseteq \mathrm{sub}_k(w) \subseteq \mathrm{sub}_k(w') \subseteq \mathrm{sub}_k(v') = \mathrm{sub}_k(v)$.

We now prove a characterization of 2-PT languages similar to that of Lemma 5.

Lemma 10. Let $\mathcal{A} = (Q, \Sigma, \cdot, i, F)$ be a minimal partially ordered and confluent DFA. The language $L(\mathcal{A})$ is 2-PT if and only if for every $a \in \Sigma$ and every state $p$ such that $iw = p$ for some word $w$ with $|w|_a \ge 1$, $pua = paua$, for every $u \in \Sigma^*$.

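Proof. ($\Rightarrow$) By contraposition, assume that there exist $u, w \in \Sigma^*$ and a state $p$ such that $iw = p$, $w$ contains $a$, and $pua \neq paua$. Let $w = w_1 a w_2$ with $a \notin \mathrm{alph}(w_1)$. We show that $w_1 a w_2 u a \sim_2 w_1 a w_2 a u a$, by showing that they have the same set of subwords of length at most 2. The subwords of $w_1 a w_2 u a$ are subwords of $w_1 a w_2 a u a$. Conversely, the subwords of $w_1 a w_2 a u a$ that are potentially not subwords of $w_1 a w_2 u a$ are of two shapes: $ca$ where $c \in \mathrm{alph}(w_1 a w_2)$ or $ad$ where $d \in \mathrm{alph}(ua)$. For any $c \in \mathrm{alph}(w_1 a w_2)$, if $ca \preccurlyeq w_1 a w_2 a u a$, then $ca \preccurlyeq w_1 a w_2 u a$. Similarly for $d \in \mathrm{alph}(ua)$ and $ad \preccurlyeq w_1 a w_2 a u a$. Thus $w_1 a w_2 u a \sim_2 w_1 a w_2 a u a$. Since $i \cdot wua \neq i \cdot waua$, the minimality of $\mathcal{A}$ gives that there exists a word $v$ such that $wuav \in L(\mathcal{A})$ if and only if $wauav \notin L(\mathcal{A})$. Since $\sim_2$ is a congruence, $wuav \sim_2 wauav$, which violates Fact 1; hence, $L(\mathcal{A})$ is not 2-PT.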

($\Leftarrow$) Let $w_1$ and $w_2$ be two words such that $w_1 \sim_2 w_2$. We show that $iw_1 = iw_2$. By Lemma 9, it is sufficient to show this direction for two words $w$ and $w'$ such that $w'$ is obtained from $w$ by adding a single letter at some place. Thus, let $a$ be the letter, and let

$$w = a_1 \cdots a_k a_{k+1} \cdots a_n \quad\text{and}\quad w' = a_1 \cdots a_k\, a\, a_{k+1} \cdots a_n$$

for $0 \le k \le n$. Let $w_{m,j} = a_m a_{m+1} \cdots a_j$. We distinguish two cases.

(A) Assume that $a$ does not appear in $w_{1,k}$. Then $a$ must appear in $w_{k+1,n}$. Consider the first occurrence of $a$ in $w_{k+1,n}$. Then $w_{k+1,n} = u_1 a u_2$, where $a$ does not appear in $u_1$. Let $B = \mathrm{alph}(u_1 a)$. Then $B \subseteq \mathrm{alph}(u_2)$, because if there is no $a$ in $w_{1,k} u_1$, any subword $ax$, for $x \in B$, that appears in $w' = w_{1,k} a u_1 a u_2$ must also appear in the subword $a u_2$ of $w = w_{1,k} u_1 a u_2$.

Let $u_2 = x_1 b_1 x_2 b_2 x_3 \cdots x_\ell b_\ell x_{\ell+1}$, where $B = \{b_1, b_2, \ldots, b_\ell\}$ and $b_j$ does not appear in $x_1 b_1 x_2 \cdots x_j$, $j = 1, 2, \ldots, \ell$.

Let $v = b_1 b_2 \cdots b_\ell$. Let $z \in \{i \cdot w_{1,k} u_1 a,\ i \cdot w_{1,k} a u_1 a\}$. We prove (by induction on $j$) that for every $j = 1, 2, \ldots, \ell$, there exists a word $y_j$ such that $z \cdot (b_1 b_2 \cdots b_j)^R y_j = z \cdot x_1 b_1 x_2 b_2 x_3 \cdots x_j b_j x_{j+1}$. Since $b_1$ appears in $u_1 a$, we use the assumption from the statement of the lemma to obtain $(z \cdot x_1 b_1) \cdot x_2 = (z \cdot b_1 x_1 b_1) \cdot x_2$, that is, $y_1 = x_1 b_1 x_2$. Assume that it holds for $j < \ell$. We prove it for $j + 1$. Again, since $b_{j+1}$ appears in $u_1 a$, we have

$$\begin{aligned}
z \cdot x_1 b_1 x_2 b_2 x_3 \cdots x_j b_j x_{j+1} b_{j+1} x_{j+2} &= ((z \cdot x_1 b_1 x_2 b_2 x_3 \cdots x_j b_j x_{j+1})\, b_{j+1})\, x_{j+2} \\
&= (z \cdot \underline{b_j \cdots b_2 b_1 y_j\, b_{j+1}})\, x_{j+2} \\
&= z \cdot b_{j+1} b_j \cdots b_2 b_1 y_j b_{j+1} x_{j+2},
\end{aligned}$$

where the second equality is by the induction hypothesis and the third is by the assumption from the statement of the lemma applied to the underlined part. Thus, $y_{j+1} = y_j b_{j+1} x_{j+2}$, which completes the inductive proof. In particular, there exists a word $y$ such that $i \cdot w_{1,k} u_1 a v^R y = i \cdot w$ and $i \cdot w_{1,k} a u_1 a v^R y = i \cdot w'$.

Let $z_1 = i \cdot w_{1,k} u_1 a$ and $z_2 = i \cdot w_{1,k} a u_1 a$. We prove that $z_1 \cdot v^R = z_2 \cdot v^R$, which then concludes the proof since it implies that $i \cdot w = i \cdot w'$. To prove this, we make use of the following claim.

Claim 11. For every $a, b \in \Sigma$ and every state $p$ such that $i \cdot w = p$ and $a$ and $b$ appear in $w$, $p \cdot ab = p \cdot ba$.

Proof. By the assumption of the lemma, since $a$ appears in $w$, $p \cdot ba = p \cdot aba = q_1$. Similarly, since $b$ appears in $w$, $p \cdot ab = p \cdot bab = q_2$. Then $q_2 \cdot a = (p \cdot ab) a = q_1$ and $q_1 \cdot b = (p \cdot ba) b = q_2$. Since the automaton is partially ordered, $q_1 = q_2$.

We finish the proof by induction on the length of $v^R = b_\ell \cdots b_2 b_1$ by showing that the state $z'_i = z_i \cdot b_\ell \cdots b_2 b_1$ has self-loops under $B$, $i = 1, 2$. Let $z_i \xrightarrow{b_\ell \cdots b_2 b_1} z'_i = q_{i,\ell+1}\, b_\ell\, q_{i,\ell}\, b_{\ell-1}\, q_{i,\ell-1} \cdots q_{i,2}\, b_1\, q_{i,1}$ denote the path defined by the word $v^R$ from the state $z_i$, $i = 1, 2$.

Claim. Both states $z'_1$ and $z'_2$ have self-loops under all letters of $B$.

Proof. Indeed, $q_{i,j} \cdot b_j = q_{i,j+1} \cdot b_j b_j = q_{i,j+1} \cdot b_j = q_{i,j}$, where the second equality is by the assumption from the statement of the lemma, since $b_j$ appears in $u_1$. Thus, there is a self-loop in $q_{i,j}$ under $b_j$. Then, $z'_i = q_{i,1} = q_{i,1} b_1 = z'_i b_1$. Now, for every $j = 2, \ldots, \ell$, we have $z'_i = q_{i,1} = q_{i,j} \cdot b_{j-1} \cdots b_2 b_1 = q_{i,j} \cdot b_j b_{j-1} \cdots b_2 b_1 = q_{i,j} \cdot b_{j-1} \cdots b_2 b_1 b_j = z'_i b_j$, where the third equality is because there is a self-loop in $q_{i,j}$ under $b_j$, and the fourth is by several applications of commutativity (Claim 11).

Thus, since no other states are reachable from $z'_1$ and $z'_2$ under $B$, and $z'_1$ and $z'_2$ are reachable from $i \cdot w_{1,k}$ by words over $B$, confluency of the automaton implies that $z'_1 = z'_2$, which completes the proof of part (A).

(B) If $a = a_i$ for some $i \le k$, we consider two cases. First, assume that for every $c \in \Sigma \cup \{\varepsilon\}$, $ca$ is a subword of $w_{1,k} a$ implies that $ca$ is a subword of $w_{1,k}$. Then $aa$ is a subword of $w_{1,k}$. Let $w_{1,k} = w_3 a w_4$, where $a$ does not appear in $w_4$. Let $q = i \cdot w_3 a$. By the assumption of the lemma, $q = i \cdot w_3 a = i \cdot w_3 a a$, hence there is a self-loop in $q$ under $a$. Let $B = \mathrm{alph}(w_4)$. Note that $B \subseteq \mathrm{alph}(w_3)$, since if $xa$ is a subword of $w_{1,k} a$, then it is also in $w_3 a$. By the self-loop under $a$ in $q$ and commutativity (Claim 11), $q \cdot w_4 = q \cdot a w_4 = q \cdot w_4 a$. Thus, $i \cdot w_{1,k} = i \cdot w_{1,k} a$.

Second, assume that there exists $c$ in $w_{1,k}$ such that $ca \preccurlyeq w_{1,k} a$ is not a subword of $w_{1,k}$. Then $a$ must appear in $w_{k+1,n}$. Together, there exist $i \le k < j$ such that $a_i = a_j = a$. By the assumption of the lemma, $i \cdot w_{1,k}\, a\, w_{k+1,j} = i \cdot w_{1,k} w_{k+1,j}$, since $w_{k+1,j} = xa$, for some $x \in \Sigma^*$. This implies that $i \cdot w = i \cdot w'$.

This completes the proof of part (B) and, hence, the whole proof.

The previous result gives a PTIME algorithm to decide whether a minimal DFA recognizes a 2-PT language. To show that the problem is in NL, we need the following lemma providing a characterization of 2-PT languages that can be verified locally in nondeterministic logarithmic space.

Lemma 12. Let $\mathcal{A} = (Q, \Sigma, \cdot, i, F)$ be a DFA. The following conditions are equivalent:

1. For every $a \in \Sigma$ and every state $s$ such that $iw = s$ for some $w \in \Sigma^*$ with $|w|_a \ge 1$, $sua = saua$, for every $u \in \Sigma^*$.

2. For every $a \in \Sigma$ and every state $s$ such that $iw = s$ for some $w \in \Sigma^*$ with $|w|_a \ge 1$, $sba = saba$ for every $b \in \Sigma \cup \{\varepsilon\}$.

Proof. Condition 2 is a special case of Condition 1 for $u = b$. We prove the opposite direction by induction on the length of $u$. Let $a \in \mathrm{alph}(w)$ such that $iw = s$. If $u = \varepsilon$, we take $b = \varepsilon$; otherwise, $u = u'b$. By the induction hypothesis, we have $su'a = sau'a$. Thus $sua = su'ba = (su')ba = (su')aba = (su'a)ba = (sau'a)ba = (sau')ba = saua$.
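Condition 2 of Lemma 12 only involves words of length at most three from each relevant state, so it is easy to test deterministically. The sketch below is a plain polynomial-time check rather than the nondeterministic logarithmic-space procedure discussed in the text, and it does not test partial order or confluence, which Lemma 10 also requires. The function names and the dictionary encoding of a complete DFA with single-character letters are assumptions made for this illustration.

```python
from collections import deque


def states_after_letter(delta, initial, alphabet, a):
    """States i.w for words w that contain at least one occurrence of `a`."""
    start = (initial, False)
    seen = {start}
    queue = deque([start])
    while queue:
        q, has_a = queue.popleft()
        for c in alphabet:
            nxt = (delta[q][c], has_a or c == a)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return {q for q, has_a in seen if has_a}


def run(delta, q, word):
    """State reached from q by reading `word`."""
    for c in word:
        q = delta[q][c]
    return q


def condition_2(delta, initial, alphabet):
    """Check s.ba == s.aba for every b in the alphabet or b = epsilon."""
    for a in alphabet:
        for s in states_after_letter(delta, initial, alphabet, a):
            for b in list(alphabet) + [""]:
                if run(delta, s, b + a) != run(delta, s, a + b + a):
                    return False
    return True


# Hypothetical examples.
A_STAR_B = {  # {epsilon} | a*b over {a, b}, sink state 3
    0: {"a": 1, "b": 2},
    1: {"a": 1, "b": 2},
    2: {"a": 3, "b": 3},
    3: {"a": 3, "b": 3},
}
THREE_AS = {0: {"a": 1}, 1: {"a": 2}, 2: {"a": 3}, 3: {"a": 3}}  # L_aaa over {a}
```

The first example satisfies the condition, while the second does not: from state 1, reading `a` and reading `aa` lead to different states, in line with $aa \sim_2 aaa$.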

We can now formulate the main result of this paragraph.

Theorem 13. To decide whether a minimal DFA recognizes a 2-PT language is NL-complete.

Proof. To check whether a minimal DFA is not confluent or does not satisfy Condition 2 of Lemma 12 can be done in NL; the reader is referred to [5] for more details. Since NL = co-NL [13, 30], we have an NL algorithm to check 2-piecewise testability of a minimal DFA. NL-hardness follows from Lemma 14 below.

Lemma 14. For every $k \ge 2$, the $k$-PT problem is NL-hard.

Proof. To prove NL-hardness, we reduce the monotone graph accessibility problem (2MGAP), a special case of the graph reachability problem, known to be NL-complete [5]. An instance of 2MGAP is a graph $(G, s, g)$, where $G = (V, E)$ is a graph with the set of vertices $V = \{1, 2, \ldots, n\}$, the source vertex $s = 1$ and the target vertex $g = n$, the out-degree of each vertex is bounded by 2 and for all edges $(u, v)$, $v$ is greater than $u$ (the vertices are linearly ordered).

We construct the automaton $\mathcal{A} = (V \cup \{i, f_1, f_2, \ldots, f_{k-1}, d\}, \Sigma, \cdot, i, \{f_{k-1}\})$ as follows. For every edge $(u, v)$, we construct a transition $u \cdot a_{uv} = v$ over a fresh letter $a_{uv}$. Moreover, we add the transitions $i \cdot a = s$, $g \cdot a = f_1$ and $f_j \cdot a = f_{j+1}$, $j = 1, 2, \ldots, k-2$, over a fresh letter $a$. The automaton is deterministic, but not necessarily minimal, since some of the states may not be reachable from the initial state, or some states may be equivalent. To ensure minimality of the constructed automaton, we add, for each state $v \in V \setminus \{s\}$, new transitions from $i$ to $v$ under fresh letters, and for each state $v \in V \setminus \{g\}$, new transitions from $v$ to $f_{k-1}$ under fresh letters. All undefined transitions go to the sink state $d$.

Claim. The automaton $\mathcal{A}$ is deterministic and minimal, and $L(\mathcal{A})$ is finite.

Proof. By construction, all states are reachable from the initial state $i$ and can reach (except the sink state) the unique accepting state $f_{k-1}$. In addition, the automaton is deterministic and minimal, since every transition is labeled by a unique label (except for the transitions $ia = s$ and $g a^{k-1} = f_{k-1}$ labeled with the same letter), which makes the states non-equivalent. Finally, $L(\mathcal{A})$ is finite because the monotonicity of the graph $(G, s, g)$ implies that the automaton contains neither a cycle nor a self-loop (except in the sink state $d$).

The following claim is needed to complete the proof.

Claim 15. Let $w$ be a word over $\Sigma$. If every $a$ from $\Sigma$ appears at most once in $w$, that is, $|w|_a \le 1$, then the language $\{w\}$ is 2-PT.

Figure 4: The minimal DFA recognizing $L$

Proof. Since the language $\{w\}$ is PT, its minimal DFA is partially ordered and confluent. Then the condition of Lemma 10 is trivially satisfied, since, after the second occurrence of the same letter, the minimal DFA accepting $\{w\}$ is in the unique maximal non-accepting state.

We now finish the proof of Lemma 14 by showing that $L(\mathcal{A})$ is $k$-PT if and only if $g$ is not reachable from $s$.

Assume that $g$ is reachable from $s$. Let $w$ be a sequence of labels of a path from $s$ to $g$ in $\mathcal{A}$. Then $a w a^{k-1}$ belongs to $L(\mathcal{A})$ and $a w a^k$ does not. However, $a w a^{k-1} \sim_k a w a^k$, which proves that $L(\mathcal{A})$ is not $k$-PT.

If $g$ is not reachable from $s$, then $L(\mathcal{A}) = \{a u_1, a u_2, \ldots, a u_\ell, u_{\ell+1}, \ldots, u_{\ell+t}\} \cup \{w_1 a^{k-1}, w_2 a^{k-1}, \ldots, w_m a^{k-1}\}$, where $u_i$ and $w_i$ are words over $\Sigma \setminus \{a\}$ that do not contain any letter twice. Then the first part is 2-PT by Claim 15, as well as the second part for $k = 2$. It remains to show that for any $k \ge 3$, the second part of $L(\mathcal{A})$ is $k$-PT. Assume that $w_j a^{k-1} \sim_k w$, for some $1 \le j \le m$ and $w \in \Sigma^*$. Then $w = v_1 a v_2 a \cdots a v_k$ for some $v_1, v_2, \ldots, v_k$ such that $|v_1 \cdots v_k|_a = 0$. Since $|w_j|_a = 0$ and, for any letter $c$ of $v_2 \cdots v_{k-1}$ (resp. $v_k$), the word $aca$ (resp. $a^{k-1} c$) would have to be embedded into $w_j a^{k-1}$, that is, into $a^{k-1}$, which is impossible, we have that $v_2 \cdots v_k = \varepsilon$, i.e., $w = v_1 a^{k-1}$. Since $w_j a^{k-1} \sim_k v_1 a^{k-1}$, we have that $w_j a \sim_k v_1 a$, hence $w_j a = v_1 a$, and $w_j a^{k-1}$ and $w$ end up in the same state, which concludes the proof.
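The construction from the proof of Lemma 14 and the witness pair for the reachable case can be sketched as follows. This is an illustration under assumptions, not part of the proof: the fresh letters are modelled by hypothetical tuples such as `("edge", u, v)`, `("in", v)` and `("out", v)`, the sink state $d$ is represented by missing entries of the transition table, and words are tuples of letters. The names `build_automaton`, `accepts`, and `sub_k` are chosen here for illustration.

```python
def build_automaton(n, edges, k):
    """Partial transition table of the DFA built from a 2MGAP instance.

    Vertices are 1..n with source s = 1 and target g = n. Missing
    transitions stand for the sink state d.
    """
    s, g, final = 1, n, ("f", k - 1)
    delta = {}
    for u, v in edges:
        delta[(u, ("edge", u, v))] = v           # u . a_uv = v
    delta[("i", "a")] = s                        # i . a = s
    delta[(g, "a")] = ("f", 1)                   # g . a = f_1
    for j in range(1, k - 1):
        delta[(("f", j), "a")] = ("f", j + 1)    # f_j . a = f_{j+1}
    for v in range(1, n + 1):
        if v != s:
            delta[("i", ("in", v))] = v          # reachability from i
        if v != g:
            delta[(v, ("out", v))] = final       # co-reachability
    return delta, "i", final


def accepts(delta, initial, final, word):
    """Run the partial DFA; an undefined transition means the sink."""
    q = initial
    for letter in word:
        q = delta.get((q, letter))
        if q is None:
            return False
    return q == final


def sub_k(word, k):
    """Scattered subwords of length at most k of a tuple of letters."""
    subs = {()}
    for letter in word:
        subs |= {u + (letter,) for u in subs if len(u) < k}
    return frozenset(subs)
```

For a graph in which $g$ is reachable from $s$ along a path labelled $w$, the word $a w a^{k-1}$ is accepted, $a w a^k$ is rejected, and both have the same value of `sub_k`, which is the witness used in the proof.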

Remark 16 (on 1-PT and 2-PT). It was shown by Blanchet-Sadri [3] that 1-PT languages are characterized as the languages whose syntactic monoids satisfy the equations $x = x^2$ and $xy = yx$, and 2-PT languages are characterized as those whose syntactic monoids satisfy the equations $xyzx = xyxzx$ and $(xy)^2 = (yx)^2$. It can be seen that these equations could be directly used to achieve NL algorithms. Our characterizations, however, improve these results and show that, for 1-PT languages, it is sufficient to verify the equations $x = x^2$ and $xy = yx$ on letters (generators), and that, for 2-PT languages, equation $xyzx = xyxzx$ can be verified on letters (generators) up to the element $y$, which is a word (a general element of the monoid). Our results thus decrease the complexity of the problems. In addition, the partial order and (local) confluency can be checked instead of the equation $(xy)^2 = (yx)^2$.

The following example demonstrates an application of our characterization lemmas.

Example 17. Consider the language $L$ recognized by the minimal DFA depicted in Figure 4. By [19, 31] or Theorem 25 below, $L$ is piecewise testable. Since the depth of the DFA is 3, $L$ is 3-PT [19]. Using Lemmas 5 and 12, the reader can verify that the language is 2-PT but not 1-PT. Furthermore, the technique studied in this paper and demonstrated in Section 3 results in $L = [aab] \cup [aabb] \cup [aaba] \cup [abba]$, where (using Lemma 3) $[aab] = L_{aa} \cap L_{ab} \cap \overline{L_{ba}} \cap \overline{L_{bb}}$, $[aabb] = L_{aa} \cap L_{ab} \cap L_{bb} \cap \overline{L_{ba}}$, $[aaba] = L_{aa} \cap L_{ab} \cap L_{ba} \cap \overline{L_{bb}}$, and $[abba] = L_{aa} \cap L_{ab} \cap L_{ba} \cap L_{bb}$. By the standard De Morgan's laws,

$L = L_{aa} \cap L_{ab} \cap \big( (\overline{L_{ba}} \cap \overline{L_{bb}}) \cup (\overline{L_{ba}} \cap L_{bb}) \cup (L_{ba} \cap \overline{L_{bb}}) \cup (L_{ba} \cap L_{bb}) \big) = L_{aa} \cap L_{ab} \cap \{a,b\}^* = L_{aa} \cap L_{ab}.$

3-Piecewise Testability. In this paragraph, we make use of the known equations $(xy)^3 = (yx)^3$, $xzyxvxwy = xzxyxvxwy$ and $ywxvxyzx = ywxvxyxzx$ characterizing the variety of 3-PT languages [3] to show NL-completeness of the 3-piecewise testability problem. The hardness is shown in Lemma 14. For the membership, we make use of the closure of NL under complement. To show that one of these equations is not satisfied, we guess a fixed number of states (at most 18) and, step by step (in parallel), the transitions. For instance, to check that $xy = yx$ is not satisfied, we guess states $q, p_1, p_2, r_1, r_2$ such that (i) $r_1 \neq r_2$ and (ii) $q \xrightarrow{x} p_1 \xrightarrow{y} r_1$ and $q \xrightarrow{y} p_2 \xrightarrow{x} r_2$. This requires several reachability checks, where we also ensure that the guessed paths from $q$ to $p_1$ and from $p_2$ to $r_2$ are under the same label, $x$, and similarly for the paths from $p_1$ to $r_1$ and from $q$ to $p_2$ under $y$. It can be done by guessing the transitions for the four-tuple of labels in parallel. Namely, in the first step, the algorithm guesses a tuple of transitions $(q \xrightarrow{a} p_1', q \xrightarrow{b} p_2', p_1 \xrightarrow{b} r_1', p_2 \xrightarrow{a} r_2')$, which ensures that the related path labels begin with the same letter. It then continues until the paths satisfying (ii) are found. This method can easily be extended to any such equation, thus we have the following.
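The parallel guessing of paths can be illustrated by a deterministic search over pairs of states that read the same word. The sketch below is not the NL procedure itself: it replaces nondeterministic guessing by breadth-first search and runs in polynomial time rather than logarithmic space. It assumes a complete DFA given as a dictionary `delta[(state, letter)]`; the function names and the two sample automata are hypothetical.

```python
from collections import deque

def pair_reach(delta, alphabet, start):
    """All pairs of states reachable from `start` when both components read the same word."""
    seen = {start}
    queue = deque([start])
    while queue:
        s, t = queue.popleft()
        for a in alphabet:
            nxt = (delta[s, a], delta[t, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def violates_commutativity(states, alphabet, delta):
    """True if there are a state q and words x, y with q.xy != q.yx."""
    for q in states:
        for p1 in states:
            for p2 in states:
                # x leads q to p1 and, in parallel, p2 to r2
                r2s = {t for s, t in pair_reach(delta, alphabet, (q, p2)) if s == p1}
                # y leads q to p2 and, in parallel, p1 to r1
                r1s = {t for s, t in pair_reach(delta, alphabet, (q, p1)) if s == p2}
                if any(r1 != r2 for r1 in r1s for r2 in r2s):
                    return True
    return False

# Sample DFA for "contains ab as a subword" (states 0, 1, 2): not commutative.
delta_ab = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
            (2, "a"): 2, (2, "b"): 2}
# Sample DFA for "contains the letter a" (states 0, 1): commutative.
delta_a = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}

ab_violates = violates_commutativity([0, 1, 2], "ab", delta_ab)
a_violates = violates_commutativity([0, 1], "ab", delta_a)
```

The same pattern extends to the longer equations by tracking more components in parallel, one per occurrence of each variable.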

Theorem 18. To decide whether a minimal DFA recognizes a 3-PT language is NL-complete.

Open Problem 19. Is there a better characterization for 3-PT languages similar to that of 1-PT and 2-PT languages?

5. Complexity of k-Piecewise Testability for NFAs

The $k$-piecewise testability problem for NFAs asks whether, given an NFA $A$, the language $L(A)$ is $k$-PT.

Theorem 20. The k-piecewise testability problem for NFAs is PSPACE-complete.

Proof. Hunt III and Rosenkrantz [12] have shown that a property $P$ of languages over $\{0,1\}$ such that (i) $P(\{0,1\}^*)$ is true and (ii) there exists a regular language that is not expressible as a quotient $x \backslash L$, for some $L$ for which $P(L)$ is true, is as hard to decide as "$= \{0,1\}^*$". Since $k$-piecewise testability is such a property (the class of $k$-PT languages is closed under quotient) and universality is PSPACE-hard for NFAs, the result implies that $k$-piecewise testability for NFAs is PSPACE-hard.

We now prove membership. To do this, we show a co-NP upper bound for DFAs and use it to prove the rest of the theorem. Let $w_1, w_2$ be two words such that $w_1 \preccurlyeq w_2$. Let $\varphi : \{1, 2, \dots, |w_1|\} \to \{1, 2, \dots, |w_2|\}$ be a monotonically increasing mapping induced by an embedding of $w_1$ into $w_2$, that is, the letter at the $j$th position in $w_1$ coincides with the letter at the $\varphi(j)$th position in $w_2$. Any such $\varphi$ is called a witness (of the embedding) of $w_1$ in $w_2$. If we speak about a letter $a$ of $w_2$ that does not belong to the range of $\varphi$, we mean an occurrence of $a$ in $w_2$ whose position does not belong to the range of $\varphi$.
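A witness can be computed greedily by matching each letter of $w_1$ to its leftmost available occurrence in $w_2$. The following is a minimal sketch of that idea; the function name `embedding_witness` is hypothetical, and the leftmost witness it returns is only one of possibly several witnesses.

```python
def embedding_witness(w1, w2):
    """Return a witness of w1 in w2 as a list of 1-based positions, or None.

    The j-th entry is phi(j): the position in w2 matched to the j-th letter of w1.
    """
    witness = []
    pos = 0
    for letter in w1:
        pos = w2.find(letter, pos)   # leftmost occurrence at or after pos
        if pos == -1:
            return None              # w1 is not a subword of w2
        witness.append(pos + 1)      # store as a 1-based position
        pos += 1                     # keep the mapping strictly increasing
    return witness

phi = embedding_witness("ab", "aabb")
# Positions of w2 outside the range of phi: the occurrences that could be dropped.
unused = [p for p in range(1, len("aabb") + 1) if p not in phi]
no_witness = embedding_witness("ba", "aabb")
```

The positions collected in `unused` are exactly the occurrences "not in the range of $\varphi$" in the sense defined above.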

Let $B$ be an NFA over $\Sigma$, and let $A$ be the minimal DFA obtained from $B$ by the standard subset construction and minimization.

Claim 21. If there are two words $w_1, w_2$ that are $k$-equivalent and lead to two different states from the initial state of $A$, such that $w_1$ is a subword of $w_2$, then there exists a $w_2'$ that is $k$-equivalent to $w_1$ leading to the same state as $w_2$ such that $w_2'$ contains at most $\mathrm{depth}(A)$ more letters than $w_1$.

Proof. Consider $w_1, w_2$ from the statement. Let $\varphi$ be a witness of $w_1$ in $w_2$. Let $a$ be a letter of $w_2$ that does not belong to the range of $\varphi$. We denote $w_2 = w_a a w_a^c$. If $i w_a a = i w_a$, then $i w_a w_a^c = i w_2$. Since $a \notin \mathrm{range}(\varphi)$, $w_1$ is a subword of $w_a w_a^c$. Thus, $\mathrm{sub}_k(w_1) \subseteq \mathrm{sub}_k(w_a w_a^c) \subseteq \mathrm{sub}_k(w_2)$, which proves that $w_1$ and $w_a w_a^c$ are $k$-equivalent. By induction on the number of letters in $w_2$ that do not belong to the range of the given witness of $w_1$ in $w_2$ and that do not trigger a change of state in $A$, one can show that there exists a word equivalent to $w_1$ and leading to the same state as $w_2$ that does not contain any such letter. Note that if $A$ were not acyclic, $L(B)$ would not be piecewise testable. This can be checked in PSPACE. Since in a run of an acyclic automaton there are at most $\mathrm{depth}(A)$ changes of states, this concludes the proof.

Claim 22. If $L(A)$ is not $k$-PT, there are two words $w_1, w_2$ such that (i) $w_1$ and $w_2$ are $k$-equivalent, (ii) the length of $w_1$ is at most $k|\Sigma|^k$, (iii) $w_1$ is a subword of $w_2$, and (iv) $w_1$ and $w_2$ lead to two different states from the initial state.

Proof. If $L(A)$ is not $k$-PT, then there are $w_1$ and $w_2$ that are $k$-equivalent and lead to two different states from the initial state. We show that for $i \in \{1, 2\}$, there exists $w_i'$ such that $w_i \sim_k w_i'$ and the length of $w_i'$ is at most $k|\Sigma|^k$. Let $w_i^j$ denote the prefix of $w_i$ of length $j$, for any $j$ smaller than the length of $w_i$. Assume that there exists $j$ such that $\mathrm{sub}_k(w_i^j) = \mathrm{sub}_k(w_i^{j+1})$. Then the letter at the $(j+1)$th position of $w_i$ can be removed while keeping the same set of subwords of length $k$. Thus there exists $w_i'$ equivalent to $w_i$ such that any two different prefixes of $w_i'$ are not $k$-equivalent. Since $\mathrm{sub}_k(w_i'^{\,j}) \subsetneq \mathrm{sub}_k(w_i'^{\,j+1})$, such a $w_i'$ contains at most $\sum_{n=1}^{k} |\Sigma|^n \le k|\Sigma|^k$ letters.
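The letter-removal argument can be sketched as a single left-to-right pass that keeps a letter only when it adds a new subword. This is an illustrative sketch, assuming $\mathrm{sub}_k$ collects all scattered subwords of length at most $k$; the names `sub_k` and `shrink` are hypothetical, and the brute-force subword enumeration is only meant for short words.

```python
from itertools import combinations

def sub_k(word, k):
    """All scattered subwords of `word` of length at most k."""
    return {"".join(word[p] for p in pos)
            for n in range(min(k, len(word)) + 1)
            for pos in combinations(range(len(word)), n)}

def shrink(word, k):
    """Drop every letter that does not enlarge sub_k of the prefix kept so far."""
    kept = ""
    seen = sub_k(kept, k)
    for letter in word:
        candidate = sub_k(kept + letter, k)
        if candidate != seen:        # the letter contributes a new subword
            kept += letter
            seen = candidate
    return kept

word = "abababab"
k = 2
short = shrink(word, k)
# Upper bound from the proof: sum of |Sigma|^n for n = 1..k, with Sigma = {a, b}.
bound = sum(2 ** n for n in range(1, k + 1))
```

Because $\sim_k$ is a congruence, dropping a letter that leaves the prefix in the same class keeps the whole word in the same class, so `short` stays $k$-equivalent to `word` while every kept letter strictly enlarges the subword set.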

To complete the proof, there are two cases. Either $w_1'$ and $w_2'$ lead to the same state: then, without loss of generality, $w_1'$ and $w_1$ lead to two different states, which proves the claim. Or $w_1'$ and $w_2'$ lead to two different states: then consider $w'$ such that $w' \sim_k w_1'$, and both $w_1'$ and $w_2'$ are subwords of $w'$, which exists by [25, Theorem 6.2.6]. Without loss of generality, $w_1'$ and $w'$ fulfill the required conditions.

Claim 23. The k-piecewise testability problem for DFAs belongs to co-NP.

Proof. One can first check whether $A$ recognizes a PT language. By Claim 22, if $L(A)$ is not $k$-PT, there exist two $k$-equivalent words $w_1$ and $w_2$, with the length of $w_1$ being at most $k|\Sigma|^k$, $w_1$ being a subword of $w_2$, and $w_1$ and $w_2$ leading the automaton to two different states. By Claim 21, one can choose $w_2$ of length at most $\mathrm{depth}(A)$ bigger than the length of $w_1$. A polynomial certificate for non-$k$-piecewise testability can thus be given by providing such $w_1$ and $w_2$, which are of polynomial length in the size of $A$ and $\Sigma$.

We now continue to prove the theorem. By Claim 23 and the fact that NPSPACE = PSPACE = co-PSPACE, we can guess and store a word $w_1$ of length at most $k|\Sigma|^k$ and enumerate and store all words of length at most $k$. There are $\sum_{i=1}^{k} |\Sigma|^i$ such words, which is polynomial, since $k$ is a constant. First, we mark all of these words that appear as subwords of $w_1$. Then we guess (letter by letter) a word $w_2$ such that $w_1$ is a subword of $w_2$ (which can be checked by keeping a pointer to $w_1$) and such that the length of $w_2$ is at most $|w_1| + 2^n = O(2^n)$, where $n$ is the number of states of the NFA. With each guess of the next letter of $w_2$, we correspondingly move all the pointers to all the stored subwords to keep track of all subwords of $w_2$. We accept if $w_1$ and $w_2$ have the same subwords, $w_1$ is a subword of $w_2$, and $w_1$ and $w_2$ lead the minimal DFA $A$ to two different states. Because of the space limits, the minimal DFA $A$ cannot be stored in memory, but must be simulated on-the-fly while the word $w_2$ is being guessed. The state of $A$ defined by $w_2$ can then be compared with the state defined by $w_1$.

Open Problem 24. What is the complexity of $k$-piecewise testability for NFAs if $k$ is given as input?

6. Piecewise Testability and the Depth of NFAs

We now generalize the structural automata characterization of Fact 2 to NFAs. Then we investigate the relationship between the depth of an NFA and the minimal $k$ for which its language is $k$-PT and show that the upper bound on $k$ given by the depth of the minimal DFA can be exponentially far from minimality.

6.1. The UMS property and NFAs

We say that an NFA $A$ over an alphabet $\Sigma$ is complete if for every state $q$ of $A$ and every letter $a \in \Sigma$, the set $q \cdot a$ is nonempty, that is, in every state, a transition under every letter is defined.

Theorem 25. A regular language is piecewise testable if and only if there exists a complete NFA that is partially ordered and satisfies the UMS property.

Proof. If a regular language is PT, then its minimal DFA is partially ordered and satisfies the UMS property by [31].

To prove the other direction, let $A = (Q, \Sigma, \cdot, I, F)$ be a complete partially ordered NFA that satisfies the UMS property. Let $D$ be the minimal DFA computed from $A$ by the standard subset construction and minimization. We represent every state of $D$ by a nonempty set of states of $A$.

Claim 26. The minimal DFA $D$ is partially ordered.

Proof. Let $X = \{p_1, p_2, \dots, p_n\}$ with $p_i < p_j$ for $i < j$ be a state of $D$, and let $w \in \Sigma^*$ be such that $X \cdot w = X$. By induction on $k = 1, 2, \dots, n$, we show that $p_i w = \{p_i\}$. Assume that $p_i w = \{p_i\}$ for all $i < k$. We prove it for $k$. Since $X = Xw = \bigcup_{i=1}^{n} p_i w$, $p_k \le p_k w$ and $p_i w = \{p_i\}$ for $i < k$, we have that $p_k \in p_k w$. Thus, $\mathrm{alph}(w) \subseteq \Sigma(p_k)$ and the UMS property of $A$ implies that $p_k w = \{p_k\}$. Therefore, $p_i a = \{p_i\}$ for every $a \in \mathrm{alph}(w)$ and $i = 1, 2, \dots, n$. If, for any state $Y$ of $D$ and any words $w_1$ and $w_2$, $X w_1 = Y$ and $Y w_2 = X$, the previous argument gives that $X = Y$, hence $D$ is partially ordered.

Claim. The minimal DFA $D$ satisfies the UMS property.

Proof. As $D$ is deterministic, for every state $X$ of $D$, $X$ is a maximal state of $G(D, \Sigma(X))$. Assume, for the sake of contradiction, that there exist two different states $X$ and $Y$ in the same component of $D$ that are maximal with respect to alphabet $\Sigma(X)$. That is, there exist a state $Z$ in $D$ and two words $u$ and $v$ over $\Sigma(X)$ such that $X = Zu$ and $Y = Zv$.

If $X \setminus Y \neq \emptyset$, let $x \in X \setminus Y$ and $z \in Z$ be such that $x \in zu$. Since $x$ does not belong to $Y$, we have that $x \notin zv$. Note that $zv \neq \emptyset$, since $A$ is complete. Let $y \in zv$ be fixed, but arbitrary. (If $X \setminus Y = \emptyset$, then there is $y \in Y \setminus X$. In this case, let $z \in Z$ be such that $y \in zv$. Then $y \notin zu$, $zu \neq \emptyset$, and we fix an arbitrary $x \in zu$.) In any case, $x \neq y$. Since $x \in X$, $y \in Y$, and $X$ and $Y$ are maximal with respect to $\Sigma(X)$, a similar argument as in Claim 26 shows that $xa = \{x\}$ and $ya = \{y\}$ for any $a \in \Sigma(X)$. Thus, we have that $\Sigma(X) \subseteq \Sigma(x) \cap \Sigma(y)$. By the UMS property of $A$, $x$ must be reachable from $y$ by $\Sigma(x)$, hence $y \le x$, and $y$ must be reachable from $x$ under $\Sigma(y)$, hence $x \le y$. Therefore, $y = x$, which is a contradiction.

Figure 5: Automata $A_2$ and $A_3$.

Thus, the minimal DFA $D$ is partially ordered and satisfies the UMS property. Fact 2 now completes the proof.

As it is PSPACE-complete to decide whether an NFA defines a PT language, it is PSPACE-complete to decide whether, given an NFA, there is an equivalent complete NFA that is partially ordered and satisfies the UMS property.

More details on these automata can be found in [22].

6.2. Exponential Gap between k-PT and the Depth of Minimal DFAs

It was shown in [19] that the depth of minimal DFAs does not correspond to the minimal $k$ for which the language is $k$-PT. Namely, an example of $(4\ell - 1)$-PT languages with the minimal DFA of depth $4\ell^2$, for $\ell > 1$, has been presented. We now show that there is an exponential gap between the minimal $k$ for which the language is $k$-PT and the depth of a minimal DFA.

Theorem 27. For every $n \ge 1$, there exists an $n$-PT language that is not $(n-1)$-PT, it is recognized by an NFA of depth $n - 1$, and the minimal DFA recognizing it has depth $2^n - 1$.

Proof. For every $k \ge 0$, we define the NFA $A_k = (\{0, 1, \dots, k\}, \{a_0, a_1, \dots, a_k\}, \cdot, I_k, \{0\})$ with $I_k = \{0, 1, \dots, k\}$ and the transition function $\cdot$ consisting of self-loops under $a_i$ in all states $j > i$ and transitions under $a_i$ from state $i$ to all states $j < i$. Formally, $i \cdot a_j = i$ if $k \ge i > j \ge 0$ and $i \cdot a_i = \{0, 1, \dots, i-1\}$ if $k \ge i \ge 1$. Automata $A_2$ and $A_3$ are shown in Figure 5. Note that $A_k$ is an extension of $A_{k-1}$, in particular, $L(A_{k-1}) \subseteq L(A_k)$.

We define the word $w_k$ inductively by $w_0 = a_0$ and $w_\ell = w_{\ell-1} a_\ell w_{\ell-1}$, for $0 < \ell \le k$. Note that $|w_\ell| = 2^{\ell+1} - 1$.

In [11], we have shown that every prefix of $w_k$ of odd length ends with $a_0$ and, therefore, does not belong to $L(A_k)$, while every prefix of even length belongs to $L(A_k)$. For convenience, we briefly recall the proof here. The empty word belongs to $L(A_0) \subseteq L(A_k)$. Let $v$ be a prefix of $w_k$ of even length. If $|v| < 2^k - 1$, then $v$ is a prefix of $w_{k-1}$ and, by the induction hypothesis, $v \in L(A_{k-1}) \subseteq L(A_k)$. If $|v| > 2^k - 1$, then $v = w_{k-1} a_k v'$. The definition of $A_k$ and the induction hypothesis then yield that there is a path $k \xrightarrow{w_{k-1}} k \xrightarrow{a_k} (k-1) \xrightarrow{v'} 0$. Thus, $v$ belongs to $L(A_k)$.
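The alternation between rejected odd-length prefixes and accepted even-length prefixes can be checked mechanically for small $k$. The sketch below simulates $A_k$ on the fly by tracking the set of current states. It assumes the reading of the definition in which state $i$ has a self-loop under every $a_j$ with $j < i$ and moves under $a_i$ to all smaller states; letters are encoded by their indices, and the function names are hypothetical.

```python
def step(states, letter):
    """One step of the subset simulation of A_k on letter a_letter."""
    nxt = set()
    for i in states:
        if letter < i:
            nxt.add(i)               # self-loop in state i under a_j, j < i
        elif letter == i:
            nxt.update(range(i))     # a_i moves state i to all states below i
        # for letter > i no transition is defined
    return frozenset(nxt)

def w(k):
    """The word w_k as a list of letter indices: w_0 = a_0, w_l = w_(l-1) a_l w_(l-1)."""
    return [0] if k == 0 else w(k - 1) + [k] + w(k - 1)

def accepts(k, word):
    """Membership in L(A_k): all states are initial, state 0 is accepting."""
    current = frozenset(range(k + 1))
    for letter in word:
        current = step(current, letter)
    return 0 in current

# Acceptance pattern of all prefixes of w_2, by prefix length 0..7.
pattern = [accepts(2, w(2)[:n]) for n in range(len(w(2)) + 1)]
```

For $k = 2$ the simulation visits eight pairwise different state sets along $w_2$, which matches the path of length $2^{k+1} - 1 = 7$ used in the depth argument.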

Let $\mathrm{det}(A_k)$ denote the minimal DFA recognizing the language $L(A_k)$ obtained from $A_k$ by the standard subset construction and minimization.

Claim. For every $k \ge 0$, the depth of $\mathrm{det}(A_k)$ is $2^{k+1} - 1$.

Proof. By induction on $k$. For $k = 0$, $\mathrm{det}(A_0) = (\{\{0\}, \emptyset\}, \{a_0\}, \cdot, \{0\}, \{0\})$ has two states, accepts the single word $\varepsilon$, and $a_0$ goes from the initial state $I_0 = \{0\}$ to the sink state $\emptyset$. Thus, it has depth 1 as required. Consider the word $w_k = w_{k-1} a_k w_{k-1}$ for $k > 0$. By the induction hypothesis, there exists a simple path of length $2^k - 1$ in $\mathrm{det}(A_{k-1})$ defined by the word $w_{k-1}$ starting from the initial state $I_{k-1} = \{0, 1, \dots, k-1\}$ and ending in state $\emptyset$. Let $Q_0, Q_1, \dots, Q_{2^k-1}$ denote the states of that simple path in the order they appear on the path, that is, $Q_0 = I_{k-1}$, $Q_{2^k-1} = \emptyset$, and $Q_i \subseteq Q_0$