
5.3 Complexity of the AMG preprocess

5.3.2 Sparse matrix-matrix multiplication

Since sparse matrix-matrix multiplication is a routine rarely found in matrix libraries, we briefly want to look at the way the product of two sparse matrices is computed in our library, as well as the corresponding time and space complexity.

Let $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{m\times k}$ be arbitrary sparse matrices. As described in Section 3.7.2, the sparsity structure of $AB$ is computed and stored in a temporary data structure.

Generation of the product sparsity pattern

The algorithm for multiplying two sparse matrices depends on their ordering and orientation. For example, for two row-wise oriented matrices, where we only need random access to the rows of $B$, the most efficient ordering of the loops is depicted in Algorithm 3.

Algorithm 3: Sparse matrix multiplication for two row-wise oriented matrices
Input: Row-wise oriented sparse matrices $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{m\times k}$
Output: Sets $\bar N_i(C)$, $i = 1,\dots,n$, representing the sparsity pattern of the product $AB$
for $i = 1,\dots,n$ do
    for $j \in \bar N_i(A)$ do
        for $l \in \bar N_j(B)$ do
1:          $\bar N_i(C) := \bar N_i(C) \cup \{l\}$
2:          $c_{il} := c_{il} + a_{ij} b_{jl}$
        end
    end
end

The problems with, and requirements for, the set $\bar N_i(C)$ in line 1 are the following:

1. We don’t know in advance how big the set is going to be.

2. The data structure that implements the set has to ensure that it doesn’t store duplicates of an index $l$ (stemming from different rows $j$ of $B$).

3. At the end of the procedure, the indices in the data structure for $\bar N_i(C)$ should be sorted in increasing order.

Two possibilities come into question. One would be to use linked lists with ordered insertion: the new insert position is determined by a binary search, leading to insertion costs of approximately $O(\log(\#\bar N_i(C)))$. Another would be to use red-black trees.

We decided for the latter, since the run-time for inserting is approximately the same as for linked lists³; however, it promised to be easier to implement, since we can directly use a std::set resp. std::map from the C++ STL. These data structures are implemented as red-black trees, and thus inserting a variable $l$ into the set $\bar N_i(C)$ costs $O(\log(\#\bar N_i(C)))$. The operator[]() of std::map allows us to combine the two lines 1 and 2 in Algorithm 3 into one line of code.
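As an illustration, a minimal C++ sketch of Algorithm 3 with this fusion of lines 1 and 2 could look as follows. The type aliases and the function name are ours for the example, not the library's:

```cpp
#include <map>
#include <vector>

// One sparse row as a sorted (column index -> value) map; a whole matrix
// as a vector of such rows (row-wise orientation).
using SparseRow    = std::map<int, double>;
using SparseMatrix = std::vector<SparseRow>;

// Algorithm 3 with lines 1 and 2 fused: operator[] inserts the column
// index l into the red-black tree of row i of C if it is absent
// (value-initialized to 0.0) and returns a reference to accumulate into.
SparseMatrix multiply_row_row(const SparseMatrix& A, const SparseMatrix& B)
{
    SparseMatrix C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)      // rows of A
        for (const auto& [j, a_ij] : A[i])          // j in N_i(A)
            for (const auto& [l, b_jl] : B[j])      // l in N_j(B)
                C[i][l] += a_ij * b_jl;             // lines 1 and 2 in one
    return C;
}
```

Since std::map keeps its keys sorted and unique, requirements 2 and 3 above are satisfied automatically, and no size needs to be known in advance (requirement 1).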

In the worst case, the sets $\bar N_j(B)$ are disjoint for all $j \in \bar N_i(A)$. Then the number of variables added to $\bar N_i(C)$ in row $i$ is $m_i := \sum_{j \in \bar N_i(A)} \#\bar N_j(B)$, and every one of these insertions pays the full logarithmic cost.

Lemma 5.3.6. The time complexity of Algorithm 3 is bounded above by

$O\!\left(\log\!\left(\prod_{i=1}^{n} m_i!\right)\right)$. (5.64)

Proof. With $m_i$ being the number of variables added in the two inner loops for each row $i$ of $A$, we immediately have

$\sum_{i=1}^{n} \sum_{k=1}^{m_i} O(\log k) = \sum_{i=1}^{n} O(\log(m_i!)) = O\!\left(\log\!\left(\prod_{i=1}^{n} m_i!\right)\right).$

Note that the space requirement is $O\!\left(\sum_{i=1}^{n} m_i\right)$ integer values.

In this thesis we are especially interested in the complexity of the Galerkin product $P^T A P$ (cf. Definition 5.2.1), where $A \in \mathbb{R}^{n\times n}$ is a matrix stemming from a finite element discretization and $P \in \mathbb{R}^{n\times k}$, $k < n$, the corresponding prolongation matrix.

From now on, we abbreviate $\bar N_j := \bar N_j(A)$, $C_j := C \cap \bar N_j$ and $F_j := F \cap \bar N_j$. If we consider direct interpolation only, then we have that $P_j \subset \bar N_j$ for $j \in F$. For $j \in C$ the set $P_j$ even consists of only one element. If the splitting Algorithm 2 was applied successfully, i.e. after its termination the set $U$ is empty, then all $F$-variables are strongly negatively coupled with at least one $C$-variable. In any case, since $F$-variables are interpolated through surrounding $C$-variables, we have that

$\#\bar N_j(P) = \#P_j \le p_{\max} \le r_C$ for $j \in F$, which yields

$m_i = \#C_i + \sum_{j \in F_i} \#P_j \le r_C + r_F\,p_{\max} \le r_C + r_F r_C < r_{\max} + r_{\max}^2.$

Inserting this in the result from the last lemma leads to an amount of

$O\!\left(n \log\!\left((r_{\max} + r_{\max}^2)!\right)\right)$ (5.65)

for an upper bound. Taking a closer look at $m_i$ and setting $\tilde r := r_{\max} + \varepsilon = r_C + r_F$, with some $\varepsilon > 0$, we might also estimate

$r_C + r_F r_C = \tilde r q + \tilde r^2 (q - q^2) < (\tilde r + \tilde r^2)\, q,$

with $q = \frac{r_C}{\tilde r}$. In practice, we often have $\tilde r \approx r_{\max}$ and the quotient $q$ then is an indicator for the amount of coarsening. In this case, (5.65) becomes

$O\!\left(n \log\!\left(((r_{\max} + r_{\max}^2)\,q)!\right)\right) = O(n\rho\log\rho)$ (5.66)

with the constant

$\rho = (r_{\max} + r_{\max}^2)\,q.$ (5.67)
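To get a feeling for the size of this constant, consider the hypothetical values $r_{\max} = 9$ (e.g. a nine-point stencil in 2D) and a coarsening quotient $q = \tfrac14$; these numbers are illustrative only, not taken from the measurements in this thesis:

```latex
\rho = (r_{\max} + r_{\max}^2)\, q = (9 + 81)\cdot\tfrac{1}{4} = 22.5 ,
```

so the bound (5.66) remains linear in $n$ with a moderate constant.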

Now we would like to take a look at the product $P^T B$, where $B = AP$ is already computed. Since we store the transpose of $P$ as a column-wise oriented matrix, we first examine the general algorithm for generating such a sparsity pattern. In order to formulate the method, we use the transpose neighbourhood (cf. 5.2.16).

Definition 5.3.7 (Transpose neighbourhood). For $A \in \mathbb{R}^{n\times m}$ we define the transpose neighbourhood of a variable $i \in \Omega = \{1,\dots,m\}$ as

$\bar N_i^T = \bar N_i^T(A) := \{\, j \in \{1,\dots,n\} \mid a_{ji} \neq 0 \,\}.$

Of course, $\bar N_i^T(A) = \bar N_i(A^T)$. The following procedure varies only in the way it exploits the iteration directions of its matrices; mathematically it is the same as the last one.

Algorithm 4: Sparse matrix multiplication for column-wise/row-wise oriented matrices
Input: One column-wise oriented sparse matrix $A \in \mathbb{R}^{n\times m}$,
one row-wise oriented sparse matrix $B \in \mathbb{R}^{m\times k}$
Output: Sets $\bar N_j(C)$, $j = 1,\dots,n$, representing the sparsity pattern of the product $C = AB$
for $i = 1,\dots,m$ do
    for $j \in \bar N_i^T(A)$ do
        for $l \in \bar N_i(B)$ do
            $\bar N_j(C) := \bar N_j(C) \cup \{l\}$
            $c_{jl} := c_{jl} + a_{ji} b_{il}$
        end
    end
end
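A minimal C++ sketch of Algorithm 4, again with illustrative names, assuming a column-wise oriented matrix is stored as one std::map per column holding the pairs $(j, a_{ji})$, i.e. exactly the transpose neighbourhoods:

```cpp
#include <map>
#include <vector>

using SparseRow       = std::map<int, double>;
using SparseMatrix    = std::vector<SparseRow>;  // one map per row
using ColSparseMatrix = std::vector<SparseRow>;  // one map per column:
                                                 // column i holds (j, a_ji),
                                                 // i.e. the set N_i^T(A)

// Algorithm 4: loop over the shared index i; every pair (j, a_ji) from
// column i of A combines with every pair (l, b_il) from row i of B and
// contributes a_ji * b_il to c_jl.
SparseMatrix multiply_col_row(const ColSparseMatrix& A, std::size_t n_rows,
                              const SparseMatrix& B)
{
    SparseMatrix C(n_rows);
    for (std::size_t i = 0; i < A.size(); ++i)   // i = 1..m
        for (const auto& [j, a_ji] : A[i])       // j in N_i^T(A)
            for (const auto& [l, b_il] : B[i])   // l in N_i(B)
                C[j][l] += a_ji * b_il;          // pattern and value at once
    return C;
}
```

Note that the rows of $C$ are now filled in an interleaved fashion, but the std::map per row keeps each $\bar N_j(C)$ sorted and duplicate-free regardless of the insertion order.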

As said above, similar considerations as for Algorithm 3 lead to the same costs for this algorithm, since merely the loops are interchanged. If we assume that the left matrix has at least one entry in every row, then for $P^T B$, $B = AP$, this leads to a complexity of

$\sum_{i=1}^{k} O(\log(k_i!)) \quad \text{with} \quad k_i := \sum_{j \in \bar N_i^T(P)} \#\bar N_j(AP).$

The question now is: how big is the set $\bar N_i^T(P) = \bar N_i(P^T)$? Closely related is: how many $F$-variables interpolate from $i$? This number is bounded above by $r_F$. Thus we have

$\#\bar N_i^T(P) \le r_F + 1,$

and therefore

$k_i = \sum_{j \in \bar N_i^T(P)} m_j \le (r_F + 1)(r_C + r_F r_C) = (r_F + 1)^2\, r_C.$

And so the complexity is of order

$O\!\left(k \log\!\left(((r_F + 1)^2\, r_C)!\right)\right),$ or in terms of $\tilde r$ and $q$:

$O(k \log(r!)), \quad \text{with } r = (\tilde r(1-q) + 1)^2\, q\,\tilde r.$

And because of

$r \le \tilde r^3\left(1 - (1-q)^3\right),$

we have the following result.

Lemma 5.3.8. The time complexity of Algorithm 4 for generating the matrix product $P^T B$ with $B = AP$, $A \in \mathbb{R}^{n\times n}$, where $P \in \mathbb{R}^{n\times k}$ is a prolongation matrix, is bounded above by

$O(k\rho\log\rho)$ (5.68)

with the constant $\rho := \tilde r^3\left(1 - (1-q)^3\right)$.
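The inequality $r \le \tilde r^3(1 - (1-q)^3)$ used here can be verified by expanding both sides; a short check, under the additional assumption $\tilde r \ge 2$ and writing $s := 1-q$:

```latex
1 - s^3 = (1-s)(1+s+s^2) = q\,(1+s+s^2),
\qquad\text{so}\qquad
\tilde r^3\bigl(1-(1-q)^3\bigr) = q\,\tilde r^3\,(1+s+s^2),
```

while $r = q\,\tilde r\,(\tilde r s + 1)^2 = q\,\tilde r\,(\tilde r^2 s^2 + 2\tilde r s + 1)$. Comparing term by term, $\tilde r^2 s^2 \le \tilde r^2 s^2$, $2\tilde r s \le \tilde r^2 s$ and $1 \le \tilde r^2$ for $\tilde r \ge 2$, which proves the bound.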

Since $q < 1$ and $k = n \cdot \frac{\#C}{\#C + \#F}$, we can summarize the results in

Proposition 5.3.9 (Complexity of the Galerkin product). The complexity of the Galerkin product $P^T A P$ as defined in (5.16), with $A \in \mathbb{R}^{n\times n}$ and $P \in \mathbb{R}^{n\times k}$, is of order

$O(n\,\tilde r^3 \log \tilde r^3).$ (5.69)

Numerical Results