
5.3 Complexity of the AMG preprocess

5.3.2 Sparse matrix-matrix multiplication

Since sparse matrix-matrix multiplication is a routine rarely found in matrix libraries, we briefly want to look at the way the product of two sparse matrices is computed in our library, as well as the corresponding time and space complexity.

Let $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{m\times k}$ be arbitrary sparse matrices. As described in Section 3.7.2, the sparsity structure of $AB$ is computed and stored in a temporary data structure.

Generation of the product sparsity pattern

The algorithm for multiplying two sparse matrices depends on their ordering and orientation. For example, for two row-wise oriented matrices, where we only need random access to the rows of $B$, the most efficient ordering of the loops is depicted in Algorithm 3.

Algorithm 3: Sparse matrix multiplication for two row-wise oriented matrices
Input: Row-wise oriented sparse matrices $A \in \mathbb{R}^{n\times m}$, $B \in \mathbb{R}^{m\times k}$
Output: Sets $\bar N_i(C)$, $i = 1,\dots,n$, representing the sparsity pattern of the product $AB$
for $i = 1,\dots,n$ do
    for $j \in \bar N_i(A)$ do
        for $l \in \bar N_j(B)$ do
1:          $\bar N_i(C) := \bar N_i(C) \cup \{l\}$
2:          $c_{il} := c_{il} + a_{ij} b_{jl}$
        end
    end
end

The problems with, and requirements for, the set $\bar N_i(C)$ in line 1 are the following:

1. We don’t know in advance how big the set is going to be.

2. The data structure that implements the set has to ensure that it doesn’t store duplicates of an index $l$ (stemming from different rows $j$ of $B$).

3. At the end of the procedure, the indices in the data structure for $\bar N_i(C)$ should be sorted in increasing order.

Two possibilities come into question. One would be to use linked lists with ordered insertion: the new insert position is determined by a binary search, leading to insertion costs of approximately $O(\log(\#\bar N_i(C)))$. Another would be to use red-black trees.

We decided for the latter, since the run-time for inserting is approximately the same as for linked lists³; however, it promised to be easier to implement, since we can directly use a std::set resp. std::map from the C++ STL. These data structures are implemented as red-black trees, and thus inserting a variable $l$ into the set $\bar N_i(C)$ costs $O(\log(\#\bar N_i(C)))$. The operator[]() of std::map allows us to combine the two lines 1 and 2 in Algorithm 3 into one line of code.
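As an illustration, a minimal C++ sketch of Algorithm 3 with this fusion of lines 1 and 2 could look as follows. The type aliases and the function name are ours for the example, not the library's:

```cpp
#include <map>
#include <vector>

// One sparse row as a sorted (column index -> value) map; a whole matrix
// as a vector of such rows (row-wise orientation).
using SparseRow    = std::map<int, double>;
using SparseMatrix = std::vector<SparseRow>;

// Algorithm 3 with lines 1 and 2 fused: operator[] inserts the column
// index l into the red-black tree of row i of C if it is absent
// (value-initialized to 0.0) and returns a reference to accumulate into.
SparseMatrix multiply_row_row(const SparseMatrix& A, const SparseMatrix& B)
{
    SparseMatrix C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)      // rows of A
        for (const auto& [j, a_ij] : A[i])          // j in N_i(A)
            for (const auto& [l, b_jl] : B[j])      // l in N_j(B)
                C[i][l] += a_ij * b_jl;             // lines 1 and 2 in one
    return C;
}
```

Since std::map keeps its keys sorted and unique, requirements 2 and 3 above are satisfied automatically, and no size needs to be known in advance (requirement 1).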

In the worst case, the sets $\bar N_j(B)$ are disjoint for all $j \in \bar N_i(A)$. Then the number of variables added to $\bar N_i(C)$ in row $i$ is $m_i := \sum_{j \in \bar N_i(A)} \#\bar N_j(B)$, and every one of these insertions pays the full logarithmic cost.

Lemma 5.3.6. The time complexity of Algorithm 3 is bounded above by

$O\!\left(\log\!\left(\prod_{i=1}^{n} m_i!\right)\right)$. (5.64)

Proof. With $m_i$ being the number of variables added in the two inner loops for each row $i$ of $A$, we immediately have

$\sum_{i=1}^{n} \sum_{k=1}^{m_i} O(\log k) = \sum_{i=1}^{n} O(\log(m_i!)) = O\!\left(\log\!\left(\prod_{i=1}^{n} m_i!\right)\right).$

Note that the space requirement is $O\!\left(\sum_{i=1}^{n} m_i\right)$ integer values.

In this thesis we are especially interested in the complexity of the Galerkin product $P^T A P$ (cf. Definition 5.2.1), where $A \in \mathbb{R}^{n\times n}$ is a matrix stemming from a finite element discretization and $P \in \mathbb{R}^{n\times k}$, $k < n$, the corresponding prolongation matrix.

From now on, we abbreviate $\bar N_j := \bar N_j(A)$, $C_j := C \cap \bar N_j$ and $F_j := F \cap \bar N_j$. If we consider direct interpolation only, then we have that $P_j \subset \bar N_j$ for $j \in F$. For $j \in C$ the set $P_j$ even consists of only one element. If the splitting Algorithm 2 was applied successfully, i.e. after its termination the set $U$ is empty, then all $F$-variables are strongly negatively coupled with at least one $C$-variable. In any case, since $F$-variables are interpolated through surrounding $C$-variables, we have that

$\#\bar N_j(P) = \#P_j \le p_{\max} \le r_C$ for $j \in F$, which yields

$m_i = \#C_i + \sum_{j \in F_i} \#P_j \le r_C + r_F\,p_{\max} \le r_C + r_F r_C < r_{\max} + r_{\max}^2.$

Inserting this in the result from the last lemma leads to an amount of

$O\!\left(n \log\!\left((r_{\max} + r_{\max}^2)!\right)\right)$ (5.65)

for an upper bound. Taking a closer look at $m_i$ and setting $\tilde r := r_{\max} + \varepsilon = r_C + r_F$, with some $\varepsilon > 0$, we might also estimate

$r_C + r_F r_C = \tilde r q + \tilde r^2 (q - q^2) < (\tilde r + \tilde r^2)\, q,$

with $q = \frac{r_C}{\tilde r}$. In practice, we often have $\tilde r \approx r_{\max}$ and the quotient $q$ then is an indicator for the amount of coarsening. In this case, (5.65) becomes

$O\!\left(n \log\!\left(((r_{\max} + r_{\max}^2)\,q)!\right)\right) = O(n\rho\log\rho)$ (5.66)

with the constant

$\rho = (r_{\max} + r_{\max}^2)\,q.$ (5.67)
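To get a feeling for the size of this constant, consider the hypothetical values $r_{\max} = 9$ (e.g. a nine-point stencil in 2D) and a coarsening quotient $q = \tfrac14$; these numbers are illustrative only, not taken from the measurements in this thesis:

```latex
\rho = (r_{\max} + r_{\max}^2)\, q = (9 + 81)\cdot\tfrac{1}{4} = 22.5 ,
```

so the bound (5.66) remains linear in $n$ with a moderate constant.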

Now we would like to take a look at the product $P^T B$, where $B = AP$ is already computed. Since we store the transpose of $P$ as a column-wise oriented matrix, we first examine the general algorithm for generating such a sparsity pattern. In order to formulate the method, we use the transpose neighbourhood (cf. 5.2.16).

Definition 5.3.7 (Transpose neighbourhood). For $A \in \mathbb{R}^{n\times m}$ we define the transpose neighbourhood of a variable $i \in \Omega = \{1,\dots,m\}$ as

$\bar N_i^T = \bar N_i^T(A) := \{\, j \in \{1,\dots,n\} \mid a_{ji} \neq 0 \,\}.$

Of course, $\bar N_i^T(A) = \bar N_i(A^T)$. The following procedure varies only in the way it exploits the iteration directions of its matrices; mathematically it is the same as the last one.

Algorithm 4: Sparse matrix multiplication for column-wise/row-wise oriented matrices
Input: One column-wise oriented sparse matrix $A \in \mathbb{R}^{n\times m}$,
one row-wise oriented sparse matrix $B \in \mathbb{R}^{m\times k}$
Output: Sets $\bar N_j(C)$, $j = 1,\dots,n$, representing the sparsity pattern of the product $C = AB$
for $i = 1,\dots,m$ do
    for $j \in \bar N_i^T(A)$ do
        for $l \in \bar N_i(B)$ do
            $\bar N_j(C) := \bar N_j(C) \cup \{l\}$
            $c_{jl} := c_{jl} + a_{ji} b_{il}$
        end
    end
end
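A minimal C++ sketch of Algorithm 4, again with illustrative names, assuming a column-wise oriented matrix is stored as one std::map per column holding the pairs $(j, a_{ji})$, i.e. exactly the transpose neighbourhoods:

```cpp
#include <map>
#include <vector>

using SparseRow       = std::map<int, double>;
using SparseMatrix    = std::vector<SparseRow>;  // one map per row
using ColSparseMatrix = std::vector<SparseRow>;  // one map per column:
                                                 // column i holds (j, a_ji),
                                                 // i.e. the set N_i^T(A)

// Algorithm 4: loop over the shared index i; every pair (j, a_ji) from
// column i of A combines with every pair (l, b_il) from row i of B and
// contributes a_ji * b_il to c_jl.
SparseMatrix multiply_col_row(const ColSparseMatrix& A, std::size_t n_rows,
                              const SparseMatrix& B)
{
    SparseMatrix C(n_rows);
    for (std::size_t i = 0; i < A.size(); ++i)   // i = 1..m
        for (const auto& [j, a_ji] : A[i])       // j in N_i^T(A)
            for (const auto& [l, b_il] : B[i])   // l in N_i(B)
                C[j][l] += a_ji * b_il;          // pattern and value at once
    return C;
}
```

Note that the rows of $C$ are now filled in an interleaved fashion, but the std::map per row keeps each $\bar N_j(C)$ sorted and duplicate-free regardless of the insertion order.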

As said above, similar considerations as for Algorithm 3 lead to the same costs for this algorithm, since merely the loops are interchanged. If we assume that the left matrix has at least one entry in every row, then for $P^T B$, $B = AP$, this leads to a complexity of

$\sum_{i=1}^{k} O(\log(k_i!)) \quad \text{with} \quad k_i := \sum_{j \in \bar N_i^T(P)} \#\bar N_j(AP).$

The question now is: how big is the set $\bar N_i^T(P) = \bar N_i(P^T)$? Closely related is: how many $F$-variables interpolate from $i$? This number is bounded above by $r_F$. Thus we have

$\#\bar N_i^T(P) \le r_F + 1,$

and therefore

$k_i = \sum_{j \in \bar N_i^T(P)} m_j \le (r_F + 1)(r_C + r_F r_C) = (r_F + 1)^2\, r_C.$

And so the complexity is of order

$O\!\left(k \log\!\left(((r_F + 1)^2\, r_C)!\right)\right),$ or in terms of $\tilde r$ and $q$:

$O(k \log(r!)), \quad \text{with } r = (\tilde r(1-q) + 1)^2\, q\,\tilde r.$

And because of

$r \le \tilde r^3\left(1 - (1-q)^3\right),$

we have the following result.

Lemma 5.3.8. The time complexity of Algorithm 4 for generating the matrix product $P^T B$ with $B = AP$, $A \in \mathbb{R}^{n\times n}$, where $P \in \mathbb{R}^{n\times k}$ is a prolongation matrix, is bounded above by

$O(k\rho\log\rho)$ (5.68)

with the constant $\rho := \tilde r^3\left(1 - (1-q)^3\right)$.
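The inequality $r \le \tilde r^3(1 - (1-q)^3)$ used here can be verified by expanding both sides; a short check, under the additional assumption $\tilde r \ge 2$ and writing $s := 1-q$:

```latex
1 - s^3 = (1-s)(1+s+s^2) = q\,(1+s+s^2),
\qquad\text{so}\qquad
\tilde r^3\bigl(1-(1-q)^3\bigr) = q\,\tilde r^3\,(1+s+s^2),
```

while $r = q\,\tilde r\,(\tilde r s + 1)^2 = q\,\tilde r\,(\tilde r^2 s^2 + 2\tilde r s + 1)$. Comparing term by term, $\tilde r^2 s^2 \le \tilde r^2 s^2$, $2\tilde r s \le \tilde r^2 s$ and $1 \le \tilde r^2$ for $\tilde r \ge 2$, which proves the bound.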

Since $q < 1$ and $k = n \cdot \frac{\#C}{\#C + \#F}$, we can summarize the results in

Proposition 5.3.9 (Complexity of the Galerkin product). The complexity of the Galerkin product $P^T A P$ as defined in (5.16), with $A \in \mathbb{R}^{n\times n}$ and $P \in \mathbb{R}^{n\times k}$, is of order

$O(n\,\tilde r^3 \log \tilde r^3).$ (5.69)

Numerical Results