4. Mathematical and Statistical Description of the Adjustment Problem
4.3 Numbering Schemes and Reordering
4.3. Numbering Schemes and Reordering 39
Index Vector for Reordering A first quantity to derive is the so called index vectoripppf7→pppt, the entry at position i contains the index, the coefficient pppt(i) can be found inpppf, and thus pppt(i) = p
p
pf(ipppf7→pppt(i)). Given a matrix Ainpppf, its rows are reordered topppt via
Apppt =Apppf(ipppf7→pppt,:), (4.23)
its columns via
Apppt =Apppf(:,ipppf7→pppt), (4.24)
and for quadratic matrices rows and columns via
Apppt =Apppf(ipppf7→pppt,ipppf7→pppt), (4.25) using the well known MatLab/Octave like notation. The index vectoripppf7→pppt can be computed with the descriptive Alg. 4.1. which works for both cases, i.e. pppf⊆pppt orpppt⊆pppf.
Note that if pppf ⊆ pppt, the matrix to be reordered will be extended by 0 rows and/or columns to obtain the correct size ofpppt.size(), the index vector for this parameters not originally contained in pppf are set to values corresponding to these 0 rows and/or columns at the end of the matrices (variable ein Alg. 4.1).
For the case pppt ⊆ pppf, the index vector is extended to the dimension of pppf. The last entries i.e.
p p
pf.size()topppt.size()−1are set to the indices of the entry itself to guarantee, that the coefficients not contained in pppt are reordered to the end of the matrix. Applying the reordering to a matrix, these last rows/columns are deleted after the reordering.
Although Alg. 4.1 can be easily understood, it is not very efficient due to the “find” operation, i.e.
a search operation in a vector which might have millions of entries for huge dimensional parameter spaces. An alternative, i.e. a more efficient way can be performed via sorting the numbering scheme.
This much faster algorithm is summarized in Alg. 4.2. It reduces the complexity from O(n2/2)to O(nlog(n)). For a test case of two numbering schemes of520 000parameters, the runtime is reduced from 706 sto 0.2 s.
Permutation Vector for Reordering An alternative notation/operation, which is better suited for the block-cyclic distributed matrices, is a so called permutation or pivoting vector. In contrast to an index vector, where the column and or row interchanges are assumed to be performed si-multaneously, a permutation vector (at least as used here) contains a sequence of serial row and column permutations i.e. a sequential swapping of two rows/columns starting at begin (index 0) of the vector. In contrast to the index vector, already performed swapping operations are taken into account in the representation. An entry in the permutation vector at position i means that the row i is swapped with row ψψψ(i). To be more precise, the current content of row/column i is swapped with the current content of row/column ψψψ(i). Note that the content might change with every swapping operation. Now, the old entry of position i is in row ψψψ(i), thus the index vector needs to be updated. A remaining entry i in the subsequent elements ofipppf7→pppt has to be replaced by the entryψψψ(i). The procedure to convert an index vector to a permutation vector is summarized in Alg. 4.3.
A more efficient but less obvious algorithm is shown in Alg. 4.4. The basic idea is to avoid the search operation via introducing a second vector which stores the position of an entrykin the vector. This reduces the complexity fromO(n2/2)to O(2n). The runtime is reduced from 59 s to 0.01 s, for an example index vector with 520 000entries.
To apply a permutation vector to rows/and or columns, the operator ΨΨΨpppf7→pppt(·) is defined. This operator performs the serial permutations of rows (ΨΨΨrppp
f7→pppt) as given by the vector ψψψpppf7→pppt, the operator ΨΨΨcppp
f7→pppt performs the column interchanges and ΨΨΨr,cpppf7→pppt performs the interchanges for rows and columns.
Algorithm 4.1: Simple version to compute an index vector from two symbolic numbering schemes.
Data:
NumberingSchme pppfrom Symbolic numbering scheme source matrix is ordered in NumberingSchme pppinto Symbolic numbering scheme matrix should be reordered to // initialization of index vector
1
vector<size_t> ippp
from7→pppinto(pppinto.size(),0 )
2
// start value for fill in indices for parameters inpppinto but not inpppfrom, inserted at the end
3
size_te=pppfrom.size()
4
// loop over all parameters inpppinto
5
fork= 0 topppinto.size() do
6
// find index of parameterpppinto(i) inpppfrom
7
i=find(pppfrom.begin(), pppfrom.end(), pppinto(i) )
8
// if parameter found insert index i, otherwise fill in value outside ofpppfrom.size()
9
if i<pppfrom.size() then
10
ipppfrom7→pppinto(k) =i
11
else
12
ipppfrom7→pppinto(k) =e
13
e+ +
14
end
15
end
16
// special case ifpppinto ⊆pppfrom: extend index vector to size ofpppfrom 17
if pppinto.size()<pppfrom.size() then
18
ippp
from7→pppinto.resize(pppfrom.size())
19
// the remaining parameters are sorted to the end as they are not contained inpppinto,
20
// applied to a matrix, these last rows/columns will be removed
21
fork=pppinto.size() topppfrom.size() do
22
ipppfrom7→pppinto(k) =k
23
end
24
end
25
returnippp
from7→pppinto// index vector performing reordering from pppfromtopppinto
26
Example Tab. 4.1 gives a small scale example of reordering and permutation for two symbolic numbering schemes. Index and permutation vector are provided for both directions. In addition, the first steps of a sequential permutation are characterized.
4.3.3 Reordering of Block-cyclic Distributed Matrices
If the permutation vector between two numbering schemes is known, these permutations have to be applied to block-cyclic distributed matrices to reorder the matrices to the target numbering scheme.
The reason why the permutation vector was introduced is, that there exist a SCALAPACK helper routine, which is internally used for pivoting during the solution of a system of equations. The routinepdlapiv can be directly used to perform the reordering of a distributed matrix, where the input is a permutation vector (The usage of pdlaswp is also possible). The routine applies the permutation as given by ψψψpppf7→pppt to either rows or columns. Applying the function twice, first to permute rows and secondly to permute columns, both are reordered. Beside the standard input of the block-cyclic distribution of the matrix, the function requires the input of the permutation vector in a special form. The permutation vector (consisting of integers only) needs to be passed as a block-cyclic distributed vector of integers, such that the local entryψψψplpp
f7→pppt(i) contains the global row/column, the local row/columnihas to be swapped with. The serial integer vector is brought to a block-cyclic distributed vector as introduced for the block-cyclic distributed matrices in Sect. 3.3.
4.3. Numbering Schemes and Reordering 41
Algorithm 4.2: Fast version to compute an index vector from two symbolic numbering schemes.
Data:
NumberingSchme pppfrom Symbolic numbering scheme source matrix is ordered in NumberingSchme pppinto Symbolic numbering scheme matrix should be reordered to // initialization of index vector
1
vector<size_t> ippp
from7→pppinto(pppinto.size(),0 )
2
// start value for fill in indices for parameters inpppinto but not inpppfrom, inserted at the end
3
size_te=pppfrom.size()
4
// store current index of parameter in helping atribut_iof each individual parameter
5
p p
pfrom.setIndex()
6
// sort numbering schme with implemented sort function (operator<)
7
p p
pfrom.sort()
8
// loop over all parameters inpppinto
9
fork= 0topppinto.size() do
10
// find index of parameterpppinto(i) inpppfrom in sorted numbering scheme
11
i=find(pppfrom.begin(), pppfrom.end(), pppinto(i) )
12
// if parameter found, insert indexi, otherwise fill in value outside ofpppfrom.size()
13
if i<pppfrom.size() then
14
ipppfrom7→pppinto(k) =pppfrom.p(i).i()
15
else
16
ipppfrom7→pppinto(k) =e
17
e+ +
18
end
19
end
20
// special case ifpppinto⊆pppfrom: extend index vector to size ofpppfrom
21
if pppinto.size()<pppfrom.size() then
22
ippp
from7→pppinto.resize(pppfrom.size())
23
// the reamaining parameters are sorted to the end as they are not contained inpppinto,
24
// applied to a matrix, these last rows/columns will be removed
25
fork=pppinto.size() topppfrom.size() do
26
ipppfrom7→pppinto(k) =k
27
end
28
end
29
returnipppfrom7→pppinto// index vector performing reordering from pppfrom topppinto
30
The reordering operations are implemented in the member functions reorder, reorderCols and reorderRows of theDistributedMatrixclass in Listing 3.2.
Fig. 4.1 gives an overview about the required runtime for the reordering of rows and columns of distributed matrices of different dimension on different quadratic processor grids (for the distribution parameters default values ofbr =bc= 64were used) to get an idea of the order of magnitude. The index vector was randomly generated (random shuffle of an index vector). The main conclusions are:
i) the reordering of columns is much faster than the reordering of rows (factor of three to ten). This could be expected as the column access in memory is much faster using the column major order for matrices (cf. Sect. 2.2.1) for the locally stored matrices. ii) for matrices of dimension lower than 20 000×20 000 the reordering is performed in less than 1 s on all grids. For the reordering of columns, this even holds for matrices smaller than 80 000×80 000. Although there is no real scaling behavior of the reordering operations with the number of cores (cf. Fig. 4.1(b)), the most important thing is that the performance increases on larger compute core grids and does not drop to additional organizational requirements (at least for matrices above dimension10 000×10 000the scaling is above 1.0 for all cases analyzed).
Algorithm 4.3: Simple version to convert an index vector to a permutation vector.
Data:
NumberingSchme ipppfrom7→pppinto index vector to be converted to permutation vector // initialization of permutation vector
1
vector<size_t> ψψψpppfrom7→pppinto =ipppfrom7→pppinto
2
size_tp= 0;// loop over entries of index vector
3
fork= 0 to ippp
from7→pppinto.size() do
4
// find current index in the subsequent part of index vector
5
p=find(pppfrom.begin()+k, pppfrom.end(), k)
6
// if index found, replace it by its new positionψψψpppfrom7→pppinto(k)(after swapping)
7
if found then
8
ψψψpppfrom7→pppinto(p) =ψψψpppfrom7→pppinto(k)
9
end
10
end
11
returnψψψpppfrom7→pppinto// sequential permutation vector corresponding to ipppfrom7→pppinto
12
10+2 10+3 10+4 10+5
10−2 10−1 10+0 10+1 10+2
# rows and columns of matrix to be reordered (R=C)
measuredruntimeforreordering(s)
reordering of rows (grid: 16×16 = 256) reordering of columns (grid: 16×16 = 256) reordering of rows (grid: 24×24 = 576) reordering of columns (grid: 24×24 = 576) reordering of rows (grid: 32×32 = 1024) reordering of columns (grid: 32×32 = 1024) reordering of rows (grid: 48×48 = 2304) reordering of columns (grid: 48×48 = 2304)
(a) Mean absolute runtime measured for the reordering.
100+2 10+3 10+4 10+5
2 4 6 8 10
# rows and columns of matrix to be reordered (R=C)
scalingbehaviornormalizedto256cores[]
reordering of rows (grid: 24×24 = 576) reordering of columns (grid: 24×24 = 576) reordering of rows (grid: 32×32 = 1024) reordering of columns (grid: 32×32 = 1024) reordering of rows (grid: 48×48 = 2304) reordering of columns (grid: 48×48 = 2304)
(b) Scaling behavior of reordering operations.
Figure 4.1: Runtime analysis of the row (?) and column (◦) reordering operations (index vector randomly generated). The colors represent different dimensions of the processor grid.
The scaling is normalized to 256 cores and given for every matrix dimension.