
7. replacement introduces the omission error, $\delta p^y_{mon}$, which is computed as

$\delta p^y_{mon} := \left| p^y_{mon} - p'^{\,y}_{mon} \right|$ .  (3.10)

For the optimal threshold, $\delta p^y_{mon}$ must be smaller than the formal error $\Sigma^y_{mon}$, which is also available on the grace science team data server (cf. Chapter 2).

8. compute the ewh field of the omission error, $M(\delta p^y_{mon})$, and find its maximum, $\max(M(\delta p^y_{mon}))$,

9. compute the ewh field of the formal error, $M(\Sigma^y_{mon})$, and find its maximum, $\max(M(\Sigma^y_{mon}))$,

(for the ewh synthesis, see Appendix A)

10. compare the two maximum values and verify that $\max(M(\delta p^y_{mon})) < \max(M(\Sigma^y_{mon}))$; if not, decrease the threshold and repeat the process from step 5.

An iterative program checks the condition given in step 10 of the threshold selection procedure and reaches the optimal threshold value for each month; a sketch of this iteration is given below. Figure 3.4 shows that, after the selection of the optimal threshold value for every month, the maximum value of the omission error field is less than the maximum value of the formal error field. The optimal threshold values for each month are given in Appendix B.
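The loop over steps 5 to 10 can be summarized in a short sketch. The following Python fragment is illustrative only, not the thesis implementation: `omission_error` and `ewh_synthesis` are hypothetical names, the replacement rule (zeroing coefficients below the threshold) is an assumption made for this sketch, and `ewh_synthesis` stands in for the ewh synthesis $M(\cdot)$ of Appendix A.

```python
import numpy as np

def omission_error(p, tau):
    # Eq. (3.10): delta_p = |p - p'|, where p' is p with the coefficients
    # falling below the threshold tau replaced (here zeroed out; the exact
    # replacement rule is an assumption for this sketch).
    p_replaced = np.where(np.abs(p) >= tau, p, 0.0)
    return np.abs(p - p_replaced)

def optimal_threshold(p_mon, sigma_mon, thresholds, ewh_synthesis):
    # thresholds must be sorted in decreasing order; ewh_synthesis is the
    # synthesis M(.) of Appendix A, passed in as a function.
    max_formal = np.max(ewh_synthesis(sigma_mon))        # step 9: max(M(Sigma))
    for tau in thresholds:                               # step 5, repeated
        delta_p = omission_error(p_mon, tau)             # steps 6-7, Eq. (3.10)
        if np.max(ewh_synthesis(delta_p)) < max_formal:  # steps 8 and 10
            return tau                                   # optimal threshold
    raise ValueError("no threshold satisfies the condition")
```

Iterating from the largest candidate threshold downwards mirrors step 10: the first threshold whose omission-error maximum stays below the formal-error maximum is accepted.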

[Figure 3.4 shows two time series, years 2002 to 2018 on the x-axis and water equivalent height (meter) on the y-axis: max. formal error and max. omission error.]

Figure 3.4: Formal error $\Sigma^y_{mon}$ versus omission error $\delta p^y_{mon}$ at filter cap 500 km. The omission error remains below the formal error.

3.5 results

grace monthly coefficients are classified into essential and nonessential classes using the threshold values given in Table B, Appendix B. Figure 3.5 shows the number of essential coefficients in each month for the period from April 2002 to June 2017. It is clear from the figure that the essential class has at most 7141 coefficients, in May 2003, and at least 6204, in Nov. 2008; in other words, there are 937 coefficients which change their class, called unclassified.
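How the monthly counts and the unclassified group follow from the threshold classification can be sketched as below. The array names, and the assumption that each coefficient carries a per-month classification measure compared against the monthly threshold, are illustrative only:

```python
import numpy as np

def classify_months(measure, taus):
    # measure: (n_months, n_coeff) classification measure per coefficient
    # taus:    (n_months,) optimal monthly thresholds (names assumed)
    essential = measure >= taus[:, None]   # per-month label, True = essential
    n_essential = essential.sum(axis=1)    # monthly counts, as in Figure 3.5
    class_a = essential.all(axis=0)        # essential in every month
    class_b = (~essential).all(axis=0)     # nonessential in every month
    class_u = ~(class_a | class_b)         # label changes between months
    return n_essential, class_a, class_b, class_u
```

Coefficients that are essential in every month form one stable class, those that never are form the other, and the remainder, whose label flips between months, constitute the unclassified group.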


[Figure 3.5 shows a time series, years 2002 to 2018 on the x-axis and the number of essential sh coefficients (6000 to 7200) on the y-axis.]

Figure 3.5: Number of essential coefficients in every month, from Apr. 2002 to Jun. 2017.

The plots in Figure 3.6 show the essential coefficients in sc format for three different months, from minimum to maximum, left to right. A region around the zonal coefficients, from degree ∼15 to ∼75 and up to order ∼10, belongs to the nonessential class; these coefficients carry very little information about the gravity variation. The remaining coefficients around the nonessential class are essential coefficients, carrying most of the information of the gravity variation.
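The nonessential region described above amounts to a simple degree/order mask in sc format (signed order on one axis, with negative orders conventionally holding the $S_{lm}$ coefficients, and degree on the other). A sketch using the approximate bounds read off Figure 3.6; the exact limits vary from month to month:

```python
import numpy as np

L = 90                                  # maximum spherical harmonic degree
deg = np.arange(L + 1)[:, None]         # degrees 0..90 (one row per degree)
order = np.arange(-L, L + 1)[None, :]   # signed orders -90..90 (sc format)
valid = np.abs(order) <= deg            # order cannot exceed degree
# approximate nonessential band near the zonals: degree ~15..75, |order| <= ~10
nonessential = valid & (deg >= 15) & (deg <= 75) & (np.abs(order) <= 10)
```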

[Figure 3.6 shows three panels in sc format, signed order (−90 to 90) on the x-axis and degree (0 to 90) on the y-axis, with the regions a and b marked: (a) minimum, (b) moderate, (c) maximum.]

Figure 3.6: Essential class varying in size, from left to right, minimum to maximum.

Figure 3.7 shows the unclassified, essential and nonessential coefficients in sc format. Unclassified coefficients belong to one group in one year and to another group in some other year.

The unclassified coefficients will be analyzed using the k nearest neighbor supervised technique in Chapter 4. The coefficients in clusters b and c do not change their clusters. Figure 3.8 shows the number of coefficients, 937, 6204 and 1136, in the unclassified, essential and nonessential classes, respectively.

These results bring us to the end of the unsupervised classification, where k-means identifies the existence of groups in the dataset, threshold classification confirms the results, and the yearly variance of the coefficients makes it possible to label the classes on the basis of their information content. Meanwhile, a group of coefficients is also identified in the region between the two classes. These coefficients change their classes when the classification results of different months are analyzed.

[Figure 3.7 shows an sc-format plot, signed order (−90 to 90) on the x-axis and degree (0 to 90) on the y-axis, with the regions u, a and b marked.]

Figure 3.7: u) Unclassified, a) essential and b) nonessential classes in sc format.

[Figure 3.8 shows a bar chart of the class sizes u, a and b on a scale from 1000 to 7000.]

Figure 3.8: Number of coefficients, 937, 6204 and 1136, in u) unclassified, a) essential and b) nonessential classes, respectively.

In the next chapter, the k nearest neighbors (knn) supervised classification method is discussed, which bifurcates the unclassified coefficients into essential and nonessential classes.


4 supervised classification

The objective of the classification process is to segregate the sh coefficients into two groups, i.e. essential and nonessential. In the previous chapter, clustering and threshold classification were successfully utilized to analyze the behavior of the sh coefficients and to divide the coefficients into two groups. Meanwhile, we have found a group of sh coefficients changing their class, which means that during the analysis of one year they belong to one group and during the analysis of some other year they belong to the other group. This group is called the unclassified class. Essential and nonessential coefficients can be used to train the classifier to classify the unclassified coefficients too. This kind of classification is called supervised classification. In the following sections, supervised techniques are used to analyze the grace data. The following supervised classification methods are used in this study:

• k nearest neighbor (knn) and

• artificial neural networks (ann).

In this chapter, knn segregates the unclassified coefficients into essential and nonessential classes; Chapter 6 does the same using ann.

4.1 k nearest neighbor (knn)

If a classification problem belongs to a category where a sample of the classified dataset is available without any information about the distribution, the best way to classify a candidate point is to look at its neighborhood. This kind of problem belongs to the field of nonparametric statistics (Cover and Hart, 1967). It is reasonable to believe that points which are close to each other belong to the same class. knn has been successfully implemented for several classification tasks, including pattern recognition, character recognition, and object and event recognition. Though knn is very simple and efficient, it has some limitations and disadvantages related to memory requirements and complexity. Many techniques have been developed to overcome these issues (Bhatia and Vandana, 2010).

To start the process, we need a training dataset which has several points related to each target class. Each candidate point is classified on the basis of its distances from all training points. The process starts by calculating the distance between all training points and the given candidate point, sorting the distances in increasing order, determining the majority class among the k nearest training points and finally giving the label of the majority class to the candidate point. The algorithm, as expressed by Kozma (2008), Mirkes (2011) and Sutto (2012), is given in the following list (a code sketch follows the list):

a. a positive integer k is specified, along with a new sample,

b. select the k entries in the training database which are closest to the new sample,

c. find the majority or most common classification of these entries,

d. label the sample with the majority class.
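The four steps translate almost line for line into code. A minimal sketch with plain numpy, assuming Euclidean distance; it is an illustration, not the implementation used in this study:

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, candidate, k=3):
    # step a: k is specified along with the new sample `candidate`
    dists = np.linalg.norm(train_x - candidate, axis=1)  # distance to all points
    nearest = np.argsort(dists)[:k]                      # step b: k closest entries
    votes = Counter(train_y[i] for i in nearest)         # step c: majority class
    return votes.most_common(1)[0][0]                    # step d: label the sample
```

The default k = 3 anticipates the tie-avoidance rule discussed next: with two target classes, three neighbors can never produce a tied vote.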

The critical issue is to specify the number k of neighbors. To avoid a tie, k is set to one more than the number of classes in the sample dataset; with two classes, k = 3. In the following section, a brief description of the sample and target datasets is given; afterwards, in Section 4.3, the results are presented.