Dynamic Knowledge Acquisition from Complex Data with Rough Sets


Yanyong Huang


Dissertation

Mathematics and Computer Science


DISSERTATION

for the attainment of the academic degree of DOKTOR-INGENIEUR

of the Faculty of Mathematics and Computer Science of the FernUniversität in Hagen

by

Yanyong Huang, M.Sc.

Chengdu/China

Hagen, 2019


This dissertation is concerned with knowledge discovery from complex data in dynamic environments, where complex data involve multiple types of object features and multiple information sources, and dynamicity refers to the variation of the objects, their features and the feature values over time. Within the framework of rough sets, knowledge is the ability to classify data in the form of “IF-THEN” decision rules, which may change as the complex data vary in dynamic environments.

For four typical kinds of complex data, rough set-based incremental algorithms for dynamic knowledge acquisition are proposed and discussed:

(1) Probabilistic set-valued data have set values with a probability distribution. Considering the addition and deletion of features in probabilistic set-valued data, an extended variable precision rough set model is built on a novel binary relation, and matrix-based incremental algorithms are developed to build and update IF-THEN rules.

(2) Fuzzy data describe feature values by fuzzy membership degrees. Considering the simultaneous variation of objects and their features in fuzzy data, a matrix representation of rough fuzzy sets is presented by defining a relation matrix associated with a novel matrix operator. A corresponding incremental method to update IF-THEN rules is proposed, which can update multiple entries of the relation matrix.

(3) Multi-source hybrid data originate from different information sources and have multiple types of features. Taking the simultaneous variation of objects, features and feature values in multi-source hybrid data into account, a novel multi-source composite rough set model is proposed by integrating multiple binary relations, together with a matrix-based incremental method to update IF-THEN rules; the approach can avoid the disclosure of decision rules.

(4) Multi-source interval-valued data are collected from different information sources. Considering the addition and deletion of sources in multi-source interval-valued data, incremental fusion mechanisms are discussed and suitable incremental fusion algorithms are developed, which can reduce ambiguities and uncertainties in the data and improve the quality of the acquired knowledge.

Theoretical analyses and experiments are conducted to verify the efficiency of the proposed methods. This study helps to analyze the uncertainty of complex data arising from their variety and veracity, and provides new methodologies for dynamic knowledge acquisition from complex data. Furthermore, the presented incremental methods can increase the speed of processing big data.


This dissertation is concerned with the acquisition of knowledge from complex data in dynamic environments, where the complexity of the data stems from their diverse object features and sources, and dynamics implies the variation of objects, features and feature values over time. Within the framework of rough sets, knowledge is understood as the ability to classify data in the form of IF-THEN decision rules, which may change in dynamic environments as the complex data vary. For four kinds of typical complex data, rough set-based incremental algorithms for dynamic knowledge acquisition are presented and discussed:

(1) The values of probabilistic set-valued data follow a probability distribution. Taking into account the addition and deletion of features of probabilistic set-valued data, an extended variable precision rough set model is built on a novel binary relation, and matrix-based incremental algorithms for building and updating IF-THEN rules are developed.

(2) Fuzzy data describe feature values by fuzzy memberships. With regard to the simultaneous variation of objects and their features characterized by fuzzy data, a matrix representation of rough fuzzy sets is presented by defining a relation matrix associated with a new matrix operator. A corresponding incremental method for updating IF-THEN rules is proposed, with which multiple relation matrix entries can be updated.

(3) Hybrid data originate from different information sources and have multiple types of features. To account for the simultaneous variation of objects, features and feature values in hybrid data, a new composite rough set model involving multiple sources is proposed by integrating multiple binary relations and a matrix-based incremental method for updating the IF-THEN rules; the model can avoid the disclosure of decision rules.

(4) Interval values are collected from different information sources. In view of the addition and deletion of the sources of such data, incremental fusion mechanisms are discussed and suitable incremental fusion algorithms are developed, which can reduce ambiguities and uncertainties in the data and improve the quality of the acquired knowledge.

Theoretical analyses and experiments are conducted to verify the efficiency of the proposed methods. The present study is useful for analyzing the uncertainty problems of complex data arising from their variety and veracity.

It provides new incremental methods for dynamic knowledge acquisition from complex data, which can increase the speed of processing large amounts of data.


I would like to express my sincere gratitude to my supervisors, Prof. Zhong Li and Prof. Tianrui Li. From the selection of the research topic to the writing of the thesis, down to the formulation of individual sentences, they gave me unwavering support and guidance throughout my PhD study.

Having followed them for years, I have been deeply influenced by their rigorous academic attitude, logical way of thinking and respectable personalities. These will be a great treasure throughout my life. Without their excellent supervision and continuous encouragement, this thesis would not have been finished on time. Thank you all for your kind help!

I am especially grateful to all members of the Chair of Communication Networks at the University of Hagen. When I first met Professor Herwig Unger in China in 2018, his profound knowledge and his positive, optimistic spirit impressed me deeply. I thank him for giving me the opportunity to become a PhD student at the University of Hagen.

I am also thankful to Dr. Panchalee Sukjit, Mrs Barbara Kleine, Mrs Jutta Düring, Dr. Mario Kubek and M.Sc. Supaporn Simcharoen. Without their help, I could not have adapted to the new environment so quickly.

I am extremely grateful to the whole Key Lab of Cloud Computing and Intelligent Technique of Sichuan Province, especially Chuan Luo, Hongmei Chen, Jie Hu, Anping Zeng, Shu Wang, Bin Wang, Shengong Ji, Hao Wang, Huaishao Luo, Peng Xie and Qianqian Huan, who gave me a lot of support at various times.

I wish to express my appreciation for the financial support from the University of Hagen, which covered my daily needs and allowed me to focus on my research.

Last but not least, I thank my parents, parents-in-law and brothers for their continued encouragement and generous support. I would also like to thank my wonderful wife Yanmei Cai for her understanding of my research and her unconditional support. My two beautiful children, Yujie and Yuyue, deserve special thanks too. Their smiles and love kept me going.


1 Introduction
1.1 Background
1.2 Rough Sets-based Knowledge Discovery in Complex Data: A Review
1.3 Motivations
1.4 Contributions
1.5 Thesis Structure
2 Preliminaries
2.1 Classical Rough Sets
2.2 Extended Rough Sets
2.3 Fuzzy Sets
2.4 Chapter Summary
3 Extended Variable Precision Rough Sets for Dynamic Probabilistic Set-valued Data
3.1 Probabilistic set-valued information systems (PSvIS)
3.2 Matrix-based incremental approach for updating rough approximations
3.2.1 Matrix-based representation of rough approximations
3.2.2 Addition of attributes
3.2.3 Deletion of attributes
3.3 Algorithms for updating rough approximations in PSvIS
3.3.1 The static algorithm
3.3.2 The dynamic algorithm
3.4 Experimental evaluations
3.4.1 Performance comparisons between static and dynamic algorithms
3.4.2 Comparative experiments on the variation of attributes
3.4.3 Efficiency comparisons on inserting different ratios of attributes
3.4.4 Experiments on the parameters
4 Rough Fuzzy Sets for Dynamic Fuzzy Data
4.1 Incrementally updating rough fuzzy approximations in dynamic fuzzy decision systems (FDS)
4.1.1 Matrix-based representation of rough fuzzy approximations
4.1.2 Dynamic update rough fuzzy approximations with the variation of objects and attributes
4.2 Algorithms for computing rough fuzzy approximations in FDS
4.2.2 The dynamic algorithm
4.3.1 Efficiency comparisons on inserting different ratios of data
4.3.2 Performance comparison with the growing sizes of data sets
4.3.3 Comparisons with the related algorithms
5 Multi-source Composite Rough Sets for Dynamic Multi-source Hybrid Data
5.1 Multi-source Hybrid Information Systems (MHIS)
5.2 Incrementally updating rough approximations in dynamic MHIS
5.2.1 Matrix approach for rough approximations
5.2.2 Dynamic update rough approximations under the variation of objects, attributes and attribute values
5.3 Algorithms for updating rough approximations in MHIS
5.3.2 The incremental algorithm
5.4.1 Performance comparison on adding different ratios of data
5.4.2 Comparisons with other related incremental algorithms
5.4.3 Transmission efficiency comparison
5.4.4 Parameter sensitivity analysis
6 Incremental Fusion of Dynamic Multi-source Interval-valued Data
6.1 Interval-valued information fusion based on fuzzy granulation
6.1.1 Multi-source interval-valued information systems (MIvIS)
6.1.2 Fuzzy granulation-based fusion approach
6.2 Incremental fusion mechanisms under the variation of data sources
6.2.1 Addition of data sources
6.2.2 Deletion of data sources
6.3 Static and incremental fusion algorithms in MIvIS
6.3.1 The static fusion algorithm
6.3.2 The incremental fusion algorithm
6.4.1 Classification performance
6.4.2 Computational efficiency
7 Conclusion and Future Research
7.1 Conclusions
7.2 Future work
Publications related to this thesis
Appendix


1 Introduction

In this chapter, we provide the background and an overall picture of the present study on rough set-based knowledge discovery from complex data. We also present the motivations, the contributions and the structure of the thesis.

1.1 Background

With the rapid development of information technologies such as cloud computing, the Internet of Things and the mobile Internet, the amount of data, as well as the variety of types and structures of the data produced and accumulated by industries and individuals, has grown explosively, leading to the advent of the era of Big Data [1, 2].

Mining knowledge from big data poses many challenges. On the one hand, data in big data environments often exhibit uncertainty arising from inaccuracy, incompleteness, imprecision and fuzziness, as well as diversity in data features and modalities, e.g., categorical, numerical, interval-valued, set-valued and distribution data, as well as text, audio and video.

The uncertainty, complexity and diversity of the data make knowledge discovery from big data difficult [3]. On the other hand, these data may be collected from different information sources located at diverse sites. For example, meteorological data are collected by radar detectors, satellite detectors and ground-based observation sites [4]. In addition, because instruments are updated or age, the data change over time. Three scenarios can be distinguished: (a) the addition or deletion of data samples (objects), (b) the insertion or removal of data features (attributes), and (c) the revision of feature values (attribute values). The number of data sources in multi-source data may also increase or decrease over time, which makes information fusion and real-time decision making difficult [5, 6]. Hence, to improve the efficiency of knowledge discovery in dynamic data environments, developing effective information fusion and knowledge acquisition methods for complex data has become an important research topic in knowledge discovery and data mining.

As a novel information-processing paradigm in computational intelligence, granular computing (GrC) provides a framework for data processing that partitions a whole into parts according to a suitable granularity and integrates parts into a whole by discarding irrelevant elements [7, 8].


GrC provides multi-level and multi-view perspectives for dealing with complex data and improves the computational efficiency of processing large-scale data [9, 10].

It has become an important theory for processing big data. As an efficient granular computing method for solving complex problems, rough set theory, proposed by Pawlak in 1982 [11], is a mathematical tool for mining knowledge from uncertain, imprecise and incomplete data. Since rough set-based data analysis needs no additional prior model assumptions, such as the probability distributions of statistical methods or the probability assignments of the Dempster-Shafer theory of evidence, the knowledge discovered from the data is more objective. Rough set theory has been successfully applied in many fields, such as artificial intelligence [12, 13], data mining [14, 15] and intelligent information processing [16, 17]. Based on a binary relation, the rough set model describes an unknown concept by a pair of precise concepts, called the lower and upper approximations. According to the rough approximations, the universe of all objects can be partitioned into three pairwise disjoint regions: the positive region, the negative region and the boundary region. The following “IF-THEN” rules can be obtained from these three regions: (1) if an object is in the positive region, it certainly belongs to the given decision class; (2) if an object is in the negative region, it certainly does not belong to the given decision class; (3) if an object is in the boundary region, it possibly belongs to the given decision class. Hence, it is important to investigate efficient methods for computing rough approximations, which in turn improve the performance of rule induction and attribute reduction in knowledge discovery. There is a large body of research on rough set-based knowledge discovery [18, 19, 20]. However, these methods cannot be directly applied to dynamic complex data.
First, complex data are usually characterized by various types of attributes or derived from different information sources, whereas rough set models based on specific binary relations are suited to a single type of data or to single-source data and cannot be employed for complex data. Second, in dynamic data environments, the structure of the rough approximations changes over time, and the corresponding decision rules must be updated. Therefore, it is urgent to develop effective rough set approaches for knowledge discovery from complex data in dynamic environments. In what follows, we review studies on rough set-based knowledge discovery in complex data.

1.2 Rough Sets-based Knowledge Discovery in Complex Data: A Review

The research object of rough sets is a data table (or information system) whose rows are called objects, whose columns are called attributes and whose entries are attribute values. According to the domains of the attribute values, attribute types include categorical, ordinal, numerical, interval-valued and set-valued attributes, among others. Discovering knowledge by means of rough sets from simple data with only one type of attribute has been intensively investigated [21, 22, 23, 24, 25].
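To make the approximations, three regions and IF-THEN rules of Section 1.1 concrete on a data table of this kind, here is a minimal Python sketch of the classical (Pawlak) case. It assumes the table is a dictionary of rows and that two objects are indiscernible when they share all values on the chosen condition attributes; the names `partition`, `regions` and `rules` and the toy table are hypothetical illustrations, not code from the thesis.

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes U/IND(attrs): objects with identical values on attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def regions(table, attrs, target):
    """Positive, boundary and negative regions of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:      # class lies entirely inside the concept
            lower |= block
        if block & target:       # class overlaps the concept
            upper |= block
    return {"POS": lower, "BND": upper - lower, "NEG": set(table) - upper}

def rules(table, attrs, target, label):
    """One IF-THEN rule per equivalence class, typed by the region it falls in."""
    out = []
    for block in partition(table, attrs):
        row = table[next(iter(block))]   # all members share the same condition values
        cond = " AND ".join(f"{a}={row[a]}" for a in attrs)
        if block <= target:
            out.append(f"IF {cond} THEN {label}")
        elif block & target:
            out.append(f"IF {cond} THEN possibly {label}")
        else:
            out.append(f"IF {cond} THEN NOT {label}")
    return out

# Hypothetical toy decision table: condition attributes a, b and decision d
table = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 0, "d": "no"},
    "x3": {"a": 0, "b": 1, "d": "yes"},
    "x4": {"a": 0, "b": 0, "d": "no"},
}
X = {o for o, row in table.items() if row["d"] == "yes"}  # decision class d = yes
for rule in rules(table, ["a", "b"], X, "d=yes"):
    print(rule)
```

Running it prints `IF a=1 AND b=0 THEN possibly d=yes`, `IF a=0 AND b=1 THEN d=yes` and `IF a=0 AND b=0 THEN NOT d=yes`. Objects x1 and x2 are indiscernible but carry different decisions, so their class falls into the boundary region and only yields a possible rule.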

However, in real-world applications, the collected data are often complex in terms of attribute types, i.e., they include multiple types of attributes. For instance, meteorological data include district (categorical attribute), temperature (numerical attribute), relative humidity (interval-valued attribute) and so forth. Furthermore, the data may be obtained from different information sources located at different sites. For example, meteorological data are collected from radar detectors, satellite detectors and ground-based sites. In this dissertation, we concentrate on complex data with various types of attributes and multiple information sources.

In what follows, we survey rough set-based knowledge discovery from complex data in static and dynamic environments, respectively.

Static Complex Data Scenario

Many extended rough set models have been presented to analyze complex data. To deal with complex data with categorical and numerical attributes, Hu et al. presented a δ-neighborhood rough set model based on the value difference metric and applied it to feature selection in hybrid information systems [26]. According to the two fusion strategies “Seeking common ground while eliminating differences” and “Seeking common ground while reserving differences”, Qian et al. presented two multi-granulation rough set models, namely the pessimistic and the optimistic multi-granulation rough set model, respectively [27]. The multi-granulation method was further applied to incomplete information systems with symbolic and numerical data, and a corresponding incomplete multi-granulation rough set model was presented based on the tolerance relation [28]. Jing et al. introduced a novel distance function to describe the tolerance neighborhood relation in heterogeneous incomplete information systems, and discussed an extended rough set model (variable precision tolerance neighborhood rough sets) together with its properties [29]. From the constructions of fuzzy hybrid granules and rough approximations, Wei et al. investigated the generalization properties of Hu’s and Wang’s fuzzy rough set models, which can deal with hybrid data [30]. By combining probabilistic rough sets, Chen et al. presented a probabilistic composite rough set model that integrates multiple binary relations w.r.t. different types of data [31]. To deal with fuzzy data with missing attribute values, Dai presented an extended fuzzy rough set model using a tolerance relation based on a similarity function [32]. Zhang et al. proposed a composite rough set model to analyze complex data with categorical, numerical, set-valued and missing attributes, and developed a matrix-based incremental algorithm for maintaining composite rough approximations under the variation of objects [33]. To deal with incomplete composite information systems with categorical and criteria attributes, Li et al. proposed a novel extended composite rough set model by integrating the dominance relation and the tolerance relation [34]. Zeng et al. presented a Gaussian kernel fuzzy rough set model based on a hybrid distance function, which can simultaneously deal with symbolic, numerical and interval-valued data [35]. To deal with multi-modality data, Hu et al. transformed different types of data into multiple kernel matrices and fused these matrices based on a fuzzy T-norm. The resulting multi-kernel fuzzy rough set model was applied to the fuzzy classification of large-scale multi-modality data [36].
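The optimistic and pessimistic multi-granulation models mentioned above can be sketched as follows, under their commonly used definitions: given several granulations (attribute subsets) A_1, ..., A_m, an object belongs to the optimistic lower approximation of X if at least one class [x]_{A_i} is contained in X, and to the pessimistic one only if every [x]_{A_i} is; the upper approximations follow by duality. This is an illustrative sketch restricted to equivalence relations on complete data; function names and data are hypothetical.

```python
def classes_of(table, attrs):
    """Map every object to its equivalence class [x]_attrs."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return {obj: groups[tuple(row[a] for a in attrs)] for obj, row in table.items()}

def mg_lower(table, granulations, target, optimistic=True):
    """Optimistic: some granulation puts [x] inside target; pessimistic: every granulation does."""
    cls = [classes_of(table, g) for g in granulations]
    combine = any if optimistic else all
    return {x for x in table if combine(c[x] <= target for c in cls)}

def mg_upper(table, granulations, target, optimistic=True):
    """Upper approximation by duality: U - lower(U - target)."""
    universe = set(table)
    return universe - mg_lower(table, granulations, universe - target, optimistic)

# Hypothetical table with two single-attribute granulations
table = {
    "x1": {"a": 1, "b": 1},
    "x2": {"a": 1, "b": 0},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 0, "b": 0},
}
X = {"x1", "x2", "x3"}
granulations = [["a"], ["b"]]
print(sorted(mg_lower(table, granulations, X, optimistic=True)))   # ['x1', 'x2', 'x3']
print(sorted(mg_lower(table, granulations, X, optimistic=False)))  # ['x1']
```

The pessimistic lower approximation is always contained in the optimistic one; here it shrinks to {x1}, reflecting the stricter fusion strategy that requires all granulations to agree.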

To discover knowledge from complex data with multiple types of attributes, Hu et al. proposed a general fuzzy rough set model for hybrid information systems with categorical and numerical attributes and defined several hybrid attribute significance measures, which are used to construct a forward greedy algorithm for attribute reduction [37]. From the perspective of multi-granulation, Liang et al. divided a large-scale complex decision table into multiple small sub-tables and fused the attribute reduction results derived from the sub-tables to obtain an approximate reduct [38]. Qian et al. presented pessimistic multi-granulation rough sets for multi-granulation rule induction and attribute reduction [39]. To deal with attribute reduction for hybrid data with symbolic and numerical features, Hu et al. proposed two extended rough set models, δ-neighborhood rough sets and k-nearest-neighbor rough sets, which granulate numerical objects by δ-neighborhood or k-nearest-neighbor relations and symbolic objects by equivalence relations. These two models were employed for attribute reduction in hybrid information systems [40]. By introducing multi-granulation, Lin et al. further presented two neighborhood-based multi-granulation rough set models for mixed attribute reduction [41]. Chen et al. developed a parallel attribute reduction algorithm based on the dominance-based neighborhood rough set model in hybrid ordered decision systems [42]. By combining the Dempster-Shafer theory of evidence, Wu et al. presented rule induction methods for consistent and inconsistent incomplete multi-scale decision tables [43]. To discover knowledge from complex data collected from different information sources, Xu et al. developed a λ-attribute reduction algorithm based on mutual information in each subsystem and partially transformed the effective λ-tolerance classes derived from different subsystems to obtain an approximate global reduct in interval-valued multi-decision tables [44]. Hu et al. investigated an attribute reduction method that keeps the positive region unchanged in distributed decision information systems, where the subsystems are located at different sites [45]. Lin et al. employed the Gaussian kernel function to compute fuzzy equivalence relation matrices in each subsystem and proposed a novel fuzzy multi-granulation decision-theoretic rough set model by integrating these matrices in multi-source fuzzy information systems. The model was applied to rule induction [46].
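To illustrate how neighborhood rough sets handle hybrid data in attribute reduction, the sketch below treats two objects as neighbors on an attribute subset when they agree on all nominal attributes and their Euclidean distance on the numerical attributes is at most δ. It then computes the dependency degree γ_B(D) = |POS_B(D)|/|U| and runs a forward greedy selection. This is a simplified sketch in the spirit of [37, 40], not the exact models or significance measures of those papers; the distance, the value of δ, the assumption that numerical values are already normalized to [0, 1], and all names and data are hypothetical.

```python
import math

def neighborhood(data, x, attrs, numeric, delta):
    """Objects equal to x on nominal attrs and within Euclidean distance delta on numeric attrs."""
    num = [a for a in attrs if a in numeric]
    nom = [a for a in attrs if a not in numeric]
    out = set()
    for y, row in data.items():
        if all(row[a] == data[x][a] for a in nom):
            dist = math.sqrt(sum((row[a] - data[x][a]) ** 2 for a in num))
            if dist <= delta:
                out.add(y)
    return out

def dependency(data, attrs, decision, numeric, delta):
    """gamma_B(D): fraction of objects whose whole neighborhood shares their decision."""
    pos = 0
    for x in data:
        nb = neighborhood(data, x, attrs, numeric, delta)
        if all(data[y][decision] == data[x][decision] for y in nb):
            pos += 1
    return pos / len(data)

def forward_greedy(data, candidates, decision, numeric, delta):
    """Add the attribute with the highest dependency until no attribute improves it."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scores = {a: dependency(data, selected + [a], decision, numeric, delta) for a in remaining}
        a_star = max(remaining, key=lambda a: scores[a])
        if scores[a_star] <= best:
            break
        selected.append(a_star)
        best = scores[a_star]
        remaining.remove(a_star)
    return selected, best

# Hypothetical hybrid data: categorical district, normalized numerical temp
data = {
    "x1": {"district": "N", "temp": 0.10, "rain": "dry"},
    "x2": {"district": "N", "temp": 0.15, "rain": "dry"},
    "x3": {"district": "S", "temp": 0.80, "rain": "wet"},
    "x4": {"district": "S", "temp": 0.12, "rain": "wet"},
    "x5": {"district": "N", "temp": 0.85, "rain": "wet"},
}
print(forward_greedy(data, ["district", "temp"], "rain", {"temp"}, delta=0.1))
```

Each attribute alone yields γ = 0.4 on this data, but together they make every neighborhood consistent (γ = 1.0), so the greedy search keeps both.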

Based on internal-confidence and external-confidence degrees, Xu et al. presented a source selection algorithm for multi-source numerical data and fused the different information sources using triangular fuzzy information granules [47].

Moreover, an information fusion method based on conditional entropy has been employed for multi-source incomplete data [48].
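Entropy-based fusion and attribute reduction methods rely on conditional entropies of the decision given an attribute subset. As an illustration, the sketch computes the Shannon conditional entropy H(D | B) = -Σ_i p(X_i) Σ_j p(Y_j | X_i) log2 p(Y_j | X_i), where X_i ranges over the equivalence classes of U/B and Y_j over the decision classes. The base-2 logarithm, the function name and the toy rows are assumptions; the cited works may use other entropy variants.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, cond_attrs, decision):
    """Shannon conditional entropy H(D | B) in bits, B = cond_attrs."""
    blocks = defaultdict(list)          # equivalence class key -> decision labels
    for row in rows:
        blocks[tuple(row[a] for a in cond_attrs)].append(row[decision])
    n = len(rows)
    h = 0.0
    for labels in blocks.values():
        p_block = len(labels) / n       # p(X_i)
        for count in Counter(labels).values():
            p = count / len(labels)     # p(Y_j | X_i)
            h -= p_block * p * math.log2(p)
    return h

# Hypothetical rows: condition attribute a, decision d
rows = [
    {"a": 1, "d": "yes"},
    {"a": 1, "d": "no"},
    {"a": 0, "d": "yes"},
    {"a": 0, "d": "yes"},
]
print(conditional_entropy(rows, ["a"], "d"))  # 0.5
print(conditional_entropy(rows, [], "d"))     # entropy of d alone, about 0.811
```

Conditioning on a never increases the entropy; the class a = 0 is pure and contributes nothing, while the mixed class a = 1 contributes half a bit.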

Dynamic Complex Data Scenario

Due to the rapid development of information technologies, the collected data are often not static but evolve over time, and updates to the data change the knowledge derived from them. How to efficiently discover knowledge from evolving data has become an important challenge in the big data era. The traditional static approach retrains the whole model on the entire updated data set, which is too time-consuming for immediate decision making or prediction. Incremental learning is an efficient method for discovering knowledge from dynamic data: it updates the knowledge by analyzing the newly updated data with the help of previously learned knowledge, rather than recalculating on the whole data set from scratch. In what follows, we introduce related work on incremental methods for rough set-based knowledge discovery in dynamic data under the variations described in Section 1.1, namely, the variations of objects, attributes and attribute values, respectively.
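The contrast between retraining and incremental updating can be sketched for the simplest case: Pawlak approximations of one decision class under the insertion and deletion of objects. The class below is a hypothetical illustration, not one of the cited algorithms. It keeps, for every equivalence class, a counter of members in the target decision class, so each insertion or deletion touches exactly one class instead of rescanning the whole table.

```python
from collections import defaultdict

class IncrementalApprox:
    """Maintain Pawlak approximations of one decision class as objects arrive or leave."""

    def __init__(self, attrs, decision, label):
        self.attrs, self.decision, self.label = attrs, decision, label
        self.blocks = defaultdict(set)  # condition-value key -> objects in that class
        self.hits = defaultdict(int)    # key -> members whose decision equals label

    def _key(self, row):
        return tuple(row[a] for a in self.attrs)

    def add(self, obj, row):
        key = self._key(row)
        self.blocks[key].add(obj)
        if row[self.decision] == self.label:
            self.hits[key] += 1

    def remove(self, obj, row):
        """Delete a stored object; `row` must be the values it was added with."""
        key = self._key(row)
        self.blocks[key].discard(obj)
        if row[self.decision] == self.label:
            self.hits[key] -= 1
        if not self.blocks[key]:
            del self.blocks[key]
            self.hits.pop(key, None)

    def lower(self):
        # classes whose members all carry the target label
        return set().union(*(b for k, b in self.blocks.items() if self.hits[k] == len(b)))

    def upper(self):
        # classes with at least one member carrying the target label
        return set().union(*(b for k, b in self.blocks.items() if self.hits[k] > 0))

rows = {
    "x1": {"a": 1, "b": 0, "d": "yes"},
    "x2": {"a": 1, "b": 0, "d": "no"},
    "x3": {"a": 0, "b": 1, "d": "yes"},
    "x4": {"a": 0, "b": 0, "d": "no"},
}
inc = IncrementalApprox(["a", "b"], "d", "yes")
for obj, row in rows.items():
    inc.add(obj, row)
print(sorted(inc.lower()), sorted(inc.upper()))  # ['x3'] ['x1', 'x2', 'x3']
inc.remove("x2", rows["x2"])                     # x1's class becomes consistent
print(sorted(inc.lower()), sorted(inc.upper()))  # ['x1', 'x3'] ['x1', 'x3']
```

Deleting x2 removes the only conflicting member of x1's class, so that class moves from the boundary into the lower approximation without any other class being inspected.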

(1) Objects in the information system change over time. Chen et al. discussed the update principles of information granules and rough approximations based on variable precision rough sets when objects are added and deleted, and developed corresponding incremental algorithms for maintaining the approximations [49]. Luo et al. proposed a matrix-based representation of decision-theoretic rough sets and investigated an incremental method for computing rough approximations under the variation of objects [50]. Considering ordered information systems with varying objects, Li et al. discussed the incremental mechanisms w.r.t. the P-dominating sets and P-dominated sets in the dominance-based rough set model and developed corresponding incremental algorithms for computing rough approximations [51]. Zeng et al. investigated an incremental method for updating rough fuzzy approximations in dynamic fuzzy data environments when objects are inserted and removed [52]. Fan et al. presented an incremental rule induction method that uses previously accumulated rules when an object is added [53]. Huang et al. further showed that the rules extracted by Fan’s method are incomplete and developed an incremental rule extraction algorithm that induces complete rules when a new object is inserted [54]. By combining the Apriori algorithm and the dominance-based rough set approach, Błaszczyński et al. presented an incremental rule induction approach for incomplete information systems with the addition of multiple objects [55]. Liang et al. discussed the incremental principles of the conditional entropy, the conditional combination entropy and the complementary conditional entropy, and presented incremental attribute reduction approaches that keep these three conditional entropies unchanged when a group of objects is inserted [56]. Considering the variation of objects in incomplete decision information systems, Shu et al. developed an incremental attribute reduction algorithm that keeps the positive region unchanged [57]. Depending on whether a newly arrived object affects the current attribute reduction result, Yang et al. presented an incremental feature selection method based on active sample selection [58].
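Matrix-based representations such as the one cited for decision-theoretic rough sets express approximations through a relation matrix. A common formulation, assumed here, uses the n × n relation matrix M with m_ij = 1 if x_j ∈ [x_i] and the characteristic vector G(X) of the concept X: the i-th entry of M·G(X) divided by the i-th row sum of M gives the conditional probability P(X | [x_i]). Thresholds 0 ≤ β < α ≤ 1 then yield positive, boundary and negative regions, and α = 1, β = 0 recover the Pawlak regions. The sketch uses plain lists and `fractions.Fraction` to keep the arithmetic exact; names and data are hypothetical.

```python
from fractions import Fraction

def relation_matrix(rows, attrs):
    """M[i][j] = 1 if objects i and j agree on all attrs (x_j in [x_i])."""
    n = len(rows)
    return [[int(all(rows[i][a] == rows[j][a] for a in attrs)) for j in range(n)]
            for i in range(n)]

def conditional_probs(M, g):
    """P(X | [x_i]) = (M g)_i / |[x_i]| for every object x_i."""
    probs = []
    for row in M:
        inter = sum(m * gj for m, gj in zip(row, g))  # |[x_i] ∩ X|
        probs.append(Fraction(inter, sum(row)))       # divided by |[x_i]|
    return probs

def three_regions(M, g, alpha=Fraction(1), beta=Fraction(0)):
    """Indices of the positive, boundary and negative regions for thresholds alpha > beta."""
    probs = conditional_probs(M, g)
    pos = [i for i, p in enumerate(probs) if p >= alpha]
    bnd = [i for i, p in enumerate(probs) if beta < p < alpha]
    neg = [i for i, p in enumerate(probs) if p <= beta]
    return pos, bnd, neg

rows = [{"a": 1, "d": "yes"}, {"a": 1, "d": "yes"}, {"a": 1, "d": "no"}, {"a": 0, "d": "no"}]
M = relation_matrix(rows, ["a"])
g = [int(r["d"] == "yes") for r in rows]                 # characteristic vector G(X)
print(three_regions(M, g))                               # ([], [0, 1, 2], [3])
print(three_regions(M, g, Fraction(2, 3), Fraction(1, 3)))  # ([0, 1, 2], [], [3])
```

With α = 1 and β = 0, objects 0, 1 and 2 (P = 2/3) form a boundary class; relaxing the thresholds to α = 2/3 and β = 1/3 moves that class into the positive region.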

(2) Attributes in the information system evolve over time. Luo et al. presented an incremental matrix approach for updating approximations w.r.t. the dominance-based rough set approach in set-valued ordered decision systems with the addition and deletion of attributes [59]. Liu et al. investigated the incremental mechanisms of information granules based on probabilistic rough sets and developed two incremental algorithms for maintaining approximations under the generalization of attributes [60]. Chen et al. discussed the update principles of the boundary set and cut set of rough fuzzy sets when the attributes of fuzzy data increase or decrease over time, and developed two corresponding incremental algorithms for computing rough fuzzy approximations [61]. Yu et al. proposed an incremental method for updating rough approximations based on dominance rough sets in interval-valued ordered information systems with the variation of attributes [62]. Chan proposed an incremental rule induction method based on the classical rough set model for gradually adding and deleting an attribute [63]. To improve the performance of rule induction under the addition and deletion of multiple attributes, Li et al. developed an incremental rule extraction algorithm using the characteristic relation-based rough set model [64]. Wang et al. investigated incremental principles w.r.t. the complementary entropy, Shannon information entropy and combination entropy when attributes are added over time, and developed three corresponding attribute reduction algorithms that keep the entropy unchanged [65]. Su et al. investigated incremental mechanisms for computing the positive region and tolerance classes based on the tolerance rough set model in incomplete information systems and developed two attribute reduction algorithms for adding new attributes and deleting obsolete attributes [66].
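When an attribute is added, every equivalence class can only split, so U/(B ∪ {a}) can be obtained by splitting each class of U/B according to the values of a, without regrouping the whole universe; when an attribute is deleted, the classes that agree on the remaining attributes are merged. A minimal sketch of these two operations on equivalence classes, with hypothetical function names and data (the cited methods work on richer structures such as dominance or tolerance classes):

```python
from collections import defaultdict

def partition(table, attrs):
    """U/IND(attrs), computed from scratch."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def refine(blocks, table, new_attr):
    """U/(B ∪ {a}) from U/B: split each existing class by the values of the new attribute."""
    refined = []
    for block in blocks:
        groups = defaultdict(set)
        for obj in block:
            groups[table[obj][new_attr]].add(obj)
        refined.extend(groups.values())
    return refined

def coarsen(blocks, table, remaining):
    """U/(B - {a}) from U/B: merge classes that agree on the remaining attributes."""
    merged = defaultdict(set)
    for block in blocks:
        rep = next(iter(block))  # every member agrees on `remaining`
        merged[tuple(table[rep][a] for a in remaining)] |= block
    return list(merged.values())

table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 0, "b": 1},
    "x4": {"a": 1, "b": 0},
}
by_a = partition(table, ["a"])
by_ab = refine(by_a, table, "b")           # attribute b added
print(sorted(sorted(b) for b in by_ab))   # [['x1', 'x4'], ['x2'], ['x3']]
```

Refinement only looks inside each existing class, which is what makes the update cheaper than recomputing U/(B ∪ {a}) over the whole universe.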

(3) Attribute values in the information system vary over time. Luo et al. discussed the incremental principles for computing dominating sets, dominated sets, upward unions and downward unions based on the dominance rough set model when criteria values are inserted into or removed from set-valued ordered decision systems, and developed two corresponding algorithms for updating rough approximations [67]. Luo et al. further developed two incremental algorithms for computing dominance-based rough approximations in hierarchical multicriteria decision systems under the cut refinement and coarsening of attribute values [68]. Chen et al. presented the equivalence feature matrix and the attribute importance matrix for extracting decision rules based on classical rough sets, and developed two matrix-based incremental rule induction algorithms under the coarsening and refining of attribute values [69]. Wang et al. discussed the incremental mechanisms for computing three different information entropies under changes of attribute values and developed corresponding incremental algorithms for attribute reduction [70]. Su et al. presented an incremental attribute reduction method that keeps the positive region unchanged in incomplete information systems when attribute values are revised [71]. Jing et al. investigated the incremental principles for computing knowledge granularity when a single attribute value and multiple attribute values are corrected, respectively, and developed incremental attribute reduction algorithms based on the updated knowledge granularity [72]. Wei et al. presented two incremental attribute reduction methods based on the discernibility matrices of the whole and the compacted decision tables when attribute values change over time [73].
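In the dominance-based rough set approach, the P-dominating set of x is D_P^+(x) = {y : f(y, q) ≥ f(x, q) for all q ∈ P}. Revising one value of an object k can only change dominance pairs in which k takes part, so an incremental update needs O(n|P|) comparisons instead of the O(n²|P|) of a full recomputation. The sketch below illustrates this idea on P-dominating sets only; it assumes all criteria are gain-type (larger is better), and its names and data are hypothetical rather than taken from [67].

```python
def dominates(table, y, x, criteria):
    """True if y is at least as good as x on every criterion."""
    return all(table[y][q] >= table[x][q] for q in criteria)

def dominating_sets(table, criteria):
    """Static computation of D_P^+(x) for every object x."""
    return {x: {y for y in table if dominates(table, y, x, criteria)} for x in table}

def revise_value(table, dom, k, q, new_value, criteria):
    """Update D_P^+ after changing f(k, q): only pairs involving k are re-checked."""
    table[k][q] = new_value
    for x in table:
        if x == k:
            continue
        if dominates(table, k, x, criteria):   # does k (still) dominate x?
            dom[x].add(k)
        else:
            dom[x].discard(k)
    dom[k] = {y for y in table if dominates(table, y, k, criteria)}
    return dom

table = {
    "x1": {"q1": 1, "q2": 2},
    "x2": {"q1": 2, "q2": 2},
    "x3": {"q1": 3, "q2": 1},
}
criteria = ["q1", "q2"]
dom = dominating_sets(table, criteria)
print(sorted(dom["x1"]))                      # ['x1', 'x2']
revise_value(table, dom, "x3", "q2", 3, criteria)
print(sorted(dom["x1"]))                      # ['x1', 'x2', 'x3']
```

After x3's value on q2 is raised, x3 dominates both other objects; the incremental update reaches the same result as recomputing every dominating set from scratch.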

The aforementioned studies focus on knowledge discovery in dynamic environments from data with only one type of attribute, which can be regarded as a special case of complex data. Studies on dynamic complex data with multiple types of attributes or multiple information sources are rare. Zeng et al. presented a novel hybrid distance function for constructing fuzzy rough sets and developed incremental attribute reduction algorithms for inserting and removing an attribute in hybrid information systems with multiple types of attributes [74].

Furthermore, considering the variation of attribute values in hybrid information systems, Zeng et al. developed incremental algorithms for maintaining rough approximations w.r.t. Gaussian kernel fuzzy rough sets [75]. Based on discernible neighborhood counting, Yang et al. presented an incremental attribute reduction method for complex data with categorical and numerical attributes under the addition of objects [76].

1.3 Motivations

Although the above-mentioned works make substantial contributions to rough set-based knowledge discovery from complex and dynamic data, most of these methods only discover knowledge from data with a single type of attribute under the variation of objects, attributes or attribute values. However, complex data with multiple types of attributes or multiple information sources are widespread in real-world applications. In addition, objects, attributes and attribute values may change simultaneously over time in dynamic environments. In what follows, we give the motivations for knowledge acquisition from four kinds of typical complex data, each with a different kind of variation.

(1) Discovering knowledge from probabilistic set-valued data with the variation of attributes. Although there are many works on knowledge discovery from set-valued data, introduced in Section 1.2.1, no previous study has specifically focused on discovering knowledge from set-valued data with a probability distribution, referred to in this thesis as probabilistic set-valued data. Probabilistic set-valued data arise in many real-world situations. For example, in a language-test information system, the set value {German, Polish} under the attribute of spoken language indicates that a candidate can speak both German and Polish under the conjunctive semantics [25]. In reality, however, the candidate may speak German fluently but only a little Polish. To describe this phenomenon more accurately, we distinguish the levels of spoken-language ability by attaching a discrete probability distribution to the set value, e.g., {German, Polish} with the distribution {0.99, 0.01}. This form of data is called probabilistic set-valued data; it is a type of complex data combining set-valued and probability-distribution features. Information systems with such data are called Probabilistic Set-valued Information Systems (PSvIS) in this study. Moreover, the traditional tolerance relation based on the intersection operation in set-valued information systems (SvIS) cannot be applied directly to PSvIS.

For instance, two candidates whose values under the attribute "speaking a language" are {German, Polish} with distribution {0.99, 0.01} and {German, Polish} with distribution {0.01, 0.99} are indiscernible according to the tolerance relation, because their set values intersect. This is unreasonable in view of the probability distributions: one person speaks German well and only a little Polish, the other the reverse, yet both fall into the same tolerance class. Furthermore, since the classical rough set model is sensitive to misclassified and noisy data, Ziarko presented a more robust model, namely the variable precision rough set model (VPRS), which admits a controlled degree of partial classification by introducing the majority inclusion degree [21]. VPRS has been widely applied in various fields, such as water demand prediction [21], economic and financial prediction [77,78] and medical decision making [79,80,81].

Motivated by these considerations, this study presents the λ-tolerance relation based on the Bhattacharyya distance and an extended VPRS approach for PSvIS. In addition, since adding or deleting attributes in PSvIS changes the rough approximation structures, this study further develops incremental methods for updating rough approximations when attributes are added to or deleted from PSvIS.
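The contrast between the classical tolerance relation and a distance-based one can be sketched in a few lines of Python. The Bhattacharyya distance between two discrete distributions p and q is −ln Σ_v √(p(v)·q(v)). The function names, the dictionary encoding of a probabilistic set value, and the reading of the λ-tolerance relation as "distance ≤ λ" are assumptions for illustration only; the exact definition used in Chapter 3 may differ (for example, it may threshold a similarity rather than a distance).

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance -ln(sum_v sqrt(p(v) * q(v))) between two discrete
    distributions given as {value: probability} dictionaries."""
    support = set(p) | set(q)
    bc = sum(math.sqrt(p.get(v, 0.0) * q.get(v, 0.0)) for v in support)
    if bc == 0.0:
        return math.inf  # no common values: maximally distant
    return max(0.0, -math.log(bc))  # clamp tiny negative rounding errors to 0

def classical_tolerant(p, q):
    """SvIS tolerance relation: related iff the two set values intersect."""
    return bool(set(p) & set(q))

def lambda_tolerant(p, q, lam):
    """Hypothetical lambda-tolerance test: related iff the distance is at most lam."""
    return bhattacharyya_distance(p, q) <= lam

x = {"German": 0.99, "Polish": 0.01}
y = {"German": 0.01, "Polish": 0.99}

print(classical_tolerant(x, y))                # True
print(round(bhattacharyya_distance(x, y), 3))  # 1.614
print(lambda_tolerant(x, y, lam=0.5))          # False
print(lambda_tolerant(x, x, lam=0.5))          # True
```

The two candidates from the example share the same set value, so the intersection-based relation cannot tell them apart, while their Bhattacharyya distance is large enough to separate them under any threshold below about 1.6.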

(2) Discovering knowledge from fuzzy data with the simultaneous variations of objects and attributes. Knowledge discovery from dynamic data has been investigated intensively, but mostly under the variation of objects, attributes or attribute values alone. In real-life situations, however, new objects and new attributes may appear simultaneously. For example, new patients are added to a medical diagnosis system while new features arise from a new instrument. For categorical data with the variations of objects and attributes, Chen et al. investigated an incremental method for computing approximations based on decision-theoretic rough sets [82]. Wang et al. defined novel P-generalized decision domains and developed an incremental algorithm for updating dominance-based rough approximations in ordered information systems with the variation of objects and attributes [83]. However, these approaches cannot handle fuzzy data. As fuzzy information is ubiquitous in real applications, this study investigates incremental mechanisms for updating approximations based on the rough fuzzy set model when the objects and attributes of fuzzy data change over time simultaneously. Furthermore, since the matrix form facilitates intuitive description, simplifies calculation and is easy to maintain, this study presents a novel matrix operation for constructing rough fuzzy approximations, and further develops matrix-based incremental mechanisms for updating approximations when objects and attributes are added simultaneously to fuzzy decision systems.

(3) Discovering knowledge from multi-source hybrid data with the simultaneous variations of objects, attributes and attribute values. Multi-source data, collected from multiple information sources or characterized by different views, exist extensively in practical applications, e.g., web documents [84] and face image data [85]. To discover knowledge from multi-source data efficiently, Xu et al. developed a source selection algorithm based on internal-confidence and external-confidence degrees for multi-source numerical data [47]. Lin et al. presented a fuzzy multigranulation decision-theoretic rough set model for analyzing multi-source fuzzy data [86].

For multi-source incomplete data, Xu et al. presented an information fusion method based on conditional information entropy [87]. These approaches only focus on a special kind of multi-source data in which objects are characterized by a single type of attribute.

However, in real-world applications, we often encounter multi-source hybrid data, in which each information source has multiple types of attributes. For instance, meteorological data are obtained from multiple meteorological stations, which collect various types of data, such as wind direction (a categorical attribute), precipitation (a numerical attribute) and temperature (an interval-valued attribute) [88]. Besides, some data may be missing due to measurement failures. The above-mentioned approaches cannot be directly applied to multi-source hybrid data. Furthermore, these methods concentrate on static multi-source data. In dynamic data environments, multi-source data evolve over time, including simultaneous changes of objects, attributes and attribute values. For example, to improve weather prediction, further meteorological data are collected by simultaneously adding new observation stations, supplying new instruments and revising missing values after damaged equipment is repaired. Since dynamic changes of data result in updates of knowledge, it is worthwhile to develop effective updating methods that support real-time decision making. Furthermore, in real applications, when different information sources are located at diverse sites, centralizing all data may be inefficient if the data volume is large and the transmission bandwidth is relatively limited [89]. Besides, given the requirement of privacy preservation, it is not suitable to aggregate all data in a data center [90]. Inspired by these considerations, this study presents a novel multi-source composite rough set model to cope with multi-source hybrid data and investigates incremental approaches for updating rough approximations under the addition of objects and attributes and the revision of missing values. Moreover, the presented method does not need to centralize all data derived from different sites to update knowledge.

(4) Discovering knowledge from multi-source interval-valued data with the variation of data sources. Information fusion combines and transforms information from multiple sources into a unified representation, which helps reduce the ambiguity and uncertainty of data and improves the effectiveness of knowledge discovery [91]. Information fusion technologies have been applied to multi-source data. Xu et al. presented an information fusion approach based on information entropy to fuse multi-source fuzzy incomplete data [92]. Yager proposed a monotonic set measure for the fusion of multi-source data [93]. However, these approaches cannot be directly applied to fuse multi-source interval-valued data. As mentioned in Section 1.1, GrC is a useful tool for knowledge discovery, information processing, machine learning, etc. [94,95,96]. There are two basic issues in GrC, namely, the generation of information granules and computation with granules [97,98,99]. Pedrycz et al. presented an information granulation method based on fuzzy set theory for constructing information granules in the analysis of temporal data [100]. Yao et al. proposed a hierarchical granulation structure in terms of rough set theory and investigated the corresponding rough approximation structures [101].

Owing to the simple representation of information granules and the convenience of granular computation, some information fusion methods incorporating GrC have been investigated in multi-source situations. Yager presented a fusion framework that uses a granular method to handle conflicts among data sources [102]. Lin et al. discussed the relation between multigranulation rough set theory and evidence theory and further proposed a granulation fusion method combining these two theories [103]. Xu et al. presented a source selection algorithm for multi-source numerical data and further fused the selected information sources by employing triangular fuzzy information granules [47]. However, these approaches are only suitable for multi-source single-valued data and cannot be applied directly to fuse multi-source interval-valued data. Additionally, the number of data sources in multi-source interval-valued data changes continuously in dynamic multi-source environments as new sources are inserted and obsolete sources are deleted. For example, to improve the precision of weather prediction, more meteorological sensors are installed at different weather monitoring stations, while aged sensors or unsuitable monitoring locations are removed or canceled because they are ineffective. Since the variation of sources changes the fusion results, traditional static fusion methods must recompute the whole fusion process, which is too expensive or even infeasible for large datasets. It is therefore important to develop incremental fusion methods that improve the efficiency of information fusion. To the best of our knowledge, there are no studies on GrC-based incremental fusion of multi-source interval-valued data with the variation of data sources. This study presents a novel fusion method based on fuzzy granulation and develops incremental fusion algorithms for adding and deleting data sources.

1.4 Contributions

This thesis studies knowledge acquisition from complex data in dynamic environments. The main contributions of this study are summarized as follows.

• For knowledge discovery from probabilistic set-valued data with the variation of attributes: (i) VPRS is extended to deal with probabilistic set-valued data by introducing the λ-tolerance relation based on the Bhattacharyya distance; (ii) two matrix operators and two vector functions are designed to characterize the matrix representation of rough approximations; (iii) two incremental mechanisms for updating approximations are presented under the variation of attributes in PSvIS. Theoretical and experimental results demonstrate the efficiency of the proposed method compared with static and other incremental approaches.

• For knowledge discovery from fuzzy data with the variation of objects and attributes: (i) a novel matrix operator is presented for constructing rough fuzzy approximations; (ii) a matrix-based incremental method for updating rough approximations is presented under the simultaneous variations of objects and attributes in fuzzy decision systems. Compared with static and other available incremental methods, theoretical and experimental results indicate that the proposed approach improves the efficiency of updating knowledge.

• For knowledge discovery from multi-source hybrid data with the variation of objects, attributes and attribute values: (i) a novel multi-source composite rough set model (MCRS) is presented for multi-source hybrid data; (ii) a matrix-based representation of rough approximations is presented for MCRS by introducing two matrix operators; (iii) incremental principles for maintaining rough approximations are presented when objects and attributes are added and attribute values are revised simultaneously. Theoretical and experimental results show that the proposed method outperforms static and other existing incremental approaches in terms of computation and transmission efficiency.

• For knowledge discovery from multi-source interval-valued data with the variation of data sources: (i) a novel information fusion method based on fuzzy granulation is presented for multi-source interval-valued data by introducing a novel dominant matrix; (ii) two incremental fusion approaches are proposed under the addition and deletion of data sources in multi-source interval-valued decision systems. Theoretical and experimental results demonstrate that the proposed method outperforms other fusion methods with regard to classification accuracy and computational efficiency.


1.5 Thesis Structure

The thesis is organized as follows.

Chapter 1 states the research background, the related works, the motivations, the contributions and the structure of this thesis.

Chapter 2 reviews the basic definitions related to rough set theory and fuzzy set theory.

To discover knowledge from probabilistic set-valued data, Chapter 3 presents an extended VPRS and develops two incremental algorithms for updating rough approximations when attributes are added to or deleted from PSvIS. The Bhattacharyya distance is adopted to measure the similarity of objects in PSvIS, and a λ-tolerance relation based on it is presented to construct the novel rough set model. In addition, positive, negative and boundary region matrices are proposed for maintaining rough approximations under attribute generalization.

In Chapter 4, a matrix-based incremental method for computing rough fuzzy approximations is proposed for the case in which objects and attributes increase simultaneously over time. A matrix-based representation of rough fuzzy approximations is proposed by introducing a novel matrix operator associated with the relation matrix and the fuzzy concept. The matrix block method and the partial update of matrix elements are applied to compute approximations incrementally when adding and deleting attributes.

Chapter 5 introduces novel multi-source composite rough sets to deal with multi-source hybrid data and develops an incremental method for maintaining rough approximations under the simultaneous variations of objects, attributes and attribute values. A composite relation matrix is presented to fuse different types of binary relations, and a fusion relation matrix is proposed to integrate the multiple composite relation matrices derived from different information sources. Two matrix operators associated with the composite and fusion relation matrices are introduced to construct the matrix representation of rough approximations. When objects and attributes are added and attribute values are revised simultaneously, a matrix-based incremental approach is presented that maintains the approximations by reusing previously accumulated matrix information and transmitting only partial location information of each composite relation matrix.

Chapter 6 presents a fuzzy granulation-based incremental fusion method for multi-source interval-valued data whose data sources change over time. From a probabilistic perspective, a novel dominant matrix is proposed to describe the dominance relation between two interval values derived from different information sources. When data sources are added to or deleted from multi-source interval-valued decision systems, an incremental fusion approach is proposed that utilizes the accumulated granule information to improve fusion efficiency.

Finally, Chapter 7 concludes the thesis and points out further research directions.


Preliminaries

This chapter introduces some basic concepts and notations of the classical rough set model, the extended rough set models and the fuzzy set theory, which will be used throughout the thesis.

2.1 Classical Rough Sets

In this section, we introduce the basic concepts of the classical rough sets in an information system.

The main research object of rough sets is the information system, whose rows and columns are labelled by objects and attributes, respectively, and whose entries are called attribute values. It is defined as follows.

Definition 2.1.1. Let S = ⟨U, AT, V, f⟩ be an information system, where

(1) U = {x_i | i ∈ {1, 2, . . . , n}} is a non-empty finite set of objects, called the universe;

(2) AT = C ∪ D (C ∩ D = ∅) is a non-empty finite set of attributes, where C and D denote the condition and decision attribute sets, respectively;

(3) V = ⋃_{a∈AT} V_a is the domain of attributes, where V_a is the domain of the attribute a;

(4) f is an information function from U × AT to V.

The equivalence relation is a basic concept of classical rough sets. It can be employed to construct the information granules of the universe in an information system.

Definition 2.1.2. Let S = ⟨U, AT, V, f⟩ be an information system, where AT = C ∪ D and C is a categorical attribute set. Given B ⊆ C, a binary equivalence relation R_B is defined as follows:

$$R_B = \{(x, y) \in U \times U \mid f(x, b) = f(y, b),\ \forall b \in B\} \tag{2.1}$$

Based on the equivalence relation R_B, the universe U is partitioned into equivalence classes, namely, U/R_B = {[x]_{R_B} | x ∈ U}, where [x]_{R_B} = {y ∈ U | (x, y) ∈ R_B}. The equivalence classes, also called knowledge granules, can be viewed as the information granules of the universe and describe its classification capability. More knowledge induces a finer partition, and less knowledge induces a coarser one.
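To make the partition U/R_B concrete, here is a minimal Python sketch, assuming the information system is stored as a dictionary mapping each object to its attribute values (a hypothetical representation, not one prescribed by the thesis). Grouping objects by their value tuple on B yields the equivalence classes; adding an attribute to B can only split classes, which illustrates why more knowledge gives a finer partition.

```python
from collections import defaultdict

def partition(table, B):
    """Compute U/R_B: group objects whose values agree on every attribute in B."""
    classes = defaultdict(list)
    for x, row in table.items():
        classes[tuple(row[b] for b in B)].append(x)
    return sorted(sorted(c) for c in classes.values())

# Hypothetical toy table with two categorical attributes a and b.
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 0, "b": 1},
}

print(partition(table, ["a"]))       # [['x1', 'x2', 'x3'], ['x4']]
print(partition(table, ["a", "b"]))  # [['x1', 'x3'], ['x2'], ['x4']]
```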

From this granulated view, the classical rough set model describes an uncertain concept by a pair of precise concepts, called the lower and upper approximations.

Definition 2.1.3. [11] Let S = ⟨U, AT = C ∪ D, V, f⟩ be an information system. ∀X ⊆ U and B ⊆ C, the lower and upper approximations of X with respect to the equivalence relation R_B are respectively defined as:

$$\underline{R}_B(X) = \{x \mid [x]_{R_B} \subseteq X\} \tag{2.2}$$

$$\overline{R}_B(X) = \{x \mid [x]_{R_B} \cap X \neq \emptyset\} \tag{2.3}$$

According to the lower and upper approximations, the positive, negative and boundary regions of X are defined as follows.

$$POS_B(X) = \underline{R}_B(X) \tag{2.4}$$

$$NEG_B(X) = U - \overline{R}_B(X) \tag{2.5}$$

$$BND_B(X) = \overline{R}_B(X) - \underline{R}_B(X) \tag{2.6}$$

Based on the aforementioned definitions, the objects in POS_B(X) definitely belong to X, the objects in NEG_B(X) definitely do not belong to X, and the objects in BND_B(X) only possibly belong to X.
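Equations (2.2) to (2.6) translate directly into set operations over the equivalence classes. The following is a minimal Python sketch on a hypothetical toy table; the dictionary layout and function names are illustrative assumptions. A class contained in X contributes to the lower approximation, and a class that merely meets X contributes to the upper approximation.

```python
from collections import defaultdict

def equivalence_classes(table, B):
    """Return the classes of U/R_B as a list of sets."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[b] for b in B)].add(x)
    return list(classes.values())

def pawlak_regions(table, B, X):
    """Positive, negative and boundary regions (2.4)-(2.6) of X w.r.t. R_B."""
    U = set(table)
    lower, upper = set(), set()
    for cls in equivalence_classes(table, B):
        if cls <= X:   # [x] is contained in X: lower approximation (2.2)
            lower |= cls
        if cls & X:    # [x] intersects X: upper approximation (2.3)
            upper |= cls
    return {"POS": lower, "NEG": U - upper, "BND": upper - lower}

# Hypothetical toy table with two categorical attributes a and b.
table = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 0, "b": 1},
}
X = {"x1", "x2"}
regions = pawlak_regions(table, ["a", "b"], X)
print({k: sorted(v) for k, v in regions.items()})
# {'POS': ['x2'], 'NEG': ['x4'], 'BND': ['x1', 'x3']}
```

Here x1 and x3 are indiscernible on {a, b}, but only x1 lies in X, so their class falls into the boundary region.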

2.2 Extended Rough Sets

Although the classical rough set model has been applied in various fields, it cannot be directly applied to noisy data or to other types of data, such as numerical and interval-valued data, which are widespread in practical applications. Hence, many extended rough set models have been developed for different cases.

In what follows, we introduce four extended rough set models, which will be used in the thesis.

To cope with misclassification and noisy data, Ziarko presented the variable precision rough set model (VPRS) by introducing a threshold parameter β for controlling the degree of misclassification [21].

Definition 2.2.1. [21] Let S = ⟨U, AT = C ∪ D, V, f⟩ be an information system, and let β ∈ (0.5, 1] be the threshold on the proportion of correct classification. ∀X ⊆ U and B ⊆ C, the lower and upper approximations of the VPRS model are respectively defined as:

$$\underline{R}_B^{\beta}(X) = \{x \mid P(X \mid [x]_{R_B}) \geq \beta\} \tag{2.7}$$

$$\overline{R}_B^{\beta}(X) = \{x \mid P(X \mid [x]_{R_B}) > 1 - \beta\} \tag{2.8}$$

where P(X | [x]_{R_B}) = |X ∩ [x]_{R_B}| / |[x]_{R_B}| and | • | denotes the cardinality of a set.

According to the lower and upper approximations, the positive, negative and boundary regions of X are obtained as follows:

$$\begin{cases} \text{The positive region: } POS_B^{\beta}(X) = \underline{R}_B^{\beta}(X) \\ \text{The negative region: } NEG_B^{\beta}(X) = U - \overline{R}_B^{\beta}(X) \\ \text{The boundary region: } BND_B^{\beta}(X) = \overline{R}_B^{\beta}(X) - \underline{R}_B^{\beta}(X) \end{cases} \tag{2.9}$$
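The effect of the threshold β in (2.7) to (2.9) can be seen on a small hypothetical partition. The sketch below takes the classes of U/R_B directly as input (an assumption made to keep it short) and computes the conditional probability P(X | [x]_{R_B}) for each class. With β = 1 it reduces to the classical model of Definition 2.1.3; with β = 0.75 a class that is 80% inside X is admitted to the positive region.

```python
def vprs_regions(classes, X, beta):
    """VPRS approximations (2.7)-(2.8) and regions (2.9) from a partition U/R_B."""
    if not 0.5 < beta <= 1:
        raise ValueError("beta must lie in (0.5, 1]")
    U = set().union(*classes)
    lower, upper = set(), set()
    for cls in classes:
        p = len(cls & X) / len(cls)  # P(X | [x]_{R_B})
        if p >= beta:
            lower |= cls
        if p > 1 - beta:
            upper |= cls
    return {"POS": lower, "NEG": U - upper, "BND": upper - lower}

# Hypothetical partition: one class is 80% inside X, the other 50%.
classes = [{"x1", "x2", "x3", "x4", "x5"}, {"x6", "x7"}]
X = {"x1", "x2", "x3", "x4", "x6"}

for beta in (1.0, 0.75):
    r = vprs_regions(classes, X, beta)
    print(beta, sorted(r["POS"]), sorted(r["BND"]))
# 1.0 [] ['x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7']
# 0.75 ['x1', 'x2', 'x3', 'x4', 'x5'] ['x6', 'x7']
```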

Hu et al. presented the neighborhood rough set model by defining the neighborhood relation, which can be employed for numerical data [104].

Definition 2.2.2. [104] Let S = ⟨U, AT, V, f⟩ be an information system, where AT = C ∪ D and C is a numerical attribute set. Given B_N ⊆ C, the neighborhood relation R_{B_N} is defined as

$$R_{B_N} = \{(x, y) \in U \times U \mid \Delta_{B_N}(x, y) \leq \delta\}, \tag{2.10}$$

where δ ≥ 0 is the neighborhood radius and Δ_{B_N}(x, y) is a distance function satisfying three properties, i.e., non-negativity, symmetry and the triangle inequality.

Δ_{B_N}(x, y) can be determined by distance functions commonly used in machine learning, such as the Chebyshev, Euclidean and Manhattan distances. In this thesis, the Chebyshev distance Δ_{B_N}(x, y) = max_{b∈B_N} |f(x, b) − f(y, b)| is employed to describe the neighborhood relation, where max denotes the maximum operator.
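A minimal Python sketch of the neighborhood relation (2.10) with the Chebyshev distance follows; the table layout, attribute names and radius are hypothetical. The example is chosen to show that neighborhoods form a covering of U rather than a partition: x2 is a neighbor of both x1 and x3, although x1 and x3 are not neighbors of each other.

```python
def chebyshev(u, v, B):
    """Chebyshev distance restricted to the numerical attributes in B."""
    return max(abs(u[b] - v[b]) for b in B)

def neighborhoods(table, B, delta):
    """delta-neighborhood of every object under relation (2.10)."""
    return {
        x: {y for y in table if chebyshev(table[x], table[y], B) <= delta}
        for x in table
    }

# Hypothetical numerical table.
table = {
    "x1": {"a": 0.10, "b": 0.50},
    "x2": {"a": 0.18, "b": 0.55},
    "x3": {"a": 0.26, "b": 0.50},
}
nb = neighborhoods(table, ["a", "b"], delta=0.1)
print({x: sorted(n) for x, n in nb.items()})
# {'x1': ['x1', 'x2'], 'x2': ['x1', 'x2', 'x3'], 'x3': ['x2', 'x3']}
```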

The neighborhood rough set model based on the neighborhood relation has been widely applied to real-world applications [105,106].

To deal with interval-valued data, Chen et al. presented an interval-valued rough set model based on the similarity relation [107].

Definition 2.2.3. [107] Let S = ⟨U, AT = C ∪ D, V, f⟩ be an interval-valued information system, where for all x ∈ U and a ∈ C the information function gives an interval f(x, a) = [f⁻(x, a), f⁺(x, a)] ∈ V_a, and V_a is a set of interval numbers. Given B_I ⊆ C, the similarity relation R_{B_I} is defined as follows:

$$R_{B_I} = \{(x, y) \in U \times U \mid S^{b}_{B_I}(x, y) \geq \alpha,\ \forall b \in B_I\}, \tag{2.11}$$

where

$$S^{b}_{B_I}(x, y) = \frac{l\big(f(x, b) \cap f(y, b)\big)}{l\big(f(x, b) \cup f(y, b)\big)}$$

denotes the similarity degree, l(•) stands for the length of an interval, ∩ and ∪ are the intersection and union operators between the two intervals f(x, b) and f(y, b), respectively, and α ∈ (0.5, 1].

The interval-valued rough set model can be utilized to analyze interval-valued data in real life, such as temperature, blood pressure and stock prices [108,109].
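The similarity degree in (2.11) is the ratio of the overlap length to the length of the union. A minimal Python sketch follows, with intervals encoded as (lower, upper) tuples and a hypothetical attribute name. When two intervals overlap, their union is itself an interval whose length is max(upper) − min(lower); disjoint intervals get similarity 0. Treating two identical point intervals as fully similar is an added assumption, since the ratio is 0/0 in that case.

```python
def interval_similarity(u, v):
    """S^b(x, y) = l(u ∩ v) / l(u ∪ v) for closed intervals u = (lo, hi), v = (lo, hi)."""
    (u_lo, u_hi), (v_lo, v_hi) = u, v
    overlap = min(u_hi, v_hi) - max(u_lo, v_lo)
    if overlap <= 0:
        # Disjoint or merely touching intervals share no length.
        # Assumption: identical point intervals count as fully similar.
        return 1.0 if u == v else 0.0
    hull = max(u_hi, v_hi) - min(u_lo, v_lo)  # union of overlapping intervals
    return overlap / hull

def similar(x, y, B, alpha):
    """(x, y) in R_{B_I} iff the similarity is at least alpha on every attribute in B."""
    return all(interval_similarity(x[b], y[b]) >= alpha for b in B)

# Hypothetical interval-valued objects on one attribute "temp".
x = {"temp": (0, 10)}
y = {"temp": (2, 12)}
z = {"temp": (5, 20)}
print(round(interval_similarity(x["temp"], y["temp"]), 3))  # 0.667
print(similar(x, y, ["temp"], alpha=0.6))                   # True
print(similar(x, z, ["temp"], alpha=0.6))                   # False
```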

In many practical problems, the attribute values of some objects in an information system are missing. There are many approaches to cope with missing values in rough sets. From Kryszkiewicz's viewpoint, a missing value can be replaced by any value in the corresponding attribute domain. The tolerance relation presented by Kryszkiewicz is then defined as follows [110].


Definition 2.2.4. [110] Let S = ⟨U, AT = C ∪ D, V, f⟩ be an incomplete information system, where there exist x ∈ U and c ∈ C such that f(x, c) = ∗ ("∗" denotes a missing value). Given B_M ⊆ C, the tolerance relation R_{B_M} is defined as follows:

$$R_{B_M} = \{(x, y) \in U \times U \mid f(x, b) = f(y, b) \lor f(x, b) = * \lor f(y, b) = *,\ \forall b \in B_M\} \tag{2.12}$$

The tolerance relation-based rough set model has been widely employed to deal with incomplete data [111,112].
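A minimal Python sketch of the tolerance relation (2.12) follows, using the string "*" as the missing-value marker as in Definition 2.2.4; the table contents are hypothetical. The example illustrates that the relation is reflexive and symmetric but not transitive: x2 is tolerant with x1 and x1 with x3, yet x2 and x3 disagree on attribute b.

```python
MISSING = "*"

def tolerant(x, y, B):
    """(x, y) in R_{B_M}: on every attribute the values agree or one of them is missing."""
    return all(x[b] == y[b] or MISSING in (x[b], y[b]) for b in B)

# Hypothetical incomplete table.
table = {
    "x1": {"a": 1, "b": MISSING},
    "x2": {"a": 1, "b": 2},
    "x3": {"a": MISSING, "b": 3},
}
B = ["a", "b"]
print(tolerant(table["x1"], table["x2"], B))  # True
print(tolerant(table["x1"], table["x3"], B))  # True
print(tolerant(table["x2"], table["x3"], B))  # False
```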

2.3 Fuzzy Sets

Fuzzy set theory has played a key role in data mining, machine learning, decision-making systems, etc. [113,114,115,116], since it was first introduced by Zadeh [117]. This section introduces the basic concepts of fuzzy set theory and an extended rough set model that integrates fuzzy sets and rough sets, i.e., rough fuzzy sets.

Definition 2.3.1. [117] Let Ã be a map from the universe U to the interval [0, 1], namely,

$$\tilde{A}: U \to [0, 1], \quad x \mapsto \tilde{A}(x). \tag{2.13}$$

Then Ã is called a fuzzy set on the universe U, and Ã(x) is called the membership function, which describes the degree to which the object x belongs to Ã.

The fuzzy set Ã can be denoted by

$$\tilde{A} = \left\{\frac{\tilde{A}(x_1)}{x_1}, \frac{\tilde{A}(x_2)}{x_2}, \ldots, \frac{\tilde{A}(x_n)}{x_n}\right\}.$$

Without confusion, we let A denote the fuzzy set Ã for notational simplicity.

Let F(X) be the family of fuzzy subsets of X. For any fuzzy subsets A, B ∈ F(X), the three basic operators, complementation, union ∪ and intersection ∩, are defined as follows:

$$\begin{aligned} A^c(x) &= 1 - A(x); \\ (A \cup B)(x) &= A(x) \vee B(x); \\ (A \cap B)(x) &= A(x) \wedge B(x); \end{aligned} \tag{2.14}$$

where ∧ and ∨ stand for the minimum and maximum operations, respectively.
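The operators in (2.14) act pointwise on membership degrees. The sketch below assumes, for illustration, that a fuzzy set on a finite universe is stored as a dictionary from objects to degrees and that both operands are defined on the same objects.

```python
def complement(A):
    """A^c(x) = 1 - A(x)."""
    return {x: 1 - m for x, m in A.items()}

def f_union(A, B):
    """(A ∪ B)(x) = max(A(x), B(x)); assumes A and B share the same keys."""
    return {x: max(A[x], B[x]) for x in A}

def f_intersection(A, B):
    """(A ∩ B)(x) = min(A(x), B(x)); assumes A and B share the same keys."""
    return {x: min(A[x], B[x]) for x in A}

# Hypothetical fuzzy sets on U = {x1, x2, x3}.
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.4, "x3": 0.0}
print(f_union(A, B))         # {'x1': 0.5, 'x2': 0.7, 'x3': 1.0}
print(f_intersection(A, B))  # {'x1': 0.2, 'x2': 0.4, 'x3': 0.0}
```

Unlike crisp sets, A ∩ A^c need not be empty here: for x2 its degree is min(0.7, 0.3) = 0.3.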

As fuzzy numbers represent imprecise and uncertain information, they have been employed in many fields, such as medical diagnosis and power load forecasting [118,119,120]. This thesis focuses on trapezoidal fuzzy numbers. A trapezoidal fuzzy number A is defined by a quadruple, viz., A = (a, b, c, d), whose membership function has the following form:

$$A(x) = \begin{cases} \dfrac{x - a}{b - a}, & \text{if } a \leq x < b, \\ 1, & \text{if } b \leq x \leq c, \\ \dfrac{d - x}{d - c}, & \text{if } c < x < d, \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 2.1 shows a general trapezoidal fuzzy membership function. [a, d] and [b, c] are called the support and the core of the trapezoidal fuzzy number A, respectively. b − a is called the left width and d − c the right width. In particular, when b = c, A is commonly called a triangular fuzzy number.


Figure 2.1. A general trapezoidal fuzzy number.
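The piecewise membership function of a trapezoidal fuzzy number can be sketched in Python as follows; the function name and the sample quadruple are illustrative assumptions. Because the rising and falling branches are guarded by strict inequalities, degenerate widths (a = b or c = d) never trigger a division by zero, and setting b = c gives a triangular fuzzy number.

```python
def trapezoidal_membership(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy number A = (a, b, c, d), a <= b <= c <= d."""
    if a <= x < b:
        return (x - a) / (b - a)  # rising edge, left width b - a
    if b <= x <= c:
        return 1.0                # core [b, c]
    if c < x < d:
        return (d - x) / (d - c)  # falling edge, right width d - c
    return 0.0                    # outside the support

A = (1, 2, 4, 6)
print([trapezoidal_membership(x, *A) for x in (0, 1.5, 3, 5, 6)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```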

To deal with fuzzy concepts in a crisp approximation space, Dubois and Prade presented the rough fuzzy set model in a fuzzy decision system, which is defined as follows [121].

Definition 2.3.2. [121] A fuzzy decision system (FDS) is a 4-tuple S = ⟨U, AT = C ∪ D, V, f⟩, where U = {x_i | i ∈ {1, 2, . . . , n}} is a non-empty finite set of objects, called the universe; C is a non-empty finite set of condition attributes and D is a non-empty finite set of fuzzy decision attributes, C ∩ D = ∅; V = V_C ∪ V_D is the domain of all attributes, where V_C is the domain of the condition attributes and V_D is the domain of the decision attributes; f is an information function from U × (C ∪ D) to V such that f : U × C → V_C and f : U × D → [0, 1].

Definition 2.3.3. [121] Let S = ⟨U, AT = C ∪ D, V, f⟩ be an FDS and B ⊆ C. Let d̃ be a fuzzy subset on D, where d̃(x) (x ∈ U) denotes the degree of membership of x in d̃. The lower and upper approximations of d̃ with respect to the equivalence relation R_B are a pair of fuzzy sets whose membership functions are defined as follows:

$$\underline{R}_B\tilde{d}(x) = \inf\{\tilde{d}(y) \mid y \in [x]_{R_B}\}, \qquad \overline{R}_B\tilde{d}(x) = \sup\{\tilde{d}(y) \mid y \in [x]_{R_B}\}. \tag{2.15}$$

Owing to its advantage of integrating two kinds of uncertainty (roughness and vagueness), this model has been widely used in various applications (e.g., formal concept analysis [122], clustering [123] and robust classifiers [124]).
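On a finite universe, the infimum and supremum in (2.15) become a minimum and maximum over each equivalence class. The sketch below encodes R_B as a Boolean relation matrix and takes a row-wise min/max of the fuzzy decision over the related objects. This is a plain illustration of Definition 2.3.3 under the assumption that objects are given as value tuples on B; it is not the matrix operator proposed in Chapter 4.

```python
def relation_matrix(rows):
    """Boolean matrix M with M[i][j] = 1 iff x_i and x_j agree on all attributes in B."""
    n = len(rows)
    return [[1 if rows[i] == rows[j] else 0 for j in range(n)] for i in range(n)]

def rough_fuzzy_approximations(M, d):
    """Lower/upper approximations (2.15): min/max of d over each equivalence class."""
    n = len(d)
    lower = [min(d[j] for j in range(n) if M[i][j]) for i in range(n)]
    upper = [max(d[j] for j in range(n) if M[i][j]) for i in range(n)]
    return lower, upper

rows = [(1, 0), (1, 0), (0, 1)]  # hypothetical values of x1, x2, x3 on B
d = [0.9, 0.6, 0.3]              # hypothetical fuzzy decision memberships
M = relation_matrix(rows)
print(rough_fuzzy_approximations(M, d))
# ([0.6, 0.6, 0.3], [0.9, 0.9, 0.3])
```

Since every object is related to itself, each row of M has at least one nonzero entry, so the minimum and maximum are always well defined.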


2.4 Chapter Summary

This chapter first briefly reviewed some basic rough set models, including the classical rough set model for categorical data and four other extended rough set models for noisy data, numerical data, interval-valued data and missing data, respectively.

Furthermore, the basic concepts of fuzzy set theory and the rough fuzzy set model for fuzzy data are introduced. These concepts lay the foundation for the following chapters.


Extended Variable Precision Rough Sets for Dynamic Probabilistic Set-valued Data

Set-valued information systems (SvIS) are important generalizations of single-valued information systems, in which sets are used to characterize imprecise and missing information. As mentioned in Section 1.2, knowledge discovery from SvIS has been investigated extensively. However, no study has addressed knowledge acquisition from Probabilistic Set-valued Information Systems (PSvIS) with the variation of attributes. PSvIS generalize SvIS by describing set-valued objects with probability distributions, and methods based on SvIS cannot be applied to PSvIS directly. In this chapter, we present an extended VPRS approach for incrementally updating rough approximations under the variation of attributes in PSvIS.

The rest of this chapter is organized as follows. In Section 3.1, the concept of PSvIS and the extended VPRS based on the λ-tolerance relation are presented. In Section 3.2, the matrix characterization of rough approximations is proposed by defining the relation matrix associated with two matrix operators. Furthermore, by introducing the concept of region relation matrices, incremental mechanisms for updating rough approximations are presented when the attributes change over time.

Section 3.3 develops the static and incremental algorithms for the computation of rough approximations. Section 3.4 reports experimental results, and the conclusions are presented in Section 3.5.

3.1 Probabilistic set-valued information systems (PSvIS)

This section introduces the basic concept of PSvIS and presents an extended VPRS model based on the λ-tolerance relation in PSvIS.

PSvIS are an extension of SvIS and are defined as follows.

Definition 3.1.1. A PSvIS is a sextuple ⟨U, AT = A ∪ D, V = V_A ∪ V_D, f, σ, P⟩, where U = {x_i | i ∈ {1, 2, · · · , n}} is a non-empty finite set of objects, called the universe. AT is a non-empty finite set of attributes, where A is a non-empty finite set of condition attributes and D is a decision attribute set with A ∩ D = ∅. V = V_A ∪ V_D is the domain of the attribute set AT, where V_A is the set of condition attribute values and V_D is the set of decision attribute values. f : U × A → 2^{V_A} is