Background

2.1 Knowledge Graphs

2.1.3 Defining Knowledge Graph Context

Knowledge graphs are based on the mathematical field of graph theory. A knowledge graph is therefore a special type of graph constructed to represent facts about the world. To define a KG, we begin with the theoretical definition of a graph. For semantic consistency in this work, we denote the vertices and edges of a graph by $E$ and $R$, standing for entities and relations, respectively. A graph is therefore defined as follows:

Graph - Graph Theory

Definition 2.1.2 (Graph) A graph is an ordered pair $G = (E, R)$, where $E$ is a finite set called the set of vertices of $G$, and $R \subseteq \binom{E}{2}$ is a set of pairs of elements of $E$ called the set of edges of $G$.

• The edge $r = \{v, e\} \in R$ is also denoted by $r = ve$.

• If $r = ve \in R$ is an edge of $G$, then $v$ is adjacent to $e$, and $v$ (like $e$) is incident to $r$.
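As an illustration only, Definition 2.1.2 can be sketched in Python by storing each edge $r = \{v, e\}$ as a two-element frozenset, so that $ve$ and $ev$ denote the same edge. The helper names (`make_graph`, `is_adjacent`, `is_incident`) and the toy vertices are hypothetical, and the sketch additionally assumes that an edge joins two distinct vertices (no loops).

```python
from itertools import combinations


def make_graph(vertices, pairs):
    """Build G = (E, R); each edge is stored as a 2-element frozenset {v, e}."""
    E = frozenset(vertices)
    R = set()
    for v, e in pairs:
        # Assumption of this sketch: an edge joins two distinct vertices of E.
        if v not in E or e not in E or v == e:
            raise ValueError(f"invalid edge {{{v}, {e}}}")
        R.add(frozenset((v, e)))
    return E, frozenset(R)


def is_adjacent(G, v, e):
    """v is adjacent to e iff the edge ve = {v, e} belongs to R."""
    _, R = G
    return frozenset((v, e)) in R


def is_incident(G, v, r):
    """v is incident to r iff r is an edge of G and v is one of its endpoints."""
    _, R = G
    return r in R and v in r


# Hypothetical toy graph with vertices a, b, c and edges ab, bc.
G = make_graph({"a", "b", "c"}, [("a", "b"), ("b", "c")])
E, R = G
r_ab = frozenset(("a", "b"))

print(is_adjacent(G, "a", "b"), is_adjacent(G, "a", "c"))   # True False
print(is_incident(G, "a", r_ab), is_incident(G, "c", r_ab))  # True False

# R is a subset of all 2-element subsets of E, i.e. R ⊆ binom(E, 2).
all_pairs = {frozenset(p) for p in combinations(E, 2)}
print(R <= all_pairs)  # True
```

Storing edges as frozensets makes the unordered nature of $\{v, e\}$ explicit: `is_adjacent(G, "a", "b")` and `is_adjacent(G, "b", "a")` query the same edge.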

Building on this definition, a knowledge graph (KG) is a special graph whose unique characteristics stem from the type of information it represents. KGs must follow a given data format while still allowing for expressivity and reasoning, which makes them interesting structures. Whereas a simple graph has single-valued vertices and simple labelled or unlabelled edges, the vertices of a KG are themselves sub-graphs that describe the attributes of entities. Depending on the KG design, the edges may be simple labelled edges or hyper-relational (i.e., an edge carries several attributes). A knowledge graph is generally defined as follows:

Knowledge Graph

Definition 2.1.3 (Knowledge Graph) A knowledge graph $KG$ is a labelled, directed multi-graph expressed as a quadruple $KG = ((E, R), L, F, T)$ such that:

1. $E$ is the set of all entities in the KG; entities represent the vertices and are referenced by unique identifiers $e \in E$.

2. $R$ is the set of all relations in the KG; relations represent the edges between vertices and are referenced by unique identifiers $r \in R$.

3. $L$ is the set of natural-language labels of the vertices and edges in the KG.

4. $F : (E \cup R) \to L$ is a function that returns the label of any item in the KG, such that for any $u \in E \cup R$, $l_u = F(u) \in L$.

5. $T \subseteq E \times R \times E$ is the set of all (ordered) triples in the KG, where a triple $(h, r, t) \in T$ implies $h, t \in E$ and $r \in R$ with $r = ht$; $h$ is the head entity and $t$ is the tail entity.
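To make the quadruple concrete, the following minimal sketch (an illustration under our own assumptions, not the implementation used in this thesis) stores $E$, $R$, the label lookup behind $F$, and $T$, and rejects triples whose head, tail or predicate fall outside $E$ and $R$. The class `KnowledgeGraph` and its methods are hypothetical; the Wikidata identifiers and labels are taken from the running example. Because $T$ is a set of $(h, r, t)$ triples, the same pair of entities may be linked by several relations, which is what makes the KG a multi-graph.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Sketch of Definition 2.1.3: KG = ((E, R), L, F, T)."""
    entities: set[str]        # E: entity identifiers (vertices)
    relations: set[str]       # R: relation identifiers (edges)
    label_of: dict[str, str]  # lookup table backing F; its values form L
    triples: set[tuple[str, str, str]] = field(default_factory=set)  # T ⊆ E × R × E

    @property
    def labels(self) -> set[str]:
        """L: the natural-language labels of vertices and edges."""
        return set(self.label_of.values())

    def F(self, u: str) -> str:
        """Label function F : (E ∪ R) -> L."""
        if u not in self.entities and u not in self.relations:
            raise KeyError(f"{u} is not in E ∪ R")
        return self.label_of[u]

    def add_triple(self, h: str, r: str, t: str) -> None:
        """Add (h, r, t) to T, enforcing h, t ∈ E and r ∈ R."""
        if h not in self.entities or t not in self.entities:
            raise ValueError("head and tail must be entities in E")
        if r not in self.relations:
            raise ValueError("predicate must be a relation in R")
        self.triples.add((h, r, t))


kg = KnowledgeGraph(
    entities={"wd:Q974", "wd:Q3838", "wd:Q15"},
    relations={"wdt:P30", "wdt:P36", "wdt:P1376"},
    label_of={
        "wd:Q974": "Democratic Republic of the Congo",
        "wd:Q3838": "Kinshasa",
        "wd:Q15": "Africa",
        "wdt:P30": "continent",
        "wdt:P36": "capital",
        "wdt:P1376": "capital of",
    },
)
kg.add_triple("wd:Q974", "wdt:P36", "wd:Q3838")    # DRC --capital--> Kinshasa
kg.add_triple("wd:Q3838", "wdt:P1376", "wd:Q974")  # Kinshasa --capital of--> DRC
kg.add_triple("wd:Q974", "wdt:P30", "wd:Q15")      # DRC --continent--> Africa

print(kg.F("wdt:P36"))  # capital
print(len(kg.triples))  # 3
```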

The open-domain knowledge graphs published on the web, such as DBpedia, Wikidata, YAGO, and Freebase, are designed using Semantic Web concepts and principles. This group of KGs is referred to as RDF knowledge graphs. RDF KGs differ from other types of KGs (e.g., triple stores or graph databases) because they use RDF as the core formatting language. Consequently, these KGs exhibit specific characteristics that are relevant to different NLP and inference tasks. The models in this thesis are evaluated on datasets built for these knowledge graphs, so the knowledge context used by these models is also obtained from the open-domain KGs. We define an RDF KG as follows.

RDF Knowledge Graph - Holistic View

Definition 2.1.4 (RDF Knowledge Graph) An RDF knowledge graph is a knowledge graph with an additional set of vertices called concepts (classes) and special non-entity value vertices called literals.

Formally, an RDF knowledge graph is $KG_{RDF} = ((C, P, E, R, L), L, F, T^{+})$, where:

1. $C$ is the set of all concepts in the RDF schema.

2. $P$ is the set of all properties between classes in the RDF schema.

3. $L$ is the set of all literals in the KG.

4. $T^{+}$ is the set of all (subject, predicate, object) triples $(s, p, o) \in T \cup S \cup I \cup L$, where $T$ is the set of entity-to-entity triples and:

• $S \subseteq C \times P \times C$ is the set of RDF Schema triples defining the concepts and their relationships.

• $I \subseteq \bigcup_{e \in E,\, c \in C} \{(e, \mathit{isa}, c) \mid e \text{ is an instance of } c\} \;\cup\; \bigcup_{r \in R,\, p \in P} \{(r, \mathit{isa}, p) \mid r \text{ is an instance of } p\}$ is the set of instance-of (is-a) triples relating entities to concepts and relations to properties in the schema.

• $L \subseteq (E \times P \times L) \cup (R \times P \times L)$ is the set of all triples whose object is a literal. This set partly defines the attributes of an entity.

[Figure 2.3: Illustration of a subgraph of Wikidata depicting the schema modelling and the instance entities of an RDF knowledge graph. The types of relations depend on the part of the KG in which the relation is defined (the Terminology Box, T-Box, or the Assertion Box, A-Box).]

Having formally introduced the knowledge graph, we use the remainder of this section to illustrate the richness of the information contained in a KG, using the running example from Figure 1.2: “Result of the second leg of the African Cup Winners Cup final at the National Stadium on Friday: Arab Contractors - Egypt 4 Sodigraf Zaire 0, halftime 2:0 Scorers: Aly Ashour 7’, 56’(penalty), Mohamed Ouda 24’ 73’.Contractors won 4-0 on aggregate”, where the surface form “Zaire” disambiguates to the Wikidata entity Q974. Figure 2.3 illustrates how the entity Q974 “Democratic Republic of the Congo” is represented in the KG. RDF KGs are designed around two major partitions: i) the schema, also referred to as the Terminology Box (T-Box), which defines the ontological terms and consists of the concepts and relationships that model real-world objects; and ii) the Assertion Box (A-Box), in which instances of the T-Box concepts are stored as entities and instances of the relations are stored as predicates, forming triples (subject, predicate, object; s, p, o). Under the open-world assumption of KG design, these concepts can be extended, and an unlimited number of triples can be added to the A-Box.

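Rep. | Type     | Subj. (s)      | Pred. (p) | Obj. (o)   | Synopsis (s, p, o)
-----+----------+----------------+-----------+------------+-------------------
S    | Schema   | c ∈ C          | p ∈ P     | c ∈ C      | (c, p, c)
I    | Instance | e ∈ E          | isa       | x ∈ C ∪ P  | (e, isa, x)
L    | Literal  | v ∈ C ∪ E ∪ R  | p ∈ P     | literal    | (v, p, literal)
T    | E2E      | h ∈ E          | r ∈ R     | t ∈ E      | (h, r, t)

Table 2.1: KG Triple Classification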
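The partitioning in Table 2.1 can be sketched as a classifier that inspects which sets the subject, predicate and object belong to. This is a sketch under stated assumptions: `wdt:P31` (Wikidata's "instance of") stands in for isa, literals are modelled by a hypothetical `Lit` tuple of value and XSD datatype, and the toy sets $C$, $P$, $E$, $R$ are small excerpts of the running example (with `wdt:P279`, Wikidata's "subclass of", as a schema property). The order of the checks matters, because a Wikidata property such as `wdt:P30` can appear both in $P$ and in $R$.

```python
from collections import Counter, namedtuple

# Hypothetical literal representation: a value with its XSD datatype.
Lit = namedtuple("Lit", ["value", "datatype"])

ISA = "wdt:P31"  # assumption: Wikidata's "instance of" plays the role of isa


def classify(triple, C, P, E, R):
    """Return the Table 2.1 partition (S, I, L or T) of a triple."""
    s, p, o = triple
    if s in C and p in P and o in C:
        return "S"  # schema: (c, p, c)
    if s in E | R and p == ISA and o in C | P:
        return "I"  # instance: (e, isa, c) or (r, isa, p)
    if s in C | E | R and p in P and isinstance(o, Lit):
        return "L"  # literal: (v, p, literal)
    if s in E and p in R and o in E:
        return "T"  # entity-to-entity: (h, r, t)
    raise ValueError(f"triple {triple} fits no partition")


C = {"wd:Q6256", "wd:Q5107", "wd:Q1048835"}  # Country, Continent, Political Territory
P = {"wdt:P30", "wdt:P279", "wdt:P571"}      # continent, subclass of, inception
E = {"wd:Q974", "wd:Q15", "wd:Q3838"}        # DRC, Africa, Kinshasa
R = {"wdt:P30", "wdt:P36"}                   # continent, capital

triples = [
    ("wd:Q6256", "wdt:P30", "wd:Q5107"),      # Country -continent-> Continent
    ("wd:Q6256", "wdt:P279", "wd:Q1048835"),  # Country subclass of Political Territory
    ("wd:Q974", "wdt:P31", "wd:Q6256"),       # DRC instance of Country
    ("wd:Q974", "wdt:P571", Lit("1960-06-30", "xsd:date")),  # DRC inception
    ("wd:Q974", "wdt:P30", "wd:Q15"),         # DRC -continent-> Africa
    ("wd:Q974", "wdt:P36", "wd:Q3838"),       # DRC -capital-> Kinshasa
]

counts = Counter(classify(t, C, P, E, R) for t in triples)
print({k: counts[k] for k in "SILT"})  # {'S': 2, 'I': 1, 'L': 1, 'T': 2}
```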

Table 2.1 shows a generic classification of triples in a KG, in which every triple falls into one of four partitions: schema, instance, literal, and entity-to-entity (E2E) triples. Strictly modelled, the concepts in an RDF Schema are represented using the triple format (c, rdf:type, rdfs:Class), such that all $c \in C$ are instances of rdfs:Class, while the properties are represented using the triple format (p, rdf:type, rdf:Property), indicating that all $p \in P$ are instances of rdf:Property. Instances of a schema concept $c \in C$ become the subject of a given property $p \in P$ if the special triple (p, rdfs:domain, c) exists in $S$. A property in the schema, in turn, can take as objects either instances of a concept or literal values: instances of a schema concept $c \in C$ become objects of a property $p \in P$ if the special triple (p, rdfs:range, c) exists in $S$. Properties that take instances of concepts are referred to as object properties, whereas those that take literals are datatype properties. For all $(s, p, o) \in T^{+}$, the following implications therefore hold in an RDF graph:

1. $(s, \text{rdf:type}, c_1) \in I \implies (p, \text{rdfs:domain}, c_1) \in S$

2. $(o, \text{rdf:type}, c_2) \in I \implies (p, \text{rdfs:range}, c_2) \in S$

3. If 1 and 2 hold, then $(c_1, p, c_2) \in S$.
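Read as checks over a small closed example, the three conditions can be sketched as follows. The function names and the toy sets $S$ and $I$ are hypothetical; in this sketch a subject or object without any rdf:type triple satisfies conditions 1 and 2 vacuously, and condition 3 is only tested once 1 and 2 hold.

```python
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"


def types_of(x, I):
    """All concepts c with (x, rdf:type, c) in I."""
    return {c for (s, p, c) in I if s == x and p == TYPE}


def violated_conditions(triple, S, I):
    """Return which of conditions 1-3 the triple (s, p, o) violates."""
    s, p, o = triple
    c1s, c2s = types_of(s, I), types_of(o, I)
    cond1 = all((p, DOMAIN, c1) in S for c1 in c1s)  # condition 1
    cond2 = all((p, RANGE, c2) in S for c2 in c2s)   # condition 2
    violated = [n for n, ok in ((1, cond1), (2, cond2)) if not ok]
    # Condition 3 only applies when conditions 1 and 2 both hold.
    if cond1 and cond2 and not all((c1, p, c2) in S for c1 in c1s for c2 in c2s):
        violated.append(3)
    return violated


S = {
    ("wdt:P30", DOMAIN, "wd:Q6256"),      # continent: domain Country
    ("wdt:P30", RANGE, "wd:Q5107"),       # continent: range Continent
    ("wd:Q6256", "wdt:P30", "wd:Q5107"),  # (Country, continent, Continent)
}
I = {
    ("wd:Q974", TYPE, "wd:Q6256"),  # DRC is a Country
    ("wd:Q15", TYPE, "wd:Q5107"),   # Africa is a Continent
}

print(violated_conditions(("wd:Q974", "wdt:P30", "wd:Q15"), S, I))  # []
print(violated_conditions(("wd:Q15", "wdt:P30", "wd:Q974"), S, I))  # [1, 2]
```

The reversed triple (Africa, continent, DRC) fails both the domain and the range condition, so condition 3 is never reached for it.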

Item        | s     | o     | Example (s, p, o)                | Verbalised
------------+-------+-------+----------------------------------+----------------------------------------------------
Domain      | p ∈ P | c ∈ C | wdt:P30, rdfs:domain, wd:Q6256   | continent domain Country
Range       | p ∈ P | c ∈ C | wdt:P30, rdfs:range, wd:Q5107    | continent range Continent
SubClass    | c ∈ C | c ∈ C | wd:Q6256, wdt:P279, wd:Q1048835  | Country subclass of Political Territory
SubProperty | p ∈ P | p ∈ P | wd:P276, wdt:P1647, wd:P361      | location sub property of part of
Inverse     | p ∈ P | p ∈ P | wd:P36, wdt:P1696, wd:P1376      | capital inverse property of capital of
Instance    | e ∈ E | c ∈ C | wd:Q974, wdt:P31, wd:Q6256       | Democratic Republic of the Congo instance of Country

Table 2.2: Example KG representation of properties and relations, where p ∈ P denotes properties, c ∈ C denotes classes/concepts in the schema, and e ∈ E denotes entities.
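The Verbalised column of Table 2.2 amounts to applying the label function $F$ to each position of a triple. A minimal sketch, assuming a hand-written label table: the dictionary `LABELS` is hypothetical, and falling back to the raw identifier for unknown IDs is our own choice, not Wikidata behaviour.

```python
# Hypothetical label table for some IDs of Table 2.2 (labels as verbalised there).
LABELS = {
    "wdt:P30": "continent", "wd:Q6256": "Country", "wd:Q5107": "Continent",
    "rdfs:domain": "domain", "rdfs:range": "range",
    "wd:P36": "capital", "wdt:P1696": "inverse property of", "wd:P1376": "capital of",
    "wd:Q974": "Democratic Republic of the Congo", "wdt:P31": "instance of",
}


def verbalise(triple, labels=LABELS):
    """Map each position of (s, p, o) to its label and join with spaces.

    Unknown identifiers fall back to the raw ID (an assumption of this sketch)."""
    return " ".join(labels.get(x, x) for x in triple)


for t in [("wdt:P30", "rdfs:domain", "wd:Q6256"),
          ("wdt:P30", "rdfs:range", "wd:Q5107"),
          ("wd:P36", "wdt:P1696", "wd:P1376"),
          ("wd:Q974", "wdt:P31", "wd:Q6256")]:
    print(verbalise(t))
# continent domain Country
# continent range Continent
# capital inverse property of capital of
# Democratic Republic of the Congo instance of Country
```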

Knowledge Graph Entity Context

Definition 2.1.5 (Knowledge Graph Entity Context) A knowledge graph context (KG-Context) is a single fact from the KG, expressed as a triple $(s, p, o)$ (subject, predicate, object) in accordance with the definition of the set $T^{+}$.

Entities in a KG naturally participate in such triples. Given the classification of KG triples in Table 2.1, only the triples in the KG schema (i.e., the set $S$) do not apply to any specific entity. In this work, we view an entity as a collection of two sets of contextual information: i) knowledge about the attributes of the entity, which we denote as $A_e$, and ii) information about the relationships of the entity to other entities, obtained by following its outgoing edges, which we denote as $T_e$.

1. $A_e = \{(s, p, o) \in I \cup L \mid s = e\}$, i.e., all instance-of and literal triples whose head entity is $e$.

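2. $T_e$ collects the triples reached by following the outgoing edges of $e$:

$$
T_e = \bigcup_{i=1}^{\mathrm{outDegree}(h)}
\begin{cases}
\{(F(h), F(r_i), F(t_i))\} & \text{if } hop = 1,\\
\{(h_{hop}, r, t) \in T \mid h_{hop} \text{ is reached from } h_{hop-1}\} & \text{otherwise,}
\end{cases}
\qquad (2.1.1)
$$

where $(h, r_i, t_i) \in T$ is the $i$-th outgoing edge of the current head $h$, with $h = e$ at the first hop.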
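One possible reading of $A_e$ and Eq. (2.1.1) is sketched below as a hop-limited breadth-first expansion over $T$: hop-1 triples are verbalised with $F$, while later hops are kept as raw triples. The functions, the `max_hops` cut-off, the `visited` set (which prevents an entity from being expanded again, e.g. via the capital / capital of cycle), and the toy triples are all assumptions of this illustration rather than the thesis implementation.

```python
def attribute_context(e, I, L):
    """A_e: instance-of and literal triples whose head (subject) is e."""
    return {(s, p, o) for (s, p, o) in I | L if s == e}


def triple_context(e, T, F, max_hops=2):
    """T_e: collect triples by following outgoing E2E edges from e hop by hop.

    Mirrors the two cases of Eq. (2.1.1): hop-1 triples are verbalised with F,
    later hops are kept as raw triples (h_hop, r, t)."""
    context, frontier, visited = [], {e}, {e}
    for hop in range(1, max_hops + 1):
        next_frontier = set()
        for h, r, t in sorted(T):  # sorted only for deterministic output
            if h in frontier:
                context.append((F(h), F(r), F(t)) if hop == 1 else (h, r, t))
                if t not in visited:
                    next_frontier.add(t)  # t becomes a head at the next hop
        visited |= next_frontier
        frontier = next_frontier
    return context


labels = {
    "wd:Q974": "Democratic Republic of the Congo", "wd:Q15": "Africa",
    "wd:Q3838": "Kinshasa", "wdt:P30": "continent", "wdt:P36": "capital",
}
F = labels.__getitem__  # label function F restricted to this toy example

I = {("wd:Q974", "wdt:P31", "wd:Q6256"), ("wd:Q15", "wdt:P31", "wd:Q5107")}
L = {("wd:Q974", "wdt:P571", "1960-06-30")}
T = {("wd:Q974", "wdt:P30", "wd:Q15"),
     ("wd:Q974", "wdt:P36", "wd:Q3838"),
     ("wd:Q3838", "wdt:P1376", "wd:Q974")}

print(sorted(attribute_context("wd:Q974", I, L)))
# [('wd:Q974', 'wdt:P31', 'wd:Q6256'), ('wd:Q974', 'wdt:P571', '1960-06-30')]
for triple in triple_context("wd:Q974", T, F):
    print(triple)
# ('Democratic Republic of the Congo', 'continent', 'Africa')
# ('Democratic Republic of the Congo', 'capital', 'Kinshasa')
# ('wd:Q3838', 'wdt:P1376', 'wd:Q974')
```

Africa's instance-of triple is excluded from $A_e$ for $e$ = Q974 because its subject is a different entity, and the second-hop triple (Kinshasa, capital of, DRC) is recorded but DRC is not expanded a second time.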

Inducing Semantic Knowledge in Deep Learning Models

In this section, we have defined “Knowledge Context”, especially from the perspective of a knowledge graph (KG). We took a systematic look at the underlying technologies that enabled research into KGs, culminating in Semantic Web technologies such as RDF for knowledge representation. We described the KG from the theoretical definition of a graph through to the structure and information it represents. This in-depth view of how a KG is structured, and of the depth of knowledge it represents, is vital for addressing the challenges defined in Chapter 1. For example, challenges 2, 3, and 4 in Section 1.1 involve accessing the attributes of an entity (challenge 2), accessing its triples (challenge 3), or evaluating the relevance of different forms of information from the KG (challenge 4). This thesis combines semantically rich knowledge from several knowledge sources with machine learning techniques to achieve its research objectives. In the next section (Section 2.2), we introduce the machine learning concepts that are vital for our contribution chapters (cf. Chapters 5, 6, and 7).