
7. Related Work

7.5. Morph-RDB

Morph-RDB [31] is an OBDA system based on R2RML mappings. In addition to the relational databases MySQL, PostgreSQL and H2, Morph-RDB also allows for querying CSV files and the column store MonetDB¹⁶ as an RDF graph. The system does not create any views in the relational database and has only one phase, in which SPARQL queries are translated to SQL queries. A set of query optimizations is used to enhance the performance of the resulting SQL queries. The optimization strategies Morph-RDB applies during the translation from SPARQL to SQL include self-join elimination, subquery elimination, and left-outer self-join elimination. Morph-RDB does not support any inference. Therefore, the underlying relational data can be queried with SPARQL, but no new triples can be inferred based on the given ontology.
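To illustrate the idea behind self-join elimination, the following minimal sketch compares a naive SQL translation of two triple patterns with its optimized form. The table `person`, its columns, the `ex:` vocabulary and both SQL strings are assumptions made for this example; they are not the SQL that Morph-RDB actually generates.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO person VALUES (1, 'Ada', 'ada@example.org'),
                              (2, 'Bob', 'bob@example.org');
""")

# SPARQL pattern being translated (hypothetical vocabulary):
#   ?p ex:name ?name . ?p ex:email ?email .
# Naive translation: one table access per triple pattern, joined on the key.
naive_sql = """
    SELECT t1.id, t1.name, t2.email
    FROM person AS t1 JOIN person AS t2 ON t1.id = t2.id
    WHERE t1.name IS NOT NULL AND t2.email IS NOT NULL
    ORDER BY t1.id
"""

# After self-join elimination: both patterns read the same table and are
# joined on its primary key, so a single scan yields the same rows.
optimized_sql = """
    SELECT id, name, email
    FROM person
    WHERE name IS NOT NULL AND email IS NOT NULL
    ORDER BY id
"""

naive_rows = conn.execute(naive_sql).fetchall()
optimized_rows = conn.execute(optimized_sql).fetchall()
print(naive_rows == optimized_rows)
```

The rewrite is only valid because the join condition is an equality on a unique key of the same table; without that guarantee, the self-join could produce additional row combinations.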

Besides serving as an OBDA system, Morph-RDB can also be used as an R2RML processor. This means that, based on the input R2RML mapping and the underlying database, Morph-RDB can create and save an RDF graph consisting of the triples specified by the mapping and the underlying data. However, the same result can be achieved with most OBDA systems by simply issuing a query that retrieves all triples. Nonetheless, dedicated R2RML processors might be more performant at generating RDF graphs from relational data than OBDA systems.
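The materialization step performed by an R2RML processor can be sketched as follows. The `mapping` dictionary is a simplified stand-in for an R2RML triples map and does not use real R2RML syntax; the table, IRIs and the function `materialize` are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical source table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO person VALUES (1, 'Ada'), (2, 'Bob'), (3, NULL);
""")

# Simplified stand-in for an R2RML triples map (not real R2RML syntax).
mapping = {
    "table": "person",
    "subject_template": "http://example.org/person/{id}",
    "predicate_object": [("http://example.org/name", "name")],
}

def materialize(conn, mapping):
    """Return the triples the mapping defines over the current table rows."""
    table = mapping["table"]
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for values in cursor:
        row = dict(zip(columns, values))
        subject = mapping["subject_template"].format(**row)
        for predicate, column in mapping["predicate_object"]:
            if row[column] is not None:  # NULL values produce no triple
                triples.append((subject, predicate, row[column]))
    return triples

for triple in materialize(conn, mapping):
    print(triple)
```

An OBDA system would expose the same triples virtually; a query such as `SELECT ?s ?p ?o WHERE { ?s ?p ?o }` would then return them without storing the graph.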

¹⁵ http://d2rq.org/, last retrieved 19.05.19

¹⁶ https://www.monetdb.org/, last retrieved 18.09.2019

8. Conclusion and Future Research

Relational databases are the most frequently used databases. In order to include relational data in knowledge graphs, OBDA systems can be used. Based on an ontology that serves as a global schema and mappings from the relational data onto this ontology, relational data can be queried with SPARQL.

In this thesis, a formal framework for OBDA systems has been introduced. Based on this framework, the OBDA system Ultrawrap^OBDA has been formally defined.

Ultrawrap^OBDA uses views and materialized views to create a virtualized graph that is queryable with SPARQL. After reimplementing this system, two optimizations have been made: i) The number of columns in the views has been reduced, which in turn reduces the space needed to store the materialized views. ii) Support for instances of superclasses that are not instances of any of their subclasses has been added. In the unoptimized reimplementation, every instance of a superclass also had to be an instance of at least one of its subclasses.

The reimplementation, the optimized reimplementation and the state-of-the-art OBDA system Ontop have been benchmarked with the Texas Benchmark, which is a benchmark created especially for OBDA systems. The results show that the average query execution times of the reimplemented and the optimized system are comparable, even though the optimized system supports exclusive superclass instances and the space required to materialize views has been reduced by approximately 55%. The comparison of the execution times of Ultrawrap^OBDA with the reimplemented system shows that Ultrawrap^OBDA needs on average 3.14 times longer to execute queries than the reimplemented system. Ontop needs on average 1.87 times longer than the reimplementation. However, the reimplementation returns duplicate results and Ontop does not. Therefore, one aspect that could be addressed in future research is to identify which duplicates should be retained based on the bag semantics of SPARQL as described in [32] and which duplicates should be eliminated.
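The difference between retaining and eliminating duplicates can be illustrated with a small sketch. It shows one generic way in which duplicates may arise when a query over a superclass is translated into a union over several tables; it is not a claim about where the duplicates in the benchmarked systems originate. The tables `student` and `employee` and the class hierarchy are assumptions made for this example.

```python
import sqlite3

# Hypothetical schema: Student and Employee are assumed to be subclasses
# of Person, and the individual with id 2 belongs to both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY);
    CREATE TABLE employee (id INTEGER PRIMARY KEY);
    INSERT INTO student VALUES (1), (2);
    INSERT INTO employee VALUES (2), (3);
""")

# Bag semantics: UNION ALL keeps one result row per derivation.
bag_sql = "SELECT id FROM student UNION ALL SELECT id FROM employee ORDER BY id"

# Set semantics: UNION removes duplicate rows.
set_sql = "SELECT id FROM student UNION SELECT id FROM employee ORDER BY id"

bag_rows = [row[0] for row in conn.execute(bag_sql)]
set_rows = [row[0] for row in conn.execute(set_sql)]

print(bag_rows)  # [1, 2, 2, 3]
print(set_rows)  # [1, 2, 3]
```

Which of the two results is the intended answer depends on the chosen semantics, which is the question the proposed future work would have to settle.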

Currently, the implemented OBDA system supports SPARQL triple patterns, projection, joins, optionals and unions. In the future, more SPARQL features could be added to the system, such as filters, minus or property paths. Furthermore, once unwanted duplicate results are eliminated, the system could be extended to support aggregation functions. Another feature that could be added is SPARQL UPDATE, which would allow data to be added to the relational database via SPARQL. Furthermore, whenever the relational database is updated, the materialized views should be updated, too. Thereby, the transactional security of the OBDA system may be maintained. The system developed in this thesis only supports a subset of OWL2 QL and could be extended to fully support OWL2 QL.

Furthermore, benchmark query execution times measured on cold and warm caches could be compared to quantify speedups. Since the Texas Benchmark evaluates only very specific features of OBDA systems, as described in section 6.4, the implemented OBDA system could be evaluated more extensively with other OBDA benchmarks such as the NPD benchmark [33]. Alternatively, a new benchmark could be developed that focuses on measuring the execution times of diverse queries as well as on testing the completeness and correctness of results.

Acknowledgments

First and foremost, I want to thank my advisor Daniel Janke, not only for the support during this thesis but for all the things I have learned during my master's program.

Secondly, I want to thank Martin Leinberger, without whom this thesis would not have been possible. With Daniel and Martin each meeting had the perfect balance of productive discussions and laughs.

I also want to thank Steffen Staab for his guidance during this thesis, and for all the possibilities I had during my master's program.

Next, I want to thank Frederik Rüther, Nick Theisen and Thies Möhlenhof for proof-reading this thesis and for every beer we had together during our studies.

Finally, I want to thank my family and my girlfriend for their constant support during this thesis and throughout my studies.

References

[1] J. F. Sequeda, Integrating Relational Databases with the Semantic Web. PhD thesis, University of Texas at Austin, 5 2015.

[2] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro, and G. Xiao, “Ontop: Answering sparql queries over relational databases,” Semantic Web, vol. 8, 02 2016.

[3] J. F. Sequeda, M. Arenas, and D. P. Miranker, “Obda: Query rewriting or materialization? in practice, both!,” in The Semantic Web – ISWC 2014 (P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. Knoblock, D. Vrandečić, P. Groth, N. Noy, K. Janowicz, and C. Goble, eds.), (Cham), pp. 535–551, Springer International Publishing, 2014.

[4] M. Lanthaler, D. Wood, and R. Cyganiak, “RDF 1.1 concepts and abstract syntax,” W3C recommendation, W3C, Feb. 2014. http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/.

[5] N. Guarino, D. Oberle, and S. Staab, What Is an Ontology?, pp. 1–17. 05 2009.

[6] J. Weaver and J. A. Hendler, “Parallel materialization of the finite rdfs closure for hundreds of millions of triples,” in The Semantic Web - ISWC 2009 (A. Bernstein, D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Motta, and K. Thirunarayan, eds.), (Berlin, Heidelberg), pp. 682–697, Springer Berlin Heidelberg, 2009.

[7] S. Harris and A. Seaborne, “SPARQL 1.1 query language,” W3C recommendation, W3C, Mar. 2013. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/.

[8] J. Pérez, M. Arenas, and C. Gutierrez, “Semantics and complexity of sparql,” in The Semantic Web - ISWC 2006 (I. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. M. Aroyo, eds.), (Berlin, Heidelberg), pp. 30–43, Springer Berlin Heidelberg, 2006.

[9] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang, “Accessing the deep web,” Commun. ACM, vol. 50, pp. 94–101, May 2007.

[10] R. Elmasri and S. Navathe, Fundamentals of Database Systems, ch. 3, pp. 59–85. USA: Addison-Wesley Publishing Company, 6th ed., 2010.

[11] S. Ceri and G. Gottlob, “Translating sql into relational algebra: Optimization, semantics, and equivalence of sql queries,” IEEE Transactions on Software Engineering, vol. SE-11, pp. 324–345, April 1985.

[12] G. Xiao, D. Calvanese, R. Kontchakov, D. Lembo, A. Poggi, R. Rosati, and M. Zakharyaschev, “Ontology-based data access: A survey,” in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 5511–5519, International Joint Conferences on Artificial Intelligence Organization, 7 2018.

[13] A. Chebotko, S. Lu, and F. Fotouhi, “Semantics preserving sparql-to-sql translation,” Data & Knowledge Engineering, vol. 68, pp. 973–1000, 10 2009.

[14] R. Cyganiak, “A relational algebra for sparql,” 01 2005.

[15] U. S. Chakravarthy, J. Grant, and J. Minker, “Logic-based approach to semantic query optimization,” ACM Trans. Database Syst., vol. 15, pp. 162–207, June 1990.

[16] Q. Cheng, J. Gryz, F. Koo, T. Y. C. Leung, L. Liu, X. Qian, and K. B. Schiefer, “Implementation of two semantic query optimization techniques in db2 universal database,” in Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, (San Francisco, CA, USA), pp. 687–698, Morgan Kaufmann Publishers Inc., 1999.

[17] S. T. Shenoy and Z. M. Ozsoyoglu, “A system for semantic query optimization,” in Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, SIGMOD ’87, (New York, NY, USA), pp. 181–195, ACM, 1987.

[18] D. J. DeWitt, “The wisconsin benchmark: Past, present, and future,” in The Benchmark Handbook, 1991.

[19] S. Sundara, S. Das, and R. Cyganiak, “R2RML: RDB to RDF mapping language,” W3C recommendation, W3C, Sept. 2012. http://www.w3.org/TR/2012/REC-r2rml-20120927/.

[20] A. Skubella, D. Janke, and S. Staab, “Beseppi: Semantic-based benchmarking of property path implementations,” in The Semantic Web (P. Hitzler, M. Fernández, K. Janowicz, A. Zaveri, A. J. Gray, V. Lopez, A. Haller, and K. Hammar, eds.), (Cham), pp. 475–490, Springer International Publishing, 2019.

[21] D. Brickley and R. Guha, “RDF schema 1.1,” W3C recommendation, W3C, Feb. 2014. http://www.w3.org/TR/2014/REC-rdf-schema-20140225/.

[22] I. Horrocks, B. C. Grau, Z. Wu, A. Fokoue, and B. Motik, “OWL 2 web ontology language profiles,” W3C recommendation, W3C, Oct. 2009. http://www.w3.org/TR/2009/REC-owl2-profiles-20091027/.

[23] M. Rodríguez-Muro and D. Calvanese, “Quest, a system for ontology based data access,” CEUR Workshop Proceedings, vol. 849, 01 2012.

[24] M. Giese, A. Soylu, G. Vega-Gorgojo, A. Waaler, P. Haase, E. Jiménez-Ruiz, D. Lanti, M. Rezk, G. Xiao, O. Ozcep, and R. Rosati, “Optique: Zooming in on big data,” Computer, vol. 48, pp. 60–67, Mar 2015.

[25] A. Bertails, E. Prud’hommeaux, M. Arenas, and J. Sequeda, “A direct mapping of relational data to RDF,” W3C recommendation, W3C, Sept. 2012. http://www.w3.org/TR/2012/REC-rdb-direct-mapping-20120927/.

[26] E. Jiménez-Ruiz and B. Cuenca Grau, “Logmap: Logic-based and scalable ontology matching,” in The Semantic Web – ISWC 2011 (L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. Noy, and E. Blomqvist, eds.), (Berlin, Heidelberg), pp. 273–288, Springer Berlin Heidelberg, 2011.

[27] H. Kllapi, E. Sitaridi, M. M. Tsangaris, and Y. Ioannidis, “Schedule optimization for data processing flows on the cloud,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, (New York, NY, USA), pp. 289–300, ACM, 2011.

[28] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo, “The mastro system for ontology-based data access,” Semant. web, vol. 2, pp. 43–53, Jan. 2011.

[29] M. Namici, “R2RML mappings in OBDA systems: Enabling comparison among OBDA tools,” CoRR, vol. abs/1804.01405, 2018.

[30] V. Eisenberg and Y. Kanza, “D2rq/update: Updating relational data via virtual rdf,” WWW’12 - Proceedings of the 21st Annual Conference on World Wide Web Companion, 04 2012.

[31] F. Priyatna, O. Corcho, and J. Sequeda, “Formalisation and experiences of r2rml-based sparql to sql query translation using morph,” pp. 479–490, 04 2014.

[32] C. Nikolaou, E. V. Kostylev, G. Konstantinidis, M. Kaminski, B. C. Grau, and I. Horrocks, “Foundations of ontology-based data access under bag semantics,” Artif. Intell., vol. 274, pp. 91–132, 2019.

[33] D. Lanti, M. Rezk, G. Xiao, and D. Calvanese, “The npd benchmark: reality check for obda systems,” in Proceedings of the 18th International Conference on Extending Database Technology (EDBT), pp. 617–628, 2015.

A. Overview of symbols

Symbol    Meaning

I    The set of all IRIs.

B    The set of all blank nodes.

L    The set of all literals.

IBL    I ∪ B ∪ L.

tr    An RDF triple.

G    A set of RDF triples called RDF graph.

T_ontological    The set of ontological terms.

O    An ontology.

τ_tr^G    The evaluation of the ontological triple tr over the RDF graph G.

V    The set of possible variables in a SPARQL query.

tp    A triple pattern. tp ∈ (IBL ∪ V) × (I ∪ V) × (IBL ∪ V).

TP    The set of triple patterns.

P    A graph pattern.

var(P)    The set of variables in P.

µ    A variable binding.

dom(µ)    Function returning the domain of µ.

µ(tp)    The triple obtained by replacing all variables in tp according to µ.

Ω    A set of variable bindings.

Q    A SPARQL query.

D    A domain.

A    An attribute name.

R(A₁, A₂, ..., Aₙ)    A relation schema with the attributes A₁, A₂, ..., Aₙ.

dom(Aᵢ)    Returns the domain of the attribute Aᵢ.

att(R)    Returns the set of attributes in R.

r = {tu₁, tu₂, ..., tuₙ}    A relation.

tu = <v₁, v₂, ..., vₙ>    A tuple in a relation with the values v₁, v₂, ..., vₙ.

tu[Aᵢ]    Returns the i-th value vᵢ of a tuple tu.

S = {R₁, R₂, ..., Rₙ}    A relational schema.

s = {r₁, r₂, ..., rₙ}    An instance of a relation schema.

ϕ    A relational algebra expression.

σ_cond(ϕ)    Selection in ϕ satisfying the condition cond.

π_{A₁, A₂, ..., Aₙ}(ϕ)    The projection of the attributes A₁, ..., Aₙ in ϕ.

ρ_{A₁→A₂}(ϕ)    Rename of the attribute A₁ to A₂ in ϕ.

ϕ₁ ⊎ ϕ₂    The outer union of ϕ₁ and ϕ₂.

ϕ₁ ∖ ϕ₂    The difference of ϕ₁ and ϕ₂.

ϕ₁ × ϕ₂    The cross join of ϕ₁ and ϕ₂.

ϕ₁ ⋈_cond ϕ₂    The theta join of ϕ₁ and ϕ₂ with the condition cond.

ϕ₁ ⟕_cond ϕ₂    The left outer join of ϕ₁ and ϕ₂ with the condition cond.

θ    A mapping template.
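As an informal reading aid for the relational algebra operators listed above, the following sketch models a relation as a list of dictionaries (attribute name to value) and implements several of the operators under bag semantics. All function names and the sample relations are assumptions made for this illustration and are not part of the formal framework; the join functions additionally assume that the two inputs have disjoint attribute names.

```python
def select(rel, cond):
    """σ_cond(ϕ): keep the tuples for which cond holds."""
    return [t for t in rel if cond(t)]

def project(rel, attrs):
    """π_{A₁, ..., Aₙ}(ϕ): keep only the listed attributes."""
    return [{a: t[a] for a in attrs} for t in rel]

def rename(rel, old, new):
    """ρ_{A₁→A₂}(ϕ): rename attribute old to new."""
    return [{(new if a == old else a): v for a, v in t.items()} for t in rel]

def outer_union(r1, r2):
    """ϕ₁ ⊎ ϕ₂: union over all attributes, padding missing ones with None."""
    attrs = {a for t in r1 + r2 for a in t}
    return [{a: t.get(a) for a in attrs} for t in r1 + r2]

def theta_join(r1, r2, cond):
    """ϕ₁ ⋈_cond ϕ₂: pairs of tuples that satisfy cond."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if cond(t1, t2)]

def left_outer_join(r1, r2, cond):
    """ϕ₁ ⟕_cond ϕ₂: theta join that keeps unmatched tuples of r1."""
    r2_attrs = {a for t in r2 for a in t}
    result = []
    for t1 in r1:
        matches = [{**t1, **t2} for t2 in r2 if cond(t1, t2)]
        if matches:
            result.extend(matches)
        else:
            # No partner in r2: pad the attributes of r2 with None.
            result.append({**{a: None for a in r2_attrs}, **t1})
    return result

# Hypothetical sample relations.
person = [{"pid": 1, "name": "Ada"}, {"pid": 2, "name": "Bob"}]
email = [{"owner": 1, "mail": "ada@example.org"}]

same_person = lambda p, e: p["pid"] == e["owner"]
joined = left_outer_join(person, email, same_person)
for tu in joined:
    print(tu)
```

In the usage example, the tuple for Bob has no matching tuple in `email`, so the left outer join keeps it and fills `owner` and `mail` with `None`, whereas the theta join would drop it.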
