The thesaurus management tool used for the marine thesaurus must have several key technical and thematic characteristics [27]:

• technical

◦ expose terms or whole vocabularies through services

◦ import thesauri in SKOS format

◦ offer user management with roles and different rights

◦ include collaborative content management

• thematic

◦ support multiple languages (multilingual)

◦ allow relationships between terms

Table 5.4 lists these characteristics and shows whether the tools from subsection 3.5.2 support them. The feature sets are almost identical. PoolParty is not Open Source (and thus not free), which rules it out. Although some features of MMI ORR are not known, MMI ORR is not an option either. Firstly, using it means depending on its operators: whenever they change something or their services are unavailable, nothing can be done about it. Secondly, we do not own the rights to all the existing vocabularies, so handing them to an external service would be legally problematic.

That leaves TemaTres and iQvoc, which are identical feature-wise. Points against TemaTres are its relatively old-fashioned technology (mainly PHP) and its inadequate documentation (e.g. it is almost impossible to find information about its HTTP API). On these grounds, iQvoc is the tool that will be used to manage the vocabularies.

Table 5.4: Comparison of web-based thesaurus management tools

| Feature | TemaTres | PoolParty | iQvoc | MMI ORR |
|---|---|---|---|---|
| Services | HTTP API | HTTP API [28] and SPARQL | HTTP API | HTTP API and SPARQL |
| Multilingualism | + | + | + | + |
| Relationships between terms | + | + | + | + |
| Import | SKOS, tabbed txt | SKOS, CSV, Zthes [29] | SKOS N-Triples | CSV, Turtle, RDF/XML, N3, N-Triples |
| User management | + | + | + | NA |
| Collaborative content management | + | + [30] | + | NA |
| Open Source | + | - | + | NA |

5.7 Converting vocabularies to SKOS for web usage

The preceding section (section 5.6) defined the requirements for building a marine thesaurus. They are the basis for the actual implementation in this section. The existing vocabularies were introduced in section 3.4 on page 60 and are all in Excel format. Figure 5.30 depicts excerpts of the different vocabularies the application has to handle. Although these word lists are easy for humans to comprehend, they cannot be used for indexing (metadata) or search, because the Excel format does not allow computer systems to “understand” or make use of the vocabularies. Converting the word lists into SKOS format, which was introduced in 2.4.2.2 on page 37, changes that. Vocabularies in SKOS form can be handed over to SKOS management tools like TopBraid Enterprise Vocabulary Net or iQvoc. With such tools the vocabularies can be visualized and maintained through a web browser, which enables specialists to contribute their knowledge to a vocabulary.

[27] Being Open Source is an additional characteristic because funds for commercial software are lacking.

[28] http://poolparty.biz/de/skos-without-sparql-poolparty-skos-api/

[29] https://grips.semantic-web.at/pages/viewpage.action?pageId=40437853

[30] http://poolparty.biz/poolparty-functionalities-features-at-a-glance/

Figure 5.30: Examples of existing German marine vocabularies (excerpts): (a) Küste, (b) LHM, (c) NOKIS

Subsection 5.7.1 lays the foundations for the implementation, looks at alternatives and considers what can be learned from existing tools. Subsection 5.7.2 then develops a concept for the implementation, which is described in subsection 5.7.3.

5.7.1 Foundations

Subsection 3.5.1 on page 62 introduced tools that can convert a given format into SKOS. Subsection 5.6.1 on page 134 concluded that none of the existing tools fulfils the requirements proposed in section 5.6 on page 134. Another way to conduct such a conversion without an existing tool is to use Extensible Stylesheet Language Transformations (XSLT): with XSLT code and an XSLT processor the desired results can be achieved. However, this approach is not generic enough, because the XSLT code would have to be (re-)written for every word list, so reusability would be low as soon as there is more than one vocabulary. Furthermore, the additional functionalities (see subsection 5.7.3) would not be possible with XSLT.

Although no tool was able to fulfil the requirements, lessons can be learned from the existing implementations. Skosify is not suitable for converting arbitrary word lists (e.g. in Excel or CSV format) into SKOS because it only accepts files in semantic web formats (RDF and OWL), and such conversions could easily be done with frameworks like Sesame 2 as well. However, its processing steps provide useful knowledge. Steps such as making sure that the vocabulary has a skos:ConceptScheme, and validations such as making sure that there is only one skos:prefLabel per language, are important for our own implementation.
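The two checks taken over from Skosify can be sketched as follows. This is a minimal illustration in Python, not JSKOSify's Java code; the in-memory triple representation and the function names has_concept_scheme and duplicate_pref_labels are assumptions made for this example only.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical in-memory model: (subject, predicate, object) triples.
# A literal object is a (text, language) tuple, a resource object is a str.

def has_concept_scheme(triples):
    """Return True if at least one resource is typed as skos:ConceptScheme."""
    return any(p == RDF_TYPE and o == SKOS + "ConceptScheme" for _, p, o in triples)

def duplicate_pref_labels(triples):
    """Return (concept, language) pairs carrying more than one skos:prefLabel."""
    counts = Counter(
        (s, o[1]) for s, p, o in triples
        if p == SKOS + "prefLabel" and isinstance(o, tuple)
    )
    return sorted(pair for pair, n in counts.items() if n > 1)

# Example data: the concept has two German preferred labels, which is invalid.
BASE = "http://www.example-thesaurus.com"
triples = [
    (BASE + "#ExampleConceptScheme", RDF_TYPE, SKOS + "ConceptScheme"),
    (BASE + "#Beach", SKOS + "prefLabel", ("Strand", "de")),
    (BASE + "#Beach", SKOS + "prefLabel", ("Beach", "en")),
    (BASE + "#Beach", SKOS + "prefLabel", ("Küste", "de")),
]

print(has_concept_scheme(triples))
print(duplicate_pref_labels(triples))
```

A converter could run both checks before saving and report the offending concept and language to the user instead of writing an invalid document.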

Open Refine was the tool closest to fulfilling the requirements but cannot handle hierarchies or the additional functionalities (see subsection 5.7.3). However, letting the user select the resources and literals and what they will become in the RDF document can be adopted for our own implementation, as can defining a URI. Zthes also lets the user specify a URI: users can specify a base name space and a concept scheme ID that are combined in the URI, which is interesting, too. In Voc2skos the URI can be defined using an ontologyURI element in the preamble of a CSV (comma-separated values) document. Furthermore, Voc2skos even supports hierarchies through indentation in the CSV file. The problem, however, is that the user has to alter the data (change headings, indentation etc.). The same problem is found in the Excel to SKOS/RDF conversion tool, which requires the data to adapt to the tool and not the other way round. Both approaches violate the “rule” of data lifetime (Christl, 2013): “Your software will go away. Your data is going to stay.”

In addition to the points made in subsection 5.6.2 on page 134, iQvoc is used as the web-based thesaurus management tool because one of the project partners of the MDI-DE project is the Federal Environmental Agency (UBA), which initiated the development of iQvoc. Furthermore, iQvoc, just like SKOS itself, builds upon the four principles of the Linked (Open) Data concept by Berners-Lee (2003):

• Use URIs as names for things.

• Use HTTP URIs so that people can look up those names.

• When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).

• Include links to other URIs, so that they can discover more things.

5.7.2 Concept

The preceding section underlined what is important when setting up or converting a SKOS document:

(1) Do not change the data – the tool has to adapt to the data

(2) Let the user specify what SKOS properties the resources and literals, i.e. column headings, will become

(3) Set up the document and make sure it uses a skos:ConceptScheme and HTTP URIs

However, the first step for a tool that converts vocabularies to SKOS is, of course, to import and load the file. The next step is to set up the SKOS document, which means specifying the name space locations for SKOS and RDF in the preamble of the document. Subsequently a skos:ConceptScheme (note the prefix skos, which is now usable) is specified that uses an HTTP URI to which all the concepts will point. To do this the user has to specify a base URI, which typically is the URL of the server hosting the thesaurus, e.g. http://www.example-thesaurus.com. Now all the concepts can point to http://www.example-thesaurus.com#ExampleConceptScheme and are available as http://www.example-thesaurus.com#ExampleConcept. The problem with such URIs arises when more than one vocabulary is combined: the term “beach”, for example, might appear in more than one vocabulary, and it cannot be stored as http://www.example-thesaurus.com#Beach multiple times. An additional term is therefore needed to identify the vocabulary the term originates from. The term is then available under http://www.example-thesaurus.com#Beach_LHM, for example.
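The URI scheme described above can be sketched in a few lines. This is a Python illustration and not taken from JSKOSify (which is written in Java); the function name mint_uri, the underscore used to join term and vocabulary name, and the percent-encoding of special characters are assumptions made for this example.

```python
import re
from urllib.parse import quote

def mint_uri(base_uri, term, vocabulary=None):
    """Build a fragment URI for a term, optionally qualified by its vocabulary."""
    # Collapse whitespace to underscores so the local name contains no spaces.
    local = re.sub(r"\s+", "_", term.strip())
    if vocabulary:
        # Assumption: the vocabulary name is appended with an underscore.
        local = f"{local}_{vocabulary}"
    # Percent-encode everything that is not allowed unescaped (e.g. umlauts).
    return f"{base_uri.rstrip('/')}#{quote(local, safe='')}"

base = "http://www.example-thesaurus.com"
print(mint_uri(base, "ExampleConceptScheme"))
print(mint_uri(base, "Beach", "LHM"))
print(mint_uri(base, "Beach", "NOKIS"))
```

With the vocabulary suffix, the same term from two word lists yields two distinct URIs, so both concepts can be stored side by side and linked to each other later.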

After the import of the vocabulary and the setup of the SKOS document, the user is presented with the column headings and selects the suitable SKOS property and its language for each column. Additionally, the user may select the hierarchy level of a column for vocabularies like LHM, depicted in figure 5.30b on page 136.
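One possible way to hold the user's choices per column is a small mapping structure. The following Python sketch is an assumption for illustration: the class ColumnMapping, its field names and the example column headings are hypothetical and are not taken from JSKOSify or from the actual word lists.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ColumnMapping:
    """What one CSV column becomes in the SKOS document."""
    skos_property: str                      # e.g. "prefLabel", "altLabel", "definition"
    language: Optional[str] = None          # language tag for literals, e.g. "de"
    hierarchy_level: Optional[int] = None   # 1 = top level; None = not hierarchical

# Hypothetical headings; the real word lists use their own column headings.
mapping = {
    "Ebene 1": ColumnMapping("prefLabel", "de", hierarchy_level=1),
    "Ebene 2": ColumnMapping("prefLabel", "de", hierarchy_level=2),
    "Definition": ColumnMapping("definition", "de"),
}

def unmapped_columns(headers, mapping):
    """Return the headings the user has not assigned to a SKOS property yet."""
    return [h for h in headers if h not in mapping]

print(unmapped_columns(["Ebene 1", "Ebene 2", "Definition", "Quelle"], mapping))
```

Keeping the mapping separate from the data means the word list itself never has to be edited, which matches requirement (1) of the concept.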

Afterwards the vocabularies are read line by line, which fills the SKOS document with concepts (each line is a concept). Relations between concepts of the same vocabulary and/or other vocabularies can be incorporated after or within the preceding step. The final step is saving the document to disk in a semantic format like RDF/XML, Turtle or N-Triples; note that iQvoc only accepts imports in N-Triples format.
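Reading a word list line by line and emitting N-Triples can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the JSKOSify implementation: the function names, the use of the row number as the concept's local name, and the sample headings are all hypothetical, and relations between concepts are left out.

```python
import csv
import io

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def literal(text, lang):
    """Serialize a language-tagged literal with N-Triples string escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'

def to_ntriples(csv_text, base_uri, scheme, columns):
    """Turn each CSV row into one skos:Concept.

    columns maps a column heading to a (SKOS property, language) pair.
    """
    scheme_uri = f"{base_uri}#{scheme}"
    lines = [f"<{scheme_uri}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        # Assumption: the row number serves as local name of the concept.
        concept = f"{base_uri}#concept{number}"
        lines.append(f"<{concept}> <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"<{concept}> <{SKOS}inScheme> <{scheme_uri}> .")
        for heading, (prop, lang) in columns.items():
            value = (row.get(heading) or "").strip()
            if value:  # empty cells produce no triple
                lines.append(f"<{concept}> <{SKOS}{prop}> {literal(value, lang)} .")
    return "\n".join(lines) + "\n"

sample = "Begriff,Term\nStrand,Beach\nWatt,\n"
output = to_ntriples(
    sample,
    "http://www.example-thesaurus.com",
    "ExampleConceptScheme",
    {"Begriff": ("prefLabel", "de"), "Term": ("prefLabel", "en")},
)
print(output)
```

N-Triples is convenient as an output format here because every statement is one self-contained line, so the document can be written incrementally while the rows are read.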

5.7.3 Implementation (JSKOSify)

Based on the concept from the preceding section, six steps evolved which need to be implemented:

1) Import vocabulary

2) Set up document

3) User specifies what SKOS properties the columns will be

4) Fill document

5) Relationships (hierarchies, matches in other thesauri etc.)

6) Save document

These steps are reflected in the overview of the implementation depicted in figure 5.31, which also shows the division into two main classes. One class (JSKOSifyImpl) implements steps 1, 2 and 4 to 6; the other class (JSKOSifyGUI) uses the functions of JSKOSifyImpl and provides the user with a graphical user interface (GUI) to carry out step 3 [31]. The three green helper classes implement parts of step 5. The next sections detail every step; the whole code is available on GitHub [32].

Figure 5.31: JSKOSify overview (classes and functions). The diagram shows the classes JSKOSifyImpl and JSKOSifyGUI, the helper classes GemetFinder, UmthesFinder and AgrovocFinder, the functions setupDocument(), fillDocument(), addRelatedTerms(), addSemantics(), saveRDF(), getHeaders(), findConceptWithURI(), initialize(), buildConcept(), buildElem(), saveFile() and updateTable(), and the GUI elements Import CSV and startButton. The legend distinguishes function, button/menu item, class, function call and membership.

[31] See listing A.13 on page 159 for exemplary GUI class methods.

[32] https://github.com/Sicky/JSKOSify

1) Importing vocabularies

Although the existing vocabularies are in Excel format, they were converted to CSV (comma-separated values) format because no Excel-specific functionalities were used. Furthermore, Excel is a proprietary format, and there are free libraries which can work with CSV. The free and open source Java library CSVReader [33] was chosen; it imports CSV files row by row based on the column headings. This suits the implementation approach, as all word lists have column headings (without them the lists would be hard to interpret).

The GUI of JSKOSify calls a function called getHeaders() when importing a CSV file. This function uses CSVReader to obtain the data for the table presented to the user, which is constructed by the function updateTable() and lets the user assign the SKOS properties to the columns (see figure 5.33 on page 140). CSVReader is also used in the function fillDocument(), which will be detailed later.
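The idea behind reading the headings first can be illustrated with Python's standard csv module. This is only a sketch of the technique: it does not reproduce the API of the Java CSVReader library or the actual getHeaders() code, and the names get_headers and preview_rows as well as the sample data are assumptions.

```python
import csv
import io
from itertools import islice

def get_headers(csv_file, delimiter=","):
    """Return the column headings from the first row of an open CSV file."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    return next(reader, [])  # an empty file yields no headings

def preview_rows(csv_file, limit=5, delimiter=","):
    """Return up to `limit` data rows as dicts keyed by column heading."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    return list(islice(reader, limit))

# Hypothetical word list with two columns.
sample = "Begriff,Definition\nStrand,Ufer aus Sand\nWatt,Gezeitenzone\n"

print(get_headers(io.StringIO(sample)))
print(preview_rows(io.StringIO(sample), limit=1))
```

The headings returned by the first function are what a GUI table would offer to the user for assignment; the preview rows help the user recognise what each column contains.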