
requirements in terms of data coverage, we can ask which parts of the data infrastructure (semantic layer and data layer) already exist, and identify the implementation gaps that remain to be addressed:

2.5.1 Data Model and Ontologies

Data models or ontologies are part of the semantic infrastructure of a linked open data ecosystem. Ontologies exist in various forms, which differ in their complexity. Ontologies of lower degrees of complexity are sometimes named catalogs, glossaries, thesauri or taxonomies and are generally referred to as “controlled vocabularies”. If different data sets are described using the same ontologies, they are interoperable at the semantic level.
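As a minimal illustration of semantic interoperability, the sketch below tags productions from two independent data sets with terms from one shared controlled vocabulary, so that they can be grouped without any mapping step. All identifiers (`genre:opera`, `prod:a1`, etc.) are hypothetical and are not taken from any real vocabulary or database.

```python
# Hypothetical controlled vocabulary: term identifier -> preferred label.
GENRES = {"genre:opera": "opera", "genre:ballet": "ballet"}

# Two independent, made-up data sets that tag productions with the shared terms.
dataset_a = [("prod:a1", "genre:opera"), ("prod:a2", "genre:ballet")]
dataset_b = [("prod:b7", "genre:opera")]


def productions_by_genre(*datasets):
    """Group production identifiers from any number of data sets by genre term."""
    grouped = {}
    for dataset in datasets:
        for production, genre in dataset:
            # Reject terms that are not part of the shared vocabulary.
            if genre not in GENRES:
                raise ValueError(f"unknown genre term: {genre}")
            grouped.setdefault(genre, []).append(production)
    return grouped


merged = productions_by_genre(dataset_a, dataset_b)
```

Because both data sets refer to the same term identifiers, the grouping works across them directly; with diverging vocabularies, a mapping between terms would be needed first.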

The Data Model for the Swiss Performing Arts Platform (Estermann & Schneeberger 2017) covers all the aspects of the common core of the international linked open data ecosystem for the performing arts as well as most of the needs of the use scenarios related to heritage, research & education, online consumption, as well as coverage and re-use. Large parts of this conceptual model have already been implemented on Wikidata and are available as a formal ontology within Wikidata9. Its documentation requires improvement, and there is a series of open data modelling issues that need to be resolved in order to harmonize practices. Implementation of the conceptual model in classical RDF, mostly based on existing RDF ontologies, is currently underway at the Swiss Archives for the Performing Arts10.

The existing conceptual model does not cover some of the transactional needs along the performing arts value chain (e.g. related to the acquisition of performance rights, the hiring of artists, the engagement of production companies, the rental of venues, or the sales of tickets). The conceptual model will need to be extended accordingly. As the transactional aspects do not lend themselves to implementation on Wikidata, they will need to be implemented in the context of specialized platforms supporting online transactions between stakeholders.

_______________

9 See: https://www.wikidata.org/wiki/Wikidata:WikiProject_Performing_arts/Data_structure

10 For an overview of RDF data models and ontologies relevant to the performing arts, see the listing on the “Data structure” page of the Wikidata project: https://www.wikidata.org/wiki/Wikidata:WikiProject_Performing_arts/Data_structure. For the work in progress at SAPA, see: https://sapa.github.io/spa-specifications/

2.5.2 Typologies / Vocabularies

Typologies and vocabularies are agreed-upon sets of specific manifestations of a given characteristic. In the course of the first data ingests into Wikidata, different typologies and vocabularies related to the performing arts have been implemented, e.g. regarding performing arts genres, types of music theater performances, performance types, or voice types11. A whole set of typologies and vocabularies has also been identified in the context of the DOREMUS project12.

2.5.3 Base Registers / Authority Files

So-called “named entities” are used to uniquely identify the different instances of a class. By providing persistent identifiers for the various entities, shared registers of named entities make it possible to make statements about the same person, the same organization, the same administrative unit, etc. across different data sets. In the context of statistical offices and other public authorities, such registers are commonly referred to as “base registers”. These registers are typically expected to list all existing instances of a class within a given administrative-territorial unit and are typically maintained by a public authority that has the legal mandate to do so. Their equivalents in the library world are called “authority files” and serve, for example, to unambiguously identify persons or works in the context of a library catalog. Since linked data is meant to link data across organizational and domain boundaries, base registers and authority files are nowadays often used beyond their original domain. Insofar as different base registers or authority files describe the same instances, concordance databases are used to map equivalent entities between them. A prominent example is the Virtual International Authority File (VIAF), which links the authority files of the national libraries of various countries. Another prominent example of such a central data hub for “entities” of different classes is Wikidata (Allison-Cassin & Scott, 2018).
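The idea of a concordance can be sketched in a few lines of code. The example below is a simplified illustration, not the method used by VIAF or Wikidata: it assumes that equivalence between identifiers is given as pairwise links and clusters them with a union-find structure. The register names and identifiers (`register_a`, `A-001`, etc.) are hypothetical.

```python
from collections import defaultdict


def build_concordance(links):
    """Cluster identifiers that are declared equivalent, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    return list(clusters.values())


def equivalents(clusters, ident):
    """Return all identifiers that refer to the same entity as `ident`."""
    for cluster in clusters:
        if ident in cluster:
            return cluster - {ident}
    return set()


# Hypothetical pairwise equivalence links between three registers.
links = [
    (("register_a", "A-001"), ("register_b", "B-17")),
    (("register_b", "B-17"), ("register_c", "C-9")),
    (("register_a", "A-002"), ("register_c", "C-4")),
]
clusters = build_concordance(links)
```

Here `equivalents(clusters, ("register_a", "A-001"))` yields the matching entries in the other two registers, even though `register_a` and `register_c` were never linked directly. Treating links as transitive is a design choice: a single wrong link merges two clusters, which is one reason why curated hubs review their clustering.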

So far, several base registers and authority files relevant for the performing arts have been made available as linked open data, such as the Virtual International Authority File (VIAF) or the International Standard Name Identifier (ISNI).

The objective of VIAF is to provide a reference source for libraries, archives, and museums worldwide and to reduce cataloguing costs through the pooling of data. For this purpose, VIAF clusters authority files of national libraries and countrywide union catalogues as well as specialized databases and makes them available for anyone to use. It initially started out with persons and corporate bodies but has since expanded its scope to include works, expressions, meetings, and geographic names. Its development is primarily driven by libraries (Angjeli et al., 2014).


The objective of ISNI, on the other hand, is to facilitate research and discovery of resources and to streamline business transactions across domains. The focus is on providing unique, global, cross-domain, standard, persistent identifiers for persons and organizations involved in the production and exploitation of creative content. The development of ISNI is mainly driven by libraries, rights management societies, stakeholders of the book supply chain, aggregators and service suppliers (Angjeli et al., 2014).

Both VIAF and ISNI aggregate existing identifiers without superseding them. As VIAF is among ISNI’s sources for the aggregation of person and organization data, there is an important overlap between the two. However, while VIAF primarily serves as an automated hub for decentrally managed authority files, ISNI takes a more centralizing role in standardizing the data by correcting the clustering of data and by excluding clusters that appear sparse or undifferentiated. While ISNI is edited by its own quality team, it accepts feedback from the public through a “monitored crowdsourcing” mechanism. The ISNI database is expected to evolve into a reliable, shared authority file at a global level (Angjeli et al., 2014).

As Vrandecic and Krötzsch (2014) note, Wikidata is developing into yet another global aggregator of authority files by providing links to a variety of such resources. In contrast to VIAF and ISNI, Wikidata is much wider in scope, covering the variety of concepts that can be found in an encyclopaedia (see Table 2 for an overview of the number of entities for various classes contained in Wikidata). Furthermore, it relies on crowdsourcing and is free for anyone to edit. It is heavily interlinked with VIAF and ISNI13. A further reference source for music albums, musical works and performers is MusicBrainz, another community project relying on crowdsourcing. Further reference sources that might be of interest in the context of the performing arts are: the Internet Movie Database (IMDb) for actors, Songkick for concerts, and Discogs as an alternative to MusicBrainz for performers14.
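The degree of interlinking between Wikidata and the two aggregators can in principle be checked against the Wikidata Query Service. The sketch below only builds the SPARQL query and the request URL; it does not send the request. It assumes the Wikidata properties P214 (VIAF ID) and P213 (ISNI) and the public endpoint `https://query.wikidata.org/sparql`; the helper names `count_query` and `request_url` are introduced here for illustration. Counting over very large sets may run into the service's time limits.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata properties holding the external identifiers.
EXTERNAL_ID_PROPERTIES = {"VIAF": "P214", "ISNI": "P213"}


def count_query(property_id):
    """Build a SPARQL query counting items that carry the given identifier."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) "
        f"WHERE {{ ?item wdt:{property_id} ?id . }}"
    )


def request_url(property_id):
    """Build the GET URL for the query; sending it is left to the caller."""
    params = {"query": count_query(property_id), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


viaf_query = count_query(EXTERNAL_ID_PROPERTIES["VIAF"])
viaf_url = request_url(EXTERNAL_ID_PROPERTIES["VIAF"])
isni_query = count_query(EXTERNAL_ID_PROPERTIES["ISNI"])
```

The resulting URL can be passed to any HTTP client. The counts returned today will differ from the figures reported for spring 2019.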

______________

13 1.38M Wikidata entries are currently linked to one of the over 30M VIAF clusters, while 1.06M Wikidata entries are linked to one of over 10M ISNI identities; Wikidata itself currently has 4.96M entries for people and 1.6M entries for organizations.

14 In a specific regional or national context, further reference sources might be of interest, such as the Canadian Encyclopedia, the Dictionnaire du théâtre en Suisse, etc.

Table 2: Number of entities in Wikidata (as of spring 2019)

Entities of the class                            Wikidata15
Musical work                                        420,000
Edition of a musical work                               570
Choreographic work                                      880
Edition of a choreographic work                           4
Play (incl. opera)                                   21,000
Edition/translation of a play (incl. opera)             650
Character role (in play or opera)                    11,000
Performing arts building                             20,000

There is already a host of performing arts related content available on the Internet. Some prominent platforms to go to are YouTube, Wikimedia Commons, or Europeana. They are all in the process of improving the metadata about their content. YouTube, for example, is nowadays using ISNI identifiers to refer to artists and is increasingly providing structured data about the works to be found on the platform. Wikimedia Commons is in the process of being transferred to a new platform infrastructure that uses the same software extension as Wikidata to store structured data. In the future, metadata on Wikimedia Commons will be available as linked open data through a SPARQL endpoint, and the IIIF standard will be supported to facilitate cross-platform exchange and manipulation of digital content. Similarly, metadata from Europeana is provided as linked open data, while the access to the actual content depends on the various decentralized data providers.

While some of the digital content is out of copyright or is made available under a free copyright license (as is regularly the case for content published on Wikimedia Commons), other content has been published under a proprietary license or without a proper rights statement (as is most often the case for content published on YouTube). User requirements regarding the licensing situation vary depending on the usage scenario.

2.5.5 Data Available as Linked Open Data

Linked data publication in the area of the performing arts is still in a pilot phase. So far, only a small share of performing arts related data is available as linked open data. Examples of datasets that contain data about performing arts productions or performance events include:

- AusStage Australian Live Performance Database (more than 100,000 performance events);

- Carnegie Hall Performance History16 (approx. 50,000 performance events);

- Database of the Flanders Arts Institute (1993-2018), published on Wikidata (approx. 12,000 performing arts productions);

- Repertoire of Schauspielhaus Zürich (1938-1968), published on Wikidata (approx. 700 performing arts productions);

- Database of the DOREMUS project17;

- Database of the Swiss Archive for the Performing Arts (in the process of being published)18.

Further services, such as the Austrian performance data platform Theadok19 and the Frankfurt-based Specialized Information Service Performing Arts20, are planning to publish their data as linked data in the near future.

Based on these first pilot datasets, data modelling issues should be addressed systematically in order to harmonize data modelling practices. As could be demonstrated in the case of Wikidata, many critical data modelling issues await resolution21. Similar issues are to be expected whenever several databases are integrated and/or exploited in combination with each other.

There are important databases in adjacent areas that have a high potential for interlinking with performance histories, such as a variety of library databases (literary and musical works) or MusicBrainz (musical works, expressions of musical works, performers).

Further databases are available online, free from copyright restrictions, but have not yet been published as linked open data.


2.6 Bootstrapping the Linked Open Data Ecosystem for the Performing Arts

Today, the linked open data ecosystem for the performing