MM2RDF: A Framework for Extraction of Semantic Multimedia-Metadata

Daniel Gerber daniel.gerber@me.com

A person's daily routine is hard to imagine without multimedia. It accompanies us as we leave the front door in the morning and turn on the MP3 player, buy a tram ticket on a smartphone, check for new mail on a netbook in the park, or simply watch a television series after a hard day's work. We not only consume and generate an extensive amount of multimedia but also hoard everything we can, thanks to the growing storage capacity of new end-user devices. This leads to an oversupply of multimedia and to information fragmentation on users' desktops, which in turn makes multimedia content harder to retrieve.

In this paper we present a framework that addresses this problem by extracting metadata from files and transforming it into a semantically enriched, machine-processable format. The framework is part of a larger effort that is discussed in detail in the master's thesis "Development and use of strategies for semantic multimedia management" [Ger10].

As defined in [BBJ06], multimedia is in general "the use of different technical media, presentation forms and sensory channels for communication". In particular, these presentation forms may be text, pictures, audio, video and animation, and they form the input of the framework. The framework itself is lightweight, written in PHP, and easily extensible. It uses several libraries, e.g. Erfurt¹ and Zend² for backend integration, and a set of Linux command-line applications for specialized multimedia operations such as image resizing or video editing. MM2RDF is also highly extensible in terms of the vocabulary used to generate the semantic metadata. As the Semantic Web already provides a number of widely used vocabularies that are suitable for multimedia annotation, there is no need to create and publish another one. A complete discussion of all concepts and properties used by the framework is beyond the scope of this paper; for the sake of completeness, the vocabularies are listed here. The NEPOMUK File Ontology³ and Contact Ontology⁴ as well as the Dublin Core Metadata Element Set⁵ are used at a very basic, not yet multimedia-specific level to type files as file resources and to store basic properties such as the file's name. For musical information the Music Ontology⁶ is used. Pictures are annotated using the EXIF RDF Schema⁷. The evaluation of video-related vocabularies has not yet been completed. Moreover, Adobe's PDF vocabulary XMP⁸ is used to describe PDF documents, and the FOAF vocabulary is used to describe authors and producers.

¹ http://www.aksw.org/Projects/Erfurt

² http://framework.zend.com/

³ http://www.semanticdesktop.org/ontologies/nfo/

⁴ http://www.semanticdesktop.org/ontologies/nco/

⁵ http://dublincore.org/documents/dces/

⁶ http://musicontology.com/
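The vocabulary selection described above can be pictured as a small lookup table. The following is a minimal sketch in Python (MM2RDF itself is written in PHP), not the framework's actual configuration: the names `NAMESPACES`, `BASE_PREFIXES`, `BY_MIME_PREFIX`, `vocabularies_for` and `expand` are hypothetical, and treating FOAF as part of the base set is an assumption. The namespace URIs are the ones commonly published for these vocabularies.

```python
# Hypothetical lookup table; not taken from the MM2RDF source code.
NAMESPACES = {
    "nfo": "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#",
    "nco": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "mo": "http://purl.org/ontology/mo/",
    "exif": "http://www.w3.org/2003/12/exif/ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Vocabularies applied to every file (FOAF placement here is an assumption).
BASE_PREFIXES = ("nfo", "nco", "dc", "foaf")

# Additional vocabularies keyed by MIME type prefix. Video is left out
# because the paper states that this evaluation is not yet complete.
BY_MIME_PREFIX = {
    "audio/": ("mo",),
    "image/": ("exif",),
    "application/pdf": ("xmp",),
}


def vocabularies_for(mime_type):
    """Return the vocabulary prefixes that apply to a MIME type."""
    prefixes = list(BASE_PREFIXES)
    for pattern, extra in BY_MIME_PREFIX.items():
        if mime_type.startswith(pattern):
            prefixes.extend(extra)
    return prefixes


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' to a full URI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local
```

Keeping the mapping in data rather than in code reflects the point made above: supporting a further vocabulary means adding an entry, not publishing a new ontology.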

Functional principle. MM2RDF has two usage scenarios. In the first, a single file is the framework's input. This file may be local or on a remote server. If it is remote, its size is determined and, if the size is below a threshold, the file is imported into the local filesystem. The metadata generation process then starts by initializing the backend (Virtuoso and ZendDB are currently supported) and generating common filesystem information such as filename, parent folder, file permissions and file hash. Next, the file's MIME type is determined and checked for existence with a RESTful request at http://mediatypes.appspot.com/. The URIs derived in this way are used to instantiate a MIME-type-specific metadata generator, which, for example, reads the EXIF information stored in the JPEG header and generates RDF metadata in the form shown in the RDF PHP Specification⁹. In the second scenario, MM2RDF is used in a directory context to generate a semantic model of a local filesystem with its underlying folder hierarchy. This idea has been shown before in [SH09] and [SP10], but those approaches are built on Java and do not focus primarily on multimedia metadata.
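The filesystem step of the single-file scenario can be sketched as follows. The example is in Python for brevity (MM2RDF itself is written in PHP) and is not the framework's code: the function name `describe_file`, the use of SHA-1, the blank-node label, and the particular NFO and Dublin Core properties are assumptions made for illustration. The returned value mirrors the nested layout of the RDF PHP Specification, in which each subject maps to its predicates and each predicate maps to a list of objects carrying a `type` and a `value`.

```python
import hashlib
import mimetypes
import os
from pathlib import Path

NFO = "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#"
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Wrap a value as a literal object."""
    return {"type": "literal", "value": str(value)}


def uri(value):
    """Wrap a value as a URI object."""
    return {"type": "uri", "value": value}


def describe_file(path):
    """Return basic file metadata as subject -> predicate -> [objects]."""
    resolved = Path(path).resolve()
    subject = resolved.as_uri()
    # Reads the whole file at once; acceptable for a sketch only.
    digest = hashlib.sha1(resolved.read_bytes()).hexdigest()
    # Guess the MIME type from the file extension (assumption: the real
    # framework may inspect the file content instead).
    mime_type, _ = mimetypes.guess_type(resolved.name)
    hash_node = "_:hash1"  # hypothetical blank-node label
    return {
        subject: {
            RDF_TYPE: [uri(NFO + "FileDataObject")],
            NFO + "fileName": [literal(resolved.name)],
            NFO + "fileSize": [literal(os.path.getsize(resolved))],
            NFO + "hasHash": [{"type": "bnode", "value": hash_node}],
            DC + "format": [literal(mime_type or "application/octet-stream")],
        },
        hash_node: {
            NFO + "hashAlgorithm": [literal("SHA1")],
            NFO + "hashValue": [literal(digest)],
        },
    }
```

A MIME-type-specific generator would then add further predicates, for example EXIF properties for a JPEG file, under the same subject key.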

At present, MM2RDF provides twelve MIME type extensions for the most commonly used file types (e.g. application/pdf, video/x-msvideo, audio/mp4 or image/jpeg). The extension mechanism is straightforward and follows only a few simple naming conventions. In addition, an evaluation showed that these extensions run within an acceptable time frame, from about 1.5 s for processing images up to 6 s for processing videos. An extensive description of this mechanism and of the evaluation can be found in [Ger10].
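One way such a convention-based extension mechanism could work is sketched below in Python (MM2RDF itself is written in PHP). The paper does not spell out the actual naming conventions, so everything here is an assumption: the rule that `image/jpeg` maps to a class named `ImageJpegGenerator`, the base class `Generator`, and the helpers `class_name_for` and `generator_for` are all hypothetical.

```python
import re


class Generator:
    """Hypothetical base class for MIME-type-specific generators."""

    def generate(self, path):
        """Return extracted metadata for the file; empty by default."""
        return {}


class ImageJpegGenerator(Generator):
    """Would read EXIF data from a JPEG file."""


class ApplicationPdfGenerator(Generator):
    """Would read XMP data from a PDF file."""


def class_name_for(mime_type):
    """Derive a class name from a MIME type (assumed convention)."""
    parts = re.split(r"[^A-Za-z0-9]+", mime_type)
    return "".join(part.capitalize() for part in parts if part) + "Generator"


def generator_for(mime_type):
    """Instantiate the generator whose class name matches the MIME type."""
    available = {cls.__name__: cls for cls in Generator.__subclasses__()}
    name = class_name_for(mime_type)
    if name not in available:
        raise LookupError("no extension for MIME type " + mime_type)
    return available[name]()
```

Under this assumed convention, adding support for a new file type only requires defining one more subclass with the matching name; no central registry has to be edited.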

References

[BBJ06] Günter Bentele, Hans-Bernd Brosius, and Otfried Jarren. Lexikon Kommunikations- und Medienwissenschaften, 2006.

[Ger10] Daniel Gerber. Development and use of strategies for semantic multimedia management, 2010.

[SH09] Bernhard Schandl and Bernhard Haslhofer. The Sile Model — A Semantic File System Infrastructure for the Desktop, 2009.

[SP10] Bernhard Schandl and Niko Popitsch. Lifting File Systems into the Linked Data Cloud with TripFS, 2010.

⁷ http://www.w3.org/2003/12/exif/

⁸ http://www.adobe.com/products/xmp/

⁹ http://n2.talis.com/wiki/RDF_PHP_Specification
