
DIE ÖSTERREICHISCHE BIBLIOTHEKENVERBUND UND SERVICE GMBH

CATALOGUE ENRICHMENT WITH SUBJECT INDEXING ELEMENTS (KATALOGANREICHERUNG MIT SACHERSCHLIESSUNGSELEMENTEN)

IMPLEMENTING THE DATA TRANSFER FROM THE GBV CATALOGUE

VICTOR BABITCHEV

VERBUNDTAG, SALZBURG, 9 MAY 2012


Contents

Background

Requirements and prototype

Prototype results and further requirements

Towards a production solution

The production results

Verification of results by experts

Current status of ACC01 update

Summary

Next steps

Catalogue enrichment with subject indexing elements (Kataloganreicherung mit Sacherschließungselementen)

Background

• OBVSG has built up solid experience with projects enriching the catalogue with electronic objects (CE-EO)

• 2010: the Verbundvollversammlung requested catalogue enrichment with Sacherschließungselemente (subject indexing elements, CE-SE)

→ to improve searches, facets and the presentation of catalogue data

• OBVSG began evaluation work in September 2011


CE-SE requirements

• Only ACC01 records that do not yet contain the enriching category are enriched

Exception: 700f with SFz "Automatisch generiert aus Konkordanz...."

• Each added SE category receives a subfield z (SFz) with information about:

• source of data

• source export date

• date of adding to ACC01

Sample: $$zAutomatisch aus GBV_2011-10 2012-04-22

• Prioritisation of the data and their sources

• GBV: Basisklassifikation (700f) and LoC SH (740#)

• BVB: RVK (700g) and LoC SH

• DNB: DDC (700b), RSWK 902ff (except ACC02)

• hbz, SWB, Hebis ...
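The first two requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record is modelled as a plain dictionary from category tag to field contents, the function names are hypothetical, and the reading of the 700f exception (a concordance-generated 700f does not block enrichment) is an interpretation of the slide, not a documented rule of the OBVSG tools.

```python
from datetime import date

# Marker text taken from the exception on the slide.
KONKORDANZ_MARKER = "Automatisch generiert aus Konkordanz"


def provenance_subfield(source: str, export: str, added: date) -> str:
    """Build the $$z note: data source, source export date, date of adding to ACC01."""
    return f"$$zAutomatisch aus {source}_{export} {added.isoformat()}"


def may_enrich(record: dict[str, list[str]], category: str) -> bool:
    """Return True if the record may receive the given category (hypothetical helper).

    `record` maps a category tag such as '700f' to the list of its field contents.
    """
    existing = record.get(category, [])
    if not existing:
        return True
    # Assumed reading of the exception: 700f fields that were generated
    # from a concordance do not block enrichment.
    if category == "700f":
        return all(KONKORDANZ_MARKER in field for field in existing)
    return False


note = provenance_subfield("GBV", "2011-10", date(2012, 4, 22))
```

With the sample values from the slide, `note` reproduces the sample subfield `$$zAutomatisch aus GBV_2011-10 2012-04-22`.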


Development of the prototype

• OBVSG decided to reuse the match algorithm of CE-EO (ISBN-based, “strong” match rules)

→ with the goal of having first results as early as October 2011

• The necessary CE-specific components were to be developed outside the stable CE-EO framework

• Sample data available from German consortia (GBV, BVB and DNB) were to be used for work on the prototype


Title-Matching

[Diagram: processing workflow of the CE-EO / CE-SE engines. Labelled elements: GBV MAB2; BVB, hbz ...; BK (ACC19); RVK (from Regensburg); ACC01 (Aleph aseq, ISBN titles); GBV (Aleph aseq, ISBN titles); new CE-SE components; Configurations; Transformations; ACC01 match results; ACC01 update file (GBV; BVB, hbz ...); update & replicate.]


The results of the prototype and further requirements

• The results obtained in October were promising and were presented at:

• Bibliothekartag in Innsbruck, 19.10.2011 / W. Hamedinger

• Treffen 'Kataloganreicherung im Verbund' (meeting on catalogue enrichment in the consortium), UB TUW, 22.11.2011

• Treffen der Systembibliothekarinnen (meeting of system librarians), Uni Wien, 16.11.2011

• SE experts from ÖNB and OBVSG prepared further requirements

→ the match algorithm should be adapted to SE specifics

→ productive enrichment should be carried out

• In October 2011 OBVSG received the first complete catalogue of 35 million records from GBV


Towards a production solution


Extended requirements for the matching algorithm

• The new algorithm should drop those CE-EO match fields that “block” enrichment across a “family of SE-equivalent titles”

• → from the strong “1:1” match rules using:

ISBN, JAHR and

TYPE, AUFLAGE, UMFANG, BAND, VERLAG, TITLE, EXIND

where one ACC01 title matches one foreign title

• → to a weak “N:M” match using a reduced set of matching fields:

ISBN and TYPE, TITEL, BAND

so that “N” ACC01 titles can match “M” foreign titles

This opens the way to matching titles that differ in year, edition, pagination etc.
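The difference between the two rule sets can be illustrated with a small Python sketch. It is not the CE-SE implementation: records are plain dictionaries, `match_key` and `find_matches` are hypothetical helpers, the sample records are invented, and any normalisation of ISBNs or titles is left out.

```python
from collections import defaultdict

# Field names follow the slide. The slide writes TITLE in one list and TITEL in
# the other; this sketch assumes both denote the same title field.
STRONG_FIELDS = ("ISBN", "JAHR", "TYPE", "AUFLAGE", "UMFANG", "BAND", "VERLAG", "TITLE", "EXIND")
WEAK_FIELDS = ("ISBN", "TYPE", "TITLE", "BAND")


def match_key(record, fields):
    """Concatenate the selected fields into a comparable key (missing fields count as empty)."""
    return tuple(record.get(name, "") for name in fields)


def find_matches(acc01_records, foreign_records, fields):
    """Map each ACC01 ID to the IDs of all foreign records sharing its match key."""
    index = defaultdict(list)
    for rec in foreign_records:
        index[match_key(rec, fields)].append(rec["ID"])
    matches = {}
    for rec in acc01_records:
        hits = index.get(match_key(rec, fields))
        if hits:
            matches[rec["ID"]] = list(hits)
    return matches


# Invented sample: the same title in different publication years.
base = {"ISBN": "0000000000", "TYPE": "M", "TITLE": "beispieltitel", "BAND": ""}
acc01 = [dict(base, ID="AC-1", JAHR="1991"), dict(base, ID="AC-2", JAHR="1989")]
foreign = [dict(base, ID="GBV-1", JAHR="1986"), dict(base, ID="GBV-2", JAHR="1991")]

strong = find_matches(acc01, foreign, STRONG_FIELDS)
weak = find_matches(acc01, foreign, WEAK_FIELDS)
```

In this toy data the strong key pairs only the two 1991 records, while the weak key lets both ACC01 titles match both foreign titles, which is the N:M behaviour the requirement asks for.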


Implementation of the new CE-SE framework

• A considerable part of the new requirements was implemented using the configuration options of the existing CE-EO framework

• Further requirements were implemented as separate program components

• Considerable effort was required to prepare the GBV data according to the standards of Aleph systems

• (precomposed UTF-8, data filtering, Aleph MAB2 fixes…)

• Intensive work took place in January, February and March 2012, motivated by the aim of reusing the new tool with data from other MAB2 sources
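The UTF-8 preparation step mentioned above presumably refers to converting decomposed characters (base letter plus combining mark) into their precomposed form. A minimal sketch with Python's standard library, assuming that Unicode normalisation form NFC is what is meant; the function name is hypothetical:

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Convert decomposed sequences (e.g. 'a' + combining diaeresis) to precomposed characters."""
    return unicodedata.normalize("NFC", text)


# 'Brandstätter' written with a combining diaeresis (U+0308) after the 'a'.
decomposed = "Brandsta\u0308tter"
precomposed = to_precomposed(decomposed)
```

After normalisation the name contains the single character U+00E4 instead of two code points, so string comparisons against precomposed catalogue data behave as expected.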


... and what have we achieved?


Matching results – summary

The GBV enrichment categories used:

• BK (700f)

• LoC SH (740#)

Category   In ACC01 before enrichment   Added by enrichment   Total ACC01 records after update
BK         0.225 million                1.03 million          1.26 million, or 14% of ACC01
LoC SH     0.184 million                0.586 million         0.77 million, or 8.8% of ACC01


Matching results – details

Title matches

(A) GBV, all records: ~35 million

(B) GBV records with ISBN + 700f or 740# (match base): 3.45 million

(C) ACC01, all records: 8.8 million

(D) ACC01 records with ISBN (match base): 2.8 million

(E) Bibliographic matches: 1.35 million

Match % (E/B): 48.08% (prototype: 32.70%*)

1.26 million (93.3%) of the 1.35 million matches will be enriched:

• with BK: 1.03 million (81.7%)

• with LoC SH: 0.586 million (46.5%)

* 32.70% was achieved using the CE-EO algorithm of the prototype, Oct. 2011
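The percentages on this slide can be re-derived from the rounded figures. The following Python sketch only uses the numbers shown above (in millions); the exact counts are not given in the slides, so small deviations are expected.

```python
def pct(part: float, whole: float) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Rounded figures from the slide, in millions.
gbv_match_base = 3.45     # (B)
acc01_match_base = 2.8    # (D)
matches = 1.35            # (E)
enriched = 1.26
with_bk = 1.03
with_loc_sh = 0.586

share_enriched = pct(enriched, matches)        # 93.3
share_bk = pct(with_bk, enriched)              # 81.7
share_loc_sh = pct(with_loc_sh, enriched)      # 46.5
matches_vs_acc01_base = pct(matches, acc01_match_base)  # 48.2
matches_vs_gbv_base = pct(matches, gbv_match_base)      # 39.1
```

The BK and LoC SH shares are evidently relative to the 1.26 million enriched records. With these rounded inputs, the reported match rate of 48.08% is approximated by dividing the matches by the ACC01 match base (D), whereas dividing by the GBV match base (B) gives about 39.1%; which denominator the slide actually intends cannot be decided from the rounded figures alone.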


How many multiple matches did the new algorithm deliver?

… this share is about 16.17%, or 0.22 million, of all matches!

Match factor   ACC01 titles   Percent
1              1130843        83.83%
2              130140         9.65%
3              38772          2.87%
4              18420          1.37%
5              10280          0.76%
6              6174           0.46%
7              4095           0.30%
8              2880           0.21%
9              1935           0.14%
10             1500           0.11%
11             1078           0.08%
. . .          . . .          . . .
3              27             0.01%
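The 16.17% quoted above is the complement of the 83.83% of titles with match factor 1. A distribution of this kind could be computed as in the following Python sketch, assuming that the match factor is the number of foreign records matched by one ACC01 title and that the match results are available as a dictionary; both function names and the toy data are hypothetical.

```python
from collections import Counter


def match_factor_distribution(matches: dict[str, list[str]]) -> dict[int, tuple[int, float]]:
    """Return {match factor: (number of ACC01 titles, percent of all matched titles)}."""
    counts = Counter(len(ids) for ids in matches.values())
    total = sum(counts.values())
    return {factor: (n, round(100 * n / total, 2)) for factor, n in sorted(counts.items())}


def multiple_match_share(matches: dict[str, list[str]]) -> float:
    """Percent of matched ACC01 titles that matched more than one foreign record."""
    multiple = sum(1 for ids in matches.values() if len(ids) > 1)
    return round(100 * multiple / len(matches), 2)


# Invented toy data: ACC01 ID -> matched foreign IDs.
toy = {
    "AC-1": ["GBV-1"],
    "AC-2": ["GBV-1", "GBV-2"],
    "AC-3": ["GBV-3"],
    "AC-4": ["GBV-4", "GBV-5", "GBV-6"],
}
distribution = match_factor_distribution(toy)
share = multiple_match_share(toy)
```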


... and are the results correct?


Tests: verification of results by ÖNB and OBVSG experts

The challenges

• Large volume of data (over 1.2 million)

• Complex “M:N” relations between ACC01 records and the enriching GBV records

• Convenient test tools, as simple as possible, had to be implemented for the experts

a representative data set (~10% of the entire data) was to be produced

The results

The new data model (a multi-dimensional hash) contained all the necessary data, which helped a lot in implementing the test tools

On 2012-04-05 the experts gave their “OK” to start the ACC01 update!
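A multi-dimensional hash of the kind mentioned above, and the drawing of a review sample from it, might look like the following Python sketch. The nesting order (ACC01 ID, then GBV ID, then category), the function names and the sample data are assumptions made for illustration; the slides do not describe the actual structure.

```python
import random
from collections import defaultdict


def build_model(rows):
    """Build a nested mapping: ACC01 ID -> GBV ID -> category -> list of values.

    `rows` is an iterable of (acc01_id, gbv_id, category, value) tuples.
    """
    model = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for acc_id, gbv_id, category, value in rows:
        model[acc_id][gbv_id][category].append(value)
    return model


def sample_for_review(model, fraction=0.10, seed=0):
    """Pick a reproducible random subset of ACC01 IDs (about `fraction` of all)."""
    ids = sorted(model)
    k = max(1, round(len(ids) * fraction))
    return random.Random(seed).sample(ids, k)


# Invented rows: 20 ACC01 records, each enriched from one GBV record.
rows = [(f"AC-{i}", f"GBV-{i}", "700f", "00.00") for i in range(20)]
rows.append(("AC-0", "GBV-99", "740u", "Example heading"))
model = build_model(rows)
review_ids = sample_for_review(model)
```

Because every enriching value is stored under both its target and its source record, a test tool can show for any sampled ACC01 record exactly which foreign records contributed which categories.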

Test data sample

“Kübler-Ross, Elisabeth: Über den Tod und das Leben danach …“ in various editions and publication years.

• CE-SE data: 700f (3 occurrences): 17.97, 18.10 and 31.01 (plus verbal descriptions)

• This data came from 8 GBV records that enriched 9 ACC01 records!

Special program tools were developed to display records in 3 windows to check exactly each source and ...

Match #   GBV ID / Publ. year                          AC number / Publ. year
1         015770958 / 1986 (the primary match key)     AC00106061 / 1991
2         181342952 / 1994                             AC00309295 / 1989
3         238014142 / 1996                             AC00529076 / 1992
4         251879100 / 1998                             AC00862564 / 1994
5         363855645 / 1986                             AC01632748 / 1990
6         468646817 / 2004                             AC03407351 / 1985
7         481474749 / 1989                             AC04255535 / 2004
8         483176036 / 2004                             AC04632370 / 1988
9         na                                           AC08586966 / 1987


... and where are the results?


Sample of enriched record in ACC01

FMT L MH
LDR L 00628nM2.01000024---h
001 L $$aAC00331282
...
100b L $$aBrandstätter, Christian$$9115858806$$b[Hrsg.]
331 L $$a<<Das>> ist Österreich
335 L $$aein ganzes Land in Bildern ; Landschaft, Kultur, Geschichte
410 L $$aWien [u.a.]
412 L $$aBrandstätter
425a L $$a1985
...
540a L $$a3-85447-156-4
...
700f L $$a15.60$$bSchweiz, Österreich-Ungarn, Österreich <Geschichte>$$zAutomatisch aus GBV_2011-10 2012-04-18
700f L $$a74.19$$bEuropa <Geographie>$$zAutomatisch aus GBV_2011-10 2012-04-18
700g L $$aRK 60013
700g L $$aNR 8249
740u L $$aAustria$$zAutomatisch aus GBV_2011-10 2012-04-18
740u L $$aPictorial$$aworks$$zAutomatisch aus GBV_2011-10 2012-04-18
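The automatically added fields in such a record can be recognised by their $$z note. The Python sketch below parses field lines in the simplified "tag L $$a..." layout shown on the slide; it does not handle the full Aleph sequential format (system numbers, fixed column positions), and both helper names are hypothetical.

```python
def parse_field(line: str):
    """Split a line of the form 'TAG L $$a...$$b...' into (tag, [(code, value), ...])."""
    tag, _, content = line.split(" ", 2)
    subfields = [(part[0], part[1:]) for part in content.split("$$") if part]
    return tag, subfields


def enriched_tags(lines):
    """Return the tags of all fields whose subfield z marks them as added automatically."""
    tags = []
    for line in lines:
        tag, subfields = parse_field(line)
        if any(code == "z" and value.startswith("Automatisch aus ") for code, value in subfields):
            tags.append(tag)
    return tags


# Three field lines taken from the sample record above.
sample = [
    "700f L $$a15.60$$bSchweiz, Österreich-Ungarn, Österreich <Geschichte>$$zAutomatisch aus GBV_2011-10 2012-04-18",
    "700g L $$aNR 8249",
    "740u L $$aAustria$$zAutomatisch aus GBV_2011-10 2012-04-18",
]
tags = enriched_tags(sample)
```

For these three lines the sketch reports the 700f and 740u fields as enriched and leaves the pre-existing 700g (RVK) field alone.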


The results in ACC01

• The first results are already visible in our catalogues and in Primo; the rest may take a few more weeks!

• The number of ACC01 updates is over 1.26 million

→ OBVSG can process about 20,000 records per evening (other data updates and projects must be taken into account)

• As of 1 May 2012 we had updated and replicated over 19%, or 270,000, of the ACC01 records, including:

• 183,479 enriched with BK

• 151,853 enriched with LoC SH
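The remaining duration of the update can be estimated from these figures. The Python sketch below assumes that the stated capacity of about 20,000 records is reached every evening, which the slide itself qualifies, so the result is a lower bound rather than a schedule; the function name is hypothetical.

```python
import math


def evenings_needed(total: int, done: int, per_evening: int) -> int:
    """Number of evening runs needed for the remaining records at a constant rate."""
    return math.ceil((total - done) / per_evening)


remaining_evenings = evenings_needed(total=1_260_000, done=270_000, per_evening=20_000)
```

With the rounded figures from the slide this gives 50 evening runs for the remaining 990,000 records.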


Summary

• The procedures developed proved efficient and delivered the expected large number of matches

• The savings from bringing such a large amount of data into catalogues (and search engines) are self-evident: producing such results by intellectual indexing would perhaps take years of work

• In view of the forthcoming switch of the German consortia to the MARC 21 format, it is important to apply the MAB2-based matching tools to the data of other consortia before this change happens

→ OBVSG should organise the receipt of data from other consortia and institutions (BVB, DNB etc.)


Next steps

• Complete update of ACC01 (May-June)

• Evaluate expected improvements in searches (Primo, Aleph OPAC)

• Consider further enrichment projects

→ for instance, the number of ACC01 titles enriched with RVK from the BVB catalogue (20 million titles) may reach 1 million records


Thank you for your attention!

victor.babitchev@obvsg.at

www.obvsg.at
