DIE ÖSTERREICHISCHE BIBLIOTHEKENVERBUND UND SERVICE GMBH
CATALOGUE ENRICHMENT WITH SUBJECT INDEXING ELEMENTS (KATALOGANREICHERUNG MIT SACHERSCHLIESSUNGSELEMENTEN)
IMPLEMENTING THE DATA TRANSFER FROM THE GBV CATALOGUE
VICTOR BABITCHEV
VERBUNDTAG, SALZBURG, 9 MAY 2012
Contents
Background
Requirements and prototype
Prototype results and further requirements
Towards a production solution
The productive results
Verification of results by experts
Current status of ACC01 update
Summary
Next steps
Kataloganreicherung mit Sacherschließungselementen
Background
• OBVSG has gained solid experience from catalogue enrichment projects with electronic objects (CE-EO)
• 2010: the Verbundvollversammlung (general assembly of the network) requested CE with Sacherschließungselemente (subject indexing elements, CE-SE) to improve searching, faceting and the presentation of catalogue data
• OBVSG began evaluation work in September 2011
CE-SE requirements
• Only ACC01 records that do not yet have the enriching category will be enriched
  Exception: 700f with SFz "Automatisch generiert aus Konkordanz...."
• Each added SE category receives a subfield z (SFz) with information about:
  • source of data
  • source export date
  • date of adding to ACC01
  Sample: $$zAutomatisch aus GBV_2011-10 2012-04-22
• Prioritization of data and its sources
  • GBV: Basisklassifikation (700f) and LoC SH (740#)
  • BVB: RVK (700g) and LoC SH
  • DNB: DDC (700b), RSWK 902ff (except ACC02)
  • hbz, SWB, Hebis ...
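The enrichment rule and the SFz provenance note above can be sketched as follows. This is a minimal illustration, not OBVSG's implementation: the record layout (a dict mapping a category to a list of subfield dicts) and all function names are assumptions, and the exception is read here as "a 700f generated from a concordance does not block enrichment". Only the SFz text layout follows the sample above.

```python
from datetime import date

# Prefix of the concordance note named in the exception above.
CONCORDANCE_PREFIX = "Automatisch generiert aus Konkordanz"


def provenance_note(source: str, export: str, added: date) -> str:
    """Build the SFz text: data source, source export date, date of adding."""
    return f"Automatisch aus {source}_{export} {added.isoformat()}"


def may_enrich(record: dict, category: str) -> bool:
    """True if the record has no blocking entry in the given category."""
    entries = record.get(category, [])
    if category == "700f":
        # Assumed reading of the exception: concordance-generated 700f
        # entries are ignored when checking for an existing category.
        entries = [e for e in entries
                   if not e.get("z", "").startswith(CONCORDANCE_PREFIX)]
    return not entries


def enrich(record: dict, category: str, values: list, note: str) -> bool:
    """Add the values with the provenance note in SFz; return False if blocked."""
    if not may_enrich(record, category):
        return False
    record.setdefault(category, []).extend({**v, "z": note} for v in values)
    return True


note = provenance_note("GBV", "2011-10", date(2012, 4, 22))
record = {"700g": [{"a": "NR 8249"}]}   # hypothetical record without 700f
enrich(record, "700f", [{"a": "15.60"}], note)
```

A second call to `enrich` for 700f on the same record returns `False`, because the category is now present and was not generated from a concordance.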
Development of prototype
• OBVSG decided to reuse the match algorithm of CE-EO (ISBN-based, “strong” match rules) in order to have first results as early as October 2011
• The necessary CE-SE-specific components should be developed outside the stable CE-EO framework
• Sample data available from German consortia (GBV, BVB and DNB) should be used for work on the prototype
Title-Matching
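[Diagram: title-matching workflow. Inputs: GBV MAB2 data, BK (ACC19) and RVK (from Regensburg); BVB, hbz ... as further sources. The ISBN titles of ACC01 and of GBV, both in Aleph aseq form, go into the CE-EO / CE-SE engines, which are complemented by new CE-SE components (configurations, transformations, processing work flow). Outputs: ACC01 match results and an ACC01 update file for GBV (BVB, hbz ...), followed by update & replicate.]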
The results of prototype and further requirements
• The results obtained in October were promising and were presented at:
  • Bibliothekartag in Innsbruck, 19.10.2011 / W. Hamedinger
  • Treffen 'Kataloganreicherung im Verbund', UB TUW, 22.11.2011
  • Treffen der Systembibliothekarinnen, Uni Wien, 16.11.2011
• SE experts from ÖNB and OBVSG prepared further requirements:
  • the match algorithm should be adapted to SE specifics
  • productive enrichment should be carried out
• In October 2011 OBVSG received the first complete catalogue of 35 million records from GBV
Towards a production solution
Extended requirements for matching algorithm
• The new algorithm should drop those CE-EO match fields that “block” enrichment within a “family of SE-equivalent titles”
• From the strong “1:1” match rules using
  ISBN, JAHR and
  TYPE, AUFLAGE, UMFANG, BAND, VERLAG, TITEL, EXIND,
  where one ACC01 title matches one foreign title
• To a weak “N:M” match using a reduced set of match fields,
  ISBN and TYPE, TITEL, BAND,
  so that N ACC01 titles can match M foreign titles
• This opens the way to matching titles that differ in year, edition, pagination etc.
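The step from strong to weak matching can be illustrated with a small sketch. It assumes that a match is simple equality of a key built from the listed fields; the record dicts, IDs and values are hypothetical, and the title field is called TITEL in both field sets. The real CE-EO / CE-SE rules are not described in detail here and are certainly more involved.

```python
from collections import defaultdict

# Field sets as listed above (title field assumed to be TITEL in both).
STRONG_FIELDS = ("ISBN", "JAHR", "TYPE", "AUFLAGE", "UMFANG",
                 "BAND", "VERLAG", "TITEL", "EXIND")
WEAK_FIELDS = ("ISBN", "TYPE", "TITEL", "BAND")


def match_key(record, fields):
    """Build a comparison key; missing fields count as empty strings."""
    return tuple(record.get(field, "") for field in fields)


def match(acc01, foreign, fields):
    """Return {acc01_id: [foreign_id, ...]} for records with equal keys."""
    index = defaultdict(list)
    for rec in foreign:
        index[match_key(rec, fields)].append(rec["id"])
    result = {}
    for rec in acc01:
        key = match_key(rec, fields)
        if key in index:
            result[rec["id"]] = list(index[key])
    return result


# Hypothetical title family: same ISBN, type, title and volume, different years.
base = {"ISBN": "0000000000", "TYPE": "book", "TITEL": "sample title", "BAND": ""}
acc01 = [{"id": "AC-1", "JAHR": "1985", **base}, {"id": "AC-2", "JAHR": "1991", **base}]
foreign = [{"id": "F-1", "JAHR": "1985", **base}, {"id": "F-2", "JAHR": "2004", **base}]

strong = match(acc01, foreign, STRONG_FIELDS)  # {"AC-1": ["F-1"]}
weak = match(acc01, foreign, WEAK_FIELDS)      # both ACC01 titles match both foreign titles
```

With the strong key only the pair with the same year matches; with the reduced key every member of the family matches every foreign record, which is the N:M behaviour described above.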
Implementation of new CE-SE framework
• A considerable part of the new requirements was implemented through the configuration options of the existing CE-EO framework
• Further requirements were implemented as separate program components
• Considerable effort was required to prepare the GBV data according to the standards of Aleph systems (UTF-8 precomposed characters, data filtering, Aleph MAB2 fixes…)
• Intensive work took place from January to March 2012, with the aim of reusing the new tool with data from other MAB2 sources
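One of the preparation steps named above, conversion to precomposed UTF-8, can be sketched with the Python standard library. The assumption is that "pre-composite" refers to Unicode normalization form NFC; the tools actually used for the GBV data are not described here.

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Normalize to NFC: base letter + combining mark become one code point."""
    return unicodedata.normalize("NFC", text)


# 'a' followed by U+0308 COMBINING DIAERESIS (decomposed form)
decomposed = "Brandsta\u0308tter"
composed = to_precomposed(decomposed)

# The composed form uses the single code point U+00E4 and is one character shorter.
utf8_bytes = composed.encode("utf-8")
```

Normalizing both catalogues to the same form before comparison matters for matching, because a decomposed and a precomposed "ä" are different strings even though they look identical.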
... and what have we achieved?
Matching results – summary
The GBV enrichment categories used:
• BK (700f)
• LoC SH (740#)
Category   In ACC01 before enrichment   Added by enrichment   Total ACC01 records after update
BK         0.225 million                1.03 million          1.26 million or 14% of ACC01
LoC SH     0.184 million                0.586 million         0.77 million or 8.8% of ACC01
Matching results – details
Title matches
1.26 million (93.3%) of the 1.35 million matches will be enriched

(A) GBV, all records:                                    ~35 million
(B) GBV records with ISBN + 700f or 740# (match base):   3.45 million
(C) ACC01, all records:                                  8.8 million
(D) ACC01 ISBN records (match base):                     2.8 million
(E) Bibl. matches:                                       1.35 million
Match % (E/D):                                           48.08% (prototype: 32.70%*)
    with BK:                                             1.03 million (81.7%)
    with LoC SH:                                         0.586 million (46.5%)

* 32.70% was achieved using the CE-EO algorithm of the prototype, Oct. 2011
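The percentages above can be cross-checked from the rounded figures. The sketch below takes the match rate as matches (E) relative to the ACC01 match base (D), which is the ratio the rounded numbers reproduce; small deviations from the slide values are expected because the slide was computed from exact counts.

```python
# Rounded figures from the slides, in millions of records.
acc01_match_base = 2.8   # (D) ACC01 ISBN records
matches = 1.35           # (E) matches
enriched = 1.26          # matches that will be enriched
with_bk = 1.03
with_loc_sh = 0.586


def pct(part: float, whole: float) -> float:
    """Share of part in whole, in percent, rounded to one decimal."""
    return round(100 * part / whole, 1)


match_rate = pct(matches, acc01_match_base)   # 48.2 (slide: 48.08%)
enriched_share = pct(enriched, matches)       # 93.3
bk_share = pct(with_bk, enriched)             # 81.7
loc_sh_share = pct(with_loc_sh, enriched)     # 46.5
```

The BK and LoC SH shares are relative to the 1.26 million records that will be enriched, not to all 1.35 million matches.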
How many multiple matches did the new algorithm deliver?
… their share is about 16.17%, or 0.22 million, of all matches!
Match factor ACC01 titles Percent
1 1130843 83.83%
2 130140 9.65%
3 38772 2.87%
4 18420 1.37%
5 10280 0.76%
6 6174 0.46%
7 4095 0.30%
8 2880 0.21%
9 1935 0.14%
10 1500 0.11%
11 1078 0.08%
. . . . . . . . .
3 27 0.01%
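A distribution like the one above can be derived from a list of match pairs. The sketch assumes that the "match factor" of an ACC01 title is the number of distinct foreign titles it matched; the pairs and IDs are hypothetical.

```python
from collections import Counter


def match_factor_distribution(pairs):
    """pairs: iterable of (acc01_id, foreign_id).

    Returns {match_factor: (number_of_acc01_titles, percent)}.
    """
    # Count distinct foreign titles per ACC01 title.
    per_title = Counter(acc_id for acc_id, _ in set(pairs))
    # Count how many ACC01 titles share each match factor.
    histogram = Counter(per_title.values())
    total = sum(histogram.values())
    return {factor: (count, round(100 * count / total, 2))
            for factor, count in sorted(histogram.items())}


pairs = [("AC-1", "F-1"), ("AC-2", "F-1"), ("AC-2", "F-2"),
         ("AC-3", "F-3"), ("AC-4", "F-4")]
distribution = match_factor_distribution(pairs)
# {1: (3, 75.0), 2: (1, 25.0)}

# Share of multiple matches on the slide: everything except match factor 1.
multiple_share = round(100 - 83.83, 2)   # 16.17
```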
... and are the results correct?
Tests: results verification by ÖNB and OBVSG experts
The challenges
• Large volume of data (over 1.2 million records)
• Complex data relations between ACC01 and (enriching) GBV records (“M:N”)
• Convenient and preferably simple test tools had to be implemented for the experts
• A representative data set (~10% of the entire data) had to be produced
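One simple way to draw a test set of about 10% is a seeded random sample, sketched below. How the representative set was actually selected is not stated here, so the method, the function name and the seed are assumptions.

```python
import random


def representative_sample(ids, fraction=0.10, seed=2012):
    """Draw a reproducible simple random sample of the given fraction."""
    ids = list(ids)
    rng = random.Random(seed)              # fixed seed: the experts can re-create the set
    size = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, size))


all_ids = [f"AC{n:08d}" for n in range(1, 1001)]   # hypothetical AC numbers
sample = representative_sample(all_ids)
```

A fixed seed keeps the sample stable between runs, so findings reported by the experts can be traced back to the same records.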
The results
The new data model (a multi-dimensional hash) contained all necessary data, which helped a lot in implementing the test tools
On 2012-04-05 the experts gave their “OK” to start the ACC01 update!
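A multi-dimensional hash of the kind mentioned above can be pictured as a nested dictionary. The sketch below is an assumption about its shape, not the actual model: it maps ACC01 ID, then category, then value to the list of source records that delivered the value. The IDs and the assignment of notations to sources are hypothetical.

```python
from collections import defaultdict


def build_model(matches):
    """matches: iterable of (acc01_id, foreign_id, category, value).

    Returns model[acc01_id][category][value] -> list of source record IDs.
    """
    model = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for acc01_id, foreign_id, category, value in matches:
        model[acc01_id][category][value].append(foreign_id)
    return model


model = build_model([
    ("AC-1", "F-1", "700f", "17.97"),
    ("AC-1", "F-2", "700f", "17.97"),
    ("AC-1", "F-2", "700f", "18.10"),
])

sources = model["AC-1"]["700f"]["17.97"]    # ["F-1", "F-2"]
notations = sorted(model["AC-1"]["700f"])   # ["17.97", "18.10"]
```

With such a structure a test tool can answer, for any enriched ACC01 record, which values were added and which source records they came from, even in M:N cases.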
Test data sample
“Kübler-Ross, Elisabeth: Über den Tod und das Leben danach …“ in various editions and publication years.
• CE-SE data: 700f (3 occurrences): 17.97, 18.10 and 31.01 (plus verbal descriptions)
• These data came from 8 GBV records that enriched 9 ACC01 records!
Special program tools were developed to display records in 3 windows in order to check exactly each source and …

Match #   GBV ID / Publ. Year                         AC-Number / Publ. Year
1         015770958 / 1986 (the primary match key)    AC00106061 / 1991
2         181342952 / 1994                            AC00309295 / 1989
3         238014142 / 1996                            AC00529076 / 1992
4         251879100 / 1998                            AC00862564 / 1994
5         363855645 / 1986                            AC01632748 / 1990
6         468646817 / 2004                            AC03407351 / 1985
7         481474749 / 1989                            AC04255535 / 2004
8         483176036 / 2004                            AC04632370 / 1988
9         na                                          AC08586966 / 1987
... and where are the results?
Sample of enriched record in ACC01
FMT L MH
LDR L 00628nM2.01000024---h
001 L $$aAC00331282
...
100b L $$aBrandstätter, Christian$$9115858806$$b[Hrsg.]
331 L $$a<<Das>> ist Österreich
335 L $$aein ganzes Land in Bildern ; Landschaft, Kultur, Geschichte
410 L $$aWien [u.a.]
412 L $$aBrandstätter
425a L $$a1985
...
540a L $$a3-85447-156-4
...
700f L $$a15.60$$bSchweiz, Österreich-Ungarn, Österreich <Geschichte>$$zAutomatisch aus GBV_2011-10 2012-04-18
700f L $$a74.19$$bEuropa <Geographie>$$zAutomatisch aus GBV_2011-10 2012-04-18
700g L $$aRK 60013
700g L $$aNR 8249
740u L $$aAustria$$zAutomatisch aus GBV_2011-10 2012-04-18
740u L $$aPictorial$$aworks$$zAutomatisch aus GBV_2011-10 2012-04-18
…
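The enriched fields in the record above can be told apart from existing ones by their SFz. The following sketch parses one field line in the layout shown on the slide (tag, "L", then subfields introduced by `$$`) and extracts the provenance. The function names are hypothetical, and a full Aleph sequential file carries additional leading columns that are not handled here.

```python
PREFIX = "Automatisch aus "


def parse_field(line: str):
    """Split a line such as '700f L $$a15.60$$zNote' into (tag, subfields).

    Subfields are returned as a list of (code, value) pairs, because a code
    can repeat within one field (see the 740u example with two $$a).
    """
    head, _, body = line.partition("$$")
    tag = head.split()[0]
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$$") if chunk]
    return tag, subfields


def provenance(subfields):
    """Return (source_and_export, date_added) from the SFz, or None."""
    for code, value in subfields:
        if code == "z" and value.startswith(PREFIX):
            source_export, added = value[len(PREFIX):].rsplit(" ", 1)
            return source_export, added
    return None


line = "740u L $$aPictorial$$aworks$$zAutomatisch aus GBV_2011-10 2012-04-18"
tag, subfields = parse_field(line)
origin = provenance(subfields)   # ("GBV_2011-10", "2012-04-18")
```

Fields without such a note, for example the 700g entries above, yield `None` and can be treated as data that was already in ACC01.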
The results in ACC01
• The first results are already in our catalogues and in Primo; the rest may take some more weeks!
• The number of ACC01 updates is over 1.26 million
  OBVSG can process about 20,000 records per evening (other data updates and projects must be taken into account)
• As of 1 May 2012 we had updated and replicated over 19%, or 270,000 ACC01 records, including:
  • 183,479 enriched with BK
  • 151,853 enriched with LoC SH
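The "some more weeks" estimate follows from the figures above. The short calculation below uses the rounded numbers and assumes that the full 20,000 records are processed every evening, which is an upper bound since other updates and projects also need capacity.

```python
import math

total_updates = 1_260_000   # "over 1.26 million" ACC01 updates
done = 270_000              # updated and replicated as of 1 May 2012
per_evening = 20_000        # approximate processing capacity per evening

remaining = total_updates - done                             # 990,000 records
evenings_total = math.ceil(total_updates / per_evening)      # 63 evenings
evenings_left = math.ceil(remaining / per_evening)           # 50 evenings
```

About 50 further processing evenings corresponds to roughly seven weeks if every evening is used, and longer otherwise.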
Summary
• The procedures developed proved efficient and delivered the expected large number of matches
• The savings from bringing such large amounts of data into catalogues (and search engines) are self-evident: producing such results intellectually, i.e. by manual indexing, would probably take years of work
• Since the German consortia will soon switch to the MARC 21 format, it is important to apply the MAB2-based matching tools to the data of other consortia before this change happens
  OBVSG should organize the receipt of data from other consortia and institutions (BVB, DNB etc.)
Next steps
• Complete the update of ACC01 (May-June)
• Evaluate the expected improvements in searches (Primo, Aleph OPAC)
• Consider further enrichment projects
  For instance, the number of ACC01 titles enriched with RVK from the BVB catalogue (20 million titles) may reach 1 million records