
10.4 Experiments

10.4.6 Experiment 6: XTREEM-SL in Comparison to Google Sets

In this experiment we contrast the results obtained from Google Sets⁸ with those obtained by term retrieval with XTREEM-SL. The exemplary Google Sets results were collected at two points in time, first in October 2006 and again in May 2008. As of October 2009, the results do not differ from those of May 2008. Tables 10.3, 10.4 and 10.5 show the results for both facilities.

The query terms of XTREEM-SL were combined by “AND” conjunction. As Table 10.3 shows, for the queries {hotel} and {hotel, hostel} it is rather hard to judge which facility performs better. For {hotel, hostel, motel}, Google Sets provides only a few result terms, whereas XTREEM-SL returns a comprehensive list in which relevant sibling terms can be found even at lower ranks. The first case in Table 10.4, {ontologies, taxonomies}, reveals better results for XTREEM-SL. For example, the terms thesauri and controlled_vocabularies are not retrieved by Google Sets, although they are plausible siblings. For the next two cases, helgoland and sylt, two islands in the North Sea, the results obtained by Google Sets are even worse, whereas XTREEM-SL returns many good sibling candidates. Note again that the Web crawl was restricted to English documents, so the good results of XTREEM-SL stem from English Web documents. From this observation we conclude that Google Sets performs worse for infrequent terms (technical terms, proper names, ...).
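The effect of the “AND” conjunction can be illustrated with a minimal sketch. It assumes, purely for illustration, that the index is a list of term sets (each set standing for one group of candidate siblings) and that result terms are ranked by how often they co-occur with all query terms. The names `retrieve_siblings`, `sibling_sets` and `toy_index`, the toy data, and the count-based ranking are hypothetical and are not taken from the XTREEM-SL implementation.

```python
from collections import Counter


def retrieve_siblings(sibling_sets, query_terms, top_k=10):
    """Rank terms that co-occur with ALL query terms (AND conjunction).

    sibling_sets: iterable of term collections (hypothetical index layout).
    query_terms:  the terms of the query, e.g. ["hotel", "hostel"].
    """
    query = set(query_terms)
    counts = Counter()
    for terms in sibling_sets:
        terms = set(terms)
        # AND conjunction: a set only contributes if it contains every query term.
        if query <= terms:
            counts.update(terms - query)
    # Sort by descending count; break ties alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Made-up toy index, for illustration only.
toy_index = [
    ["hotel", "hostel", "motel", "pension"],
    ["hotel", "hostel", "camping"],
    ["hotel", "motel", "pension"],
    ["hostel", "pension", "camping"],
]

print(retrieve_siblings(toy_index, ["hotel"]))
print(retrieve_siblings(toy_index, ["hotel", "hostel"]))
print(retrieve_siblings(toy_index, ["hotel", "hostel", "motel"]))
```

In this sketch each additional query term can only shrink the set of contributing term sets, so a longer query yields fewer but more specific candidates.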

⁸ http://labs.google.com/sets

Table 10.3: Exemplary results from Google Sets and XTREEM-SL (AND conjunction)

Google Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SL Hotelhotel hostel Hostel hotel hostel Hostelhotel hostel Retailhotel accommodation bed_breakfast Hotel hostel bed_breakfast Motelmotel motel Officetravel hotel hotels Bed Breakfast pension hotels Hotelhostel bed_breakfast Residentialinn/lodge 1 prices_from Motel camping prices_from Apartmentpension apartment Restaurantresort 1 guesthouse Apartment motel guesthouse Homestaycamping apartments Travelraddison hotel apartment b b jugendherberge apartment Backpackerhostelry guest_house Industrialconference... 2 home_page Camping hostelry home_page resort andjugendherberge hotels Tourtravelocity hotels apartment_rental condominium youth hostel apartment_rental Bed Breakfastyouth hostel other AIRhotels.com official site country_accommodationResorts kurhotel country_accommodationserviced apartmentkurhotel prices_from CARconference... 3 guest_house Camp Grounds bauernhof guest_house travelbauernhof guesthouse Transportationpriceline hotel discounts backpacker_insuranceTrailer parks appartement ferienwohnung backpacker_insuranceappartement ferienwohnung bed_and_breakfast Researchopryland hotel camping Cottage rentals almhütte berghütte camping almhütte berghütte us Car Rentalhoteles us Homestay privatvermieter us privatvermieter apartment_rental Manufacturinghotels about_average Inns ferienhaus bungalow about_average ferienhaus bungalow home_page Otherradison hotel less_expensive_than_averageBackpacker reisebüro tourismusverband less_expensive_than_averagereisebüro tourismusverband resort Medical Laboratorieschoice hotels least_expensive Restaurant sportanbieter least_expensive sportanbieter camping Investmentluxury hotel apartments inn apartment house apartments private accomodation country_accommodation Commercialmotel 1 less_than_us guesthouses private accomodation less_than_us apartment house about_average Vacationall suite 1 rooms 
serviced apartment holiday house rooms holiday house backpacker_insurance Parkingcheap hotel other uk holidays last minute other village tourism less_expensive_than_average Multi familymotel 2 home VILLA village tourism home youth camp inn Showershotel 4 prague FARMHOUSE gasthof prague gasthof least_expensive Pro Shoprestaurant more_expensive_than_averagehotels youth camp more_expensive_than_averageforest school prague TOURSovernachten hotel_prague caravan sites forest school hotel_prague restaurant all Accessorieshotel chains hotels_and_accommodationsHoliday Home restaurant hotels_and_accommodationsferienwohnung pension Railextended stay 1 double resort and ferienwohnung double camping, campingplatz restaurant Multifamilyhotel 5 inn campsites appartement inn appartement more_expensive_than_average Luggagehotel 8 farmhouse_b_b travel camping, campingplatz farmhouse_b_b last minute less_than_us Community Developmenthotel 2 family Selfcatering Accommodation holidays family zimmer frei category CRUISEShotel 7 suites CASOLARE zimmer frei suites apartmenthaus home Apartmentshotel search pub_inn Campground apartmenthaus pub_inn spa hotel hotels_and_accommodations Educationalall suite 2 bed_and_breakfast dwelling spa hotel bed_and_breakfast farm location Warehousehotel 6 twin student lodging farm twin bungalowpark suites Servicehotel 10 hostels_in_prague quartering bungalowpark hostels_in_prague travel agency hotel_prague Institutionalapt/condo 1 hotels_in_prague block travel agency hotels_in_prague hunting lodge super_motel Cruisehotel 3 eurotel quarters dorftourismus eurotel dorftourismus double Retail Fixturestravelzoo® hotel deals pension billet hunting lodge pension sports supplier twin Industrial propertieslondon hotel cheap_prague_hotels_pensionshome holiday apartment cheap_prague_hotels_pensionsprivate rental house hospitalhotel 1 hotels_pensions camp mountainhut shelter hotels_pensions mountainhut shelter farmhouse_b_b Laboratorycabin 1 
individual_city_toursdomicile private rental individual_city_toursholiday apartment family Automotivehollandhotels prague_apartment_rentalsrattrap sports supplier prague_apartment_rentalsschutzhaus, jugendherberge pub_inn Placement Firmsother 1 oskar_vodafone abode schutzhaus, jugendherberge oskar_vodafone bungalow stars Lockersfietsen reservations youth hostel bungalow reservations jugendlager hostels_in_prague Publishingself_catering hospice jugendlager self_catering resort swimming_pool Self storagetown flat resort town waldschule b_b Financial Servicesto_entries_listed_belowhabitation waldschule to_entries_listed_belowland location location industrial_buildings you_can you_can contact_us please_see please_see hotels_in_prague

Queries (column groups, left to right): {hotel}, {hotel, hostel}, {hotel, hostel, motel}


Table 10.4: Exemplary results from Google Sets and XTREEM-SL (AND conjunction)

Google Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SL Ontologiestaxonomies taxonomies Helgoland helgoland horumersiel Sylt sylt amrum Taxonomiesontologies thesauri Oman malta wangerland_minsen Buhneberlin f_hr Systemshtml metadata Tanganyika gibraltar tossens hamburg north_frisian_islands Research Groupsxml content_development Tanzania cyprus st_peter_ording bayern east_frisian_islands Peoplepdf metamodels Historical Flags deutschland aurich hessen kachelotplate Search Enginesrtf terminology_extractionNordfriesland portugal sylt brandenburg borkum Class interfacestext concept_systems Holsteinische Schweiz halstenbek norden bremen pellworm Problem solving methodsxhtml methodology_standardizationProbstei rellingen wilhelmshaven niedersachsen terschelling Data Model Mappingexcel controlled_vocabulariesFehmarn greece bad_zwischenahn baden württemberg nordstrand_germany Ontological Modellingsgml other_subject_based_techniquesTafellied egypt wittmund sachsen l_beck agentscsv faceted_classificationAquarius lebanon westerstede schleswig holstein neuwerk KIFpostscript what_is_metadata Westfalenliechtenstein leer rheinland pfalz schiermonnikoog Domain knowledge staticword metadata_as_a_finding_aidspain juist saarland texel Reasoning knowledge dynamiccss subjects_and_precisionandorra nieuweschans thüringen vlieland Question answer corporasvg the_names_of_subjectsluxembourg stadskanaal sachsen anhalt wieringen Electronic dictionariesascii occurrences italy info_motylek_com nordrhein westfalen west_frisian_islands Knowledge Managementtxt types switzerland spiekeroog mecklenburg vorpommernameland RDFxslt associations guernsey nordenham kiel Semiotic Modellinglatex benefits_and_costs pinneberg cuxhaven rottumeroog Introductionflash identity_and_mergingjersey wangerooge rottumerplaat FIPA agent standardsfolksonomy searching ireland husum hiddensee Semantic Webjavascript schemas germany bremerhaven 
fehmarn Knowledge Representationdoc owl turkey sylt_keitum helgoland Argumentationwml formal_is_a_relationshipsiceland bus_charter flensburg Hypothesesweb value_restrictions tornesch vila_dos_remedios heligoland Variablesdom pdf bahamas h_lil_nj bad_schwartau The Research Questionxls disjointness_inverse_part_ofdenmark schleswig lubeck Validity and Reliabilitysylk xml_schemas france kashi lindau Levels of measurement dyomedea xml belize stanley ruden rdf informal_is_a_relationshipsisrael jalalabad hamburg web2.0 formal_instances monaco heligoland germany ps microsoft_excel croatia roter_sand oland sql frames_properties appen port_blair vilm http word_processing_documentsromania friedrichstadt r_gen dtd relational_databasespanama braunschweig nordstrand tagging general_logical_constraintsoldenburg brunsbuettel_st_michaelisdonnpoel software html sweden linie home pcl classification_schemesaustralia ushuaia greifswalder_oie tiff faceted_metadata san_andres usedom schema robinson_crusoe reeperbahn neuralnetworks bremen_farge schleswig informationretrievaldamp mallorca som hamburg eckernf_rde ma langeoog neumuenster topic_maps germany rimini en settlement nebel_amrum learning hahnenklee_bockswieseuetersen hierarchies lun toscana_punta_ala nlp airport_express saxony

Queries (column groups, left to right): {ontologies, taxonomies}, {helgoland}, {sylt}

Table 10.5: Exemplary results from Google Sets and XTREEM-SL (AND conjunction)

Google Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SLGoogle Sets 10/2006Google Sets 5/2008XTREEM-SL Sylt sylt amrum shark diving shark diving whale_watching Whale Watching whale watching fraser_island Buhneberlin f_hr Rafting scuba diving cape_town Navatek Cruises sight seeing hervey_bay hamburg north_frisian_islandsRock Climbing skydiving scuba_diving USS Arizona Memorial hiking great_barrier_reef bayern east_frisian_islandsSea Kayaking paragliding quad_biking Sea Life Park bird watching monterey_movie_tours hessen kachelotplate Bungee Jumping bungee jumping kloofing Hawaii Vacation Rentals beachcombing worlds_best_k brandenburg borkum Mountaineering surfing surfing Waimea Fallsjet skiing light_tackle_fishing bremen pellworm Abseil Africa hang gliding hot_air_ballooning deepsea fishing fantasy_trail_rides niedersachsen terschelling wreck diving ballooning bungee_jumping snorkeling accommodation baden württemberg nordstrand_germany Sailing mountaineering water_skiing outlet shopping zodiac_charters sachsen l_beck Caving canyoning township_tours pier fishing horseback_riding schleswig holstein neuwerk Horse Riding parachuting sandboarding live theater skiing rheinland pfalz schiermonnikoog Cyclingscuba skydiving surf fishing sailing_cruises saarland texel diving microlighting freshwater fishing zoos thüringen vlieland gliding helicopter_tours sound/bay fishing victoria_falls sachsen anhalt wieringen zorbing table_mountain fly fishing bird_watching nordrhein westfalen west_frisian_islandsaerobatics kayaking_canoeing paddle boating cruise_guide mecklenburg vorpommernameland flying disc abseiling_rock_climbingwater tubing legoland_california kiel aeromodelling robben_island mountain climbing universal_studios rottumeroog casting horse_riding gambling casinos fishing rottumerplaat orienteering exclusive_shopping rafting cape_big_six_experience hiddensee sport fishing wine_tasting ice skating package_tours fehmarn sailing 
cape_peninsula sledding great_white_shark helgoland caving the_franschhoek whitewater rafting more flensburg rock climbing the_long_beach luaus travel_tips heligoland diving vacation the_bishops_court spelunking ensenada_fishing bad_schwartau whale watching the_constantia sea_kayaking lubeck windsurfing cape_winelands wd_hire lindau mountain biking cape_town_beaches home ruden kayaking golf_courses host_u hamburg padi helicopter_trips fishing_charters germany surf home harbor_dinner oland water skiing water_activities windjammers vilm fell running topless_bus_tour puffin_tours r_gen fishing sunset_cruise map_of_south_africa nordstrand snowboarding mt_kilimanjaro_tour golf_courses poel water sport home_page extend_this_holiday home sailing vacation fishing russian_tours greifswalder_oie sailing school experience_sa_networksea_kayaking_mountain_biking usedom scuba lesson the_vintage_hotel diving reeperbahn surf shop contact_us lunenburg_fisheries_museum schleswig scuba gear why_we_are_differentbluenose_golf_course mallorca sailing lesson the_vintage whitsundays eckernf_rde ocean sailing diving asa_sailing_instruction neumuenster climbing photo_gallery beaches rimini horse riding snorkeling whitewater_rafting nebel_amrum hikingfree_brochure kayaking uetersen rates_packages romantic_packages toscana_punta_ala private_pilots dolphins saxony fishing_map lunenburg_academy

Queries (column groups, left to right): {sylt}, {shark diving}, {whale watching}


Conclusion: The term sets retrieved by our approach show stronger semantic coherence, in the sense of being semantic siblings, than those obtained from Google Sets. Our approach also works well for rather infrequent, domain-specific terms, where Google Sets performs worse. This is an important observation, since the engineering of domain ontologies operates on rather infrequent terms. XTREEM-SL can therefore be regarded as better suited for this purpose: it enables semantically founded term retrieval for ontology learning, whereas doing so with Google Sets does not seem feasible.

10.5 Conclusion

In this chapter we described an approach for obtaining sibling terms within an open vocabulary. We presented an evaluation against reference ontologies as well as exemplary evaluations. While the quality measured by the rediscovering ranks measures was generally not good, manual inspection revealed that the results contain a considerable number of plausible sibling terms that are not present in the gold standard ontologies.

We performed experiments on a data set of millions of documents. For indexes covering larger parts of the Web, our method can be expected to scale well.

The process can also be parallelized easily.