NeOn: Lifecycle Support for Networked Ontologies




4.2 Selection

Selection is a functionality that allows a NeOn user (or a community of users) to choose, from the ontologies available in an ontology repository, the one most suitable for their domain of interest and associated tasks. NeOn supports this by recommending a set of ontologies suitable for that domain, which it identifies from the specific use cases the users provide. An extended definition of the selection task and its relation to the ontology evaluation task are discussed in [SAB06].

Available Models and Tools:

Several approaches have already been proposed for selecting (ranking) ontologies. They differ mainly in the selection criteria they rely on (i.e., the kind of evaluation that is performed). Based on this feature, we identified three categories of approaches, which select ontologies according to their popularity, the richness of the semantic data they provide, or their topic coverage. For a more complete analysis of these approaches see [SAB06].

Popularity-based Selection

Approaches from this category select the “most popular” (i.e., well established) ontologies from an ontology collection. They rely on the assumption that ontologies that are referenced (i.e., imported, extended, instantiated) by many ontologies are the most popular (a higher weight is given to ontologies that themselves are referenced by other popular ontologies). These approaches rely on metrics that take into account solely the links between different ontologies. In fact, these approaches use the same principle as current Web search engines (the importance of a Web page is proportional to the number of pages that reference it) and they often use a modified version of the PageRank algorithm. To our knowledge there are three approaches that consider ontology popularity.
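The link-based principle described above can be sketched as a PageRank-style iteration over the graph of references between ontologies. This is a minimal illustration under stated assumptions, not the algorithm of any of the tools discussed here: the function name rank_ontologies, the damping value and the sample repository are all hypothetical.

```python
def rank_ontologies(references, damping=0.85, iterations=50):
    """Rank ontologies by a PageRank-style popularity score.

    `references` maps each ontology to the ontologies it references
    (imports, extends or instantiates). Names and defaults are assumptions.
    """
    nodes = set(references)
    for targets in references.values():
        nodes.update(targets)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_score = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = references.get(node, [])
            if targets:
                # Pass an equal share of this ontology's score to each target.
                share = damping * score[node] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # An ontology that references nothing spreads its score evenly.
                for other in nodes:
                    new_score[other] += damping * score[node] / n
        score = new_score
    return sorted(score, key=score.get, reverse=True)


# Hypothetical repository: three ontologies reference "upper".
example = {
    "wine": ["upper", "food"],
    "food": ["upper"],
    "travel": ["upper"],
    "upper": [],
}
ranking = rank_ontologies(example)
```

In this toy repository "upper" is ranked first because every other ontology references it, and "food" comes second because it is referenced by "wine".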

OntoKhoj

OntoKhoj [PAT03] is an ontology portal that crawls, classifies, ranks and searches ontologies. For ranking it uses the OntoRank algorithm, which is similar in spirit to PageRank. However, instead of relying on HTML links, it considers semantic links between ontologies, which are denoted by instantiation and subsumption.

Swoogle

Swoogle [DIN05] is a search engine that crawls and indexes online Semantic Web documents. It allows querying its large base of semantic data and also provides metrics for ranking ontologies. These metrics rely on a principle similar to OntoRank and apply a PageRank-like algorithm to semantic relations between ontologies (i.e., using terms of an ontology to define new terms, populating ontology terms, importing ontologies).

OntoSelect

OntoSelect [BUI04] is one of the first comprehensive ontology libraries to offer a complex ontology selection algorithm that relies, among other criteria, on selecting the most “well established” ontologies. The authors call this the “connectedness” criterion, since they look at how well an ontology is connected to other ontologies in order to determine its popularity. Unlike Swoogle and OntoKhoj, they use a less complex metric, which considers only ontology imports as semantic links between ontologies.

Selection Approaches Based on the Richness of Ontological Knowledge

Another way to rank ontologies is to estimate the richness of knowledge that they express. When approximating this aspect, most approaches investigate the structure of the ontology.

AktiveRank

This algorithm (proposed in [ALA05]) is the only selection algorithm that has been developed independently of an ontology library. AktiveRank combines a set of metrics based on ontology structure when ranking ontologies. To determine the richness of the conceptualization offered by an ontology, it uses the Density Measure (DEM), which indicates how well a given concept is defined in the ontology by summing the numbers of its subclasses, superclasses, siblings, instances and relations. AktiveRank introduces two further measures that rely on the ontology structure and aim to evaluate the quality of the ontological knowledge. First, the Centrality Measure (CEM) relies on the observation that concepts in the 'middle' of the ontology are the most representative and have the right level of generality. CEM is computed from two path lengths: the longest path from the root, through the branch that contains a concept C, down to its leaf node, and the path from the root to C itself. Second, the Semantic Similarity Measure (SSM) measures how closely the concepts that correspond to the query are placed in the ontology, relying on the links between these concepts. The assumption is that an ontology that contains all queried concepts close enough together to be treated as a module is better than an ontology in which these concepts are spread over different parts of the hierarchy.
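The density and centrality ideas can be illustrated with a small sketch. It is not the formulation from [ALA05]: the unweighted sum in density, the normalisation used in centrality, and the dictionary layout of a concept are assumptions made here for illustration only.

```python
def density(concept):
    """Sum the structural neighbours of a concept (unweighted sketch)."""
    keys = ("subclasses", "superclasses", "siblings", "instances", "relations")
    return sum(len(concept[key]) for key in keys)


def centrality(depth, longest_path):
    """Score 1.0 for a concept halfway down its branch, 0.0 at the root or leaf.

    `depth` is the path length from the root to the concept; `longest_path`
    is the length of the longest root-to-leaf path through the concept.
    The linear normalisation is an assumption, not the published formula.
    """
    if longest_path == 0:
        return 0.0
    middle = longest_path / 2
    return 1.0 - abs(depth - middle) / middle


# Hypothetical concept description.
student = {
    "subclasses": ["PhDStudent", "Undergraduate"],
    "superclasses": ["Person"],
    "siblings": ["Lecturer"],
    "instances": [],
    "relations": ["enrolledIn", "supervisedBy"],
}
```

With these definitions, density(student) counts six structural neighbours, and a concept at depth 2 on a branch of length 4 gets the maximum centrality.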

OntoSelect

OntoSelect uses a similar metric, called Structure, whose value is simply the number of properties relative to the number of classes in the ontology. The rationale behind this metric is that “more advanced ontologies have a large number of properties”. The OntoSelect selection algorithm relies on the evaluation approach proposed in [BRE04].
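Read literally, the Structure measure is a ratio. The following sketch assumes exactly that reading; the function name and the handling of an ontology without classes are illustrative choices, not taken from OntoSelect.

```python
def structure_measure(num_properties, num_classes):
    """Number of properties relative to the number of classes."""
    if num_classes == 0:
        # Assumed convention: an ontology without classes scores zero.
        return 0.0
    return num_properties / num_classes
```

Under this reading, an ontology with 30 properties and 60 classes scores 0.5, and one with 120 properties over the same 60 classes scores 2.0.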

Topic Coverage-based Selection

Finally, ontologies can be ranked based on the level to which they cover a certain topic. To determine this, most approaches consider the labels of ontology concepts and compare them to a set of query terms that represent the domain.

CMM

The Class Match Measure (CMM) of AktiveRank denotes how well an ontology covers a set of query terms. It is computed as the number of concepts in each ontology whose label matches the query terms either exactly or partially. Note that the matching is purely syntactic: no attention is paid to discovering synonyms, or to ensuring that the concept is used in the same sense as intended by the query term.
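A purely syntactic match of this kind can be sketched as follows. The weights given to exact and partial matches, the case-insensitive comparison and the function name are assumptions; setting both weights to 1.0 gives the plain count described above.

```python
def class_match_measure(labels, query_terms, exact_weight=1.0, partial_weight=0.5):
    """Score an ontology by how many concept labels match the query terms.

    An exact match is label == term; a partial match means the term occurs
    somewhere inside the label. Weights are hypothetical defaults.
    """
    score = 0.0
    for term in query_terms:
        term = term.lower()
        for label in labels:
            label = label.lower()
            if label == term:
                score += exact_weight
            elif term in label:
                score += partial_weight
    return score


# Hypothetical concept labels of one ontology.
labels = ["Student", "PhDStudent", "University"]
score = class_match_measure(labels, ["student", "university"])
```

Here "Student" and "University" match exactly and "PhDStudent" matches partially, which shows why such a measure cannot tell apart different senses of the same string.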

OntoSelect

The OntoSelect algorithm allows the information need to be specified by supplying a whole corpus. The concepts that are most relevant for the corpus are determined by statistical processing of the corpus. Coverage is then measured by comparing the concept and property labels of the ontology with the query terms extracted from the corpus. This selection algorithm relies on the evaluation approach proposed in [BRE04].

OntoKhoj

In OntoKhoj, word senses are considered when ranking ontologies by how well they cover a topic. The algorithm accommodates a manual sense disambiguation step; then, according to the sense chosen by the user, hypernyms and synonyms are selected from WordNet. The algorithm first tries to find ontologies that contain the supplied keyword. If no matches are found, it queries for the synonyms of the term and then for its hypernyms. The algorithm was designed for a single word and does not take relations into account.
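The fallback order described above (keyword, then synonyms, then hypernyms) can be sketched as below. The two dictionaries stand in for WordNet lookups made after the user has chosen a sense; they, the function name and the sample data are hypothetical, and this is not OntoKhoj's implementation.

```python
def find_ontologies(keyword, ontologies, synonyms, hypernyms):
    """Return ontologies matching the keyword, else its synonyms, else its hypernyms.

    `ontologies` maps an ontology name to its concept labels. `synonyms` and
    `hypernyms` map a keyword to alternative terms for the chosen sense.
    """
    stages = ([keyword], synonyms.get(keyword, []), hypernyms.get(keyword, []))
    for candidates in stages:
        wanted = {candidate.lower() for candidate in candidates}
        matches = [name for name, labels in ontologies.items()
                   if wanted & {label.lower() for label in labels}]
        if matches:
            # Stop at the first stage that yields any ontology.
            return matches
    return []


# Hypothetical repository and lexicon.
ontologies = {"vehicles": ["Automobile", "Truck"], "plants": ["Tree", "Shrub"]}
synonyms = {"car": ["automobile", "auto"]}
hypernyms = {"car": ["motor vehicle"], "oak": ["tree"]}
```

With this data, "car" is found only through its synonym "automobile", and "oak" only through its hypernym "tree".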

Swoogle

Swoogle also offers a limited search facility that can be interpreted as topic coverage. Given a search keyword, Swoogle can retrieve ontologies that contain a concept (or a relation) matching that keyword. The matches are lexical, and one can select between different levels of match: exact, when the keyword matches the concept label exactly; prefix, when the keyword appears at the beginning of the concept label; suffix, when the keyword appears at the end of the concept label; and fuzzy, when the keyword appears at any position in the concept label.
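The four match levels map directly onto simple string tests. The sketch below follows the definitions given in the text; the function name and the case-insensitive comparison are assumptions, and it does not reproduce Swoogle's actual query interface.

```python
def matches(keyword, label, level="exact"):
    """Lexical match between a keyword and a concept label at a given level."""
    keyword, label = keyword.lower(), label.lower()
    if level == "exact":
        return label == keyword
    if level == "prefix":
        return label.startswith(keyword)
    if level == "suffix":
        return label.endswith(keyword)
    if level == "fuzzy":
        # The keyword may appear at any position in the label.
        return keyword in label
    raise ValueError(f"unknown match level: {level}")
```

For example, the keyword "wine" matches the label "RedWine" at the suffix and fuzzy levels, but not at the exact or prefix levels.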

The PowerAqua algorithm

Although it is still under development, the ontology selection algorithm that is part of the PowerAqua question answering tool [LOP06] should be mentioned here. This algorithm aims to find the ontologies that cover a set of triples derived from a question. The minimum requirement is that any of the triples submitted as a query should be completely covered by an ontology. If the elements of a triple are discovered in distinct ontologies, then the triple is broken down into two more specific triples and the selection is reiterated. The output can contain more than one ontology if different triples are covered by different ontologies. The selection itself is more semantic than in existing approaches because it relies on WordNet senses, checks for coverage of relations as well as concepts, and considers the position of concepts within an ontology hierarchy.
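The minimum requirement, that a triple be completely covered by a single ontology, can be sketched as a set-inclusion test. This illustration covers only that check: it does not model the splitting of triples, WordNet senses or hierarchy positions, and the function name and sample data are hypothetical.

```python
def covering_ontologies(triples, ontologies):
    """Map each triple to the ontologies whose vocabulary contains all its elements.

    `ontologies` maps an ontology name to the terms (concepts and relations)
    it defines. An empty list means no single ontology covers the triple.
    """
    result = {}
    for triple in triples:
        needed = {element.lower() for element in triple}
        result[triple] = [name for name, vocabulary in ontologies.items()
                          if needed <= {term.lower() for term in vocabulary}]
    return result


# Hypothetical triples derived from a question, and two small vocabularies.
triples = [("researcher", "works_in", "university"),
           ("city", "located_in", "country")]
ontologies = {
    "academia": ["Researcher", "University", "works_in"],
    "geo": ["City", "Country", "located_in"],
}
coverage = covering_ontologies(triples, ontologies)
```

Each triple here is covered by a different ontology, which is the situation in which the output contains more than one ontology.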

4.3 Re-engineering

'Ontological Reengineering' [14] is defined as the process of retrieving the conceptual model of an existing, implemented ontology and transforming it into a new, more correct and more complete conceptual model, which is then reimplemented. The ontological reengineering process should be carried out bearing in mind how the existing ontology is used by the system (ontology or software) that reuses it.

The process consists of three main activities:

Reverse engineering. The aim of this activity is to output a possible conceptual model on the basis of the code in which the ontology is implemented.

Restructuring. Its objective is to correct and reorganize the knowledge contained in the initial conceptual model, and detect missing knowledge.

Forward engineering. Its aim is to output a new implementation of the ontology on the basis of the new conceptual model.

The following figure [15] shows the ontological reengineering process:



Reengineering is a functionality that allows NeOn users to transform existing Knowledge Organization Systems (KOS), such as thesauri, glossaries, database schemas and subject directories, into ontologies by means of semi-automatic methods. Reengineering KOSes into ontologies enables users to bootstrap large domain ontologies from the informal distinctions that domain experts have developed for their own needs, such as document annotation or indexing (e.g. the FAO Agrovoc thesaurus, cf. WP7), data modelling (e.g. a database schema), agreement on the meaning of terms (e.g. a data dictionary or a glossary), or subject classification (e.g. the DMOZ directory or Flickr folksonomies). A NeOn user should be able to import an existing domain KOS from its original format and should be guided through an appropriate mapping from the original KOS data model to an ontology data model supported by the NeOn platform. Moreover, the user should be helped to make explicit the original intended use of that KOS and to extract its most relevant parts with reference to existing reusable ontologies (cf. T2.2.1) or ontology design patterns (cf. T2.5).


Available Models and Tools:

To date, there is no comprehensive review on KOS reengineering. The following is a partial list of existing models, approaches and tools.


OntoLift

OntoLift is a tool, developed in the EU WonderWeb project, to transform database schemas into RDFS models (see [STO02] and [VOL04]). Some papers can be downloaded from http://wonderweb.semanticweb.org/deliverables/D12.shtml.
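To make the idea of lifting a database schema to RDFS concrete, the sketch below maps each table to an rdfs:Class and each column to an rdf:Property whose rdfs:domain is that class. These mapping rules, the ex: prefix and the function name are assumptions chosen for illustration; they are not OntoLift's actual transformation rules.

```python
def schema_to_rdfs(schema, prefix="ex:"):
    """Turn a {table: [columns]} schema into RDFS triples (naive mapping)."""
    triples = []
    for table, columns in schema.items():
        class_name = prefix + table.capitalize()
        triples.append((class_name, "rdf:type", "rdfs:Class"))
        for column in columns:
            prop = prefix + column
            # Each column becomes a property that applies to the table's class.
            triples.append((prop, "rdf:type", "rdf:Property"))
            triples.append((prop, "rdfs:domain", class_name))
    return triples


# Hypothetical one-table schema.
triples = schema_to_rdfs({"vessel": ["name", "tonnage"]})
```

A realistic lifting would also need to handle foreign keys, datatypes and columns shared between tables, which this naive mapping ignores.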


ONIONS

ONIONS ([PIS98], [GAN99]) is an ontology reengineering and merging methodology that was applied in the nineties to medical KOSes, notably the UMLS Metathesaurus. It defines a collection of methods and principles, which use heterogeneous tools ranging from databases to description logic classifiers, in order to transform KOSes into ontologies. Related work has been done in Freiburg, e.g. as described in [HAH02].


The Fishery Ontology Service project

The Fishery Ontology Service (FOS) project [GAN04b] defines a methodology, modelled as UML activity diagrams, which improves on some of the ONIONS methods and principles. That methodology has also been used as a showcase for the ontology engineering methodology from the WonderWeb project [GAN04c]. The reengineering activities in FOS are part of a more general methodology that includes the alignment of reengineered KOSes with foundational ontologies and reusable ontology design patterns.


The SKOS Core

SKOS Core [MIL05] provides a metamodel for encoding thesauri in RDF. It is a vocabulary for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, 'folksonomies' and other types of controlled vocabulary, as well as concept schemes embedded in glossaries and terminologies.
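As an illustration of encoding a thesaurus with SKOS, the sketch below writes a few terms as Turtle using skos:Concept, skos:prefLabel and skos:broader. The base URI, the function name and the input layout are hypothetical, labels are assumed to be English and to contain no quotation marks, and only this small subset of the vocabulary is covered.

```python
def thesaurus_to_skos(terms, base="http://example.org/thesaurus/"):
    """Serialise {id: (label, broader_id or None)} as SKOS in Turtle syntax."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for term_id, (label, broader) in terms.items():
        lines.append(f"<{base}{term_id}> a skos:Concept ;")
        # End the statement here unless a broader term follows.
        ending = " ;" if broader else " ."
        lines.append(f'    skos:prefLabel "{label}"@en{ending}')
        if broader:
            lines.append(f"    skos:broader <{base}{broader}> .")
    return "\n".join(lines)


# Hypothetical two-term thesaurus: "Tuna" has the broader term "Fish".
turtle = thesaurus_to_skos({"c1": ("Fish", None), "c2": ("Tuna", "c1")})
```

The broader-term link of a thesaurus becomes a skos:broader statement, so the hierarchy survives the conversion without being reinterpreted as class subsumption.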


The Porting WordNet to the SW TF

The Porting WordNet to the SW Task Force (TF) [ASS06] of the W3C Semantic Web Best Practices and Deployment (SWBPD) Working Group has ported the WordNet database to the Semantic Web by mapping the WordNet metamodel to RDFS and OWL. It aims to provide a standard conversion of WordNet for direct use by Semantic Web application developers. By providing a standard conversion that is as complete as possible, the TF aims to improve the interoperability of Semantic Web applications that use WordNet.


The eClass-OWL

The eClass-OWL approach [HEP06] was recently introduced to represent both subsumption-like and subject-like taxonomies in a unified OWL ontology. It has been applied to business taxonomies.

4.4 Learning

Learning is a functionality that allows NeOn users to share the results of their ontology construction efforts by using semi-automatic learning methods. Ontology learning makes it possible to acquire concepts for entities, and relations between entities, from semi-structured data such as collections of raw texts for a given domain. A user should be able to specify which data collection to mine, optionally a specific kind of relation to extract (either by giving a formal description of it or by providing a set of seed examples), and optionally a specific kind of entity for which relations should be found (e.g. only location names).
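One simple way to give a formal description of a relation to extract is a lexico-syntactic pattern. The sketch below uses a single "X such as A, B and C" pattern to propose is-a pairs from raw text. It is a deliberately naive stand-in, not a NeOn component: the pattern, the function name and the restriction to single-word terms are all assumptions.

```python
import re

# "<hypernym> such as <term>, <term> and <term>", single-word terms only.
PATTERN = re.compile(
    r"(\w+) such as (\w+(?:, (?!and |or )\w+)*(?:,? (?:and|or) \w+)?)"
)


def extract_is_a(text):
    """Return (hyponym, hypernym) pairs proposed by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        # Split the enumerated list on commas and on "and"/"or".
        for hyponym in re.split(r",? (?:and|or) |, ", match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_is_a("The region exports cereals such as wheat, maize and rice.")
```

Pairs produced this way are only candidates: the hypernym is left in its surface form ("cereals"), and a user would still need to validate the pairs before adding them to an ontology.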


Available Models and Tools:

More information on ontology learning from text can be found in a collection of papers [1] addressing three perspectives: methodologies that have been proposed to automatically extract information from texts, evaluation methods defining procedures and metrics for a quantitative evaluation of the ontology learning task, and application scenarios that make ontology learning a challenging area in the context of real applications. For a survey of different tools see [6].

Papers on Ontology Learning by researchers at AIFB (Karlsruhe) can be downloaded at http://www.aifb.uni-karlsruhe.de/Personen/Forschungsgebiete/viewForschungsgebiet?fgebiet_id=71.

OntoLearn.
