NeOn: Lifecycle Support for Networked Ontologies




4.7 Argumentation


Available Models and Tools


Toulmin Model

The Toulmin Model, introduced in [1], is a model that tries to explain how people argue. In this model a single argument consists of five components. The data are facts and observations from which a claim is concluded. The warrant is the rule used for this conclusion; it is supported by a backing, which explains why the warrant is valid. Finally, a rebuttal can be used to list circumstances under which the claim does not apply.
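To make this concrete, here is a minimal sketch of how the five components of a Toulmin argument might be represented as a data structure; the class and field names are illustrative assumptions, not part of any NeOn tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ToulminArgument:
    """One argument in the Toulmin Model: a claim concluded from data
    via a warrant, optionally supported by a backing and limited by rebuttals."""
    claim: str                          # the conclusion being argued for
    data: List[str]                     # facts and observations supporting the claim
    warrant: Optional[str] = None       # rule licensing the step from data to claim
    backing: Optional[str] = None       # why the warrant is valid
    rebuttals: List[str] = field(default_factory=list)  # circumstances where the claim fails

# A claim may serve as the data of another argument, forming a line of arguments:
a1 = ToulminArgument(claim="The ontology needs a 'Person' class",
                     data=["Most competency questions mention people"])
a2 = ToulminArgument(claim="We should import FOAF",
                     data=[a1.claim],
                     warrant="Reuse existing vocabularies where possible")
```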

When the Toulmin Model is used to analyze a discussion between people, one can see that usually not all components of an argument are stated explicitly from the beginning. For example, a participant might formulate only a claim and see whether the other participants agree; only if someone disagrees are further components such as the data and the warrant supplied. During such a discussion, single arguments can be combined into lines of argument, in which the claim of one argument serves as the data for another.


Issue-Based Information Systems (IBIS)

While the Toulmin Model focuses on the single argument in a discussion, the IBIS model focuses on structuring and documenting the discussion process as a whole (see [2]). A discussion in IBIS is centered around a topic for which a number of issues exist. IBIS helps to structure and document the discussion so that the participants reach an agreement about the issues. For this purpose, participants formulate positions on an issue, which are then supported or objected to by arguments.

Several relations exist between issues, positions and arguments. For example, a participant's position responds to one or more issues, an issue may specialize another issue, and an argument may challenge or justify other arguments, positions or issues. Further relations between the elements of the IBIS model are described in the literature. A minimal graph encoding of these elements is sketched below.
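As an illustration, the following sketch encodes IBIS elements as a small typed graph; the relation names follow the model described above, while the code itself is an invented minimal representation.

```python
from dataclasses import dataclass

# Node types of the IBIS model
ISSUE, POSITION, ARGUMENT = "issue", "position", "argument"

@dataclass(frozen=True)
class Node:
    kind: str   # ISSUE, POSITION or ARGUMENT
    text: str

edges = []  # (source, relation, target) triples

def relate(src: Node, relation: str, dst: Node) -> None:
    """Record a typed IBIS relation, e.g. responds-to, specializes, supports, objects-to."""
    edges.append((src, relation, dst))

issue = Node(ISSUE, "How should 'river bank' be modelled?")
pos = Node(POSITION, "Model it as a subclass of GeographicFeature")
arg = Node(ARGUMENT, "GeoNames does the same, which eases alignment")

relate(pos, "responds-to", issue)
relate(arg, "supports", pos)
```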


Rhetorical Structure Theory (RST)

Rhetorical Structure Theory (RST) (see [4]) is used to analyze the coherence of texts. The smallest unit of text analyzed by RST is called a span. Spans are connected by certain relations; in most cases a relation connects two spans, one of which is called the nucleus and the other its satellite. The nucleus is the part of the text that makes a claim, while the satellite provides evidence for that claim.

An example relation in RST is elaboration, where the satellite presents additional detail about the claim in the nucleus. Another example is contrast, which relates two nucleus spans with each other, so it is not always a nucleus and a satellite that are related in RST. The RST web site currently lists thirty relations which may appear in texts according to the theory. However, [5] shows that not all of these relations occur equally often in a typical discussion and reduces the original thirty relations to the five central relations with the highest influence on the outcome of a discussion.

Compendium

See secs. 4.3 and 5.6.

Co4

A methodology and tool for collaborative construction of consensual knowledge (INRIA to fill).

SDRT

(To be filled).

DILIGENT

See secs. 4.2 and 5.6.


References

[1] S. Toulmin. The Uses of Argument. Cambridge University Press, 1958.

[2] W. Kunz & H. Rittel. Issues as Elements of Information Systems. Working Paper 131, Institute of Urban and Regional Development, University of California, Berkeley, 1970.

[3] H. Rittel & M. Webber. Dilemmas in a General Theory of Planning. Policy Sciences 4, 155-169, 1973.

[4] W.C. Mann & S.A. Thompson. Rhetorical Structure Theory: Towards a functional theory of text organization. Text, 1988, 8(3), 243-281

[5] S. Pinto, S. Staab & C. Tempich. DILIGENT: Towards a fine-grained methodology for Distributed, Loosely-controlled and evolvInG Engineering of oNTologies. In Proceedings of ECAI-2004, 2004.

[6] V. Uren, S. Buckingham Shum, C. Mancini & G. Li. Modelling Naturalistic Argumentation in Research Literatures. In Proceedings of the 4th Workshop on Computational Models of Natural Argument, 2004.

4.8 Provenance

In general, the provenance of an entity provides information about "where an entity originated, how it was produced and what has happened to it since creation" (see [4]). The idea of keeping track of an entity's provenance originated in the database community, where it is also studied under the names "data lineage" and "data pedigree" (see [1] and [2]).

Available Models and Tools

Which data should be collected during the lifetime of an entity depends strongly on the application and on the queries that may later be run against the provenance data. At a minimum, provenance data should record all authors who influenced the current state of an entity, together with a timestamp for each contribution. Ideally, however, the provenance data should be fine-grained enough to reproduce the state of the entity at any point in time since its creation (see [3] and [5]). These requirements are met by a version control system such as CVS, or by the framework for ontology evolution described in [6]. A minimal log-based sketch follows.
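The following is a minimal sketch of such an append-only provenance log, assuming a simple dictionary-valued entity state; it also records the external sources discussed below. Replaying the log up to a given timestamp reproduces the entity's state at that time.

```python
import datetime
from dataclasses import dataclass, field
from typing import List

@dataclass
class ChangeRecord:
    author: str                  # who made the change
    timestamp: datetime.datetime
    change: dict                 # e.g. {"op": "set", "key": "label", "value": "Person"}
    sources: List[str] = field(default_factory=list)  # external sources that led to the change

log: List[ChangeRecord] = []

def apply_change(state: dict, rec: ChangeRecord) -> dict:
    if rec.change["op"] == "set":
        state[rec.change["key"]] = rec.change["value"]
    elif rec.change["op"] == "delete":
        state.pop(rec.change["key"], None)
    return state

def state_at(when: datetime.datetime) -> dict:
    """Reproduce the entity's state at any point in time by replaying the log."""
    state: dict = {}
    for rec in sorted(log, key=lambda r: r.timestamp):
        if rec.timestamp <= when:
            apply_change(state, rec)
    return state
```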

Methodologies for the collaborative evolution of ontologies (e.g. DILIGENT; see [7]) impose further requirements on the collection of provenance data. For example, [8] (p. 83) demands that not only the change itself is logged, but also all external sources that led to the change (e.g. a change in another ontology, or an argument between users of the changed ontology). These references to external sources can be used to track why a change occurred; they enable users to understand and agree with the design and evolution decisions that were made.

Although requirements for provenance tracking in the context of collaborative ontology evolution have already been formulated (e.g. in [8]), no system currently implements exactly these requirements. Only in the database community do first prototypes exist that track provenance across system borders, i.e. that keep track of the provenance of entities originating in external sources such as other databases. Such a prototype is described in [3]. Another step in this direction is described in [4], which attempts to describe the provenance of an entity through its relationships to other entities, such as external sources or the entity itself at an earlier point in time.

References

[1] P. Buneman, S. Khanna, and W.-C. Tan. Data Provenance: Some Basic Issues. In Proceedings of Foundations of Software Technology and Theoretical Computer Science, 2000

[2] P. Buneman, S. Khanna, and W.-C. Tan. Why and Where: A Characterisation of Data Provenance. In Proceedings of the International Conference on Database Theory, 2001

[3] P. Buneman, A. Chapman, J. Cheney, and S. Vansummeren. A Provenance Model for Manually Curated Data. In Proceedings of the International Provenance and Annotation Workshop, 2006

[4] S. Miles. Electronically Querying for the Provenance of Entities. In Proceedings of the International Provenance and Annotation Workshop, 2006

[5] D. Bourilkov, V. Khandelwal, A. Kulkarni, and S. Totala. Virtual Logbooks and Collaboration in Science and Software Development. In Proceedings of the International Provenance and Annotation Workshop, 2006

[6] M. Klein, and N. Noy. A Component-Based Framework for Ontology Evolution. In Proceedings of the Workshop on Ontologies and Distributed Systems, 2003

[7] S. Pinto, S. Staab, and C. Tempich. DILIGENT: Towards a Fine-Grained Methodology for Distributed, Loosely-controlled and Evolving Engineering of Ontologies. In Proceedings of ECAI-2004, 2004

[8] Y. Sure et al. SEKT Methodology: Initial Framework and Evaluation of Guidelines, Deliverable D7.1.2, SEKT, http://www.sekt-project.org

4.9 Data Annotation

Data annotation is the task of adding metadata to text. In this context it means linking instances in the text to concepts in the ontology, and potentially also finding relations between such concepts. This is known as semantic metadata creation. Typically, instances of named entities in the text, such as "John Smith", are marked with their respective concept or entity type, which could be either a quite general concept such as 'Person' or a more specific one such as 'PhD student'. Note that such semantic (or linguistic) metadata is metadata of the resource (usually a text) being annotated, and is not the same as ontology metadata.

Semantic metadata creation forms an important part of ontology creation and management. It enables us to combine and associate existing ontologies, to perform more detailed analysis of the text, and to extract deeper and more accurate knowledge; it forms the main link between text and ontologies. It is very closely related to the task of ontology population, which concerns adding instances from the text to concepts in the ontology (see also Section 5.4, Ontology Learning).

Data annotation can be performed manually, semi-automatically, or fully automatically. The latter two methods generally involve ontology-based information extraction. This task can be broken down into the following subtasks (a schematic sketch follows the list):

Identification of entity mentions in the text, using classes from the ontology instead of the flat list of types in ‘traditional’ named entity recognition systems;

Reference disambiguation:

Adding new instances to the ontology if needed

Disambiguating with respect to instances in the ontology

Identification of instances of attributes and relations, i.e. datatype and object properties.
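As a schematic sketch of how these subtasks fit together, the following toy annotator identifies mentions via a gazetteer of ontology instance labels and then links them to known or new instances. It is an invented simplification, not the API of any of the tools described below.

```python
import re

# Toy "ontology": instance label -> class
# (in a real system this would come from the ontology itself)
instances = {"John Smith": "PhDStudent", "Sheffield": "City"}

def annotate(text: str) -> list:
    """Subtask 1: identify entity mentions, typed with ontology classes."""
    annotations = []
    for label, cls in instances.items():
        for m in re.finditer(re.escape(label), text):
            annotations.append({"mention": label, "class": cls,
                                "start": m.start(), "end": m.end()})
    return annotations

def disambiguate(annotations: list, known_instances: dict) -> list:
    """Subtask 2: link each mention to an existing instance, or mint a new one."""
    for ann in annotations:
        ann["instance"] = known_instances.get(ann["mention"]) or f"new:{ann['mention']}"
    return annotations

anns = disambiguate(annotate("John Smith moved to Sheffield."),
                    {"Sheffield": "geo:Sheffield_UK"})
print(anns)
```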


Available Models and Tools

Typically, data annotation relies on tools based on information extraction (IE).

AeroDAML [9] is an annotation tool created by Lockheed Martin, which applies IE techniques to automatically generate DAML annotations from web pages. The aim is to provide naive users with a simple tool for creating basic annotations without having to learn about ontologies, in order to reduce time and effort and to encourage people to semantically annotate their documents. AeroDAML links most proper nouns and common types of relations to classes and properties in a DAML ontology.

AeroDAML consists of the AeroText IE system together with components for DAML generation. A default ontology, which correlates directly with the linguistic knowledge base used by the extraction process, is used to translate the extraction results into a corresponding RDF model in DAML+OIL syntax. This RDF model is then serialised to produce the final DAML annotation. The AeroDAML ontology comprises two layers: a base layer corresponding to the common knowledge base of AeroText, and an upper layer based on WordNet. AeroDAML can generate annotations consisting of instances of classes, such as common nouns and proper nouns, and of properties, such as coreference, Organization-to-Location and Person-to-Organization.

Amilcare [4] is an adaptive IE system which has been integrated into several different annotation tools for the Semantic Web. It uses machine learning (ML) to adapt to new domains and applications using only a set of annotated texts (training data). It has been adapted for use in the Semantic Web simply by monitoring the kinds of annotations produced by the user during training and learning how to reproduce them. The traditional version of Amilcare adds XML annotations to documents (inline markup); the Semantic Web version (used by Melita) leaves the original text unchanged and produces the extracted information separately, as triples (standoff markup). This means that it is left to the annotation tool, not the IE system, to decide on the format of the final annotations.

In the Semantic Web version, no knowledge of IE is necessary; the user must simply define a set of annotations, which may be organised as an ontology where annotations are associated with concepts and relations. The user then manually annotates the text using an interface connected to Amilcare, as in the systems described below. Amilcare works by preprocessing the texts with GATE's IE system ANNIE [10] and then using a supervised machine learning algorithm [2] to induce rules from the training data.

KIM [12] is an extensible platform for knowledge management which offers facilities for metadata creation, storage, and semantic-based search. It also includes a set of front-ends for online use that offer semantically enhanced browsing. The information extraction in KIM is based on the GATE framework [5]. The essence of KIM's IE is the recognition of named entities with respect to the KIM ontology. Entity instances all bear unique identifiers that allow annotations to be linked both to the entity type and to the exact individual in the instance base. For new (previously unknown) entities, new identifiers are allocated and assigned, and minimal descriptions are added to the semantic repository. The annotations are kept separately from the content, and an API for their management is provided.

Melita [3] is an ontology-based tool for semantic annotation which provides a mechanism for a user to interact with an IE system (Amilcare). It consists of two main parts: an ontology viewer and a document editor. The two most interesting features of Melita are that it enables the user to tune the IE system to provide different levels of proactivity, and that it schedules texts to provide timeliness (i.e. learning with minimum delay). The annotation cycle has two phases: manual annotation (training of the system) and active annotation (where the system takes over the annotation automatically). At some point the system starts suggesting annotations to the user (active annotation), and the user can correct these as necessary. The system can mark suggested annotations as either reliable or unreliable, depending on its confidence in each annotation. Reliable annotations need to be explicitly removed by the user, while unreliable annotations need to be explicitly added.

MnM [11] is a semantic annotation tool which provides support for annotating web pages with semantic metadata. This support is semi-automatic, in that the user must provide some initial training information by manually annotating documents before the IE system (Amilcare) can take over. It integrates a web browser, an ontology editor, and tools for IE, and has been described as "an early example of next-generation ontology editors" [11], because it is web-based and provides facilities for large-scale semantic annotation of web pages.

The philosophy behind MnM is that semantic annotation of web pages can, and should, be carried out by users without specialist skills in either language technology or knowledge engineering. It therefore aims to provide a simple system to perform knowledge extraction tasks at a semi-automatic level.

The ontology population process is semi-automatic and may require intervention from the user. Firstly, it only deals with a pre-defined set of concepts in the ontology. Secondly, the system is not perfect and may miss instances in the text or allocate them wrongly. Retraining can, however, be carried out at any stage.

S-CREAM (Semi-automatic CREAtion of Metadata) [8] is a tool which provides a mechanism for automatically annotating texts, given a set of training data which must be created manually by the user. It uses a combination of two tools: Ont-O-Mat, a manual annotation tool which implements the CREAM framework for creating relational metadata, and Amilcare.

As with the other Amilcare-based tools, S-CREAM is trainable for different domains, provided that the user creates the necessary training data. It essentially works by aligning the conceptual markup (which defines relational metadata) provided by Ont-O-Mat with the semantic markup provided by Amilcare. This problem is not trivial because the two representations may be very different. Relational metadata may provide information about relationships between instances of classes, for example that a certain hotel is located in a certain city. S-CREAM thus supports metadata creation with the help of a traditional IE system, and also provides other functionalities such as a web crawler, a document management system, and a meta-ontology.

Magpie [7] is a suite of tools which supports the interpretation of webpages and "collaborative sense-making". It annotates webpages with metadata in a fully automatic fashion, requiring no manual intervention, by matching the text against instances in the ontology. It automatically populates an ontology from relevant web sources and can be used with different ontologies. The principle behind it is that an ontology is used to provide a very specific and personalised viewpoint of the webpages the user wishes to browse. This is important because different users often have different degrees of knowledge of, or familiarity with, the information presented, and have different browsing needs and objectives.

The PANKOW system (Pattern-based Annotation through Knowledge on the Web) [1] exploits surface patterns and the redundancy of the Web to automatically categorise instances from text with respect to a given ontology. The patterns are phrases of the form "the <instance> <concept>" (e.g., the Ritz hotel) and "<instance> is a <concept>" (e.g., Novotel is a hotel). The system constructs patterns by identifying all proper names in the text (using a part-of-speech tagger) and combining each of them with each of the 58 concepts of its tourism ontology into a hypothesis. Each hypothesis is then checked against the Web via Google queries, and the number of hits is used as a measure of the likelihood that the pattern is correct.
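A sketch of the PANKOW scoring idea under stated assumptions: `web_hits` is a hypothetical stand-in for the Google query step, and the two patterns are the ones exemplified above (the original system used a larger pattern set).

```python
# Hypothetical hit-count oracle; in PANKOW this was a Google query.
def web_hits(phrase: str) -> int:
    raise NotImplementedError("replace with a search engine API call")

PATTERNS = [
    "the {instance} {concept}",    # e.g. "the Ritz hotel"
    "{instance} is a {concept}",   # e.g. "Novotel is a hotel"
]

def categorise(instance: str, concepts: list) -> str:
    """Score every (instance, concept) hypothesis by summed pattern hit counts
    and return the best-supported concept."""
    def score(concept: str) -> int:
        return sum(web_hits(p.format(instance=instance, concept=concept))
                   for p in PATTERNS)
    return max(concepts, key=score)
```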

The SemTag system [6] performs large-scale semantic annotation with respect to the TAP ontology (http://tap.stanford.edu/tap/papers.html). It first performs a lookup phase, annotating all possible mentions of instances from the TAP ontology. In the second, disambiguation phase, SemTag uses a vector-space model to assign the correct ontological class, or to determine that the mention does not correspond to a class in TAP. The disambiguation is carried out by comparing the context of the current mention with the contexts of instances in TAP with compatible aliases, using a window of 10 words on either side of the mention.
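The following sketch illustrates the disambiguation idea under simplifying assumptions: contexts are plain bags of words from a 10-word window, compared by cosine similarity against stored contexts of candidate instances. SemTag's actual vector-space model (and its TAP lookup) is more elaborate.

```python
import math
from collections import Counter

def window(tokens: list, i: int, size: int = 10) -> Counter:
    """Bag of words from `size` tokens on either side of position i."""
    return Counter(tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size])

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def disambiguate(tokens, i, candidates, threshold=0.2):
    """candidates: {instance_id: stored context Counter}. Returns the best
    matching instance, or None if no candidate matches well enough."""
    ctx = window(tokens, i)
    best = max(candidates, key=lambda c: cosine(ctx, candidates[c]), default=None)
    if best is not None and cosine(ctx, candidates[best]) >= threshold:
        return best
    return None
```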

The SemTag system is based on a high-performance parallel architecture, Seeker, in which each node annotates about 200 documents per second. The demand for such parallelism comes from the large volumes of data that need to be processed in many applications, which make automatic semantic annotation the only feasible option. A parallel architecture of a similar kind is currently under development for KIM; in general, such architectures are an important ingredient of large-scale automatic annotation approaches.


References

[1] P. Cimiano, S. Handschuh, and S. Staab. Towards the Self-Annotating Web. In Proceedings of WWW’04, 2004.

[2] F. Ciravegna. Adaptive information extraction from text by rule induction and generalisation. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), 2001.

[3] F. Ciravegna, A. Dingli, D. Petrelli, and Y. Wilks. User-System Cooperation in Document Annotation Based on Information Extraction. In 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), pages 122–137, Siguenza, Spain, 2002.

[4] F. Ciravegna and Y. Wilks. Designing Adaptive Information Extraction for the Semantic Web in Amilcare. In S. Handschuh and S. Staab, editors, Annotation for the Semantic Web. IOS Press, Amsterdam, 2003.


[5] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL’02), 2002.

[6] S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha, A. Jhingran, T. Kanungo, S. Rajagopalan, A. Tomkins, J. A. Tomlin, and J. Y. Zien. SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation. In Proceedings of WWW’03, 2003.

[7] J. Domingue, M. Dzbor, and E. Motta. Magpie: Supporting Browsing and Navigation on the Semantic Web. In N. Nunes and C. Rich, editors, Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), pages 191–197, 2004.

[8] S. Handschuh, S. Staab, and F. Ciravegna. S-CREAM — Semi-automatic CREAtion of Metadata. In 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), pages 358–372, Siguenza, Spain, 2002.

[9] P. Kogut and W. Holmes. AeroDAML: Applying Information Extraction to Generate DAML Annotations from Web Pages. In First International Conference on Knowledge Capture (K-CAP 2001), Workshop on Knowledge Markup and Semantic Annotation, Victoria, B.C., 2001.

[10] D. Maynard, V. Tablan, H. Cunningham, C. Ursu, H. Saggion, K. Bontcheva, and Y. Wilks. Architectural Elements of Language Engineering Robustness. Journal of Natural Language Engineering – Special Issue on Robust Methods in Analysis of Natural Language Data, 8(2/3):257–274, 2002. http://gate.ac.uk/sale/robust/robust.pdf.

[11] E. Motta, M. Vargas-Vera, J. Domingue, M. Lanzoni, A. Stutt, and F. Ciravegna. MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup. In 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW02), pages 379–391, Siguenza, Spain, 2002.

[12] B. Popov, A. Kiryakov, A. Kirilov, D. Manov, D. Ognyanoff, and M. Goranov. KIM – Semantic Annotation Platform. Natural Language Engineering, 2004.

4.10 Social Network Analysis

Social network analysis, or link analysis, in NeOn enables the analysis of links (information flows) between ontology engineers working on an ontology. In general, social network analysis deals with mapping and measuring relationships and flows between the nodes of a social network, which may be people, organizations, computers or other information-processing agents. Its methods provide for both visual and mathematical analysis of the relationships between these agents.


Available Models and Tools

One way to understand a network and its participants is to evaluate the location of actors within it. Measuring network location means finding the centrality of a node; such measures help determine the importance, or prominence, of a node in the network. Several measures are commonly used in social network analysis, including degree centrality, closeness centrality, betweenness centrality, centralization, clustering coefficient, cohesion and reach. The main measures are summarized below, followed by a short computational sketch.

Degree Centrality - A node is central in a network if it is sufficiently active, in the sense that it has many links to other nodes.

Closeness Centrality - The nodes that are most central according to closeness centrality can quickly interact with all others because they are, on average, close to all other nodes.

Betweenness Centrality - A node is central if it lies on many of the shortest paths between other pairs of nodes.

Network Centralization - The computed node centralities in a network can have large or small variance. A network in which a small number of nodes have much higher centrality than the others is highly centralized.

Network Reach - The degree to which any member of a network can reach the other members of the network.

Network clustering coefficient - The clustering coefficient is a measure of the likelihood that two associates of a node are associates themselves. A higher clustering coefficient indicates a greater 'cliquishness'.

Network cohesion - Refers to the degree to which actors are connected directly to each other by cohesive bonds. Groups are identified as ‘cliques’ if every actor is directly tied to every other actor, or ‘social circles’ if there is less stringency of direct contact.

Modularity - Modularity is a property of a network together with a specific proposed division of that network into communities. It measures whether the division is a good one, in the sense that there are many edges within communities and only a few between them.
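As a computational illustration, most of these measures are available directly in the networkx library; the toy collaboration graph below is invented.

```python
import networkx as nx
from networkx.algorithms import community

# Toy collaboration network between ontology engineers
G = nx.Graph([("ana", "bob"), ("bob", "carl"), ("carl", "ana"),
              ("carl", "dina"), ("dina", "eva"), ("eva", "frank")])

print(nx.degree_centrality(G))       # activity: share of possible links a node has
print(nx.closeness_centrality(G))    # average nearness to all other nodes
print(nx.betweenness_centrality(G))  # share of shortest paths through each node
print(nx.average_clustering(G))      # clustering coefficient / 'cliquishness'

# Modularity of a proposed division into two communities
parts = [{"ana", "bob", "carl"}, {"dina", "eva", "frank"}]
print(community.modularity(G, parts))
```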


Algorithms for detecting community structure in networks

Algorithms based on divisive methods

Girvan-Newman algorithm [5] - The algorithm progressively removes the edges with the highest edge betweenness from the original graph. If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must pass along one of these few edges. The edges connecting communities therefore have high edge betweenness; by removing them, we separate the groups from one another and reveal the underlying community structure of the graph.
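networkx ships an implementation of this algorithm; a minimal usage sketch on a synthetic graph with two obvious groups:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.barbell_graph(5, 1)  # two 5-cliques joined through one bridge node
# girvan_newman yields successively finer partitions as high-betweenness
# edges are removed; the first partition is the coarsest split.
first_split = next(girvan_newman(G))
print([sorted(c) for c in first_split])
```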

The algorithm of Radicchi et al. [6] - The algorithm removes edges that belong to a relatively low number of loops, since such edges are likely to lie between communities.

Algorithms based on agglomerative hierarchical clustering

Modularity optimization algorithm [4] - Starting from a state in which each vertex is the sole member of one of n communities, communities are repeatedly joined in pairs, choosing at each step the join that results in the greatest increase (or smallest decrease) in modularity. This method can be applied to very large networks.
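This agglomerative scheme is implemented in networkx (following [4]) as greedy_modularity_communities; a minimal sketch:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()  # classic benchmark network bundled with networkx
communities = greedy_modularity_communities(G)  # pairwise joins maximizing modularity gain
for i, c in enumerate(communities):
    print(i, sorted(c))
```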

Single linkage methods - The idea behind this technique is to define a measure of similarity between pairs of vertices based on the given network structure; many such similarity measures are possible. Starting from an empty network of n vertices and no edges, edges are then added between pairs of vertices in order of decreasing similarity, beginning with the most similar pair. Structural equivalence is an example of such a similarity measure: two vertices are said to be structurally equivalent if they have the same set of neighbors.

Other methods

Spectral bisection algorithm [7] - A method based on the eigendecomposition of the graph's Laplacian matrix. The eigenvector corresponding to the second smallest eigenvalue (the Fiedler vector) determines a partition of the nodes into two communities.
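A minimal sketch of spectral bisection with numpy, assuming an undirected graph; the partition follows the signs of the Fiedler vector.

```python
import networkx as nx
import numpy as np

G = nx.barbell_graph(5, 0)  # two cliques sharing a single bridge edge
L = nx.laplacian_matrix(G).toarray().astype(float)
eigvals, eigvecs = np.linalg.eigh(L)  # eigh: L is symmetric; eigenvalues ascending
fiedler = eigvecs[:, 1]               # eigenvector of the 2nd smallest eigenvalue
part_a = [n for n, v in zip(G.nodes(), fiedler) if v < 0]
part_b = [n for n, v in zip(G.nodes(), fiedler) if v >= 0]
print(part_a, part_b)
```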

For NeOn it is important to consider approaches that enable combining structure analysis with content analysis [2]. FAO has agreed to provide data on interactions between parties with a common goal, which NeOn partners will be able to use for social network analysis.


References

[1] Soumen Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, 2002

[2] Matt Richardson, Pedro Domingos, Combining Link and Content Information in Web Search, In M. Levene and A. Poulovassilis (eds.), Web Dynamics (pp. 179-193), 2004. New York: Springer.

[3] Stanley Wasserman, Katherine Faust, Social Network Analysis: Methods and Applications, United Kingdom: Cambridge University Press, 1994

[4] Aaron Clauset, M. E. J. Newman, and Cristopher Moore: Finding community structure in very large networks, Phys. Rev. E 70, 066111 (2004)

[5] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826, 2002

[6] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. Preprint cond-mat/0309488, 2003

[7] A. Pothen, H. Simon, and K.-P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11, 430-452, 1990

4.11 Lexical domains

Lexical domains provide a natural way to establish semantic relations among word senses, which can be profitably exploited in computational linguistics. Domains are fields of knowledge, expertise, interest or human discussion, such as POLITICS, ECONOMY or SPORT, which exhibit their own terminology and lexical coherence. Domains establish a semantic relation between word senses by grouping them into the same semantic domain (Sport, Medicine, etc.). The word bank, for example, has ten senses in WordNet 2.0, but three of them, 'bank#1', 'bank#3' and 'bank#6', are grouped into the same domain label, Economy, whereas 'bank#2' and 'bank#7' are assigned the domain labels Geography and Geology.
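To illustrate, the sketch below lists the WordNet senses of 'bank' with NLTK and groups them by domain label. WordNet Domains itself is a separate resource not bundled with NLTK, so the small domain map here is a hand-made stand-in, and the mapping of sense numbers to synsets is purely illustrative.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# Hand-made stand-in for WordNet Domains labels (illustrative only)
DOMAIN = {"bank.n.01": "Economy", "bank.n.03": "Economy", "bank.n.06": "Economy",
          "bank.n.02": "Geography", "bank.n.07": "Geology"}

clusters = {}
for s in wn.synsets("bank", pos=wn.NOUN):
    domain = DOMAIN.get(s.name(), "Other")
    clusters.setdefault(domain, []).append(s.name())
print(clusters)  # senses of 'bank' grouped by domain label
```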

Domain information has been exploited in both computational linguistics and lexicography for sense clustering and word sense disambiguation, because it provides an orthogonal type of semantic information and enables concept selection from lexical resources with associated domain information, such as WordNet. This may take the following forms:


a) Word sense clustering: grouping those senses of a polysemous word that share the same domain;


b) Word sense disambiguation: the selection of a word sense on the basis of domain-related vocabulary. For each word to be disambiguated, the domain of its context is estimated; this domain is then compared with the domain of each word sense, and the most similar one is selected [4]. A sketch of this procedure follows.
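A minimal sketch of this disambiguation step, under the assumption that both context words and word senses carry domain labels; the mini-lexicon below is invented for illustration.

```python
from collections import Counter

# Invented mini-lexicon: word -> domain, and sense -> domain
WORD_DOMAIN = {"loan": "Economy", "interest": "Economy", "river": "Geography"}
SENSE_DOMAIN = {"bank#1": "Economy", "bank#2": "Geography"}

def disambiguate(target_senses: dict, context: list) -> str:
    """Estimate the context's domain by majority vote over its words,
    then pick the sense whose domain matches."""
    votes = Counter(WORD_DOMAIN[w] for w in context if w in WORD_DOMAIN)
    context_domain = votes.most_common(1)[0][0] if votes else None
    for sense, domain in target_senses.items():
        if domain == context_domain:
            return sense
    return next(iter(target_senses))  # fallback: first sense

print(disambiguate(SENSE_DOMAIN, ["the", "loan", "interest", "rate"]))  # bank#1
```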


For ontological purposes, lexical domains can be applied to a range of operations. Once re-engineered, they enable concept selection by means of domain filtering and linking to other domain-specific ontologies.


Available Models and Tools


Available material comes in the form of lexical resources, i.e. word senses associated with a set of domains. Several multilingual lexical resources with domain information exist. For example, domains are hierarchically organized in the Longman Dictionary of Contemporary English (LDOCE) [7], as well as in WordNet Domains, a separate resource connected to WordNet [5], [2], [3], [1] (see http://tcc.itc.it/research/textec/topics/disambiguation/hierarchy.html) and to its multilingual counterpart EuroWordNet [6]. The Van Dale English-Dutch dictionary (http://www.vandale.nl/producten/666860297) uses a flat list of domains.


References

[1] C. Strapparava and A. Valitutti (2004), WordNet-Affect: an affective extension of WordNet. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, May 2004, pp. 1083-1086.

[2] B. Magnini and G. Cavaglià (2000), Integrating Subject Field Codes into WordNet, In Gavrilidou M., Crayannis G., Markantonatu S., Piperidis S. and Stainhaouer G. (Eds.) Proceedings of LREC-2000, Second International Conference on Language Resources and Evaluation, Athens, Greece, 31 May - 2 June, 2000, pp. 1413-1418.

[3] L. Bentivogli, P. Forner, B. Magnini and E. Pianta (2004), Revising WordNet Domains Hierarchy: Semantics, Coverage, and Balancing, In Proceedings of COLING 2004 Workshop on "Multilingual Linguistic Resources", Geneva, Switzerland, August 28, 2004, pp. 101-108.

[4] A. Montoyo and M. Palomar (2001), Specification Marks for Word Sense Disambiguation: New Development. In A. F. Gelbukh (Ed.), CICLing, volume 2004 of Lecture Notes in Computer Science, pp. 182-191. Springer.

[5] C. Fellbaum (1998), WordNet: An Electronic Lexical Database, MIT Press

[6] P. Vossen (1998), EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers.

[7] P. Procter (1978), Longman Dictionary of Contemporary English. Longman Group Limited, Harlow and London, England.

4.12 Ontology localization

Within the NeOn architecture, multilinguality occurs in different places in the information structure. The following multilingual aspects need to be covered:

a. the language of ontology labels for concepts, relations etc.

b. the language of metadata elements, e.g. textual descriptions of ontology content.

c. the language of the documents that have been annotated with ontology labels


For the purpose of ontology localization, i.e. the translation of ontology labels into different languages, translation equivalents are normally selected manually, which can mean a substantial amount of work for the ontology developers. Automating this process will assist developers by suggesting one or more candidate translations for an ontology label, selected on the basis of different resources and criteria.


Available Models and Tools

For the characterization of the language properties of labels, metadata and documents, we want to adhere as much as possible to existing standards. Language codes can be chosen from a standard list such as ISO 639-2, maintained by the Library of Congress.

This list and others have been operationalized by resource metadata description initiatives such as the ISLE MetaData Initiative and the Open Language Archives Community.

The Ontology Metadata Vocabulary (OMV) contains information types for describing ontologies at the resource level.

The natural language covered by the labels of a monolingual ontology (aspect (a) above) is captured by the OMV attribute “NaturalLanguage” as part of the descriptive metadata. This is useful for describing stand-alone monolingual ontologies that may or may not have been mapped onto other ontologies.

Mappings implicitly refer to this attribute when pointing to the source and target ontologies.

The language of metadata and documents (aspects (b) and (c)) is not captured by OMV.

If an ontology contains multilingual labels, as in the case of the AGROVOC thesaurus, the OMV attribute “NaturalLanguage” should either be “multilingual” or list the language codes covered by the resource. The latter option is useful for locating relevant ontologies in a required language.

In order to specify multilinguality at the ontology element level, we will then need two object properties, encoding the language-specific orthography and the language code. A sketch of language-tagged labels is given below.
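The following is one possible realization of element-level multilinguality, sketched with rdflib and language-tagged RDFS labels; the namespace and labels are invented.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/onto#")  # invented namespace
g = Graph()

# Each label carries its orthography plus a language code (ISO 639)
for lang, label in [("en", "river bank"), ("es", "ribera"), ("nl", "oever")]:
    g.add((EX.RiverBank, RDFS.label, Literal(label, lang=lang)))

# Retrieve the label in a required language
spanish = [o for o in g.objects(EX.RiverBank, RDFS.label) if o.language == "es"]
print(spanish)  # [rdflib.term.Literal('ribera', lang='es')]
```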

Another necessary task is to support the (semi-)automatic translation of ontology labels, i.e. the language-dependent names of concepts.

For this purpose, we propose to further develop the LabelTranslator tool [1], which has been developed by, among others, UPM. This tool applies a number of strategies to supply the user with a proper translation of an ontological label. The tool will also be further developed and used in another project, MUSING, with which we will seek collaboration.

Other available tools are:

electronic bi-/multilingual dictionaries such as OneLook

on-line translation: e.g. Altavista's Babelfish

multilingual thesauri: EuroWordNet.


References

[1] T. Declerck and O. Vela. LabelTranslator: Multilingualism in Ontologies. In Proceedings of the 4th International Semantic Web Conference, 2005.

4.13 Multilingual ontology integration

Multilingual ontology integration will be performed in tandem with the lexical domains and ontology localization tasks.

This task concerns the alignment of ontologies with concept labels in different languages.

When the labels of a particular ontology have been successfully translated into another language, these translations can be compared with the labels of ontologies that already exist in the target language. If a mapping can be established on the basis of the translations, the ontologies in the different languages can be integrated; a sketch of this matching step closes the section.

The first objective of multilingual ontology integration is to research the automation of the alignment of multilingual ontologies on the basis of label translation.

The second objective of multilingual ontology integration is to research the potential of lexical resources for assisting ontology alignment. After re-engineering into SKOS or an ontological format, the usability of the lexical material provided by these resources for linking to ontologies will be evaluated. For instance, propagation through (Euro)WordNet may provide synonym, hypernym or hyponym information that enables linking where direct translation methods fail.
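A sketch of the matching step referred to above, assuming label translations and synonym lookups are already available; `translate` and `synonyms` are hypothetical stand-ins for the localization tools and (Euro)WordNet lookups discussed earlier.

```python
def translate(label: str, target_lang: str) -> list:
    raise NotImplementedError("stand-in for LabelTranslator or a dictionary lookup")

def synonyms(label: str, lang: str) -> set:
    raise NotImplementedError("stand-in for a (Euro)WordNet synset lookup")

def align(source_labels: dict, target_labels: dict, target_lang: str) -> dict:
    """source_labels/target_labels: concept id -> label. Returns a mapping of
    source concepts to target concepts whose labels match a translation or
    one of its synonyms."""
    mapping = {}
    index = {lbl.lower(): cid for cid, lbl in target_labels.items()}
    for cid, label in source_labels.items():
        for cand in translate(label, target_lang):
            # Try the direct translation first, then its synonyms
            hits = {cand.lower()} | {s.lower() for s in synonyms(cand, target_lang)}
            match = next((index[h] for h in hits if h in index), None)
            if match:
                mapping[cid] = match
                break
    return mapping
```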
