BAYESIAN SEMANTICS FOR THE SEMANTIC WEB

by

Paulo Cesar G. da Costa

A Dissertation
Submitted to the
Graduate Faculty
of
George Mason University
in Partial Fulfillment of
The Requirements for the Degree
of
Doctor of Philosophy
Information Technology

Committee:

_____________________________________________ Director

_____________________________________________

_____________________________________________

_____________________________________________

_____________________________________________

_____________________________________________ Department Chairperson

_____________________________________________ Program Director

_____________________________________________ Dean, School of Information
Technology and Engineering


Date:_________________________________________ Summer Semester 2005
George Mason University
Fairfax, VA

Bayesian Semantics for the Semantic Web

A Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University

By

Paulo Cesar G. da Costa
Master of Science
George Mason University, 1999

Director: Kathryn B. Laskey, Professor
Department of Systems Engineering and Operations Research

Summer Semester 2005
George Mason University
Fairfax, VA

Copyright 2005 Paulo Cesar G. da Costa

All Rights Reserved

Dedication

To Claudia, whose love and support made everything possible.

Acknowledgements

Mark Twain (1835–1910) once said: “Only presidents, editors and people with tapeworms have the right to use the editorial ‘we’”. However, this Dissertation can hardly be considered the result of a single person’s work. Thus, although I do not satisfy any of Mark Twain’s criteria, I will use the editorial ‘we’ throughout this work as a means to acknowledge the many contributors who helped me along the way in my research. In this section I recognize a few of those contributions that deserve special emphasis.

First and foremost, I would like to thank Kathryn Laskey for wearing so many hats over the last two years. When I asked her to be my advisor I already knew she was an outstanding professor and gifted researcher. However, as we worked together I began to realize how incredibly fortunate I was to have also met a mentor, co-author, and friend. She not only guided me through the many areas of knowledge I had to visit, but also showed a sensibility that few possess in pushing an advisee to achieve far beyond what he thought he could ever do. Thank you so much for your splendid support and guidance!

I was also fortunate enough to have been blessed with one of the finest committees any PhD candidate could hope to have. Following the random order in which they signed this Dissertation, I had the privilege to have Larry Kerschberg’s impressive background and always-thoughtful insights as invaluable assets I exploited many times during my research. I could also rely on tremendously rewarding guidance from Ken Laskey, whose clear thinking and deep knowledge of the W3C, Semantic Web and Data Interchange cannot be found in any book. Even if such a book should exist some day, it will probably not match Ken’s ability to make even the hardest problems look embarrassingly simple. Nonetheless, when the hardest problems came from the data mining area I was always at ease, for I could count on Daniel Barbará’s extensive knowledge and experience in this field. These are dwarfed only by his talent for integrating ideas while presenting the big picture within which the real problems are best understood. The last committee member to sign the page was Dave Schum, who also signed my MS Thesis in 1999 in the same position. Thus, I was fortunate enough to have twice the opportunity to learn from such a brilliant researcher and larger-than-life character.

During my PhD studies I also had the chance to interact with top-notch educators who had great influence on my research. One is Tomasz Arciszewski, whose teachings in inventive engineering played a key role in my own approach to difficult challenges. I am also indebted to Carlotta Domeniconi, who took me from ground zero in data mining to a level at which I could understand how its algorithms can help to solve many problems in the Semantic Web. Finally, Dennis Buede, my advisor from the MS years, introduced me to the field of decision-making and was a major collaborator on the Wise Pilot system depicted in Appendix C. I also wish to thank Eswar Sivaraman for constructive criticism that helped me to improve my presentation skills.

I would also like to thank Tod Levitt for his contribution as a reviewer of this dissertation and some of the papers that resulted from my research. My gratitude is two-fold, since as CEO of IET he permitted me to use their flagship product, Quiddity*Suite, and to interact with the amazingly competent team he leads. Among that team I must personally thank Ed Wright, who made my learning path to Quiddity*Suite much easier with his always prompt and clear advice; Mike Pool, whose expertise in OWL and excitement about the possibilities of probabilistic reasoning were both helpful and enlightening; and Masami Takikawa and Francis Fung, whose technical excellence and creativity made them outstanding co-authors.

The GMU Bayesian Group was an incredible source of enlightening discussions and a forum that helped me not only to improve my research but also to enjoy and understand the value of sharing knowledge among different research interests and areas. Among that group, special thanks go to Chris Cuppan, Aleks Lazarevich, Sepideh Mirza, Andy Powell and Mehul Revankar for reading and commenting on this work and the related papers. Ghazi Alghamdi, Stephen Cannon, Sajjad Haider, Jim Jones, Tom Shackelford, and Ning Xu also have my gratitude for the help, support, and friendship they provided throughout the research period.

None of my research would have been possible without the support from the Brazilian Air Force, whose commander, General Luiz Carlos da Silva Bueno, assigned me to such a challenging endeavor. I must also recognize the support I have received from Lieutenant Generals Adenir Viana and Cleonilson Nicássio; from Colonel Milton Casimiro; and from my dear colleagues, Lt. Cols. Carlos Liberato and Tomaz Gustavo, who certainly had to face a greater workload in my absence. I am also grateful for the remarkable support I received from the Brazilian Air and Defense Attaché in Washington, D.C., Major General Sérgio Freitas, from his Adjunct, Col. Lima de Andrade, and from Master Sergeant João Luiz, Alzira Welle and Elenice Gaspar.

My family was a strong point of support in those years. I thank my dear brother Ricardo Costa for his many visits to our house, which reminded us of the wonderful relatives we have in Brazil. I also thank my sister-in-law, Lívia Costa, for visiting us, and my brothers-in-law, Marclei Neves and Cleber Júnior, for their steady support.

To my parents, Quintino and Vitoria Costa, and to my parents-in-law, Cleber and Margo Neves, I must apologize for having stolen their grandchildren for two very long years. I am sure they more than anyone have been eagerly awaiting my graduation day. I hope they forgive me for demanding such a sacrifice.

Indeed, Paulo and Laura Costa are such an awesome pair that I am sure anyone would miss them terribly. Paulo, a terrific son who keeps reminding me how I must have been when I was 14, never complained about having an absent-minded father at an age at which a boy needs a participative one. Laura, whose energy is only matched by her sweetness, constantly used the credentials of a 7-year-old to rescue her dad from the books without making him angry. Thank you both for filling our lives with joy.

Finally, my deepest gratitude goes to Claudia, my wife, best friend, and lifetime partner. During our more than seventeen years together she has never ceased to impress me with her tenacity, intelligence, and wisdom. However, during those two very busy years she not only remained the central hub of our family, as she has always been, but was also the one who comforted, helped, and inspired all of us during the hardest times with her endless love and devotion. To her I dedicate this Dissertation.

Table of Contents

Page

Abstract xv

Chapter 1 A Deterministic Model of a Probabilistic World 17

1.1 From Information to Knowledge 19

1.1.1 Is Semantic Information Really Important? 19

1.1.2 The Semantic Web and ontologies 23

1.2 Issues on Representing and Reasoning Using Ontologies 26

1.3 Why Uncertainty Matters 30

1.4 Research Contributions and Structure of this Dissertation 33

Chapter 2 Background and Related Research 36

2.1 Web Languages 36

2.2 A Brief Introduction to Probabilistic Representations 38

2.3 Bayesian Networks 42

2.3.1 Probabilistic Reasoning with Bayesian Networks 46

2.3.2 Case Study: The Star Trek Scenario 46

2.4 Probabilistic Extensions to Web Languages 51

2.4.1 Probabilistic Extensions to Description Logic 51

2.4.2 Probabilistic Extensions to OWL 53

2.5 Probabilistic Languages with Near First-Order Expressive Power 56

Chapter 3 Multi-Entity Bayesian Networks 59

3.1 A More “Realistic” Sci-fi Scenario 60

3.2 The Basics of MFrags 62

3.3 Representing Recursion in MEBN Logic 68

3.4 Building MEBN Models with MTheories 73

3.5 Making Decisions with Multi-Entity Decision Graphs 79

3.6 Inference in MEBN Logic 82

3.7 Learning from Data 86

3.8 MEBN Semantics 93

Chapter 4 The Path to Probabilistic Ontologies 99

4.1 A Polymorphic Extension to MEBN 105

4.1.1 The Modified MTheory Definition 108

4.1.2 The Star Trek MTheory Revisited 114

4.2 Using Quiddity*Suite for Building SSBNs 115

4.2.1 Concepts with Direct Translation 117

4.2.2 Concepts with a More Complex Translation 122

4.2.3 Use of Comments and Other Aspects of Quiddity*Suite 128

Chapter 5 PR-OWL 130

5.1 The Overall Implementation Strategy 131

5.1.1 Why MEBN as the Semantic Basis for PR-OWL? 133

5.1.2 Implementation Approach 136

5.2 An Upper Ontology for Probabilistic Systems 140

5.2.1 Creating an MFrag 150

5.2.2 Representing a Probability Distribution 158

5.3 A Proposed Operational Concept for Implementing PR-OWL 162

Chapter 6 Conclusion and Future Work 166

6.1 Summary of Contributions 166

6.2 A Long Road with Bright Signs Ahead 168

Bibliography 171

Appendix A Source Code for The Starship Model 190

Appendix B Preliminary Syntax and Semantics for PR-OWL 221

B.1 PR-OWL Classes 221

B.1.1 Alphabetical List of All PR-OWL Classes 221

B.1.2 Detailed Explanation of PR-OWL Classes 222

B.2 PR-OWL Properties 236

B.2.1 Alphabetical List of All PR-OWL Properties 236

B.2.2 Detailed Explanation of PR-OWL Properties 238

B.3 Naming Convention (optional) 250

B.4 PR-OWL Upper-Ontology Code 253

Appendix C Potential Applications for PR-OWL Outside the Semantic Web 295

C.1 PR-OWL for Integration Ontologies: The DTB Project 295

C.2 PR-OWL for Multi-Sensor Data Fusion: The Wise Pilot System 300


List of Tables

Table Page

Table 1. Conditional Probability Table for Node MDR 48

Table 2. Sample Parts of the Danger To Self MFrag Probability Distribution 67

Table 3. MEBN Elements Directly Translated into Quiddity*Suite 122

Table 4. Metadata Annotation Fields 128

Table 5. Zone_MFrag Nodes in MEBN and PR-OWL 155

Table 6. Classes Used in PR-OWL 221

Table 7. Properties Used in PR-OWL 236


List of Figures

Figure Page

Figure 1. Simplified Text Understanding 20

Figure 2. Simplified Text Understanding after Data Preparation 21

Figure 3. Law of Total Probability 42

Figure 4. Sample Relationships Among Three Random Variables 44

Figure 5. The Naïve Star Trek Bayesian Network 48

Figure 6. The BN to the Four-Starship Case 49

Figure 7. The BN for One-Starship Case with Recursion 50

Figure 8. The Danger To Self MFrag 63

Figure 9. An Instance of the Danger To Self MFrag 66

Figure 10. The Zone MFrag 69

Figure 11. SSBN Constructed from Zone MFrag 70

Figure 12. The Star Trek Generative MTheory 75

Figure 13. Equivalent MFrag Representations of Knowledge 77

Figure 14. The Star Trek Decision MFrag 80

Figure 15. SSBN for the Star Trek MTheory with Four Starships within Range 85

Figure 16. Parameter Learning in MEBN 87

Figure 17. Structure Learning in MEBN 90

Figure 18. SSBNs for the Parameter Learning Example 92

Figure 19. Typical Web Agent’s Knowledge Flow – Ignoring Uncertainty 101

Figure 20. Typical Web Agent’s Knowledge Flow – Computing Uncertainty 103

Figure 21. Star Trek MTheory with the Transporter MFrag – Untyped Version 107

Figure 22. Built-in MFrags for Typed MEBN 109

Figure 23. Star Trek MTheory with the Transporter MFrag – Typed Version 114

Figure 24. Entity Clusters of Star Trek MTheory 119

Figure 25. Mapping the Sensor Report Entity Cluster to a Frame 121

Figure 26. Zone Entity Cluster 124

Figure 27. Overview of a PR-OWL MTheory Concepts 147

Figure 28. Elements of a PR-OWL Probabilistic Ontology 148

Figure 29. Header of the Starship Probabilistic Ontology 150

Figure 30. Initial Starship Screen with Object Properties Defined 151

Figure 31. Zone MFrag Represented in PR-OWL 153

Figure 32. ZoneMD Resident Node 157

Figure 33. Declarative Distributions in PR-OWL 159

Figure 34. A Probabilistic Assignment in a PR-OWL Table 160

Figure 35. Snapshot of a Graphical PR-OWL Plugin 163

Figure 36. The Insider Behavior Ontology (IB) 297

Figure 37. The Organization and Task Ontology (OT) 297

Figure 38. The Insider Threat Detection Process – Initial Setup 298

Figure 39. The Insider Threat Detection Process – Data Interchange 299

Figure 40. The Insider Threat Detection Process – Desired Process 299

Figure 41. General Track Danger Assessment Scheme 302

Figure 42. Individual Track's BN Information Exchange Scheme 303

Figure 43. Wise Pilot system – general scheme 304

Figure 44. Wise Pilot with 4 Tracks 305

Figure 45. Wise Pilot with 5 Tracks 306



List of Abbreviations

AAA – Anti-Aircraft Artillery

A-Box – Assertional Box (DL knowledge base assertional component)

AL – Attributive Languages (family of description logic languages)

ALC – Basic AL with the concept of negation added (C means complement)

API – Application Programming Interface

ARDA – Advanced Research and Development Activity (www.ic-arda.org)

BN – Bayesian Network

CEO – Chief Executive Officer

CPT – Conditional Probability Table

DAG – Directed Acyclic Graph

DAML – DARPA Agent Markup Language

DARPA – US Defense Advanced Research Projects Agency

DL – Description Logics

DTB – Detection of Threat Behavior

FOL – First-Order Logic

FOPC – First Order Predicate Calculus

GML – Generalized Markup Language

GMU – George Mason University (www.gmu.edu)

HTML – Hypertext Markup Language

IBN – Insider Threat Behavioral Network

IET – Information Extraction and Transport, Inc. (www.iet.com)

IO – Integration Ontology

ISO – International Organization for Standardization

MEBN – Multi-Entity Bayesian Networks

MFrag – MEBN Fragment

MTheory – MEBN Theory

OIL – Ontology Inference Layer

OOBN – Object-Oriented Bayesian Networks

OWL – Web Ontology Language

PR-OWL – Probabilistic OWL

RDF – Resource Description Framework

RDFS – RDF-Schema

RV – Random Variables

SGML – Standard Generalized Markup Language

SHOE – Simple HTML Ontology Extensions

SSBN – Situation Specific Bayesian Network

SW – Semantic Web

T-Box – Terminological Box (DL knowledge base terminology component)

W3C – World Wide Web Consortium

WSMO – Web Service Modeling Ontology

WWW – World Wide Web

XML – Extensible Markup Language


Abstract

BAYESIAN SEMANTICS FOR THE SEMANTIC WEB

Paulo Cesar G. da Costa, Ph.D. Student

George Mason University, 2005

Dissertation Director: Dr. Kathryn B. Laskey

Uncertainty is ubiquitous. Any representation scheme intended to model real-world actions and processes must be able to cope with the effects of uncertain phenomena.

A major shortcoming of existing Semantic Web technologies is their inability to represent and reason about uncertainty in a sound and principled manner. This not only hinders the realization of the original vision for the Semantic Web (Berners-Lee & Fischetti, 2000), but also raises an unnecessary barrier to the development of new, powerful features for general knowledge applications.

The overall goal of our research is to establish a Bayesian framework for probabilistic ontologies, providing a basis for plausible reasoning services in the Semantic Web. As an initial effort towards this broad objective, this dissertation introduces a probabilistic extension to the Web ontology language OWL, thereby creating a crucial enabling technology for the development of probabilistic ontologies.

The extended language, PR-OWL (pronounced as “prowl”), adds new definitions to current OWL while retaining backward compatibility with its base language. Thus, OWL-built legacy ontologies will be able to interoperate with newly developed probabilistic ontologies. PR-OWL moves beyond deterministic classical logic (Frege, 1879; Peirce, 1885), having its formal semantics based on MEBN probabilistic logic (Laskey, 2005).

By providing a means of modeling uncertainty in ontologies, PR-OWL will serve as a supporting tool for many applications that can benefit from probabilistic inference within an ontology language, thus representing an important step toward the World Wide Web Consortium’s (W3C) vision for the Semantic Web.

In addition, PR-OWL will be suitable for a broad range of applications, including improvements to current ontology solutions (e.g. by providing proper support for modeling uncertain phenomena) and much-improved versions of the probabilistic expert systems currently in use in a variety of domains (e.g. medical, intelligence, and military).


Chapter 1 A Deterministic Model of a Probabilistic World

We can trace attempts by humans to represent the world surrounding them to as early as 31,000 years ago, during the so-called Upper Paleolithic period, when the earliest recorded cave drawings were made (Clottes et al., 1995). Moving from pictures representing objects of the real world (i.e. ideograms) to pictures representing the sounds we pronounce (i.e. phonograms), humans developed the first alphabets somewhere near the twentieth century B.C. The efficiency of written communication received a dramatic boost with the invention of the printing press by Johannes Gutenberg around 1450.

Printing had been the dominant form for representing and communicating human knowledge until the second half of the last century, when the advent of digital computing became the driving force of what Alvin Toffler (1980) called “the Third Wave” of change in human history (the first being the agricultural revolution and the second being the industrial revolution).

At this point, inquisitive readers might ask why Toffler’s terminology was chosen over the more technically oriented and widely used term “information technology revolution”.

The answer lies in the fact that we want a broader concept for the current era of change, one that lets us clearly distinguish the “information revolution”, which we consider an almost concluded phenomenon, from the “knowledge revolution”, the subsequent phase of the “Third Wave” that we are experiencing today.

Until the past few years, computers had been used primarily as media for storing, exchanging, and working with information. The Internet (the network infrastructure) and the World Wide Web (the information space) have played an important role as facilitators in this process. Yet, as the availability of information resources increases, we are starting to face a significant bottleneck in our ability to use it: our own capacity to process huge amounts of data.

Indeed, our cognitive process includes one extra step between receiving data and deciding and/or acting upon it: the need to update our beliefs about the subject(s) of interest given the new information available to us or, in other words, to understand what the incoming data means for our decisions and actions.

In short, data per se is useless for most of our daily tasks until we transform it into knowledge. When we reach our cognitive limit for performing this task, we experience what is called “information overload”.

During the “information revolution”, human beings have largely performed the transformation from data to decision-relevant knowledge, working in a data-centric scheme that we call the “information paradigm”. The “knowledge revolution” will be seen in the future as the phase in which this tedious task was successfully assigned to computers, allowing humans to shift their focus from data-centric activities to knowledge-centric activities, thus allowing them to work under the more efficient “knowledge paradigm”.

1.1 From Information to Knowledge

The rapid expansion of corporate computer networks and the World Wide Web (WWW) is aggravating the problem of information overload. In this race between the availability of data and our capacity to transform it into knowledge, humanity has developed many methods for using our ever-growing computational power to make our lives easier. Yet, in spite of the many efforts in this direction, we still have to rely heavily on the human brain to break the information-to-knowledge barrier. This leads us to the question: what is missing for IT techniques to be able to help us overcome the information paradigm and begin to work under the knowledge paradigm?

We argue that the answer lies in devising ways for the computers not only to “crunch the bytes” but also to “understand” what those bytes mean. Obviously, computers don’t really understand the meaning conveyed by the bytes they “crunch”. This is just a widely used metaphor to express the idea that making semantic information explicit and computationally accessible (i.e. better organizing the structure of data) is a powerful, more elegant way of utilizing that data. In other words, if we want to extract knowledge from data we must develop technologies that allow computers to make use of semantic, contextual information attached to the data being processed.

1.1.1 Is Semantic Information Really Important?

Text classification has been one of the hottest research topics in the academic community, particularly since the end of the last decade. The obvious explanation for this is the explosion of the WWW’s use since that period, where the rapid, continuously increasing availability of data exerts tremendous pressure to improve knowledge retrieval technologies. The current state-of-the-art paradigm for classifying a huge corpus of text data uses a Vector Space Representation of documents. In this scheme, each text document is reduced to a “bag of words”. Dimensionality reduction techniques are then applied, and the resulting representation is finally subjected to knowledge retrieval techniques aimed at pointing out possible partitions of the feature space.
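
To make this representation concrete, the following is a minimal Python sketch of a bag-of-words encoding over a shared vocabulary. The two toy documents are invented for illustration (they are not the documents of Figure 1), and a real pipeline would add the dimensionality reduction step mentioned above:

    from collections import Counter

    def bag_of_words(doc):
        # Lowercase and split on whitespace; real pipelines also strip punctuation.
        return Counter(doc.lower().split())

    def to_vector(bag, vocabulary):
        # Project a document's word counts onto a fixed, shared vocabulary.
        return [bag.get(term, 0) for term in vocabulary]

    # Hypothetical toy documents (not the documents shown in Figure 1).
    docs = ["the parser builds a syntax tree",
            "the tree in the garden needs water"]
    bags = [bag_of_words(d) for d in docs]
    vocabulary = sorted(set().union(*bags))
    vectors = [to_vector(b, vocabulary) for b in bags]
    # Each document is now a point in a common feature space; word order and word
    # sense are lost, so the shared term "tree" counts as a match even though the
    # two texts use it in unrelated senses.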

One limitation of most techniques based on the bag-of-words paradigm is that they fail to consider the semantic meaning of the text. That is, if two documents share roughly the same words, they will be mapped to nearby locations in the resulting space even if they are not related to the same subject, whereas two closely related documents that do not share the same words (e.g. documents that make heavy use of synonyms) will be mapped to distant regions. The toy example in Figure 1 illustrates this problem.



Figure 1. Simplified Text Understanding

Suppose we use Tr as a corpus of training data related to the class “computer science”. Applying the usual text classification techniques to this corpus results in a model that can be used to classify other documents, which go through the same preprocessing steps, such as Martin Porter’s stemming algorithm (Porter, 1980), stop word removal (removing non-descriptive words such as articles and prepositions), and pruning of infrequent words. Figure 2 illustrates the kind of output that might be produced by such a system, for both the training data Tr and the data to be classified (documents D1 and D2).



Figure 2. Simplified Text Understanding after Data Preparation
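
As a rough illustration of this data preparation stage, the sketch below applies stop-word removal, a crude suffix-stripping stand-in for Porter’s stemming algorithm, and pruning of infrequent words. The stop-word list, the suffix rules, and the sample sentence are simplified assumptions for illustration, not the actual resources used in the cited work:

    from collections import Counter

    STOP_WORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "is", "are"}  # toy list

    def crude_stem(word):
        # Very rough stand-in for Porter stemming: strip a few common suffixes.
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def prepare(text, min_count=1):
        tokens = [w.strip(".,;:!?") for w in text.lower().split()]
        tokens = [crude_stem(w) for w in tokens if w and w not in STOP_WORDS]
        counts = Counter(tokens)
        # Prune infrequent words (those occurring fewer than min_count times).
        return {word: n for word, n in counts.items() if n >= min_count}

    print(prepare("The farmers are planting and watering the seeds in the fields."))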

Just by inspection we can see that D1 shares only one out of its 23 words with the vocabulary within the training data Tr, which means a commonality of just 4.3%. Therefore, even though the two texts share the same subject, our word comparison algorithm would classify D1 as not being related to the class being represented by Tr, given the fact that they have few words in common.

Yet, if we do the same comparison between the training data Tr and the agriculture-related document D2, we see that 13 of D2’s 16 words (81.3%) are also in the training data Tr, which will cause our algorithm to incorrectly classify D2 as closely related to the class represented by Tr.
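
The percentages above follow from a simple ratio of shared words to total words. The minimal sketch below just reproduces that arithmetic; the actual word lists for Tr, D1, and D2 appear only in Figure 2, so the counts are the only inputs taken from the text:

    def overlap_ratio(shared, total):
        # Fraction of a document's words that also appear in the training vocabulary Tr.
        return shared / total

    d1_vs_tr = overlap_ratio(1, 23)    # ~0.043, i.e. about 4.3%
    d2_vs_tr = overlap_ratio(13, 16)   # 0.8125, i.e. about 81.3%
    print(d1_vs_tr, d2_vs_tr)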

Vector Space Representation algorithms are actually used with training corpora typically containing hundreds or thousands of words, instead of our toy example’s 21 words. So misclassifications like the one in our three-text example are not very likely. Indeed, as demonstrated by Sebastiani in his survey on automated text categorization (Sebastiani, 2002), syntax-based algorithms usually achieve true positive rates between 75% and 87% in text categorization problems. Still, unlikely does not mean impossible, and even low error rates can be quite undesirable, especially in domains where just a few errors may be the difference between success and failure, such as terrorist screening or intrusion detection systems.

In the Data Mining field, the need to consider semantic information has been recognized by many researchers. There is active research both into techniques that aim to extract semantic information from the data corpus itself and into techniques that rely on external data sources such as ontologies (discussed in the next section). Examples of the first group include Latent Semantic Kernels (Cristianini et al., 2001), Probabilistic LSI (Hofmann, 1999), automatic cross-language retrieval (Littman et al., 1997), and some variations of Kernel Methods (Joachims, 1998). In the second group we usually find studies advocating the use of WordNet (Miller et al., 1990; Fellbaum, 1998) as a semantic source, such as Siolas and d’Alché-Buc (2000) and Hotho et al. (2002, 2003).

In short, despite the successes of syntax-only algorithms, the potential increase in discrimination power that semantic information might bring must not be ignored. In highly sensitive domains such as counter-terrorism, this increase could be the key for finding the needle in the haystack without having unacceptable false alarm rates.

The foregoing is just one example of an application of techniques for which automated incorporation of semantics would be useful. There is a widespread understanding of the importance of semantics for many information-processing applications. The following sections review two research areas closely related to this dissertation: the Semantic Web and Ontology Engineering.

1.1.2The Semantic Web and ontologies

The W3C defines the Semantic Web as a collaborative effort between the W3C itself and a large number of researchers and industrial partners that will extend the current Web to provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

The current WWW uses markup languages such as HTML and XML (both being “semantic-unaware” languages) as a means to convey syntax rules and conventions to extract, transform, and interchange data. In this scheme, humans are the sole party responsible for dealing with the knowledge implied by that data. However, given our limitations in dealing with huge amounts of data, it is becoming not only desirable but also necessary to make use of the increasing computational power of our current machines to perform such a task.

“…computers and networks have as their job to enable the information space … But doesn’t it make sense … to put their analytical power to work making sense of the vast content … on the web? …This creates what I call a Semantic Web – a web of data that can be processed directly or indirectly by machines … The first step is putting data on the Web in a form that machines can naturally understand…”.

We can infer from this definition how important representing the structure of data and metadata will be in this new approach to distributed information use and sharing. Indeed, the W3C further states that the Semantic Web (SW) can only reach its full potential if it becomes a place where data can be shared and processed by automated tools as well as by people.

As an example of automated tools, we can consider the case in which software agents would have the ability to perform inference on the data stored in Web sites. To do so, such agents have to “understand” the semantics of the data, rather than relying only on its syntax. For instance, a software agent responsible for booking a trip to Florida must be able to infer whether the word “Florida” actually means the southern US state, the Portuguese word meaning “decorated with flowers”, a type of large bean, or the homonymous Uruguayan department.

According to the W3C (Heflin, 2004), ontologies are envisioned as the technology providing the cement for building the Semantic Web. Ontologies contain a common set of terms for describing and representing a domain in a way that allows automated tools to use the stored data in a more context-aware fashion, intelligent software agents to afford better knowledge management, and many other possibilities brought by a standardized, more intensive use of metadata.

The term Ontology was borrowed from philosophy. Its roots can be traced back to Aristotle’s metaphysical studies of the nature of being and knowing. Nonetheless, use of the term ontology in the information systems domain is relatively new, with the first appearance occurring in 1967 (Smith, 2004, page 22).

One can find many different definitions for the concept of ontology applied to information systems, each emphasizing the specific aspect its author judged to be most important. For instance, Gruber (1993) defines an ontology as a formal specification of a conceptualization or, in other words, a declarative representation of knowledge relevant to a particular domain. Uschold and Gruninger (1996) define an ontology as a shared understanding of some domain of interest. Sowa (2000, page 492) defines an ontology as the product of a study of the things that exist or may exist in some domain.

With so many possibilities for defining what an ontology is, one way of avoiding ambiguity is to focus on the objectives being sought when using it. For the purposes of the present research effort, the most important aspect of ontologies is their role as a structured form of knowledge representation. Thus, our definition of ontologies is a pragmatic one that emphasizes the purposes for which ontologies are used in the Semantic Web.
