**Definition 1**: An *ontology* is an explicit, formal representation of knowledge about a domain of application. This includes:

- Types of entities that exist in the domain;
- Properties of those entities;
- Relationships among entities;
- Processes and events that happen with those entities;

where the term entity refers to any concept (real or fictitious, concrete or abstract) that can be described and reasoned about within the domain of application.◼

Ontologies are used for the purpose of comprehensively describing knowledge about a domain in a structured and sharable way, ideally in a format that can be read and processed by a computer. The above definition describes what could be labeled a *Semantic Web Ontology*, but for the purposes of this dissertation we will use the more general term ontology.

**1.2 Issues on Representing and Reasoning Using Ontologies**

In our definition, the explicit requirement of reasoning about a given concept makes schema-oriented technologies such as XML Schema or RDFS fall short in terms of expressiveness. For instance, a very detailed XML Schema may capture the vocabulary and the hierarchical structure of concepts within a domain of application, but it still misses OWL features such as information on disjointness and uniqueness of classes, cardinality of properties^{8}, and others that are necessary to allow inferences to be drawn from those concepts. Similarly, as pointed out by Shelley Powers, using RDFS may allow the development of a very rich vocabulary, but it won't be as precise or as comprehensive as one that incorporates ontological elements from ontology languages such as OWL (Powers, 2003, page 229). Beyond the extra expressivity that is necessary to perform reasoning with the concepts represented in an ontology, the many similarities between ontologies and database schemas make it difficult to draw a clear distinction between the two.
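As a concrete illustration of this gap, consider the following OWL fragment (shown in Turtle syntax; the `ex:` names are invented for illustration). RDFS could declare the classes and a subclass relation, but the disjointness axiom and the cardinality restriction are OWL constructs with no RDFS or XML Schema counterpart:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/wine#> .

# Disjointness: nothing can be both a Grape and a Winery.
ex:Grape  a owl:Class .
ex:Winery a owl:Class ;
    owl:disjointWith ex:Grape .

# Cardinality: every Wine is made from exactly one grape variety.
ex:Wine a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
                      owl:onProperty  ex:madeFromGrape ;
                      owl:cardinality "1"^^xsd:nonNegativeInteger ] .
```

It is precisely axioms such as these that allow a reasoner to draw inferences (e.g., detecting an inconsistency if an individual is asserted to be both a grape and a winery) that a vocabulary-only schema cannot support.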
Spyns, Meersman, and Jarrar (2002) provide an interesting discussion of how the two concepts differ. They regard data models (i.e., database schemas, XML schemas, etc.) as specifications of the structure and integrity constraints of data sets. Thus, a database schema is developed to address the specific needs and tasks for which the data set is being used, which in turn depend heavily on the enterprise being modeled. In contrast, ontologies are intended to be applied across a broad range of tasks within a domain, and usually contain a vocabulary (terms and labels) and a definition of the concepts and their respective relationships within that domain. The main objective of an ontology is to provide a formal, agreed, and shared resource, which forces it to be as generic and task-independent as possible. Although an ontology typically is developed for a focused task, it is desirable for an ontology to capture rich semantic content in a manner that can be reused across tasks. In developing a database schema, the goal is different. Schema developers focus on organizing information in ways that optimize support for the types of queries expected to arise in the specific applications for which the database is being designed. Achieving this goal typically requires a special application to be written on top of the database mechanism that (for a relational database) implements the principles of relational algebra. Furthermore, a database schema is typically developed under a closed world assumption, in which the only possible instances of a given relation are those implied by the objects existing in the database (i.e., if something is not represented there, then it doesn't exist). Ontologies, on the other hand, do not necessarily carry the assumption that not being represented entails non-existence. Without the closed world assumption, a query that cannot be proved from the information available in an ontology cannot simply be assumed to be false.
As a consequence, situations in which incomplete information within an ontology prevents a definitive answer to a query should be expected to be the norm rather than the exception. This is a clear sign that uncertainty is an intrinsic component of ontology engineering, and therefore ontology languages must include sound and principled mechanisms for dealing with uncertainty.

One commonality between ontologies and database schemas is the need to provide for interoperability among systems based on different schemas and/or ontologies. In an increasingly interconnected world, the ability to exchange data as seamlessly as possible is one of the most desired features of a knowledge representation. Integrating systems created and managed by separate organizations, evolving in different scenarios, and geared to different needs and perspectives is a task that poses many challenges, even when dealing with apparently very similar structures. To illustrate their vision of how the Semantic Web will operate, Tim Berners-Lee, James Hendler, and Ora Lassila (2001) describe a scenario in which two siblings (Pete and Lucy in the example) use SW agents to help them schedule medical appointments for their mother. These agents perform tasks such as Web search, scheduling consolidation, constraint matching, and trust assessment. Presently, these kinds of tasks rely heavily on human intervention. According to the Semantic Web vision, automated Web agents will perform them. For this vision to be feasible, it is clear that all Web services involved must share the same meaning for the concepts involved in these activities. That is, each sibling's SW agents should treat concepts such as "20-mile radius", "appointment time", "location", "less important", etc. the same way as they are treated by the diverse Web services they would have to interact with (e.g. the doctor's Web agent, the credit card company's web services, etc.). Unfortunately, even in tightly controlled settings (e.g.
small, closed environments with controlled vocabularies), semantic inconsistencies (such as different concepts with the same name, or different names for the same concept) occur frequently. Current approaches to solving this semantic mapping problem, such as enforcing compliance with standards defined by regulatory authorities (e.g. DOD directives such as 8320.1^{9}) or employing generic matching schemes, have consistently fallen short of what is needed to realize the SW vision. Even though some ontology languages do offer constructs that help to import one ontology into another, *they lack a principled means for grading the similarity between concepts or for making plausible inferences about the mapping between them*. Providing such a means is an important step towards making the semantic mapping problem a less expensive, less tedious, and less error-prone process. In short, the lack of a principled representation of uncertainty in the field of ontological engineering is a major weakness hindering efforts towards better solutions for the semantic mapping problem. More generally, lack of support for uncertainty management is a serious impediment to making the Semantic Web a reality.

**1.3 Why Uncertainty Matters**

One of the main technical differences between the current World Wide Web and the Semantic Web is that while the former relies on syntactic-only protocols, the latter adds meta-data annotations as a means to convey shared, precisely defined terms or, in other words, semantic awareness to improve the interoperability among Web resources. From a syntactic standpoint, Grape as a fruit is equivalent to Grape as in John Grape. Semantically aware schemes must be able to represent and appropriately process differences such as this. This is not a trivial task. For semantic interoperability to work correctly we need shared sources of precisely defined concepts, which is exactly where ontologies play a key role.
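As a hedged sketch of what "grading the similarity between concepts" could look like in practice, the fragment below scores the overlap between two hypothetical attribute sets for a Grape concept drawn from two ontologies. The attribute names and the Jaccard measure are illustrative assumptions, not part of any actual mapping algorithm:

```python
def jaccard(a, b):
    """Graded similarity in [0, 1] between two concepts, represented here
    (purely for illustration) as sets of attribute names."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical attribute sets for a "Grape" concept in two ontologies.
grape_onto1 = {"hasColor", "hasSugarContent", "growsOn"}
grape_onto2 = {"hasColor", "hasSugarContent", "harvestSeason"}

print(jaccard(grape_onto1, grape_onto2))  # → 0.5
```

A deterministic matcher must answer "same concept or not"; a graded score such as this one could instead feed a plausible, probability-weighted judgment about the mapping.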
Yet, when comparing two ontologies containing the term "Grape", deterministic reasoning algorithms will either consider it to be a fruit or an undefined object (which is not the same as a non-fruit), with no intermediate grading. This is fine when complete information is available, which is frequently the case under the closed world assumption but much less common in an open world environment, where incomplete information is the rule. In the open world case, purely logical systems may represent phenomena such as exceptions and unknown states with generic labels such as "other", but will lose the ability to draw strong conclusions. In probabilistic systems such phenomena would carry a probabilistic qualifier, which allows valid conclusions to be drawn and also adds more flexibility to the model. There are important issues regarding open-world probabilistic reasoning that have not yet been completely addressed (cf. Laskey & Lehner, 1994), but probabilistic systems are a promising approach for reasoning in open worlds. Despite these shortcomings of logic-based systems, the current development of the future Semantic Web (which will support automated reasoning in most of its activities) is based on classical logic. For example, OWL, a W3C recommendation (Patel-Schneider *et al.*, 2004), has no built-in support for probabilistic information and reasoning, a major shortcoming for a technology that is expected to operate in a complex, open world environment. As we will see in the next chapters, OWL has its roots in its own web language predecessors (i.e. XML, RDF), and in traditional knowledge representation formalisms that have historically not considered uncertainty. Examples of these formalisms include Frame systems (Minsky, 1975) and Description Logics, which evolved from the so-called "Structured Inheritance Networks" (Brachman, 1977).
This historical background somewhat explains the lack of support for uncertainty in OWL, a serious limitation for a language expected to support applications in an environment where one cannot simply ignore incomplete information. As an example of a similar situation in which a knowledge-based system had to evolve in order to cope with incomplete information, we can refer to Stanford University's MYCIN^{10} project (Shortliffe *et al.*, 1975) in the medical domain. MYCIN evolved from DENDRAL^{11}, which was deterministic, but according to Buchanan and Shortliffe (1984, page 209): "As we began developing the first few rules for MYCIN, it became clear that the rules we were obtaining from our collaborating experts ... the inferences described were often uncertain". When faced with the problem of expressing this uncertainty, the initial approach adopted by most developers of decision-making systems was the subjectivist approach of probability theory (Adams, 1976). Yet, probability theory was then considered intractable, so other methods were used, such as Certainty Factors (Buchanan & Shortliffe, 1984) and Dempster-Shafer's belief functions (Dempster, 1967; Shafer, 1976). These initial approaches were superseded by the development of graphical probability models. The key innovation of graphical models is the ability to express knowledge about uncertain propositions using modular components, each involving a small number of elements, that can be composed into complex models for reasoning about many interrelated propositions. This ability to express knowledge as modular, local units provides major improvements in tractability, and also makes the knowledge engineering task feasible. The graphical model formalism has been extended to other calculi such as Dempster-Shafer's belief functions, fuzzy logic, and qualitative probability. Shenoy & Demirer (2001) provide a unified graphical formalism that covers many different uncertainty calculi.
Graphical probability models have made probability tractable, thus addressing the initial concerns of many researchers. Today, many medical systems use probability (Heckerman *et al.*, 1995b; Helsper & van der Gaag, 2001; Lucas *et al.*, 2001). The evolution from deterministic reasoning to probabilistic reasoning has enabled information systems to make use of uncertain, incomplete information. This seems to be a promising path for the Semantic Web, which will inevitably confront the same uncertainty-related concerns faced by the AI field.

**1.4 Research Contributions and Structure of this Dissertation**

Although our research is focused on the Semantic Web, we are tackling a problem that precedes even the current WWW: the quest for more efficient data exchange. Clearly, solving that problem requires more precise semantics and flexible ways to convey information. While the WWW provided a new presentation medium and technologies such as XML presented new data exchange formats, both failed to address the semantics of the data being exchanged. The SW is meant to fill this gap, and the realization of its goals will require major improvements in technologies for data exchange. Unfortunately, for historical reasons and due to the lack of expressivity of probabilistic representations in the past, current ontology languages have no built-in support for representing or reasoning with uncertain, incomplete information. In the uncertainty-laden environment in which the SW will operate, this is a major shortcoming preventing realization of the SW vision. Indeed, in almost any domain represented in the SW there will exist a vast body of knowledge that would be completely ignored (neither represented nor reasoned about) due to the SW language's inability to deal with it. As a means of addressing this problem, the long-term goal of our research is to establish a Bayesian framework for probabilistic ontologies, which will provide a basis for plausible reasoning services in the Semantic Web.
Clearly, the level of acceptance and standardization required for achieving this objective demands a broader effort led by the W3C, probably resulting in a W3C Recommendation formally extending the OWL language. Thus, the present dissertation should be seen as an initial effort towards that broader objective. In the next Chapter, we provide a brief introduction to Web languages and probabilistic representations in general. Then, we shift our focus to a brief coverage of the attempts to find common ground between the SW and probabilistic representations, which also includes a view of the trend towards more expressive forms of the latter. Chapter Three provides the necessary background on Multi-Entity Bayesian Networks (MEBN), the probabilistic first-order logic that is the mathematical backbone of PR-OWL. As a means to provide a smooth introduction to the fairly complex concepts of MEBN logic, we needed to explore a domain of knowledge that would be both easily understood and politically neutral, while still rich enough to include scenarios that would demand a highly expressive language. Thus, we constructed a running case study based on the Star Trek^{12} television series. Our explanations and examples assume no previous familiarity with the particulars of the Star Trek series. We start Chapter Four with our definition of a probabilistic ontology, a key concept in our research. Then, we cover solutions to two major issues preventing the construction of probabilistic ontologies. Because MEBN was built with flexibility in mind, it has little standardization or support for many of the advanced features of OWL. This was a major obstacle for developing a MEBN-based extension to OWL, and we addressed it by developing an extended version of MEBN logic. Our version, which we explain in the first section of the Chapter, incorporates typing, polymorphism, and other features that are desirable for an ontology language.
In the second and last section, we address the lack of a probabilistic reasoner that implements all the advanced features found in MEBN logic. In that section, we explain how we used Quiddity*Suite, a powerful probabilistic toolkit developed at Information Extraction and Transport (IET), as a MEBN logic implementation and, consequently, showed its potential to be a probabilistic reasoner for Semantic Web applications. More detailed aspects are conveyed in Appendix A. In Chapter Five, we build upon the results of Chapter Four and present our results in developing PR-OWL. There, our probabilistic extension to OWL is defined as an upper probabilistic ontology, which we document in Appendix B. We also present an operational concept of how we foresee the use of our framework and a proposed strategy for implementing probabilistic ontologies for the Semantic Web. Finally, in Chapter Six we convey a summary of this dissertation's results and present in Appendix C some possible uses of the technology proposed here for solving problems in areas outside Semantic Web research, such as the semantic mapping and the multi-sensor data fusion problems. Taken together, the contributions brought by this research constitute an initial step towards overcoming the current inability of SW languages to represent uncertainty and to reason under it in a principled way. Furthermore, as we suggested in the beginning of this section, these contributions also have the potential to greatly improve the efficiency with which data is exchanged, thus implying their applicability to a broader set of problems beyond the Semantic Web.

**Background and Related Research**

**1.5 Web Languages**

Information in the World Wide Web is encoded via markup languages, which use tags (markups) to embed metadata into a document.
The concept of markup languages^{13} was initially implemented by IBM in 1969 with the development of the Generalized Markup Language (Goldfarb, 1996), which gained in popularity throughout the seventies. Then, the growing demand for a more powerful standard led to the development of the Standard Generalized Markup Language (SGML), which was adopted as an ISO standard in 1986 (ISO:8879). SGML was a powerful language but also a very complex one, which hindered its use in popular applications. The breakthrough that sparked the popularization of markup languages was the creation of the Hypertext Markup Language (HTML) in 1989 by Tim Berners-Lee and Robert Cailliau (Connolly *et al.*, 1997). HTML is a very simple subset of SGML that is focused on the presentation of documents. It rapidly became the standard language for the World Wide Web. Yet, as the WWW became ubiquitous, the limitations of HTML became apparent, the major one being its inability to deal with data interchange due to its limited support for metadata. Even though the W3C launched new HTML versions, these were not aimed at providing support for data exchange, since HTML was not originally designed for data interchange^{14}. Although the original intent of the WWW focused on documents, HTML's unsuitability for data interchange became an increasingly serious shortcoming as the WWW grew into an ideal medium for data interchange. The answer to HTML's limitations was the development of the Extensible Markup Language (XML), which is much simpler than SGML but still capable of expressing information about the contents of a document and of supporting user-defined markups. XML became a W3C recommendation in 1998. In addition to its use for data packaging (e.g. the .plist files in Mac OS X and many configuration files in Windows XP), it has become the acknowledged standard for data interchange.
With the establishment of the Semantic Web road map by the W3C in 1998, it became clear that more expressive markup languages were needed. As a result, the first Model and Syntax Specification for the Resource Description Framework (RDF) was released in 1999 as a W3C recommendation. Unlike the data-centric focus of XML, RDF is intended to represent information and to exchange knowledge. Accounts of the differences between RDF and XML are widely available on the WWW (e.g. Gil & Ratnakar, 2004). In addition to a knowledge representation language, the Semantic Web effort also needed an ontology language to support advanced Web search, software agents, and knowledge management. The latest step towards fulfilling that requirement was the release of OWL as a W3C recommendation in 2004. OWL superseded DAML+OIL (Horrocks, 2002), a language that merged the two ontology languages being developed in the US (DAML) and Europe (OIL)^{15}. According to Hendler (2004), earlier languages had been used to develop tools and ontologies for specific user communities, and therefore were not defined to be compatible with the architecture of the World Wide Web in general, and the Semantic Web in particular. In contrast, OWL uses the RDF framework to provide a more general, interoperable approach, making ontologies compatible with web standards, scalable to web needs, and able to be distributed across many systems. The interested reader will find information on OWL at the W3C OWL website (Miller & Hendler, 2004). Yet, as we stated before, OWL suffers from the limitations of deterministic languages and thus lacks the advantages of probabilistic reasoning.

**1.6 A Brief Introduction to Probabilistic Representations**

Schum described probability as a subject that has "a very long past but a very short history" (Schum, 1994, page 35).
An abstract notion of probability may be traced back at least to Paleolithic times, in the sense that early cultures are known to have used artifacts for gambling or forecasting the future. In contrast, he adds, the first scientific works on what we now call probability theory have a more recent history, dating back "only" 400 years to the pioneering writings of mathematicians Blaise Pascal (1623-1662) and Pierre de Fermat (1601-1665). It was only in the 20^{th} century that the major formal axiom systems for probability were developed (e.g. Cox, 1946; Kolmogorov, 1960/1933). Four hundred years of scientific research and the broad acceptance of a formal axiom system have not brought common agreement on the philosophical foundations of probability theory. Instead, many different interpretations have arisen during this time, and none has succeeded in putting an end to the discussion about what probability really is. The interested reader will find an excellent account of the historical development of the competing theories in Hacking (1975), while valuable comparative studies can be found in the works of Fine (1973), Weatherford (1982), and Cohen (1989). The *classical* approach regards probability as the ratio of favorable cases to total, equipossible cases (Laplace, 1996/1826; Ball, 2003/1908). The *logical* approach regards probability as a logical relation between statements of evidence and hypothesis (Carnap, 1950; Keynes, 2004/1921). The *frequentist* view regards probability as the limiting frequency of successful outcomes in a long sequence of trials (von Mises, 1981/1928). The *propensity* view (Popper, 1957, 1959; Hacking, 1965; Lewis, 1980) regards probability as a physical tendency for certain events to occur. Finally, the *subjectivist* school understands probability as the degree of belief of an ideal rational agent about hypotheses for which the truth-value is unknown (Ramsey, 1931; Savage, 1972/1954; de Finetti, 1974).
Despite the differences in philosophical interpretation, the mathematics is common to all approaches. This work is concerned with representing uncertain, incomplete knowledge that can come from diverse agents. For this reason, we adopt the subjectivist view of probability. We have chosen subjective probability as our representation for uncertainty because of its status as a mathematically sound representation language and formal calculus for rational degrees of belief, and because it gives different agents the freedom to have different beliefs about a given hypothesis. Although the interpretation taken in this dissertation is subjectivist, the methodology presented here is consistent with other interpretations of probability. For example, some might prefer a frequency or a propensity interpretation for probabilities that arise from processes considered to be intrinsically random. Such individuals would naturally build probabilistic ontologies only for processes they regard as intrinsically random. Others might prefer a logical interpretation of a probabilistic domain theory. In the end, the above-mentioned discussion of what probability "really is" may be better framed as an argument over what kinds of applications justify the use of a probabilistic axiom system and its underlying mathematics. Many different axiomatic formulations have been proposed that give rise to subjectivist probability as a representation for rational degrees of belief. Examples include the axiom systems of Ramsey (1931), Kolmogorov (1960/1933), Cox (1946), Savage (1972/1954), and de Finetti (1990/1954). As an illustration, the following axiom system is due to Watson & Buede (1987):

1. For any two uncertain events, **A** is more likely than **B**, or **B** is more likely than **A**, or they are equally likely.
2. If **A**_{1} and **A**_{2} are any two mutually exclusive events, and **B**_{1} and **B**_{2} are any other mutually exclusive events; and if **A**_{1} is not more likely than **B**_{1}, and **A**_{2} is not more likely than **B**_{2}; then (**A**_{1} and **A**_{2}) is not more likely than (**B**_{1} and **B**_{2}). Further, if either **A**_{1} is less likely than **B**_{1} or **A**_{2} is less likely than **B**_{2}, then (**A**_{1} and **A**_{2}) is less likely than (**B**_{1} and **B**_{2}).

3. A possible event cannot be less likely than an impossible event.

4. Suppose **A**_{1}, **A**_{2}, … is an infinite decreasing sequence of events; that is, if **A**_{i} occurs, then **A**_{i-1} occurs, for any **i**. Suppose further that **A**_{i} is not less likely than some other event **B**, again for any **i**. Then the occurrence of all of the infinite set of events **A**_{i}, **i** = 1, 2, …, is not less likely than **B**.

5. There is an experiment, with a numerical outcome, such that each possible value of that outcome, in a given range, is equally likely.

All the properties of the probabilistic system used by Bayesian Networks, Influence Diagrams, and MEBN can be derived from these axioms. Among them, two transformations are crucial for the notion of probabilistic inference: the *Law of Total Probability* and *Bayes rule*. The Law of Total Probability (Page, 1988, page 17) gives the marginal probability distribution of a subset of random variables from a joint distribution on a superset by summing over all possible values of the random variables not contained in the subset. Figure 3 illustrates the concept. Bayes rule provides a method of updating the probability of a random variable when information is acquired about a related random variable. The standard format of Bayes rule is:
**P(B|A) = P(A|B) P(B) / P(A)**
**P(B)** is called the prior probability of **B**, as it reflects our belief in event **B** before obtaining information on event **A**. Likewise, **P(B|A)** is the posterior probability of **B**, and represents our new belief in event **B** after applying Bayes rule with the information collected from event **A**.

Figure 3: Law of Total Probability

Bayes rule provides the formal basis for the active and rapidly evolving field of Bayesian probability and statistics. In the Bayesian view, inference is a problem of belief dynamics. Bayes rule provides a principled methodology for belief change in the light of new information. Good introductory material on Bayesian statistics can be found in the works of Press (1989), Lee (2004), and Gelman (2003), while a more philosophically oriented reader will also be interested in the collection of essays on foundational studies in Bayesian decision theory and statistics by Kadane *et al.* (1999). The above concepts provide the formal mathematical basis for the most widely used Bayesian inference technique today: Bayesian Networks.

**1.7 Bayesian Networks**

Bayesian networks provide a means of parsimoniously expressing joint probability distributions over many interrelated hypotheses. A Bayesian network consists of a directed acyclic graph (DAG) and a set of local distributions. Each node in the graph represents a random variable. A random variable denotes an attribute, feature, or hypothesis about which we may be uncertain. Each random variable has a set of mutually exclusive and collectively exhaustive possible values. That is, exactly one of the possible values is or will be the actual value, and we are uncertain about which one it is. The graph represents direct qualitative dependence relationships; the local distributions represent quantitative information about the strength of those dependencies. The graph and the local distributions together represent a joint distribution over the random variables denoted by the nodes of the graph.
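This factorization can be made concrete with a minimal two-node sketch. All probability values below are hypothetical, chosen only for illustration; the point is that the joint distribution is the product of each node's local distribution given its parents, and that Bayes rule then follows by enumeration:

```python
# Minimal Bayesian network sketch: a DAG (Rain -> WetGrass) plus local
# distributions. All probability values are hypothetical.

p_rain = {True: 0.2, False: 0.8}                      # prior P(Rain)
p_wet_given_rain = {True:  {True: 0.9, False: 0.1},   # P(WetGrass | Rain)
                    False: {True: 0.1, False: 0.9}}

def joint(rain, wet):
    """Joint distribution factors as P(Rain) * P(WetGrass | Rain)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Bayes rule by enumeration: P(Rain = true | WetGrass = true).
evidence = sum(joint(r, True) for r in (True, False))  # P(WetGrass = true)
posterior = joint(True, True) / evidence

print(round(posterior, 4))  # → 0.6923
```

The same enumeration generalizes to any DAG: the denominator is exactly the Law of Total Probability, summing the joint over all values of the unobserved variables.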
Bayesian networks have been successfully applied to create consistent probabilistic representations of uncertain knowledge in diverse fields such as medical diagnosis (Spiegelhalter *et al.*, 1989), image recognition (Booker & Hota, 1986), language understanding (Charniak & Goldman, 1989a, 1989b), search algorithms (Hansson & Mayer, 1989), and many others. Heckerman *et al.* (1995b) provide a detailed list of recent applications of Bayesian Networks. One of the most important features of Bayesian networks is that they provide an elegant mathematical structure for modeling complicated relationships among random variables while keeping a relatively simple visualization of those relationships. Figure 4 gives three simple examples of qualitatively different probability relationships among three random variables.
Figure 4: Sample Relationships Among Three Random Variables

To appreciate the communicative power of this representation, one could compare two hypothetical scenarios in which a domain expert with little background in probability tries to interpret what is represented in Figure 4. Initially, suppose that she is allowed to look only at the written equations below the pictures. In this case, we believe that she will have to think at least twice before drawing any conclusion about the relationships among events **A**, **B**, and **C**. On the other hand, if she is allowed to look only at the pictures, it seems fair to say that she will immediately perceive that in the leftmost picture, for example, event **B** is independent of events **A** and **C**, and event **C** depends on event **A**. Also, simply comparing the pictures would allow her to see that, in the center picture, **A** is now dependent on **B**, and that in the rightmost picture **B** influences both **A** and **C**. The advantages of an easily interpretable graphical representation become more apparent as the number of hypotheses and the complexity of the problem increase. One of the most powerful characteristics of Bayesian Networks is their ability to update the beliefs of each random variable via bi-directional propagation of new information through the whole structure. This was initially achieved by an algorithm proposed by Pearl (1988) that fuses and propagates the impact of new evidence, providing each node with a belief vector consistent with the axioms of probability theory. Pearl's algorithm performs exact Bayesian updating, but only for singly connected networks. Subsequently, general Bayesian updating algorithms have been developed. One of the most commonly applied is the Junction Tree algorithm (Lauritzen & Spiegelhalter, 1988). Neapolitan (2003) provides a discussion of many Bayesian propagation algorithms.
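Belief updating can also be approximated by stochastic sampling. The sketch below implements a simple likelihood-weighting scheme on a hypothetical two-node model (Rain → WetGrass, with made-up numbers): the unobserved node is sampled from its prior, and each sample is weighted by the likelihood of the evidence given the sampled parent values. This is a sketch of the general idea, not of any particular published algorithm's implementation:

```python
import random

# Likelihood weighting on a hypothetical two-node network Rain -> WetGrass.
p_rain = 0.2                         # P(Rain = true), hypothetical
p_wet = {True: 0.9, False: 0.1}      # P(WetGrass = true | Rain), hypothetical

def estimate_p_rain_given_wet(n_samples=100_000, seed=42):
    """Estimate P(Rain = true | WetGrass = true): sample the unobserved
    node, weight each sample by the likelihood of the evidence."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        rain = rng.random() < p_rain     # sample the unobserved node
        w = p_wet[rain]                  # weight = P(evidence | parents)
        num += w * rain
        den += w
    return num / den

# The exact posterior is 0.18 / 0.26 ≈ 0.692; the estimate converges to it.
print(round(estimate_p_rain_given_wet(), 2))
```

With many interrelated variables, such sampling schemes trade exactness for tractability, which motivates the approximate algorithms discussed next.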
Although Cooper (1987) showed that exact belief propagation in Bayesian Networks can be NP-hard, exact computation is feasible for many problems of practical interest. Some complex applications are too challenging for exact inference and require approximate solutions (Dagum & Luby, 1993). Many computationally efficient inference algorithms have been developed, such as probabilistic logic sampling (Henrion, 1988), likelihood weighting (Fung & Chang, 1989; Shachter & Peot, 1990), backward sampling (Fung & del Favero, 1994), Adaptive Importance Sampling (Cheng & Druzdzel, 2000), and Approximate Posterior Importance Sampling (Druzdzel & Yuan, 2003). These algorithms allow the impact of evidence about one node to propagate to other nodes in multiply connected networks, making Bayesian Networks a reliable engine for probabilistic inference. The interested reader will find comprehensive coverage of Bayesian Networks in a large and growing literature on this subject, such as Pearl (1988), Neapolitan (1990, 2003), Oliver & Smith (1990), Charniak (1991), Jensen (1996, 2001), or Korb & Nicholson (2003).

**1.7.1 Probabilistic Reasoning with Bayesian Networks**

Bayesian Networks have received praise for being a powerful tool for performing probabilistic inference, but they do have limitations that impede their application to complex problems. As Bayesian networks grew in popularity, their limitations became increasingly apparent. Although a powerful tool, BNs are not expressive enough for many real-world applications. More specifically, Bayesian Networks assume a simple attribute-value representation – that is, each problem instance involves reasoning about the same fixed number of attributes, with only the evidence values changing from problem instance to problem instance. This type of representation is inadequate for many problems of practical importance.
Many domains require reasoning about varying numbers of related entities of different types, where the numbers, types, and relationships among entities usually cannot be specified in advance and may themselves be uncertain. As will be demonstrated below, Bayesian Networks are insufficiently expressive for such problems.

1.7.2 Case Study: The Star Trek Scenario

Choosing a particular real-life domain would pose the risk of getting bogged down in domain-specific detail. For this reason, we opted to construct a case study based on the popular television series *Star Trek*. Nonetheless, the examples presented here have been constructed to be accessible to anyone with some familiarity with space-based science fiction. We begin our exposition by narrating a highly simplified problem of detecting enemy starships. In this simplified problem, the main task of a decision system is to model the problem of detecting Romulan starships (here considered hostile by the United Federation of Planets) and assessing the level of danger they pose to our own starship, the Enterprise. All other starships are considered either friendly or neutral. Starship detection is performed by the Enterprise's suite of sensors, which can correctly detect and discriminate starships with an accuracy of 95%. However, Romulan starships may be in "cloak mode," which makes them invisible to the Enterprise's sensors. Even for the most advanced sensor technology, the only hint of a nearby starship in cloak mode is a slight magnetic disturbance caused by the enormous amount of energy required for cloaking. The Enterprise has a magnetic disturbance sensor, but it is very hard to distinguish background magnetic disturbance from that generated by a nearby starship in cloak mode. This simplified situation is modeled by the BN in Figure 5^{16}, which also considers the characteristics of the zone of space where the action takes place. 
Each node in our BN has a finite number of mutually exclusive, collectively exhaustive states. The node Zone Nature (ZN) is a root node, and its prior probability distribution can be read directly from Figure 5 (e.g. 80% for deep space). The probability distribution for Magnetic Disturbance Report (MDR) depends on the values of its parents ZN and Cloak Mode (CM). The strength of this influence is quantified via the conditional probability table (CPT) for node MDR, shown in Table 1. Similarly, Operator Species (OS) depends on ZN, and the two report nodes depend on CM and the hypothesis on which they are reporting.

The Naïve Star Trek Bayesian Network

Graphical models provide a powerful modeling framework and have been applied to many real-world problems involving uncertainty. Yet, the model depicted above is of little use in a "real life" starship environment. After all, hostile starships cannot be expected to approach the Enterprise one at a time so as to render its simple BN model usable. If four starships were closing in on the Enterprise, the BN of Figure 5 would have to be replaced by the one shown in Figure 6.

Conditional Probability Table for Node MDR
| **Zone Nature** | **Cloak Mode** | **Magnetic Disturb. Rep.:** Low | Medium | High |
| --- | --- | --- | --- | --- |
| Deep Space | True | 80.0 | 13.0 | 7.0 |
| Deep Space | False | 85.0 | 10.0 | 5.0 |
| Planetary Systems | True | 20.0 | 32.0 | 48.0 |
| Planetary Systems | False | 25.0 | 30.0 | 45.0 |
| Black Hole Boundary | True | 5.0 | 10.0 | 85.0 |
| Black Hole Boundary | False | 6.9 | 10.6 | 82.5 |
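To illustrate how this CPT supports inference, the sketch below applies Bayes' rule to compute a posterior on Cloak Mode from a magnetic disturbance reading in a given zone. The CPT entries are taken from the table above (converted from percentages); the prior P(CM = True) = 0.5 is a purely illustrative assumption, since the text does not give a prior for CM.

```python
# P(MDR | ZN, CM) as (Low, Medium, High) probabilities, from the CPT above.
cpt_mdr = {
    ("Deep Space", True):           (0.80,  0.13,  0.07),
    ("Deep Space", False):          (0.85,  0.10,  0.05),
    ("Planetary Systems", True):    (0.20,  0.32,  0.48),
    ("Planetary Systems", False):   (0.25,  0.30,  0.45),
    ("Black Hole Boundary", True):  (0.05,  0.10,  0.85),
    ("Black Hole Boundary", False): (0.069, 0.106, 0.825),
}
MDR_STATES = ("Low", "Medium", "High")

def posterior_cloak(zn, mdr, prior_cm=0.5):
    """P(CM=True | ZN=zn, MDR=mdr) via Bayes' rule; prior_cm is an assumption."""
    i = MDR_STATES.index(mdr)
    num = prior_cm * cpt_mdr[(zn, True)][i]
    den = num + (1 - prior_cm) * cpt_mdr[(zn, False)][i]
    return num / den

# In deep space a High reading only mildly favors cloaking: 0.07 vs 0.05
# likelihoods give a posterior of 0.07 / (0.07 + 0.05) under a 0.5 prior.
print(round(posterior_cloak("Deep Space", "High"), 3))   # 0.583
```

The weak update reflects the narrative point above: background disturbance is hard to distinguish from that of a cloaked starship, so a single MDR reading shifts the belief in Cloak Mode only slightly.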