Unfortunately, building a BN for each possible number of nearby starships is not only a daunting task but also a pointless one, since there is no way of knowing in advance how many starships the Enterprise is going to encounter, and thus which BN to use at any given time. In short, BNs lack the expressive power to represent entity types (e.g., starships) that can be instantiated as many times as the situation at hand requires.

The BN for the Four-Starship Case

In spite of its naiveté, we will briefly hold on to the premise that only one starship can approach the Enterprise at a time, so that the model of Figure 5 is valid. Furthermore, we will assume that the Enterprise is traveling in deep space and that its sensor reports show no trace of any nearby starship (i.e. the state of node SR is *Nothing*). Further, a newly arrived report indicates a strong magnetic disturbance (i.e. the state of node MDR is *High*). Table 1 shows that the likelihood ratio for a high MDR is 7/5 = 1.4 in favor of a starship in cloak mode. Although this favors a cloaked starship in the vicinity, the evidence is far from overwhelming. Repetition is a powerful way to boost the discriminatory power of weak signals. As an example from airport terminal radars, a single pulse reflected from an aircraft usually arrives back at the radar receiver greatly weakened, making it hard to distinguish from background noise. However, a steady sequence of reflected radar pulses is easily distinguishable from background noise. Following the same logic, it is reasonable to assume that an abnormal background disturbance will show random fluctuation, whereas a disturbance caused by a starship in cloak mode would show a characteristic temporal pattern. Thus, when there is a cloaked starship nearby, the MDR state at any time depends on its previous state. A BN similar to the one in Figure 7 could capitalize on this for pattern recognition purposes.
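The compounding effect of repeated weak evidence can be sketched with Bayes' rule in odds form. This is an illustration only: it assumes, for simplicity, conditionally independent MDR readings, each carrying the 1.4 likelihood ratio from Table 1, and an even prior over the two hypotheses.

```python
# Sketch: how repeated weak evidence compounds (illustrative assumptions:
# conditionally independent readings, likelihood ratio 1.4 each, even prior).

def posterior_after_n_readings(prior_odds: float, lr: float, n: int) -> float:
    """Posterior probability of a cloaked starship after n readings."""
    odds = prior_odds * lr ** n   # Bayes' rule in odds form
    return odds / (1 + odds)

for n in (1, 5, 10):
    print(n, round(posterior_after_n_readings(1.0, 1.4, n), 3))
```

A single reading barely moves the posterior above the 0.5 prior, but ten consistent readings push it past 0.95, which is the point of the radar analogy.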
The BN for the One-Starship Case with Recursion

Dynamic Bayesian Networks (DBNs) allow nodes to be repeated over time (Murphy, 1998). The model of Figure 7 has both static and dynamic nodes, and thus is a *partially* dynamic Bayesian network (PDBN), also known as a temporal Bayesian network (Takikawa *et al.*, 2002). While DBNs and PDBNs are useful for temporal recursion, a more general recursion capability is needed, as well as a parsimonious syntax for expressing recursive relationships. This section has provided just a glimpse of the issues that confront an engineer attempting to apply Bayesian networks to realistically complex problems. We did not provide a comprehensive analysis of the limitations of Bayesian networks for solving complex problems, since this brief overview suffices to make the point that even relatively simple situations might require more expressiveness than BNs can provide. A much more powerful representational formalism is offered by first-order logic (FOL), which can represent entities of different types interacting with each other in varied ways. Sowa states that first-order logic “has enough expressive power to define all of mathematics, every digital computer that has ever been built, and the semantics of every version of logic, including itself” (Sowa, 2000, page 41). For this reason, FOL has become the *de facto* standard for logical systems from both a theoretical and a practical standpoint. However, systems based on classical first-order logic lack a theoretically principled, widely accepted, logically coherent methodology for reasoning under uncertainty. As a result, a number of languages have appeared that extend the expressiveness of standard BNs in various ways. Two different streams of research on combining logic with probability are covered in the following sections.
1.8 Probabilistic Extensions to Web Languages

1.8.1 Probabilistic Extensions to Description Logic

Most of the probabilistic extensions aimed at the ontology engineering domain are based on Description Logic (DL), which Baader and Nutt (2001, page 47) define as a family of knowledge representation formalisms that represent the knowledge of an application domain (the “world”) by first defining the relevant concepts of the domain (its terminology), and then using these concepts to specify properties of objects and individuals occurring in the domain (the world description). Description Logic divides a knowledge base into two components: the terminological box, or T-Box, and the assertional box, or A-Box. The former introduces the terminology (i.e. the vocabulary) of an application domain, while the latter contains assertions about instances of the concepts defined in the T-Box. Description Logic is a subset of FOL that provides a very good combination of decidability and expressiveness, and is the basis of OWL-DL. One of its extensions is Probabilistic Description Logic (Heinsohn, 1994; Jaeger, 1994), which extends the description logic ALC, a member of the AL-languages (Schmidt-Schauß & Smolka, 1991) obtained by adding full existential quantification and union constructors to the basic AL (attributive language). Another description logic with a probabilistic extension is SHOQ(D) (Horrocks & Sattler, 2001). SHOQ(D) is the basis of DAML+OIL (Horrocks, 2002), the language that resulted from merging two ontology languages under development in the US (DAML) and Europe (OIL), and that has since been superseded by OWL. Its probabilistic extension is called P-SHOQ (Giugno & Lukasiewicz, 2002), and is able to represent probabilistic information about concept and role instances (i.e. the A-Box). P-Classic (Koller *et al.*, 1997), another example of a probabilistic extension to DL, uses Bayesian inference mechanisms to extend the description logic CLASSIC.
In short, each probabilistic component is associated with a set P of p-classes, and each p-class C in set P is represented using a Bayesian network. A common characteristic of the above approaches is that they extend description logic, which is a decidable subset of first-order logic (FOL). Description logics are highly effective and efficient for the classification and subsumption problems they were designed to address. However, their ability to represent and reason about other commonly occurring kinds of knowledge is limited. One restrictive aspect of DL languages is their limited ability to represent constraints on the instances that can participate in a relationship. As an example, suppose we want to express that for a starship to be a threat to another starship in a specific type of situation, it is mandatory that the two individuals of class starship involved in the situation not be the same. Making sure the two starships are different in a specific situation is only possible in DL if we explicitly specify the concrete individuals involved in that situation. Indeed, stating that two “fillers” (i.e. the actual individuals of class Starship that will “fill the spaces” of concept starship in our statement) are not equal, without specifying their respective values, would require constructs such as *negation* and *equality role-value-maps*, which cannot be expressed in description logic. While equality role-value-maps provide an additional useful means of specifying structural properties of concepts, their inclusion makes the logic undecidable (Calvanese & De Giacomo, page 223).

1.8.2 Probabilistic Extensions to OWL

The ontology language OWL is a W3C recommendation and has been receiving a great deal of attention as the intended basis for the Semantic Web. Interestingly enough, there are relatively few research efforts aimed at extending OWL to represent uncertainty.
Among those is the research being done by Zhongli Ding and Yun Peng (2004) at the University of Maryland. Their main objective is to translate a Bayesian network model to an OWL ontology. The approach involves augmenting OWL semantics to allow probabilistic information to be represented via additional markups. The result would be a probabilistically annotated ontology that could then be translated to a Bayesian network. Such a translation would be based on a set of translation rules that rely on the probabilistic information attached to individual concepts and properties within the annotated ontology. The authors note that after a successful translation, the resulting Bayesian network will be associated with a joint probability distribution over the application domain. The authors acknowledge that a full translation of an ontology to a standard BN is impossible, given the limitations of the latter in terms of expressivity. This corroborates the comments made earlier in Section 2.3 on the limited expressiveness of Bayesian networks being a major shortcoming for the technology when applied to more complex problems. Indeed, we have already seen that ontologies provide an explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest, along with the relationships among them, which implies the need for a highly expressive language to capture all the relevant information of a given domain. OWL, like other ontology languages, relies on variants of FOL, so only a probabilistic FOL would be able to capture all the information included in an OWL ontology. Also focusing on Bayesian extensions geared towards the Semantic Web is the work of Gu *et al.* (2004), which takes an approach very similar to Ding’s. Another effort in this direction is the set of RDF extensions being developed by Yoshio Fukushige (2004).
In both cases, the representational limitations of Bayesian networks restrict the ability to express more complex probabilistic models, constraining these solutions to very specific classes of problems. It is primarily in those aspects that this work differs from the above approaches. Even though we share the idea of extending Web languages to accept probabilistic information, the main objective of this research is to show a feasible solution to the more general problem of the lack of probabilistic support for applications currently being developed for the Semantic Web. At this point, it should be clear that such an objective is only achievable with a powerful probabilistic language that does not have the representational limitations of Bayesian networks. One such approach is the work of Mike Pool at IET in developing an OWL-based implementation of Quiddity*Suite (Pool & Aikin, 2004), which is primarily being used in IET’s KEEPER project (Pool, 2004). Like our own work, Pool’s extensions provide a very expressive method for representing uncertainty in OWL ontologies, but our approaches diverge in two respects. First, we are focused on the more general problem of enabling probabilistic ontologies for the SW, whereas Pool’s work is mostly geared towards the use of Quiddity*Suite to represent OWL ontologies in projects such as KEEPER. The second major difference is that while we use MEBN logic as the underlying semantics of our work, Pool relies on Quiddity*Suite’s syntax and semantics to provide the logical framework for his extensions. Given our common use of Quiddity*Suite as a reasoner and our shared interest in developing coherent ways of representing uncertainty in ontologies, we have not only remained mutually aware of each other’s progress but have also collaborated closely during the research period of this Dissertation.
1.9 Probabilistic Languages with Near First-Order Expressive Power

In recent years, a number of languages have appeared that extend the expressiveness of standard directed and undirected graphical models in various ways. These languages include hidden Markov models, or HMMs (Baum & Petrie, 1966; Rabiner, 1989; Elliott *et al.*, 1995), which have been widely used in pattern recognition applications. HMMs can be viewed as a special case of dynamic Bayesian networks, or DBNs (Murphy, 1998). An HMM is a DBN having hidden states, with no internal structure, that *d*-separate observations at different time steps. Partially dynamic Bayesian networks, also called temporal Bayesian networks (Takikawa *et al.*, 2002), extend DBNs to include static variables. These formalisms augment standard Bayesian networks with a capability for temporal recursion. BUGS (Buntine, 1994a; Gilks *et al.*, 1994; Spiegelhalter *et al.*, 1996) is a software package that implements the Plates language. Plates represent repeated fragments of directed or undirected graphical models. Visually, a plate is represented as a rectangle enclosing a set of repeated nodes. The strengths of plates are the ability to handle continuous distributions without resorting to discretization, and support for parameter learning in a wide variety of parameterized statistical models. The main weakness is the lack of a direct, explicit way to represent uncertainty about model structure. Probabilistic efforts towards more expressive languages also involve undirected graphical models, which are usually applied to problems where there is no natural direction for probabilistic influences, such as image processing and terrain reasoning. One example is pattern theory (Grenander, 1995), which focuses on creating mathematical knowledge representations of complex systems, analyzing the mathematical properties of the resulting regular structures, and applying them to practically occurring patterns found in the real world.
Patterns are expressed through their typical behavior as well as through their variability around their typical form, and algorithms are derived for the understanding, recognition, and restoration of the observed patterns. The theory employs undirected graphs as a representational tool. Object-Oriented Bayesian Networks (Koller & Pfeffer, 1997; Bangsø & Wuillemin, 2000; Langseth & Nielsen, 2003) represent entities as instances of object classes with class-specific attributes and probability distributions. Probabilistic Relational Models (PRMs) (Pfeffer *et al.*, 1999; Getoor *et al.*, 2000; Getoor *et al.*, 2001; Pfeffer, 2001) integrate the relational data model (Codd, 1970) and Bayesian networks. PRMs extend standard Bayesian networks to handle multiple entity types and relationships among them, providing a consistent representation for probabilities over a relational database. PRMs cannot express arbitrary quantified first-order sentences and do not support recursion. Although PRMs augmented with DBNs can support limited forms of recursion, they still do not support general recursive definitions. Jaeger (1997) extends relational probabilistic models to allow recursion, but the extension is limited to finitely many random variables. DAPER (Heckerman *et al.*, 2004) combines the entity-relationship model with DAG models to express probabilistic knowledge about structured entities and their relationships. Any model constructed in Plates or PRM can be represented in DAPER, which thus serves as a unifying language for expressing relational probabilistic knowledge. DAPER expresses probabilistic models over finite databases, and cannot represent arbitrary first-order predicate calculus (FOPC) expressions involving quantifiers. Therefore, like the other languages discussed above, DAPER does not achieve full FOPC representational power. Most of the abovementioned work is still under development, and has provided undeniable improvements in the flexibility and expressiveness of probabilistic representation.
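To make the simplest of these formalisms concrete, the forward recursion of an HMM, viewed as a two-slice DBN in which the hidden state *d*-separates observations, can be sketched as follows. All numbers here are illustrative assumptions, not values from the text.

```python
# Sketch: an HMM seen as a simple DBN. The hidden state at time t
# d-separates observations; the forward recursion propagates belief.
# All probabilities below are made-up illustrative values.

def forward(prior, trans, emit, observations):
    """Filtered distribution over hidden states after each observation."""
    belief = prior[:]
    for obs in observations:
        # predict: push the belief through the transition model
        predicted = [sum(belief[i] * trans[i][j] for i in range(len(belief)))
                     for j in range(len(belief))]
        # update: weight by the observation likelihood, then normalize
        unnorm = [predicted[j] * emit[j][obs] for j in range(len(predicted))]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief

prior = [0.5, 0.5]                    # states: 0 = noise, 1 = signal
trans = [[0.9, 0.1], [0.2, 0.8]]      # P(state_t | state_{t-1})
emit = [[0.7, 0.3], [0.4, 0.6]]       # P(obs | state), obs in {0, 1}
print(forward(prior, trans, emit, [1, 1, 1]))
```

Three consecutive "signal-like" observations steadily raise the belief in the signal state, echoing the radar-pulse argument made earlier in the chapter.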
Multi-Entity Bayesian Networks (MEBN), which we will cover in the next chapter, is another formal system that combines FOL and probability theory. Among the major reasons that led us to adopt MEBN as the formal basis for PR-OWL are its ability to express joint distributions over models of arbitrary finitely axiomatizable first-order theories and its ability to add new axioms by Bayesian conditioning.
Multi-Entity Bayesian Networks

Multi-Entity Bayesian Networks integrate first-order logic with Bayesian probability. MEBN logic expresses probabilistic knowledge as a collection of MEBN fragments (MFrags) organized into MEBN Theories (MTheories). An MFrag represents a conditional probability distribution for the instances of its resident random variables given the values of instances of their parents in the fragment graph and given the context constraints. A collection of MFrags represents a joint probability distribution over an unbounded, possibly infinite number of instances of its random variables. The joint distribution is specified by means of the local distributions together with the conditional independence relationships implied by the fragment graphs. Context terms are used to specify constraints under which the local distributions apply. A collection of MFrags that satisfies consistency constraints ensuring the existence of a unique joint probability distribution over its random variables is called an MTheory. MTheories can express probability distributions over truth values of arbitrary first-order logic sentences, and can be used to express domain-specific ontologies that capture statistical regularities in a particular domain of application. In addition, MTheories can represent particular facts relevant to a given reasoning problem. Conditioning a prior distribution represented by an MTheory on its findings is the basis of probabilistic inference with MEBN logic. Support for decision constructs in MEBN is provided via Multi-Entity Decision Graphs (MEDGs), which are related to MEBN the same way influence diagrams are related to Bayesian networks. An MEDG can be applied in any application that requires optimizing over a set of alternatives (i.e. an MEDG policy) subject to the given constraints of a specific situation. MEBN logic also provides means of learning the structure of an MEBN Theory on the basis of data (i.e.
Bayesian learning), while parameter learning can be expressed as inference in MEBN theories that contain parameter random variables. In short, MEBN logic has the potential to serve as the mathematical backbone for future Semantic Web applications that can deal with plausible reasoning. Yet, as we will explain later in this work, some issues, such as the lack of built-in support for typing and polymorphism, had to be addressed before making use of the logic’s strengths. In order to clarify those issues and to provide a thorough understanding of MEBN primitives, this Chapter uses the Star Trek model presented in Section 2.3 as the background for explaining the principles of the logic as well as its representational power.

1.10 A More “Realistic” Sci-fi Scenario

The limited model of the previous section would be of little use in increasing the Captain’s awareness of the level of danger faced by the *Enterprise*. In addition to the model’s naïve assumptions, there were clear omissions, such as the assessment of the threat posed by a given starship, its ability and willingness to attack our own vessel, etc. These and other pertinent issues are addressed in the context of a richer scenario for which the power of MEBN is required. Like present-day Earth, 24th-Century outer space is not a politically trivial environment. The previous model clearly failed to consider the many different alien species, with diverse profiles, that populate the Universe as portrayed in the television series. Although MEBN logic can represent the full range of species inhabiting the Universe in the 24th century, for purposes of this work a few sample groups will suffice. Therefore, in the following pages the explicitly modeled species will be restricted to Friends^{17}, Cardassians, Romulans, and Klingons, while encounters with other possible races are addressed using the general label *Unknown*.
Cardassians are constantly at war with the Federation, so any encounter with them is considered a hostile event. Fortunately, they do not possess cloaking technology, which makes them easier to detect and discriminate. Romulans possess cloaking technology and are more ambiguous, behaving in a hostile manner in roughly half their encounters with Federation starships. Klingons, who also possess cloaking technology, have a peace agreement with the Federation of Planets, but their treacherous and aggressive behavior makes them less reliable than Friends. Finally, when facing an unknown species, the historical log of such events shows that out of every ten new encounters, only one was hostile. Apart from the species of its operators, a truly “realistic” model would consider each starship’s type, offensive power, ability to inflict harm on the *Enterprise* given its range, and numerous other features pertinent to the model’s purpose.

1.11 The Basics of MFrags

MEBN logic represents the world as comprised of entities that have attributes and are related to other entities. Random variables (RVs) represent features of entities and relationships among entities. Knowledge about attributes and relationships is expressed as a collection of MFrags organized into MTheories. An MFrag represents a conditional probability distribution for instances of its resident RVs given their parents in the fragment graph and the context nodes. An MTheory is a set of MFrags that collectively satisfies consistency constraints ensuring the existence of a unique joint probability distribution over instances of the RVs represented in each of the MFrags within the set. Like a BN, an MFrag contains nodes, representing RVs, arranged in a directed graph whose edges represent direct dependence relationships. An isolated MFrag can be roughly compared to a standard BN with known values for its root nodes and known local distributions for its non-root nodes.
For example, the MFrag of Figure 8 represents knowledge about the degree of danger to which our own starship is exposed. The fragment graph has seven nodes. The four nodes at the top of the figure are context nodes; the two shaded rectangular nodes below the context nodes are input nodes; and the bottom node is a resident node.

The Danger To Self MFrag

A node in an MFrag may have a parenthesized list of arguments. These arguments are placeholders for entities in the domain. For example, the argument *st* to *HarmPotential*(*st, t*) is a placeholder for an entity that has a potential to harm, while the argument *t* is a placeholder for the time step this instance represents. To refer to an actual entity in the domain, the argument is replaced with a *unique identifier*. By convention, unique identifiers begin with an exclamation point, and no two distinct entities can have the same unique identifier. The result of substituting unique identifiers for a RV’s arguments is one or more *instances* of that RV. For example, *HarmPotential*(!*ST*1, !*T*1) and *HarmPotential*(!*ST*2, !*T*1) are two instances of *HarmPotential*(*st, t*) that both occur in the time step !*T*1. The resident nodes of an MFrag have local distributions that define how their probabilities depend on the values of their parents in the fragment graph. In a complete MTheory, each random variable has exactly one *home MFrag*, where its local distribution is defined.^{18} Input and context nodes (e.g., *OpSpec*(*st*) or *IsOwnStarship*(*s*)) influence the distribution of the resident nodes, but their distributions are defined in their own home MFrags. Context nodes represent conditions that must be satisfied for the influences and local distributions of the fragment graph to apply. Context nodes may have value *True*, *False*, or *Absurd*.^{19} Context nodes having value *True* are said to be satisfied.
As an example, if the unique identifier for the *Enterprise* (i.e., !*ST*0) is substituted for the variable *s* in *IsOwnStarship*(*s*), the resulting hypothesis is true. If, instead, a different starship’s unique identifier (say, !*ST*1) is used, then the hypothesis is false. Finally, if the unique identifier of a non-starship (say, !*Z*1) replaces *s*, then the statement is absurd (i.e., it is absurd to ask whether or not a zone in space is one’s own starship). To avoid cluttering the fragment graph, the states of context nodes are not shown, unlike those of input and resident nodes. This is mainly because context nodes are Boolean nodes whose values are relevant only for deciding whether to use a resident random variable’s local distribution or its default distribution. No probability values are shown for the states of the nodes of the fragment graph in Figure 8. This is because nodes in a fragment graph do not represent individual random variables with well-defined probability distributions. Instead, a node in an MFrag represents a generic class of random variables. To draw inferences or declare evidence, we must create instances of the random variable classes. To find the probability distribution for an instance of *DangerToSelf*(*s, t*), the first step is to identify all instances of *HarmPotential*(*st, t*) and *OpSpec*(*st*) for which the context constraints are satisfied. If there are none, then the default distribution, which assigns value *Absurd* with probability 1, must be used. Otherwise, to complete the definition of the MFrag of Figure 8, a local distribution must be specified for its lone resident node, *DangerToSelf*(*s, t*). The pseudo-code of Figure 8 defines a local distribution for the danger to a starship due to all starships that influence its danger level. Local distributions in standard BNs are typically represented by static tables, which limits each node to a fixed number of parents.
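The three-valued evaluation of a context node such as *IsOwnStarship*(*s*) can be sketched as follows. The identifiers follow the text's exclamation-point convention, but the entity-type table is a hypothetical stand-in for whatever type information an MEBN implementation would maintain.

```python
# Sketch: evaluating the context node IsOwnStarship(s).
# The identifiers (!ST0, !ST1, !Z1) come from the text; the type
# table itself is a hypothetical stand-in for implementation detail.

ENTITY_TYPE = {"!ST0": "Starship", "!ST1": "Starship", "!Z1": "Zone"}
OWN_STARSHIP = "!ST0"   # the Enterprise's unique identifier

def is_own_starship(s: str) -> str:
    """Return 'True', 'False', or 'Absurd' for the context node."""
    if ENTITY_TYPE.get(s) != "Starship":
        return "Absurd"   # e.g. asking whether a zone is one's own starship
    return "True" if s == OWN_STARSHIP else "False"

print(is_own_starship("!ST0"), is_own_starship("!ST1"), is_own_starship("!Z1"))
# → True False Absurd
```

Only the "True" outcome satisfies the context constraint and allows the MFrag's local distribution to apply; the other two outcomes trigger the default distribution instead.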
In contrast, an instance of a node in an MTheory might have any number of parents. Thus, MEBN implementations (i.e. languages based on MEBN logic) must provide an expressive language for defining local distributions. In this work, the use of pseudo-code is intended to convey the idea of using local expressions to specify probability distributions, without committing to a particular syntax. Lines 3 to 5 cover the case in which there is at least one nearby starship operated by Cardassians and having the ability to harm the *Enterprise*. This is an uncomfortable situation for Captain Picard, the Enterprise’s commander, and his starship: the probability of an unacceptable danger to self is 0.90 plus the minimum of 0.10 and the result of multiplying 0.025 by the total number of starships that are harmful and operated by Cardassians. The remaining belief (i.e. the difference between 100% and the belief in state *Unacceptable*) is divided between *High* (80% of the remainder) and *Medium* (20% of the remainder), whereas belief in *Low* is zero. The remaining lines use similar formulas to cover the other possible configurations in which there exist starships with the potential to harm the *Enterprise* (i.e. *HarmPotential*(*st, t*) = *True*). The last conditional statement of the local expression covers the case in which no nearby starship can inflict harm upon the *Enterprise* (i.e. all nodes *HarmPotential*(*st, t*) have value *False*). In this case, the value of *DangerToSelf*(*s, t*) is *Low* with probability 1. Figure 9 depicts an instantiation of the Danger To Self MFrag in which there are four starships nearby, three of them operated by Cardassians and one by Romulans. The Romulan and two of the Cardassian starships are within a range at which they can harm the *Enterprise*, whereas the other Cardassian starship is too far away to inflict any harm.
An Instance of the Danger To Self MFrag

Following the procedure described in Figure 8, the belief for state *Unacceptable* is .975 (.90 + .025*3), and the beliefs for states *High*, *Medium*, and *Low* are .02 ((1-.975)*.8), .005 ((1-.975)*.2), and zero, respectively. In short, the pseudo-code covers all possible input node configurations by linking the danger level to the number of nearby starships that have the potential to harm our own starship. Table 2 gives a glimpse of what the resulting distribution for danger level looks like as a function of the number and species of nearby starships with the potential to harm the *Enterprise*.

Sample Parts of the Danger To Self MFrag Probability Distribution
| Relevant Starships Nearby | Danger Level Dist. [Unacceptable, High, Medium, Low] |
| --- | --- |
| At least 1 Cardassian | [0.925, 0.06, 0.015, 0] |
| At least 2 Cardassians | [0.95, 0.04, 0.01, 0] |
| At least 3 Cardassians | [0.975, 0.02, 0.005, 0] |
| 4 or more Cardassians | [1, 0, 0, 0] |
| No Cardassians but at least 1 Romulan | [0.73, 0.162, 0.081, 0.027] |
| No Cardassians but at least 2 Romulans | [0.76, 0.144, 0.072, 0.024] |
| … | … (see formula) |
| No Cardassians but 10 or more Romulans | [1, 0, 0, 0] |
| No Cardassians or Romulans, 1 Unknown | [0.02, 0.48, 0.48, 0.02] |
| … | … (see formula) |
| No Cardassians or Romulans, 10+ Unknown | [0.20, 0.30, 0.30, 0.20] |
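The Cardassian branch of the local expression described above (0.90 plus 0.025 per harmful Cardassian starship, capped at 1.0, with the remainder split 80/20 between *High* and *Medium* and *Low* at zero) can be sketched directly. The function name is ours, and the remaining branches of the pseudo-code (Romulan, Unknown, no harmful starships) are omitted.

```python
# Sketch of the Cardassian branch of the DangerToSelf local expression
# (lines 3-5 of the Figure 8 pseudo-code). Function name is ours;
# the other branches of the pseudo-code are not reproduced here.

def danger_to_self_cardassian(n_harmful_cardassians: int):
    """[Unacceptable, High, Medium, Low] given n harmful Cardassian ships."""
    unacceptable = 0.90 + min(0.10, 0.025 * n_harmful_cardassians)
    remainder = 1.0 - unacceptable
    return [unacceptable, remainder * 0.8, remainder * 0.2, 0.0]

print(danger_to_self_cardassian(3))   # the Figure 9 case
```

For the Figure 9 instantiation (three harmful Cardassian starships) this reproduces the beliefs computed in the text: .975, .02, .005, and zero.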