2.2 Bayesian Networks - Directed Graphs
There are different ways of viewing a BN. From a syntax perspective, a BN is comprised of nodes, arcs, and parameters. From a semantics perspective, there are two equivalent ways of understanding a BN: one is that it is a compact representation of a JPD; the other is that it is a representation of the dependency/independency assumptions of a JPD. In this section I focus on the latter, because this thesis is about the structural aspect of the model. For completeness, I also present the theorem that ties the different ways of understanding BNs together.
2.2.1 BN as a representation of dependency/independency assumptions
If the nodes in a DAG correspond to a set of random variables U, then the DAG is called a Bayesian Network (BN) structure.
The d-separation criterion
Definition: Given a DAG D=(U,E), nodes X and Y in U and a set of nodes Z in U, a trail from node X to Y is blocked by Z if one of the following conditions is true:
1. There is a node Z on the trail that does not have a head-to-head structure on the trail, and Z is in the set Z.
2. There is a node W on the trail that has a head-to-head structure on the trail, and W is not in Z and neither are any of W's descendants.
Otherwise the trail is said to be activated by Z.
The separation criterion for DAGs is a direction-dependent criterion called d-separation. X and Y are d-separated by Z in a DAG D if every trail from X to Y in D is blocked (this is denoted dsep_D(X,Y|Z)). The same is defined for disjoint sets of nodes X, Y, Z: if for every node X in X and every node Y in Y all trails between the two nodes are blocked by Z, then X and Y are d-separated by Z (denoted dsep_D(X,Y|Z)) [65] (p.203), see figure 2.2.
Figure 2.3 d-separation. The trails in (a), (b) and (c) are blocked because condition 1 is true, i.e. there is a node Z on the trail from X to Y such that Z does not have a head-to-head structure and Z is in the set Z. The trail in (d) is blocked because condition 2 is true, i.e. there is a node W on the trail from X to Y such that W has a head-to-head structure, W is not in Z and neither are any of W's descendants. Note that X and Y can be sets of nodes [82].
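The two blocking conditions can be checked mechanically. The sketch below (my own illustration, not from the thesis; the DAG is assumed to be given as a hypothetical parents-of mapping, and the function name is invented) decides dsep_D(X,Y|Z) using the standard reachability reading of the criterion, often called the "Bayes-ball" formulation:

```python
def d_separated(parents, x, y, z):
    """Decide whether nodes x and y are d-separated by the set z in a DAG.

    `parents` maps every node to the set of its parents (a representation
    chosen for this sketch). y is d-separated from x iff no active trail
    reaches it, per the two blocking conditions above.
    """
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)

    # Ancestors of z (z included): a head-to-head node is active iff it is here.
    anc, stack = set(z), list(z)
    while stack:
        for p in parents[stack.pop()]:
            if p not in anc:
                anc.add(p)
                stack.append(p)

    # Traverse active trails; 'up' = arrived from a child, 'down' = from a parent.
    visited, stack = set(), [(x, 'up')]
    while stack:
        n, d = stack.pop()
        if (n, d) in visited:
            continue
        visited.add((n, d))
        if n == y:                       # reached y along an active trail
            return False
        if d == 'up' and n not in z:     # non-head-to-head step, blocked if n in z
            stack += [(p, 'up') for p in parents[n]]
            stack += [(c, 'down') for c in children[n]]
        elif d == 'down':
            if n not in z:               # chain/fork continues through n
                stack += [(c, 'down') for c in children[n]]
            if n in anc:                 # head-to-head at n activated by z
                stack += [(p, 'up') for p in parents[n]]
    return True
```

On the chain X -> Z -> Y this returns True for z = {Z} and False for z = {}; on the structure X -> W <- Y it returns True for z = {} and False once W or one of its descendants is in z, matching the figure.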
Markov Condition
Given any DAG D, according to the d-separation criterion every node X in D is d-separated from its non-descendant nodes by the parents of X (that is, for every X, dsep_D(X, NonDescendants(X) | Parents(X)) is true). This is called the Markov Condition and denoted Markov(D).
d-separation and conditional independency assumptions
Definition: Given a JPD F of U and a DAG D=(U,E), D is a Bayesian Network of F if every d-separation condition displayed in D corresponds to a valid conditional independence relationship in F. In other words, for all disjoint sets of nodes X, Y, Z, if X is d-separated from Y by Z in D then X is conditionally independent of Y given Z in F (dsep_D(X,Y|Z) ⇒ Ind_F(X,Y|Z)).
The d-separation criterion for reading conditional independency assumptions from the graph topology enables the representation of the following relationships:
- Independent: A and B are not connected by any trail in the graph; A and B do not influence each other.
- Dependent: A → B; A and B can directly influence each other (A influences B and, according to Bayes' rule, B influences A).
- Conditionally independent: A → C → B; A and B can influence each other through C. They stop influencing each other once C is known.
- Marginally independent: A → C ← B; A and B do not influence each other if nothing is known. They can become influential on each other if C or one of its descendants is known. This is a dependency relationship among variables which is induced by information.
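The last case can be made concrete with a small numeric example. In the sketch below (the network A → C ← B and all CPT numbers are made up for illustration), brute-force enumeration of the joint distribution shows that observing B alone tells us nothing about A, but once C is observed, learning B changes the belief in A: the information about C "explains away" part of the evidence for A.

```python
from itertools import product

# Hypothetical collider A -> C <- B over binary variables with made-up CPTs.
p_a = {0: 0.7, 1: 0.3}                                       # P(A)
p_b = {0: 0.6, 1: 0.4}                                       # P(B)
p_c1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(C=1 | A, B)

def joint(a, b, c):
    pc = p_c1[(a, b)] if c == 1 else 1.0 - p_c1[(a, b)]
    return p_a[a] * p_b[b] * pc

def prob(query, evidence):
    """P(query | evidence) by brute-force enumeration over all assignments."""
    num = den = 0.0
    for a, b, c in product([0, 1], repeat=3):
        v = {'A': a, 'B': b, 'C': c}
        if any(v[k] != x for k, x in evidence.items()):
            continue
        p = joint(a, b, c)
        den += p
        if all(v[k] == x for k, x in query.items()):
            num += p
    return num / den

# Marginally independent: knowing B alone does not change the belief in A.
assert abs(prob({'A': 1}, {}) - prob({'A': 1}, {'B': 1})) < 1e-9
# Explaining away: once C is observed, learning B lowers the belief in A.
print(prob({'A': 1}, {'C': 1}))            # raised above the prior P(A=1)
print(prob({'A': 1}, {'C': 1, 'B': 1}))    # lower: B "explains away" C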
A special d-separating set is the Markov blanket. The Markov blanket of a node comprises all the parents of the node, all the children of the node, and all the other parents of those children. Once the Markov blanket of a node is known, the node becomes d-separated from the rest of the nodes in the graph [50] (p.14).
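The three-part definition translates directly into code; a minimal sketch (assuming, as above, a hypothetical parents-of mapping for the DAG):

```python
def markov_blanket(parents, x):
    """Markov blanket of x: its parents, its children, and the other
    parents of its children. `parents` maps each node to the set of its
    parents (a representation chosen for this sketch)."""
    children = {c for c, ps in parents.items() if x in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[x]) | children | co_parents) - {x}
```

For example, in the DAG A → C, B → C, C → D, E → D, the Markov blanket of C is {A, B, D, E}: its parents A and B, its child D, and D's other parent E.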
Initial assumptions
Theorem: Given a JPD F of U and a DAG D=(U,E), D is a Bayesian Network of F if Markov(D) holds in F, that is, every X is conditionally independent of all its non-descendants if the parents of X are known. In other words, given a BN structure D for which Markov(D) holds with respect to some JPD F, for all disjoint sets X, Y and Z, if X and Y are d-separated by Z in D then X and Y are conditionally independent given Z in F (dsep_D(X,Y|Z) ⇒ Ind_F(X,Y|Z)).
As in the undirected-graph representation, this theorem provides an algorithm for constructing a BN structure for a given JPD F from only an initial set of assumptions. It also ensures that all d-separation relationships that can be read off by the d-separation criterion represent valid conditional independency assumptions in F.
The DAG structure provides an elegant way of representing a set of initial independence assumptions. The d-separation criterion, together with some efficient algorithms, provides the means to find all independencies that follow from the graph topology, or to check whether a given independency holds. However, it is important to note that in some distributions (even simple ones) not all independencies can be represented by a DAG [72] (p.126). Nevertheless, the main use of the DAG, from a computational perspective, is to exploit as many independencies as possible in order to make the inference task more efficient. The fact that some independencies are missed only means that in those cases the computation process might be less efficient; since all independencies that are in the DAG are also in F, the computation process will always be correct.
2.2.2 Parameter Considerations (The Quantitative Part)
The quantitative part is comprised of a conditional probability table (CPT) attached to each node in the DAG. This table specifies the conditional probabilities of the states of the node given all possible combinations of the states of its parents. These tables encode the strength of the connection between the variable represented by the node and the variables represented by its parents in the graph. If the node has no parents, the CPT contains the unconditioned probabilities of the states of the node.
2.2.3 BN as a compact representation of a JPD
Theorem: Let U = {X_{1},…,X_{n}} be a set of variables, F a JPD on U and D=(U,E) a DAG; then D is a Bayesian network of F iff

F(X_{1},…,X_{n}) = ∏_{i=1}^{n} P(X_{i} | Parents(X_{i}))    (7)

Equation (7) provides a factorised and compact way of representing the JPD, using the attached CPTs.
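As a concrete illustration of this factorisation, the sketch below builds a hypothetical three-node network A → B, A → C, B → C with made-up binary CPTs and computes any JPD entry as a product of one CPT entry per node:

```python
from itertools import product

# Hypothetical network A -> B, A -> C, B -> C with made-up binary CPTs.
parents = {'A': (), 'B': ('A',), 'C': ('A', 'B')}
cpt = {               # cpt[node][parent values] = P(node = 1 | parents)
    'A': {(): 0.4},
    'B': {(0,): 0.2, (1,): 0.7},
    'C': {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9},
}

def joint(assignment):
    """One JPD entry as the product of one CPT entry per node (equation (7))."""
    p = 1.0
    for node, ps in parents.items():
        p1 = cpt[node][tuple(assignment[q] for q in ps)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

# The factorised entries form a valid JPD: they sum to one.
total = sum(joint(dict(zip('ABC', v))) for v in product([0, 1], repeat=3))
```

The compactness is in the parameter count: here eight JPD entries are recovered from seven CPT numbers, and the gap widens rapidly as the number of variables grows.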
For the purpose of this thesis BNs are described herein as a representation of dependency/independency assumptions. However, for other purposes it is more intuitive to describe them as a compact representation of a JPD. For this reason BNs are defined in most of the literature as follows [50, 74, 82] ((p.18), (p.364), (p.436)). A BN is a DAG in which the following statements are fulfilled:
- The nodes of the graph represent random variables.
- Each variable has a finite set of mutually exclusive states.
- A set of directed arcs connects pairs of nodes.
- Each node has a CPT attached to it.
Under these conditions every entry in the JPD can be calculated, i.e. any probabilistic query can be answered using equation (7).
2.2.4 Reasoning
The nodes in the model can be logically divided into query nodes and observation nodes. We know nothing about the state of the query nodes but we want to reason about them; the observation nodes become known once they are observed. Inference then updates the probabilities of the query nodes in light of the information about the known nodes. Thus, together with inference algorithms, a BN enables the following patterns of reasoning, which can be explained in causal or graphical terms:
- Diagnostic reasoning: information flows from effects to causes, or from 'children' to 'parents'.
- Causal reasoning (or prediction): information flows from causes to effects, or from 'parents' to 'children'.
- Intercausal reasoning (also called explaining away): information flows between causes of a common effect, or between 'parents' of the same 'child'.
- Mixed reasoning: information flows between nodes combining the above structures.
Figure 2.4 Different reasoning patterns. Simple structures that demonstrate the different patterns of reasoning that are possible in a BN [82].
2.2.5 Node elimination algorithm
The requirement that the Markov(D) assumptions hold in F provides a simple procedure for constructing a BN structure, given a JPD F over a set of variables U. The variables in U are set in the order X_{1},X_{2},…,X_{n}. The first node X_{1} is chosen, then the second node X_{2}; if X_{2} depends on X_{1} according to the JPD, an arc from X_{1} to X_{2} is drawn. The process continues this way: node X_{i} is chosen and each of the nodes X_{1},…,X_{i-1} is examined to see whether X_{i} depends on it. An arc is drawn from each node on which X_{i} is found to be dependent, to the X_{i} node. The result is a BN for the given JPD. This process is relevant only if the JPD is given.
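The procedure above can be sketched as follows; the `depends` oracle is a hypothetical stand-in for the judgment "does X_i depend on X_j given the arcs drawn so far", answered from the JPD when it is available:

```python
def build_structure(order, depends):
    """Ordering-based construction sketched above. `order` lists the
    variables X_1, ..., X_n; `depends(xi, xj, parents)` answers whether
    X_i depends on X_j given the arcs drawn so far (from the JPD, or
    from an expert when the JPD is unknown)."""
    parents = {x: set() for x in order}
    for i, xi in enumerate(order):
        for xj in order[:i]:
            if depends(xi, xj, parents):
                parents[xi].add(xj)  # draw the arc X_j -> X_i
    return parents
```

Because arcs only ever run from earlier variables to later ones, the function always returns a DAG, but the density of the result depends entirely on `order`, which is exactly the ordering sensitivity discussed next.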
In real situations the JPD is unknown, and dependencies and independencies are subject to human judgment. In such situations the procedure described above is still valid, but the answer to the question of whether X_{i} depends on any of the nodes previously added to the network rests on an expert's judgment and on the nodes already added. In any case, because arcs are drawn from existing nodes to the one being added, the procedure for constructing a BN as described above is heavily dependent on the ordering of the variables: the resultant structure can be sparse for a good ordering and highly connected for a different one, e.g. [82] (p.442).
Since for n variables there are n! possible orderings, one would need to explore all of them in order to determine which structure is best. Naturally, this is not feasible; therefore, other approaches to the construction of BNs had to be sought.
