This paper introduces a new probabilistic graphical model called gated Bayesian network (GBN). This model evolved from the need to represent real world processes that include several distinct phases. In essence a GBN is a model that combines several Bayesian networks (BN) in such a manner that they may be active or inactive during queries to the model. We use objects called gates to combine BNs, and to activate and deactivate them when predefined logical statements are satisfied. These statements are based on combinations of posterior probabilities of the variables in the BNs. Although GBN is a new formalism there are features of GBNs that are similar to other formalisms and research, including influence diagrams, context-specific independence and structural adaptation.
Gated Bayesian networks (GBNs) are a recently introduced extension of Bayesian networks that aims to model dynamical systems consisting of several distinct phases. In this paper, we present an algorithm for semi-automatic learning of GBNs. We use the algorithm to learn GBNs that output buy and sell decisions for use in algorithmic trading systems. We show how using the learnt GBNs can substantially lower risks towards invested capital, while at the same time generating similar or better rewards, compared to the benchmark investment strategy buy-and-hold.
Gated Bayesian networks (GBNs) are a recently introduced extension of Bayesian networks that aims to model dynamical systems consisting of several distinct phases. In this paper, we present an algo- rithm for semi-automatic learning of GBNs. We use the algorithm to learn GBNs that output buy and sell decisions for use in algorithmic trading systems. We show how using the learnt GBNs can substantially lower risks towards invested capital, while at the same time generating similar or better rewards, compared to the benchmark investment strat- egy buy-and-hold.
Bayesian networks (BNs) are advantageous when representing single independence models, however they do not allow us to model changes among the relationships of the random variables over time. Due to such regime changes, it may be necessary to use different BNs at different times in order to have an appropriate model over the random variables. In this paper we propose two extensions to the traditional hidden Markov model, allowing us to represent both the different regimes using different BNs, and potential driving forces behind the regime changes, by modelling potential dependence between state transitions and some observable variables. We show how expectation maximisation can be used to learn the parameters of the proposed model, and run both synthetic and real-world experiments to show the model’s potential.
When there are several experts in a specific domain, each may believe in a different Bayesian network (BN) representation of the domain. In order to avoid having to work with several BNs, it is desirable to aggregate them into a single BN. One way of finding the aggregated BN is to start by finding the structure, and then find the parameters. In this paper, we focus on the second step, assuming that the structure has been found by some previous method.
DemocraticOP is a new way of combining experts’ parameters in a model. The logic behind this approach is borrowed from the concept of democracy in the real world. We assume that there is a ground truth and that each expert represents a deviation from it - the goal is to try to find the ground truth based on the experts’ opinions. If the experts do not agree, then taking a simple average of their opinions (as occurs in classical aggregation functions such as LinOP and LogOP) is flawed. Instead, we believe it is better to identify similar opinions through clustering, and then apply averaging, or any other aggregation function, over the cluster with the highest number of members to obtain the aggregated parameters that are closest to the ground truth. In other words, respect the majority as is done in democratic societies instead of averaging over all experts’ parameters. The new approach is implemented and tested over several BNs with different numbers of variables and parameters, and with different numbers of experts. The results show that DemocraticOP outperforms two commonly used methods, LinOP and LogOP, in three key metrics: the average of absolute value of the difference between the true probability distribution and the one corresponding to the aggregated parameters, Kullback-Leibler divergence, and running time.
This paper aims at justifying LWF and AMP chain graphs by showing that they do not represent arbitrary independence models. Specifically, we show that every chain graph is inclusion optimal wrt the intersection of the independence models represented by a set of directed and acyclic graphs under conditioning. This implies that the independence model represented by the chain graph can be accounted for by a set of causal models that are subject to selection bias, which in turn can be accounted for by a system that switches between different regimes or configurations.
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of data distributions for which there exist consistent, polynomial-time algorithms. We also prove that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL and propose two consistent, polynomial-time algorithms. We argue that the distribution classes considered are reasonable in many practical cases, so that our results simplify feature selection in a wide range of machine learning tasks.
n/a
We address some computational issues that may hinder the use of AMP chain graphs in practice. Specifically, we show how a discrete probability distribution that satisfies all the independencies represented by an AMP chain graph factorizes according to it. We show how this factorization makes it possible to perform inference and parameter learning efficiently, by adapting existing algorithms for Markov and Bayesian networks. Finally, we turn our attention to another issue that may hinder the use of AMP CGs, namely the lack of an intuitive interpretation of their edges. We provide one such interpretation.
This paper deals with chain graphs under the Andersson-Madigan-Perlman (AMP) interpretation. In particular, we present a constraint based algorithm for learning an AMP chain graph a given probability distribution is faithful to. Moreover, we show that the extension of Meeks conjecture to AMP chain graphs does not hold, which compromises the development of efficient and correct score + search learning algorithms under assumptions weaker than faithfulness. We also study the problem of how to represent the result of marginalizing out some nodes in an AMP CG. We introduce a new family of graphical models that solves this problem partially. We name this new family maximal covariance-concentration graphs because it includes both covariance and concentration graphs as subfamilies.
We present a new family of models that is based on graphs that may have undirected, directed and bidirected edges. We name these new models marginal AMP (MAMP) chain graphs because each of them is Markov equivalent to some AMP chain graph under marginalization of some of its nodes. However, MAMP chain graphs do not only subsume AMP chain graphs but also multivariate regression chain graphs. We describe global and pairwise Markov properties for MAMP chain graphs and prove their equivalence for compositional graphoids. We also characterize when two MAMP chain graphs are Markov equivalent. For Gaussian probability distributions, we also show that every MAMP chain graph is Markov equivalent to some directed and acyclic graph with deterministic nodes under marginalization and conditioning on some of its nodes. This is important because it implies that the independence model represented by a MAMP chain graph can be accounted for by some data generating process that is partially observed and has selection bias. Finally, we modify MAMP chain graphs so that they are closed under marginalization for Gaussian probability distributions. This is a desirable feature because it guarantees parsimonious models under marginalization.
In an independence model, the triplets that represent conditional independences between singletons are called elementary. It is known that the elementary triplets represent the independence model unambiguously under some conditions. In this paper, we show how this representation helps performing some operations with independence models, such as nding the dominant triplets or a minimal independence map of an independence model, or computing the union or intersection of a pair of independence models, or performing causal reasoning. For the latter, we rephrase in terms of conditional independences some of Pearl's results for computing causal e ects.
Marginal AMP chain graphs are a recently introduced family of models that is based on graphs that may have undirected, directed and bidirected edges. They unify and generalize the AMP and the multivariate regression interpretations of chain graphs. In this paper, we present a constraint based algorithm for learning a marginal AMP chain graph from a probability distribution which is faithful to it. We show that the marginal AMP chain graph returned by our algorithm is a distinguished member of its Markov equivalence class. We also show that our algorithm performs well in practice. Finally, we show that the extension of Meeks conjecture to marginal AMP chain graphs does not hold, which compromises the development of efficient and correct score+search learning algorithms under assumptions weaker than faithfulness. (C) 2015 Elsevier Inc. All rights reserved.
Consider a classification problem involving only discrete features that are represented as random variables with some prescribed discrete sample space. In this paper, we study the complexity of two feature selection problems. The first problem consists in finding a feature subset of a given size k that has minimal Bayes risk. We show that for any increasing ordering of the Bayes risks of the feature subsets (consistent with an obvious monotonicity constraint), there exists a probability distribution that exhibits that ordering. This implies that solving the first problem requires an exhaustive search over the feature subsets of size k. The second problem consists of finding the minimal feature subset that has minimal Bayes risk. In the light of the complexity of the first problem, one may think that solving the second problem requires an exhaustive search over all of the feature subsets. We show that, under mild assumptions, this is not true. We also study the practical implications of our solutions to the second problem.
This book contains contributions by current and former colleagues and PhD students of professor Nahid Shahmehri in celebration of her 60th birthday. Although it would be difficult to cover the full range of her academic contributions, we have at least been able to hint the importance and the breadth of her work. We have chosen the title ‘Advances in Secure and Networked Information Systems - The ADIT Perspective’ as many of the contributions of Nahid and her group have been in these areas, given a fairly broad interpretation of “networked information systems”. In this collection we have gathered both republications of past work as well as newly written articles.
In [6], MCMC sampling is applied to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend that work from 20 to 31 nodes. We also extend that work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is asymptotically 1. We also discuss the implications of these results for learning DAGs from data.
Any regular Gaussian probability distribution that can be represented by an AMP chain graph (CG) can be expressed as a system of linear equations with correlated errors whose structure depends on the CG. However, the CG represents the errors implicitly, as no nodes in the CG correspond to the errors. We propose in this paper to add some deterministic nodes to the CG in order to represent the errors explicitly. We call the result an EAMP CG. We will show that, as desired, every AMP CG is Markov equivalent to its corresponding EAMP CG under marginalization of the error nodes. We will also show that every EAMP CG under marginalization of the error nodes is Markov equivalent to some LWF CG under marginalization of the error nodes, and that the latter is Markov equivalent to some directed and acyclic graph (DAG) under marginalization of the error nodes and conditioning on some selection nodes. This is important because it implies that the independence model represented by an AMP CG can be accounted for by some data generating process that is partially observed and has selection bias. Finally, we will show that EAMP CGs are closed under marginalization. This is a desirable feature because it guarantees parsimonious models under marginalization.
This paper deals with chain graphs under the classic Lauritzen-Wermuth-Frydenberg interpretation. We prove that the strictly positive discrete probability distributions with the prescribed sample space that factorize according to a chain graph G with dimension d have positive Lebesgue measure wrt R-d. whereas those that factorize according to G but are not faithful to it have zero Lebesgue measure wrt R-d. This means that, in the measure-theoretic sense described, almost all the strictly positive discrete probability distributions with the prescribed sample space that factorize according to G are faithful to it.
This paper deals with chain graphs under the classic Lauritzen-Wermuth-Frydenberg interpretation. We prove that almost all the regular Gaussian distributions that factorize with respect to a chain graph are faithful to it. This result has three important consequences. First, chain graphs are more powerful than undirected graphs and acyclic directed graphs for representing regular Gaussian distributions, as some of these distributions can be represented exactly by the former but not by the latter. Second, the moralization and c-separation criteria for reading independencies from a chain graph are complete, in the sense that they identify all the independencies that can be identified from the chain graph alone. Third, some definitions of equivalence in chain graphs coincide and, thus, they have the same graphical characterization.
Suppose that multiple experts (or learning algorithms) provide us with alternative Bayesian network (BN) structures over a domain, and that we are interested in combining them into a single consensus BN structure. Specifically, we are interested in that the consensus BN structure only represents independences all the given BN structures agree upon and that it has as few parameters associated as possible. In this paper, we prove that there may exist several non-equivalent consensus BN structures and that finding one of them is NP-hard. Thus, we decide to resort to heuristics to find an approximated consensus BN structure. In this paper, we consider the heuristic proposed by Matzkevich and Abramson, which builds upon two algorithms, called Methods A and B, for efficiently deriving the minimal directed independence map of a BN structure relative to a given node ordering. Methods A and B are claimed to be correct although no proof is provided (a proof is just sketched). In this paper, we show that Methods A and B are not correct and propose a correction of them.
This paper deals with chain graphs under the alternative Andersson-Madigan-Perlman(AMP) interpretation. In particular, we present a constraint based algorithm for learningan AMP chain graph a given probability distribution is faithful to. We also show that theextension of Meek's conjecture to AMP chain graphs does not hold, which compromises thedevelopment of ecient and correct score+search learning algorithms under assumptionsweaker than faithfulness.
In many cases what matters is not whether a false discovery is made or not but the expected proportion of false discoveries among all the discoveries made, i.e. the so-called false discovery rate (FDR). We present an algorithm aiming at controlling the FDR of edges when learning Gaussian graphical models (GGMs). The algorithm is particularly suitable when dealing with more nodes than samples, e.g. when learning GGMs of gene networks from gene expression data. We illustrate this on the Rosetta compendium [8].
The covariancegraph (aka bi-directed graph) of a probability distribution p is the undirected graphG where two nodes are adjacent iff their corresponding random variables are marginally dependent in p. (It is worth mentioning that our definition of covariancegraph is somewhat non-standard. The standard definition states that the lack of an edge between two nodes of G implies that their corresponding random variables are marginally independent in p. This difference in the definition is important in this paper.) In this paper, we present a graphical criterion for readingdependencies from G, under the assumption that p satisfies the graphoid properties as well as weak transitivity and composition. We prove that the graphical criterion is sound and complete in certain sense. We argue that our assumptions are not too restrictive. For instance, all the regular Gaussian probability distributions satisfy them.
We present a graphical criterion for reading dependencies from the minimal directed in-dependence map G of a graphoid p, under the assumption that G is a polytree and p satisfies weak transitivity. We prove that the criterion is sound and complete. We argue that assuming weak transitivity is not too restrictive.
Acyclic directed mixed graphs (ADMGs) are the graphs used by Pearl (Causality: models, reasoning, and inference. Cambridge University Press, Cambridge, 2009) for causal effect identification. Recently, alternative acyclic directed mixed graphs (aADMGs) have been proposed by Peña (Proceedings of the 32nd conference on uncertainty in artificial intelligence, 577–586, 2016) for causal effect identification in domains with additive noise. Since the ADMG and the aADMG of the domain at hand may encode different model assumptions, it may be that the causal effect of interest is identifiable in one but not in the other. Causal effect identification in ADMGs is well understood. In this paper, we introduce a sound algorithm for identifying arbitrary causal effects from aADMGs. We show that the algorithm follows from a calculus similar to Pearl’s do-calculus. Then, we turn our attention to Andersson–Madigan–Perlman chain graphs, which are a subclass of aADMGs, and propose a factorization for the positive discrete probability distributions that are Markovian with respect to these chain graphs. We also develop an algorithm to perform maximum likelihood estimation of the factors in the factorization.
Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene-expression data is problematic. Most gene-expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundreds, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem. Results: We propose a new algorithm for learning BN models of gene networks from gene-expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for the genes that depend on S such that less than R other genes mediate the dependency. Our algorithm grows the BN, which initially only contains S, by repeating the following step R + 1 times and, then, pruning some genes, find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results. © The Author 2005. Published by Oxford University Press. All rights reserved.
We study cross-validation as a scoring criterion for learning dynamic Bayesian network models that generalize well. We argue that cross-validation is more suitable than the Bayesian scoring criterion for one of the most common interpretations of generalization. We confirm this by carrying out an experimental comparison of cross-validation and the Bayesian scoring criterion, as implemented by the Bayesian Dirichlet metric and the Bayesian information criterion. The results show that cross-validation leads to models that generalize better for a wide range of sample sizes. © 2005 Elsevier B.V. All rights reserved.
We propose a framework for learning from data and validating Bayesian network models of gene networks. The learning phase selects multiple locally optimal models of the data and reports the best of them. The validation phase assesses the confidence in the model reported by studying the different locally optimal models obtained in the learning phase. We prove that our framework is asymptotically correct under the faithfulness assumption. Experiments with real data (320 samples of the expression levels of 32 genes involved in Saccharomyces cerevisiae, i.e. baker’s yeast, pheromone response) show that our framework is reliable.
We propose an algorithm for learning the Markov boundary of a random variable from data without having to learn a complete Bayesian network. The algorithm is correct under the faithfulness assumption, scalable and data efficient. The last two properties are important because we aim to apply the algorithm to identify the minimal set of random variables that is relevant for probabilistic classification in databases with many random variables but few instances. We report experiments with synthetic and real databases with 37, 441 and 139352 random variables showing that the algorithm performs satisfactorily.
Many optimization problems are what can be called globally multimodal, i.e., they present several global optima. Unfortunately, this is a major source of difficulties for most estimation of distribution algorithms, making their effectiveness and efficiency degrade, due to genetic drift. With the aim of overcoming these drawbacks for discrete globally multimodal problem optimization, this paper introduces and evaluates a new estimation of distribution algorithm based on unsupervised learning of Bayesian networks. We report the satisfactory results of our experiments with symmetrical binary optimization problems. © 2005 by the Massachusetts Institute of Technology.
We present a sound and complete graphical criterion for reading dependencies from the minimal undirected independence map G of a graphoid M that satisfies weak transitivity. Here, complete means that it is able to read all the dependencies in M that can be derived by applying the graphoid properties and weak transitivity to the dependencies used in the construction of G and the independencies obtained from G by vertex separation. We argue that assuming weak transitivity is not too restrictive. As an intermediate step in the derivation of the graphical criterion, we prove that for any undirected graph G there exists a strictly positive discrete probability distribution with the prescribed sample spaces that is faithful to G. We also report an algorithm that implements the graphical criterion and whose running time is considered to be at most O(n(2)(e + n)) for n nodes and e edges. Finally, we illustrate how the graphical criterion can be used within bioinformatics to identify biologically meaningful gene dependencies.
We propose algorithms for learning Markov boundaries from data without having to learn a Bayesian network first. We study their correctness, scalability and data efficiency. The last two properties are important because we aim to apply the algorithms to identify the minimal set of features that is needed for probabilistic classification in databases with thousands of features but few instances, e.g. gene expression databases. We evaluate the algorithms on synthetic and real databases, including one with 139,351 features. © 2006 Elsevier Inc. All rights reserved.
This paper presents and proves an extension of Meek’s conjecture to chain graphs under the Lauritzen-Wermuth-Frydenberg interpretation. The proof of the conjecture leads to the development of a structure learning algorithm that finds an inclusion optimal chain graph for any given probability distribution satisfying the composition property. Finally, the new algorithm is experimentally evaluated.
Learning an optimal chain graph from data is an important hard computational problem. We present a new approach to solve this problem for various objective functions without making any assumption on the probability distribution at hand. Our approach is based on encoding the learning problem declaratively using the answer set programming (ASP) paradigm. Empirical results show that our approach provides at least as accurate solutions as the best solutions provided by the existing algorithms, and overall provides better accuracy than any single previous algorithm.