Part 4b : Approximate Inference in Bayesian Networks


  • For many real-world problems, exact inference is intractable :
    a solution : look for ways to compute approximate answers with tractable computational effort
    • Optimisation-based Methods (Variational Inference)
    • Stochastic Sampling Methods (Particle-based Methods)
  • Approximate Inference via Stochastic Sampling
    • approximate P(X) via a limited set of points randomly drawn from P
    • (random) samples/particles
    • estimate any desired probability P(x) from these samples
  • consistent
    A sampling algorithm that satisfies the following property:
  • In the limit (M → ∞), the empirical distribution \hat P should converge to the target distribution P
  • Sample a Single Variable :
    • Uniform distribution --> standard pseudo-random number generator
    • Discrete (categorical) distribution
  • Discrete (categorical) distribution
    • partition [0,1] into k subintervals with lengths proportional to the probabilities P(x^1), ..., P(x^k)
    • generate a random real number s uniformly from the interval [0,1]
    • if s is in the i^{th} subinterval, then the sampled value is x^i
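A minimal Python sketch of this subinterval scheme (the helper name is illustrative, not from the notes; replacing the linear scan with a binary search over cumulative sums gives the O(log d) lookup used in the complexity card below):

```python
import random

def sample_categorical(values, probs):
    """Partition [0,1] into subintervals with lengths proportional to
    the probabilities, then return the value whose subinterval contains
    a uniform draw s."""
    s = random.random()      # uniform sample from [0, 1)
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p      # right edge of the i-th subinterval
        if s < cumulative:
            return value
    return values[-1]        # guard against floating-point round-off

# e.g. P(x^1)=0.2, P(x^2)=0.5, P(x^3)=0.3
print(sample_categorical(["x1", "x2", "x3"], [0.2, 0.5, 0.3]))
```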
  • Forward Sampling Algorithm :
    1. Choose a topological ordering X of the variables
    2. For i = 1 to N : sample a value x_i from Val(X_i) according to the conditional distribution P(X_i | pa(X_i)), given the parent values sampled previously
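A sketch of forward sampling on a tiny hypothetical Rain → GrassWet network; the network, the dict-based CPD representation, and all names are illustrative assumptions, not from the notes:

```python
import random

# CPDs as {parent_assignment_tuple: {value: probability}} (illustrative)
P_RAIN = {(): {True: 0.2, False: 0.8}}
P_WET  = {(True,):  {True: 0.9, False: 0.1},
          (False,): {True: 0.1, False: 0.9}}

# Topological ordering: each variable appears after its parents
ORDER = [("Rain", (), P_RAIN), ("GrassWet", ("Rain",), P_WET)]

def sample_categorical(dist):
    """Inverse-CDF sampling over a {value: probability} dict."""
    s, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if s < cum:
            return value
    return value  # float round-off guard

def forward_sample():
    """Sample each X_i from P(X_i | pa(X_i)) in topological order."""
    event = {}
    for name, parents, cpd in ORDER:
        parent_values = tuple(event[p] for p in parents)
        event[name] = sample_categorical(cpd[parent_values])
    return event

print(forward_sample())  # e.g. {'Rain': False, 'GrassWet': False}
```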
  • Forward sampling is consistent
    The probability distribution P according to which the samples are generated is equal to the true distribution P represented by the Bayesian network
  • Generating one random sample can be done in O(nk log(d))
  • How to estimate the probability P(x_1, ..., x_n) of an atomic event (x_1, ..., x_n)
    1. Sample a set D of M events randomly via Forward Sampling
    2. Count the number of times K that our event matches in D
    3. Estimate \hat P via the relative frequency K/M
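A possible counting helper on top of a sampler like the forward_sample sketch above (sample_fn, the dict comparison, and the default M are assumptions):

```python
def estimate_event_probability(sample_fn, event, M=10000):
    """Estimate P(event) by the relative frequency K/M of exact matches
    among M independently drawn samples."""
    K = sum(1 for _ in range(M) if sample_fn() == event)
    return K / M

# e.g. estimate_event_probability(forward_sample,
#                                 {"Rain": True, "GrassWet": True})
```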
  • Rejection sampling :
    given : Bayesian network, a query
    1. sample : probabilistically sample a set D of M events
    2. Filter/Reject : discard all samples not compatible with the evidence
    3. Estimate : for each value combination x, count the number K of samples in D_e that match x
    \hat P(x | e) = \frac{K}{|D_e|}
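A rejection-sampling sketch in the same spirit, reusing any sampler such as the forward_sample sketch above (all names are illustrative):

```python
def rejection_sample(sample_fn, evidence, query_var, M=10000):
    """Estimate P-hat(query_var | evidence) as K / |D_e|."""
    counts, kept = {}, 0
    for _ in range(M):
        event = sample_fn()
        # Filter/Reject: discard samples incompatible with the evidence
        if any(event[var] != val for var, val in evidence.items()):
            continue
        kept += 1  # |D_e| grows only for compatible samples
        counts[event[query_var]] = counts.get(event[query_var], 0) + 1
    return {x: k / kept for x, k in counts.items()} if kept else {}

# e.g. rejection_sample(forward_sample, {"GrassWet": True}, "Rain")
```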
  • Rejection sampling
    +
    • extremely simple
    • consistent
    -
    • wasteful
  • Problem with naive likelihood weighting : in certain situations the true probability is much higher than the sampling suggests; the naive algorithm is unaware of this
  • LW sampling algorithm :
    compute : a weighted sample <(y,e), w>
    algorithm :
    • choose a topological ordering
    • initialise the weight w of the current sample to w = 1.0
    • for i = 1 to N :
      if X_i is one of the evidence variables :
        set X_i to its given evidence value e_i
        update the weight : w <- w * P(e_i | pa(X_i))
      else : sample a value for X_i from Val(X_i) according to P(X_i | pa(X_i))
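A sketch of likelihood weighting on the same hypothetical Rain → GrassWet network used earlier (all names and the CPD representation are illustrative assumptions):

```python
import random

P_RAIN = {(): {True: 0.2, False: 0.8}}
P_WET  = {(True,):  {True: 0.9, False: 0.1},
          (False,): {True: 0.1, False: 0.9}}
ORDER = [("Rain", (), P_RAIN), ("GrassWet", ("Rain",), P_WET)]

def sample_categorical(dist):
    s, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if s < cum:
            return value
    return value

def weighted_sample(evidence):
    """Return one weighted sample <(y, e), w>."""
    event, w = {}, 1.0
    for name, parents, cpd in ORDER:
        dist = cpd[tuple(event[p] for p in parents)]
        if name in evidence:
            event[name] = evidence[name]  # fix the evidence value
            w *= dist[evidence[name]]     # w <- w * P(e_i | pa(X_i))
        else:
            event[name] = sample_categorical(dist)
    return event, w

# Estimate P(Rain | GrassWet = True) from 10000 weighted samples
totals = {True: 0.0, False: 0.0}
for _ in range(10000):
    event, w = weighted_sample({"GrassWet": True})
    totals[event["Rain"]] += w
Z = sum(totals.values())
print({rain: t / Z for rain, t in totals.items()})  # True near 0.69 here
```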
  • Likelihood weighting
    +
    • consistent
    • less wasteful than rejection sampling
    -
    • evidence affects only the sampling of variables below it in the ordering
    • many samples will be nearly irrelevant (they receive very small weights)
  • Importance sampling:
    • P : target distribution
    • Q : proposal distribution
  • Importance sampling : computing the weight
    • w(x_i) must correct for the fact that x_i was drawn from the distribution Q
    w(x_i) = \frac{P(x_i)}{Q(x_i)}
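A generic self-normalised importance-sampling sketch; the coin toy and every name here are hypothetical:

```python
import random

def importance_estimate(f, p, q, q_sampler, M=10000):
    """Estimate E_P[f(X)] from samples x_i ~ Q, weighting each sample by
    w(x_i) = P(x_i) / Q(x_i) to correct for drawing from Q instead of P."""
    num = den = 0.0
    for _ in range(M):
        x = q_sampler()
        w = p(x) / q(x)
        num += w * f(x)
        den += w
    return num / den  # self-normalised estimate

# Toy check: target P puts 0.8 on True, proposal Q is a fair coin
p = lambda x: 0.8 if x else 0.2
q = lambda x: 0.5
estimate = importance_estimate(lambda x: 1.0 if x else 0.0, p, q,
                               lambda: random.random() < 0.5)
print(estimate)  # close to 0.8
```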
  • Mutilated Network B_{E=e}
    • the evidence variables E_i have no parents in B_{E=e}
    • the CPD of each E_i in B_{E=e} gives probability 1.0 to E_i = e_i and probability 0 to all other e'_i \in Val(E_i)
    • the parents and CPDs of all other variables X \not\in E are as in B
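A small sketch of how the mutilation could be applied to the dict-based CPD representation from the earlier sketches (that representation is itself an assumption):

```python
def mutilate(cpds, parents, evidence):
    """Build B_{E=e}: each evidence variable E_i loses its parents and
    gets a deterministic CPD putting probability 1.0 on E_i = e_i."""
    new_cpds, new_parents = dict(cpds), dict(parents)
    for var, observed in evidence.items():
        new_parents[var] = ()  # E_i has no parents in B_{E=e}
        values = next(iter(cpds[var].values())).keys()  # Val(E_i)
        new_cpds[var] = {(): {v: 1.0 if v == observed else 0.0
                              for v in values}}
    return new_cpds, new_parents  # all other variables stay as in B
```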
  • Markov Chain Monte Carlo (MCMC) methods :
    • when we cannot sample directly from the target distribution P
    • construct a sequence of samples x_1, x_2, ...
    • with x_i drawn from some proposal distribution Q_i
    • do this such that Q_{i+1} is closer to the target than Q_i
    --> the sequence needs to converge to our target distribution
  • Gibbs Sampling :
    • start with a sample x^{(0)} = (y^{(0)}, e) drawn from the mutilated network
    • iteratively try to improve the current sample by iterating over each non-evidence variable and resampling a value for it, given the current values of all the other variables
  • Markov Blanket Mb(A)
    the set of variables consisting of A's parents, its children, and its children's other parents
  • The Gibbs resampling distribution
    P(Y_i | x_{-y_i}) = P(Y_i | mb(Y_i))
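A sketch of one Gibbs sweep over the dict-based representation used earlier; it exploits the identity above by multiplying only the factors that mention the resampled variable, i.e. its own CPD and those of its children (boolean variables and all names are assumptions):

```python
import random

def local_score(value, var, event, order):
    """Product of the CPD factors that mention `var`: P(var | pa(var))
    and P(c | pa(c)) for each child c -- exactly the Markov blanket terms."""
    e = dict(event, **{var: value})
    score = 1.0
    for name, parents, cpd in order:
        if name == var or var in parents:
            score *= cpd[tuple(e[p] for p in parents)][e[name]]
    return score

def gibbs_sweep(event, free_vars, order, values=(True, False)):
    """Resample each non-evidence variable given all other variables."""
    for var in free_vars:
        scores = {v: local_score(v, var, event, order) for v in values}
        z = sum(scores.values())
        s, cum = random.random(), 0.0
        for v, sc in scores.items():
            cum += sc / z  # normalised P(Y_i | mb(Y_i))
            if s < cum:
                event[var] = v
                break
    return event

# e.g. start from a mutilated-network sample, then repeat gibbs_sweep
```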
  • A Markov chain is a model of a discrete-time probabilistic process that gives a distribution over a set of possible states at any time t. It fulfils the Markov property
  • Markov chain is defined by a state space and a transition model
  • Metropolis-Hastings is a general algorithm for constructing Markov chains for any desired stationary distribution
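A generic Metropolis-Hastings sketch; the random-walk example and every name are illustrative assumptions:

```python
import math
import random

def metropolis_hastings(p, propose, q, x0, steps=10000):
    """p: unnormalised target density; propose(x) draws x' ~ Q(.|x);
    q(x_to, x_from) evaluates the proposal density Q(x_to | x_from)."""
    x, chain = x0, []
    for _ in range(steps):
        x_new = propose(x)
        # accept with probability min(1, P(x')Q(x|x') / (P(x)Q(x'|x)))
        alpha = min(1.0, (p(x_new) * q(x, x_new)) / (p(x) * q(x_new, x)))
        if random.random() < alpha:
            x = x_new
        chain.append(x)  # the chain's stationary distribution is P
    return chain

# Symmetric random-walk proposal: the Q terms cancel (Metropolis case)
chain = metropolis_hastings(
    p=lambda x: math.exp(-x * x / 2),            # unnormalised N(0, 1)
    propose=lambda x: x + random.uniform(-1, 1),
    q=lambda x_to, x_from: 1.0,                  # constant for symmetric Q
    x0=0.0)
print(sum(chain) / len(chain))  # sample mean, near 0
```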
  • Burn-in Time
    the number of steps T we run the chain before collecting samples; related to the mixing time