Part 5a : Parameter learning in Bayesian networks

Cards (10)

  • Problem : Given a fixed dependency graph structure of a Bayesian network, learn the parameters/CPDs from a set of example events
  • Parameter learning is also a sub-problem in :
    • structure learning
    • learning from incomplete observations
  • given :
    • the graph structure G = (X, E) of a Bayesian network M, with random variables X and edges E
    • a training set D of i.i.d. examples
    find :
    • a complete set of parameter values for the model such that the resulting model maximizes the objective function
    Two families of methods :
    • maximum likelihood estimation
    • Bayesian parameter estimation
  • Thumbtack tossing N times :
    • model consists of 1 binary variable (T = "toss")
    • distribution over 2 values h, t determined by a
    • Bernoulli distribution with parameter θ
    method :
    • observe a sequence of tosses
    • calculate L(θ : D) by writing down the probabilities and rewriting them so that only θ remains
    • maximize the log-likelihood by calculating the derivative and solving for 0
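The thumbtack steps above reduce to a closed form: maximizing the log-likelihood of a Bernoulli model gives θ̂ = N_heads / N. A minimal sketch with made-up toss data:

```python
# Hypothetical thumbtack observations: 1 = heads, 0 = tails (data is illustrative).
tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# L(theta : D) = theta^N_h * (1 - theta)^N_t; setting the derivative of the
# log-likelihood to 0 yields theta_hat = N_h / N.
n_heads = sum(tosses)
theta_hat = n_heads / len(tosses)
print(theta_hat)  # 0.7
```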
  • How to estimate a large set of parameters such that together they give a Bayesian network with maximum likelihood?
    The set of parameters that maximizes the likelihood of the complete model is identical to the set of parameters that individually maximize the likelihood of each variable given its parents (the likelihood decomposes per variable)
  • ML parameter estimation for discrete Bayesian networks
    find : parameters θ
    algorithm :
    for each variable X in G, with its parents U :
    for each possible assignment of values u to U :
    estimate the parameters as
    θ̂ = P̂(X|u) = N[X,u] / N[u]
    by counting, for each value x of X, how often the parent values u co-occur with X = x in the training set D
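The counting estimate θ̂ = N[X,u] / N[u] can be sketched for a toy two-variable network (the structure X → Y and the data are made up for illustration):

```python
from collections import Counter

# Assumed toy network X -> Y; training set D of (x, y) pairs (illustrative).
D = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

# hat P(Y = y | X = x) = N[x, y] / N[x], obtained purely by counting in D.
N_xy = Counter(D)                # joint counts N[x, y]
N_x = Counter(x for x, _ in D)   # parent counts N[x]

cpd = {(x, y): N_xy[(x, y)] / N_x[x] for (x, y) in N_xy}
print(cpd[(0, 0)])  # N[0,0] = 3, N[0] = 4 -> 0.75
```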
  • The data fragmentation problem :
    • the number of instances matching a parent assignment u in a fixed data set shrinks exponentially with the number of parents
    • this leaves a large number of unspecified distributions or zeros in the CPD
    --> keep the number of parents as small as possible + avoid zeros by smoothing the ML estimates
  • Smoothing the estimates to avoid 0's :
    A) add a pseudocount α to each count N[x,u]
    B) add α·k to the denominator N[u], where k is the number of values of X :
    P̂(x|u) = (N[x,u] + α) / (N[u] + α·k)
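A sketch of this Laplace-style smoothing, where α is added to each count and α·k to the denominator (k = number of values of X); the counts are made up:

```python
# Smoothed estimate: hat P(x | u) = (N[x, u] + alpha) / (N[u] + alpha * k).
alpha = 1.0
k = 2                    # binary X
counts = {0: 5, 1: 0}    # N[x, u] for one fixed parent assignment u; note the raw zero
N_u = sum(counts.values())

smoothed = {x: (n + alpha) / (N_u + alpha * k) for x, n in counts.items()}
print(smoothed[1])  # (0 + 1) / (5 + 2) ~= 0.1429, no longer zero
```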
  • Computing the ML estimate for the parameters :
    1. write down the log-likelihood function
    2. compute the derivative w.r.t. the parameters and set it to 0
    3. solve the system
  • In the continuous case with a normal distribution (Gaussian PDF), the ML estimate for the mean is the sample mean, and for the standard deviation it is the square root of the (biased, 1/N) sample variance
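A minimal sketch of the Gaussian ML estimates, using made-up samples; note the variance divides by N (not N-1), since that is what maximizes the likelihood:

```python
import math

# Illustrative continuous samples.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# ML estimates for a Gaussian: mean = sample mean,
# sigma = sqrt of the biased (1/N) sample variance.
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
print(mu_hat, sigma_hat)  # 5.0 2.0
```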