Problem : Given a fixed dependency graph structure of a Bayesian network, learn the parameters/CPDs from a set of example events
Parameter learning is also a sub-problem in :
structure learning
learning from incomplete observations
if given :
the graph structure G = (X, E) of a Bayesian network M (X = random variables, E = edges)
a training set D of i.i.d. samples
find :
a complete set of parameter values for the model such that the resulting model maximizes the objective function
Two families of methods :
maximum likelihood estimation
Bayesian parameter estimation
Thumbtack tossing N times :
the model consists of 1 binary variable (T = "toss")
the distribution over its 2 values {h, t} is determined by a
Bernoulli distribution
method :
observe sequence
calculate L(θ : D) by writing out the probability of the observed sequence and rewriting it so that only θ appears in it
maximize the log-likelihood by taking the derivative, setting it to 0, and solving for θ
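The thumbtack steps above can be sketched numerically. A minimal check, with a made-up count of 3 heads and 2 tails:

```python
import math

def bernoulli_log_likelihood(theta, n_heads, n_tails):
    """Log-likelihood of Bernoulli parameter theta given observed counts."""
    return n_heads * math.log(theta) + n_tails * math.log(1 - theta)

# Hypothetical observed sequence: 3 heads, 2 tails.
n_heads, n_tails = 3, 2

# Setting the derivative of the log-likelihood to zero gives the
# closed-form maximizer: theta_hat = n_heads / (n_heads + n_tails).
theta_hat = n_heads / (n_heads + n_tails)
print(theta_hat)  # 0.6

# Sanity check: theta_hat beats nearby values of theta.
assert bernoulli_log_likelihood(theta_hat, n_heads, n_tails) > \
       bernoulli_log_likelihood(theta_hat + 0.05, n_heads, n_tails)
```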
How to estimate a large set of parameters such that they together give a Bayesian network with maximum likelihood?
The set of parameters that maximises the likelihood of the complete model is identical to the set of parameters that individually maximise the likelihood of each variable given its parents
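In symbols, this holds because the likelihood decomposes per variable (assuming i.i.d. samples x[1], ..., x[M] and the network factorization):

```latex
L(\theta : D) = \prod_{m=1}^{M} P(x[m] \mid \theta)
             = \prod_{m=1}^{M} \prod_{i} P(x_i[m] \mid u_i[m], \theta_{X_i \mid U_i})
             = \prod_{i} L_i(\theta_{X_i \mid U_i} : D)
```

Each local likelihood L_i depends only on its own parameters θ_{X_i|U_i}, so the factors can be maximized independently.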
ML parameter estimation for discrete Bayesian networks
find : Parameters θ
algorithm :
for each variable X in G, with its parents U :
for each possible assignment of values u to U :
estimate the parameters θ as
θ^ = P^(X ∣ u) = N[X, u] / N[u], by counting, for each value x of X, how often the parent values u co-occur with X = x in the training set D
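The counting algorithm above can be sketched as follows; the function name and the toy Rain/WetGrass dataset are made up for illustration:

```python
from collections import Counter

def estimate_cpd(data, child, parents):
    """ML estimate of P(child | parents) by counting co-occurrences.

    data: list of dicts mapping variable name -> value.
    Returns a dict {(parent_values, child_value): probability}.
    """
    joint = Counter()    # N[X, u]
    parent = Counter()   # N[u]
    for row in data:
        u = tuple(row[p] for p in parents)
        joint[(u, row[child])] += 1
        parent[u] += 1
    # theta_hat = N[X, u] / N[u]
    return {(u, x): n / parent[u] for (u, x), n in joint.items()}

# Toy training set D (hypothetical).
D = [
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 1, "WetGrass": 1},
    {"Rain": 1, "WetGrass": 0},
    {"Rain": 0, "WetGrass": 0},
]
cpd = estimate_cpd(D, "WetGrass", ["Rain"])
print(cpd[((1,), 1)])  # 2/3: of the 3 rows with Rain=1, 2 have WetGrass=1
```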
The data fragmentation problem :
the number of instances matching u in a fixed data set shrinks exponentially with the number of parents
this leaves a large number of unspecified distributions or zeroes in the CPDs
--> keep the number of parents as small as possible + avoid zeroes by smoothing the ML estimates
Smoothing the estimates to avoid 0's (additive smoothing) :
A) add a pseudo-count alpha to each count N[x, u] in the numerator
B) add alpha*k to the count N[u] in the denominator, with k the number of values of X
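A minimal sketch of this additive smoothing, assuming alpha is the pseudo-count and k the number of values of X (function name illustrative):

```python
def smoothed_estimate(n_xu, n_u, k, alpha=1.0):
    """Smoothed ML estimate: (N[x,u] + alpha) / (N[u] + alpha * k)."""
    return (n_xu + alpha) / (n_u + alpha * k)

# With zero co-occurrences the raw ML estimate would be 0/5 = 0;
# smoothing keeps the probability strictly positive.
print(smoothed_estimate(0, 5, k=2, alpha=1.0))  # 1/7 ~ 0.143
```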
Computing the ML estimate for the parameters :
write down the log-likelihood function
compute the derivative w.r.t. the parameters and set it to 0
solve the system
In the continuous case, for a normal distribution with a Gaussian PDF, the ML estimate for the mean is the sample mean, and for the standard deviation it is the square root of the (1/N) sample variance
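The Gaussian ML estimates can be checked numerically on toy data (values made up):

```python
import math

def gaussian_mle(xs):
    """ML estimates for a Gaussian: sample mean and the square root
    of the (biased, 1/N) sample variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # divide by N, not N-1
    return mu, math.sqrt(var)

mu, sigma = gaussian_mle([2.0, 4.0, 6.0])
print(mu, sigma)  # 4.0 and sqrt(8/3) ~ 1.633
```

Note the 1/N divisor: the ML variance estimate is biased, unlike the usual 1/(N-1) sample variance.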