approximate P(X) via a limited set of points randomly drawn from P
(random) samples/particles
estimate any desired probability P(x) from these samples
consistent
A sampling algorithm that satisfies:
In the limit (M → ∞), the empirical distribution P̂ should converge to the target distribution P
Sample a Single Variable:
Uniform distribution --> standard pseudo-random number generator
Discrete (categorical) distribution
partition [0,1] into k subintervals with lengths proportional to the probabilities p_1, ..., p_k
generate a random real number s uniformly from the interval [0,1]
if s is in the i-th subinterval, then the sampled value is x_i
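A minimal sketch of this interval method in Python (the names are illustrative; the linear scan is O(k), and a binary search over the cumulative sums would bring a single lookup down to O(log k)):

```python
import random

def sample_categorical(values, probs):
    """Partition [0,1] into k subintervals with lengths proportional
    to probs, draw s uniformly, and return the value whose
    subinterval contains s."""
    s = random.random()      # s is uniform in [0, 1)
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p      # right endpoint of the i-th subinterval
        if s < cumulative:
            return value
    return values[-1]        # guard against floating-point round-off

# e.g. P(a) = 0.2, P(b) = 0.5, P(c) = 0.3
print(sample_categorical(["a", "b", "c"], [0.2, 0.5, 0.3]))
```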
Forward Sampling Algorithm:
Choose a topological ordering X_1, ..., X_N of the variables
For i = 1 to N: sample a value x_i from Val(X_i) according to the conditional distribution P(X_i | pa(X_i)), given the parent values sampled previously
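A minimal sketch of forward sampling, assuming a toy two-variable network Rain -> WetGrass with made-up CPDs (the network and its numbers are illustrative, not from the lecture):

```python
import random

# Toy network: Rain -> WetGrass, CPDs indexed by a tuple of parent values.
parents = {"Rain": [], "WetGrass": ["Rain"]}
cpd = {
    "Rain":     {():       {True: 0.2, False: 0.8}},
    "WetGrass": {(True,):  {True: 0.9, False: 0.1},
                 (False,): {True: 0.1, False: 0.9}},
}
order = ["Rain", "WetGrass"]  # a topological ordering of the variables

def sample_categorical(dist):
    """Sample a key of dist according to its probabilities."""
    s, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if s < cum:
            return value
    return value  # guard against floating-point round-off

def forward_sample():
    """Sample each X_i from P(X_i | pa(X_i)) along the topological order."""
    x = {}
    for var in order:
        pa_vals = tuple(x[p] for p in parents[var])  # parents sampled already
        x[var] = sample_categorical(cpd[var][pa_vals])
    return x

print(forward_sample())  # e.g. {'Rain': False, 'WetGrass': False}
```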
Forward sampling is consistent
The probability distribution according to which the samples are generated is equal to the true distribution P represented by the Bayesian network
Generating one random sample can be done in O(n k log(d))
How to estimate the probability P(x_1, ..., x_n) of an atomic event (x_1, ..., x_n):
Sample a set D of M events randomly via forward sampling
Count the number of times K that our event occurs in D
Estimate P̂ via the relative frequency K/M
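Continuing the toy sketch above, the relative-frequency estimate could look like this (estimate_atomic is a hypothetical helper that reuses forward_sample):

```python
def estimate_atomic(event, M=10_000):
    """P̂(x_1, ..., x_n) = K / M, the relative frequency of the
    atomic event among M forward samples."""
    K = sum(forward_sample() == event for _ in range(M))
    return K / M

# True value is P(Rain) * P(WetGrass | Rain) = 0.2 * 0.9 = 0.18
print(estimate_atomic({"Rain": True, "WetGrass": True}))
```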
Rejection sampling:
given: a Bayesian network, a query P(X | e)
sample: probabilistically sample a set D of M events
filter/reject: discard all samples not compatible with the evidence e, keeping the set D_e
estimate: for each value combination x, count the number K of samples in D_e that match x
P̂(x | e) = K / |D_e|
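A sketch of rejection sampling on the same toy network (reusing forward_sample from above; it assumes at least one sample survives the filter):

```python
def rejection_sample(query_var, evidence, M=10_000):
    """Estimate P̂(X | e): keep only the forward samples compatible
    with the evidence e, then count matches per value of X."""
    counts, kept = {}, 0
    for _ in range(M):
        x = forward_sample()
        if any(x[var] != val for var, val in evidence.items()):
            continue                 # reject: incompatible with e
        kept += 1                    # kept == |D_e| at the end
        counts[x[query_var]] = counts.get(x[query_var], 0) + 1
    return {val: K / kept for val, K in counts.items()}  # K / |D_e|

print(rejection_sample("Rain", {"WetGrass": True}))  # P(Rain=T | e) ≈ 0.69
```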
Rejection sampling
+
extremely simple
consistent
-
wasteful: if the evidence is unlikely, almost all samples are rejected and |D_e| is tiny
Problem with naive likelihood weighting: in certain situations the true probability is much higher, but the naive algorithm is unaware of this
Sample-LW algorithm:
compute: a weighted sample ⟨(y, e), w⟩
algorithm:
choose a topological ordering
initialize the weight w of the current sample to w = 1.0
For i = 1 to N:
if X_i is one of the evidence variables:
set X_i to its given evidence value e_i
update the weight: w ← w · P(e_i | pa(X_i))
else: sample a value for X_i from Val(X_i) according to P(X_i | pa(X_i))
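A sketch of likelihood weighting on the same toy network; weighted_sample implements the Sample-LW pass above, and the estimator normalizes by the total weight:

```python
def weighted_sample(evidence):
    """One Sample-LW pass: evidence variables are clamped and the
    weight accumulates w <- w * P(e_i | pa(E_i))."""
    x, w = {}, 1.0
    for var in order:
        dist = cpd[var][tuple(x[p] for p in parents[var])]
        if var in evidence:
            x[var] = evidence[var]     # set X_i to its evidence value
            w *= dist[evidence[var]]   # update the weight
        else:
            x[var] = sample_categorical(dist)
    return x, w

def likelihood_weighting(query_var, evidence, M=10_000):
    """P̂(X | e) as a weight-normalized count over M weighted samples."""
    totals, total_w = {}, 0.0
    for _ in range(M):
        x, w = weighted_sample(evidence)
        totals[x[query_var]] = totals.get(x[query_var], 0.0) + w
        total_w += w
    return {val: t / total_w for val, t in totals.items()}

print(likelihood_weighting("Rain", {"WetGrass": True}))
```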
Likelihood weighting
+
consistent
less wasteful than rejection sampling
-
evidence affects only the variables sampled below (after) it; variables above are sampled as if there were no evidence
many samples will be nearly irrelevant (they receive very small weights)
Importance sampling:
P: target distribution
Q: proposal distribution
Importance sampling: computing the weight
w(x_i) must correct for the fact that x_i was drawn from the distribution Q
w(x_i) = P(x_i) / Q(x_i)
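A tiny self-contained sketch of this correction, estimating an expectation under a toy target P by sampling from a uniform proposal Q (both distributions are made up for illustration):

```python
import random

P = {0: 0.7, 1: 0.2, 2: 0.1}  # target distribution (assume hard to sample)
Q = {0: 1/3, 1: 1/3, 2: 1/3}  # proposal distribution we can sample from

def importance_estimate(f, M=100_000):
    """E_P[f(X)] ≈ (1/M) Σ w(x_i) f(x_i) with x_i ~ Q, where
    w(x_i) = P(x_i) / Q(x_i) corrects for sampling from Q instead of P."""
    total = 0.0
    for _ in range(M):
        x = random.choices(list(Q), weights=list(Q.values()))[0]
        total += (P[x] / Q[x]) * f(x)
    return total / M

print(importance_estimate(lambda x: x))  # ≈ E_P[X] = 0.2*1 + 0.1*2 = 0.4
```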
Mutilated Network B_{E=e}:
the evidence variables E_i have no parents in B_{E=e}
the CPD of each E_i in B_{E=e} gives probability 1.0 to E_i = e_i and probability 0 to all other e_i' ∈ Val(E_i)
the parents and CPDs of all other variables X ∉ E are as in B
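A sketch of this construction on the toy network from the forward-sampling example; mutilate is a hypothetical helper that rewires the evidence variables as described:

```python
def mutilate(cpd, parents, evidence):
    """Build B_{E=e}: each evidence variable E_i loses its parents and
    gets a point-mass CPD on e_i; all other CPDs stay as in B."""
    m_parents = {v: ([] if v in evidence else ps) for v, ps in parents.items()}
    m_cpd = dict(cpd)
    for var, e_val in evidence.items():
        vals = next(iter(cpd[var].values()))  # Val(E_i), read off the old CPD
        m_cpd[var] = {(): {v: (1.0 if v == e_val else 0.0) for v in vals}}
    return m_cpd, m_parents

m_cpd, m_parents = mutilate(cpd, parents, {"WetGrass": True})
```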
Markov Chain Monte Carlo (MCMC) methods:
goal: sample directly from the target distribution
construct a sequence of samples x^(1), x^(2), ...
with x^(i) drawn from some proposal distribution Q_i
do this such that Q_{i+1} is closer to the target than Q_i
--> the sequence needs to converge to our target distribution
Gibbs Sampling:
Start with a sample x^(0) = (y^(0), e) drawn from the mutilated network
iteratively try to improve the current sample by iterating over each non-evidence variable and resampling a value for it, given the current values of all the other variables
Markov Blanket mb(A):
the set of variables consisting of A's parents, its children, and its children's other parents
The Gibbs resampling distribution:
P(Y_i | x_{-Y_i}) = P(Y_i | mb(Y_i))
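A sketch of Gibbs sampling on the toy network from the forward-sampling example; each resampling distribution is computed from the Markov blanket only, as P(y_i | pa(Y_i)) times the CPD entries of Y_i's children:

```python
def mutilated_sample(evidence):
    """x^(0): forward-sample with evidence variables clamped to e,
    i.e. a sample from the mutilated network B_{E=e}."""
    x = {}
    for var in order:
        if var in evidence:
            x[var] = evidence[var]
        else:
            x[var] = sample_categorical(cpd[var][tuple(x[p] for p in parents[var])])
    return x

def gibbs(query_var, evidence, T=1_000, M=5_000):
    """Resample each non-evidence Y_i from P(Y_i | mb(Y_i)) ∝
    P(y_i | pa(Y_i)) * Π_c P(x_c | pa(c)) over Y_i's children c."""
    children = {v: [c for c in order if v in parents[c]] for v in order}
    hidden = [v for v in order if v not in evidence]
    x, counts = mutilated_sample(evidence), {}
    for t in range(T + M):
        for y in hidden:
            dist = cpd[y][tuple(x[p] for p in parents[y])]
            scores = {}
            for val in dist:
                x[y] = val                 # tentatively set Y_i = val
                s = dist[val]              # P(val | pa(Y_i))
                for c in children[y]:
                    s *= cpd[c][tuple(x[p] for p in parents[c])][x[c]]
                scores[val] = s            # unnormalized P(val | mb(Y_i))
            z = sum(scores.values())
            x[y] = sample_categorical({v: s / z for v, s in scores.items()})
        if t >= T:                         # discard the burn-in steps
            counts[x[query_var]] = counts.get(x[query_var], 0) + 1
    return {v: k / M for v, k in counts.items()}

print(gibbs("Rain", {"WetGrass": True}))
```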
A Markov chain is a model of a discrete-time probabilistic process that gives a set of possible states at any time t; it fulfils the Markov property: the next state depends only on the current state, not on the earlier history.
A Markov chain is defined by a state space and a transition model.
Metropolis-Hastings is a general algorithm for constructing Markov chains with any desired stationary distribution
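A minimal random-walk Metropolis-Hastings sketch for a one-dimensional continuous target (the Gaussian target and the step size are illustrative); discarding the first part of the chain corresponds to the burn-in below:

```python
import math
import random

def metropolis_hastings(log_p, x0=0.0, step=1.0, n=10_000):
    """Random-walk MH: propose x' ~ N(x, step^2) and accept with
    probability min(1, P(x') / P(x)); the proposal is symmetric, so
    the Hastings correction Q(x | x') / Q(x' | x) cancels."""
    x, chain = x0, []
    for _ in range(n):
        x_new = x + random.gauss(0.0, step)
        accept_prob = math.exp(min(0.0, log_p(x_new) - log_p(x)))
        if random.random() < accept_prob:
            x = x_new        # accept the proposed move
        chain.append(x)      # on rejection the current state repeats
    return chain

# Target: an unnormalized standard normal, log P(x) = -x^2 / 2.
chain = metropolis_hastings(lambda x: -x * x / 2)
print(sum(chain[1000:]) / len(chain[1000:]))  # ≈ 0 after the burn-in
```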
Burn-in Time:
the number of steps T we run the chain before collecting a sample; it is related to the mixing time of the chain