Part 5 : Learning Bayesian networks

    • Manual construction of a model for a given problem may be impossible :
      • expert not available
      • problem too complex

      goal : construct a structured model of the (hidden) distribution most likely underlying the observed samples (Automatic Model Learning)
    • Automatic model learning assumptions :
      • unknown distribution
      • training examples are representative of the world
      task : learn a model whose distribution is an approximation to the "training set" model and whose graph structure reflects the true (in)dependencies in the world
    • Learning as optimisation, general approach :
      • define an objective function F(M,D) : a measure that estimates how "good" a given model M is in relation to the given training examples
      • develop an algorithm to find the model that maximises F
      learning is a search/optimisation problem (see the sketch below)
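      A minimal sketch of this search view (the model class, objective, and enumeration here are illustrative assumptions, not a specific algorithm from the notes):

      ```python
      # Learning as optimisation: enumerate candidate models, score each with
      # the objective F(M, D), and return the best-scoring one.
      def learn(candidate_models, F, data):
          """Return the candidate model M that maximises F(M, D)."""
          return max(candidate_models, key=lambda M: F(M, data))
      ```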
    • Likelihood of a Model M
      relative to a dataset D is the probability that the model assigns to the set D : L(M:D) = P_M(D)
    • If the examples D are independent and identically distributed (i.i.d.), the likelihood L(M:D) is L(M:D) = P_M(D) = \prod_{x_i \in D} P_M(x_i)
    • Likelihood : the product of the probabilities assigned by the model to the individual training examples (see the example below)
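      A minimal sketch of this product, assuming a toy model over one binary variable with P(X=1) = 0.7 (illustrative numbers, not from the notes):

      ```python
      from math import prod

      def likelihood(p_model, data):
          """L(M:D) = product of P_M(x_i) over the i.i.d. examples x_i in D."""
          return prod(p_model(x) for x in data)

      p = lambda x: 0.7 if x == 1 else 0.3   # P(X=1) = 0.7, P(X=0) = 0.3
      print(likelihood(p, [1, 1, 0, 1]))     # 0.7 * 0.7 * 0.3 * 0.7 ≈ 0.1029
      ```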
    • Problems with the likelihood function :
      • probability will be minuscule
      • arithmetic underflow
      solution : log-likelihood
    • The Log-likelihood l(M:D) of a Model M relative to a dataset D is the logarithm of the likelihood
      l(M:D) = \log L(M:D) = \log \prod_{x_i \in D} P_M(x_i) = \sum_{x_i \in D} \log P_M(x_i)
    • Likelihood and log-likelihood are monotonically related : l(M:D) has its maximum where L(M:D) is maximal (see the demonstration below)
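      A small numeric demonstration of the underflow problem and the log-likelihood fix (data and model are the illustrative toy assumptions from above):

      ```python
      from math import log, prod

      p = lambda x: 0.7 if x == 1 else 0.3   # same toy model as above
      data = [1, 0, 1] * 400                 # 1200 i.i.d. binary examples

      # Direct likelihood: a product of 1200 probabilities < 1 is about e^-767,
      # below the smallest representable float, so it underflows to 0.0.
      print(prod(p(x) for x in data))        # 0.0

      # Log-likelihood: summing logs stays in a safe numeric range, and since
      # log is monotonic the maximising model is unchanged.
      print(sum(log(p(x)) for x in data))    # ≈ -766.9
      ```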
    • to compensate for overfitting --> learn a model that generalises
    • Generalisation : the model must be more general than a simple summary of the training set
      Overfitting : a model that exactly fits the training data, but is not useful for queries about new situations (see the toy demonstration below)
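      A toy demonstration of overfitting vs. generalisation (hand-picked counts, purely illustrative): a full joint table fits a small training sample better, but a simpler independence model does better on new data from a truly independent world:

      ```python
      from math import log

      train = [(0, 0)] * 4 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 4   # spurious correlation
      test = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3                 # truly independent world

      def log_lik(p, data):
          return sum(log(p(x)) for x in data)

      # Model 1: full joint table, maximum-likelihood estimate from training counts.
      joint = {xy: train.count(xy) / len(train) for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]}
      full = lambda xy: joint[xy]

      # Model 2: assume X and Y are independent; both marginals here are uniform (0.5).
      indep = lambda xy: 0.5 * 0.5

      print(log_lik(full, train), log_lik(indep, train))  # ≈ -11.9 vs -13.9 : full wins on train
      print(log_lik(full, test), log_lik(indep, test))    # ≈ -19.3 vs -16.6 : indep generalises better
      ```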
    • Bias : the potential error introduced by restricting the expressivity of the model class
    • Bias vs Variance
      Put constraints on the class of models allowed to be learned :
      • hard constraint : strictly restricts the class of models
      • soft constraint : an additional regularisation term in the objective function that adds a penalty (see the sketch below)
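      A minimal sketch of a soft constraint (the penalty form, lambda times a parameter count, is an illustrative assumption, not the specific regulariser from the notes):

      ```python
      # Soft constraint: score a model by its log-likelihood minus a complexity
      # penalty, so a more expressive model must "earn" its extra parameters.
      def regularised_score(log_likelihood, n_params, lam=1.0):
          """F(M, D) = l(M:D) - lam * complexity(M), with complexity = n_params."""
          return log_likelihood - lam * n_params
      ```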
    • Variance : the potential error introduced by permitting high expressivity of the model class
    • Bias-Variance tradeoff
      • restriction to simple models makes the hypothesis space smaller and increases the probability of bias error
      • on the other hand, in a smaller hypothesis space it is less likely to find an overfitting model
      vs.
      • permitting complex models reduces the probability of bias error
      • but introduces variance as a potential source of error