Introduction to Artificial Intelligence and Machine Learning 2 (7.1~7.3) Notes

Dobby-HJ · 2023. 10. 1. 21:00

7.1 Probability Concepts

  • Describes relationships between random variables.

Probabilities

  • We will write $P(A = true)$ to mean the probability that $A = true$
  • What is probability?
    • It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions.
  • Probability can be viewed from either a frequentist or a Bayesian perspective.

Conditional Probability

  • $P(A = true | B = true)$ → the probability that $A = true$ given that $B = true$
    • Out of all the outcomes in which $B$ is true, how many also have $A$ equal to true
  • Read this as:
    • “Probability of $A$ conditioned on $B$” or “Probability of $A$ given $B$”

Joint Probability

  • We will write $P(A = ture, B = ture)$ to mean
    • the probability of $A = ture$ and $B = ture$
  • In General
    • $P(X|Y) = P(X, Y) / P(Y)$

7.2 Probability Theorems

Computing with Probabilities:

Law of Total Probability

  • Law of Total Probability
    • a.k.a “summing out” or marginalization
    • $P(a) = \sum_b P(a, b) = \sum_b P(a|b) P(b)$
      • This shows that once we know the joint distribution, we can obtain any individual (marginal) probability via marginalization.
  • Consider this case
    • given a joint distribution (e.g., P(a, b, c, d))
    • We can obtain any “marginal” probability (e.g., P(b)) by summing out the other variables
    • $P(b) = \sum_a\sum_c\sum_d P(a,b, c,d)$
  • Also, consider this case
    • given a joint distribution (e.g., P(a,b,c,d))
    • We can obtain any conditional probability of interest
    • $P(c|b) = \sum_a\sum_d P(a,c,d|b) = \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d)$ → derivation in the summary at the end
    • Where $1 / P(b)$ is just a normalization constant
  • Joint distribution contains the information we need to compute any probability of interest
  • However, the number of parameters of a joint distribution grows very quickly: it is the product ($\Pi$) of the number of values each variable can take, so it increases exponentially with the number of variables (a small numerical sketch follows this list).
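
A minimal numerical sketch of marginalization and conditioning on a full joint table; the 2×2×2×2 array `joint` below is a made-up distribution over four binary variables, used only for illustration.

```python
import numpy as np

# Hypothetical joint distribution P(a, b, c, d) over four binary variables,
# stored as a 2x2x2x2 array indexed (a, b, c, d). Any non-negative table
# summing to 1 would do; this one is random, purely for illustration.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()              # normalize: the 2^4 = 16 entries sum to 1

# Law of total probability ("summing out" / marginalization):
#   P(b) = sum_a sum_c sum_d P(a, b, c, d)
p_b = joint.sum(axis=(0, 2, 3))
print("P(b):", p_b)

# Conditional probability from the joint:
#   P(c | b) = (1 / P(b)) * sum_a sum_d P(a, b, c, d)
p_bc = joint.sum(axis=(0, 3))     # shape (b, c): the marginal P(b, c)
p_c_given_b = p_bc / p_b[:, None] # divide each row by P(b)
print("P(c | b):", p_c_given_b)
print("rows sum to 1:", p_c_given_b.sum(axis=1))
```

Note that the full table already needs $2^4 = 16$ entries here; with $n$ binary variables it needs $2^n$, which is the parameter blow-up mentioned above.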

Computing with Probabilities

Chain Rule or Factorization

  • We can always write
    • $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b,c,\dots,z)$
    • by definition of joint probability
  • Based on the equation above, we can now expand any joint into a product form:
    • $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b|c,\dots, z)P(c|\dots, z)\dots P(z)$
  • This factorization holds for any ordering of the variables
    • Chain rule for probabilities
    • Any joint probability can be factorized into a series of multiplications (a numerical check follows this list)
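
A quick numerical check of the chain rule on a made-up three-variable joint table (the table and the variable ordering $a, b, c$ are illustrative assumptions):

```python
import numpy as np

# Made-up joint P(a, b, c) over three binary variables, indexed (a, b, c),
# used to check the chain rule numerically.
rng = np.random.default_rng(1)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p_c = joint.sum(axis=(0, 1))      # P(c)
p_bc = joint.sum(axis=0)          # P(b, c)
p_b_given_c = p_bc / p_c          # P(b | c)
p_a_given_bc = joint / p_bc       # P(a | b, c)

# Chain rule: P(a, b, c) = P(a | b, c) P(b | c) P(c)
reconstructed = p_a_given_bc * p_b_given_c * p_c
print(np.allclose(joint, reconstructed))   # True
```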

Joint Probability Distribution

Independence

  • Recall the naive Bayes classifier
    • Why introduce the naive assumption?
  • Variables $A$ and $B$ are independent if any of the following hold:
    • $P(A|B) = P(A)$
    • $P(A,B) = P(A)P(B)$
    • $P(B|A) = P(B)$
    • This says that knowing the outcome of $A$ does not tell me anything new about the outcome of $B$ → the occurrence of either event has no effect on the other.

Conditional vs. Marginal Independence

  • Marginal independence
    • If $P(A|B) > P(A)$, then $A$ and $B$ are not marginally independent → they are marginally independent only when $P(A|B) = P(A)$.
    • $X$ and $Y$ are independent if and only if $P(X) = P(X|Y)$
    • Consequently, $P(X, Y) = P(X)P(Y)$
  • Conditional independence
    • If $P(X | Y, Z) = P(X|Z)$, then $X$ and $Y$ are conditionally independent given $Z$ (see the sketch after this list).
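
A sketch contrasting the two notions with a hypothetical common-cause model $Z \to X$, $Z \to Y$ (all tables below are made-up numbers): $X$ and $Y$ come out conditionally independent given $Z$, but not marginally independent.

```python
import numpy as np

# Hypothetical common-cause model Z -> X, Z -> Y with made-up tables:
# X and Y are conditionally independent given Z, but not marginally independent.
p_z = np.array([0.3, 0.7])                      # P(z)
p_x_given_z = np.array([[0.9, 0.1],             # P(x | z), rows indexed by z
                        [0.2, 0.8]])
p_y_given_z = np.array([[0.7, 0.3],             # P(y | z), rows indexed by z
                        [0.4, 0.6]])

# Joint P(x, y, z) = P(z) P(x | z) P(y | z), indexed (x, y, z)
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Conditional independence: P(x, y | z) == P(x | z) P(y | z) for every z
p_xy_given_z = joint / p_z                      # divide out P(z) along the z axis
factorized = np.einsum('zx,zy->xyz', p_x_given_z, p_y_given_z)
print("conditionally independent:", np.allclose(p_xy_given_z, factorized))  # True

# Marginal independence: is P(x, y) == P(x) P(y)?  Not here.
p_xy = joint.sum(axis=2)
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
print("marginally independent:", np.allclose(p_xy, np.outer(p_x, p_y)))     # False
```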

7.3 Interpretation of Bayesian Network

Detour: Naive Bayes Classifier

  • Given:
    • Class Prior $P(Y)$
    • $d$ conditionally independent features $X$ given the class $Y$
    • For each $X_i$, we have the likelihood $P(X_i|Y)$
  • Naive Bayes Classifier Function (see the sketch after this list)
    • $f_{NB}(x) = \argmax_{Y=y} P(Y=y) \prod_{1\le i \le d} P(X_i = x_i|Y=y)$
  • Essential information is modeled by
    • Random Variables
    • Probability distribution of the random variables
    • Independence
  • Any way to represent the model
    • Other than the formula?
    • i.e., a graphical notation
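
A minimal sketch of the naive Bayes decision rule above, assuming $d = 3$ binary features and made-up prior/likelihood tables:

```python
import numpy as np

# A minimal sketch of the naive Bayes decision rule
#   f_NB(x) = argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y)
# with hypothetical, made-up parameter tables.
prior = np.array([0.6, 0.4])                  # P(Y = y), y in {0, 1}
# likelihood[y, i] = P(X_i = 1 | Y = y); P(X_i = 0 | Y = y) = 1 - likelihood[y, i]
likelihood = np.array([[0.2, 0.7, 0.4],
                       [0.8, 0.3, 0.9]])

def f_nb(x):
    """Return the class y maximizing P(y) * prod_i P(x_i | y)."""
    x = np.asarray(x)
    per_feature = np.where(x == 1, likelihood, 1.0 - likelihood)  # P(x_i | y), shape (2, d)
    scores = prior * per_feature.prod(axis=1)
    return int(np.argmax(scores))

print(f_nb([1, 0, 1]))   # the class with the larger posterior score
```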

Bayesian Network

  • A graphical notation of
    • Random Variables
    • Conditional independence
    • To obtain a compact representation of the full joint distributions
  • Syntax
    • An acyclic, directed graph
    • A set of nodes
      • A random variable
      • A conditional distribution given its parents
      • $P(X_i|Parents(X_i))$ → what each individual node represents (this can express how well the node fits the value of $Y$)
    • A set of links
      • Direct influence from the parent to the child

Interpretation of Bayesian Network

  • Topology of network encodes conditional independence assertions
    • Often from the domain experts
    • What is related to what and how
  • Interpretation
    • Weather is independent of the other variables
    • Toothache and Stench are conditionally independent given Cavity
    • Cavity influences the probability of toothache and stench (a small numerical sketch of this network follows this list)
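
A small sketch of this network with made-up CPTs (Weather independent of the rest; Toothache and Stench conditionally independent given Cavity), showing how the graph's factorization builds the full joint from far fewer parameters:

```python
import numpy as np

# Sketch of the network above with made-up CPTs. The graph encodes
#   P(W, C, T, S) = P(W) P(C) P(T | C) P(S | C)
p_weather = np.array([0.7, 0.3])                  # P(W)
p_cavity = np.array([0.8, 0.2])                   # P(C)
p_tooth_given_cavity = np.array([[0.9, 0.1],      # P(T | C), rows indexed by C
                                 [0.3, 0.7]])
p_stench_given_cavity = np.array([[0.95, 0.05],   # P(S | C), rows indexed by C
                                  [0.4, 0.6]])

# Full joint, indexed (w, c, t, s): built from 1 + 1 + 2 + 2 = 6 free parameters
# instead of the 2^4 - 1 = 15 a raw joint table would need.
joint = np.einsum('w,c,ct,cs->wcts', p_weather, p_cavity,
                  p_tooth_given_cavity, p_stench_given_cavity)
print(joint.sum())    # 1.0: a valid joint distribution
```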

Component of Bayesian Network

  • Qualitative Components
    • Prior knowledge of causal relations → expresses relationships based on prior knowledge
    • Learning from data → or, if we are domain experts, we can construct the directed graphical structure ourselves
    • Frequently used structures
    • Structural aspects
  • Quantitative components → numerical information
    • Conditional probability tables
    • Probability distribution assigned to nodes
  • Probability computing is related to both
    • Quantitative and Qualitative
      → answering some queries uses both the graphical (structural, directional) information and the quantitative (numerical) information together.

Summary

  • $P(c|b) = \sum_a\sum_d P(a,c,d|b) = \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d)$
    • The reason $P(b, c) = \sum_a\sum_d P(a,b,c,d)$ holds is that a joint probability over a subset of variables can be obtained by summing the full joint over the remaining variables (marginalization); that is what makes the equation above work.
  • $P(c|b) = \cfrac{P(b,c)}{P(b)} = \cfrac{\sum_a\sum_d P(a,b,c,d)}{P(b)}$ (written out step by step after this list)
  • Marginal vs. Conditional Independence
    • Marginal independence: if $X$ and $Y$ are marginally independent, then $P(X, Y) = P(X)P(Y)$.
    • Conditional independence: if $X$ and $Y$ are conditionally independent given $Z$, then $P(X, Y|Z) = P(X|Z) \times P(Y|Z)$.
    • In other words, if the independence holds only in the presence of some other event ($Z$), we say the variables are conditionally independent.
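
Writing out the derivation of $P(c|b)$ from the first summary item step by step:

$$
\begin{aligned}
P(c \mid b) &= \frac{P(b,c)}{P(b)} && \text{definition of conditional probability} \\
&= \frac{\sum_a\sum_d P(a,b,c,d)}{P(b)} && \text{marginalize the joint over } a \text{ and } d \\
&= \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d) = \sum_a\sum_d P(a,c,d \mid b) && \text{since } P(a,c,d \mid b) = P(a,b,c,d)/P(b)
\end{aligned}
$$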