
인공지능 및 기계학습 개론 2 (7.1~7.3) Summary
Dobby-HJ
·2023. 10. 1. 21:00
7.1 Probability Concepts
- Expressing the relationships between random variables.
Probabilities
- We will write $P(A = true)$ to mean the probability that $A = true$
- What is probability?
- It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions.
- Probability can be viewed from both the frequentist and the Bayesian perspective.
Conditional Probability
- $P(A = true | B = true)$ → the probability that $A = true$ when $B = true$
- Out of all the outcomes in which $B$ is true, how many also have $A$ equal to true?
- Read this as:
- “Probability of $A$ conditioned on $B$” or “Probability of $A$ given $B$”
Joint Probability
- We will write $P(A = ture, B = ture)$ to mean
- the probability of $A = ture$ and $B = ture$
- In General
- $P(X|Y) = P(X, Y) / P(Y)$
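As a concrete sketch of the definition above, $P(A = true | B = true)$ can be computed from a small joint table; all names and numbers below are made up for illustration.

```python
# Hypothetical joint distribution P(A, B) over two binary variables,
# stored as a dict keyed by (a, b); the numbers are made up and sum to 1.
joint = {
    (True, True): 0.30, (True, False): 0.10,
    (False, True): 0.20, (False, False): 0.40,
}

# P(B = true): marginalize the joint over A.
p_b = sum(p for (a, b), p in joint.items() if b)

# P(A = true | B = true) = P(A = true, B = true) / P(B = true)
p_a_given_b = joint[(True, True)] / p_b
print(p_a_given_b)  # 0.30 / 0.50 = 0.6
```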
7.2 Probability Theorems
Computing with Probabilities:
Law of Total Probability
- Law of Total Probability
- a.k.a “summing out” or marginalization
- $P(a) = \sum_b P(a, b) = \sum_b P(a|b) P(b)$
- What this identity tells us: if we know the joint distribution, we can recover any individual (marginal) distribution through marginalization.
- Consider this case
- given a joint distribution (e.g., P(a, b, c, d))
- We can obtain any “marginal” probability (e.g., P(b)) by summing out the other variables
- $P(b) = \sum_a\sum_c\sum_d P(a,b, c,d)$
- Also, consider this case
- given a joint distribution (e.g., P(a,b,c,d))
- We can obtain any conditional probability of interest
- $P(c|b) = \sum_a\sum_d P(a,c,d|b) = 1/P(b)\sum_a\sum_d P(a,c,d,b)$ → derivation in the Summary section below
- Where $1 / P(b)$ is just a normalization constant
- Joint distribution contains the information we need to compute any probability of interest
- However, the full joint requires a rapidly growing number of parameters: its size is the product ($\prod$) of the number of values each variable can take, so it grows exponentially with the number of variables.
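A minimal sketch of both computations above, assuming the full joint $P(a,b,c,d)$ over four binary variables is available as a NumPy array (the entries are random made-up values).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical full joint P(a, b, c, d) over four binary variables,
# stored as a 2x2x2x2 array normalized to sum to 1 (values are made up).
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()

# Marginal P(b): sum out a, c, d (law of total probability).
p_b = joint.sum(axis=(0, 2, 3))

# Conditional P(c | b): sum out a and d, then normalize by P(b).
p_cb = joint.sum(axis=(0, 3))          # shape (b, c): P(b, c)
p_c_given_b = p_cb / p_b[:, None]      # divide each row by P(b)

print(p_b, p_c_given_b.sum(axis=1))    # each row of P(c|b) sums to 1
```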
Computing with Probabilities
Chain Rule or Factorization
- We can always write
- $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b,c,\dots,z)$
- by definition of joint probability
- Based on the identity above, we can expand any joint probability into a product of conditionals.
- $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b|c,\dots, z)P(c|\dots, z)\dots P(z)$
- This factorization holds for any ordering of the variables
- Chain rule for probabilities
- Any joint probability can be factorized into a series of multiplications
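A quick numerical check of the factorization with three binary variables; the joint table is again made up, and the product of the conditional factors reconstructs it exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint P(a, b, c) over three binary variables (made-up numbers).
joint = rng.random((2, 2, 2))
joint /= joint.sum()

# Factors of the chain rule P(a, b, c) = P(a | b, c) * P(b | c) * P(c).
p_c = joint.sum(axis=(0, 1))                      # P(c)
p_bc = joint.sum(axis=0)                          # P(b, c)
p_b_given_c = p_bc / p_c                          # P(b | c)
p_a_given_bc = joint / p_bc                       # P(a | b, c)

reconstructed = p_a_given_bc * p_b_given_c * p_c  # broadcasts over a, b, c
print(np.allclose(reconstructed, joint))          # True
```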
Joint Probability Distribution
Independence
- Recall the naive Bayes classifier
- Why introduce the naive assumption?
- Variables $A$ and $B$ are independent if any of the following hold:
- $P(A|B) = P(A)$
- $P(A,B) = P(A)P(B)$
- $P(B|A) = P(B)$
- This says that knowing the outcome of $A$ does not tell me anything new about the outcome of $B$ → the occurrence of $A$ and the occurrence of $B$ do not affect each other.
- $P(A|B) = P(A)$
Conditional vs. Marginal Independence
- Marginal independence
- $P(A|B) > P(A)$
- This is not marginally independent → i.e., if the two sides were equal, $A$ and $B$ would be marginally independent.
- $X$ and $Y$ are independent if and only if $P(X) = P(X|Y)$
- Consequently, $P(X, Y) = P(X)P(Y)$
- Conditional independence
- $P(X | Y, Z) = P(X|Z)$
- This is conditionally independent
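Both notions can be checked directly from a joint table. Below is a minimal sketch (the helper function names and all numbers are made up) in which $X$ and $Y$ are conditionally independent given $Z$ but not marginally independent.

```python
import numpy as np

def is_marginally_independent(joint_xy, tol=1e-9):
    """joint_xy[x, y] = P(X=x, Y=y); check P(X, Y) == P(X) P(Y)."""
    p_x = joint_xy.sum(axis=1, keepdims=True)
    p_y = joint_xy.sum(axis=0, keepdims=True)
    return np.allclose(joint_xy, p_x * p_y, atol=tol)

def is_conditionally_independent(joint_xyz, tol=1e-9):
    """joint_xyz[x, y, z] = P(X=x, Y=y, Z=z); check P(X, Y|Z) == P(X|Z) P(Y|Z)."""
    p_z = joint_xyz.sum(axis=(0, 1))                  # P(Z)
    p_xy_given_z = joint_xyz / p_z                    # P(X, Y | Z)
    p_x_given_z = joint_xyz.sum(axis=1) / p_z         # P(X | Z)
    p_y_given_z = joint_xyz.sum(axis=0) / p_z         # P(Y | Z)
    return np.allclose(p_xy_given_z,
                       p_x_given_z[:, None, :] * p_y_given_z[None, :, :],
                       atol=tol)

# Made-up example: X and Y are independent given Z, but not marginally.
p_z = np.array([0.5, 0.5])
p_x_given_z = np.array([[0.9, 0.1], [0.1, 0.9]])      # rows: x, cols: z
p_y_given_z = np.array([[0.8, 0.2], [0.2, 0.8]])
joint = p_x_given_z[:, None, :] * p_y_given_z[None, :, :] * p_z
print(is_marginally_independent(joint.sum(axis=2)))   # False
print(is_conditionally_independent(joint))            # True
```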
7.3 Interpretation of Bayesian Network
Detour : Naive Bayes Classifier
- Given:
- Class Prior $P(Y)$
- $d$ conditionally independent features $X$ given the class $Y$
- For each $X_i$, we have the likelihood $P(X_i|Y)$
- Naive Bayes Classifier Function
- $f_{NB}(x) = \argmax_{Y=y}P(Y=y)\prod_{1\le i \le d}P(X_i=x_i|Y=y)$
- Essential information is modeled by
- Random Variables
- Probability distribution of the random variables
- Independence
- Any way to represent the model
- Other than the formula?
- i.e., a graphical notation
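One possible way to write $f_{NB}$ in code, assuming the prior $P(Y)$ and the likelihoods $P(X_i|Y)$ are already given as tables; the class labels, feature count, and probabilities below are made up for illustration.

```python
import numpy as np

# Hypothetical model: binary class Y and d = 2 binary features X_1, X_2.
prior = {"y0": 0.6, "y1": 0.4}   # P(Y)
likelihood = {                   # likelihood[y][i][x_i] = P(X_i = x_i | Y = y)
    "y0": [np.array([0.7, 0.3]), np.array([0.9, 0.1])],
    "y1": [np.array([0.2, 0.8]), np.array([0.4, 0.6])],
}

def f_nb(x):
    """Return argmax_y P(Y=y) * prod_i P(X_i = x_i | Y = y)."""
    scores = {
        y: prior[y] * np.prod([likelihood[y][i][x_i] for i, x_i in enumerate(x)])
        for y in prior
    }
    return max(scores, key=scores.get)

print(f_nb([1, 1]))  # classifies the observation x = (1, 1); here "y1" wins
```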
Bayesian Network
- A graphical notation of
- Random Variables
- Conditional independence
- To obtain a compact representation of the full joint distributions
- Syntax
- An acyclic and directed graph
- A set of nodes
- A random variable
- A conditional distribution given its parents
- $P(X_i|Parents(X_i))$ → what each individual node represents (this can express how well its value fits, e.g., the value of $Y$)
- A set of links
- Direct influence from the parent to the child
Interpretation of Bayesian Network
- Topology of network encodes conditional independence assertions
- Often from the domain experts
- What is related to what and how
- Interpretation
- Weather is independent of the other variables
- Toothache and Stench are conditionally independent given Cavity
- Cavity influences the probability of toothache and stench
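Under the topology described above (Weather isolated; Cavity the parent of both Toothache and Stench), the full joint factorizes as $P(W, C, T, S) = P(W)\,P(C)\,P(T|C)\,P(S|C)$. A minimal sketch with made-up CPT entries:

```python
# Hypothetical CPTs for the Weather / Cavity / Toothache / Stench network.
p_weather = {"sunny": 0.7, "rainy": 0.3}             # P(Weather), no parents
p_cavity = {True: 0.2, False: 0.8}                   # P(Cavity), no parents
p_toothache_given_cavity = {True: 0.6, False: 0.1}   # P(Toothache=true | Cavity)
p_stench_given_cavity = {True: 0.7, False: 0.05}     # P(Stench=true | Cavity)

def joint(weather, cavity, toothache, stench):
    """P(W, C, T, S) = P(W) P(C) P(T | C) P(S | C) under the network structure."""
    p_t = p_toothache_given_cavity[cavity]
    p_s = p_stench_given_cavity[cavity]
    return (p_weather[weather]
            * p_cavity[cavity]
            * (p_t if toothache else 1 - p_t)
            * (p_s if stench else 1 - p_s))

print(joint("sunny", True, True, False))  # 0.7 * 0.2 * 0.6 * 0.3 = 0.0252
```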
Component of Bayesian Network
- Qualitative Components
- Prior knowledge of causal relations → expresses the relations we know as prior knowledge.
- Learning from data → alternatively, if we are domain experts, we can build the directed graphical network ourselves.
- Frequently used structures
- Structural aspects
- Quantitative components → numerical information
- Conditional probability tables
- Probability distribution assigned to nodes
- Probability computing is related to both
- Quantitative and Qualitative
→ For some queries, the graphical information (direction/structure) and the quantitative (numerical) information are used together.
Summary
- $P(c|b) = \sum_a\sum_d P(a,c,d|b) = 1/P(b)\sum_a\sum_dP(a,c,d,b)$
- In the equation above, $P(c,b) = \sum_a\sum_d P(a,b,c,d)$ holds because a joint probability over a subset of variables is obtained by summing the full joint over the remaining variables (marginalization). Hence:
- $P(c|b) = \cfrac{P(c,b)}{P(b)} = \cfrac{\sum_a\sum_d P(a,b,c,d)}{P(b)}$
- Marginal vs Conditional Independence
- Marginal independence
- If $X$ and $Y$ are marginally independent, then $P(X, Y) = P(X) P(Y)$.
- Conditional independence
- If $X$ and $Y$ are conditionally independent given $Z$, then $P(X, Y|Z) = P(X|Z) \times P(Y|Z)$.
- That is, when the independence holds once some other event ($Z$) is given, $X$ and $Y$ are said to be conditionally independent.