Introduction to Artificial Intelligence and Machine Learning 2 (7.1~7.3) Notes

Dobby-HJ · 2023. 10. 1. 21:00

7.1 Probability Concepts

  • Describes relationships between random variables.

Probabilities

  • We will write $P(A = true)$ to mean the probability that $A = true$
  • What is probability?
    • It is the relative frequency with which an outcome would be obtained if the process were repeated a large number of times under similar conditions.
  • Probability can be viewed from either a frequentist or a Bayesian perspective.

Conditional Probability

  • $P(A = true | B = true)$ → the probability that $A = true$ given that $B = true$
    • Out of all the outcomes in which $B$ is true, how many also have $A$ equal to true
  • Read this as:
    • “Probability of $A$ conditioned on $B$” or “Probability of $A$ given $B$”

Joint Probability

  • We will write $P(A = ture, B = ture)$ to mean
    • the probability of $A = ture$ and $B = ture$
  • In General
    • $P(X|Y) = P(X, Y) / P(Y)$

7.2 Probability Theorems

Computing with Probabilities:

Law of Total Probability

  • Law of Total Probability
    • a.k.a “summing out” or marginalization
    • $P(a) = \sum_b P(a, b) = \sum_b P(a|b) P(b)$
      • This shows that once we know the joint distribution, we can obtain any individual (marginal) probability via marginalization.
  • Consider this case
    • given a joint distribution (e.g., P(a, b, c, d))
    • We can obtain any “marginal” probability (e.g., P(b)) by summing out the other variables
    • $P(b) = \sum_a\sum_c\sum_d P(a,b, c,d)$
  • Also, consider this case
    • given a joint distribution (e.g., P(a,b,c,d))
    • We can obtain any conditional probability of interest
    • $P(c|b) = \sum_a\sum_d P(a,c,d|b) = \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d)$ → derivation in the summary at the end
    • Where $1 / P(b)$ is just a normalization constant
  • Joint distribution contains the information we need to compute any probability of interest
  • However, the number of parameters of a joint distribution grows very quickly: it is the product ($\Pi$) of the number of values each variable can take, so it increases exponentially with the number of variables (a small numerical sketch follows this list).
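
A minimal numerical sketch of marginalization and conditioning on a full joint table; the 2×2×2×2 array `joint` below is a made-up distribution over four binary variables, used only for illustration.

```python
import numpy as np

# Hypothetical joint distribution P(a, b, c, d) over four binary variables,
# stored as a 2x2x2x2 array indexed (a, b, c, d). Any non-negative table
# summing to 1 would do; this one is random, purely for illustration.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()              # normalize: the 2^4 = 16 entries sum to 1

# Law of total probability ("summing out" / marginalization):
#   P(b) = sum_a sum_c sum_d P(a, b, c, d)
p_b = joint.sum(axis=(0, 2, 3))
print("P(b):", p_b)

# Conditional probability from the joint:
#   P(c | b) = (1 / P(b)) * sum_a sum_d P(a, b, c, d)
p_bc = joint.sum(axis=(0, 3))     # shape (b, c): the marginal P(b, c)
p_c_given_b = p_bc / p_b[:, None] # divide each row by P(b)
print("P(c | b):", p_c_given_b)
print("rows sum to 1:", p_c_given_b.sum(axis=1))
```

Note that the full table already needs $2^4 = 16$ entries here; with $n$ binary variables it needs $2^n$, which is the parameter blow-up mentioned above.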

Computing with Probabilities

Chain Rule or Factorization

  • We can always write
    • $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b,c,\dots,z)$
    • by definition of joint probability
  • Based on the equation above, we can now expand any joint into a product form:
    • $P(a,b,c,\dots, z) = P(a|b,c,\dots,z)P(b|c,\dots, z)P(c|\dots, z)\dots P(z)$
  • This factorization holds for any ordering of the variables
    • Chain rule for probabilities
    • Any joint probability can be factorized into a series of multiplications (a numerical check follows this list)
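
A quick numerical check of the chain rule on a made-up three-variable joint table (the table and the variable ordering $a, b, c$ are illustrative assumptions):

```python
import numpy as np

# Made-up joint P(a, b, c) over three binary variables, indexed (a, b, c),
# used to check the chain rule numerically.
rng = np.random.default_rng(1)
joint = rng.random((2, 2, 2))
joint /= joint.sum()

p_c = joint.sum(axis=(0, 1))      # P(c)
p_bc = joint.sum(axis=0)          # P(b, c)
p_b_given_c = p_bc / p_c          # P(b | c)
p_a_given_bc = joint / p_bc       # P(a | b, c)

# Chain rule: P(a, b, c) = P(a | b, c) P(b | c) P(c)
reconstructed = p_a_given_bc * p_b_given_c * p_c
print(np.allclose(joint, reconstructed))   # True
```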

Joint Probability Distribution

Independence

  • Recall the naive Bayes classifier
    • Why introduce the naive assumption?
  • Variables $A$ and $B$ are independent if any of the following hold:
    • $P(A|B) = P(A)$
    • $P(A,B) = P(A)P(B)$
    • $P(B|A) = P(B)$
    • This says that knowing the outcome of $A$ does not tell me anything new about the outcome of $B$ → the occurrence of either event has no effect on the other.

Conditional vs. Marginal Independence

  • Marginal independence
    • If $P(A|B) > P(A)$, then $A$ and $B$ are not marginally independent → they are marginally independent only when $P(A|B) = P(A)$.
    • $X$ and $Y$ are independent if and only if $P(X) = P(X|Y)$
    • Consequently, $P(X, Y) = P(X)P(Y)$
  • Conditional independence
    • If $P(X | Y, Z) = P(X|Z)$, then $X$ and $Y$ are conditionally independent given $Z$ (see the sketch after this list).
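
A sketch contrasting the two notions with a hypothetical common-cause model $Z \to X$, $Z \to Y$ (all tables below are made-up numbers): $X$ and $Y$ come out conditionally independent given $Z$, but not marginally independent.

```python
import numpy as np

# Hypothetical common-cause model Z -> X, Z -> Y with made-up tables:
# X and Y are conditionally independent given Z, but not marginally independent.
p_z = np.array([0.3, 0.7])                      # P(z)
p_x_given_z = np.array([[0.9, 0.1],             # P(x | z), rows indexed by z
                        [0.2, 0.8]])
p_y_given_z = np.array([[0.7, 0.3],             # P(y | z), rows indexed by z
                        [0.4, 0.6]])

# Joint P(x, y, z) = P(z) P(x | z) P(y | z), indexed (x, y, z)
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Conditional independence: P(x, y | z) == P(x | z) P(y | z) for every z
p_xy_given_z = joint / p_z                      # divide out P(z) along the z axis
factorized = np.einsum('zx,zy->xyz', p_x_given_z, p_y_given_z)
print("conditionally independent:", np.allclose(p_xy_given_z, factorized))  # True

# Marginal independence: is P(x, y) == P(x) P(y)?  Not here.
p_xy = joint.sum(axis=2)
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
print("marginally independent:", np.allclose(p_xy, np.outer(p_x, p_y)))     # False
```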

7.3 Interpretation of Bayesian Network

Detour: Naive Bayes Classifier

  • Given:
    • Class Prior $P(Y)$
    • $d$ conditionally independent features $X$ given the class $Y$
    • For each $X_i$, we have the likelihood $P(X_i|Y)$
  • Naive Bayes Classifier Function (see the sketch after this list)
    • $f_{NB}(x) = \argmax_{Y=y} P(Y=y) \prod_{1\le i \le d} P(X_i = x_i|Y=y)$
  • Essential information is modeled by
    • Random Variables
    • Probability distribution of the random variables
    • Independence
  • Any way to represent the model
    • Other than the formula?
    • i.e., a graphical notation
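
A minimal sketch of the naive Bayes decision rule above, assuming $d = 3$ binary features and made-up prior/likelihood tables:

```python
import numpy as np

# A minimal sketch of the naive Bayes decision rule
#   f_NB(x) = argmax_y P(Y = y) * prod_i P(X_i = x_i | Y = y)
# with hypothetical, made-up parameter tables.
prior = np.array([0.6, 0.4])                  # P(Y = y), y in {0, 1}
# likelihood[y, i] = P(X_i = 1 | Y = y); P(X_i = 0 | Y = y) = 1 - likelihood[y, i]
likelihood = np.array([[0.2, 0.7, 0.4],
                       [0.8, 0.3, 0.9]])

def f_nb(x):
    """Return the class y maximizing P(y) * prod_i P(x_i | y)."""
    x = np.asarray(x)
    per_feature = np.where(x == 1, likelihood, 1.0 - likelihood)  # P(x_i | y), shape (2, d)
    scores = prior * per_feature.prod(axis=1)
    return int(np.argmax(scores))

print(f_nb([1, 0, 1]))   # the class with the larger posterior score
```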

Bayesian Network

  • A graphical notation of
    • Random Variables
    • Conditional independence
    • To obtain a compact representation of the full joint distributions
  • Syntax
    • An acyclic, directed graph
    • A set of nodes
      • A random variable
      • A conditional distribution given its parents
      • $P(X_i|Parents(X_i))$ → what each individual node represents (this can express how well the node fits the value of $Y$)
    • A set of links
      • Direct influence from the parent to the child

Interpretation of Bayesian Network

  • Topology of network encodes conditional independence assertions
    • Often from the domain experts
    • What is related to what and how
  • Interpretation
    • Weather is independent of the other variables
    • Toothache and Stench are conditionally independent given Cavity
    • Cavity influences the probability of toothache and stench (a small numerical sketch of this network follows this list)
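
A small sketch of this network with made-up CPTs (Weather independent of the rest; Toothache and Stench conditionally independent given Cavity), showing how the graph's factorization builds the full joint from far fewer parameters:

```python
import numpy as np

# Sketch of the network above with made-up CPTs. The graph encodes
#   P(W, C, T, S) = P(W) P(C) P(T | C) P(S | C)
p_weather = np.array([0.7, 0.3])                  # P(W)
p_cavity = np.array([0.8, 0.2])                   # P(C)
p_tooth_given_cavity = np.array([[0.9, 0.1],      # P(T | C), rows indexed by C
                                 [0.3, 0.7]])
p_stench_given_cavity = np.array([[0.95, 0.05],   # P(S | C), rows indexed by C
                                  [0.4, 0.6]])

# Full joint, indexed (w, c, t, s): built from 1 + 1 + 2 + 2 = 6 free parameters
# instead of the 2^4 - 1 = 15 a raw joint table would need.
joint = np.einsum('w,c,ct,cs->wcts', p_weather, p_cavity,
                  p_tooth_given_cavity, p_stench_given_cavity)
print(joint.sum())    # 1.0: a valid joint distribution
```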

Component of Bayesian Network

  • Qualitative Components
    • Prior knowledge of causal relations → expresses relationships based on prior knowledge
    • Learning from data → or, if we are domain experts, we can construct the directed graphical structure ourselves
    • Frequently used structures
    • Structural aspects
  • Quantitative components → numerical information
    • Conditional probability tables
    • Probability distribution assigned to nodes
  • Probability computing is related to both
    • Quantitative and Qualitative
      → answering some queries uses both the graphical (structural, directional) information and the quantitative (numerical) information together.

Summary

  • $P(c|b) = \sum_a\sum_d P(a,c,d|b) = \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d)$
    • The reason $P(b, c) = \sum_a\sum_d P(a,b,c,d)$ holds is that a joint probability over a subset of variables can be obtained by summing the full joint over the remaining variables (marginalization); that is what makes the equation above work.
  • $P(c|b) = \cfrac{P(b,c)}{P(b)} = \cfrac{\sum_a\sum_d P(a,b,c,d)}{P(b)}$ (written out step by step after this list)
  • Marginal vs. Conditional Independence
    • Marginal independence: if $X$ and $Y$ are marginally independent, then $P(X, Y) = P(X)P(Y)$.
    • Conditional independence: if $X$ and $Y$ are conditionally independent given $Z$, then $P(X, Y|Z) = P(X|Z) \times P(Y|Z)$.
    • In other words, if the independence holds only in the presence of some other event ($Z$), we say the variables are conditionally independent.
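
Writing out the derivation of $P(c|b)$ from the first summary item step by step:

$$
\begin{aligned}
P(c \mid b) &= \frac{P(b,c)}{P(b)} && \text{definition of conditional probability} \\
&= \frac{\sum_a\sum_d P(a,b,c,d)}{P(b)} && \text{marginalize the joint over } a \text{ and } d \\
&= \frac{1}{P(b)}\sum_a\sum_d P(a,b,c,d) = \sum_a\sum_d P(a,c,d \mid b) && \text{since } P(a,c,d \mid b) = P(a,b,c,d)/P(b)
\end{aligned}
$$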