
Introduction to Artificial Intelligence and Machine Learning 2 (8.1~8.3) Summary
Dobby-HJ
·2023. 10. 14. 01:10
8.1 ~ 8.2 K-Means Algorithm
Unsupervised Learning
- You don’t know the true value, and you cannot provide examples of the true value.
- Cases, such as
- Discovering clusters
- Discovering latent factors
- Discovering graph structures
- Clustering, filtering, or completing of:
- Finding the representative topic words from text data
- Finding the latent image from facial data
- Completing the incomplete matrix of product-review scores
- Filtering the noise from the trajectory data
- Methodologies
- Clustering: estimating sets and affiliations of instances to the sets
- Filtering: estimating underlying and fundamental signals from a mixture of signals and noise
Clustering Problem
- How to cluster the unlabeled data points
- No concrete knowledge of their classes
- Latent (hidden) variables of classes
- Optimal assignment to the latent classes
K-Means Algorithm
- K-Means algorithm
- Set up $K$ centroids (or prototypes) and cluster data points by the distance from each point to the nearest centroid
- K-Nearest Neighbors, in contrast, looks at the labels of the $K$ nearest neighbors of a point and takes the most common label among them as the prediction → hence it is supervised learning
- Formally,
- $J =\sum_{n=1}^N\sum_{k=1}^K r_{nk}||x_n-\mu_k||^2$ → here $\mu_k$ is the location of the $k$-th centroid (a small code sketch of this objective follows after this list)
- Minimize $J$ by optimizing
- $r_{nk}$ : the assignment of data points to clusters
- Setting these values to 0 or 1 so that $J$ above is minimized is the goal of the current task.
- In other words, they can also be called parameters or decision variables.
- $\mu_{k}$ : the location of centroids
- Iterative optimization
- Why?
- Two variables are interacting
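To make the objective concrete, here is a minimal NumPy sketch of $J$ (the names `X`, `R`, `mu` and the helper function are hypothetical, not from the lecture):

```python
import numpy as np

def kmeans_objective(X, R, mu):
    """Compute J = sum_n sum_k r_nk * ||x_n - mu_k||^2.

    X  : (N, D) data points
    R  : (N, K) hard assignment matrix, r_nk in {0, 1}, each row sums to 1
    mu : (K, D) centroid locations
    """
    # Squared Euclidean distance from every point to every centroid: shape (N, K)
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return float((R * sq_dist).sum())
```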
Expectation and Maximization
Expectation and Maximization together are referred to as the EM algorithm.
It alternates between an Expectation step and a Maximization step to carry out the optimization.
Let's walk through the $K$-Means clustering procedure as an example (a full code sketch follows at the end of this section):
- Initial setup: set the $\mu_k$ randomly
- Update $r_{nk}$ to match the currently set $\mu_k$.
- Update $\mu_k$ using the newly obtained $r_{nk}$.
- Repeat steps 2-3.
- $J = \sum_{n=1}^N \sum_{k=1}^K r_{nk}||x_n - \mu_k||^2$
- Expectation
- Expectation of the log-likelihood given the parameters
- Assign the data points to the nearest centroid
- Maximization
- Maximization of the parameters (here the parameters include not only $r_{nk}$ but also the centroid locations) with respect to the log-likelihood
- Update the centroid positions given the assignments
- Expectation
- $r_{nk}$
- $r_{nk} \in \{0, 1\}$
- Discrete variable
- Logical choice: the nearest centroid $\mu_k$ for a data point $x_n$
- $\mu_k$
- $\cfrac{\partial J}{\partial \mu_k} = \cfrac{\partial}{\partial \mu_k} \sum_{n=1}^N \sum_{k=1}^K r_{nk}||x_n - \mu_k||^2$ → drop the terms of $\sum_{k=1}^K$ that do not involve this $\mu_k$, since they vanish under the derivative
- $= \cfrac{\partial}{\partial \mu_k} \sum_{n=1}^N r_{nk}||x_n - \mu_k||^2 = \sum_{n=1}^N -2r_{nk}(x_n - \mu_k) = -2\left(-\sum_{n=1}^N r_{nk}\mu_k + \sum_{n=1}^N r_{nk}x_n\right) = 0$
- $\mu_k = \cfrac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}$ → the numerator is the sum of the positions of the data points assigned to cluster $k$, and the denominator is the number of points assigned to it, so $\mu_k$ is the mean of those positions.
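Putting the E-step (assign each point to its nearest centroid) and the M-step (move each centroid to the mean of its assigned points, as derived above) together, a minimal EM-style K-Means loop might look like the sketch below. This is only an illustrative implementation under simple assumptions (random initialization from the data points, a fixed iteration cap), not the lecture's reference code:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal EM-style K-Means: alternate assignments (E) and centroid updates (M)."""
    rng = np.random.default_rng(seed)
    N, _ = X.shape
    mu = X[rng.choice(N, size=K, replace=False)]  # random initial centroids from the data

    for _ in range(n_iters):
        # E-step: hard-assign each point to its nearest centroid (r_nk as a cluster index z)
        sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        z = sq_dist.argmin(axis=1)

        # M-step: mu_k = mean of the points assigned to cluster k (keep old mu_k if empty)
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # converged: the update no longer moves the centroids
            break
        mu = new_mu

    return mu, z
```

For example, running `kmeans(points, K=2)` on two well-separated 2-D blobs should typically recover one centroid per blob within a few iterations.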
Progress of $K$-Means Algorithm
- EM iterations to
- Optimize the assignments with respect to the sum of distances
- Optimize the parameters with respect to the sum of distances
- As the iterations proceed, you can see the centroids (red, green, and blue in the lecture figure) moving so that the average distance from each point to its assigned centroid decreases.
Properties of $K$-Means Algorithm
- Number of clusters is uncertain
- Initial location of centroids
- Some initial locations might not lead to reasonable results
- Limitation of distance metrics
- Euclidean distance encodes only very limited information about the data
- Hard clustering
- Hard assignment of data points to clusters
- $r_{nk} \in \{0, 1\}$
- This could instead be a smoothly distributed probability
- Any alternatives?
- Soft clustering
- $r_{nk}$ becomes a probability of belonging to each cluster rather than a 0/1 indicator
- Soft assignment of data points to clusters (an illustrative sketch follows below)
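As a purely illustrative sketch of the hard-vs-soft distinction (this particular distance-based softening is my own assumption for illustration, not the method the lecture develops), the 0/1 assignment could be relaxed into a probability like this:

```python
import numpy as np

def soft_assign(X, mu, beta=1.0):
    """Soft r_nk: larger when x_n is closer to mu_k; each row sums to 1.
    beta controls sharpness; as beta grows this approaches the hard 0/1 assignment."""
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    logits = -beta * sq_dist
    logits -= logits.max(axis=1, keepdims=True)                    # numerical stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)
```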
8.3 Multinomial Distribution
Multinomial Distribution
- Binary variable
- Selecting 0 or 1 → binomial distribution
- How about $K$ options?
- $X = (0, 0, 1, 0, 0, 0)$ when $K = 6$ and selecting the third option
- $\sum_k x_k = 1, P(X|\mu) = \Pi_{k=1}^K \mu_k^{x_k}$ such that $\mu_k \ge 0, \sum_k\mu_k = 1$
- A generalization of binomial distribution → Multinomial distribution
- Given a dataset $D$ with $N$ selections, $x_1,\dots, x_N$
→ This is the situation where all $N$ data points are observed together; since the events are modeled jointly (a joint probability), the product $\Pi_{n=1}^N$ is placed in front of the single-selection formula above.
- $P(X|\mu) = \Pi_{n=1}^N\Pi_{k=1}^K \mu_k^{x_{nk}} = \Pi_{k=1}^K\mu_k^{\sum_{n=1}^N x_{nk}} = \Pi_{k=1}^K \mu_k^{m_k}$
- where $m_k = \sum_{n=1}^N x_{nk}$
- the number of times the $k^{th}$ option was selected out of $N$ selections (see the small sketch at the end of this subsection)
- How to determine the maximum likelihood solution of $\mu$? → MLE
- Maximize $P(X|\mu) = \Pi_{k=1}^K \mu_k^{m_k}$
- Subject to $\mu_k \ge 0, \sum_k \mu_k = 1$
- We want to maximize $P(X|\mu)$, but we have the constraints $\mu_k \ge 0, \sum_k \mu_k = 1$, so the maximization has to take these constraints into account.
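As a small numeric sketch of these quantities (the data below is made up for illustration; NumPy assumed), the counts $m_k$ and the log of $P(X|\mu) = \Pi_{k=1}^K \mu_k^{m_k}$ can be computed directly from one-hot selections:

```python
import numpy as np

# N = 4 one-hot selections over K = 6 options (e.g. four dice rolls)
X = np.array([[0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0]])

m = X.sum(axis=0)            # m_k = number of times option k was selected
mu = np.full(6, 1 / 6)       # some candidate parameter with mu_k >= 0 and sum(mu) = 1

log_likelihood = float((m * np.log(mu)).sum())  # log P(X | mu) = sum_k m_k * log(mu_k)
print(m, log_likelihood)
```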
Lagrange Method
- Method of finding a local maximum subject to constraints
- Maximize $f(x,y)$
- Subject to $g(x,y) = c$
- Assuming that $f$ and $g$ have continuous partial derivatives
- Lagrange function and multiplier (do you recall this?)
- $L(x, y, \lambda) = f(x,y) + \lambda(g(x,y) - c)$
- $L(\mu, m, \lambda) = \sum_{k=1}^Km_k\ln\mu_k + \lambda(\sum_{k=1}^K \mu_k - 1)$
- Using the log-likelihood
- Take the partial first-order derivative with respect to each variable, and set it to zero
- $\frac{d}{d\mu_k}L(\mu, m, \lambda) = \frac{m_k}{\mu_k} + \lambda = 0$ → $\mu_k = -\frac{m_k}{\lambda}$
- Utilize the constraint to get the optimized value
- $\sum_k \mu_k = 1$ → $\sum_k - \frac{m_k}{\lambda} = 1$ → $\sum_km_k = -\lambda$ → $\sum_k\sum_{n=1}^N x_{nk} = -\lambda$ → $N = -\lambda$
- $\mu_k = \frac{m_k}{N}$ : MLE parameter of the multinomial distribution
- That is, the optimal $\mu_k$ is the number of times the $k$-th category was selected divided by the total number of selections, which is exactly the value that maximizes the likelihood (a small verification sketch follows below).
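Continuing the small example above, the MLE from the Lagrange derivation is just the empirical frequency $m_k / N$. The sketch below (hypothetical names, NumPy assumed) computes it and checks that it scores at least as high a log-likelihood as a uniform $\mu$:

```python
import numpy as np

def multinomial_mle(X):
    """MLE of mu for one-hot data X of shape (N, K): mu_k = m_k / N."""
    return X.sum(axis=0) / X.shape[0]

def log_lik(mu, m):
    # Treat 0 * log(0) as 0 so that never-selected categories do not break the sum
    return float(np.sum(np.where(m > 0, m * np.log(np.where(mu > 0, mu, 1.0)), 0.0)))

X = np.array([[0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0]])

m = X.sum(axis=0)
mu_hat = multinomial_mle(X)                    # [0.25, 0.0, 0.5, 0.0, 0.25, 0.0]
print(log_lik(mu_hat, m) >= log_lik(np.full(6, 1 / 6), m))  # expected: True
```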