Introduction to Artificial Intelligence and Machine Learning 2 (8.1~8.3) Summary

Dobby-HJ · 2023. 10. 14. 01:10

8.1 ~ 8.2 K-Means Algorithm

Unsupervised Learning

  • You don’t know the true value, and you cannot provide examples of the true value.
  • Cases, such as
    • Discovering clusters
    • Discovering latent factors
    • Discovering graph structures
  • Clustering or filtering or completing of
    • Finding the representative topic words from text data
    • Finding the latent image from facial data
    • Completing the incomplete matrix of product-review scores
    • Filtering the noise from the trajectory data
  • Methodologies
    • Clustering: estimating sets and affiliations of instances to the sets
    • Filtering: estimating underlying and fundamental signals from the mixture of signals and noises

Clustering Problem

  • How to cluster the unlabeled data points
    • No concrete knowledge of their classes
    • Latent (hidden) variables of classes
    • Optimal assignment to the latent classes

K-Means Algorithm

  • K-Means algorithm
    • Set up $K$ centroids (or prototypes) and cluster data points by the distance from each point to its nearest centroid
  • In contrast, $K$-Nearest Neighbors checks the labels of the $K$ nearest elements and takes the most frequent label among them as its prediction → hence it is supervised learning
  • Formally,
    • $J =\sum_{n=1}^N\sum_{k=1}^K r_{nk}||x_n-\mu_k||^2$ → here $\mu_k$ is the location of the $k^{th}$ centroid
    • Minimize $J$ by optimizing
      • $r_{nk}$ : the assignment of data points to clusters
        • The goal of this task is to assign 0 or 1 to these values so that $J$ above is minimized (a small snippet evaluating $J$ follows this list)
        • In other words, these can also be called parameters or decision variables
      • $\mu_{k}$ : the location of the centroids
    • Iterative optimization
      • Why?
      • The two variables $r_{nk}$ and $\mu_k$ interact, so each is optimized in turn while the other is held fixed

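To make the two decision variables concrete, here is a minimal NumPy sketch (the toy data, centroids, and assignments are made up for illustration) that evaluates the objective $J$ for a given assignment matrix $r_{nk}$ and centroid matrix $\mu_k$:

```python
import numpy as np

# Toy data: N = 4 points in 2-D and K = 2 centroids (values made up for illustration).
X = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.1], [2.9, 3.0]])  # shape (N, 2)
mu = np.array([[0.0, 0.1], [3.0, 3.0]])                         # shape (K, 2)

# r[n, k] = 1 if point n is assigned to cluster k, else 0 (hard assignment).
r = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# J = sum_n sum_k r_nk * ||x_n - mu_k||^2
sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
J = (r * sq_dists).sum()
print(J)  # a smaller J means tighter clusters
```
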
Expectation and Maximization

The combination of Expectation and Maximization is called the EM algorithm.

It is a technique that optimizes by alternating between an Expectation step and a Maximization step.

Let's walk through the $K$-Means clustering procedure as an example (a code sketch of the full loop follows the derivation below).

  1. Initial setup: set the centroids $\mu_k$ randomly.
  2. Update $r_{nk}$ to match the currently fixed $\mu_k$.
  3. Update $\mu_k$ using the newly obtained $r_{nk}$.
  4. Repeat steps 2~3.
  • $J = \sum_{n=1}^N \sum_{k=1}^K r_{nk}||x_n - \mu_k||^2$
    • Expectation
      • Expectation of the log-likelihood given the parameters
      • Assign the data points to the nearest centroid
    • Maximization
      • Maximization of the parameters (here the parameters include not only $r_{nk}$ but also the centroid locations) with respect to the log-likelihood
      • Update the centroid positions given the assignments
  • $r_{nk}$
    • $r_{nk} \in \{0, 1\}$
    • Discrete variable
    • Logical choice: assign each data point $x_n$ to its nearest centroid $\mu_k$
  • $\mu_k$
    • $\cfrac{\partial J}{\partial \mu_k} = \cfrac{\partial}{\partial \mu_k} \sum_{n=1}^N \sum_{k=1}^K r_{nk}||x_n - \mu_k||^2$ → the $\sum_{k=1}^K$ terms that do not involve this particular $\mu_k$ drop out of the derivative, leaving
      $=\cfrac{\partial}{\partial \mu_k} \sum_{n=1}^N r_{nk}||x_n - \mu_k||^2 = \sum_{n=1}^N - 2r_{nk}(x_n - \mu_k) = -2 \left(-\sum_{n=1}^N r_{nk}\mu_k + \sum_{n=1}^N r_{nk}x_n\right) = 0$
    • $\mu_k = \cfrac{\sum_{n=1}^N r_{nk}x_n}{\sum_{n=1}^N r_{nk}}$ → the numerator is the sum of the positions of the data points assigned to cluster $k$, and the denominator is the number of data points assigned to cluster $k$, so $\mu_k$ is simply the mean of those positions.
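
Putting the E and M steps together, below is a minimal $K$-Means sketch in NumPy; the random initialization from data points and the fixed number of iterations are my own choices for illustration, not prescribed by the lecture. The E-step assigns each point to its nearest centroid, and the M-step applies the update $\mu_k = \sum_n r_{nk} x_n / \sum_n r_{nk}$ derived above.

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Hard-assignment K-Means: alternate the E-step (assignments) and M-step (centroids)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)  # random initial centroids

    for _ in range(n_iters):
        # E-step: r_nk <- 1 for the nearest centroid (stored here as one index per point).
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        assign = sq_dists.argmin(axis=1)

        # M-step: mu_k <- mean of the points currently assigned to cluster k.
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:          # keep the old centroid if a cluster is empty
                mu[k] = members.mean(axis=0)

    return mu, assign

# Example usage on toy 2-D data with two well-separated blobs:
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centroids, labels = kmeans(X, K=2)
```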

Progress of $K$-Means Algorithm

  • EM iterations to
    • Optimize the assignments with respect to the sum of distances
    • Optimize the parameters with respect to the sum of distances

  • As the iterations proceed, you can see the centroids (red, green, and blue) moving so that the average distance from the points to their assigned centroid decreases.

Properties of $K$-Means Algorithm

  • Number of clusters is uncertain

  • Initial location of centroids
    • Some initial locations might not lead to reasonable results
  • Limitation of distance metrics
    • Euclidean distance captures only very limited information
  • Hard clustering
    • Hard assignment of data points to clusters
      • $r_{nk} \in \{0, 1\}$
        • This could instead be a smoothly distributed probability
      • Any alternatives?
      • Soft clustering

8.3 Multinomial Distribution

Multinomial Distribution

  • Binary variable
    • Selecting 0 or 1 → binomial distribution
  • How about $K$ options?
    • $X = (0, 0, 1, 0, 0, 0)$ when $K = 6$ and selecting the third option
    • $\sum_k x_k = 1, P(X|\mu) = \Pi_{k=1}^K \mu_k^{x_k}$ such that $\mu_k \ge 0, \sum_k\mu_k = 1$
    • A generalization of binomial distribution → Multinomial distribution
  • Given a dataset $D$ with $N$ selections, $x_1,\dots, x_N$
    → This is the situation where all $N$ selections are observed together; since we consider the events jointly, the probability is a joint probability, which is why the $\Pi_{n=1}^N$ notation is attached in front of the formula above (a short numerical check of the resulting factorization follows this list).
    • $P(X|\mu) = \Pi_{n=1}^N\Pi_{k=1}^K \mu_k^{x_{nk}} = \Pi_{k=1}^K\mu_k^{\sum_{n=1}^N x_{nk}} = \Pi_{k=1}^K \mu_k^{m_k}$
      • where $m_k = \sum_{n=1}^N x_{nk}$
      • the number of times the $k^{th}$ option is selected out of $N$ selections
    • How to determine the maximum likelihood solution of $\mu$? → MLE
      • Maximize $P(X|\mu) = \Pi_{k=1}^K \mu_k^{m_k}$
      • Subject to $\mu_k \ge 0, \sum_k \mu_k = 1$
      • We want to maximize $P(X|\mu)$, but since we have the constraints $\mu_k \ge 0, \sum_k \mu_k = 1$, we need to take these constraints into account while maximizing.
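
As a quick numerical check of the factorization above (the one-hot matrix and candidate $\mu$ below are made up for illustration), the per-datapoint product $\Pi_n\Pi_k \mu_k^{x_{nk}}$ and the count-based product $\Pi_k \mu_k^{m_k}$ give the same value:

```python
import numpy as np

# Made-up one-hot selections x_nk: N = 5 selections over K = 3 options, plus a candidate mu.
X = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]])
mu = np.array([0.2, 0.3, 0.5])

p_per_point = np.prod(mu ** X)       # prod_n prod_k mu_k^{x_nk}
m = X.sum(axis=0)                    # m_k = sum_n x_nk (counts per option)
p_counts = np.prod(mu ** m)          # prod_k mu_k^{m_k}

print(np.isclose(p_per_point, p_counts))  # True: the two forms agree
```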

Lagrange Method

  • Method of finding a local maximum subject to constraints
    • Maximize $f(x,y)$
    • Subject to $g(x,y) = c$
    • Assuming that $f$ and $g$ have continuous partial derivatives
      1. Lagrange function and multiplier (do you recall this?)
        1. $L(x, y, \lambda) = f(x,y) + \lambda(g(x,y) - c)$
        2. $L(\mu, m, \lambda) = \sum_{k=1}^Km_k\ln\mu_k + \lambda(\sum_{k=1}^K \mu_k - 1)$
          • Using the log-likelihood
      2. Take the partial first-order derivative with respect to each variable, and set it to zero
        • $\frac{\partial}{\partial\mu_k}L(\mu, m, \lambda) = \frac{m_k}{\mu_k} + \lambda = 0$ → $\mu_k = -\frac{m_k}{\lambda}$
      3. Utilize the constraint to get the optimized value
        • $\sum_k \mu_k = 1$ → $\sum_k - \frac{m_k}{\lambda} = 1$ → $\sum_km_k = -\lambda$ → $\sum_k\sum_{n=1}^N x_{nk} = -\lambda$
          → $N = -\lambda$
        • $\mu_k = \frac{m_k}{N}$ : MLE parameter of multinomial distribution
          • That is, the optimal $\mu_k$ equals (the number of times the $k^{th}$ category was selected) / (the total number of selections); choosing $\mu_k$ this way maximizes the likelihood (a quick numerical check follows below).
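
As a minimal numerical check of this closed-form result (the one-hot data below is made up), computing $m_k = \sum_n x_{nk}$ and setting $\mu_k = m_k / N$ gives a log-likelihood that does not improve when probability mass is shifted between options:

```python
import numpy as np

# Made-up one-hot selections x_nk: N = 6 selections over K = 3 options.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1],
              [0, 0, 1]])
N = X.shape[0]

m = X.sum(axis=0)        # m_k: how many times option k was selected
mu_mle = m / N           # closed-form MLE: mu_k = m_k / N
print(mu_mle)            # [1/6, 2/6, 3/6]

def log_likelihood(mu):
    return (m * np.log(mu)).sum()   # sum_k m_k * ln(mu_k)

# Shift a little probability mass between options 0 and 1: the likelihood should not improve.
eps = 0.05
mu_perturbed = mu_mle + np.array([eps, -eps, 0.0])
print(log_likelihood(mu_mle) >= log_likelihood(mu_perturbed))  # True
```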