
ML lec 03 - Explaining how the cost minimization algorithm for Linear Regression works

만능성구 2020. 4. 22. 14:30

Simplified hypothesis

H(x) = Wx

$$cost(W) = \frac{1}{m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$

 

 

W = 1, cost(W) = 0

$$\frac{1}{3}\left((1\cdot 1-1)^2+(1\cdot 2-2)^2+(1\cdot 3-3)^2\right)$$

W = 0, cost(W) = 4.67

$$\frac{1}{3}\left((0\cdot 1-1)^2+(0\cdot 2-2)^2+(0\cdot 3-3)^2\right)$$

W = 2, cost(W) = 4.67

$$\frac{1}{3}\left((2\cdot 1-1)^2+(2\cdot 2-2)^2+(2\cdot 3-3)^2\right)$$
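The same arithmetic can be checked with a few lines of Python. This is a minimal sketch assuming the lecture's toy data x = (1, 2, 3), y = (1, 2, 3); the helper name cost is my own.

```python
def cost(W, xs, ys):
    """Mean squared error of the simplified hypothesis H(x) = W * x."""
    m = len(xs)
    return sum((W * x - y) ** 2 for x, y in zip(xs, ys)) / m

xs = [1, 2, 3]
ys = [1, 2, 3]

for W in (0, 1, 2):
    print(W, round(cost(W, xs, ys), 2))  # 0 -> 4.67, 1 -> 0.0, 2 -> 4.67
```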

What does cost(W) look like?

$$cost(W) = \frac{1}{m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$
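To see the shape, one can sweep W over a range and plot cost(W). This is only a rough sketch, assuming the same toy data and that numpy and matplotlib are available (neither appears in the original lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])

ws = np.linspace(-3, 5, 200)                       # candidate values of W
costs = [np.mean((w * xs - ys) ** 2) for w in ws]  # cost(W) for each candidate

plt.plot(ws, costs)
plt.xlabel("W")
plt.ylabel("cost(W)")
plt.show()   # a parabola (bowl) with its minimum at W = 1
```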

Gradient descent algorithm

  • Minimize the cost function
  • Gradient descent is used in many minimization problems
  • For a given cost function cost(W, b), it finds the W and b that minimize the cost
  • It can be applied to more general functions: cost(w1, w2, ...)
  • The gradient descent algorithm finds the W and b that make the cost as small as possible, and it applies to general functions as well.

How does it work?

  • Start with initial guesses
    • Start at 0,0 (or any other value)  // you can start anywhere
    • Keep changing W and b a little bit to try to reduce cost(W, b)  // adjust W and b little by little to reduce the cost
  • Each time you change the parameters, you select the gradient that reduces cost(W, b) the most
  • Repeat
  • Do so until you converge to a local minimum (see the sketch after this list)
  • It has an interesting property
    • Where you start can determine which minimum you end up in
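The steps above can be written as a small loop. The sketch below is only an illustration of the procedure: it estimates the slope numerically with a finite difference (the lecture instead differentiates the cost analytically in the next section), and the data, learning rate, and stopping threshold are my own choices.

```python
def cost(W, xs, ys):
    m = len(xs)
    return sum((W * x - y) ** 2 for x, y in zip(xs, ys)) / m

def numerical_gradient(f, W, h=1e-6):
    # Finite-difference estimate of the slope of f at W (illustration only).
    return (f(W + h) - f(W - h)) / (2 * h)

xs, ys = [1, 2, 3], [1, 2, 3]
W = 5.0          # start anywhere
alpha = 0.1      # learning rate

for step in range(100):
    grad = numerical_gradient(lambda w: cost(w, xs, ys), W)
    if abs(grad) < 1e-8:      # converged: the slope is (almost) zero
        break
    W -= alpha * grad         # move a little in the downhill direction

print(W)  # close to 1.0
```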

Differentiation

$$cost(W) = \frac{1}{m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$

Dividing by 2 only rescales the cost, so the W that minimizes it is unchanged, but the derivative comes out cleaner:

$$cost(W) = \frac{1}{2m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$

 

Formal definition

$$cost(W) = \frac{1}{2m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$

$$W := W - \alpha\frac{\partial}{\partial W}cost(W)$$

 

If the slope of cost(W) is negative, the update W := W - α·(slope) moves W toward larger values; if the slope is positive, it moves W toward smaller values. Either way, W moves downhill toward the minimum of the cost.

$$W := W - \alpha\frac{\partial}{\partial W}\frac{1}{2m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)^2$$

$$W := W - \alpha\frac{1}{2m}\sum_{i=1}^{m}2\left(Wx^{(i)} - y^{(i)}\right)x^{(i)}$$

$$W := W - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(Wx^{(i)} - y^{(i)}\right)x^{(i)}$$
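The final update rule translates directly into a short loop. A minimal sketch, assuming the same toy data x = (1, 2, 3), y = (1, 2, 3) and a learning rate α = 0.1 chosen only for illustration:

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])

W = -2.0      # arbitrary starting point
alpha = 0.1   # learning rate

for step in range(50):
    gradient = np.mean((W * xs - ys) * xs)   # (1/m) * sum((W*x - y) * x)
    W -= alpha * gradient                    # W := W - alpha * gradient

print(W)   # converges toward 1.0, where cost(W) is minimal
```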

 

The cost function must be a convex function; only then does gradient descent reach the correct answer no matter where it starts.
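To illustrate this, the sketch below runs the same update from several different starting values of W; because this cost is a single convex bowl, every run ends near the same W ≈ 1. The starting values and learning rate are arbitrary choices for the example.

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([1.0, 2.0, 3.0])

def descend(W, alpha=0.1, steps=100):
    for _ in range(steps):
        W -= alpha * np.mean((W * xs - ys) * xs)   # analytic gradient step
    return W

# Convex cost: every starting point ends up at the same minimum.
for start in (-10.0, 0.0, 3.0, 25.0):
    print(start, "->", round(descend(start), 4))   # all end near W = 1.0
```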
