L1 vs L2 Regularization: Key Differences and How They Prevent Overfitting

by 🔥깡 다 고! 2025. 4. 15.

When you train deep learning models, the problem you run into most often is overfitting. One of the standard ways to reduce it is regularization.

The two most widely used regularization techniques in machine learning and deep learning are L1 and L2 regularization. This post covers the concept behind each, their mathematical definitions, how they differ, and when to use which.


1. What is regularization?

Regularization prevents a model from fitting the training data too closely (overfitting) by adding a penalty, that is, a constraint, on the model's complexity.

Put simply, a model with very large weights may generalize poorly, so regularization penalizes large weights to keep them small.


2. What is L1 regularization (Lasso)?

L1 regularization adds the sum of the absolute values of the weights to the loss function. This drives unnecessary weights to 0, which makes the model simpler.

Loss = MSE + λ * Σ|wᵢ|

- λ (lambda) is a hyperparameter that controls the strength of the regularization.
- Because it sets some weights to exactly 0, it is effective for feature selection.
- Representative algorithm: Lasso regression
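
To make the L1 formula above concrete, here is a minimal sketch in plain Python. The function names and the toy numbers are made up for illustration and are not taken from any library. It also computes the (sub)gradient of the penalty, which has the same magnitude λ for every nonzero weight regardless of how large the weight is. That constant pull is what lets small weights reach exactly 0.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l1_loss(y_true, y_pred, weights, lam):
    """MSE plus the L1 penalty: lam * sum(|w_i|)."""
    return mse(y_true, y_pred) + lam * sum(abs(w) for w in weights)


def l1_subgradient(weights, lam):
    """Subgradient of the L1 penalty: lam * sign(w_i), taking 0 at w_i == 0."""
    return [lam * ((w > 0) - (w < 0)) for w in weights]


# Hypothetical toy values, chosen only to show the arithmetic.
weights = [0.5, -2.0, 0.0]
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 2.0]

loss = l1_loss(y_true, y_pred, weights, lam=0.1)  # MSE = 1/3, penalty = 0.25
grad = l1_subgradient(weights, lam=0.1)           # [0.1, -0.1, 0.0]
print(loss, grad)
```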


3. What is L2 regularization (Ridge)?

L2 regularization adds the sum of the squared weights to the loss function. It pushes weights toward 0 but does not make them exactly 0.

Loss = MSE + λ * Σ(wᵢ²)

- It keeps weight magnitudes small, which gives a smoother model that tends to generalize better.
- Unlike L1, it leaves every feature with some contribution.
- Representative algorithms: Ridge regression, and weight decay in deep learning
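
The link between the L2 penalty and weight decay can be checked with a few lines of plain Python. This is a sketch under assumptions: it uses plain gradient descent, the penalty λ * Σ(wᵢ²) from the formula above, and hypothetical function names and numbers. The gradient of the penalty is 2λwᵢ, so one step on the penalized loss is the same as first shrinking each weight by the factor (1 - 2·lr·λ) and then applying the data gradient.

```python
def sgd_step_with_l2(weights, data_grad, lr, lam):
    """One gradient step on: data loss + lam * sum(w_i ** 2)."""
    penalty_grad = [2.0 * lam * w for w in weights]
    return [w - lr * (g + p) for w, g, p in zip(weights, data_grad, penalty_grad)]


def sgd_step_with_weight_decay(weights, data_grad, lr, lam):
    """The same step written as: shrink the weight, then apply the data gradient."""
    decay = 1.0 - 2.0 * lr * lam
    return [decay * w - lr * g for w, g in zip(weights, data_grad)]


# Hypothetical toy values.
weights = [0.5, -2.0, 0.0]
data_grad = [0.1, -0.3, 0.2]

step_l2 = sgd_step_with_l2(weights, data_grad, lr=0.1, lam=0.01)
step_wd = sgd_step_with_weight_decay(weights, data_grad, lr=0.1, lam=0.01)
print(step_l2)
print(step_wd)  # matches step_l2 up to floating-point rounding
```

Note that this equivalence is for plain gradient descent. With adaptive optimizers the two formulations generally give different updates.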


4. Differences between L1 and L2 regularization

| Item | L1 regularization (Lasso) | L2 regularization (Ridge) |
|---|---|---|
| Penalty term | \|w\| (absolute value) | w² (square) |
| Effect on weights | Sets some weights to 0 | Keeps all weights small |
| Feature selection | Yes (acts as feature selection) | No (uses all features) |
| Typical uses | Sparse models, feature importance analysis | Deep learning, generalizing linear regression |
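
The "sets some weights to 0" versus "keeps all weights small" rows can be seen in a one-dimensional toy problem. The sketch below assumes a single weight whose unregularized optimum is `a`, with data loss 0.5 * (w - a)², and uses hypothetical helper names. Under that assumption both penalties have closed-form minimizers: L1 subtracts λ and clips at 0 (soft-thresholding), while L2 divides by (1 + 2λ).

```python
def l1_shrink(a, lam):
    """Minimizer of 0.5 * (w - a) ** 2 + lam * |w| (soft-thresholding)."""
    if abs(a) <= lam:
        return 0.0
    return a - lam if a > 0 else a + lam


def l2_shrink(a, lam):
    """Minimizer of 0.5 * (w - a) ** 2 + lam * w ** 2."""
    return a / (1.0 + 2.0 * lam)


# Hypothetical unregularized weights: two large, two small.
unregularized = [3.0, 0.4, -0.2, -1.5]
lam = 0.5

l1_weights = [l1_shrink(a, lam) for a in unregularized]
l2_weights = [l2_shrink(a, lam) for a in unregularized]

print(l1_weights)  # [2.5, 0.0, 0.0, -1.0]   small weights become exactly 0
print(l2_weights)  # [1.5, 0.2, -0.1, -0.75] every weight shrinks, none is 0
```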

5. When to use L1 and when to use L2?

  • L1 regularization → when you have many features and want to **keep only the important ones**
  • L2 regularization → when you want to use all features but **reduce overfitting**
  • L1 + L2 combined: Elastic Net regularization, which takes the advantages of both

In deep learning, L2 regularization (weight decay) is generally the more common choice. In classical machine learning, L1 or the combined approach is often preferred when feature selection matters.
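
The Elastic Net combination listed above can be sketched as a weighted mix of the two penalties. The parameterization here (a single strength `lam` plus a mixing ratio `l1_ratio` between 0 and 1) is one common way to write it and is an assumption of this example. Libraries may scale the two terms differently, so check the documentation of whichever one you use.

```python
def elastic_net_penalty(weights, lam, l1_ratio):
    """lam * (l1_ratio * sum(|w_i|) + (1 - l1_ratio) * sum(w_i ** 2))."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return lam * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


# Hypothetical toy weights: sum(|w|) = 2.5, sum(w ** 2) = 4.25.
weights = [0.5, -2.0, 0.0]

pure_l1 = elastic_net_penalty(weights, lam=0.1, l1_ratio=1.0)  # L1 penalty only
pure_l2 = elastic_net_penalty(weights, lam=0.1, l1_ratio=0.0)  # L2 penalty only
mixed = elastic_net_penalty(weights, lam=0.1, l1_ratio=0.5)    # half of each
print(pure_l1, pure_l2, mixed)
```

Setting `l1_ratio` to 1 or 0 recovers the pure L1 or pure L2 penalty, so the ratio becomes one more hyperparameter to tune alongside λ.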


6. Summary

L1 and L2 regularization are essential techniques in machine learning and deep learning for preventing overfitting and improving generalization.

- L1 **sets weights to 0 and removes unnecessary features (sparsity)**
- L2 **keeps weights small so the model generalizes smoothly**
- Choose the one that fits your goal, or use Elastic Net, which combines the two

If you want to build models that perform well, understanding regularization is a must!