Notice
Recent Posts
Recent Comments
Link
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | ||
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 |
Tags
- 롤 api dart
- valorant api dart
- docker overview
- dart new 키워드
- swift concurrency
- leetcode dart
- generate parentheses dart
- swift 동시성
- dart
- PlatformException(sign_in_failed
- flutter
- flutter statefulwidget
- widget
- lol api dart
- 파이썬 부동소수점
- keychain error
- flutter widget
- dart.dev
- dart new
- 파이썬
- Architectural overview
- flutter bloc
- tft api dart
- flutter android 폴더
- AnimationController
- com.google.GIDSignIn
- 발로란트 api dart
- riot api dart
- flutter ios 폴더
- 롤토체스 api dart
Archives
- Today
- Total
aspe
ML - Transformer 본문
Problems of Attention-Based Models
Because of forget gate, there is still long dependency broblem.
Attention of source sentence and target sentence itself are ignored
RNN is hard to parallelize.
Transformer
It has self attentions of encoder, decoder
Multi-Head Attention
Applying Softmax function to all Query, Key, Value at one time, distinct features are less prominent.
Apply Softmax function with small range several times.
Layer Normalization & Residual Connection
Normalize Layers across batches. -> Avoids Interval Covariate Shift(AIC)
Residual Connection -> Esay to know how input changed, Ensemble effect
Position Encoding
The multi-head attention network cannot naturally make use of the position of the words in the input sequence.
This problem makes same result with different sequence of sentence.
So Transformer has Positional Encoding
All the photos are from Professor Hak-soo Kim's lecture of Konkuk University's Department of Computer Engineering.
Comments