ML - Transformer
aspe
2021. 12. 14. 06:02
Problems of Attention-Based Models
Even with the forget gate (in LSTM-based models), the long-term dependency problem remains.
Attention within the source sentence itself and within the target sentence itself (self-attention) is not used.
RNNs are hard to parallelize.
Transformer
It applies self-attention in both the encoder and the decoder.
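A minimal sketch of the self-attention computation, assuming NumPy and small illustrative sizes; the helper names (`self_attention`, `softmax`) and weight shapes are assumptions for illustration, not the lecture's exact code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len): similarity of every pair of positions
    weights = softmax(scores, axis=-1)        # each position attends to all positions of the same sentence
    return weights @ V

# Example: 5 tokens, d_model = 8, d_k = 4 (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 4)
```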
Multi-Head Attention
If the softmax is applied to the full query, key, and value projections at once, distinct features become less prominent.
So the softmax is applied several times over smaller ranges, i.e. in multiple attention heads.
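A sketch of the multi-head idea under the same assumptions as above (NumPy, illustrative sizes): the projections are split into several heads, the softmax is applied per head over a smaller subspace, and the heads are concatenated and projected once more.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model); d_model % num_heads == 0."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    # Project once, then split the feature dimension into heads: (num_heads, seq_len, d_head)
    split = lambda M: M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                 # softmax over a smaller range, once per head
    heads = weights @ Vh                               # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo                                 # final linear projection

# Example: 5 tokens, d_model = 8, 2 heads (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (5, 8)
```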
Layer Normalization & Residual Connection
Layer normalization normalizes each position's features (rather than normalizing across the batch). -> Mitigates Internal Covariate Shift (ICS)
Residual Connection -> The sublayer only needs to learn how the input changes, and deep stacks behave like an ensemble of shallower paths.
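A sketch of the "Add & Norm" step under the same NumPy assumptions: a residual connection around a sublayer, followed by layer normalization over each position's feature vector (not over the batch). The gain, bias, and epsilon values are illustrative.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize each position's d_model features; x: (seq_len, d_model)."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return gamma * (x - mean) / (std + eps) + beta

def add_and_norm(x, sublayer, gamma, beta):
    """Residual connection: the sublayer only has to model the change to x."""
    return layer_norm(x + sublayer(x), gamma, beta)

# Example: wrap a toy sublayer (here just a linear map) with a residual connection
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
W = rng.normal(size=(8, 8))
gamma, beta = np.ones(8), np.zeros(8)
print(add_and_norm(x, lambda h: h @ W, gamma, beta).shape)  # (5, 8)
```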
Positional Encoding
The multi-head attention network cannot naturally make use of the position of the words in the input sequence.
Because of this, the same words in a different order would produce the same result.
So the Transformer adds a positional encoding to the input embeddings.
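A sketch of the sinusoidal positional encoding (assuming NumPy and an even d_model; the sizes are illustrative): sines on even feature indices, cosines on odd ones, with geometrically increasing wavelengths, added to the token embeddings so the model can distinguish word positions.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) table of sinusoidal position encodings (d_model assumed even)."""
    pos = np.arange(max_len)[:, None]                 # positions 0 .. max_len-1
    i = np.arange(0, d_model, 2)[None, :]             # even feature indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                      # odd dimensions: cosine
    return pe

# Added (not concatenated) to the token embeddings:
# X = token_embeddings + positional_encoding(seq_len, d_model)
print(positional_encoding(max_len=10, d_model=8).shape)  # (10, 8)
```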
All the figures are from Professor Hak-soo Kim's lecture at Konkuk University's Department of Computer Engineering.