Note: Please use the template below as a reference for your contribution, and adapt it as needed.
What are Transformers?
Transformers are a family of neural network architectures, introduced in the 2017 paper "Attention Is All You Need", that model sequences with self-attention instead of recurrence or convolution. Because every token can attend directly to every other token, Transformers parallelize well during training and capture long-range dependencies, which is why they have largely replaced RNNs and LSTMs for most language tasks.
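At the heart of the architecture is scaled dot-product attention: each token's output is a softmax-weighted sum of value vectors, with the weights computed from query-key similarity. The snippet below is a minimal NumPy sketch of that single operation; the function and variable names are illustrative, not taken from any particular library.

```python
# Minimal sketch of scaled dot-product attention using only NumPy.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    scores = scores - scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                                   # weighted sum of values

# Toy usage: self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)              # Q = K = V = x
print(out.shape)                                         # (4, 8)
```

In a full Transformer this operation runs in parallel across several heads, with learned linear projections producing the queries, keys, and values.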
Recommended Path for Learning
- This talk by Leo Dirac is a great starting point for those familiar with other neural language models. He gives a brief recap of the history of bag-of-words models, RNNs, and LSTMs before diving into what makes the Transformer architecture so special.
- The Illustrated Transformer: this blogpost provides a high-level overview of the Transformer architecture, highlighting the importance of features like positional encoding (see the sketch after this list).
- Attention Is All You Need notebook: if you want to get your hands dirty working with these models, this Google Colab notebook implements the original Transformer model from the "Attention Is All You Need" paper.
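Because self-attention is order-agnostic on its own, the original paper adds sinusoidal positional encodings to the token embeddings so the model can make use of word order. Below is a minimal NumPy sketch of that scheme; the function name and arguments are made up for illustration.

```python
# Minimal sketch of the sinusoidal positional encoding from "Attention Is All You Need".
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array that is added to the token embeddings."""
    positions = np.arange(seq_len)[:, None]    # (seq_len, 1)
    dims = np.arange(d_model)[None, :]         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates           # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions use cosine
    return pe

print(sinusoidal_positional_encoding(seq_len=10, d_model=16).shape)  # (10, 16)
```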
Further Learning
Video
- Video 1
- Video 2
Applied papers
- Paper 1
- Paper 2
Online tutorials
- Online tutorial 1
- Online tutorial 2
Theory papers
- Paper 1
- Paper 2