What are Transformers?

Transformers are a neural network architecture introduced in the 2017 paper Attention Is All You Need. Instead of processing tokens one at a time the way RNNs and LSTMs do, a Transformer uses self-attention: every token computes weighted combinations of all the other tokens in the sequence. This lets the model capture long-range dependencies directly and lets training parallelize across the sequence, and it is the backbone of modern language models such as BERT and GPT.
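
To make that description concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of every Transformer layer. The function and variable names are illustrative; a full implementation would add multiple heads, masking, and learned projections of the inputs into queries, keys, and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every token to every other token
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # weighted sum of the value vectors

# Toy example: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

The division by the square root of d_k keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with very small gradients.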

  • This talk by Leo Dirac is a great starting point for those familiar with other neural language models. He gives a brief recap of the history of bag-of-words, RNNs, and LSTMs before diving into what makes the Transformer architecture so special.
  • This blog post provides a high-level overview of the Transformer architecture, highlighting the importance of features like positional encoding (a minimal sketch of the sinusoidal encoding appears after this list).
    The Illustrated Transformer.

  • If you want to get your hands dirty working with these models, here’s a Google Colab notebook that implements the original Transformer model from the Attention Is All You Need paper.
    Attention Is All You Need notebook.
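
Because self-attention itself is order-agnostic, the original paper adds fixed sinusoidal positional encodings to the token embeddings so the model can tell positions apart, which is the feature called out in The Illustrated Transformer above. Below is a small NumPy sketch of those encodings; the function name is illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000**(2i/d_model)); PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])            # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])            # odd dimensions use cosine
    return pe

# The encodings are added to the token embeddings before the first layer.
pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```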

Further Learning

Video

  • Video 1
  • Video 2

Applied papers

  • Paper 1
  • Paper 2

Online tutorials

  • Online tutorial 1
  • Online tutorial 2

Theory papers

  • Paper 1
  • Paper 2