What are Transformers?

Transformers are a neural network architecture introduced in the 2017 paper Attention Is All You Need. Instead of processing tokens one at a time the way RNNs and LSTMs do, a Transformer uses self-attention: every token computes weighted combinations of all the other tokens in the sequence. This lets the model capture long-range dependencies directly and lets training parallelize across the sequence, and it is the backbone of modern language models such as BERT and GPT.
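
To make that description concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of every Transformer layer. The function and variable names are illustrative; a full implementation would add multiple heads, masking, and learned projections of the inputs into queries, keys, and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every token to every other token
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # weighted sum of the value vectors

# Toy example: 4 tokens with 8-dimensional queries, keys, and values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

The division by the square root of d_k keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with very small gradients.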

  • This talk by Leo Dirac is a great starting point for those familiar with other neural language models. He gives a brief recap of the history of bag-of-words, RNNs, and LSTMs before diving into what makes the Transformer architecture so special.
  • This blog post provides a high-level overview of the Transformer architecture, highlighting the importance of features like positional encoding (a minimal sketch of the sinusoidal encoding appears after this list).
    The Illustrated Transformer.

  • If you want to get your hands dirty working with these models, here’s a Google Colab notebook that implements the original Transformer model from the Attention Is All You Need paper.
    Attention Is All You Need notebook.
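
Because self-attention itself is order-agnostic, the original paper adds fixed sinusoidal positional encodings to the token embeddings so the model can tell positions apart, which is the feature called out in The Illustrated Transformer above. Below is a small NumPy sketch of those encodings; the function name is illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000**(2i/d_model)); PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                 # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])            # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])            # odd dimensions use cosine
    return pe

# The encodings are added to the token embeddings before the first layer.
pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
```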

Further Learning

Video

  • Video 1
  • Video 2

Applied papers

  • Paper 1
  • Paper 2

Online tutorials

  • Online tutorial 1
  • Online tutorial 2

Theory papers

  • Paper 1
  • Paper 2