Recurrent Neural Network
===== Recommended Reading =====
* [[https://www.dropbox.com/s/ouj8ddydc77tewo/ExtendedChapter6.pdf?dl=0|An extended version of Chapter 6 with RNN unfolding and bidirectional RNN]]
* [[http://karpathy.github.io/2015/05/21/rnn-effectiveness/|The Unreasonable Effectiveness of Recurrent Neural Networks]] by Andrej Karpathy (2015)
* [[https://towardsdatascience.com/recurrent-neural-networks-and-lstm-4b601dd822a5|Recurrent Neural Networks and LSTM]] by Niklas Donges (2018)
* [[http://colah.github.io/posts/2015-08-Understanding-LSTMs/|Understanding LSTM Networks]] by Christopher Olah (2015)
* [[http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/|Introduction to RNNs]] by Denny Britz (2015)
* [[http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/|Implementing a RNN with Python, Numpy and Theano]] by Denny Britz (2015)
* [[http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/|Backpropagation Through Time and Vanishing Gradients]] by Denny Britz (2015)
* [[http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/|Implementing a GRU/LSTM RNN with Python and Theano]] by Denny Britz (2015)
* [[https://arxiv.org/abs/1701.03452|Simplified Minimal Gated Unit Variations for Recurrent Neural Networks]] by Joel Heck and Fathi Salem (2017)
* [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), the paper that introduced the Transformer sequence-to-sequence model; see also an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]].
* [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), also known as the Chinchilla paper.
* [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka (2023).
* [[https://thegradient.pub/mamba-explained/|Mamba Explained]] by Kola Ayonrinde.