recurrent_neural_network

===== Recommended Reading =====

  * [[https://www.dropbox.com/s/ouj8ddydc77tewo/ExtendedChapter6.pdf?dl=0|An extended version of Chapter 6 with RNN unfolding and bidirectional RNN]] (a minimal recurrence sketch follows this list)
  * [[http://karpathy.github.io/2015/05/21/rnn-effectiveness/|The Unreasonable Effectiveness of Recurrent Neural Networks]] by Andrej Karpathy (2015)
  * [[http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/|Implementing a GRU/LSTM RNN with Python and Theano]] by Denny Britz (2015)
  * [[https://arxiv.org/abs/1701.03452|Simplified Minimal Gated Unit Variations for Recurrent Neural Networks]] by Joel Heck and Fathi Salem (2017)
  * [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), which introduced the Transformer, a state-of-the-art sequence-to-sequence model, plus an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]].
  * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), the Chinchilla paper.
  * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka.
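As a companion to the readings above, here is a minimal NumPy sketch of the plain (Elman-style) recurrence that the unfolding, bidirectional, and GRU/LSTM material builds on. The function name, weight shapes, and example values are illustrative assumptions, not code taken from any of the listed sources.

<code python>
# Minimal sketch of a plain RNN forward pass (illustrative assumptions only).
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, h0):
    """Run a simple RNN over a sequence.

    x_seq: (T, input_dim) array, one row per time step
    W_xh:  (input_dim, hidden_dim) input-to-hidden weights
    W_hh:  (hidden_dim, hidden_dim) recurrent weights
    b_h:   (hidden_dim,) bias
    h0:    (hidden_dim,) initial hidden state
    Returns the hidden state at every time step, shape (T, hidden_dim).
    """
    h = h0
    states = []
    for x_t in x_seq:  # unfolding the recurrence over time steps
        # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

# Tiny usage example with random weights; only the shapes matter here.
rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
H = rnn_forward(
    rng.normal(size=(T, d_in)),
    rng.normal(size=(d_in, d_h)) * 0.1,
    rng.normal(size=(d_h, d_h)) * 0.1,
    np.zeros(d_h),
    np.zeros(d_h),
)
print(H.shape)  # (5, 4)
</code>

The GRU and LSTM readings replace this single tanh update with gated updates that help the gradient flow over long sequences, and the bidirectional material runs one such recurrence forward and one backward over the same input.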