recurrent_neural_network

Recommended Reading

  * [[https://arxiv.org/abs/1701.03452|Simplified Minimal Gated Unit Variations for Recurrent Neural Networks]] by Joel Heck and Fathi Salem (2017) (a minimal sketch of the baseline MGU cell follows this list)
  * [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), a state-of-the-art sequence-to-sequence model, plus an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]].
  * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), the Chinchilla paper.
  * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka.
  * [[https://thegradient.pub/mamba-explained/|Mamba Explained]] by Kola Ayonrinde.
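
The Heck and Salem paper above studies simplified variants of the minimal gated unit (MGU), an RNN cell that uses a single forget gate instead of the two or three gates of GRU and LSTM cells. The following is a minimal NumPy sketch of the baseline MGU cell those variations start from; the function and variable names, shapes, and initialization are illustrative assumptions, not code taken from any of the listed papers.

<code python>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x_t, h_prev, params):
    """One step of a baseline minimal gated unit (MGU) cell:
        f_t     = sigmoid(W_f x_t + U_f h_prev + b_f)        (single forget gate)
        h_tilde = tanh(W_h x_t + U_h (f_t * h_prev) + b_h)   (candidate state)
        h_t     = (1 - f_t) * h_prev + f_t * h_tilde          (gated update)
    All names here are illustrative, not from the papers above."""
    W_f, U_f, b_f, W_h, U_h, b_h = params
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)
    h_tilde = np.tanh(W_h @ x_t + U_h @ (f_t * h_prev) + b_h)
    return (1.0 - f_t) * h_prev + f_t * h_tilde

# Illustrative usage: run a random input sequence through the cell.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 16, 5
params = (
    rng.normal(scale=0.1, size=(hidden_dim, input_dim)),   # W_f
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # U_f
    np.zeros(hidden_dim),                                  # b_f
    rng.normal(scale=0.1, size=(hidden_dim, input_dim)),   # W_h
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # U_h
    np.zeros(hidden_dim),                                  # b_h
)
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(seq_len, input_dim)):
    h = mgu_step(x_t, h, params)
</code>

The single forget gate f_t both decides how much of the previous state to keep and scales the recurrent input to the candidate state; the variations studied in the paper simplify the gate's equation further.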