==== Recommended Reading ====
  * [[https://arxiv.org/abs/1701.03452|Simplified Minimal Gated Unit Variations for Recurrent Neural Networks]] by Joel Heck and Fathi Salem (2017)
  * [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), a state-of-the-art sequence-to-sequence model, plus an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]].
  * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), the Chinchilla paper.
  * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka.
  * [[https://thegradient.pub/mamba-explained/|Mamba Explained]] by Kola Ayonrinde.