This shows you the differences between two versions of the page.
| recurrent_neural_network [2023/12/24 23:47] 135.23.195.80 [Recommended Reading] | recurrent_neural_network [2024/03/30 20:10] (current) burkov [Recommended Reading] |
|---|---|
| Line 19: | Line 19: |
| * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), (the Chinchilla paper). | * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), (the Chinchilla paper). |
| * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka. | * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka. |
| | + * [[https://thegradient.pub/mamba-explained/|Mamba Explained]] by Kola Ayonrinde. |