===== Recommended Reading =====
* [[https://arxiv.org/pdf/1411.2738v3.pdf|word2vec Parameter Learning Explained]] by Xin Rong (2016)
* [[http://rohanvarma.me/Word2Vec/|Language Models, Word2Vec, and Efficient Softmax Approximations]] by Rohan Varma (2017)
* [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), the paper that introduced the Transformer sequence-to-sequence architecture, along with an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated version of the paper with code]]; a minimal sketch of the paper's core attention operation follows this list.
* [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022), the Chinchilla paper.
* [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka (2023).
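
The core operation introduced in //Attention Is All You Need// is scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal single-head NumPy illustration of just that formula, with no masking, multi-head projections, or learned weights; the function and variable names are illustrative and not taken from any of the linked implementations.

<code python>
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # query-key similarity matrix
    weights = softmax(scores, axis=-1)  # each query's distribution over the keys
    return weights @ V                  # weighted sum of the value vectors

# Toy example: 3 query positions attending over 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
</code>

Each output row is a convex combination of the value vectors, with mixing weights determined by how well the corresponding query matches each key; the sqrt(d_k) scaling keeps the dot products from saturating the softmax as the dimensionality grows.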