===== Recommended Reading =====
  * [[https://arxiv.org/pdf/1411.2738v3.pdf|word2vec Parameter Learning Explained]] by Xin Rong (2016); a minimal skip-gram sketch follows this list
  * [[http://rohanvarma.me/Word2Vec/|Language Models, Word2Vec, and Efficient Softmax Approximations]] by Rohan Varma (2017)
  * [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), the Transformer, a state-of-the-art sequence-to-sequence model; see also an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]]
  * [[https://arxiv.org/abs/1810.04805|BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]] by Devlin et al. (2018)
  * [[https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf|Improving Language Understanding by Generative Pre-Training]] by Radford et al. (2018) (the GPT paper)
  * [[https://arxiv.org/abs/1907.11692|RoBERTa: A Robustly Optimized BERT Pretraining Approach]] by Liu et al. (2019)
  * [[https://arxiv.org/abs/1910.13461|BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension]] by Lewis et al. (2019)
  * [[https://arxiv.org/abs/2005.14165|Language Models are Few-Shot Learners]] by Brown et al. (2020) (the GPT-3 paper)
  * [[https://arxiv.org/abs/2205.11916|Large Language Models are Zero-Shot Reasoners]] by Kojima et al. (2022) (the zero-shot chain-of-thought (CoT) prompting paper)
  * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022) (the Chinchilla paper)
  * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka (2023)
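
The word2vec readings at the top of this list explain the skip-gram model and the negative-sampling approximation to the full softmax. The code below is a minimal sketch of that idea in plain NumPy, not an implementation from any of the papers above; the toy corpus, the hyperparameters, and all variable names are illustrative assumptions.

<code python>
import numpy as np

rng = np.random.default_rng(0)

# toy corpus and vocabulary (illustrative only)
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
ids = [word2id[w] for w in corpus]

V, D = len(vocab), 16              # vocabulary size, embedding dimension
window, negatives, lr = 2, 5, 0.05 # context window, negatives per pair, learning rate

W_in = rng.normal(0, 0.1, (V, D))    # target-word ("input") embeddings
W_out = rng.normal(0, 0.1, (V, D))   # context-word ("output") embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for pos, center in enumerate(ids):
        lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            # one observed (positive) pair plus a few random negative samples
            pairs = [(ids[ctx_pos], 1.0)]
            pairs += [(int(rng.integers(V)), 0.0) for _ in range(negatives)]
            v = W_in[center].copy()
            grad_v = np.zeros(D)
            for w, label in pairs:
                u = W_out[w]
                g = sigmoid(v @ u) - label   # gradient of the binary cross-entropy
                grad_v += g * u
                W_out[w] -= lr * g * v
            W_in[center] -= lr * grad_v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# words that co-occur in the toy corpus should end up more similar
print(cosine(W_in[word2id["quick"]], W_in[word2id["brown"]]))
</code>

A real implementation (the original word2vec C code, or Gensim's ''Word2Vec'') additionally subsamples frequent words, draws negatives from a unigram distribution raised to the 3/4 power, and avoids sampling the true context word as a negative; the sketch above keeps only the core gradient updates.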