  * [[http://rohanvarma.me/Word2Vec/|Language Models, Word2Vec, and Efficient Softmax Approximations]] by Rohan Varma (2017).
  * [[https://arxiv.org/abs/1706.03762|Attention Is All You Need]] by Vaswani et al. (2017), a state-of-the-art sequence-to-sequence model, plus an [[http://jalammar.github.io/illustrated-transformer/|illustrated guide]] and an [[http://nlp.seas.harvard.edu/annotated-transformer/|annotated paper with code]].
  * [[https://arxiv.org/abs/1810.04805|BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]] by Devlin et al. (2018).
  * [[https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf|Improving Language Understanding by Generative Pre-Training]] by Radford et al. (2018) (the GPT paper).
  * [[https://arxiv.org/abs/1907.11692|RoBERTa: A Robustly Optimized BERT Pretraining Approach]] by Liu et al. (2019).
  * [[https://arxiv.org/abs/1910.13461|BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension]] by Lewis et al. (2019).
  * [[https://arxiv.org/abs/2005.14165|Language Models are Few-Shot Learners]] by Brown et al. (2020) (the GPT-3 paper).
  * [[https://arxiv.org/abs/2205.11916|Large Language Models are Zero-Shot Reasoners]] by Kojima et al. (2022) (the zero-shot chain-of-thought (CoT) prompting paper).
  * [[https://arxiv.org/abs/2203.15556|Training Compute-Optimal Large Language Models]] by Hoffmann et al. (2022) (the Chinchilla paper).
  * [[https://sebastianraschka.com/blog/2023/llm-reading-list.html|Understanding Large Language Models]] by Sebastian Raschka.