The Hundred-Page Machine Learning Book
Self-Supervised Learning: Word Embeddings
Recommended Reading
word2vec Parameter Learning Explained
by Xin Rong (2016) (a minimal skip-gram sketch follows this reading list)
Language Models, Word2Vec, and Efficient Softmax Approximations
by Rohan Varma (2017)
Attention Is All You Need
by Vaswani et al. (2017), a state-of-the-art sequence-to-sequence model, plus an illustrated guide and an annotated paper with code.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
by Devlin et al. (2018)
Improving Language Understanding by Generative Pre-Training
by Radford et al. (2018) (the GPT paper)
RoBERTa: A Robustly Optimized BERT Pretraining Approach
by Liu et al. (2019)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
by Lewis et al. (2019)
Language Models are Few-Shot Learners
by Brown et al. (2020) (the GPT-3 paper)
Large Language Models are Zero-Shot Reasoners
by Kojima et al. (2022) (the zero-shot chain-of-thought (CoT) prompting paper).
Training Compute-Optimal Large Language Models
by Hoffmann et al. (2022) (the Chinchilla paper).
Understanding Large Language Models
by Sebastian Raschka.
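For readers who want to see the mechanics behind the first reading, below is a minimal sketch of skip-gram with negative sampling, the word2vec variant analyzed in Rong (2016). The toy corpus, embedding dimension, and hyperparameters are illustrative assumptions chosen for brevity, not values taken from any of the readings above.

```python
# Minimal sketch of skip-gram with negative sampling (plain numpy).
# Toy corpus and all hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 16                        # vocabulary size, embedding dimension
W_in = rng.normal(scale=0.1, size=(V, D))    # center-word ("input") embeddings
W_out = rng.normal(scale=0.1, size=(V, D))   # context-word ("output") embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

window, k, lr = 2, 5, 0.05                   # context window, negatives per pair, learning rate

for epoch in range(200):
    for pos, center in enumerate(corpus):
        c = word2id[center]
        for ctx_pos in range(max(0, pos - window), min(len(corpus), pos + window + 1)):
            if ctx_pos == pos:
                continue
            # One observed (center, context) pair plus k uniformly drawn negatives.
            # (Negatives may occasionally collide with the true context word;
            # ignored here for brevity.)
            targets = [word2id[corpus[ctx_pos]]] + list(rng.integers(0, V, size=k))
            labels = np.array([1.0] + [0.0] * k)
            vecs = W_out[targets]                            # (k+1, D)
            v_c = W_in[c].copy()
            grad = (sigmoid(vecs @ v_c) - labels)[:, None]   # logistic-loss gradient
            np.subtract.at(W_out, targets, lr * grad * v_c)  # handles repeated indices
            W_in[c] -= lr * (grad * vecs).sum(axis=0)

# Nearest neighbours by cosine similarity of the learned center-word vectors.
def most_similar(word, topn=3):
    v = W_in[word2id[word]]
    sims = W_in @ v / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[1:topn + 1]]

print(most_similar("quick"))
```

On such a tiny corpus the nearest neighbours are not meaningful, but the same update rule, run over a large corpus, is what produces the embeddings discussed in the readings above.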