The Hundred-Page Machine Learning Book
Self-Supervised Learning: Word Embeddings
Recommended Reading
word2vec Parameter Learning Explained
by Xin Rong (2016)
Language Models, Word2Vec, and Efficient Softmax Approximations
by Rohan Varma (2017)
Attention Is All You Need
by Vaswani et al. (2017), the state-of-the-art sequence-to-sequence model; see also an illustrated guide and an annotated paper with code.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
by Devlin et al. (2018)
Improving Language Understanding by Generative Pre-Training
by Radford et al. (2018) (the GPT paper)
Language Models are Few-Shot Learners
by Brown et al. (2020) (the GPT-3 paper)
Large Language Models are Zero-Shot Reasoners
by Kojima et al. (2022) (the zero-shot chain-of-thought (CoT) prompting paper).
Training Compute-Optimal Large Language Models
by Hoffmann et al. (2022) (the Chinchilla paper).
Understanding Large Language Models
by Sebastian Raschka.