
13.1 Word embeddings and language models

2 min read • July 25, 2024

Word embeddings revolutionized NLP by representing words as dense vectors. These compact representations capture semantic and syntactic relationships, enabling machines to understand language nuances. They're the foundation for many NLP tasks, from text classification to machine translation.

Language models take word embeddings further, predicting words based on context. RNNs and transformers are key architectures here. While RNNs excel at sequential data, transformers like BERT and GPT have become game-changers, powering advanced applications in text generation and understanding.

Word Embeddings

Word embeddings in NLP

  • Dense vector representations of words capture semantic and syntactic information
  • Convert words to numerical format for machine learning models by representing them in a continuous vector space
  • Low-dimensional vectors (typically 50-300 dimensions) learned from large text corpora
  • Capture word similarities and relationships, reduce dimensionality compared to one-hot encoding, and enable transfer learning in NLP tasks (see the exploration sketch after this list)
  • Used in text classification, named entity recognition, machine translation, and sentiment analysis
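A minimal sketch of exploring pre-trained embeddings with gensim; the library, its downloader, and the "glove-wiki-gigaword-50" checkpoint name are assumptions (one commonly hosted option), not something specified above:

```python
# Exploring pre-trained word embeddings with gensim's downloader
# (assumes gensim is installed; "glove-wiki-gigaword-50" is one of the
# publicly hosted 50-dimensional GloVe vector sets)
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")

# Nearest neighbors in vector space reflect semantic similarity
print(vectors.most_similar("computer", topn=3))

# Cosine similarity between two word vectors
print(vectors.similarity("cat", "dog"))

# Classic analogy task: king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```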

Implementation of word2vec and GloVe

  • Word2Vec model utilizes two main architectures: Continuous Bag of Words (CBOW) and Skip-gram (a training sketch follows this list)
  • Training process uses context words to predict target word (CBOW) or target word to predict context words (Skip-gram)
  • Negative sampling technique improves training efficiency
  • GloVe model is based on word co-occurrence statistics and minimizes the difference between the dot product of word vectors and the log of their co-occurrence probability (written out after this list)
  • Training process builds a co-occurrence matrix, then factorizes it to obtain word vectors
  • Evaluation methods include intrinsic evaluation (word similarity tasks, analogy tasks) and extrinsic evaluation (performance on downstream NLP tasks)
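For reference, the GloVe objective described above is a weighted least-squares loss over the co-occurrence matrix $X$ (notation follows the original GloVe paper; $f$ is the co-occurrence weighting function, $V$ the vocabulary size):

$$J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2$$

And a minimal Word2Vec training sketch, assuming gensim 4.x; the toy corpus and hyperparameter values are illustrative, not taken from the text above:

```python
# Training Word2Vec on a tiny illustrative corpus with gensim
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # dimensionality of the word vectors
    window=5,         # context window size
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    negative=5,       # negative samples drawn per positive pair
    min_count=1,      # keep rare words in this tiny corpus
    epochs=50,
)

print(model.wv.most_similar("cat", topn=2))
```

Setting `sg=0` switches to CBOW; the `negative` parameter controls the negative-sampling behavior mentioned above.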

Language Models

Language models with RNNs vs transformers

  • RNNs use hidden state to capture sequential information with input, output, and recurrent connections
  • RNN variants include Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)
  • RNN training process uses Backpropagation Through Time (BPTT) and Truncated BPTT for long sequences
  • Transformer architecture employs self-attention mechanism, multi-head attention, positional encoding, and feed-forward neural networks (a self-attention sketch follows this list)
  • Transformer training process typically involves large-scale pre-training followed by task-specific fine-tuning
  • Evaluation metrics include perplexity, BLEU (machine translation), and ROUGE (text summarization)
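A minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation in the transformer bullet above; the shapes and random projection matrices are purely illustrative:

```python
# Single-head scaled dot-product self-attention with NumPy
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings; W_*: projection matrices."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise attention scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # context-mixed representations

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Multi-head attention repeats this computation with several independent projection sets and concatenates the results.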

Applications of BERT and GPT

  • BERT uses bidirectional context understanding and is pre-trained with Masked Language Modeling (MLM) and Next Sentence Prediction (NSP); a usage sketch follows this list
  • GPT employs unidirectional (left-to-right) language modeling with generative capabilities
  • Transfer learning in NLP uses pre-trained models as feature extractors or fine-tunes them for specific tasks
  • Pre-trained models applied to text classification, named entity recognition, question answering, and text generation
  • Challenges include computational resources required, domain-specific adaptation, and ethical considerations in using large language models
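A minimal usage sketch with the Hugging Face transformers library (an assumption, not named in the text); the checkpoint names bert-base-uncased and gpt2 are common public choices:

```python
# Applying pre-trained BERT- and GPT-style models via Hugging Face pipelines
# (assumes the transformers library is installed and weights can be downloaded)
from transformers import pipeline

# BERT-style masked language modeling: predict the hidden token from both sides
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Word embeddings are the [MASK] of many NLP tasks.")[0])

# GPT-style left-to-right generation: continue a prompt token by token
generator = pipeline("text-generation", model="gpt2")
print(generator("Language models predict", max_new_tokens=20)[0]["generated_text"])
```

Fine-tuning these checkpoints on labeled data (e.g. for text classification or question answering) is the transfer-learning pattern described above.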
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

