Natural Language Processing (NLP) Fundamentals:
- Text Representation: NLP starts with representing text data in a format that computers can understand. This includes tokenization, stemming, lemmatization, and vectorization.
- Tokenization: Breaking down text into individual words or tokens.
- Stemming and Lemmatization: Reducing words to a base form. Stemming strips affixes heuristically (e.g., "running" becomes "run"), while lemmatization uses vocabulary and morphology to return the dictionary form (e.g., "better" becomes "good").
- Vectorization: Converting text data into numerical vectors for processing.
- Language Models: Statistical models that estimate the probability of a word given its surrounding or preceding context.
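Tokenization and stemming can be sketched in a few lines of plain Python. The regex tokenizer and suffix-stripping stemmer below are toy illustrations I am assuming for demonstration, not a real stemmer such as Porter's; note how naive suffix stripping turns "running" into "runn" rather than "run", which is exactly why production stemmers carry extra rules.

```python
import re

def tokenize(text):
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    # Toy suffix-stripping stemmer (illustrative only, not Porter's algorithm):
    # drop a common suffix if enough of the word remains.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The runner was running and ran.")
print(tokens)                     # ['the', 'runner', 'was', 'running', 'and', 'ran']
print([stem(t) for t in tokens])  # ['the', 'runner', 'was', 'runn', 'and', 'ran']
```

Because irregular forms like "ran" are untouched and "running" is over-truncated, lemmatization (which consults a vocabulary) is usually preferred when the dictionary form matters.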
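Vectorization can be illustrated with the simplest scheme, a bag-of-words count vector: build a vocabulary mapping each word to a column index, then represent each document as a vector of word counts. This is a minimal sketch using whitespace splitting, not a full pipeline.

```python
from collections import Counter

def build_vocab(docs):
    # Map each unique token (here: whitespace-split word) to a fixed column index.
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(doc, vocab):
    # Bag-of-words: count how often each vocabulary word appears in the document.
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in sorted(vocab, key=vocab.get)]

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)
print(sorted(vocab))              # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectorize(docs[1], vocab))  # [0, 1, 1, 1, 1, 2]
```

Libraries such as scikit-learn's `CountVectorizer` implement the same idea with tokenization, n-grams, and sparse storage built in.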
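The language-model bullet can be made concrete with a bigram model, the simplest statistical language model: estimate P(word | previous word) from pair counts in a corpus. The tiny corpus and the `<s>` start symbol below are assumptions for the sketch; real models add smoothing for unseen pairs.

```python
from collections import defaultdict, Counter

def train_bigrams(corpus):
    # Count adjacent word pairs, then normalize each row of counts
    # into a conditional probability distribution P(w2 | w1).
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for w1, w2 in zip(words, words[1:]):
            counts[w1][w2] += 1
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigrams(corpus)
print(model["the"])  # P(cat | the) = 2/3, P(dog | the) = 1/3
```

Modern neural language models replace these count-based estimates with learned functions of the context, but the underlying task (predicting the next word's probability) is the same.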