Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand and generate human-like text, based on patterns learned from vast amounts of data. These models, such as the GPT (Generative Pre-trained Transformer) series developed by OpenAI, are neural networks with millions or even billions of parameters. They are trained on diverse text gathered largely from the internet, including books, articles, and websites, to learn the intricacies of language and context.
Highly recommended guides with better visual explanations: “A Very Gentle Introduction to Large Language Models without the Hype” and “How Large Language Models work: From zero to ChatGPT” (both on Medium; last accessed 2/15/24).
[From “How Large Language Models work”]
Input Encoding: When you provide text input to an LLM, the text is first split into tokens (words or pieces of words), and each token is converted into a numerical representation that the neural network can process.
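As a rough illustration of the encoding step described above, the sketch below maps each whitespace-separated word to an integer ID. It is a simplification under stated assumptions: production LLMs use learned subword tokenizers (such as byte-pair encoding) and then map each ID to a vector, and the function names here are hypothetical.

```python
def build_vocab(corpus):
    """Assign each distinct lowercase word an integer ID; ID 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for word in corpus.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert text into a list of integer IDs, using 0 for words not in the vocabulary."""
    return [vocab.get(word, 0) for word in text.lower().split()]


def decode(ids, vocab):
    """Convert integer IDs back into words."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocab("The cat sat on the mat")
print(encode("the dog sat", vocab))  # [1, 0, 3]
print(decode([1, 2, 3], vocab))      # the cat sat
```

The unseen word “dog” falls back to the reserved unknown ID 0; subword tokenizers avoid most such gaps by breaking rare words into smaller known pieces.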
Model Architecture: LLMs typically use the Transformer architecture. Transformers employ self-attention mechanisms, which let the model weigh the significance of each word in the input text relative to the other words, capturing contextual information effectively.
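A minimal sketch of the self-attention idea, assuming a single attention head and, for simplicity, using the input vectors directly as queries, keys, and values (real Transformers apply learned projection matrices and use many heads and layers). Each word scores its similarity to every word, the scores become weights via a softmax, and the word's new representation is the weighted average. The vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys, and values are the input vectors themselves,
    with no learned projection matrices.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # How strongly this word attends to every word, including itself
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all the value vectors
        mixed = [sum(w * value[j] for w, value in zip(weights, vectors)) for j in range(d)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 2-dimensional vectors for a three-word input
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(words)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how one word distributes its attention; the first word attends equally to itself and to the third word, which shares its first component, and less to the second.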
Training Process: LLMs are pre-trained on large datasets using self-supervised (often called unsupervised) learning: no human labels are required, because the text itself supplies the correct answers. During pre-training, the model learns to predict the next word in a sequence given the previous words. This process enables the model to learn the structure and semantics of the language.
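The next-word objective can be sketched in a few lines. Under the assumptions below (hypothetical function names, a hand-written prediction instead of a real model), one sentence yields several (context, next word) training examples, and a cross-entropy loss scores how much probability the model gave the actual next word. Training adjusts the parameters to lower this loss averaged over the whole dataset.

```python
import math


def next_word_examples(tokens):
    """Turn one sequence into (context, target) pairs: every prefix predicts the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def cross_entropy(predicted, target):
    """Loss for one prediction: the negative log of the probability given to the actual next word."""
    return -math.log(predicted[target])


for context, target in next_word_examples(["the", "cat", "sat", "down"]):
    print(context, "->", target)

# A hypothetical prediction for the context ["the"]; lower loss means a better guess
print(cross_entropy({"cat": 0.5, "dog": 0.3, "mat": 0.2}, "cat"))
```

Because the targets come straight from the text, any large corpus can be turned into training data without manual labeling.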
Fine-Tuning: After pre-training, LLMs can be fine-tuned for specific tasks by training them further on labeled data. For instance, they can be fine-tuned for language translation, summarization, or sentiment analysis.
Output Generation: Once the model is trained, it can generate text from a given prompt, producing one word (or word fragment) at a time. The generated text is usually coherent and relevant to the context, resembling human-written language.
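Generation is a loop: predict a distribution over the next word, choose one, append it, and repeat. The sketch below uses greedy decoding (always pick the most probable word) with a tiny hard-coded table standing in for a trained model. The table and function names are hypothetical, and deployed systems usually sample from the distribution (for example with a temperature setting) rather than always taking the top word.

```python
def generate(prompt, next_word_probs, max_new_words=5):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = prompt.split()
    for _ in range(max_new_words):
        probs = next_word_probs(words)
        if not probs:  # the toy model has no prediction for this context
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# Hypothetical stand-in for a trained model: it looks only at the last word
TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}


def toy_next_word_probs(words):
    return TOY_MODEL.get(words[-1], {})


print(generate("the", toy_next_word_probs))  # the cat sat down
```

A real LLM conditions each prediction on the entire prompt plus everything generated so far, not just the last word.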
Size of the training dataset.
The models underlying these tools generate their next-word probabilities based on patterns found in billions of words within a collection of preselected texts known as the training data.
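The idea of next-word probabilities drawn from patterns in training data can be made concrete with the simplest possible model, which counts how often each word follows another (a bigram model). This is an illustrative simplification, not how GPT-style models work, and the function name is hypothetical. In the sample text, “the” is followed by “cat” twice and “mat” once, so the table gives “cat” a probability of 2/3 after “the”.

```python
from collections import Counter, defaultdict


def bigram_probabilities(training_text):
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    counts = defaultdict(Counter)
    words = training_text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    table = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {word: n / total for word, n in followers.items()}
    return table


table = bigram_probabilities("the cat sat on the mat the cat ran")
print(table["cat"])  # {'sat': 0.5, 'ran': 0.5}
```

A table like this only knows word pairs it has literally seen, which is one reason LLMs compute predictions with learned parameters instead.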
Complexity of the next-word determination.
Rather than simply looking up a probability table, large language models like OpenAI’s GPT-3.5 and GPT-4 first perform billions of calculations, using parameters learned from the training data, to transform the initial prompt into a prediction about what word might come next. Systems that compute predictions this way are known as “neural networks.” The models also take the semantic structure of the sentences into account in these calculations.
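In contrast to a lookup table, a neural network computes its probabilities. The toy sketch below shows only the final step common to such models: a vector representing the context is multiplied by learned weights to give one score per vocabulary word, and a softmax turns the scores into probabilities. All numbers are hypothetical stand-ins for learned parameters; a real model computes the context vector with many Transformer layers over the whole prompt and has billions of parameters.

```python
import math

VOCAB = ["cat", "dog", "sat"]

# Hypothetical "learned" parameters; a real model learns billions of these from training data
CONTEXT_VECTORS = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}  # stands in for the network's view of the prompt
OUTPUT_WEIGHTS = [[2.0, 0.0], [1.0, 0.0], [0.0, 3.0]]    # one row of weights per vocabulary word


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(last_word):
    """Compute next-word probabilities from parameters instead of looking them up."""
    h = CONTEXT_VECTORS[last_word]
    scores = [sum(w * x for w, x in zip(row, h)) for row in OUTPUT_WEIGHTS]  # one score per word
    return dict(zip(VOCAB, softmax(scores)))


dist = next_word_distribution("the")
print(max(dist, key=dist.get))  # cat
```

Changing any weight shifts every probability at once, which is how training on billions of words gradually tunes the model's predictions.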
Reliance on humans for training.
Without proper guardrails in place, large language models tend to generate toxic content. To make these tools give polite and safe responses, developers put them through a process called “reinforcement learning from human feedback” (RLHF). This process requires human workers to manually review AI-generated content and provide feedback, which is then used to further refine the model. Notably, many of these multibillion-dollar, US-based tech companies use international contract workers, who are exposed to traumatizing content for low pay and with minimal mental health support. (See Time’s article “OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic.”)
[From the AI Pedagogy Project @ metaLAB @ Harvard]