[Slides]
- Introduction and motivation
- Class structure and logistics
- Language modeling overview
- Definitions
- A brief history of LMs
- LLM fundamentals
- Overview of ways to train, tune, and prompt LLMs
- Fine-tuning, zero-shot prompts, few-shot prompts, chain-of-thought prompts
- Prompt tuning
- Examples of LLM prompts
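The prompting styles above differ only in what text precedes the model's completion. A minimal sketch in plain Python (string templates only, no API calls; the arithmetic task and wording are illustrative assumptions, not from the course materials):

```python
# Three prompt formats: zero-shot, few-shot, and chain-of-thought.
# Each function returns the text a model would be asked to complete.

def zero_shot(question: str) -> str:
    # Zero-shot: the model sees only the new question, no examples.
    return f"Q: {question}\nA:"

def few_shot(examples: list[tuple[str, str]], question: str) -> str:
    # Few-shot: a handful of solved examples precede the new question.
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{demos}\nQ: {question}\nA:"

def chain_of_thought(examples: list[tuple[str, str, str]], question: str) -> str:
    # Chain-of-thought: each demonstration spells out intermediate
    # reasoning before the final answer, encouraging the model to
    # generate step-by-step reasoning for the new question.
    demos = "\n".join(
        f"Q: {q}\nA: {steps} The answer is {a}." for q, steps, a in examples
    )
    return f"{demos}\nQ: {question}\nA:"

prompt = chain_of_thought(
    [(
        "Roger has 5 balls and buys 2 cans of 3 balls each. How many balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.",
        "11",
    )],
    "A cafeteria had 23 apples, used 20, and bought 6 more. How many apples are left?",
)
print(prompt)
```

Fine-tuning and prompt tuning, by contrast, change model weights or learned soft-prompt embeddings rather than the prompt text, so they have no analogue in a string-templating sketch.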
Part 2: Get your hands dirty with ChatGPT
[Slides]
Create a prompting task in groups.
[Slides]
How can we best evaluate these models for accuracy, fairness, bias, robustness, and other factors?
Speaker: Rishi Bommasani (Stanford)
Title: Holistically Evaluating Language Models on the Path to Evaluating Foundation Models
Part 2: LLMs in Applications
[Slides]
People are increasingly interacting with human-facing tools that
incorporate LLMs, like ChatGPT, writing assistants, and character
generators. How might we go about evaluating these systems and their
impacts on people? In this session we will consider 10 recent commercial and research applications of LLMs.
Students will be asked to come prepared to critique the designs of one of these
applications along different dimensions that we will describe in week 1.
Recommended Readings:
- On the Opportunities and Risks of Foundation Models
- Discovering Language Model Behaviors with Model-Written Evaluations
- All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text
- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList
- How to do human evaluation: A brief introduction to user studies in NLP
- Dynabench: Rethinking Benchmarking in NLP
- Word Embeddings Quantify 100 years of Gender and Ethnic Stereotypes
[Slides]
Speaker: Michiel Bakker (DeepMind)
Title: Fine-tuning Language Models to Find Agreement among Humans with Diverse Preferences
Part 2: Project Pitch
[Slides]
Students present their project idea and form teams.
[Slides]
This talk will cover broad intuitions about how large language models work. First, we will examine examples of what language models can learn by reading the internet. Second, we will consider why language models have gained traction recently and what new abilities they have that earlier models lacked. Third, we will cover how language models can perform complex reasoning tasks. Finally, the talk will discuss how instruction following gives language models an improved user interface.
Speaker: Jason Wei (OpenAI)
Title: Emergence in Large Language Models
Part 2: NLP Evaluation Methods and Red Teaming
[Slides]
Recommended Readings:
- Emergent Abilities of Large Language Models
- Chain of Thought Prompting Elicits Reasoning in Large Language Models
- Scaling Instruction-Finetuned Language Models
[Slides]
Speaker: Mina Lee (Stanford)
Title: Designing and Evaluating Language Models for Human Interaction
Part 2: Human Experiments and Evaluation Methods
[Slides]
Recommended Readings:
- Evaluating Human-Language Model Interaction
- CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities
Speaker: Ziv Epstein (PhD Student at MIT Media Lab, Human Dynamics)
Title: Social Science Methods for Understanding Generative AI