Inference
Inference is what happens every time you use an AI: you send a prompt, and the trained model generates a response. Unlike training, each inference request is fast and relatively cheap. It's the everyday use of a model that was built once but can be used millions of times. When you pay for an AI API or subscription, you're paying for inference.
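To make the "built once, used many times" idea concrete, here is a minimal toy sketch in Python. It is not how a real AI model works internally: real models are neural networks that sample from probabilities, while this one is a simple lookup table of which word most often follows which. The function names `train` and `infer` and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train(corpus: str) -> dict[str, str]:
    """Training phase (done once): count which word follows which,
    then freeze the most frequent follower for each word."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # The toy "model" is just this frozen lookup table.
    return {word: followers.most_common(1)[0][0]
            for word, followers in counts.items()}


def infer(model: dict[str, str], prompt: str, max_new_tokens: int = 5) -> str:
    """Inference phase (done on every request): extend the prompt one
    word at a time using the frozen model. Nothing is learned here."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        next_token = model.get(tokens[-1])
        if next_token is None:  # the model has never seen this word followed by anything
            break
        tokens.append(next_token)
    return " ".join(tokens)


# Train once (hypothetical corpus)...
model = train("the cat sat on the mat and the cat sat on the rug")

# ...then run inference as many times as you like.
print(infer(model, "the", max_new_tokens=3))  # the cat sat on
print(infer(model, "mat", max_new_tokens=2))  # mat and the
```

The expensive step, `train`, runs a single time. Every call to `infer` only reads the finished table, which is why each request is quick by comparison.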
Videos explaining this concept
E006 Notes on AI
Training vs Using a Model
Training and inference are the two distinct phases of an AI model's lifecycle.
E008 Notes on AI
Is AI a Student or an Actor
AI outputs are completions — continuations of patterns, not fact-checked answers retrieved from a database. Like an improv actor following the "yes, and" rule, the model streams tokens to maintain the flow and never stops the scene. Modern models are trained to play the role of a truthful, helpful assistant, but the underlying mechanism is the same: predicting what comes next — and the model is performing truthfulness, not accessing it.
E010 Notes on AI
The 5-Sentence Mental Model of GenAI
This episode provides a checkpoint after the foundational episodes, compressing the key concepts into five memorable sentences that serve as a mental compass for AI.