What Makes an AI Product Different
"Traditional software is a vending machine. AI software is a jazz musician — capable, but unpredictable."
Regular software is deterministic: the same input always produces the same output. You can write a unit test for every function. AI changes this fundamentally. The same prompt can produce different outputs each time. There's no single "correct" answer to test against. Building AI products requires a completely different philosophy: you're engineering for distributions of outputs, not fixed behaviors.
Evaluations (Evals)
"An AI feature without evals is a car without a speedometer — you're moving but you have no idea how fast or safely."
Evals are the AI equivalent of unit tests. You define a set of inputs and what "good" output looks like, then run them automatically to measure model performance. Evals can be rule-based (does the output contain X?), model-graded (have another AI judge the quality), or human-evaluated. Running evals before and after every model or prompt change is non-negotiable for production AI.
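The eval flow described above can be sketched as a tiny rule-based harness. `fake_model` and the substring checks are illustrative stand-ins for a real LLM call and real grading rules:

```python
# Minimal rule-based eval harness (a sketch, not a production tool).

def fake_model(prompt):
    # Stand-in for an actual model API call.
    return "Paris is the capital of France."

EVAL_CASES = [
    # (input prompt, rule: substring the output must contain)
    ("What is the capital of France?", "Paris"),
    ("What is the capital of France?", "France"),
]

def run_evals(model, cases):
    """Return the fraction of cases whose output passes its rule."""
    passed = sum(1 for prompt, must_contain in cases
                 if must_contain in model(prompt))
    return passed / len(cases)

score = run_evals(fake_model, EVAL_CASES)
print(f"eval pass rate: {score:.0%}")
```

In practice you would run this same harness before and after every prompt or model change and compare the pass rates, swapping in model-graded or human-graded checks where simple rules are not enough.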
Deterministic vs Probabilistic
"Temperature is the AI's creativity dial — at 0 it's a calculator, at 1 it's an improv comedian."
Temperature is the key parameter controlling AI output variability. At temperature 0, the model always picks the most probable next token — deterministic, consistent, great for factual tasks. At temperature 1, it samples from the full probability distribution — creative, varied, but less predictable. Most production systems use 0.2–0.5 to balance consistency with naturalness.
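The effect of temperature can be seen in a toy sampler: dividing the logits by the temperature before the softmax sharpens or flattens the distribution, and temperature 0 collapses to picking the single most probable token. The logits here are made-up numbers for illustration:

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample a token index from logits scaled by temperature.
    At temperature 0 this is argmax: fully deterministic."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, 0))    # always the top token
print(sample_with_temperature(logits, 1.0))  # varies between runs
```

Lower temperatures concentrate probability on the top token, which is why the 0.2–0.5 range gives mostly consistent but not robotic output.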
RAG — Retrieval-Augmented Generation
"RAG is how you give AI a memory it can actually trust — connected to your real data, not its training."
RAG (Retrieval-Augmented Generation) is the standard architecture for AI products that need to answer questions about specific knowledge. The process: (1) Convert documents to embeddings and store them in a vector database. (2) When a user asks a question, retrieve the most relevant document chunks. (3) Inject those chunks into the LLM prompt as context. (4) The LLM answers using the retrieved context. This reduces hallucination and keeps answers current.
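The four steps above can be sketched end to end with toy substitutes: a bag-of-words counter stands in for a real embedding model, and a sorted list stands in for a vector database. The documents and query are invented for illustration:

```python
# Toy RAG pipeline: embed -> retrieve -> build prompt with context.
import math
import re
from collections import Counter

DOCS = [
    "The refund policy allows returns within 30 days.",
    "Shipping takes 5 to 7 business days.",
    "Support is available by email 24/7.",
]

def embed(text):
    """Stand-in 'embedding': a bag-of-words token count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the refund policy?", DOCS))
```

A production version replaces `embed` with a real embedding model and `retrieve` with a vector database query, but the data flow is the same.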
Vector Databases and Embeddings
"A vector database doesn't store your data — it stores the meaning of your data."
Embedding models convert any text into a high-dimensional numerical vector (an embedding) that encodes semantic meaning. Similar texts get similar vectors. A vector database stores millions of these embeddings and can instantly find the ones most semantically similar to a query — powering RAG, recommendations, and semantic search. Popular options: Pinecone, Weaviate, Chroma, pgvector.
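"Most semantically similar" usually means highest cosine similarity between vectors. A sketch of that lookup, with made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
# Nearest-neighbor lookup over a toy in-memory "vector store".
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Pretend store: text -> embedding. Vectors are invented for illustration;
# note "cat" and "kitten" point in nearly the same direction.
STORE = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.2, 0.05],
    "spreadsheet": [0.0, 0.1, 0.95],
}

def nearest(query_vec, store, k=1):
    """Return the k stored texts whose vectors are most similar to the query."""
    return sorted(store, key=lambda t: cosine(query_vec, store[t]),
                  reverse=True)[:k]

print(nearest([0.9, 0.1, 0.0], STORE))  # → ['cat']
```

A real vector database does the same ranking with approximate nearest-neighbor indexes so it stays fast at millions of vectors.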
AI Agents
"An agent isn't a smarter chatbot — it's an AI that can take actions in the world and adapt based on results."
AI agents use the agent loop: Perceive (observe the environment), Reason (plan what to do), Act (take an action with a tool), Observe (check the result), and repeat. Unlike single-turn LLM calls, agents persist across multiple steps and can use tools — browsing the web, running code, querying databases, sending emails. This makes them capable of completing complex, multi-step tasks autonomously.
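The perceive–reason–act–observe loop can be shown with a deliberately trivial agent. The goal, the single `tool_add` tool, and the "reasoning" are all toy stand-ins for an LLM-driven planner and real tools:

```python
# Minimal agent-loop sketch: perceive -> reason -> act -> observe, repeated.

def tool_add(a, b):
    """A 'tool' the agent can call (stand-in for web search, code, etc.)."""
    return a + b

def run_agent(goal_total, max_steps=10):
    """Toy agent whose goal is to reach a target total by adding 1 per step."""
    state = 0      # Perceive: the agent's view of the environment
    history = []
    for _ in range(max_steps):
        # Reason: is the goal met? (a real agent would plan with an LLM here)
        if state >= goal_total:
            break
        # Act: invoke a tool
        state = tool_add(state, 1)
        # Observe: record the result, then loop again
        history.append(state)
    return state, history

final, steps = run_agent(3)
print(final, steps)  # 3 [1, 2, 3]
```

The key structural point survives the toy example: the agent persists state across steps and decides each next action based on what the last action actually produced, with `max_steps` as a safety cap.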
Reliability and Observability
"You can't improve what you can't measure — and in AI products, you need to measure constantly."
Production AI systems fail in new ways: they hallucinate, produce inconsistent outputs, get slower under load, and degrade as the underlying model changes. Observability means logging every AI interaction, monitoring latency and error rates, tracking output quality metrics over time, and setting up alerts for anomalies. Tools like LangSmith, Arize, and Weights & Biases help teams see what's happening inside their AI pipelines.
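The "log every AI interaction" part can be sketched as a wrapper around the model call that records latency, errors, and output size. `fake_model` and the logged fields are illustrative; real tools like LangSmith provide richer tracing:

```python
# Sketch of observability instrumentation for model calls.
import time

LOGS = []

def observed(model):
    """Decorator: record latency, errors, and output size for every call."""
    def wrapper(prompt):
        start = time.perf_counter()
        try:
            output = model(prompt)
            error = None
        except Exception as exc:
            output, error = None, repr(exc)
        LOGS.append({
            "prompt": prompt,
            "latency_s": time.perf_counter() - start,
            "error": error,
            "output_chars": len(output) if output else 0,
        })
        if error:
            raise RuntimeError(error)
        return output
    return wrapper

@observed
def fake_model(prompt):
    return "ok: " + prompt

fake_model("hello")
print(LOGS[-1])
```

With every call logged, dashboards and alerts for latency spikes, error-rate jumps, or shrinking outputs become simple queries over the log store.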
Designing for Failure
"AI products that fail gracefully feel trustworthy. AI products that fail silently destroy trust forever."
Great AI product design anticipates failure. For hallucinations: show confidence scores, add citations, or surface an "I'm not sure about this" disclaimer. For timeouts: show a loading state and offer a retry. For off-topic responses: redirect gracefully. The best AI products are designed on the assumption that the AI will sometimes be wrong, and the UX handles that case with care.
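Two of those patterns — retry on timeout and an honest low-confidence fallback — can be sketched in a few lines. The model functions, confidence scores, and threshold are all made-up for illustration:

```python
# Sketch of failure-aware response handling: retry on timeout,
# hedge on low confidence, apologize when all retries fail.

def answer_with_fallback(model, prompt, retries=2, min_confidence=0.7):
    for _ in range(retries + 1):
        try:
            text, confidence = model(prompt)
        except TimeoutError:
            continue  # retry (a real UI would show a loading state here)
        if confidence < min_confidence:
            return f"I'm not sure about this, but: {text}"
        return text
    return "Sorry, that took too long. Please try again."

def confident_model(prompt):
    return ("Paris", 0.95)   # stand-in: high-confidence answer

def unsure_model(prompt):
    return ("Lyon", 0.3)     # stand-in: low-confidence answer

print(answer_with_fallback(confident_model, "Capital of France?"))
print(answer_with_fallback(unsure_model, "Capital of France?"))
```

The design choice worth noting: every failure path ends in an explicit message to the user, never a silent wrong answer.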
The AI PM / Designer Role
"The best AI builders are the ones who understand the technology deeply enough to know where it breaks."
Working on AI products requires a unique blend of skills. PMs and designers need to understand context windows and their limits, how to write and evaluate prompts, basic eval methodology, the difference between deterministic and probabilistic systems, and ethical implications of their design choices. The most valuable skill: knowing when not to use AI, and when to add human oversight.
Real-World Case Studies
"Every AI product success story is really a story about good evals, good data, and the courage to ship incrementally."
Netflix uses AI for personalized thumbnail selection — the same movie shows different artwork to different users based on their viewing patterns. Spotify's "Discover Weekly" uses collaborative filtering plus LLMs for playlist narration. Linear uses AI to auto-label issues and suggest duplicates. What these share: well-defined evals, A/B testing, and AI augmenting human workflows rather than replacing them.
You've finished AI Products!
You now understand evals, RAG, agents, observability, and how to design AI products that users can trust.
Continue: Claude Tips →