Hot take: every time I talk about RAG, I talk about vector databases.
Every. Single. Time.
A guy I follow, Owain Lewis, wrote about 6 different types of RAG recently, and it hit me. He's right. RAG just means retrieval augmented generation. The retrieval part is the variable. I've been talking like there's only one way to do it, and that's not the full picture.
So let me fix that.
RAG is simply: give the LLM information it didn't have before.
Here are three real ways that works in production, and I've actually built all three:
1. Vector databases (the one I always talk about)
Embed your documents, store the vectors, find semantically similar chunks at query time, pass them to the model. Most popular choice in production. Both companies where I've built RAG systems from the ground up used this. It works really well for unstructured text and it scales.
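The whole loop fits in a few lines. Here's a toy sketch of that embed-store-search flow, with a deliberately dumb bag-of-words "embedding" standing in for a real embedding model (in production you'd call something like OpenAI's embeddings API or sentence-transformers, and the store would be an actual vector database):

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in: bag-of-words counts. A real system uses a learned
    # embedding model so "returns" and "refund" land near each other.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Refund policy: customers can return items within 30 days.",
    "Shipping: orders arrive in 3-5 business days.",
    "Careers: we are hiring backend engineers.",
]
# "Store the vectors": embed once, keep the doc alongside its vector.
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # At query time: embed the question, rank stored docs by similarity.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# The top chunks get pasted into the LLM prompt as context.
print(retrieve("can customers return items"))
```

Swap the toy `embed` for a real model and the list for a vector store, and that's structurally the whole thing. The pitfalls I keep teaching (chunking, embedding choice, similarity thresholds) all live inside those two functions.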
2. SQL as retrieval
Genuinely underrated. Give a good model enough context about your schema - tables, joins, relationships - and it can construct a SQL query from plain English. "All users in the US with over 500 orders" becomes a real query. Model writes it, you run it, results go into context. That's RAG.
Zero vectors.
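A minimal sketch of that loop, using an in-memory SQLite table. The one thing I'm faking here is the LLM call itself: `generated_sql` is hardcoded to what a model might plausibly write back when given the schema plus the plain-English request.

```python
import sqlite3

# Demo schema. In production you'd put a description of your real
# tables, joins, and relationships into the model's prompt.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, country TEXT, orders INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "US", 620), (2, "US", 40), (3, "DE", 900)],
)

# "All users in the US with over 500 orders" -> the model writes this.
# (Hardcoded here; in a real system it comes back from the LLM call.)
generated_sql = "SELECT id FROM users WHERE country = 'US' AND orders > 500"

# You run it, and the rows get serialized into context for the model.
rows = conn.execute(generated_sql).fetchall()
print(rows)  # [(1,)]
```

One practical note: since the model writes the query and you execute it, run it against a read-only connection or a replica. A hallucinated `DELETE` is a bad day.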
3. The naive document approach (less dumb than it sounds)
Here's a simple one: at a previous company we were classifying influencers and deciding whether someone's bio and content fit our criteria.
The rules were constantly changing. The business team had all these stipulations in their heads: don't use influencers selling products, skip the hair tutorial accounts or boating accounts... seriously.
We could've tried to bake all that into a system prompt and redeploy every time something changed. Instead we had them keep a running Notion doc of their ever-evolving definitions, edge cases and weird rules.
At classification time, we'd read that doc and feed it into context alongside the influencer's bio.
That's it. No vectors. No database. Just a Notion doc as the retrieval layer. And it worked because the right people could update it without touching the codebase.
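The "implementation" is almost embarrassing to show, but here it is as a sketch. The rules doc is an inline string here; in our actual setup it was fetched fresh from Notion's API at classification time (names below are illustrative, not our real code):

```python
# The retrieval layer: a living document the business team edits.
# In production this string came from a Notion API call per request,
# so rule changes took effect without a deploy.
rules_doc = """\
- Skip influencers selling their own products.
- Skip hair tutorial accounts.
- Skip boating accounts.
"""

def build_prompt(bio):
    # "Retrieval" is trivial: the whole doc goes into context.
    return (
        "Decide if this influencer fits our criteria.\n"
        f"Current rules:\n{rules_doc}\n"
        f"Influencer bio:\n{bio}\n"
        "Answer fit or no-fit, with a reason."
    )

prompt = build_prompt("I post daily hair tutorials and curl routines.")
# `prompt` is what gets sent to the LLM; the latest rules travel
# with every single request.
```

This only works while the doc fits comfortably in the context window. If it ever outgrew that, we'd be back to chunking and searching it, which is approach number one.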
So why do I keep defaulting to vector databases?
Because it's the one that actually needs teaching.
SQL you probably already know. Notion docs you definitely already know. But vector databases? Weird mental model. Non-obvious pitfalls. Chunking strategies, embedding models, similarity thresholds. None of this maps to anything most developers have touched before. That's why it gets the most airtime. It's not that the other approaches don't matter. It's that they're much easier to figure out once you already get the core concept.
And the core concept is simple: pass the model relevant context it wouldn't otherwise have. Once that clicks, the implementation is just details.
I'm heading to Vegas this week.
This Sunday I'll be celebrating 12 years of sobriety. You might be wondering what a former addict does in Vegas and the answer is I gain about 10 pounds and read terrible mystery novels.
We're also dragging the family to Death Valley, the hottest place on earth. Gonna do 100 burpees in the desert in a sweatshirt to lose that 10 pounds. Will report back on how that goes.
When I get back, we're starting the next cohort.
It's small. A few spots still open. We will be really going off the rails in this cohort: I'm adding multi-modal RAG, evals and hybrid search. We're building everything live as a class - breaking things, fixing things, understanding why they broke. Real engineering.
Let's not beat around the bush. Price is $1,997. We meet next Saturday at 11am PST to kick off and then 3 more Saturdays after that. Office hours, guest speakers. But wait! There's more...
$97 to hold your spot and we can work out the remaining balance in 2 or 3 payments. Money back no questions asked if you don't find it valuable.