RAG Deep Dive Series: Chunking Strategies

Part 3: Chunking — The Foundation of Retrieval Quality

In Post 2, we built your mental model of RAG architecture. You learned the 5 core components, the two-phase split, and the complete flow from question to answer.

But there's something we glossed over. Something that seems simple on the surface but actually determines whether your RAG system gives brilliant answers or total garbage.

Chunking.

Remember that sick leave policy example from Post 2? The one that worked perfectly? Here's what we didn't tell you: it only worked because the chunks happened to be clean. Split the same document naively every 100 characters and Chunk 1 ends mid-word, while the medical certificate rule lands in a different chunk entirely.

Even worse, if the user asks a different question:

User asks: "Do I need a doctor's note for sick leave?"

System retrieves: Chunk 1 (mentions "sick leave")

LLM sees: "Employees are entitled to 15 days of sick leav"

LLM responds: "Based on the policy, employees are entitled to 15 days of sick leave. I don't see information about doctor's notes in the provided context."

What went wrong:

  • The information about medical certificates IS in the document (Chunk 3)
  • But bad chunking split it into a different chunk
  • That chunk didn't get retrieved
  • So the LLM gives an incomplete answer without knowing it's incomplete

Same document. Same retrieval system. Different chunking. Completely broken.

This isn't a hypothetical. This is what happens when you naively split text every N characters without thinking about boundaries.

Here's the Thing

Chunking feels like a boring preprocessing step. It's not sexy like embeddings or vector databases. It doesn't have the AI magic of LLMs generating answers.

But chunking is where most RAG systems live or die.

You can have the best embedding model, the fastest vector database, and the most advanced LLM. But if your chunks are broken, you're retrieving garbage. And if you're retrieving garbage? The LLM can't save you.

Remember this from Post 2:

Chunking is the root cause of most retrieval problems. Get it right, and you solve 70% of your RAG issues before they even happen.

What You'll Learn

In this post, we're diving deep into chunking strategies, with clear explanations of how each strategy works and when to use it.

You'll understand:

  • The 4 main chunking strategies (fixed-size, recursive, semantic, agentic)
  • How to choose the right chunk size for your documents
  • When overlap helps and when it just creates noise
  • How to handle tables, code blocks, and special content
  • The decision framework for picking your strategy
  • Common mistakes that break retrieval

Same promise as always: starting from zero, with the concepts, analogies, and mental models you need to make smart decisions.

By the end of this post, you'll understand why chunking matters more than almost any other decision in your RAG pipeline and how to get it right.

Let's start with the fundamentals.


Table of Contents

  1. What Is Chunking?
  2. The Chunking Problem
  3. The Four Chunking Strategies
    • Strategy 1: Fixed-Size Chunking
    • Strategy 2: Recursive Chunking
    • Strategy 3: Semantic Chunking
    • Strategy 4: Agentic Chunking
  4. Chunk Size: The Goldilocks Problem
  5. Overlap: The Safety Net
  6. Beyond the Basics
  7. The Decision Framework
  8. Key Takeaways
  9. What's Next

What Is Chunking?

Chunking = splitting your documents into smaller pieces before indexing them.

That's it. That's the concept.

But here's why it matters: embedding models and LLMs have token limits.

From Post 2, you know that:

  • The embedding model converts text to vectors
  • The vector database stores these vectors
  • The LLM reads retrieved chunks to generate answers

The problem: You can't just throw an entire 100-page policy manual at an embedding model. It has limits:

Model | Max Tokens
text-embedding-3-small (OpenAI) | 8,191 tokens (~6,000 words)
text-embedding-004 (Google) | 2,048 tokens (~1,500 words)
embed-v3 (Cohere) | 512 tokens (~380 words)

So you have to split. That's chunking.

The Database Analogy

Think of chunking like designing a database schema.

Database Design | Chunking
Defines how data is stored | Defines how knowledge is stored
Affects every query's performance | Affects every retrieval's quality
Hard to change after launch | Hard to re-index thousands of documents
Poor design → slow queries | Poor chunks → wrong answers
Get it right early → smooth sailing | Get it right early → accurate retrieval

Just like you wouldn't design a database without thinking about how you'll query it, you shouldn't chunk documents without thinking about what questions users will ask.


The Chunking Problem

Before we dive into strategies, let's understand what makes chunking hard.

The Three Chunking Goals (That Conflict)

Every chunk should be:

  • Complete: contains enough context to be understood on its own
  • Focused: covers a single topic, so its embedding is precise
  • Right-sized: fits within your embedding model's token limit

The problem: These goals conflict.

Example: A multi-step policy might only be complete at 1,200 tokens, which blows past a 512-token model limit. Split it to fit, and each piece is focused but no longer complete.

You can't satisfy all three perfectly.

Chunking is about trade-offs.

What Happens When You Get It Wrong

Let's see the failure modes:

❌ Chunks Too Small: Each chunk is a fragment ("Manager approval required.") with no surrounding context, so neither the retriever nor the LLM can tell which process it belongs to.

❌ Chunks Too Large: The one relevant sentence is buried among unrelated topics; the diluted embedding means the right chunk may not even be retrieved.

❌ Bad Boundaries (Split Mid-Sentence): "...15 days of sick leav" / "e exceeding 3 consecutive days..." Neither half is usable on its own.

The pattern: Bad chunking → wrong retrieval OR incomplete context → poor answers.


The Four Chunking Strategies

There are four main approaches to chunking, each with different trade-offs. Let's explore them from simplest to most sophisticated.

Strategy 1: Fixed-Size Chunking

The approach: Split text every N characters or tokens, regardless of content.

Think of it like: Cutting a cake with a ruler. Every slice is exactly 2 inches, whether it cuts through frosting, filling, or both.

How It Works
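The mechanics are exactly what they sound like: pick a size, then slice the text every N units from start to finish. A minimal character-based sketch (production systems usually count tokens instead):

```python
def fixed_size_chunks(text: str, chunk_size: int = 100, overlap: int = 0) -> list[str]:
    """Split text every `chunk_size` characters, ignoring content boundaries."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Employees are entitled to 15 days of sick leave per calendar year."
fixed_size_chunks(doc, chunk_size=30)[0]
# → "Employees are entitled to 15 d" (ends mid-word!)
```

Note that the splitter has no idea it just cut a word in half; that blindness is the whole story of this strategy.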

The Good

Advantage | Why It Matters
Dead simple | Just count and split; no analysis needed
Predictable | Every chunk is roughly the same size
Fast | No computational overhead
Universal | Works on any text, any language

The Bad

Limitation | Why It Matters
Breaks sentences | Chunks can end mid-word or mid-thought
Ignores structure | Headers, paragraphs, and sections mean nothing to it
Splits related content | A rule and its exception can land in different chunks

When to Use It

Use fixed-size chunking when:

  • You're prototyping (need something working NOW)
  • Your content is extremely uniform (e.g., log entries, simple records)
  • You need predictable chunk sizes for downstream processing
  • Speed matters more than quality

Don't use it when:

  • Document structure matters (which is almost always)
  • You care about retrieval quality
  • You're going to production

Real talk: Fixed-size chunking is the "hello world" of RAG. It's where you start, not where you stay.


Strategy 2: Recursive Chunking

The approach: Try to split at natural boundaries (paragraphs, then sentences, then words), falling back to smaller units only when needed.

Think of it like: Cutting a cake along the layers. First try to separate by frosting layers. If a layer is too big, cut it at the cake's natural divisions. Only cut through the middle as a last resort.

How It Works

The hierarchy:

  1. Paragraphs: split on "\n\n" first
  2. Sentences/lines: if a piece is still too big, split on "\n" or ". "
  3. Words: if still too big, split on spaces
  4. Characters: last resort

Example in Action

Recursive chunking on the sick leave document splits at the blank lines between sections:

Chunk 1: "SICK LEAVE POLICY\n\nEmployees are entitled to 15 days of sick leave per calendar year. Sick leave exceeding 3 consecutive days requires a medical certificate. Unused sick leave does not carry over to the next year."
Chunk 2: "VACATION POLICY\n\nEmployees receive 21 days..."

✅ Whole sections, intact sentences

Compare to fixed-size (every 100 chars):

Chunk 1: "SICK LEAVE POLICY\n\nEmployees are entitled to 15 days of sick leave per calendar year. Sick leav"
Chunk 2: "e exceeding 3 consecutive days requires a medical certificate. Unused sick leave does not carry o"
Chunk 3: "ver to the next year.\n\nVACATION POLICY\n\nEmployees receive 21 days..."

❌ Breaks sentences, loses structure
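The paragraphs-first, fall-back-to-finer-separators logic fits in a short function. A sketch (separators and sizes are illustrative; a production splitter, like LangChain's RecursiveCharacterTextSplitter, also merges small adjacent pieces back up toward the size limit, which is omitted here):

```python
def recursive_chunks(text, max_size=100, separators=("\n\n", "\n", ". ", " ")):
    """Split at the coarsest separator available, recursing into finer
    separators only for pieces that are still over max_size."""
    if len(text) <= max_size:
        return [text]
    if not separators:
        # Last resort: hard character split
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p]
    if len(pieces) == 1:
        # This separator doesn't occur here; try the next finer one
        return recursive_chunks(text, max_size, rest)
    chunks = []
    for piece in pieces:
        chunks.extend(recursive_chunks(piece, max_size, rest))
    return chunks
```

The key property: a hard character split can only happen after every natural boundary has been exhausted.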

The Good

Advantage | Why It Matters
Respects structure | Keeps related content together
Graceful degradation | Falls back intelligently when needed
No broken sentences | Worst case is word-level, not character-level
Configurable | You control the separator hierarchy

The Bad

Limitation | Why It Matters
Still size-based | Eventually splits when size limit is hit
Doesn't understand meaning | A paragraph might contain 3 different topics
Separator-dependent | Only works well on formatted text

When to Use It

Recursive chunking is the default for most production RAG systems. Use it when:

  • You have structured documents (sections, paragraphs)
  • You want better quality than fixed-size without complexity
  • You're building a production system (not just prototyping)

This is the sweet spot. 80% of RAG systems should start here.


Strategy 3: Semantic Chunking

The approach: Split based on meaning, not size or formatting.

Think of it like: Asking an editor to break up an article. They read it, understand the topics, and split where the subject changes, regardless of paragraph breaks or word count.

The Problem It Solves

Consider this document:

"""
Our company was founded in 2010 in San Francisco. We started with 
just 5 employees in a small garage. Today, we have over 500 staff
across three continents.

Our mission is to make AI accessible to everyone. We believe that
artificial intelligence should be a tool for empowerment, not 
replacement. Every product we build reflects this philosophy.
"""

Recursive chunking might keep this as one chunk (it's short enough).

But there are two distinct topics:

  1. Company history
  2. Company mission

If someone asks "What's the company's mission?" they get company history as baggage, which dilutes the embedding and might cause the retriever to miss better chunks about mission statements elsewhere.

How Semantic Chunking Works

Result: Each chunk is topically coherent, even though there were no paragraph breaks.
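The algorithm: split into sentences, embed each one, compare adjacent embeddings, and start a new chunk wherever similarity drops below a threshold. A self-contained sketch, with a toy bag-of-words embedding standing in for a real embedding model (the threshold value is illustrative):

```python
import math
import re
from collections import Counter

def toy_embed(sentence: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [toy_embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:   # similarity drop = topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

Swap `toy_embed` for a real sentence-embedding API and this becomes a workable baseline; the threshold is the knob you'll spend most of your tuning time on.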

The Good

Advantage | Why It Matters
Topic-aware | Understands what text is about
Format-independent | Works on unstructured text
High-quality chunks | Better retrieval precision

The Bad

Limitation | Why It Matters
Computationally expensive | Must embed every sentence
Slower | Adds significant indexing time
Requires tuning | Similarity threshold affects results
Can miss structure | Ignores formatting cues

When to Use It

Use semantic chunking when:

  • Your documents are unstructured (no clear sections/paragraphs)
  • Content quality matters more than indexing speed
  • You have transcripts, conversations, or stream-of-consciousness text
  • Recursive chunking isn't giving good results

Don't use it when:

  • You have well-structured documents (recursive is faster and equally good)
  • Indexing speed is critical
  • You're working with thousands of documents

Strategy 4: Agentic Chunking

The approach: Use an LLM to read the document and decide how to chunk it.

Think of it like: Hiring a professional editor to organize your content. They read it, understand the structure and meaning, then chunk it intelligently based on both.

How It Works

Example Prompt

"Given the following document, identify logical chunks where each chunk:
1. Covers a single coherent topic
2. Is self-contained and understandable on its own
3. Is between 100-500 words

Return chunk boundaries with start/end positions."

Document:
[Your document here]
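Wiring that prompt up takes only a few lines. A sketch in which `call_llm` is a hypothetical stand-in for your LLM client, stubbed here so the example runs on its own (a real call would return boundaries derived from the document):

```python
import json

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call (OpenAI, Anthropic, etc.).
    # A real model would analyze the document in the prompt.
    return json.dumps([{"start": 0, "end": 120}, {"start": 120, "end": 260}])

def agentic_chunks(document: str) -> list[str]:
    prompt = (
        "Given the following document, identify logical chunks where each "
        "chunk covers a single coherent topic and is self-contained. "
        "Return a JSON list of {start, end} character positions.\n\n" + document
    )
    boundaries = json.loads(call_llm(prompt))
    return [document[b["start"]:b["end"]] for b in boundaries]
```

The real engineering work is in validating the LLM's output: boundaries that overlap, skip text, or fall outside the document all happen in practice and need defensive handling.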

The Good

Advantage | Why
Highest quality | LLM "reads" like a human
Handles edge cases | Can deal with unusual structures
Can add metadata | LLM can summarize or tag each chunk

The Bad

Limitation | Why It Matters
Expensive | LLM API call for every document
Slow | Adds minutes to indexing
Non-deterministic | Same document might chunk differently each time
Overkill | Using a cannon to kill a fly for most documents

When to Use It

Use agentic chunking when:

  • You have a small number of high-value documents
  • Documents are complex and unstructured
  • You need metadata/summaries per chunk
  • Cost and speed aren't concerns (one-time indexing)

Don't use it when:

  • You have thousands of documents (cost explodes)
  • You need fast, deterministic results
  • Your documents are simple/structured

Strategy Comparison

Strategy | Speed | Quality | Complexity | Best For
Fixed-Size | ⚡⚡⚡ | ⭐ | Simple | Prototypes
Recursive | ⚡⚡ | ⭐⭐⭐ | Moderate | Production (default)
Semantic | ⚡ | ⭐⭐⭐⭐ | Complex | Unstructured text
Agentic | 🐢 | ⭐⭐⭐⭐⭐ | Complex | High-value documents

The 80/20 rule: Recursive chunking solves 80% of use cases. Only move to semantic/agentic if you have specific quality issues.


Chunk Size: The Goldilocks Problem

You've picked your chunking strategy. Now you need to decide: how big should each chunk be?

This is where most people get stuck, because there's no universal answer: it depends on your documents, your queries, and your use case.

But there are principles to guide you.

The Trade-Off

The Context vs. Precision Problem

Let's see this trade-off in action with a real example.

Document:

PARENTAL LEAVE POLICY

Eligibility:
- Full-time employees with 1+ year tenure
- Part-time employees with 2+ years tenure

Entitlement:
- Primary caregivers: 16 weeks paid leave
- Secondary caregivers: 6 weeks paid leave

Application Process:
- Submit form HR-305 at least 30 days before expected date
- Attach medical documentation
- Manager approval required
- HR approval required

Benefits During Leave:
- Health insurance continues
- 401k matching paused
- Paid time off accrual paused

User asks: "How do I apply for parental leave?"


Scenario 1: Small Chunks (100-150 tokens each)

Chunk 1: "Eligibility: Full-time employees with 1+ year tenure.
          Part-time employees with 2+ years tenure."

Chunk 2: "Entitlement: Primary caregivers get 16 weeks paid leave.
          Secondary caregivers get 6 weeks paid leave."

Chunk 3: "Application Process: Submit form HR-305 at least 30 days
          before expected date. Attach medical documentation."

Chunk 4: "Manager approval required. HR approval required."

Chunk 5: "Benefits During Leave: Health insurance continues.
          401k matching paused. Paid time off accrual paused."

Query: "How do I apply for parental leave?"

Retrieved (Top-3):

  • Chunk 3 (high similarity: "application process")
  • Chunk 4 (medium similarity: "approval")
  • Chunk 2 (medium similarity: "parental leave")

LLM sees:

[3] Submit form HR-305 at least 30 days before expected date.
    Attach medical documentation.
[4] Manager approval required. HR approval required.
[2] Primary caregivers get 16 weeks paid leave...

LLM response:

"To apply for parental leave, submit form HR-305 at least 30 days 
before your expected date with medical documentation. You'll need 
both manager and HR approval."

Result: Correct answer! But notice Chunk 2 (entitlement) was retrieved even though it's not relevant to "how to apply." That's noise.


Scenario 2: Large Chunks (500+ tokens each)

Chunk 1: [Everything about eligibility + entitlement + application]

"Eligibility: Full-time employees with 1+ year tenure. Part-time 
employees with 2+ years tenure. Entitlement: Primary caregivers 
get 16 weeks paid leave. Secondary caregivers get 6 weeks paid 
leave. Application Process: Submit form HR-305 at least 30 days 
before expected date. Attach medical documentation. Manager 
approval required. HR approval required."

Query: "How do I apply for parental leave?"

Retrieved (Top-1):

  • Chunk 1 (contains "application process")

LLM sees:

[1] Eligibility: Full-time employees with 1+ year tenure...
    Entitlement: Primary caregivers get 16 weeks...
    Application Process: Submit form HR-305...
    [All the details]

LLM response:

"To apply for parental leave, submit form HR-305 at least 30 days
before your expected date with medical documentation. You'll need
both manager and HR approval.

Note: Full-time employees need 1+ year tenure, part-time need 2+ years.
Primary caregivers get 16 weeks paid, secondary get 6 weeks."

Result: Correct answer, but includes eligibility/entitlement info the user didn't ask for. The signal (application process) is diluted by noise (eligibility rules, leave duration).


Scenario 3: Medium Chunks (300-400 tokens each)

Chunk 1: "Eligibility: Full-time employees with 1+ year tenure.
          Part-time employees with 2+ years tenure.
          
          Entitlement: Primary caregivers get 16 weeks paid leave.
          Secondary caregivers get 6 weeks paid leave."

Chunk 2: "Application Process: Submit form HR-305 at least 30 days
          before expected date. Attach medical documentation.
          Manager approval required. HR approval required.
          
          Benefits During Leave: Health insurance continues.
          401k matching paused. Paid time off accrual paused."

Query: "How do I apply for parental leave?"

Retrieved (Top-1):

  • Chunk 2 (high similarity: "application process")

LLM sees:

[2] Application Process: Submit form HR-305 at least 30 days before
    expected date. Attach medical documentation. Manager approval
    required. HR approval required.
    
    Benefits During Leave: Health insurance continues. 401k matching
    paused. Paid time off accrual paused.

LLM response:

"To apply for parental leave, submit form HR-305 at least 30 days
before your expected date with medical documentation. You'll need
both manager and HR approval.

During your leave, your health insurance will continue, but 401k
matching and PTO accrual will pause."

Result: Perfect! Application process + relevant adjacent info (what happens during leave). No noise about eligibility.


The Pattern

Chunk Size | Precision | Context | Best For
Small (100-200) | High (very specific) | Low (fragmented) | Simple Q&A, FAQs
Medium (300-600) | Balanced | Balanced | Most use cases
Large (800+) | Low (diluted signal) | High (everything) | Research, summaries

The sweet spot for most RAG systems: 400-600 tokens (~300-450 words).

Tokens vs. Characters vs. Words

Quick conversion guide (approximate, English text):

Tokens | Words | Characters | Approximate Length
100 | ~75 | ~400 | 1 paragraph
200 | ~150 | ~800 | 2 paragraphs
400 | ~300 | ~1600 | 4 paragraphs
500 | ~375 | ~2000 | 5 paragraphs
1000 | ~750 | ~4000 | 2-3 pages

Remember: Embedding models have max token limits. Don't exceed them or your chunks get truncated!

Model | Max Tokens
text-embedding-3-small (OpenAI) | 8,191
text-embedding-004 (Google) | 2,048
embed-v3 (Cohere) | 512

If your target chunk size is 500 tokens but your model maxes at 512, you're cutting it close. Leave headroom.
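A cheap sanity check at indexing time can catch this. The sketch below uses the rough ~4-characters-per-token heuristic for English; for exact counts, use the model's real tokenizer (e.g., tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic for English text: ~4 characters per token
    return max(1, len(text) // 4)

def fits_model(chunk: str, model_max_tokens: int, headroom: float = 0.1) -> bool:
    """True if the chunk stays under the model limit with some headroom."""
    budget = int(model_max_tokens * (1 - headroom))
    return estimate_tokens(chunk) <= budget

# embed-v3 (Cohere) caps at 512 tokens; keep ~10% headroom
fits_model("word " * 400, model_max_tokens=512)
# → False (a ~500-token chunk is too close to the 512 limit)
```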


Overlap: The Safety Net

So far, we've talked about splitting documents. But what if the perfect answer sits right at a chunk boundary?

Example:

Chunk 1: "Vacation requests require 2 weeks advance notice.
          Requests submitted late may be denied."

Chunk 2: "Manager approval is mandatory for all requests
          exceeding 5 consecutive days."

Perfect answer spans BOTH chunks, but:

  • Chunk 1 doesn't mention the approval requirement
  • Chunk 2 doesn't mention the 2-week notice

If only one chunk is retrieved → incomplete answer!

Overlap solves this.

What Is Overlap?

Overlap = including some text from the previous chunk in the next chunk.
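Mechanically, overlap is a sliding window: each chunk starts `chunk_size - overlap` units after the previous one, so consecutive chunks share their edges. A word-level sketch:

```python
def chunks_with_overlap(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Sliding window: consecutive chunks share `overlap` words."""
    step = chunk_size - overlap          # must be positive (overlap < chunk_size)
    out = []
    for i in range(0, len(words), step):
        out.append(words[i:i + chunk_size])
        if i + chunk_size >= len(words):
            break                        # avoid a trailing chunk of pure overlap
    return out
```

The same arithmetic works identically for characters or tokens; words just make the shared boundary easy to see.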

Overlap in Action

Original text:

"Vacation requests require 2 weeks advance notice. Requests 
submitted late may be denied. Manager approval is mandatory for 
all requests exceeding 5 consecutive days."

Chunking without overlap (split after "denied"):

Chunk 1: "Vacation requests require 2 weeks advance notice.
          Requests submitted late may be denied."

Chunk 2: "Manager approval is mandatory for all requests
          exceeding 5 consecutive days."

Chunking with 30% overlap:

Chunk 1: "Vacation requests require 2 weeks advance notice.
          Requests submitted late may be denied."

Chunk 2: "Requests submitted late may be denied. Manager approval
          is mandatory for all requests exceeding 5 consecutive days."

Now when someone asks: "What happens if I request a long vacation?"

Both chunks are relevant:

  • Chunk 1 mentions advance notice
  • Chunk 2 mentions manager approval for long requests
  • The overlap ensures context continuity

The Good and Bad of Overlap

Aspect | Without Overlap | With Overlap
Storage | Smaller (no redundancy) | Larger (duplicate content)
Edge cases | Risky (can lose context) | Safer (context preserved)
Retrieval | Clean boundaries | Better coverage at boundaries
Cost | Lower | Higher (more vectors to store/search)

How Much Overlap?

Common overlap percentages:

Overlap | When to Use | Notes
0% (None) | Chunks are very clean, no boundary issues | Risk: missing context at edges
10-20% | Standard use case, want a safety net | Sweet spot for most systems ✅
30-50% | Context is critical, edge cases frequent | Risk: too much redundancy, noise in retrieval
50%+ | Almost never | Creates massive redundancy ❌

Recommendation: Start with 10-20% overlap (e.g., 50 token overlap on 500 token chunks). Only increase if you're seeing retrieval issues at chunk boundaries.

When Overlap Doesn't Help

Overlap is not a fix for bad chunking strategy.

Problem: Fixed-size chunking breaks sentences
Solution attempt: Add 50% overlap
Result: Still broken sentences, just duplicated across more chunks!

Better solution: Switch to recursive chunking

Think of overlap as insurance, not a cure. It helps at the margins but won't fix fundamental chunking problems.


Beyond the Basics

The four strategies we covered are starting points, not laws. Let's explore advanced techniques.

Combining Strategies: Recursive + Semantic

The most powerful combination: use recursive chunking for structure, then semantic chunking to refine.

Example:

Document (poorly formatted, no section breaks):

"Our company was founded in 2010 by Jane Smith. We started with 3 
employees and a dream. Today we have 500 staff across 3 continents.
Our mission is to democratize AI. We believe AI should empower 
everyone, not replace them. Every product reflects this."

Recursive chunking: Keeps it all together (no paragraph breaks)

Hybrid approach:

  1. Recursive finds no obvious split points → keeps as one chunk
  2. Semantic analyzes sentence similarity
  3. Detects topic shift: company history → company mission
  4. Splits into two chunks

Result: Clean topical chunks even from unstructured text.

Custom Chunking for Document Types

Different document types need different strategies:

Document Type | Strategy | Why
HR Policies | Recursive (large chunks, 500-800 tokens) | Need full context; policies often have multi-step processes
FAQs | Custom (Q+A pairs) | Question meaningless without answer
API Docs | Code-aware chunking | Keep code examples with explanations
Chat Logs | Small chunks (100-200 tokens, by turn) | Each message is self-contained
Product Catalogs | One product = one chunk | Each product is independent
Legal Contracts | Clause-based chunking | Legal clauses are semantic units
Research Papers | Section-based recursive | Respect academic structure

Handling Special Content

Some content needs special treatment:

Tables:

❌ Bad: Splitting table across chunks
   Chunk 1: Headers
   Chunk 2: Data rows
   → Headers separated from data, meaningless

✅ Good options:
   Option A: Keep entire table as one chunk
   Option B: Convert to prose ("Product X costs $50...")
   Option C: One row = one chunk (if rows are independent)

Code Blocks:

❌ Bad: Splitting a function mid-way
   Chunk 1: def calculate_total(items):
   Chunk 2:     return sum([i.price for i in items])

✅ Good: Keep functions/classes together
   Detect code fences (```) and preserve entire blocks
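One way to implement that: treat each fenced block as an atomic unit and split only the prose around it. A sketch (assumes triple-backtick fences; the fence string is built up in code to keep the snippet paste-safe):

```python
import re

TICKS = "`" * 3   # triple-backtick fence marker
FENCE = re.compile(re.escape(TICKS) + r".*?" + re.escape(TICKS), re.DOTALL)

def split_preserving_code(text: str) -> list[str]:
    """Split prose on blank lines, but keep each fenced code block whole."""
    chunks, last = [], 0
    for m in FENCE.finditer(text):
        prose = text[last:m.start()]
        chunks.extend(p.strip() for p in prose.split("\n\n") if p.strip())
        chunks.append(m.group())          # whole fence = one chunk
        last = m.end()
    chunks.extend(p.strip() for p in text[last:].split("\n\n") if p.strip())
    return chunks
```

A fuller version would also attach the explanation paragraph to the code block it describes, since the two are usually meaningless apart.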

Lists with Context:

❌ Bad: 
   Chunk 1: "Required documents:"
   Chunk 2: "- Passport\n- Visa\n- Ticket"
   → Header separated from items

✅ Good: Keep header + items together

Pre-Processing: Clean Before You Chunk

Chunking garbage = garbage chunks.

Before chunking, clean your documents:

  • Strip boilerplate: headers, footers, page numbers, navigation text
  • Remove duplicates: the same paragraph indexed twice doubles its retrieval odds
  • Drop outdated versions: old and new policies side by side produce contradictory chunks
  • Normalize formatting: fix encoding artifacts, collapse stray whitespace

Real example of why this matters:

Before cleaning:
"Our refund policy is 30 days." (2023 version)
"Our refund policy is 14 days." (2024 version)

Chunks created:
Chunk 1: "30 days" (old)
Chunk 2: "14 days" (current)

User asks: "What's the refund policy?"
Retrieved: BOTH chunks
LLM: "Your refund policy is 14-30 days..." ← WRONG!

After cleaning (remove old version):
Only one chunk with current 14-day policy ✅
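The deduplication step of that cleanup can be sketched as a hash-based filter (the normalization rules are illustrative; catching outdated versions, as opposed to exact duplicates, still needs versioning metadata or manual review):

```python
import hashlib
import re

def normalize(paragraph: str) -> str:
    # Collapse whitespace and lowercase so trivial variants hash identically
    return re.sub(r"\s+", " ", paragraph).strip().lower()

def dedupe_paragraphs(paragraphs: list[str]) -> list[str]:
    """Drop duplicate paragraphs (after normalization) before chunking."""
    seen, kept = set(), []
    for p in paragraphs:
        digest = hashlib.sha256(normalize(p).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(p)
    return kept
```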

The Decision Framework

You're ready to chunk. How do you decide which strategy to use?

Your Chunking Checklist

Before you finalize your chunking approach:

1. Know your documents:

  • What types? (Policies, FAQs, logs, research papers?)
  • How structured? (Clear sections vs. freeform text?)
  • How long? (Pages per document?)
  • Any special content? (Tables, code, lists?)

2. Know your queries:

  • How specific? ("What's the sick leave policy" vs "Tell me about leave")
  • Need full context? (Multi-step processes vs. simple facts)
  • Looking for exact info? (Numbers, dates) or concepts? (Philosophy, mission)

3. Set your constraints:

  • Embedding model max tokens?
  • Indexing speed important?
  • Storage/cost limitations?
  • Quality vs. speed trade-off?

4. Test and iterate:

  • Start with recursive (safe default)
  • Try a few chunk sizes (300, 500, 800 tokens)
  • Test with real queries
  • Measure retrieval quality
  • Adjust based on results

Quick Reference Table

If your content is... | Use this strategy | With this size | And this overlap
Well-structured docs | Recursive | 400-600 tokens | 10-20%
Unstructured text | Semantic | 300-500 tokens | 10%
Simple Q&A pairs | Custom (Q+A) | Variable | 0%
Technical docs with code | Recursive + code-aware | 500-800 tokens | 20%
Chat transcripts | Small fixed | 100-200 tokens | 0%
Legal documents | Recursive (large) | 800-1000 tokens | 20%
Mixed/uncertain | Recursive | 500 tokens | 15%

Key Takeaways

Let's lock in what matters.

The Core Principles

  • Chunking determines retrieval quality; no downstream component can compensate for broken chunks
  • Recursive chunking at 400-600 tokens with 10-20% overlap is the right default
  • Chunk around the questions users will ask, not just the document's layout
  • Overlap is insurance at boundaries, not a fix for a bad strategy
  • Test with real queries and iterate; chunking is empirical, not theoretical

The Mental Model

Think of chunking like organizing books in a library:

  • Fixed-size: Stack books on shelves until each shelf is exactly full. A 3-volume encyclopedia might end up on 3 different shelves. Fast but breaks related content.
  • Recursive: Group by category (Fiction → Mystery → Detective), then split when a section gets too big. Respects natural organization but still uses size as the final constraint.
  • Semantic: Read each book and group by actual themes/topics, even if they're different genres. "Books about grief" might include a memoir, a novel, and a psychology book together because they're semantically related.
  • Agentic: Hire a professional librarian to organize your entire collection. They understand context, see patterns you'd miss, and make expert decisions about what belongs together.

Questions You Can Now Answer

What's the difference between chunking strategies?

  • Fixed = by size, Recursive = by structure, Semantic = by meaning, Agentic = by LLM intelligence

Why does chunk size matter?

  • Too small = lost context, too large = buried signal, sweet spot = balanced

When should I use overlap?

  • 10-20% overlap to handle edge cases at chunk boundaries

Which strategy should I start with?

  • Recursive chunking with 400-600 token chunks and 10-20% overlap

How do I know if my chunking is working?

  • Test with real queries, check if retrieved chunks contain the right information

What if recursive chunking isn't working?

  • First try adjusting chunk size and overlap
  • If still bad, check if your docs are unstructured → try semantic
  • If docs are unique/complex → consider agentic for high-value content

What You Don't Need to Worry About Yet

We intentionally left some things out:

  • ❌ How to evaluate chunking quality scientifically (Post 8)
  • ❌ Production optimization techniques (Post 6)

For now, you have the conceptual foundation. Everything else builds on this.


What's Next

You've mastered chunking. Your documents are split into clean, semantically coherent chunks. They're indexed in your vector database.

But here's where things get interesting.

You've prepared the knowledge. Now you need to search it effectively.

Remember from Post 2: the embedding model converts text to vectors, and the vector database searches those vectors. But we glossed over the details.

In the next post, we're diving deep into embeddings and vector databases:

Post 4 Preview: Embeddings & Vector Databases

What Post 4 will cover:

Understanding Embeddings:

  • How do embeddings actually capture meaning?
  • Why does "dog" end up close to "puppy" in vector space?
  • What are the different types of embedding models?
  • How do you choose the right one for your use case?

Vector Database Deep Dive:

  • How do vector databases search millions of vectors in milliseconds?
  • What's the difference between HNSW, IVF, and other indexing algorithms?
  • When should you use approximate vs. exact search?
  • What are similarity metrics and which should you use?

The Math (Made Simple):

  • Cosine similarity explained without the equations
  • Why dimensionality matters
  • The curse of dimensionality (and how vector DBs solve it)

Practical Decisions:

  • Choosing an embedding model (OpenAI vs. Cohere vs. Google vs. open source)
  • Choosing a vector database (Pinecone vs. Weaviate vs. Qdrant vs. Chroma)
  • Cost considerations (API fees, storage, compute)
  • When to self-host vs. use managed services

Why Embeddings Matter

You might think: "I've got good chunks now. Isn't that enough?"

Not quite.

Remember the sick leave example?

User asks: "How many sick days do I get?"

Your chunk: "Employees are entitled to 15 days of sick leave per year."

This only worked because the embedding model understood:

  • "sick days" ≈ "sick leave"
  • "How many" ≈ "entitled to 15"
  • "I get" ≈ "employees are entitled"

Different words, same meaning. That's what embeddings enable.

But not all embedding models are created equal:

  • Some are better at technical jargon
  • Some excel at multilingual content
  • Some are optimized for speed vs. accuracy
  • Some work better with short text vs. long documents

In Post 4, you'll learn how to choose the right embedding model and vector database for your specific use case.

See You in Post 4

You've built the foundation:

  • Post 1: You know WHY to use RAG
  • Post 2: You know HOW RAG works
  • Post 3: You know how to PREPARE your documents

Next up: Understanding the semantic search engine that makes retrieval work.

Ready to Build Your RAG System?

We help companies build production-grade RAG systems that actually deliver results. Whether you're starting from scratch or optimizing an existing implementation, we bring the expertise to get you from concept to deployment. Let's talk about your use case.

Contact Kalvad | Engineering & Technology Consulting
Get in touch with Kalvad to discuss your engineering, R&D, or technology consulting needs with our expert team.

Part 3 of the RAG Deep Dive Series | Next up: Embeddings & Vector Databases