RAG & Knowledge Bases

Retrieval-Augmented Generation (RAG) allows your AI to answer questions using information from your own documents. Instead of relying only on what the model was trained on, RAG fetches relevant content and includes it in the AI’s context.

Without RAG                                  With RAG
AI only knows training data                  AI accesses your documents
Can’t answer company-specific questions      Answers from your knowledge base
May hallucinate facts                        Cites actual sources
General, generic responses                   Specific, relevant answers

A RAG system in Flow-Like has two phases:

Indexing:  Documents ──▶ Chunk Text ──▶ Embed ──▶ Store in Database
Querying:  User Question ──▶ Embed Query ──▶ Search Database ──▶ Add to Prompt ──▶ Generate Answer

First, you need to get your documents into Flow-Like:

  1. Upload files to your app’s Storage
  2. Read file contents using Storage nodes
  3. Split into chunks for efficient retrieval

Embeddings are numerical representations that capture the meaning of text. Similar texts have similar embeddings, enabling semantic search.
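
For intuition, here is a small standalone Python sketch (not part of Flow-Like) of how embedding similarity is typically measured with cosine similarity. The three-dimensional vectors are toy values chosen for illustration; real models produce hundreds or thousands of dimensions.

import math

def cosine_similarity(a, b):
    # Close to 1.0 = pointing the same way (similar meaning); near 0.0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": invented values, purely for illustration
vacation_policy = [0.9, 0.1, 0.2]
pto_guidelines  = [0.8, 0.2, 0.3]
pizza_recipe    = [0.1, 0.9, 0.7]

print(cosine_similarity(vacation_policy, pto_guidelines))  # high: related topics
print(cosine_similarity(vacation_policy, pizza_recipe))    # low: unrelated topics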

Use the Load Embedding Model node:

Load Embedding Model
├── Model: (select an embedding model)
└── Result ──▶ (embedding model reference)

Recommended embedding models:

  • text-embedding-3-small (OpenAI) – Fast, affordable
  • text-embedding-3-large (OpenAI) – Higher quality
  • nomic-embed-text (Ollama) – Local, free
  • voyage-2 (VoyageAI) – High quality

Large documents need to be split into smaller pieces. Use Chunk Text:

Chunk Text
├── Text: (your document)
├── Chunk Size: 500
├── Overlap: 50
└── Chunks ──▶ (array of text pieces)

Parameter     Description                        Recommendation
Chunk Size    Characters per chunk               300-1000
Overlap       Characters shared between chunks   10-20% of chunk size
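
The Chunk Text node does this for you; the plain-Python sketch below only illustrates the idea of fixed-size chunking with overlap, reusing the 500/50 values from the example above.

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping, fixed-size character chunks."""
    step = chunk_size - overlap              # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the last window reached the end
            break
    return chunks

document = "Employees accrue 1.5 vacation days per month. " * 100
pieces = chunk_text(document, chunk_size=500, overlap=50)
print(len(pieces), len(pieces[0]))           # number of chunks, length of the first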

For each chunk, create an embedding using Embed Document:

For Each Chunk
Embed Document
├── Document: (chunk text)
├── Model: (embedding model)
└── Vector ──▶ (embedding array)

Flow-Like provides a local vector database for storing and searching embeddings.

Use Open Database to create or connect to a database:

Open Database
├── Name: "my_knowledge_base"
└── Database ──▶ (database connection)

Use Insert or Upsert to store your chunks with their embeddings:

Insert
├── Database: (connection)
├── Data: {
│ "text": "chunk content...",
│ "source": "document.pdf",
│ "page": 5
│ }
├── Vector: (embedding)
└── End
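
Conceptually, each record couples the chunk text, its metadata, and its vector. The sketch below uses a plain Python list as a stand-in for the database so the later search sketches have something to query; it is not Flow-Like's actual storage API.

# Stand-in for the vector database used throughout these sketches
knowledge_base = []

def insert(db, text, vector, **metadata):
    """Store one chunk together with its embedding and any metadata."""
    db.append({"text": text, "vector": vector, **metadata})

insert(knowledge_base,
       text="Returns are accepted within 30 days of purchase...",
       vector=[0.12, 0.83, 0.44],     # in a real flow this comes from Embed Document
       source="document.pdf",
       page=5)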

When a user asks a question:

Embed Query
├── Query: "What is our return policy?"
├── Model: (same embedding model!)
└── Vector ──▶ (query embedding)

Use Vector Search to find similar documents:

Vector Search
├── Database: (connection)
├── Vector: (query embedding)
├── Limit: 5
└── Results ──▶ (matching documents)
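
Under the hood, a vector search scores every stored vector against the query vector and keeps the closest matches. A brute-force sketch, reusing cosine_similarity and knowledge_base from the earlier examples (real databases use indexes to avoid scanning everything):

def vector_search(db, query_vector, limit=5):
    """Return the `limit` records whose vectors are most similar to the query."""
    scored = [(cosine_similarity(record["vector"], query_vector), record)
              for record in db]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # most similar first
    return [record for _, record in scored[:limit]]

results = vector_search(knowledge_base, query_vector=[0.10, 0.80, 0.50], limit=5)
for match in results:
    print(match["source"], match["text"][:60])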

Now combine the retrieved documents with the user’s question:

Set System Message
├── System: "Answer using ONLY the provided context..."
Push Message (add context)
├── Content: "Context:\n{retrieved documents}"
├── Role: "user"
Push Message (add question)
├── Content: "Question: {user question}"
├── Role: "user"
Invoke LLM ──▶ Answer
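
In code terms, this step just builds a chat message list with the retrieved text placed before the question. A minimal sketch using the common system/user role convention (the exact message structure inside Flow-Like may differ):

def build_messages(retrieved_docs, user_question):
    context = "\n\n".join(doc["text"] for doc in retrieved_docs)
    return [
        {"role": "system",
         "content": "Answer using ONLY the provided context. "
                    "If the context doesn't contain the answer, say so."},
        {"role": "user", "content": f"Context:\n{context}"},
        {"role": "user", "content": f"Question: {user_question}"},
    ]

docs = [{"text": "Returns are accepted within 30 days of purchase..."}]
messages = build_messages(docs, "What is our return policy?")
# `messages` is what the Invoke LLM step ultimately receives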

Flow-Like supports multiple search strategies:

Vector search finds documents by semantic similarity—great for conceptual questions.

"What's our vacation policy?" → finds "PTO guidelines" document

Full-text search finds documents by exact keywords—great for specific terms.

"policy number 12345" → finds documents containing "12345"

Hybrid search combines vector and full-text search for the best of both worlds:

Hybrid Search
├── Vector: (query embedding)
├── Search Term: "vacation policy"
├── Re-Rank: true
└── Results ──▶ (best matches)

The Re-Rank option reorders results for better relevance.
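
One common way to merge the two result lists is reciprocal rank fusion: a document scores well if it ranks highly in either the vector results or the keyword results. The sketch below shows that generic idea only; it is not a description of Flow-Like's actual Hybrid Search or re-ranking implementation.

def reciprocal_rank_fusion(vector_results, keyword_results, k=60):
    """Merge two ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranked_list in (vector_results, keyword_results):
        for rank, doc_id in enumerate(ranked_list):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_pto", "doc_benefits", "doc_holidays"]   # from vector search
keyword_hits = ["doc_holidays", "doc_pto", "doc_payroll"]    # from full-text search
print(reciprocal_rank_fusion(vector_hits, keyword_hits))     # doc_pto ranks first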

Here’s a full RAG chatbot flow:

Chat Event
├──▶ history
Get Last Message (extract user question)
Embed Query
Hybrid Search (find relevant docs)
Format Context (combine retrieved docs)
Set System Message: "Answer based on context..."
Push Message: (context + question)
Invoke LLM
Push Response ──▶ (stream answer to user)

Tips for chunking:

  • Use smaller chunks (300-500 chars) for precise answers
  • Use larger chunks (800-1000 chars) for more context
  • Consider semantic chunking (by paragraph/section)

Store useful metadata with each chunk:

{
"text": "chunk content",
"source": "employee_handbook.pdf",
"page": 12,
"section": "Benefits",
"updated": "2025-01-15"
}
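
That metadata pays off at answer time: you can render the source and page next to each retrieved chunk so the model has something concrete to cite. An illustrative sketch:

def format_context(retrieved_chunks):
    """Render retrieved chunks with their metadata so the model can cite sources."""
    lines = [f"[{chunk['source']}, page {chunk['page']}] {chunk['text']}"
             for chunk in retrieved_chunks]
    return "\n\n".join(lines)

chunks = [{"text": "Employees receive 25 vacation days per year.",
           "source": "employee_handbook.pdf", "page": 12}]
print(format_context(chunks))
# [employee_handbook.pdf, page 12] Employees receive 25 vacation days per year.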

Tell the AI to use only the provided context:

Answer the user's question using ONLY the information provided in the context.
If the context doesn't contain the answer, say "I don't have information about that."
Always cite your sources.

When the search returns no relevant documents, acknowledge it:

If (results.length == 0)
└── Respond: "I couldn't find relevant information..."

Narrow down results using metadata filters:

Vector Search
├── SQL Filter: "source = 'hr_policies.pdf'"
└── Results (only from HR policies)
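
With the in-memory stand-in from the earlier sketches, the same idea is simply a filter applied before the similarity ranking; in Flow-Like the SQL Filter field does this inside the database instead. Illustration only:

def filtered_search(db, query_vector, source, limit=5):
    """Restrict the search to one source, then rank the remainder by similarity."""
    subset = [record for record in db if record.get("source") == source]
    return vector_search(subset, query_vector, limit=limit)

hr_results = filtered_search(knowledge_base, [0.10, 0.80, 0.50],
                             source="hr_policies.pdf")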

Run your indexing flow whenever you have new documents.

Use Upsert instead of Insert—it updates existing records or creates new ones based on a unique ID.
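
As a sketch of the difference, an upsert looks for an existing record with the same ID and overwrites it, falling back to a plain insert. The ID scheme here (source plus chunk index) is just one reasonable convention, not something Flow-Like prescribes.

def upsert(db, record):
    """Replace the record with a matching ID, or insert it if none exists."""
    for i, existing in enumerate(db):
        if existing.get("id") == record["id"]:
            db[i] = record              # overwrite the outdated version
            return
    db.append(record)                   # no match: behaves like a plain insert

upsert(knowledge_base, {"id": "handbook.pdf#chunk-0",
                        "text": "Updated vacation policy...",
                        "vector": [0.11, 0.79, 0.52],
                        "source": "handbook.pdf"})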

Use Delete with filters to remove outdated content:

Delete
├── Database: (connection)
├── SQL Filter: "source = 'old_document.pdf'"
└── End

Instead of embedding one document at a time, use Embed Documents (plural) for batch processing.
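
Batching helps because every embedding call carries fixed overhead (a network round-trip or model invocation). The sketch below shows the batching pattern only; embed_batch is a placeholder for whatever actually produces the vectors, such as the Embed Documents node.

def embed_batch(texts):
    # Placeholder: a real flow would return one vector per input text
    # from a single Embed Documents call or API request.
    return [[float(len(t))] for t in texts]

def embed_in_batches(chunks, batch_size=64):
    """Embed many chunks in fixed-size batches instead of one call per chunk."""
    vectors = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        vectors.extend(embed_batch(batch))      # one call covers the whole batch
    return vectors

vectors = embed_in_batches([f"chunk {i}" for i in range(200)], batch_size=64)
print(len(vectors))                             # 200, in the same order as the input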

Don’t retrieve too many documents—5-10 is usually enough. More can overwhelm the AI’s context window.

For production systems, hybrid search usually outperforms pure vector search.

10-20% overlap ensures important information at chunk boundaries isn’t lost.

If the AI ignores your documents:

  • Check that your system prompt instructs the AI to use the context
  • Verify documents are being retrieved (log the search results)
  • Ensure the retrieved text is actually being added to the prompt

If search results are poor:

  • Try different chunk sizes
  • Use hybrid search with re-ranking
  • Check you’re using the same embedding model for indexing and queries

If no documents are found:

  • Verify your indexing flow ran successfully
  • Check the database name matches between indexing and querying
  • Look for errors in the indexing flow logs

With RAG set up, explore the other guides in the Flow-Like documentation to keep building on your knowledge base.