Extraction & Structured Output

AI Extraction lets you pull structured data from unstructured text. Instead of getting freeform responses, you can have the AI return data in a specific format—perfect for forms, document processing, and data pipelines.

| Use Case | Input | Output |
|---|---|---|
| Form filling | Customer email | Structured contact info |
| Document processing | PDF invoice | Line items, totals, dates |
| Data entry | Natural language | Database records |
| Content classification | Article text | Categories, tags, sentiment |
| Entity recognition | Any text | People, places, organizations |

Flow-Like provides two main extraction nodes:

Use Extract Knowledge when you have a direct text input:

Extract Knowledge
├── Model: (AI model)
├── Prompt: "Extract the person's info from: {text}"
├── Schema: (JSON schema)
├── Done ──▶ (extraction complete)
└── Result ──▶ (structured data)

Use Extract Knowledge from History when you want to extract from a conversation:

Extract Knowledge from History
├── Model: (AI model)
├── History: (chat history)
├── Schema: (JSON schema)
├── Done ──▶ (extraction complete)
└── Result ──▶ (structured data)

The schema tells the AI exactly what structure you want. It is written in JSON Schema format.

Extract basic contact information:

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "The person's full name"
    },
    "email": {
      "type": "string",
      "description": "Email address"
    },
    "phone": {
      "type": "string",
      "description": "Phone number"
    }
  },
  "required": ["name"]
}
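
To make the meaning of "type" and "required" concrete, here is a minimal Python sketch that checks a result against the contact schema above. It is an illustration only, not Flow-Like's own validation; the `conforms` helper and the sample values are hypothetical, and it handles just the string fields used in this schema.

```python
# Illustration of what the contact schema accepts; not Flow-Like's validator.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "phone": {"type": "string"},
    },
    "required": ["name"],
}

# Map the JSON Schema type names used above to Python types.
PY_TYPES = {"string": str, "object": dict}


def conforms(result, schema):
    """Return a list of problems; an empty list means the result fits."""
    if not isinstance(result, dict):
        return ["result is not an object"]
    problems = []
    for field in schema.get("required", []):
        if field not in result:
            problems.append(f"missing required field: {field}")
    for field, value in result.items():
        spec = schema["properties"].get(field)
        if spec is None:
            continue  # JSON Schema allows extra properties by default
        if not isinstance(value, PY_TYPES[spec["type"]]):
            problems.append(f"{field} should be a {spec['type']}")
    return problems


print(conforms({"name": "Ada Lovelace", "email": "ada@example.com"}, CONTACT_SCHEMA))
print(conforms({"email": "ada@example.com"}, CONTACT_SCHEMA))
```

The first call reports no problems; the second reports the missing required "name" field.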

Extract a more complex structure:

{
  "type": "object",
  "properties": {
    "order": {
      "type": "object",
      "properties": {
        "id": { "type": "string" },
        "date": { "type": "string", "description": "ISO format date" },
        "total": { "type": "number" }
      }
    },
    "items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "product": { "type": "string" },
          "quantity": { "type": "integer" },
          "price": { "type": "number" }
        }
      }
    }
  }
}

Constrain values to specific options:

{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"],
      "description": "Overall sentiment of the message"
    },
    "priority": {
      "type": "string",
      "enum": ["low", "medium", "high", "urgent"]
    },
    "category": {
      "type": "string",
      "enum": ["billing", "technical", "sales", "other"]
    }
  }
}
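
If you post-process results outside the flow, the enum constraints above amount to a simple membership check. The following Python sketch assumes the result arrives as a plain dictionary; `invalid_enum_fields` is a hypothetical helper, not part of Flow-Like.

```python
# Allowed values copied from the enum schema above.
TICKET_ENUMS = {
    "sentiment": ("positive", "negative", "neutral"),
    "priority": ("low", "medium", "high", "urgent"),
    "category": ("billing", "technical", "sales", "other"),
}


def invalid_enum_fields(result):
    """Return the sorted names of fields whose value is outside the allowed set."""
    return sorted(
        field
        for field, allowed in TICKET_ENUMS.items()
        if field in result and result[field] not in allowed
    )


print(invalid_enum_fields({"sentiment": "angry", "priority": "asap", "category": "sales"}))
```

Fields that are absent from the result are skipped, because none of them is marked as required in the schema.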

Example: Customer Support Ticket Classifier

Chat Event
│
├──▶ history
│
▼
Extract Knowledge from History
│
├── Schema: {
│     "customer_name": "string",
│     "issue_type": ["billing","tech","other"],
│     "priority": ["low","medium","high"],
│     "summary": "string"
│   }
│
├── Done
│     │
│     ▼
│   Route by issue_type:
│     ├── billing ──▶ Billing Team Flow
│     ├── tech ──▶ Tech Support Flow
│     └── other ──▶ General Support Flow
│
└── Result ──▶ (structured ticket data)
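
The routing step in the ticket classifier can be pictured as a lookup table with a fallback. This Python sketch only illustrates the branching logic; in Flow-Like you would wire it with nodes, and `route_ticket` is a hypothetical name.

```python
# Target flows taken from the ticket classifier diagram.
ROUTES = {
    "billing": "Billing Team Flow",
    "tech": "Tech Support Flow",
    "other": "General Support Flow",
}


def route_ticket(ticket):
    """Pick the target flow for an extracted ticket, defaulting to general support."""
    issue_type = ticket.get("issue_type")
    if not isinstance(issue_type, str):
        return ROUTES["other"]
    return ROUTES.get(issue_type, ROUTES["other"])


print(route_ticket({"customer_name": "Ada", "issue_type": "billing"}))
```

Falling back to the general flow means an unexpected or missing issue_type still reaches a person instead of being dropped.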

Example: Invoice Processing

Read PDF ──▶ Extract Knowledge ──▶ Insert to Database
             ├── Schema: {
             │     "vendor": "string",
             │     "invoice_number": "string",
             │     "date": "string",
             │     "line_items": [...],
             │     "subtotal": "number",
             │     "tax": "number",
             │     "total": "number"
             │   }
             └── Result ──▶ structured invoice
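
Before an extracted invoice is written to the database, a cheap arithmetic check can catch misread numbers. The sketch below assumes that total equals subtotal plus tax, which will not hold for invoices with discounts or shipping; `totals_are_consistent` and the tolerance are hypothetical choices.

```python
from decimal import Decimal


def totals_are_consistent(invoice, tolerance=Decimal("0.01")):
    """Check that subtotal + tax matches total within a small tolerance."""
    # Convert via str() so float inputs such as 0.1 do not carry binary noise.
    subtotal = Decimal(str(invoice["subtotal"]))
    tax = Decimal(str(invoice["tax"]))
    total = Decimal(str(invoice["total"]))
    return abs(subtotal + tax - total) <= tolerance


print(totals_are_consistent({"subtotal": 100.0, "tax": 19.0, "total": 119.0}))
```

An invoice that fails the check is a good candidate for manual review instead of automatic insertion.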

Add descriptions to help the AI understand what you want:

{
  "properties": {
    "deadline": {
      "type": ["string", "null"],
      "description": "The deadline mentioned, in YYYY-MM-DD format. If no deadline, use null."
    }
  }
}
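
A description that asks for YYYY-MM-DD makes the value easy to parse downstream. As a sketch, assuming the result is available as a dictionary in Python, the hypothetical `parse_deadline` helper below turns the field into a date and treats both null and unparseable values as "no deadline".

```python
from datetime import date


def parse_deadline(result):
    """Return the deadline as a date, or None if absent or not a valid ISO date."""
    raw = result.get("deadline")
    if raw is None:
        return None
    try:
        return date.fromisoformat(raw)
    except (TypeError, ValueError):
        # The model returned something other than an ISO date string.
        return None


print(parse_deadline({"deadline": "2025-03-14"}))
print(parse_deadline({"deadline": None}))
```

If you need to tell a missing deadline apart from a malformed one, raise or log inside the except branch instead of returning None.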

Specify which fields are mandatory. Fields not listed in "required" (here, "phone") are optional:

{
  "required": ["name", "email"],
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" }
  }
}

When data might not be present:

{
  "properties": {
    "phone": {
      "type": ["string", "null"],
      "description": "Phone number if mentioned, null otherwise"
    }
  }
}

When there might be multiple values:

{
  "properties": {
    "mentioned_people": {
      "type": "array",
      "items": { "type": "string" },
      "description": "All people mentioned in the text"
    }
  }
}

The extraction result is a structured object. Use it directly in your flows:

Read a single field:

Extract Knowledge
└── Result ──▶ Get Property "customer_name" ──▶ (string value)

Branch on a field value:

Extract Knowledge
└── Result ──▶ Get Property "priority"
               Branch:
               ├── "high" ──▶ Alert Team
               └── other ──▶ Queue Normally

Store the whole result:

Extract Knowledge
└── Result ──▶ Insert to Database (directly as record)
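
Outside of Flow-Like, storing an extraction result "directly as a record" looks roughly like the following Python sketch, which uses the standard library's sqlite3 module. The `contacts` table, its columns, and the sample values are assumptions made for illustration; Flow-Like's own database node may behave differently.

```python
import sqlite3


def insert_contact(conn, result):
    """Insert one extracted contact, using named parameters to avoid SQL injection."""
    conn.execute(
        "INSERT INTO contacts (first_name, last_name, email) "
        "VALUES (:first_name, :last_name, :email)",
        {
            "first_name": result["first_name"],
            "last_name": result["last_name"],
            "email": result.get("email"),  # optional field, stored as NULL if absent
        },
    )
    conn.commit()


# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (first_name TEXT, last_name TEXT, email TEXT)")
insert_contact(
    conn,
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"},
)
print(conn.execute("SELECT first_name, last_name, email FROM contacts").fetchall())
```

Keeping the schema's field names identical to the column names is what makes this mapping trivial.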

Contact information schema:

{
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": ["string", "null"] },
    "company": { "type": ["string", "null"] }
  },
  "required": ["first_name", "last_name"]
}

Sentiment analysis schema:

{
  "type": "object",
  "properties": {
    "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "key_phrases": { "type": "array", "items": { "type": "string" } },
    "summary": { "type": "string" }
  }
}

Action items schema:

{
  "type": "object",
  "properties": {
    "action_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "task": { "type": "string", "description": "What needs to be done" },
          "assignee": { "type": ["string", "null"], "description": "Who should do it" },
          "deadline": { "type": ["string", "null"], "description": "When it's due (ISO date)" },
          "priority": { "type": "string", "enum": ["low", "medium", "high"] }
        },
        "required": ["task"]
      }
    }
  }
}

Meeting notes schema:

{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "date": { "type": "string" },
    "attendees": { "type": "array", "items": { "type": "string" } },
    "topics_discussed": { "type": "array", "items": { "type": "string" } },
    "decisions": { "type": "array", "items": { "type": "string" } },
    "action_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "task": { "type": "string" },
          "owner": { "type": "string" }
        }
      }
    },
    "next_meeting": { "type": ["string", "null"] }
  }
}

Vague: "date": { "type": "string" }
Better: "date": { "type": "string", "description": "Date in YYYY-MM-DD format" }

Begin with the most important fields, test, then add more.

Always consider what happens when information isn’t available:

  • Use a nullable type such as ["string", "null"] for optional fields
  • Set sensible defaults in your flow logic
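
Setting defaults for missing values can be sketched in a few lines of Python. The default values and the `with_defaults` helper are hypothetical; the field names come from the contact schema in this guide.

```python
# Hypothetical fallbacks for the nullable fields of the contact schema.
DEFAULTS = {"phone": "not provided", "company": "unknown"}


def with_defaults(result, defaults=DEFAULTS):
    """Return a copy of the result with null or absent fields replaced by defaults."""
    merged = dict(result)
    for field, fallback in defaults.items():
        if merged.get(field) is None:
            merged[field] = fallback
    return merged


print(with_defaults({"first_name": "Ada", "last_name": "Lovelace", "phone": None}))
```

Working on a copy keeps the raw extraction result intact, which is useful when you want to log what the model actually returned.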

Even with schemas, verify critical extractions:

Extract Knowledge
└── Result ──▶ Validate Email Format
               ├── Valid ──▶ Continue
               └── Invalid ──▶ Flag for Review
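
The email validation branch above could be implemented along these lines in Python. The pattern is deliberately loose (something, an @, something, a dot, something) and is an assumption for illustration, not a full address validator; `check_email` and the branch names are hypothetical.

```python
import re

# Loose shape check: no whitespace, exactly one "@", and a dot in the domain part.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_email(result):
    """Return the branch to follow for an extraction result."""
    email = result.get("email")
    if isinstance(email, str) and EMAIL_PATTERN.fullmatch(email):
        return "continue"
    return "flag_for_review"


print(check_email({"email": "ada@example.com"}))
print(check_email({"email": "ada at example dot com"}))
```

A missing or null email is also flagged, which suits cases where the address is critical to the next step.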

Try extraction with:

  • Minimal information
  • Extra irrelevant information
  • Ambiguous phrasing
  • Multiple possible values

Search documents, then extract structured data:

Vector Search ──▶ Extract Knowledge from Results ──▶ Structured Output

Let agents extract data as part of their workflow:

Agent (with extraction tool) ──▶ Autonomous data collection

Extract info during conversation for personalization:

Chat Event ──▶ Extract Preferences ──▶ Customize Response

If extraction fails or returns nothing:

  • Check that your JSON schema syntax is valid
  • Verify property types match expected data
  • Add clearer descriptions

If fields are missing:

  • Mark fields as required only if truly necessary
  • Allow null for optional fields
  • Check the source text actually contains the information

If results are inconsistent:

  • Use a lower temperature (more deterministic)
  • Add examples in descriptions
  • Use enums for constrained values

If the AI invents values:

  • Add “null if not mentioned” to descriptions
  • Use a nullable type such as ["string", "null"] for optional fields
  • Validate critical fields after extraction

Now that you can extract structured data: