AI-Powered Analysis

The most powerful data science workflows combine traditional analytics with AI. Flow-Like lets you build AI agents that can query databases, analyze data, and generate insights—all through natural language.

| Traditional Approach | AI-Powered Approach |
| --- | --- |
| Write SQL queries manually | Ask questions in plain English |
| Build fixed dashboards | Generate dynamic insights |
| Code data transformations | Describe what you need |
| Static reports | Conversational exploration |

A data science agent has access to your data and can:

  1. Query databases using SQL
  2. Analyze results and identify patterns
  3. Create visualizations such as charts
  4. Train ML models and make predictions
  5. Explain findings in plain language

Start by creating an agent:

Make Agent
├── Model: (a capable model like GPT-4 or Claude)
└── Agent ──▶ (agent object)

Then give it instructions with a system prompt:

Set Agent System Prompt
├── Agent: (from step 1)
├── System Prompt:
│ "You are a data analyst assistant. You have access to:
│ - SQL tools to query the data warehouse
│ - Visualization tools to create charts
│ When analyzing data:
│ 1. First understand what tables are available
│ 2. Write SQL to answer the user's question
│ 3. Summarize findings in plain language
│ 4. Create visualizations when helpful
│ Always explain your reasoning."
└── Agent ──▶ (configured agent)

Give the agent access to your DataFusion session:

Add SQL Session
├── Agent: (configured agent)
├── Session: (DataFusion session with tables)
└── Agent ──▶ (agent with SQL access)

Add your own Flow-Like flows as tools:

Add Flow Tools
├── Agent: (agent)
├── Flows: [
│ "create_chart",
│ "train_model",
│ "export_report"
│ ]
└── Agent ──▶ (agent with custom tools)

Finally, invoke the agent with the user's question:

Invoke Agent Streaming
├── Agent: (fully configured)
├── History: "What were our top 5 products last quarter?"
├── On Chunk ──▶ Push Chunk (stream response)
└── Done ──▶ Final response

Here's how a typical interaction plays out:

User: “What were our sales trends last month?”

Agent thinks: I need to query the sales table, aggregate by day, and analyze the trend.

Agent executes:

SELECT DATE_TRUNC('day', order_date) as day,
       SUM(amount) as daily_sales
FROM orders
WHERE order_date >= '2025-01-01'
  AND order_date < '2025-02-01'
GROUP BY 1
ORDER BY 1

Agent responds:

“Sales in January showed an upward trend, starting at $45,000 on January 1st and reaching $78,000 by January 31st—a 73% increase. The strongest days were Fridays, averaging $65,000 compared to $42,000 on Mondays.

Here’s the daily trend:

```
type: line
title: January 2025 Daily Sales
---
day,sales
2025-01-01,45000
2025-01-02,48000
...
```
”
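
The weekday comparison in that answer would come from a follow-up query. Here's a sketch of what the agent might run, building on the same orders table (the EXTRACT(DOW ...) weekday syntax is an assumption about the session's SQL dialect):

```sql
-- Sketch: average daily sales per weekday for January 2025
WITH daily AS (
    SELECT DATE_TRUNC('day', order_date) AS day,
           SUM(amount) AS daily_sales
    FROM orders
    WHERE order_date >= '2025-01-01'
      AND order_date < '2025-02-01'
    GROUP BY 1
)
SELECT EXTRACT(DOW FROM day) AS weekday,  -- 0 = Sunday in PostgreSQL-style dialects
       AVG(daily_sales) AS avg_daily_sales
FROM daily
GROUP BY 1
ORDER BY 1
```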

When you add a SQL session to an agent, it gains these capabilities:

The agent can run any SQL query:

SELECT customer_segment,
       COUNT(*) as customers,
       AVG(lifetime_value) as avg_ltv
FROM customers
GROUP BY customer_segment
ORDER BY avg_ltv DESC

The agent can discover what tables and columns exist:

-- What tables are available?
SHOW TABLES

-- What columns are in this table?
DESCRIBE sales
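
If the session exposes information_schema (DataFusion supports it, though whether it's enabled here is an assumption), the agent can also query schema metadata directly:

```sql
-- List columns and types for a table via information_schema
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name = 'sales'
```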

The agent can write sophisticated queries using window functions, CTEs, and joins:

WITH monthly_sales AS (
    SELECT DATE_TRUNC('month', date) as month,
           product_category,
           SUM(revenue) as revenue
    FROM sales
    GROUP BY 1, 2
)
SELECT month, product_category, revenue,
       revenue - LAG(revenue) OVER (
           PARTITION BY product_category
           ORDER BY month
       ) as month_over_month_change
FROM monthly_sales

Build custom capabilities as Flow-Like flows:

Flow: create_chart
├── Input: data (string): CSV data
├── Input: chart_type (string): bar, line, pie, etc.
├── Input: title (string): Chart title
├── Format Markdown ──▶ Return chart block
└── Output: chart (string): Markdown with nivo/plotly block

Flow: predict_churn
├── Input: customer_id (string): Customer to predict for
├── Lookup Customer ──▶ Load Model ──▶ Predict
└── Output: prediction (object): {churn_risk: 0.75, factors: []}

Flow: export_report
├── Input: title (string): Report title
├── Input: content (string): Report markdown
├── Input: format (string): pdf, csv, html
├── Generate Report ──▶ Save to Storage ──▶ Return URL
└── Output: download_url (string): Link to report

Here’s a complete flow for a data analytics chat assistant:

App Setup (runs once):

Create DataFusion Session
    │
    ▼
Register PostgreSQL (production database)
    │
    ▼
Mount CSV (reference data)
    │
    ▼
Store Session in Variable

Chat Event Handler:

Chat Event
    │
    ├──▶ history
    │
    ▼
Make Agent (Claude 3.5 Sonnet)
    │
    ▼
Set System Prompt: "You are a data analyst..."
    │
    ▼
Add SQL Session (from variable)
    │
    ▼
Add Flow Tools: [create_chart, export_csv]
    │
    ▼
Add Thinking Tool
    │
    ▼
Invoke Agent Streaming
    │
    ├── On Chunk ──▶ Push Chunk
    │
    └── Done ──▶ Log completion

Example prompts for exploratory analysis:

  • “Show me sales by region for last quarter”
  • “Which products have declining sales?”
  • “Compare this year to last year”
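
For instance, the first prompt might lead the agent to generate something like the following (a sketch; an orders table with region, amount, and order_date columns is a hypothetical schema):

```sql
-- Sales by region for Q4 2024
SELECT region,
       SUM(amount) AS total_sales
FROM orders
WHERE order_date >= '2024-10-01'
  AND order_date < '2025-01-01'
GROUP BY region
ORDER BY total_sales DESC
```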

Example prompts for reporting:

  • “Generate a weekly sales report”
  • “Create an executive summary of Q4 performance”
  • “Export the top 100 customers to CSV”

Example prompts for predictive analytics:

  • “Which customers are at risk of churning?”
  • “Predict next month’s revenue”
  • “What factors drive customer lifetime value?”

Example prompts for data quality checks:

  • “Are there any anomalies in yesterday’s data?”
  • “Check for duplicate records”
  • “Find missing values in the customer table”
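
Behind the scenes, these checks map to straightforward SQL. Here's a sketch of what the agent might run (the email and segment columns are hypothetical):

```sql
-- Flag duplicate customer records by email
SELECT email, COUNT(*) AS occurrences
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

-- Count missing values in a column (COUNT ignores NULLs)
SELECT COUNT(*) - COUNT(segment) AS missing_segments
FROM customers;
```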

Include table descriptions and analysis guidelines in your system prompt:

You have access to these tables:
- orders: Order transactions (id, customer_id, amount, date)
- customers: Customer info (id, name, segment, join_date)
- products: Product catalog (id, name, category, price)

When analyzing data:
1. First understand the question
2. Check what data is available
3. Write and execute SQL
4. Summarize key findings
5. Suggest visualizations or next steps

For queries that might return many rows:
- Always use LIMIT unless explicitly asked for all data
- Summarize results instead of showing raw data
- Offer to export large datasets to files
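
In practice, that guidance steers the agent toward queries like this sketch: aggregate first, cap the output, and only then show rows (the orders table is the same hypothetical schema as above):

```sql
-- Summarize rather than dumping raw rows, and cap the result size
SELECT DATE_TRUNC('month', order_date) AS month,
       COUNT(*) AS orders,
       SUM(amount) AS revenue
FROM orders
GROUP BY 1
ORDER BY 1
LIMIT 100
```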

Add the Thinking Tool for complex analysis:

Add Thinking Tool
├── Agent: (your agent)
└── Agent ──▶ (agent with step-by-step reasoning)

To keep database access safe:

  • Use read-only database connections when possible
  • Limit which tables the agent can access
  • Log all queries for audit purposes
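
For example, if the underlying database is PostgreSQL, the connection you register could use a dedicated read-only role. A sketch (the analytics database name and table list are placeholders):

```sql
-- Create a role that can read, but never write
CREATE ROLE agent_readonly LOGIN PASSWORD '...';
GRANT CONNECT ON DATABASE analytics TO agent_readonly;
GRANT USAGE ON SCHEMA public TO agent_readonly;

-- Grant SELECT only on the tables the agent should see
GRANT SELECT ON orders, customers, products TO agent_readonly;
```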

Agents can leverage ML models you’ve trained:

Create a flow that loads and runs a saved model:

Flow: predict_with_model
├── Input: features (array)
├── Load ML Model (saved model)
├── Predict
└── Output: prediction

Let the agent trigger model training:

Flow: train_classifier
├── Input: table_name, target_column
├── Query Data
├── Split Dataset
├── Fit Decision Tree
├── Evaluate
├── Save Model
└── Output: accuracy, model_path

Common issues and how to address them:

If the agent writes incorrect SQL:

  • Include table schemas in the system prompt
  • Add examples of correct queries
  • Use models known for good SQL (GPT-4, Claude)

If the agent doesn't use its tools:

  • Verify tools are properly connected
  • Mention available tools in the system prompt
  • Try more explicit user prompts

If responses are slow:

  • Use streaming to show progress
  • Set query timeouts
  • Consider caching frequent queries

If the agent hallucinates results:

  • Require the agent to always query before stating facts
  • Include verification steps in the system prompt
  • Log and validate SQL before execution

Combine AI-powered analysis with: