Add DataFusion Node

Add DataFusion

Add a DataFusion SQL session to an agent for data analysis capabilities

add_datafusion_to_agent
Package: llm
Inputs: 7
Outputs: 2
Security exposure: 2/10

Ratings

Scores range from 0 to 10. Higher values mean more impact, exposure, or operational weight.

Security: Attack surface and exposure impact. 2/10 (High)
Privacy: Potential sensitivity of processed data. 2/10 (High)
Performance: Runtime or resource pressure. 3/10 (High)
Governance: Policy, audit, or compliance impact. 2/10 (High)
Reliability: Operational stability considerations. 2/10 (High)
Cost: External or compute cost impact. 1/10 (High)

Input Pins (7)

Add

Execution
exec_in

Trigger execution

Agent

Struct
agent

Agent to add DataFusion context to

Agent (16 fields)

model: Bit (required)
The LLM model id backing this agent
  id: string (default "")
  type: BitTypes (enum "Llm", "Vlm", "Tts", "Stt"...; default "Other")
  meta: Map<string, Metadata> (default {})
    *: Metadata (map value)
      name: string (required)
      description: string (required)
      long_description: string | null
      release_notes: string | null
      tags: Array<string> (required)
      +11 more fields
  authors: Array<string> (default [])
    items: string (array item)
  repository: string | null (default null)
  +14 more fields

model_display_name: string | null
Model display name

max_iterations: integer (required; format uint64, min 0)
Maximum number of iterations/tool calls before stopping

system_prompt: string | null
System prompt for the agent

tools: Array<Tool> (default [])
Registered tools (function calling schemas for non-function tools)
  items: Tool (array item)
    type: ToolType (required; enum "function")
    function: HistoryFunction (required)
      name: string (required)
      description: string | null
      parameters: HistoryFunctionParameters (required)

function_refs: Map<string, string> (default {})
Function references (node_id -> node_name mapping). These are converted to tools at execution time to keep data slim.
  *: string (map value)

mcp_servers: Array<McpServerConfig> (default [])
MCP servers with optional tool filtering
  items: McpServerConfig (array item)
  MCP server registration with optional tool filtering
    uri: string (required)
    URI of the MCP server
    tool_filter: array | null
    Optional tool filter. If None, all tools are used. If Some, only tools in this set are used.
      items: string (array item)

thinking_enabled: boolean (default false)
Whether the thinking tool is enabled

history: anyOf (2)
Optional conversation history to initialize with
  variant 1: History
    model: string (required)
    messages: Array<HistoryMessage> (required)
      items: HistoryMessage (array item)
    preset: string | null
    stream: boolean | null
    stream_options: anyOf (2)
      variant 1: StreamOptions
      variant 2: null
    +14 more fields
  variant 2: null

datafusion_contexts: Array<DataFusionContext> (default [])
DataFusion sessions for SQL-based data analysis. Multiple sessions can be added to give the agent access to different data sources.
  items: DataFusionContext (array item)
  DataFusion session context for SQL-based data analysis
    session_cache_key: string (required)
    Cache key to look up the session in ExecutionContext.cache
    description: string | null
    User-provided description of what this data represents, e.g., "Sales data from 2020-2024 including customer demographics"
    table_descriptions: Map<string, string> (default {})
    Per-table descriptions for better LLM understanding. Key is the table name, value is the description.
      *: string (map value)
    example_queries: Array<string> (default [])
    Example SQL queries that work well with this data
      items: string (array item)
    table_schemas: Map<string, string> (default {})
    Auto-discovered table schemas (populated at runtime). Key is the table name, value is the schema description.
      *: string (map value)

infinite_context: boolean (default false)
Enable infinite context mode with automatic context window management. When enabled, applies the selected context management strategy.

context_management_mode: ContextManagementMode (default "Truncate")
Strategy for managing context when it exceeds the token budget.
- Truncate: Sliding window, removes oldest messages (fast, no extra cost)
- Summarize: LLM compresses old messages (preserves info, adds latency/cost)
  variant 1: const "Truncate"
  Sliding window truncation: removes oldest messages to fit budget. Fast, deterministic, no extra API costs. May lose important early context.
  variant 2: const "Summarize"
  LLM summarization: compresses old messages into a summary. Preserves key information but adds latency and API cost.

max_context_tokens: integer | null (format uint32, default null, min 0)
Maximum tokens to retain when truncating history in infinite context mode. Defaults to 32000 tokens if not specified. Only used when infinite_context is true.

lazy_function_refs: Array<LazyFunctionRef> (default [])
Lazy function references backed by a vector DB index. At execution time the agent can search this index to dynamically discover and load only the tools it actually needs, keeping the context window lean.
  items: LazyFunctionRef (array item)
  Reference to a lazy function tool index stored in a vector DB. Allows agents to do hybrid search over a large pool of tools at execution time instead of loading all tool schemas into the context upfront.
    db_cache_key: string (required)
    Cache key used to look up the LanceDB connection

lazy_embedding_model: anyOf (2)
Embedding model shared across all lazy function tool indexes. The model's cache key is encoded into the vector DB table name, so swapping the model automatically uses a fresh table (old embeddings are abandoned).
  variant 1: CachedEmbeddingModel
    cache_key: string (required)
    model_type: BitTypes (required; enum "Llm", "Vlm", "Tts", "Stt"...)
  variant 2: null

memory: anyOf (2)
Persistent memory configuration. When set, the agent gains built-in `_memory_search`, `_memory_store`, and `_memory_compress` tools to autonomously store, recall, and compress observations across conversations.
  variant 1: MemoryConfig
    database: NodeDBConnection (required)
      cache_key: string (required)
    embedding_model: CachedEmbeddingModel (required)
      cache_key: string (required)
      model_type: BitTypes (required; enum "Llm", "Vlm", "Tts", "Stt"...)
    max_context_tokens: integer (required; format uint32, min 0)
    recall_strategy: RecallStrategy (required; enum "RecentFirst", "Relevance", "Hybrid")
    recall_top_k: integer (required; format uint32, min 0)
    +2 more fields
  variant 2: null
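
The "Truncate" strategy described for context_management_mode can be pictured with a small sketch. This is only an illustration under stated assumptions, not the node's actual implementation: the `Message` class, the `estimate_tokens` heuristic (about 4 characters per token), and the `truncate_history` function are all hypothetical names. Only the 32000-token default comes from the schema.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_MAX_CONTEXT_TOKENS = 32_000  # default stated for max_context_tokens


@dataclass
class Message:
    """Hypothetical minimal chat message."""
    role: str
    content: str


def estimate_tokens(message: Message) -> int:
    # Assumption: a rough heuristic of about 4 characters per token.
    return max(1, len(message.content) // 4)


def truncate_history(
    messages: List[Message], max_context_tokens: Optional[int] = None
) -> List[Message]:
    """Sliding window: keep the newest messages that fit the token budget."""
    budget = (
        DEFAULT_MAX_CONTEXT_TOKENS
        if max_context_tokens is None
        else max_context_tokens
    )
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Because the walk starts from the newest message, the oldest messages are the first to be dropped, which matches the "may lose important early context" caveat in the schema.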

Session

Struct
session

DataFusion session from CreateDataFusionSession node

DataFusionSession (1 field)
  cache_key: string (required)

Description

String
description

User-friendly description of this data source

Table Descriptions

Struct
table_descriptions

Map of table names to descriptions (JSON object)

Map_of_string (1 field)
  *: string (map value)
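
As an illustration of the expected shape, the value for table_descriptions is a flat JSON object whose keys and values are both strings. The table names and descriptions below are invented for the example, and `is_string_map` is a hypothetical helper, not part of the node.

```python
import json

# Hypothetical table names and descriptions, for illustration only.
table_descriptions = {
    "orders": "One row per customer order, with order date and total amount",
    "customers": "Customer master data with region and signup date",
}


def is_string_map(value) -> bool:
    """Check that a value is a flat mapping of string keys to string values."""
    return isinstance(value, dict) and all(
        isinstance(key, str) and isinstance(item, str)
        for key, item in value.items()
    )


# Serialized form, as it might be supplied to the pin as a JSON object.
payload = json.dumps(table_descriptions, indent=2)
```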

Example Queries

Generic
example_queries

Example SQL queries that work with this data

Discover Schemas

Boolean
discover_schemas

Automatically discover table schemas at runtime

Default true
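
Taken together, the input pins suggest that the node builds one DataFusionContext and adds it to the agent's datafusion_contexts. The sketch below models that reading in Python. It is an assumption-laden illustration, not the node's source: `Agent` is reduced to a single field, `add_datafusion` and `schema_lookup` are hypothetical names, and schema discovery is replaced by a caller-supplied function.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DataFusionContext:
    """Mirrors the DataFusionContext fields listed in the Agent schema."""
    session_cache_key: str
    description: Optional[str] = None
    table_descriptions: Dict[str, str] = field(default_factory=dict)
    example_queries: List[str] = field(default_factory=list)
    table_schemas: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Agent:
    """Reduced stand-in: only the one Agent field this sketch touches."""
    datafusion_contexts: List[DataFusionContext] = field(default_factory=list)


def add_datafusion(
    agent: Agent,
    session_cache_key: str,
    description: Optional[str] = None,
    table_descriptions: Optional[Dict[str, str]] = None,
    example_queries: Optional[List[str]] = None,
    discover_schemas: bool = True,
    schema_lookup: Optional[Callable[[], Dict[str, str]]] = None,
) -> Agent:
    # schema_lookup is a hypothetical hook standing in for runtime discovery.
    schemas = schema_lookup() if discover_schemas and schema_lookup else {}
    context = DataFusionContext(
        session_cache_key=session_cache_key,
        description=description,
        table_descriptions=dict(table_descriptions or {}),
        example_queries=list(example_queries or []),
        table_schemas=dict(schemas),
    )
    # Return a new Agent; the input agent is left unchanged.
    return replace(
        agent, datafusion_contexts=[*agent.datafusion_contexts, context]
    )


agent_out = add_datafusion(
    Agent(),
    session_cache_key="sales-session",  # hypothetical cache key
    description="Sales data from 2020-2024 including customer demographics",
    table_descriptions={"orders": "One row per customer order"},
    example_queries=["SELECT COUNT(*) FROM orders"],
    schema_lookup=lambda: {"orders": "id BIGINT, total DOUBLE"},
)
```

Calling the function again on `agent_out` with a second session key would add a second context, which corresponds to the schema's note that multiple sessions can be added for different data sources.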

Output Pins (2)

Done

Execution
exec_out

Execution completed

Agent

Struct
agent_out

Agent with DataFusion context added

Agent (16 fields): identical to the schema of the agent input pin above.

Node Info

Internal name: add_datafusion_to_agent
Category: AI/Agents/Builder
Version: 2