DataFusion Node Catalog

Generated from 34 catalog nodes in Data/DataFusion.

Subcategories: Data/DataFusion/Aggregation, Data/DataFusion/Databases, Data/DataFusion/Lakes, Data/DataFusion/Time, Data/DataFusion/Tools

Nodes in this category

Create DataFusion Session

Data/DataFusion

Creates a new DataFusion session for SQL analytics. Configure optimization settings for production workloads.

Mount CSV

Data/DataFusion

Mount CSV files from a FlowPath into a DataFusion session as a queryable table.

Mount JSON

Data/DataFusion

Mount newline-delimited JSON (NDJSON) files from a FlowPath into a DataFusion session as a queryable table.

Mount Parquet

Data/DataFusion

Mount Parquet files from a FlowPath prefix into a DataFusion session as a queryable table.

Register Lance Table

Data/DataFusion

Register a LanceDB table into a DataFusion session for SQL queries. Uses the existing to_datafusion() implementation from the vector store.

Register Table

Data/DataFusion

Register a CSVTable (from Excel/CSV extraction) into a DataFusion session for SQL queries. Converts the table to an in-memory Arrow table.

SQL Query

Data/DataFusion

Execute a SQL query against a DataFusion session. Returns results as both a CSVTable (for analytics) and an array of row objects (for iteration).

Date Truncate Aggregation

Data/DataFusion/Aggregation

Truncate timestamps to a specific precision (hour, day, month, etc.) and aggregate. Simpler alternative to date_bin for standard intervals.
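As a rough illustration of the kind of query this node could produce, the sketch below builds a `date_trunc` aggregation as a SQL string. The helper name `date_trunc_query`, its parameters, and the allowed precision and aggregate lists are assumptions for illustration, not the node's actual interface; table and column names are assumed to be trusted identifiers.

```python
# Hypothetical helper: not part of the catalog, only a sketch of the SQL shape.
ALLOWED_PRECISIONS = {"second", "minute", "hour", "day", "week", "month", "quarter", "year"}
ALLOWED_AGGREGATES = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


def date_trunc_query(table, time_column, value_column, precision="day", aggregate="SUM"):
    """Build a SQL string that truncates timestamps and aggregates per bucket."""
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    agg = aggregate.upper()
    if agg not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    # The truncation expression is repeated in GROUP BY so the query does not
    # rely on grouping by a SELECT alias.
    bucket = f"date_trunc('{precision}', {time_column})"
    return (
        f"SELECT {bucket} AS bucket, {agg}({value_column}) AS value "
        f"FROM {table} GROUP BY {bucket} ORDER BY bucket"
    )


if __name__ == "__main__":
    print(date_trunc_query("events", "ts", "amount", precision="hour", aggregate="avg"))
```

The resulting string could then be passed to a node such as SQL Query.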

Time Bin Aggregation

Data/DataFusion/Aggregation

Create time-based aggregations using DataFusion's date_bin function. Groups data by fixed time intervals (minute, hour, day, etc.) and applies aggregation functions.
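To make the binning concrete, here is a minimal pure-Python sketch of fixed-interval bucketing followed by a sum per bucket. It mirrors what an expression like `date_bin(INTERVAL '15 minutes', ts, TIMESTAMP '1970-01-01T00:00:00')` is meant to compute; the function names `bin_start` and `sum_by_bin` and the epoch origin are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed origin for the bins


def bin_start(ts, stride, origin=EPOCH):
    """Return the start of the fixed-width bin that contains ts."""
    steps = (ts - origin) // stride  # timedelta // timedelta floors to an int
    return origin + steps * stride


def sum_by_bin(rows, stride):
    """Sum (timestamp, value) rows per bin, returned in time order."""
    totals = defaultdict(float)
    for ts, value in rows:
        totals[bin_start(ts, stride)] += value
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = [
        (datetime(2024, 1, 15, 10, 3), 1.0),
        (datetime(2024, 1, 15, 10, 14), 2.0),
        (datetime(2024, 1, 15, 10, 16), 4.0),
    ]
    for start, total in sum_by_bin(rows, timedelta(minutes=15)).items():
        print(start, total)
```

Unlike truncation to a calendar unit, the stride here can be any fixed width (for example 15 minutes or 6 hours), which is the main reason to prefer binning over truncation.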

Window Aggregation

Data/DataFusion/Aggregation

Apply window functions for rolling/moving aggregations over time series data.
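A rolling aggregation can be pictured as a frame that slides over rows ordered by time. The sketch below reproduces, in plain Python, the result of a SQL window such as `AVG(value) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`; the function name `rolling_mean` is hypothetical and the input is assumed to be already sorted by time.

```python
def rolling_mean(values, window):
    """Mean over the current row and up to (window - 1) preceding rows."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    for i in range(len(values)):
        # Early rows have fewer preceding rows, so the frame is shorter there.
        frame = values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result


if __name__ == "__main__":
    print(rolling_mean([1, 2, 3, 4], window=2))
```

Note that the first rows are averaged over a partial frame rather than dropped, which matches how a `ROWS BETWEEN n PRECEDING AND CURRENT ROW` frame behaves.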

Mount Athena S3 Results

Data/DataFusion/Databases

Mount Parquet files from an Athena query result location in S3. Supports explicit credentials or environment variables (including Lambda IAM roles).

Register Athena Table

Data/DataFusion/Databases

Register an AWS Athena table in DataFusion. Query S3 data via Athena's catalog. Supports explicit credentials or environment variables (including Lambda IAM roles).

Register BigQuery

Data/DataFusion/Databases

Register a Google BigQuery table or query result into a DataFusion session. Takes a GcpProvider for authentication; pair it with the GCP Provider node.

Register ClickHouse

Data/DataFusion/Databases

Register a ClickHouse table for federated queries. Uses a real database connection for full SQL push-down.

Register DuckDB

Data/DataFusion/Databases

Register a DuckDB database table for federated queries. Uses a real database connection.

Register FlightSQL

Data/DataFusion/Databases

Register a table via the Arrow Flight SQL protocol. High-performance columnar data transfer (can be 10-100x faster than JDBC/ODBC, depending on the workload). Supports Dremio, InfluxDB, DuckDB Flight, ClickHouse Flight, and more.

Register MySQL

Data/DataFusion/Databases

Register a MySQL table for federated queries. Uses a real database connection for full SQL push-down.

Register Oracle

Data/DataFusion/Databases

Register an Oracle database table for federated queries via ODBC. Requires Oracle Instant Client with the ODBC driver installed.

Register PostgreSQL

Data/DataFusion/Databases

Register a PostgreSQL table for federated queries. Uses a real database connection for full SQL push-down.

Register SQLite

Data/DataFusion/Databases

Register a SQLite database table for federated queries. Uses a real database connection.

Delta Table Info

Data/DataFusion/Lakes

Get metadata and history information about a Delta table.

Delta Time Travel

Data/DataFusion/Lakes

Load a specific version or timestamp of a Delta table for point-in-time queries.

Iceberg Table Info

Data/DataFusion/Lakes

Get metadata, snapshots, and history of an Apache Iceberg table from a metadata file.

Iceberg Time Travel

Data/DataFusion/Lakes

Load a specific snapshot of an Iceberg table for point-in-time queries.

Register Delta Table

Data/DataFusion/Lakes

Register a Delta Lake table in DataFusion using a FlowPath. Requires the 'delta' feature.

Register Hive Parquet

Data/DataFusion/Lakes

Register Hive-partitioned Parquet files as a table in DataFusion using a FlowPath.

Register Iceberg Table

Data/DataFusion/Lakes

Register an Apache Iceberg table in DataFusion from a metadata file. Supports schema evolution and partition pruning.

Register Partitioned JSON

Data/DataFusion/Lakes

Register partitioned JSON/NDJSON files as a table in DataFusion using a FlowPath.

Write Delta Table

Data/DataFusion/Lakes

Write query results to a new or existing Delta Lake table using a FlowPath. Supports append and overwrite modes.

DateTime to SQL Timestamp

Data/DataFusion/Time

Convert a DateTime (ISO 8601 string) to a SQL timestamp literal for use in DataFusion queries.
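The conversion could look roughly like the sketch below. The function name `to_sql_timestamp` is hypothetical, and normalizing offset-aware inputs to UTC before formatting is an assumption made here; the node itself may treat time zones differently.

```python
from datetime import datetime, timezone


def to_sql_timestamp(iso_string):
    """Turn an ISO 8601 string into a SQL TIMESTAMP literal (UTC, no offset)."""
    normalized = iso_string.strip()
    # Older Python versions do not parse a trailing "Z", so rewrite it as an offset.
    if normalized.endswith("Z"):
        normalized = normalized[:-1] + "+00:00"
    parsed = datetime.fromisoformat(normalized)
    # Assumption: offset-aware values are converted to UTC and the offset dropped.
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return f"TIMESTAMP '{parsed.isoformat(sep=' ')}'"


if __name__ == "__main__":
    print(to_sql_timestamp("2024-01-15T10:30:00Z"))
    print(to_sql_timestamp("2024-01-15T12:30:00+02:00"))
```

Emitting a typed literal rather than a bare string avoids relying on implicit string-to-timestamp casts in comparisons.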

Time Range Filter

Data/DataFusion/Time

Generate a SQL WHERE clause for filtering by time range. Supports relative time expressions.
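A minimal sketch of how a relative expression might be expanded into a WHERE clause follows. The expression syntax (`24h`, `7d`, `30m`), the function name `time_range_where`, and the half-open interval are all assumptions for illustration; the source does not specify which relative expressions the node accepts.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit suffixes mapped to timedelta keyword arguments.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def time_range_where(column, relative, now):
    """Build a WHERE clause covering the window [now - relative, now)."""
    match = re.fullmatch(r"(\d+)([smhdw])", relative.strip())
    if match is None:
        raise ValueError(f"unsupported relative expression: {relative!r}")
    amount, unit = int(match.group(1)), match.group(2)
    start = now - timedelta(**{UNITS[unit]: amount})
    return (
        f"WHERE {column} >= TIMESTAMP '{start.isoformat(sep=' ')}' "
        f"AND {column} < TIMESTAMP '{now.isoformat(sep=' ')}'"
    )


if __name__ == "__main__":
    print(time_range_where("ts", "24h", now=datetime(2024, 1, 15, 12, 0, 0)))
```

Passing `now` explicitly keeps the helper deterministic, which makes the generated clause easy to test.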

Describe Table

Data/DataFusion/Tools

Get the schema (column names and types) of a table in a DataFusion session.

Execute SQL

Data/DataFusion/Tools

Execute a SQL query and return results as formatted text. Ideal for agent-driven data exploration.

List Tables

Data/DataFusion/Tools

List all tables registered in a DataFusion session. Returns an array of table names.