
AI Extract Documents Node

Extracts the text and content of multiple documents in parallel using a vision-capable AI model.

ai_processing_extract_documents_ai (long running)
Inputs: 8
Outputs: 2
Security exposure: 5/10
Package: processing

Ratings

Scores range from 0 to 10. Higher values mean more impact, exposure, or operational weight.

Area         Score  Level   What it measures
Security     5/10   Medium  Attack surface and exposure impact.
Privacy      5/10   Medium  Potential sensitivity of processed data.
Performance  3/10   High    Runtime or resource pressure.
Governance   5/10   Medium  Policy, audit, or compliance impact.
Reliability  3/10   High    Operational stability considerations.
Cost         7/10   Low     External or compute cost impact.

Input Pins (8)

Input

Execution
exec_in

Execution trigger to start AI-powered batch extraction.

Files

Struct Array
files

Array of document files to extract.

FlowPath (3 fields, schema enforced)
  path: string (required)
  store_ref: string (required)
  cache_store_ref: string | null
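To make the FlowPath shape concrete, here is a minimal sketch of how a caller might model and serialize the files array. It assumes a Python client that sends plain dicts; the class name, the to_payload helper, and the example paths and store references are all hypothetical and not part of the node's documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowPath:
    """Hypothetical Python mirror of the FlowPath schema used by the files input."""
    path: str                               # required
    store_ref: str                          # required
    cache_store_ref: Optional[str] = None   # optional; may be None (null)

    def __post_init__(self) -> None:
        # Dataclasses do not check types, so enforce the documented field types by hand.
        for name in ("path", "store_ref"):
            if not isinstance(getattr(self, name), str):
                raise TypeError(f"FlowPath.{name} must be a string")
        if self.cache_store_ref is not None and not isinstance(self.cache_store_ref, str):
            raise TypeError("FlowPath.cache_store_ref must be a string or None")


def to_payload(files: list[FlowPath]) -> list[dict]:
    """Serialize the files array to plain dicts, keeping None (null) for a missing cache ref."""
    return [
        {"path": f.path, "store_ref": f.store_ref, "cache_store_ref": f.cache_store_ref}
        for f in files
    ]


# Example values are made up for illustration.
files = [
    FlowPath(path="reports/q1.pdf", store_ref="store-1"),
    FlowPath(path="scans/invoice.png", store_ref="store-1", cache_store_ref="cache-1"),
]
print(to_payload(files))
```

Only path and store_ref must be supplied; cache_store_ref defaults to null, matching the schema's string | null type.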

Model

Struct
model

Vision-capable AI model for image analysis and OCR.

Bit (19 fields, schema enforced)
  id: string (default "")
  type: BitTypes, enum "Llm", "Vlm", "Tts", "Stt"... (default "Other")
  meta: Map<string, Metadata> (default {})
    Metadata (map value):
      name: string (required)
      description: string (required)
      long_description: string | null
      release_notes: string | null
      tags: Array<string> (required)
      +11 more fields
  authors: Array<string> (default [])
  repository: string | null (default null)
  download_link: string | null (default null)
  file_name: string | null (default null)
  hash: string (default "")
  size: integer | null (format uint64, min 0, default null)
  hub: string (default "")
  parameters: value (default null)
  version: string | null (default null)
  license: string | null (default null)
  dependencies: Array<string> (default [])
  dependency_tree_hash: string (default "")
  created: string (default "")
  updated: string (default "")
  model_slug: string | null (default null)
  +1 more field

Extract Images

Boolean
extract_images

Whether to extract and embed images from documents.

Default true

Images Per Message

Integer
images_per_message

Number of images to batch per LLM request (higher = faster but may hit token limits).

Default 2
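The trade-off behind images_per_message is easiest to see by counting requests. This sketch assumes the node groups images sequentially into fixed-size chunks, one chunk per request; that grouping strategy and the batch_images helper are assumptions for illustration, not the node's confirmed implementation.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def batch_images(images: Sequence[T], images_per_message: int = 2) -> list[list[T]]:
    """Split images into chunks; each chunk would go into one (hypothetical) LLM request."""
    if images_per_message < 1:
        raise ValueError("images_per_message must be at least 1")
    return [
        list(images[i:i + images_per_message])
        for i in range(0, len(images), images_per_message)
    ]


pages = ["img_0", "img_1", "img_2", "img_3", "img_4"]
for n in (1, 2, 5):
    batches = batch_images(pages, n)
    # Fewer batches means fewer requests, but each request carries more image tokens.
    print(n, len(batches), batches)
```

With five images, a value of 1 needs five requests while a value of 5 needs one, but that single request must fit all five images in the model's context window.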

Pages Per Batch

Integer
pages_per_batch

Number of PDF pages to process in parallel (higher = faster but uses more memory).

Default 2
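A rough sketch of how pages_per_batch could bound concurrency: pages are taken in groups of that size, and each group runs concurrently before the next one starts. The describe_page coroutine is a hypothetical stand-in for a model call, and the batch-then-wait scheduling is an assumption; the node may schedule work differently (for example, with a sliding window).

```python
import asyncio

in_flight = 0
peak_in_flight = 0


async def describe_page(page_number: int) -> str:
    """Hypothetical stand-in for one model call that extracts a single page."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulate network / inference latency
    in_flight -= 1
    return f"page {page_number} text"


async def extract_pages(pages: list[int], pages_per_batch: int = 2) -> list[str]:
    """Process pages in batches; pages within one batch run concurrently."""
    if pages_per_batch < 1:
        raise ValueError("pages_per_batch must be at least 1")
    results: list[str] = []
    for start in range(0, len(pages), pages_per_batch):
        batch = pages[start:start + pages_per_batch]
        # At most pages_per_batch pages are in flight at once, which bounds memory use.
        # asyncio.gather returns results in the order the coroutines were passed.
        results.extend(await asyncio.gather(*(describe_page(p) for p in batch)))
    return results


print(asyncio.run(extract_pages([1, 2, 3, 4, 5], pages_per_batch=2)))
print("peak concurrent pages:", peak_in_flight)
```

Raising pages_per_batch shortens wall-clock time roughly in proportion to the batch size, at the cost of holding that many rendered pages in memory at once.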

Temperature

Float
temperature

LLM temperature (0.0 = deterministic, 1.0 = creative). Lower is better for extraction.

Default 0.1
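Why a low temperature suits extraction: in standard LLM sampling, logits are divided by the temperature before the softmax, so small values concentrate probability on the most likely token. The sketch below shows that general mechanism only; it makes no claim about how the selected model's backend applies temperature, and treating 0 as greedy decoding is a common convention rather than something this page specifies.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to token probabilities, sharpened or flattened by temperature."""
    if temperature <= 0.0:
        # Temperature 0 is conventionally treated as greedy decoding: all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.1, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the default of 0.1 almost all probability lands on the top token, which keeps extracted text close to what the model considers most likely.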

Max Tokens

Integer
max_tokens

Maximum output tokens per LLM call. Set to 0 to use the model's default limit. Use a lower value for unreliable models.

Default 4096
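The tunable inputs above can be collected into one settings object. This is a minimal sketch assuming a Python caller; the ExtractDocumentsConfig class and effective_max_tokens method are hypothetical names. The defaults come from this page, and the rule that max_tokens of 0 means "use the model default" is mapped to None here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractDocumentsConfig:
    """Hypothetical mirror of the node's tunable inputs, using the documented defaults."""
    extract_images: bool = True
    images_per_message: int = 2
    pages_per_batch: int = 2
    temperature: float = 0.1
    max_tokens: int = 4096

    def __post_init__(self) -> None:
        if self.images_per_message < 1 or self.pages_per_batch < 1:
            raise ValueError("batch sizes must be at least 1")
        if self.temperature < 0.0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens < 0:
            raise ValueError("max_tokens must be non-negative")

    def effective_max_tokens(self) -> Optional[int]:
        # 0 means "use the model's default", represented here as None (no explicit cap).
        return None if self.max_tokens == 0 else self.max_tokens


print(ExtractDocumentsConfig().effective_max_tokens())
print(ExtractDocumentsConfig(max_tokens=0).effective_max_tokens())
```

Validating in __post_init__ rejects obviously invalid settings, such as a batch size of 0, before any model call would be made.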

Output Pins (2)

Output

Execution
exec_out

Execution output after all extractions complete.

Results

Struct Array
results

Array of extracted document pages with AI descriptions for each file.

DocumentPage (3 fields, schema enforced)
  page_number: integer (uint32, min 0, required)
  content: string (required)
  images: Array<NodeImage> (required)
    NodeImage (array item):
      image_ref: string (required)
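A sketch of consuming the results, assuming the pages of a single document are available as a list. This page does not say how pages from different input files are grouped in results, so that grouping is left out; the dataclasses and helper functions are hypothetical, and NodeImage models only the documented image_ref field.

```python
from dataclasses import dataclass

UINT32_MAX = 2**32 - 1


@dataclass
class NodeImage:
    """Hypothetical mirror of NodeImage; only the documented image_ref field is modeled."""
    image_ref: str


@dataclass
class DocumentPage:
    """Hypothetical Python mirror of the DocumentPage output schema."""
    page_number: int          # uint32 in the schema
    content: str
    images: list[NodeImage]   # required; an empty list is assumed valid here

    def __post_init__(self) -> None:
        if not 0 <= self.page_number <= UINT32_MAX:
            raise ValueError("page_number must fit in a uint32")


def join_pages(pages: list[DocumentPage]) -> str:
    """Concatenate page contents in page_number order, separated by blank lines."""
    ordered = sorted(pages, key=lambda p: p.page_number)
    return "\n\n".join(p.content for p in ordered)


def collect_image_refs(pages: list[DocumentPage]) -> list[str]:
    """List every image_ref in page_number order."""
    ordered = sorted(pages, key=lambda p: p.page_number)
    return [img.image_ref for p in ordered for img in p.images]


# Made-up pages from a single document, deliberately out of order.
pages = [
    DocumentPage(page_number=2, content="Second page text", images=[]),
    DocumentPage(page_number=1, content="First page text", images=[NodeImage("img-1")]),
]
print(join_pages(pages))
print(collect_image_refs(pages))
```

Sorting by page_number rather than trusting list order keeps the joined text correct even if parallel processing returns pages out of sequence.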

Node Info

Internal name: ai_processing_extract_documents_ai
Category: AI/Processing
Version: 2