
Desktop Automation & RPA

Flow-Like brings RPA (Robotic Process Automation) capabilities to your desktop, allowing you to automate interactions with any application—even legacy systems without APIs.

| Feature | Description | Status |
|---|---|---|
| Screen Capture | Take screenshots of full screen or regions | ✅ Available |
| AI OCR | Extract text from images using vision AI | ✅ Available |
| Barcode/QR Reading | Decode QR codes and barcodes | ✅ Available |
| Keyboard Automation | Type text and press key combinations | 🔜 Coming |
| Mouse Automation | Click, drag, scroll at positions | 🔜 Coming |
| Window Management | Focus windows, launch apps | 🔜 Coming |
| Visual Element Finding | Locate UI elements by image template | 🔜 Coming |
| Workflow Recording | Record actions to generate automation | 🔜 Coming |

Extract text from any screen capture or document using vision-capable AI models:

Screenshot / Image
AI Extract Document (Vision Model)
Extracted Text (Markdown format)

Supported formats:

  • Screenshots (PNG, JPG)
  • PDFs (rendered to images)
  • Scanned documents
  • Photos of documents

Example: Read text from screen region

Capture Screen Region (x, y, width, height)
AI Extract Document
├── model: GPT-4 Vision / Claude Vision
└── prompt: "Extract all visible text"
Extracted Text ──▶ Process / Store
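If the region coordinates are computed (for example, from a window's position), the rectangle can extend past the screen edge. A minimal sketch of clipping it to the screen before capturing, assuming the screen size is known; `Region` and `clamp_region` are hypothetical helpers, not Flow-Like nodes.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical capture rectangle: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def clamp_region(region: Region, screen_width: int, screen_height: int) -> Region:
    """Clip `region` to the visible screen; raise if nothing is left (hypothetical helper)."""
    left = max(0, region.x)
    top = max(0, region.y)
    right = min(screen_width, region.x + region.width)
    bottom = min(screen_height, region.y + region.height)
    if right <= left or bottom <= top:
        raise ValueError(f"{region} does not overlap a {screen_width}x{screen_height} screen")
    return Region(left, top, right - left, bottom - top)


# A 300x200 region near the bottom-right corner of a 1920x1080 screen
print(clamp_region(Region(1800, 1000, 300, 200), 1920, 1080))
# prints Region(x=1800, y=1000, width=120, height=80)
```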

Decode barcodes and QR codes from images:

Read Barcodes (image)
Array of detected codes:
├── type: QR_CODE
├── data: "https://example.com/product/123"
└── position: { x, y, width, height }

Supported formats:

  • QR Code
  • PDF417
  • Code 128
  • Code 39
  • EAN-13/8
  • UPC-A/E
  • DataMatrix
  • Aztec
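Several of these symbologies carry a check digit. For EAN-13, the standard GS1 rule weights the first 12 digits alternately by 1 and 3 (starting from the left) and picks the check digit that brings the total to a multiple of 10. The function below is a standalone sketch of that rule, useful as a sanity check on EAN-13 values before storing them; it is not part of the Read Barcodes node.

```python
def ean13_is_valid(code: str) -> bool:
    """Check the length, characters and check digit of an EAN-13 string."""
    # isascii() rules out non-ASCII Unicode digits that isdigit() would accept
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights 1, 3, 1, 3, ... over the first 12 digits, left to right
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


print(ean13_is_valid("4006381333931"))  # True
print(ean13_is_valid("4006381333932"))  # False (wrong check digit)
```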

Example: Process shipping labels

For Each image in shipping_labels
Read Barcodes (image)
Extract tracking number ──▶ Add to database
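The "Extract tracking number" step depends on your carrier's format. A minimal sketch, assuming each detection is a dict with the `type` and `data` fields shown above, that Code 128 labels report their type as `"CODE_128"`, and that a tracking number is 10 to 22 uppercase letters or digits. All three are hypothetical; replace them with your carrier's real rules.

```python
import re
from typing import Optional

# Hypothetical pattern: 10-22 uppercase letters or digits
TRACKING_RE = re.compile(r"[A-Z0-9]{10,22}")


def extract_tracking_number(detections: list[dict]) -> Optional[str]:
    """Return the first Code 128 payload that looks like a tracking number (hypothetical helper)."""
    for det in detections:
        # "CODE_128" is an assumed type label, by analogy with "QR_CODE"
        if det.get("type") == "CODE_128" and TRACKING_RE.fullmatch(det.get("data", "")):
            return det["data"]
    return None


label = [
    {"type": "QR_CODE", "data": "https://example.com/product/123"},
    {"type": "CODE_128", "data": "ABC1234567890"},
]
print(extract_tracking_number(label))  # ABC1234567890
```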

Create QR codes programmatically:

Write QR Code
├── data: "https://myapp.com/order/12345"
├── size: 256
└── format: PNG
QR Code Image ──▶ Save / Display / Email

Capture frames from network cameras for monitoring:

Grab Camera Frame (mjpeg_url)
Image ──▶ AI Analysis / Store / Alert

Use cases:

  • Inventory monitoring
  • Security alerts
  • Production line inspection
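An MJPEG camera stream over HTTP is a sequence of JPEG images, typically served as `multipart/x-mixed-replace`; each image starts with the bytes `FF D8` and ends with `FF D9`. The sketch below grabs one frame using only the standard library, which is roughly what a frame-grab step has to do. The URL is whatever your camera exposes, and the marker scan is a simplification: a JPEG with an embedded EXIF thumbnail contains an inner `FF D9` that would end the frame early.

```python
import urllib.request
from typing import Optional

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def first_jpeg_frame(buffer: bytes) -> Optional[bytes]:
    """Return the first complete JPEG in `buffer`, or None if none is complete yet."""
    start = buffer.find(SOI)
    if start == -1:
        return None
    end = buffer.find(EOI, start + len(SOI))
    if end == -1:
        return None  # frame not fully received yet
    return buffer[start:end + len(EOI)]


def grab_frame(mjpeg_url: str, timeout_s: float = 10.0, max_bytes: int = 5_000_000) -> bytes:
    """Read from the stream until one full frame has arrived (hypothetical helper)."""
    buffer = b""
    with urllib.request.urlopen(mjpeg_url, timeout=timeout_s) as response:
        while len(buffer) < max_bytes:
            chunk = response.read(4096)
            if not chunk:
                break  # stream closed
            buffer += chunk
            frame = first_jpeg_frame(buffer)
            if frame is not None:
                return frame
    raise RuntimeError("no complete JPEG frame received")


# frame = grab_frame("http://192.0.2.10/video.mjpg")  # hypothetical camera URL
```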

Control mouse movements and clicks:

| Node | Description |
|---|---|
| Click At Position | Click at specific x,y coordinates |
| Click Template | Click on visually matched element |
| Double Click | Double-click at position |
| Mouse Drag | Drag from point A to point B |
| Scroll | Scroll up/down at position |

Example: Automate form filling

Click At Position (100, 200) ──▶ Focus name field
Type Text ("John Doe")
Click At Position (100, 250) ──▶ Focus email field
Type Text ("john@example.com")
Click Template (submit_button.png) ──▶ Click submit
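One way to keep a click-and-type sequence like this testable is to describe it as data and hand it to an input backend, so the same steps can be dry-run before they touch a real screen. A sketch under that assumption; `Click`, `TypeText`, `InputBackend` and `DryRunBackend` are hypothetical names, and a real backend would call an OS input library instead of logging.

```python
from dataclasses import dataclass
from typing import Protocol, Union


class InputBackend(Protocol):
    """Hypothetical interface a real mouse/keyboard driver would implement."""
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


Action = Union[Click, TypeText]


def run_actions(actions: list[Action], backend: InputBackend) -> None:
    """Replay each action, in order, on the given backend."""
    for action in actions:
        if isinstance(action, Click):
            backend.click(action.x, action.y)
        elif isinstance(action, TypeText):
            backend.type_text(action.text)
        else:
            raise TypeError(f"unsupported action: {action!r}")


class DryRunBackend:
    """Records what would happen instead of moving the mouse or typing."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


form = [Click(100, 200), TypeText("John Doe"), Click(100, 250), TypeText("john@example.com")]
backend = DryRunBackend()
run_actions(form, backend)
print(backend.log)
```

Swapping `DryRunBackend` for a real one changes nothing else in the flow, which is the point of separating the steps from the driver.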

Simulate keyboard input:

| Node | Description |
|---|---|
| Type Text | Type a string of text |
| Key Press | Press a single key with modifiers |
| Key Combination | Press shortcuts (Ctrl+C, Alt+Tab) |
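Key Combination takes shortcuts written like `Ctrl+C` or `Alt+Tab`. How Flow-Like parses that string isn't documented here; a minimal sketch of splitting such a string into modifiers and a final key, assuming `+` as the separator and a hypothetical set of modifier names, looks like this.

```python
from typing import FrozenSet, Tuple

# Hypothetical modifier vocabulary; extend as needed
MODIFIERS = {"ctrl", "alt", "shift", "cmd", "meta", "super"}


def parse_combo(combo: str) -> Tuple[FrozenSet[str], str]:
    """Split e.g. "Ctrl+Shift+Esc" into ({"ctrl", "shift"}, "esc") (hypothetical helper)."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if any(not part for part in parts):
        # Also rejects the "+" key itself, which this simple format cannot express
        raise ValueError(f"malformed key combination: {combo!r}")
    *modifiers, key = parts
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) {unknown} in {combo!r}")
    return frozenset(modifiers), key


print(parse_combo("Ctrl+C"))  # (frozenset({'ctrl'}), 'c')
```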

Example: Copy data from legacy app

Focus Window ("Legacy CRM")
Key Press (Ctrl+A) ──▶ Select all
Key Press (Ctrl+C) ──▶ Copy
Get Clipboard ──▶ Process copied data

Control application windows:

| Node | Description |
|---|---|
| Focus Window | Bring window to front by title |
| Launch App | Start an application |
| Minimize/Maximize | Control window state |
| Close Window | Close application window |
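Focus Window selects a window by title, but the exact matching rule isn't specified here. One reasonable policy, sketched below with a hypothetical `pick_window` helper, prefers a case-insensitive exact match and otherwise the shortest title that contains the search text. Listing open windows is OS-specific and left out, so the function takes the titles as input.

```python
from typing import Optional


def pick_window(titles: list[str], wanted: str) -> Optional[str]:
    """Choose the open window whose title best matches `wanted` (hypothetical helper)."""
    needle = wanted.casefold()
    exact = [t for t in titles if t.casefold() == needle]
    if exact:
        return exact[0]
    # Shortest containing title, e.g. "Legacy CRM - Customers" over "... - Customers - Edit"
    partial = [t for t in titles if needle in t.casefold()]
    return min(partial, key=len) if partial else None


open_windows = ["Inbox - Mail", "Legacy CRM - Customers - Edit", "Legacy CRM - Customers"]
print(pick_window(open_windows, "legacy crm"))  # Legacy CRM - Customers
```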

Locate UI elements using template matching:

Find Template (button_image.png)
Position: { x: 450, y: 320, confidence: 0.95 }
Click At Position (450, 320)
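The matcher behind Find Template isn't described here. A common technique that yields a confidence score like the one shown is zero-mean normalized cross-correlation (ZNCC), which OpenCV exposes as `matchTemplate` with `TM_CCOEFF_NORMED`. The pure-Python sketch below slides a grayscale template over a grayscale image and returns the center of the best match; it is far too slow for real screenshots and only illustrates the idea.

```python
import math
from typing import List, Optional, Tuple

Gray = List[List[float]]  # rows of grayscale pixel values


def zncc(image: Gray, template: Gray, ox: int, oy: int) -> float:
    """Zero-mean normalized cross-correlation of `template` placed at offset (ox, oy)."""
    h, w = len(template), len(template[0])
    patch = [image[oy + r][ox + c] for r in range(h) for c in range(w)]
    tmpl = [v for row in template for v in row]
    mp, mt = sum(patch) / len(patch), sum(tmpl) / len(tmpl)
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, tmpl))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) * sum((t - mt) ** 2 for t in tmpl))
    return num / den if den else 0.0  # flat patch or template carries no information


def find_template(image: Gray, template: Gray, threshold: float = 0.8) -> Optional[Tuple[int, int, float]]:
    """Return (center_x, center_y, confidence) of the best match, or None below `threshold`."""
    h, w = len(template), len(template[0])
    best: Optional[Tuple[int, int, float]] = None
    for oy in range(len(image) - h + 1):
        for ox in range(len(image[0]) - w + 1):
            score = zncc(image, template, ox, oy)
            if best is None or score > best[2]:
                best = (ox + w // 2, oy + h // 2, score)
    return best if best is not None and best[2] >= threshold else None


screen = [[0] * 5 for _ in range(5)]
screen[1][3] = screen[2][2] = 9  # a small diagonal "button" at offset (2, 1)
print(find_template(screen, [[0, 9], [9, 0]]))  # (3, 2, 1.0)
```

ZNCC scores range from -1 to 1 and are insensitive to uniform brightness changes, which is why a threshold such as 0.8 is meaningful across slightly different screenshots.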

With fallback:

Click Template
├── template: submit_button.png
├── fallback_position: (500, 400)
├── timeout: 5000ms
└── confidence_threshold: 0.8
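These options suggest a polling loop: keep searching until the template matches with enough confidence or the timeout expires, then fall back to a fixed position. A sketch under that reading, since the real node's semantics may differ; `find` is a hypothetical callable that performs one template search and returns `(x, y, confidence)` or `None`.

```python
import time
from typing import Callable, Optional, Tuple

Match = Tuple[int, int, float]  # (x, y, confidence)
Point = Tuple[int, int]


def locate_with_fallback(
    find: Callable[[], Optional[Match]],
    fallback_position: Optional[Point] = None,
    timeout_s: float = 5.0,
    confidence_threshold: float = 0.8,
    poll_interval_s: float = 0.25,
) -> Point:
    """Poll `find` until a confident match appears; otherwise use the fallback (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while True:
        match = find()
        if match is not None and match[2] >= confidence_threshold:
            return match[0], match[1]
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    if fallback_position is not None:
        return fallback_position
    raise TimeoutError(f"template not found within {timeout_s} s")
```

For example, `locate_with_fallback(lambda: (450, 320, 0.95), (500, 400))` returns `(450, 320)`, while a search that never clears the threshold returns `(500, 400)` after the timeout.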

Record your actions to generate automation:

  1. Start Recording – Begin capture mode
  2. Perform Actions – Click, type, navigate normally
  3. Stop Recording – End capture
  4. Review Generated Flow – Edit the automation
  5. Run – Execute the recorded workflow

The recorder captures:

  • Mouse clicks with screenshots
  • Keyboard input
  • Window focus changes
  • Timing between actions
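Because the recorder keeps the timing between actions, a replay can reproduce those pauses, optionally sped up and with very long gaps (such as the user stepping away) capped. A sketch with a hypothetical `RecordedEvent` structure; Flow-Like's actual recording format isn't described here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecordedEvent:
    """Hypothetical recorded action."""
    timestamp: float  # seconds since recording started
    kind: str         # e.g. "click", "key", "focus"
    detail: str


def replay_schedule(
    events: List[RecordedEvent], speed: float = 1.0, max_pause_s: float = 5.0
) -> List[Tuple[float, RecordedEvent]]:
    """Pair each event with the pause to wait before replaying it."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    ordered = sorted(events, key=lambda e: e.timestamp)
    schedule = []
    previous = ordered[0].timestamp if ordered else 0.0
    for event in ordered:
        pause = min((event.timestamp - previous) / speed, max_pause_s)
        schedule.append((pause, event))
        previous = event.timestamp
    return schedule


events = [
    RecordedEvent(0.0, "focus", "Legacy CRM"),
    RecordedEvent(1.5, "click", "(100, 200)"),
    RecordedEvent(9.5, "key", "Ctrl+C"),
]
print([pause for pause, _ in replay_schedule(events)])  # [0.0, 1.5, 5.0]
```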

Desktop automation requires system permissions, which differ by platform.

macOS:

| Permission | Purpose | How to Enable |
|---|---|---|
| Accessibility | Mouse/keyboard control | System Preferences → Security & Privacy → Accessibility |
| Screen Recording | Take screenshots | System Preferences → Security & Privacy → Screen Recording |

On macOS 13 and later, these settings are under System Settings → Privacy & Security. Flow-Like requests these permissions automatically when needed.

Windows:

  • No special permissions typically required
  • Run as Administrator for some applications

Linux:

  • X11 input extension required for keyboard/mouse control
  • Screenshot permissions may vary by desktop environment

Automate data entry into systems without APIs:

For Each record in new_records
Focus Window ("Legacy ERP")
Click Template (new_record_button.png)
├──▶ Type in Field 1 (record.name)
├──▶ Type in Field 2 (record.value)
└──▶ Click Submit

Pull data from desktop applications:

Focus Window ("Financial Software")
Navigate to Reports
Screenshot Report Area
AI Extract (table data)
Parse and Store in Database
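AI extraction returns Markdown, so a table pulled from a report screenshot will often come back as a pipe table. A minimal sketch of the "Parse" step, assuming a single GitHub-style table with leading and trailing pipes and no escaped `|` inside cells; `parse_markdown_table` is a hypothetical helper, and every value stays a string until you convert it.

```python
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Convert the single pipe table in `markdown` into a list of row dicts (hypothetical helper)."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> List[str]:
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    # lines[1] is the |---|---| separator row
    return [dict(zip(header, cells(line))) for line in lines[2:]]


reply = """Sales by region:

| Region | Revenue |
|--------|---------|
| North  | 1200    |
| South  | 950     |
"""
print(parse_markdown_table(reply))
# [{'Region': 'North', 'Revenue': '1200'}, {'Region': 'South', 'Revenue': '950'}]
```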

Test desktop applications:

Launch App ("MyApp.exe")
Wait for Window ("Main Window")
Click "Login" button
├──▶ Type username
├──▶ Type password
└──▶ Click Submit
Verify: Window title contains "Dashboard"

Combine RPA with document processing:

Watch Folder (/incoming)
For Each new file
├── PDF? ──▶ AI Extract Document ──▶ Process
├── Image? ──▶ AI OCR ──▶ Process
└── Scanned? ──▶ AI Vision Extract ──▶ Process
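The PDF/Image branch can be decided from a file's first bytes rather than its extension. A sketch with hypothetical helper names that checks the standard PDF, PNG, JPEG and TIFF signatures. Telling a scanned PDF from a born-digital one needs more than a signature check (for example, whether the PDF has a text layer) and is not attempted here.

```python
from pathlib import Path

# File signatures ("magic numbers") of the formats routed above
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "image",  # PNG
    b"\xff\xd8\xff": "image",       # JPEG
    b"II*\x00": "image",            # TIFF, little-endian (common for scans)
    b"MM\x00*": "image",            # TIFF, big-endian
}


def classify_bytes(head: bytes) -> str:
    """Return "pdf", "image" or "unknown" from the first bytes of a file (hypothetical helper)."""
    for signature, kind in SIGNATURES.items():
        if head.startswith(signature):
            return kind
    return "unknown"


def classify_file(path: Path) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with path.open("rb") as f:
        return classify_bytes(f.read(8))
```

A watcher would call `classify_file` on each new file and send `"pdf"` to document extraction, `"image"` to OCR, and `"unknown"` to a log for manual review.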

Best practices

Template images:

  • Capture unique UI elements
  • Avoid templates with dynamic content
  • Include enough context for reliable matching
  • Test at different screen resolutions

Waiting for elements that load slowly:

Retry (3 times, 1s delay)
└── Find Template (loading_complete.png)
Continue with automation
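The Retry step amounts to "call, and on failure wait and try again, up to N times". A minimal sketch, assuming the wrapped action raises an exception when the template is not found; `retry` is a hypothetical helper, and in real flows you would catch only that specific error rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay_s: float = 1.0) -> T:
    """Call `action` up to `attempts` times, sleeping `delay_s` between failures (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_s)
    raise AssertionError("unreachable")
```

The defaults mirror the diagram (3 attempts, 1 s apart). Re-raising the last error, instead of returning `None`, lets an outer error handler take a screenshot and alert someone.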

Handling errors:

Try
└── Click Template (button.png)
Catch: Template not found
└── Take Screenshot ──▶ Log error ──▶ Alert user

Running unattended:

  • Test thoroughly in attended mode first
  • Add checkpoints and logging
  • Implement timeout limits
  • Have recovery procedures

Rate limiting:

  • Add delays between actions (200–500 ms)
  • Don’t overwhelm target applications
  • Simulate human-like interaction speeds
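A randomized pause in the 200 to 500 ms range is one simple way to pace actions. A sketch; `human_pause` is a hypothetical helper, and its `rng` and `sleep` parameters exist only so the behavior can be tested without actually waiting.

```python
import random
import time
from typing import Callable, Optional


def human_pause(
    min_s: float = 0.2,
    max_s: float = 0.5,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for a random duration between min_s and max_s and return it (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay
```

Calling `human_pause()` between each click and keystroke keeps the pace irregular, which is closer to how a person interacts than a fixed interval.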

RPA becomes powerful when combined with AI:

Example: Extract structured data from a screenshot

Screenshot Application
AI Vision: "Find all customer records in this screenshot"
Structured Data: [{name, email, status}, ...]
Store in Database

Example: Choose the next step from the application state

Screenshot Current State
AI Analysis: "What is the application state?"
Branch based on AI response:
├── "Login screen" ──▶ Perform login
├── "Dashboard" ──▶ Navigate to reports
└── "Error dialog" ──▶ Handle error
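Model answers rarely match a label exactly, so the branch step needs some normalization. A minimal sketch, assuming the prompt asks the model to name one of the states and that a case-insensitive substring match is good enough; `route_state` and the handler lambdas are hypothetical stand-ins for the flows in the diagram.

```python
from typing import Callable, Dict


def route_state(ai_answer: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run the handler whose state label appears in the model's answer (hypothetical helper)."""
    text = ai_answer.casefold()
    for label, handler in handlers.items():  # checked in insertion order
        if label.casefold() in text:
            return handler()
    raise ValueError(f"unrecognized application state: {ai_answer!r}")


handlers = {
    "login screen": lambda: "perform login",
    "dashboard": lambda: "navigate to reports",
    "error dialog": lambda: "handle error",
}
print(route_state("The app is showing the Dashboard.", handlers))  # navigate to reports
```

Raising on an unrecognized answer, rather than guessing, gives the flow a clear point to capture a screenshot and alert someone.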

Example: Turn a chat request into actions

Chat Event: "Download the latest sales report"
AI Plans actions:
├── Open reporting application
├── Navigate to sales reports
├── Select latest report
└── Click download
Execute each action

Frequently asked questions

Can I automate any application?

Yes, if it has a visible UI, you can automate it. Some applications with custom rendering may be harder to work with.

Can automations run in the background?

Not yet. The target window currently needs to be visible; background automation is planned.

How reliable is desktop automation?

Reliable when set up carefully: target unique, stable UI elements and set appropriate confidence thresholds for template matching.

Can I run multiple automations simultaneously?


One automation can run at a time on a single machine. For parallel execution, use multiple machines.

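Does my data stay on my machine?

Yes. Automations run locally on your machine, and no screenshots or data are sent anywhere unless you explicitly configure it (for example, by choosing a cloud-hosted vision model for AI extraction).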