Desktop Automation & RPA
Flow-Like brings RPA (Robotic Process Automation) capabilities to your desktop, allowing you to automate interactions with any application—even legacy systems without APIs.
Capabilities Overview
| Feature | Description | Status |
|---|---|---|
| Screen Capture | Take screenshots of full screen or regions | ✅ Available |
| AI OCR | Extract text from images using vision AI | ✅ Available |
| Barcode/QR Reading | Decode QR codes and barcodes | ✅ Available |
| Keyboard Automation | Type text and press key combinations | 🔜 Coming |
| Mouse Automation | Click, drag, scroll at positions | 🔜 Coming |
| Window Management | Focus windows, launch apps | 🔜 Coming |
| Visual Element Finding | Locate UI elements by image template | 🔜 Coming |
| Workflow Recording | Record actions to generate automation | 🔜 Coming |
Current Capabilities
AI-Powered OCR
Extract text from any screen capture or document using vision-capable AI models:

    Screenshot / Image
            │
            ▼
    AI Extract Document (Vision Model)
            │
            ▼
    Extracted Text (Markdown format)

Supported formats:
- Screenshots (PNG, JPG)
- PDFs (rendered to images)
- Scanned documents
- Photos of documents
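Before an image reaches a vision model it usually has to be identified and encoded. The following is a minimal sketch of that preparation step, assuming the image arrives as raw bytes. The function names and the request dictionary are hypothetical illustrations and are not Flow-Like's API; the actual payload shape depends on the model provider.

```python
import base64

# Magic-byte prefixes for the two screenshot formats listed above.
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(data: bytes) -> str:
    """Identify PNG or JPEG data from its leading bytes."""
    for signature, mime in _SIGNATURES.items():
        if data.startswith(signature):
            return mime
    raise ValueError("unsupported image format (expected PNG or JPEG)")


def to_data_url(data: bytes) -> str:
    """Encode image bytes as a base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{sniff_mime(data)};base64,{encoded}"


def build_ocr_request(image: bytes, prompt: str = "Extract all visible text") -> dict:
    """Bundle prompt and image. The dictionary layout is an assumption."""
    return {"prompt": prompt, "image_url": to_data_url(image)}
```

Rejecting unknown formats early keeps a bad capture from turning into a confusing model error later in the flow.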
Example: Read text from screen region

    Capture Screen Region (x, y, width, height)
            │
            ▼
    AI Extract Document
    ├── model: GPT-4 Vision / Claude Vision
    └── prompt: "Extract all visible text"
            │
            ▼
    Extracted Text ──▶ Process / Store

Barcode & QR Code Reading
Decode barcodes and QR codes from images:

    Read Barcodes (image)
            │
            ▼
    Array of detected codes:
    ├── type: QR_CODE
    ├── data: "https://example.com/product/123"
    └── position: { x, y, width, height }

Supported formats:
- QR Code
- PDF417
- Code 128
- Code 39
- EAN-13/8
- UPC-A/E
- DataMatrix
- Aztec
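Decoded values are often worth validating before they are stored. The sketch below models a detected code with the three fields shown above (type, data, position) and checks the EAN-13 check digit, which uses alternating weights of 1 and 3 over the first twelve digits. The `DetectedCode` class and the `"EAN_13"` type name are assumptions for illustration; the node's actual type identifiers may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedCode:
    type: str                            # e.g. "QR_CODE" or "EAN_13" (assumed names)
    data: str                            # decoded payload
    position: tuple[int, int, int, int]  # x, y, width, height


def ean13_is_valid(digits: str) -> bool:
    """Return True if the 13th digit matches the EAN-13 checksum."""
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def valid_ean13_codes(codes: list[DetectedCode]) -> list[str]:
    """Keep only EAN-13 payloads whose check digit is correct."""
    return [c.data for c in codes if c.type == "EAN_13" and ean13_is_valid(c.data)]
```

A failed check digit usually indicates a misread, so such codes are better re-scanned than written to a database.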
Example: Process shipping labels

    For Each image in shipping_labels
            │
            ▼
    Read Barcodes (image)
            │
            ▼
    Extract tracking number ──▶ Add to database

QR Code Generation
Create QR codes programmatically:

    Write QR Code
    ├── data: "https://myapp.com/order/12345"
    ├── size: 256
    └── format: PNG
            │
            ▼
    QR Code Image ──▶ Save / Display / Email

IP Camera Integration
Capture frames from network cameras for monitoring:

    Grab Camera Frame (mjpeg_url)
            │
            ▼
    Image ──▶ AI Analysis / Store / Alert

Use cases:
- Inventory monitoring
- Security alerts
- Production line inspection
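An MJPEG stream is a sequence of JPEG images, so grabbing a frame amounts to cutting one complete JPEG out of the incoming bytes. The sketch below shows a simple way to do that by scanning for the JPEG start and end markers. It illustrates the idea only and is not how Flow-Like implements the node. A marker scan can be fooled by embedded thumbnails, so production code would normally rely on the multipart headers instead.

```python
from typing import Optional

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_frame(buffer: bytes) -> tuple[Optional[bytes], bytes]:
    """Return (frame, remaining_bytes).

    frame is None when the buffer does not yet hold a complete JPEG.
    """
    start = buffer.find(SOI)
    if start == -1:
        # Keep a trailing 0xff: it may be the first half of a split marker.
        return None, buffer[-1:] if buffer.endswith(b"\xff") else b""
    end = buffer.find(EOI, start + len(SOI))
    if end == -1:
        # Frame has started but not finished; keep it for the next read.
        return None, buffer[start:]
    end += len(EOI)
    return buffer[start:end], buffer[end:]
```

In use, bytes read from the camera are appended to the remainder returned by the previous call until a frame comes back.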
Planned RPA Capabilities
Mouse Automation
Control mouse movements and clicks:
| Node | Description |
|---|---|
| Click At Position | Click at specific x,y coordinates |
| Click Template | Click on visually matched element |
| Double Click | Double-click at position |
| Mouse Drag | Drag from point A to point B |
| Scroll | Scroll up/down at position |
Example: Automate form filling

    Click At Position (100, 200) ──▶ Focus name field
            │
            ▼
    Type Text ("John Doe")
            │
            ▼
    Click At Position (100, 250) ──▶ Focus email field
            │
            ▼
    Type Text ("[email protected]")
            │
            ▼
    Click Template (submit_button.png) ──▶ Click submit

Keyboard Automation
Simulate keyboard input:
| Node | Description |
|---|---|
| Type Text | Type a string of text |
| Key Press | Press a single key with modifiers |
| Key Combination | Press shortcuts (Ctrl+C, Alt+Tab) |
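Shortcut strings such as `Ctrl+C` or `Alt+Tab` need to be split into modifiers and a main key before they can be replayed. Since these nodes are still planned, the sketch below is purely illustrative: the `KeyCombo` type, the modifier names, and the parsing rules are all assumptions rather than Flow-Like behaviour.

```python
from dataclasses import dataclass

# Modifier names accepted by this sketch (an assumption, not a Flow-Like list).
MODIFIERS = {"ctrl", "alt", "shift", "meta"}


@dataclass(frozen=True)
class KeyCombo:
    modifiers: frozenset
    key: str


def parse_combo(text: str) -> KeyCombo:
    """Parse a shortcut like 'Ctrl+Shift+S' into modifiers and a final key."""
    parts = [p.strip().lower() for p in text.split("+")]
    if any(not p for p in parts):
        raise ValueError(f"malformed key combination: {text!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {unknown}")
    if key in MODIFIERS:
        raise ValueError("a combination must end with a non-modifier key")
    return KeyCombo(frozenset(mods), key)
```

Normalising case up front means `ctrl+c` and `Ctrl+C` compare equal, which helps when shortcuts come from recorded input.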
Example: Copy data from legacy app

    Focus Window ("Legacy CRM")
            │
            ▼
    Key Press (Ctrl+A) ──▶ Select all
            │
            ▼
    Key Press (Ctrl+C) ──▶ Copy
            │
            ▼
    Get Clipboard ──▶ Process copied data

Window Management
Control application windows:
| Node | Description |
|---|---|
| Focus Window | Bring window to front by title |
| Launch App | Start an application |
| Minimize/Maximize | Control window state |
| Close Window | Close application window |
Visual Element Finding
Locate UI elements using template matching:

    Find Template (button_image.png)
            │
            ▼
    Position: { x: 450, y: 320, confidence: 0.95 }
            │
            ▼
    Click At Position (450, 320)

With fallback:

    Click Template
    ├── template: submit_button.png
    ├── fallback_position: (500, 400)
    ├── timeout: 5000ms
    └── confidence_threshold: 0.8

Workflow Recording
Record your actions to generate automation:
- Start Recording – Begin capture mode
- Perform Actions – Click, type, navigate normally
- Stop Recording – End capture
- Review Generated Flow – Edit the automation
- Run – Execute the recorded workflow
The recorder captures:
- Mouse clicks with screenshots
- Keyboard input
- Window focus changes
- Timing between actions
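To make the captured data concrete, the sketch below models a recording as a list of timestamped events and turns it into replayable steps, inserting waits that reflect the recorded timing. The `RecordedEvent` fields, the step strings, and the two-second cap on idle pauses are assumptions chosen for illustration; the planned recorder may store things differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedEvent:
    kind: str          # "click", "key", or "focus" (assumed event kinds)
    timestamp_ms: int  # time since recording started
    detail: str        # coordinates, typed text, or window title


def to_steps(events: list[RecordedEvent], max_wait_ms: int = 2000) -> list[str]:
    """Convert events into ordered steps with waits between actions.

    Long pauses (the user thinking) are capped at max_wait_ms.
    """
    steps: list[str] = []
    previous = None
    for event in sorted(events, key=lambda e: e.timestamp_ms):
        if previous is not None:
            wait = min(event.timestamp_ms - previous.timestamp_ms, max_wait_ms)
            if wait > 0:
                steps.append(f"wait {wait}ms")
        steps.append(f"{event.kind} {event.detail}")
        previous = event
    return steps
```

Capping the wait keeps replays from inheriting pauses that were incidental to the original session.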
Permission Requirements
Desktop automation requires system permissions:
macOS
| Permission | Purpose | How to Enable |
|---|---|---|
| Accessibility | Mouse/keyboard control | System Preferences → Security & Privacy → Accessibility |
| Screen Recording | Take screenshots | System Preferences → Security & Privacy → Screen Recording |
Flow-Like requests these permissions automatically when needed. On macOS 13 and later, the same settings are found under System Settings → Privacy & Security.
Windows
- No special permissions are typically required
- Some applications require running as Administrator
Linux
- The X11 input extension is needed for keyboard/mouse control
- Screenshot permissions may vary by desktop environment
Use Cases
Legacy System Integration
Automate data entry into systems without APIs:

    For Each record in new_records
            │
            ▼
    Focus Window ("Legacy ERP")
            │
            ▼
    Click Template (new_record_button.png)
            │
            ├──▶ Type in Field 1 (record.name)
            ├──▶ Type in Field 2 (record.value)
            └──▶ Click Submit

Data Extraction
Pull data from desktop applications:

    Focus Window ("Financial Software")
            │
            ▼
    Navigate to Reports
            │
            ▼
    Screenshot Report Area
            │
            ▼
    AI Extract (table data)
            │
            ▼
    Parse and Store in Database

Automated Testing
Test desktop applications:

    Launch App ("MyApp.exe")
            │
            ▼
    Wait for Window ("Main Window")
            │
            ▼
    Click "Login" button
            │
            ├──▶ Type username
            ├──▶ Type password
            └──▶ Click Submit
            │
            ▼
    Verify: Window title contains "Dashboard"

Document Processing Pipeline
Combine RPA with document processing:

    Watch Folder (/incoming)
            │
            ▼
    For Each new file
            │
            ├── PDF? ──▶ AI Extract Document ──▶ Process
            ├── Image? ──▶ AI OCR ──▶ Process
            └── Scanned? ──▶ AI Vision Extract ──▶ Process

Best Practices
1. Use Image Templates Wisely
- Capture unique UI elements
- Avoid templates with dynamic content
- Include enough context for reliable matching
- Test at different screen resolutions
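To see why unique templates and a confidence threshold matter, here is a small brute-force matcher over grayscale pixel grids. It scores each candidate position by the sum of absolute differences, scaled to a 0 to 1 confidence. This is a simplified stand-in for illustration, not Flow-Like's matcher; real implementations typically use faster, more robust measures such as normalized cross-correlation (for example OpenCV's `matchTemplate`).

```python
from typing import Optional

Image = list[list[int]]  # grayscale pixels (0-255), row-major


def match_score(image: Image, template: Image, top: int, left: int) -> float:
    """Confidence in [0, 1]; 1.0 means every pixel is identical."""
    h, w = len(template), len(template[0])
    diff = sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(h)
        for c in range(w)
    )
    return 1.0 - diff / (255 * h * w)


def find_template(image: Image, template: Image, threshold: float = 0.8) -> Optional[dict]:
    """Return the best match at or above threshold, else None."""
    h, w = len(template), len(template[0])
    best = None
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            score = match_score(image, template, top, left)
            if best is None or score > best["confidence"]:
                best = {"x": left, "y": top, "confidence": score}
    if best is None or best["confidence"] < threshold:
        return None
    return best
```

A template containing dynamic content (a clock, a counter) lowers the score at the correct position, which is how it ends up below the threshold and is reported as not found.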
2. Add Wait/Retry Logic

    Retry (3 times, 1s delay)
            │
            ▼
    Find Template (loading_complete.png)
            │
            ▼
    Continue with automation

3. Handle Errors Gracefully

    Try
     │
     ▼
    Click Template (button.png)
     │
     └── Catch: Template not found
             │
             ▼
         Take Screenshot ──▶ Log error ──▶ Alert user

4. Run Unattended Carefully
- Test thoroughly in attended mode first
- Add checkpoints and logging
- Implement timeout limits
- Have recovery procedures
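The logging and timeout points above can be combined into one helper. The sketch below retries an action a fixed number of times, logs each failed attempt, and stops early once an overall time budget would be exceeded. The function name, its defaults (3 attempts, 1 s delay, 10 s budget), and the convention that an action returns `None` on failure are assumptions for illustration.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("automation")


def retry(
    action: Callable[[], Optional[T]],
    attempts: int = 3,
    delay_s: float = 1.0,
    timeout_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call action until it returns a non-None result or limits are hit."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    deadline = clock() + timeout_s
    for attempt in range(1, attempts + 1):
        result = action()
        if result is not None:
            log.info("succeeded on attempt %d", attempt)
            return result
        log.warning("attempt %d/%d found nothing", attempt, attempts)
        # Stop if out of attempts or if waiting would exceed the time budget.
        if attempt == attempts or clock() + delay_s > deadline:
            break
        sleep(delay_s)
    raise TimeoutError(f"gave up after {attempt} attempt(s)")
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting; the raised `TimeoutError` is the hook for a recovery procedure.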
5. Respect Rate Limits
- Add delays between actions (200-500ms)
- Don’t overwhelm target applications
- Simulate human-like interaction speeds
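One simple way to pace actions is to sleep for a random interval in the suggested 200-500ms range between consecutive steps. The sketch below does this for a list of callables; the function name and the injectable `sleep` and `rng` parameters are assumptions made so the example can run without real delays.

```python
import random
import time
from typing import Any, Callable, Sequence


def run_paced(
    actions: Sequence[Callable[[], Any]],
    min_delay_s: float = 0.2,
    max_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[float, float], float] = random.uniform,
) -> list:
    """Run actions in order with a randomised pause between them."""
    results = []
    for index, action in enumerate(actions):
        if index > 0:
            # No pause before the first action, only between actions.
            sleep(rng(min_delay_s, max_delay_s))
        results.append(action())
    return results
```

Randomising the pause rather than using a fixed value avoids a perfectly regular rhythm, which is closer to how a person interacts with an application.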
Combining with AI
RPA becomes powerful when combined with AI:
Intelligent Data Extraction

    Screenshot Application
            │
            ▼
    AI Vision: "Find all customer records in this screenshot"
            │
            ▼
    Structured Data: [{name, email, status}, ...]
            │
            ▼
    Store in Database

Decision-Based Automation

    Screenshot Current State
            │
            ▼
    AI Analysis: "What is the application state?"
            │
            ▼
    Branch based on AI response:
    ├── "Login screen" ──▶ Perform login
    ├── "Dashboard" ──▶ Navigate to reports
    └── "Error dialog" ──▶ Handle error

Natural Language Instructions

    Chat Event: "Download the latest sales report"
            │
            ▼
    AI Plans actions:
    ├── Open reporting application
    ├── Navigate to sales reports
    ├── Select latest report
    └── Click download
            │
            ▼
    Execute each action

Can I automate any application?
Yes, if it has a visible UI, you can automate it. Some applications with custom rendering may be harder to work with.
Does it work in the background?
Currently, the target window needs to be visible. Background automation is planned.
How reliable is template matching?
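It is reliable when templates are captured from unique, stable UI elements and the confidence threshold is set appropriately (the Click Template example above uses 0.8). Dynamic content inside a template and differences in screen resolution can reduce match quality.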
Can I run multiple automations simultaneously?
One automation can run at a time on a single machine. For parallel execution, use multiple machines.
Is it secure?
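Yes. Automations run locally on your machine, and no screenshots or data are sent anywhere unless you explicitly configure it. Note that selecting a cloud-hosted vision model for OCR or analysis is one such configuration: the captured images are then sent to that model's provider.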
Next Steps
- Document Processing – Extract data from documents
- API Integrations – When APIs are available
- GenAI – AI-powered analysis
- Building Internal Tools – Create control UIs for automations