A Hybrid AI Desktop Layer Combining DOM Automation and API Integration
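Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
A new hybrid AI desktop layer combines DOM automation with API integration to change how users interact with the web. The technology directly controls web page elements (the DOM) and connects to external services (APIs), addressing limitations of existing automation tools. The source gives no specific figures or context, but the approach has the potential to simplify complex tasks and raise productivity.
Body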
BiamOS is a complete paradigm shift for AI interaction. We have moved beyond the "chatbot next to a browser" era. BiamOS is an Autonomous AI Web Browser disguised as a desktop environment. It operates locally, running complex agentic workflows directly on your machine.

You simply type a command (e.g., /act Find the newest video from Marques Brownlee and leave a comment), and the built-in AI drives the browser, navigating Single Page Applications (SPAs) with absolute precision. It's not a browser extension. It's not a copilot wrapper. It's an autonomous agentic operating layer.

Unlike traditional web automation tools (Playwright/Puppeteer/Selenium) that rely on fragile DOM selectors, or basic AI agents that get blocked by captchas, BiamOS uses the WORMHOLE Stealth Executor. The agent doesn't guess where an element is based on outdated snapshots. Milliseconds before a click, it calculates live CSS geometry (DOM.getBoxModel), completely defeating lazy-loading layout shifts.

The agent's visual cursor mathematically tethers to the live-raycast, swooping in with a 0.6s cubic-bezier animation. Once perfectly aligned, it fires a native OS-level click. SPAs don't see an automation event; they see a human mouse click.

To bypass sophisticated bot-detection (like Cloudflare Turnstile or reCAPTCHA v3), all automated mouse movements simulate human cursor acceleration and deceleration along randomized cubic-bezier curves.

BiamOS features a Zero-Hardcoded RAG Semantic Memory. We completely stripped out hardcoded scripts. The agent learns how to use websites purely through observation and negative reinforcement.

- The Librarian: An active background process that observes when the Agent makes a mistake or falls into an infinite loop. The Librarian immediately steps in, analyzes the failure, distills an "Avoid Rule" (Negative Reinforcement), and permanently memorizes it.
- 4-Tier Retrieval: When the Agent visits a website, BiamOS instantly injects contextual rules (Global → Domain → Subdomain → Exact Path). The Agent instantly knows the eccentricities of whatever SPA it is currently viewing.

The ultimate privacy feature for AI agents. Forget generating API keys or granting OAuth permissions to startups. BiamOS embeds a native Chromium Webview. Log into Gmail, Notion, X, or YouTube directly inside the UI just like a normal browser. When you ask the Agent to act, it securely rides on your existing authenticated session. No tokens leave your machine. No APIs are required.

When the Agent successfully completes a complex task (e.g., searching for a flight, applying a filter, and extracting prices), BiamOS hashes the semantic intent and saves the entire step-by-step sequence to a local SQLite database. The next time you ask for a similar task, the system recognizes the intent via on-device embeddings (MiniLM-L6) and perfectly replays the "Muscle Memory" instantly, bypassing the expensive LLM planning phase.
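The live-geometry step mentioned earlier (reading `DOM.getBoxModel` just before a click) can be illustrated with a small sketch. In the Chrome DevTools Protocol, a box model's `content` field is a quad: eight numbers holding the x,y coordinates of four corners. The helper name `quadCenter` and the sample coordinates are assumptions made for illustration and are not taken from the BiamOS source; the sketch only shows how a click target might be derived from such a quad.

```typescript
// A CDP quad: [x1, y1, x2, y2, x3, y3, x4, y4], one x,y pair per corner.
type Quad = number[];

interface Point {
  x: number;
  y: number;
}

// Hypothetical helper (not from the BiamOS source): average the four
// corners of a quad to get the point a click could be aimed at.
function quadCenter(quad: Quad): Point {
  if (quad.length !== 8) {
    throw new Error(`expected 8 numbers in a quad, got ${quad.length}`);
  }
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < 8; i += 2) {
    sumX += quad[i];
    sumY += quad[i + 1];
  }
  return { x: sumX / 4, y: sumY / 4 };
}

// Illustrative input: a 100x40 box whose top-left corner sits at (10, 20).
const center = quadCenter([10, 20, 110, 20, 110, 60, 10, 60]);
console.log(center);
```

Recomputing this point immediately before the click, rather than reusing an earlier snapshot, is what lets the target follow an element that moved after a late layout shift.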
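The 4-Tier Retrieval order (Global → Domain → Subdomain → Exact Path) can be sketched as a lookup that walks from the broadest scope to the narrowest. Everything below is an assumption for illustration: the `RuleStore` shape, the `"global"` key, treating the last two host labels as the domain, and the sample rules are hypothetical and do not describe the actual BiamOS storage format.

```typescript
// Hypothetical rule store: scope key -> learned rules for that scope.
type RuleStore = Map<string, string[]>;

// Build the four scope keys for a URL, from broadest to narrowest.
// Using the last two host labels as the "domain" is a simplification
// that would be wrong for suffixes such as "co.uk".
function scopeKeys(rawUrl: string): string[] {
  const url = new URL(rawUrl);
  const labels = url.hostname.split(".");
  const domain = labels.slice(-2).join(".");
  return ["global", domain, url.hostname, url.hostname + url.pathname];
}

// Collect rules tier by tier so the most specific rules come last.
// Duplicates are skipped, which also covers hosts with no subdomain,
// where the domain and subdomain keys are identical.
function retrieveRules(store: RuleStore, rawUrl: string): string[] {
  const seen = new Set<string>();
  const rules: string[] = [];
  for (const key of scopeKeys(rawUrl)) {
    for (const rule of store.get(key) ?? []) {
      if (!seen.has(rule)) {
        seen.add(rule);
        rules.push(rule);
      }
    }
  }
  return rules;
}

// Made-up rules, purely for illustration.
const store: RuleStore = new Map([
  ["global", ["Avoid re-clicking a button that is still loading"]],
  ["youtube.com", ["Avoid the search box while an overlay is open"]],
  ["studio.youtube.com/channel/videos", ["Avoid bulk-select on this page"]],
]);

const injected = retrieveRules(store, "https://studio.youtube.com/channel/videos?tab=1");
console.log(injected);
```

Ordering the tiers from general to specific means a path-level rule appears after, and can be read as refining, a domain-level one.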
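The "Muscle Memory" lookup can be pictured as a nearest-neighbour search over stored intent embeddings. The sketch below is a simplified illustration under stated assumptions: the `StoredWorkflow` shape, the 0.85 similarity threshold, and the three-dimensional toy vectors are hypothetical, and real MiniLM-L6 embeddings have far more dimensions. It shows only the matching idea, not how BiamOS stores or replays steps.

```typescript
// Hypothetical record for a saved workflow.
interface StoredWorkflow {
  intent: string;
  embedding: number[]; // would come from an on-device embedding model
  steps: string[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector lengths differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the most similar stored workflow at or above the threshold,
// or null to signal that the caller should fall back to LLM planning.
function findReplay(
  query: number[],
  memory: StoredWorkflow[],
  threshold = 0.85, // assumed value, not taken from the source
): StoredWorkflow | null {
  let best: StoredWorkflow | null = null;
  let bestScore = threshold;
  for (const workflow of memory) {
    const score = cosineSimilarity(query, workflow.embedding);
    if (score >= bestScore) {
      best = workflow;
      bestScore = score;
    }
  }
  return best;
}

// Toy data for illustration only.
const memory: StoredWorkflow[] = [
  { intent: "search flights", embedding: [1, 0, 0], steps: ["open site", "fill form", "apply filter"] },
  { intent: "post comment", embedding: [0, 1, 0], steps: ["open video", "type comment", "submit"] },
];

const hit = findReplay([0.9, 0.1, 0], memory);
const miss = findReplay([0, 0, 1], memory);
console.log(hit?.intent, miss);
```

Returning `null` below the threshold keeps the cache conservative: an unfamiliar request goes through planning rather than replaying a loosely related sequence.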
    ┌─────────────────────────────────────────────────────────────────┐
    │                    Electron 34 Desktop Shell                    │
    │      Chromium Webview • Stealth Executor • Native OS Input      │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  ┌───────────────────────┐    ┌──────────────────────────────┐  │
    │  │   React 19 Frontend   │    │      Hono REST Backend       │  │
    │  │                       │    │                              │  │
    │  │  Spatial Canvas       │◄──►│  WORMHOLE Engine (CDP)       │  │
    │  │  GhostCursor Engine   │    │  Domain Brain (RAG)          │  │
    │  │  Set-of-Mark Overlay  │    │  The Librarian (Learning)    │  │
    │  │  Agent Dashboard UI   │    │  Workflow Muscle Memory      │  │
    │  │                       │    │                              │  │
    │  │  TypeScript + MUI     │    │  Drizzle ORM + SQLite        │  │
    │  └───────────────────────┘    └──────────────────────────────┘  │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

| | Windows | macOS |
|---|---|---|
| Runtime | Node.js 18+ & npm | Node.js 18+ & npm |
| LLM Key | OpenRouter API key | OpenRouter API key |

    # Clone the repository
    git clone https://github.com/BiamOS/BiamOS.git
    cd BiamOS

    # Setup Environment Variables
    # Copy .env.example to .env and insert your OpenRouter Key
    cp .env.example .env

    # Install dependencies
    npm install

    # Start BiamOS
    npm run electron

    # macOS → produces dist-electron/BiamOS-*.dmg (arm64 + x64)
    npm run dist:mac

    # Windows → produces dist-electron/BiamOS Setup *.exe
    npm run dist:win

Note for macOS: On first launch of the .dmg, right-click → Open to bypass Gatekeeper.

BiamOS is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). You are free to use, modify, and distribute this software under the terms of the AGPL-3.0. If you modify and deploy BiamOS as a service, you must release your modifications under the same license.

Built with 🧬 by the BiamOS Contributors
From Vienna, Austria 🇦🇹
This analysis was written by the Genesis Park editorial team with the help of AI. The original text is available through the source link.