Auto Browser: an MCP-native browser agent that a human can take over

Source: hackernews · Summarized and analyzed by Genesis Park

Summary

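The open-source project 'Auto Browser' is an MCP-native tool that lets AI agents carry out authorized workflows in a real browser. It is built on FastAPI and Playwright, lets a human step in through noVNC to resolve problems, and can save an auth profile after a single login so that later sessions start already signed in. It also integrates with MCP clients such as Claude Desktop and Cursor, and a local development environment comes up with a single docker compose command.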

Main text

Give your AI agent a real browser, with a human in the loop.

Open-source MCP-native browser agent for authorized workflows.

Works with:
- Claude Desktop
- Cursor
- any MCP client that can speak JSON-RPC tools
- direct REST callers when you want curl-first control

- MCP-native, not bolted on later. Use it from Claude Desktop, Cursor, or any MCP client.
- Human takeover when the web gets weird. noVNC lets you recover from brittle flows without losing the session.
- Login once, reuse later. Save named auth profiles and reopen fresh sessions already signed in.

If you want one clean mental model, this repo is: browser agent as an MCP server

If Auto Browser is useful, a ⭐ helps others find it.

    git clone https://github.com/LvcidPsyche/auto-browser.git
    cd auto-browser
    docker compose up --build

That works with zero config for local dev.

Optional sanity check:

    make doctor

make doctor needs local Docker access and the ability to open localhost sockets.

Open:
- API docs: http://localhost:8000/docs
- Operator Dashboard: http://localhost:8000/ui/
- Visual takeover: http://localhost:6080/vnc.html?autoconnect=true&resize=scale

All published ports bind to 127.0.0.1 by default. Only copy .env.example if you want to change ports, providers, or allowed hosts:

    cp .env.example .env

To see the rest of the common commands:

    make help

Maintenance release: no API changes, all fixes are backwards compatible.
- Python 3.10 host compatibility: host-side controller workflows now run on the machine's existing Python 3.10 environment
- make test-local: editable controller packaging plus a first-class host-side test path for faster iteration without Docker
- Provider HTTP coverage: /agent/providers and /sessions/{id}/agent/step now have direct HTTP-layer tests without real provider credentials
- Broader Ruff coverage: CI now lints controller tests and Python helper scripts in addition to the main app package
- make doctor restricted-shell fix: localhost socket probing now fails with a clear message instead of repeated Python tracebacks
- browser-node Xvfb cleanup: stale :99 lock/socket files are cleared before startup so release-smoke reruns stay stable

All 152 tests pass.

- CDP Connect Mode: attach to an existing Chrome via --remote-debugging-port instead of launching a new one
- Network Inspector: per-session request/response capture with header masking and PII scrubbing
- PII Scrubbing Layer: 16 pattern classes (AWS keys, JWTs, credit cards, SSNs, emails…); pixel redaction on screenshots; console + network body scrubbing
- Proxy Partitioning: named proxy personas for per-agent static IPs, preventing shared network footprints
- Shadow Browsing: flip a headless session to a headed (visible) browser mid-run for live debugging
- Session Forking: branch a session's auth state (cookies + storage) into a new independent session
- Playwright Script Export: GET /sessions/{id}/export-script downloads the session as runnable Python
- Shared Session Links: HMAC-signed, TTL-enforced observer tokens for team handoffs
- Vision-Grounded Targeting: browser.find_by_vision uses Claude Vision to locate elements by natural language description
- Cron + Webhook Triggers: APScheduler-backed autonomous jobs; HMAC webhook keys; full CRUD at /crons
- MCP Resources Protocol: resources/list + resources/read expose live screenshot, DOM, console, and network log as MCP resources
- 30+ new MCP tools: eval_js, get_html, find_elements, drag_drop, set_viewport, cookies/storage R/W, and more

See CHANGELOG.md for the full list.

- a browser node with Chromium, Xvfb, x11vnc, and noVNC
- a controller API built on FastAPI + Playwright
- screen-aware observations with screenshots and interactable element IDs
- optional OCR excerpts from screenshots via Tesseract
- human takeover through noVNC
- artifact capture for screenshots, traces, and storage state
- optional encrypted auth-state storage with max-age enforcement on restore
- reusable named auth profiles for login-once, reuse-later workflows
- basic policy rails with host allowlists and upload approval gates
- durable session metadata under /data/sessions, with optional Redis backing
- durable agent job records under /data/jobs with background workers for queued step/run requests
- audit events with per-request operator identity headers
- optional SQLite backing for approvals + audit events
- optional built-in REST agent runner for OpenAI, Claude, and Gemini
- one-step and multi-step REST agent orchestration endpoints
- richer browser abilities through the shared action schema: hover, select_option, wait, reload, back, forward
- tab awareness and tab controls for popup-heavy workflows
- download capture with session-scoped files and URLs under /artifacts
- optional session-level proxy routing and custom user agents for controlled network paths
- social page helpers for feed scrolling, post/profile extraction, search, and approval-gated write actions
- a browser-node managed Playwright server endpoint so the contro
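
The excerpt says Auto Browser works with any MCP client that can speak JSON-RPC tools and exposes resources/list and resources/read. As a rough illustration of what such a client sends, here is a minimal sketch of the request envelopes. The method names tools/call and resources/list come from the MCP specification; the argument names session_id and description are hypothetical, because the excerpt does not show the input schema of browser.find_by_vision, and none of this is taken from the project's own code.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC request ids only need to be unique per connection.
_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def call_tool(name: str, arguments: dict) -> dict:
    """MCP invokes a tool through the `tools/call` method."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Ask the server which resources (screenshot, DOM, logs, ...) it exposes.
list_resources = jsonrpc_request("resources/list")

# Hypothetical argument names; a real client would read the schema
# returned by `tools/list` before calling the tool.
find_button = call_tool(
    "browser.find_by_vision",
    {"session_id": "demo-session", "description": "the blue Sign in button"},
)

print(json.dumps(find_button, indent=2))
```

How these envelopes reach the server (stdio, HTTP, or another transport) depends on the MCP client configuration, which the excerpt does not describe.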

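The release notes describe shared session links as HMAC-signed, TTL-enforced observer tokens. The sketch below shows that general technique with the Python standard library only. The token layout, the function names issue_token and verify_token, and the claim keys sid and exp are assumptions made for illustration, not the project's actual format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, session_id: str, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return `payload.signature`, where the payload carries an expiry time."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sid": session_id, "exp": int(issued) + ttl_seconds},
        separators=(",", ":"), sort_keys=True,
    ).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the session id if the token is authentic and unexpired, else None."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:  # wrong shape or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)  # parsed only after the signature checks out
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None
    return claims["sid"]


token = issue_token(b"demo-secret", "session-1", ttl_seconds=60, now=1000)
print(verify_token(b"demo-secret", token, now=1030))
```

Putting the expiry inside the signed payload means an observer cannot extend a link's lifetime without invalidating the signature.
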
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available through the source link.
