Show HN: GhostDesk – An MCP server that gives AI agents a full virtual Linux desktop
📦 Open source
#ai models
#ai agents
#chatgpt
#claude
#gemini
#linux desktop
#llama
#llm
#mcp
#virtual desktop
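Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
GhostDesk is an MCP server that gives AI agents a full virtual Linux desktop, letting them control the mouse and keyboard directly and run applications. Because it perceives and operates interfaces visually, it can work with legacy software and web pages that have no API, and can carry out complex tasks such as browser automation and data extraction the way a person would. It also uses an accessibility engine and human-like input patterns to bypass bot detection, and it runs isolated inside Docker so that autonomous work stays safely contained.
Body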
Give your AI agent eyes, hands, and a full Linux desktop. An MCP server that lets LLM agents see the screen, move the mouse, type on the keyboard, launch apps, and run shell commands, all inside a sandboxed virtual desktop. If a human can do it on a desktop, your agent can too.

Most AI agents are trapped in text. They can call APIs and generate code, but they can't use software. GhostDesk changes that. Connect any MCP-compatible LLM (Claude, GPT, Gemini...) and it gets a full Linux desktop with 11 tools to interact with any application: browsers, IDEs, office suites, terminals, legacy software, internal tools. No API needed. No integration required. If it has a UI, your agent can use it.

Your agent gets its own Linux desktop. Here's what that unlocks:

"Go to the CRM, export last month's leads as CSV, open LibreOffice Calc, build a pivot table, take a screenshot of the chart, and email it to the team."
Your agent opens the browser, logs in, downloads the file, switches to another app, processes the data, captures the result, and sends it: autonomously, across multiple applications, in one conversation.

"Search for competitors on Google, open the first 5 results, extract pricing from each page, and summarize in a spreadsheet."
No Selenium. No CSS selectors. No Puppeteer scripts that break every week. The agent looks at the screen, clicks what it sees, and fills forms naturally, with human-like mouse movement that bypasses bot detection.

"Open the legacy inventory app, search for product #4521, update the stock count to 150, and confirm the change."
That old Java app with no API? That internal admin panel from 2010? A Windows app running in Wine? If it renders pixels on screen, your agent can operate it.

"Open the analytics dashboard, read the KPI table, scroll down to the revenue chart, take a screenshot, then export the raw data."
The agent takes screenshots, reads the screen visually, and extracts what it needs. This works on any application, any UI framework, any language.

"Navigate the signup flow, try invalid emails, empty fields, and SQL injection in every input. Screenshot each error state."
Your agent becomes a QA engineer: it clicks every button, fills every form, tests every edge case, and brings back screenshots as proof.

"Every morning: log into the supplier portal, download the latest price list, compare with yesterday's, and flag any changes above 5%."
Runs headless in Docker. No physical screen. No human babysitting. Schedule your agent to handle repetitive desktop tasks while you sleep.

"Open VS Code, create a new Python file, write a script that calls our API, run it in the terminal, debug if it fails, then commit and push to GitHub."
Your agent isn't limited to one app. It can switch between browser, terminal, IDE, file manager, and email client, just like a human switching windows on their desktop.

| | Feature | Why it matters |
|---|---|---|
| 📸 | Screenshots | Full or regional captures with cursor overlay, so the agent sees exactly what a human would see |
| 🖱️ | Human-like input | Bézier mouse curves, variable typing speed, and micro-jitter to bypass bot detection |
| 📋 | Clipboard | Read and write the clipboard; paste long text instantly |
| ⌨️ | Keyboard control | Type text, press hotkeys and keyboard shortcuts: full keyboard access |
| 🖥️ | Shell access | Run any command, launch any app, capture stdout/stderr |
| 🐳 | Sandboxed | Runs in Docker: isolated, reproducible, safe |
| 👀 | Live view | Watch your agent work in real time via VNC or browser (noVNC) |

| Tool | Description |
|---|---|
| `screenshot()` | Capture the screen (full or region) with cursor position overlay |

| Tool | Description |
|---|---|
| `mouse_click(x, y)` | Click at coordinates |
| `mouse_double_click()` | Double-click at coordinates |
| `mouse_drag()` | Drag from one position to another |
| `mouse_scroll()` | Scroll in any direction (up/down/left/right) |
| `type_text()` | Type with realistic per-character delays |
| `press_key()` | Press keys or combos (`ctrl+c`, `alt+F4`, `Return`...) |

| Tool | Description |
|---|---|
| `exec()` | Run shell commands with stdout/stderr capture |
| `launch()` | Start GUI applications |
| `get_clipboard()` | Read clipboard contents |
| `set_clipboard()` | Write to clipboard |

    docker run -d --name ghostdesk \
      -p 3000:3000 \
      -p 5900:5900 \
      -p 6080:6080 \
      ghcr.io/yv17labs/ghostdesk:latest

That's it. The virtual desktop, MCP server, and VNC are all running inside an isolated container. Your agent gets a full Linux desktop, and your host machine stays untouched.

GhostDesk works with any MCP-compatible client. Add it to your config:

Claude Desktop / Claude Code

    {
      "mcpServers": {
        "ghostdesk": {
          "type": "http",
          "url": "http://localhost:3000/mcp"
        }
      }
    }

ChatGPT, Gemini, or any LLM with MCP support: same config, just point to `http://localhost:3000/mcp`.

Local models (Ollama, LM Studio, etc.): any MCP client library can connect to the same endpoint.

Open `http://localhost:6080/vnc.html` in your browser to see the virtual desktop in real time.

| Service
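
For readers curious what travels over the `/mcp` endpoint, here is a minimal Python sketch of the JSON-RPC 2.0 messages an MCP client would send to invoke two of the tools listed above. The `tools/call` method and its `name`/`arguments` parameters come from the MCP specification, and the tool names and the `x`/`y` parameters come from the tool tables. The helper `tool_call` and the coordinate values are hypothetical. The sketch only builds the messages: a real client would first complete the MCP `initialize` handshake over HTTP, which is omitted here.

```python
import json
from itertools import count
from typing import Optional

# Monotonically increasing JSON-RPC request ids (1, 2, 3, ...).
_ids = count(1)


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message.

    `tool_call` is a hypothetical helper for illustration; it is not
    part of GhostDesk or of any MCP client library.
    """
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Tool names and the x/y parameters are taken from the tool tables;
# the coordinate values are made up for this example.
messages = [
    tool_call("screenshot"),
    tool_call("mouse_click", {"x": 640, "y": 360}),
]

for message in messages:
    # Each line is one request body a client could POST to the endpoint.
    print(json.dumps(message))
```

In practice an MCP client library handles this framing, session setup, and response parsing, so agent code rarely builds these messages by hand. The sketch is meant only to show that each tool in the tables maps to one small request.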
This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found through the source link.