Deskbrid – Linux desktop control over a Unix socket for agents and scripts

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

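Deskbrid is a Rust-based tool that unifies the Linux desktop's various system components behind a single Unix socket protocol, so that shell scripts and AI agents can control the desktop in the same way. Its core goal is to solve xdotool's incompatibility with Wayland, serving today's automation needs as well as future AI agent use.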

Body

The HAL your Linux desktop agents are missing.

Deskbrid is a single Rust binary that wraps GNOME Shell, DBus, NetworkManager, BlueZ, PipeWire, and Wayland utilities into one JSON-over-Unix-socket protocol. Your shell scripts and AI agents use the same socket.

```
# Human
deskbrid windows list
deskbrid keyboard type "git push origin main"

# Agent (same socket)
{"action": "windows.list"}
→ [{"title": "VS Code", "app_id": "code", ...}]
```

Every major AI lab is racing to ship desktop agents. AppleScript gives macOS agents native control. Windows has UI Automation. Linux has xdotool, which breaks on Wayland, the default display protocol on most major distros. Deskbrid fills that gap. It doesn't bet on agents taking off: automation use cases validate it today, and agents validate it tomorrow. Same daemon, same protocol, same socket.

```bash
# 1. Clone and install system dependencies
git clone https://github.com/coe0718/deskbrid
cd deskbrid
sudo apt install -y grim wl-clipboard

# 2. Install the GNOME Shell extension
cp -r extensions/deskbrid@deskbrid ~/.local/share/gnome-shell/extensions/
gnome-extensions enable deskbrid@deskbrid

# 3. Log out and back in (GNOME must reload extensions)

# 4. Build and run
cargo build --release
./target/release/deskbrid daemon &

# 5. Test it
./target/release/deskbrid windows list
./target/release/deskbrid system info
```

| Desktop | Session | Status |
|---|---|---|
| GNOME 46+ | Wayland | ✅ Supported |
| GNOME 45 | Wayland | |
| KDE Plasma | Wayland | 🔄 Planned |
| X11 | X11 | ❌ Not planned |

| Action | Description |
|---|---|
| `windows.list` | List all open windows (title, app_id, workspace, geometry) |
| `windows.focus` | Focus a window by ID |
| `windows.get` | Get details for a specific window |
| `workspaces.list` | List workspaces |
| `workspaces.switch` | Switch to a workspace |
| `workspaces.move_window` | Move a window to another workspace |

| Action | Description |
|---|---|
| `input.keyboard type` | Type text into the focused window |
| `input.keyboard key` | Send a single keypress |
| `input.keyboard combo` | Send key combos (ctrl+shift+t) |
| `input.mouse move` | Move mouse to absolute position |
| `input.mouse click` | Click (left/middle/right) |
| `input.mouse scroll` | Scroll (dx/dy) |

| Action | Description |
|---|---|
| `clipboard.read` | Read Wayland clipboard |
| `clipboard.write` | Write to Wayland clipboard |
| `screenshot` | Capture screen (full, region, or window) |
| `notification.send` | Send a desktop notification |
| `notification.close` | Close a notification by ID |

| Action | Description |
|---|---|
| `system.info` | Desktop info (GNOME version, monitors, workspaces) |
| `system.idle` | Seconds since last user input |
| `system.battery` | Battery percentage, state, time remaining |
| `system.power` | Suspend, hibernate, shutdown, reboot, lock, logout |
| `network.status` | Online/offline via NetworkManager |
| `network.interfaces` | List interfaces with IPs |
| `network.wifi.scan` | Scan for WiFi networks |
| `network.wifi.connect` | Connect to a WiFi network |
| `bluetooth.list` | List known/available devices |
| `bluetooth.scan` | Start device discovery |
| `bluetooth.connect` | Connect to a device |
| `audio.list_sinks` | List audio output devices |
| `audio.set_sink_volume` | Set sink volume (0.0-1.0) |
| `files.search` | Search files by name |
| `files.watch` | Watch a path for changes (creates, modifies, deletes) |
| `files.unwatch` | Stop watching a path |

```
{"action": "subscribe", "events": ["file.*"]}
```

| Pattern | What you get |
|---|---|
| `file.*` | file.created, file.modified, file.deleted |
| `file.created` | Just file creation events |
| `*` | Everything |

```
→ {"action": "windows.list"}
← [{"title": "PatchHive — VS Code", "app_id": "code", ...}, {"title": "praxis — VS Code", "app_id": "code", ...}]
→ {"action": "windows.focus", "window_id": "0x3a0000b"}
← {"type": "response", "status": "ok"}
→ {"action": "input.mouse", "action": "move", "x": 900, "y": 920}
→ {"action": "input.mouse", "action": "click", "button": "left"}
→ {"action": "input.keyboard", "action": "type", "text": "Fix the build errors\n"}
```

The agent picks the right window by title, brings it to the front, clicks into the chat input, and types.

| Language | Status | Install |
|---|---|---|
| Python | ✅ Done | `pip install ./clients/python/` |
| Rust (built-in CLI) | ✅ Done | CLI included in binary |
| TypeScript | 🔄 Planned | `npm install deskbrid` |

```python
from deskbrid import Deskbrid

client = Deskbrid()

# Subscribe to events
@client.on("file.*")
def on_file_change(event):
    print(f"File changed: {event['path']}")

# Actions
client.windows_list()
client.keyboard_type("deploy production\n")
text = client.clipboard_read()
path = client.screenshot()

client.listen()  # blocks, streaming events
```

Deskbrid binds a Unix socket at `$XDG_RUNTIME_DIR/deskbrid.sock`. Every interaction is one JSON line in, one JSON line out. Agents subscribe to events and get pushed real-time updates.
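The one-line-in, one-line-out exchange described above can be sketched without the bundled client, using only the Python standard library. This is a minimal illustration, not the project's own implementation. It assumes that requests and replies are newline-terminated JSON and that the daemon accepts a fresh connection per request. The helper names `default_socket_path` and `request`, and the `/run/user/<uid>` fallback, are hypothetical. PROTOCOL.md remains the authoritative description of the framing.

```python
import json
import os
import socket
from typing import Any, Optional


def default_socket_path() -> str:
    """Build $XDG_RUNTIME_DIR/deskbrid.sock, the path named in the text."""
    # Assumption: fall back to /run/user/<uid> when the variable is unset.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", f"/run/user/{os.getuid()}")
    return os.path.join(runtime_dir, "deskbrid.sock")


def request(payload: dict, path: Optional[str] = None, timeout: float = 5.0) -> Any:
    """Send one JSON line over the Unix socket and decode the one-line reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path or default_socket_path())
        # json.dumps escapes embedded newlines, so the request stays on one line.
        sock.sendall(json.dumps(payload).encode("utf-8") + b"\n")
        # Read exactly one reply line (assumed newline-terminated framing).
        with sock.makefile("r", encoding="utf-8") as reader:
            line = reader.readline()
    if not line:
        raise ConnectionError("socket closed before a reply line arrived")
    return json.loads(line)
```

With the daemon running, `request({"action": "windows.list"})` would send the same message an agent sends. Opening a short-lived connection per call keeps the sketch simple, while event subscriptions would need a connection that stays open to receive pushed lines.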
Under the hood it talks to:

- GNOME Shell via DBus (windows, workspaces)
- Mutter RemoteDesktop API (keyboard injection, pointer control)
- Mutter (IdleMonitor)
- NetworkManager (network, WiFi)
- BlueZ (Bluetooth)
- UPower (battery)
- org.freedesktop.Notifications (notifications)
- grim (Wayland screenshots)
- wl-paste/wl-copy (clipboard)
- pactl (audio)
- notify crate (inotify file watching)

| Tool | Wayland | Agent-native | JSON protocol | Windows | Input | Clipboard | Screenshot | Bluetooth | Audio | File watch |
|---|---|---|---|---|---|---|---|---|---|---|
| deskbrid | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| xdotool | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| ydotool | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| wtype | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| grim | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ |
| wl-clipboard | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ |
| atspi | limited | ❌ | ❌ | limited | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |

Deskbrid is the only tool that combines all of these into a single daemon with a structured protocol designed for programmatic use.

See PROTOCOL.md for the complete JSON-over-socket specification.

License: MIT

This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
