Dictare – An Open-Source Voice Layer for AI Coding Agents (100% Local)

hackernews · 📦 open source
#100% local #ai deals #ai coding agents #claude #dictare #gemini #llama #openai #open source #speech recognition
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Dictare, a 100% local, open-source voice interaction tool for AI coding agents, has been released. Its headline feature is an open protocol called OpenVIP, which delivers voice to an agent in the background even when the agent's window is not focused. It supports macOS and Linux, handles speech-to-text (STT) on-device using models such as Whisper, and lets you switch between multiple agents by voice command.

Full Text

If you want to know how a poker game turned into a voice interaction system for coding agents, watch the demo. Most voice tools (Wispr Flow, Superwhisper, etc.) simulate keystrokes — they type into whatever window has focus. Switch to your browser mid-dictation and the browser, not your code, gets your voice. Dictare uses a protocol instead: your agent listens via SSE and receives transcriptions regardless of window focus. Your coding agent can be behind three other windows — it still gets your words.

- No focus required — the agent receives voice even when its window is in the background
- Agent-native — transcriptions go to the agent protocol, not a text field
- 100% local — STT runs on-device, zero data leaves your machine
- Multi-agent — switch agents by voice: "agent coding", "agent review"
- Open protocol — OpenVIP — any tool can implement the SSE endpoint
- Bidirectional — STT (voice in) + TTS (voice out)

Installation — macOS (full guide):

```
brew install dragfly/tap/dictare
```

Installation — Linux (full guide):

```
curl -fsSL https://raw.githubusercontent.com/dragfly/dictare/main/install.sh | bash
sudo usermod -aG input $USER  # required for hotkey (log out/in after)
```

Permissions — macOS, grant when prompted:

- Microphone — prompted on first launch
- Input Monitoring — System Settings → Privacy & Security → enable Dictare
- Accessibility — needed for keyboard mode (typing into other apps)

After granting all three: `dictare service restart`

Permissions — Linux, two steps:

- Input group (hotkey, X11 + Wayland): `sudo usermod -aG input $USER` — log out/in
- ydotool (keyboard mode on Wayland): `sudo apt install ydotool`

First run:

```
dictare agent freddie  # starts the default profile (Claude Code)
```

That's it. The service starts automatically. Speak — your agent receives the transcription.
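The "agent listens via SSE" step is the core of the design. As a rough sketch — not Dictare's actual client code; the `transcription` event name and JSON payload shape here are assumptions — this is how any tool could parse a `text/event-stream` of transcriptions:

```python
import json

def parse_sse(stream_text):
    """Parse a text/event-stream body into (event, data) pairs.

    SSE events are blocks of "field: value" lines separated by a blank
    line; multiple "data:" lines in one block are joined with "\n".
    """
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines() + [""]:
        if line == "":                       # blank line ends the event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    return events

# Hypothetical OpenVIP-style stream: each event carries a JSON transcription.
sample = (
    "event: transcription\n"
    'data: {"text": "refactor the auth module", "final": true}\n'
    "\n"
)
for event, data in parse_sse(sample):
    payload = json.loads(data)
    print(event, payload["text"])  # → transcription refactor the auth module
```

In a real client the same parsing would run incrementally over a long-lived HTTP response, which is what lets the agent receive words without ever holding window focus.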
If you prefer a different coding agent:

```
dictare agent ozzy --profile codex      # OpenAI Codex
dictare agent gilmour --profile gemini  # Google Gemini CLI
dictare agent bowie --profile aider     # Aider
```

How it works:

```
Microphone
    │
    ▼
STT Module — Whisper (MLX / CTranslate2) or Parakeet (ONNX)
    │  all local, zero cold-start
    ▼
Pipeline — submit detection, mute control, agent switching
    │
    ▼
OpenVIP — HTTP / SSE, open protocol
    │
    ▼
Agent — receives transcription, no window focus needed
```

The engine runs as a background service (launchd on macOS, systemd on Linux). STT models are preloaded at startup. Each agent connects in its own terminal. Profiles are predefined in `~/.config/dictare/config.toml`:

```toml
[agent_profiles]
default = "claude"

[agent_profiles.claude]
command = ["claude"]
description = "Claude Code"

[agent_profiles.codex]
command = ["codex"]
description = "OpenAI Codex"

[agent_profiles.pi]
command = ["pi", "--provider", "ollama", "--model", "qwen3:8b"]
continue_args = ["-c"]
description = "Pi + Ollama local, agentic with tools"
```

Then connect:

```
dictare agent freddie                 # default profile (claude)
dictare agent ozzy --profile codex    # use codex profile
dictare agent -- claude --model opus  # explicit command override
```

Voice commands:

| Say | Action |
|---|---|
| "ok, submit" / "ok, send" / "ok, invia" / "ja, senden" | Submit to agent (Enter) |
| "ok, mute" / "ok, hold on" | Mute (stop listening) |
| "ok, listen" / "ok, listen up" | Unmute (resume listening) |
| "agent coding" / "agent review" | Switch active agent |

Submit triggers are multilingual (en, de, es, it, fr) and fully configurable. Default hotkey: Right ⌘ (macOS) / Scroll Lock (Linux).
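The multilingual submit triggers amount to matching a trailing phrase in the transcription and stripping it before the text reaches the agent. As an illustration only — the trigger list below is copied from the defaults above, but the normalization and matching rules are assumptions, not Dictare's implementation — detection might look like:

```python
# Hypothetical trigger table modeled on the documented defaults; the real
# configuration keys and matching behavior may differ.
SUBMIT_TRIGGERS = ("ok, submit", "ok, send", "ok, invia", "ja, senden")

def split_submit(transcript):
    """Return (text, should_submit), stripping a trailing submit trigger."""
    normalized = transcript.strip().lower().rstrip(".!")
    for trigger in SUBMIT_TRIGGERS:
        if normalized.endswith(trigger):
            text = normalized[: -len(trigger)].rstrip(" ,")
            return text, True
    return transcript.strip(), False

print(split_submit("rename this function ok, submit"))  # → ('rename this function', True)
print(split_submit("keep listening"))                   # → ('keep listening', False)
```

Matching only at the end of the utterance is what lets you dictate a sentence that merely mentions the word "send" without accidentally submitting.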
Hotkey gestures:

| Gesture | Action |
|---|---|
| Single tap | Toggle listening on/off |
| Double tap | Submit (send Enter to agent) |
| Right Alt + hotkey | Switch mode: agents ↔ keyboard |

Service management:

```
dictare service install    # Install + enable (auto-starts at login)
dictare service start      # Start the service
dictare service stop       # Stop the service
dictare service restart    # Restart the service
dictare service status     # Show service and engine status
dictare service logs       # View recent logs
dictare service uninstall  # Remove the service
```

No agent? Use dictare as a dictation tool — voice to keystrokes in any app:

```
dictare config set output.mode keyboard
```

Hotkey to toggle listening (configurable):

- macOS: Right ⌘ by default
- Linux: Scroll Lock by default

```
dictare config set hotkey.key KEY_RIGHTALT  # change hotkey
```

Text-to-speech:

```
dictare speak "Hello world"
dictare speak --engine piper "Hello"
echo "Hello" | dictare speak
```

Engines: espeak, say (macOS), piper, kokoro.

Configuration:

```
dictare config edit  # Open config in editor
dictare config list  # Show all settings
dictare config get stt.model
dictare config set stt.language it
```

Full configuration reference at dictare.io/docs/configuration.

Development:

```
git clone https://github.com/dragfly/dictare && cd dictare

# macOS Apple Silicon (MLX GPU acceleration)
uv sync --python 3.11 --extra mlx
# macOS Intel / Linux
uv sync --python 3.11

# Run engine in foreground
uv run --python 3.11 dictare serve

# Tests
uv run --python 3.11 pytest tests/ -x
# Tests (parallel)
uv run --python 3.11 pytest tests/ -x -n auto
```

Ghostty users: add `keybind = shift+enter=text:\n` to your config. See TERMINAL_COMPATIBILITY.md.

dictare is the reference implementation of OpenVIP — an open protocol for voice input to AI agents. Any tool can implement it.
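Since any tool can implement the OpenVIP endpoint, the server side is worth picturing too. This is a minimal stand-in, not the real spec — the `/events` path and the payload are assumptions — showing that an SSE endpoint is just an HTTP response with a `text/event-stream` content type:

```python
import http.server
import threading
import urllib.request

class OpenVIPHandler(http.server.BaseHTTPRequestHandler):
    """Toy SSE endpoint: emits one transcription event and closes."""

    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.end_headers()
        # A real server would keep this connection open and stream
        # events as speech is transcribed.
        self.wfile.write(b'event: transcription\ndata: {"text": "hello"}\n\n')

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), OpenVIPHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/events"
body = urllib.request.urlopen(url).read().decode()
print(body)
server.shutdown()
```

Because the transport is plain HTTP + SSE, an "agent" here could equally be a shell script using curl, which is presumably the point of publishing the protocol.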

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.

