Show HN: An AI agent for reading quickly and learning new languages
hackernews
📦 Open source
#gpt-4
#ai deals
#ai agents
#chatgpt
#gemini
#gpt-5
#openai
#desktop app
#language learning
#voice ai
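Original source: hackernews · Summary and analysis by Genesis Park
Summary
Samuel is a desktop AI assistant for macOS in which a cartoon-style character floats on the screen and reads Apple Books aloud or helps you learn languages such as Japanese, Chinese, and Korean. Built on OpenAI's Realtime API and GPT-5.4 Computer Use, it responds to voice commands to read e-books, translate, explain grammar, and navigate the screen automatically. It can also be controlled hands-free with the wake word "Hey Samuel", and it interacts with the user in a British butler persona through transparent speech bubbles and Rive animations.
Body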
A real-time voice AI tutor that lives on your desktop. It sees your screen, hears your audio, and teaches you vocabulary, grammar, and pronunciation out loud, by voice, while you watch anime, browse the web, or read a book. No typing. No flashcards. Just say "Hey Samuel."

Keywords: AI language tutor, real-time voice teaching, learn Japanese while watching anime, AI desktop pet, ambient learning agent, OpenAI Realtime API voice agent, screen-aware AI, Tauri desktop app, voice-first AI assistant, learn languages from video

You're watching a video. Samuel sees the subtitles, hears the dialogue, and speaks to you by voice: "食べる: 'to eat', sir." You don't press anything. You don't look away from the video. He just tells you. Ask "what did they just say?" and he answers instantly, because he was already listening.

You want to learn a language. You download Duolingo. You do it for a week. You stop. Meanwhile, you watch 3 hours of anime, K-dramas, or YouTube every day, content in the language you want to learn, and retain nothing because there's no one sitting next to you explaining what the words mean. Samuel is that person. Except he speaks to you in real time, by voice, hands-free.

Samuel doesn't send you text notifications or flashcards. He talks to you:
- You watch a video with Japanese subtitles
- Samuel sees "取得していること" on your screen and hears the audio
- He speaks: "取得: 'to acquire', sir. This is a formal requirement pattern."
- You keep watching. 20 seconds later, he catches another word.
- You say "what did they just say?" and, having heard the whole clip, he tells you

All voice. All real time. Zero interruption to your workflow.

See a word you don't understand? Highlight it. Say "what's this word?" Samuel reads your exact text selection from the clipboard, with no guessing from screenshots, and teaches you the meaning, reading, and usage by voice.
Tell Samuel "I already know that" and he permanently stops teaching that word. Tell him "I'm intermediate" and he skips beginner content. His memory persists across sessions, and he adapts to your level over time.

| | Duolingo / Busuu / Anki | ChatGPT / Gemini | Samuel |
|---|---|---|---|
| Teaches by voice | No | Text only | Yes: real-time speech |
| Watches your screen | No | No | Yes: sees subtitles, web pages, books |
| Listens to audio | No | No | Yes: hears video dialogue, podcasts |
| Teaches from YOUR content | No (app exercises) | Only if you paste it | Automatic: whatever is on screen |
| Hands-free | No | No | Yes: "Hey Samuel" wake word |
| Remembers your level | Per-app only | Per-session | Permanent adaptive memory |
| Always available | Must open app | Must open tab | Floats on desktop 24/7 |

- Speaks to you: not text, not notifications. Actual voice output via OpenAI Realtime API (WebRTC, […]

[…] ~/.books-reader.json
# Grant Screen Recording: System Settings → Privacy & Security → Screen Recording → add peekaboo + samuel
npm run tauri:dev

Say "Hey Samuel" and start learning.
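The persistent memory described earlier ("I already know that", "I'm intermediate") could be modelled roughly as below. This is a minimal sketch, not Samuel's actual implementation: the source does not say how memory is stored, so the `LearnerMemory` class, the JSON file layout, and the three level names are all hypothetical.

```python
import json
from pathlib import Path


class LearnerMemory:
    """Hypothetical on-disk learner profile: known words plus a self-reported level."""

    # Assumed level names, ordered from easiest to hardest.
    LEVELS = ("beginner", "intermediate", "advanced")

    def __init__(self, path):
        self.path = Path(path)
        self.known = set()
        self.level = "beginner"
        # Reload earlier state so the profile survives across sessions.
        if self.path.exists():
            data = json.loads(self.path.read_text(encoding="utf-8"))
            self.known = set(data.get("known", []))
            self.level = data.get("level", "beginner")

    def _save(self):
        payload = {"known": sorted(self.known), "level": self.level}
        self.path.write_text(
            json.dumps(payload, ensure_ascii=False), encoding="utf-8"
        )

    def mark_known(self, word):
        """'I already know that': never teach this word again."""
        self.known.add(word)
        self._save()

    def set_level(self, level):
        """'I'm intermediate': skip content below this level."""
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level = level
        self._save()

    def should_teach(self, word, word_level):
        """Teach only unknown words at or above the learner's level."""
        if word in self.known:
            return False
        return self.LEVELS.index(word_level) >= self.LEVELS.index(self.level)
```

Writing the file on every change keeps the sketch simple; a real agent would more likely batch writes or use a small database.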
| Mode | Cost |
|---|---|
| Wake word (always listening) | ~$0.006/min |
| Ambient teaching (screen + audio + triage) | ~$0.02–0.05/min |
| Book reading | ~$0.01/page |
| Voice conversation | Standard Realtime API pricing |

Limitations
- macOS only: relies on Apple Books, Peekaboo, ScreenCaptureKit
- DRM content: protected books may produce black screenshots
- GPT-5.4 access: required for Computer Use navigation
- Copyright: Vision API may decline to transcribe copyrighted text verbatim

Roadmap
- Local on-device wake word (zero-cost, instant activation)
- Pre-routing classifier (GPT-4o-mini intent classification before tool selection)
- Custom AI-generated companion characters via Rive
- Anki flashcard export from learned vocabulary
- iOS / Android companion app
- Plugin system for custom tools and behaviors

Samuel is a solo project, but the ambient voice teaching pattern has a lot of unexplored potential. Issues and PRs welcome.

License: MIT

Built by Sam Feng. If Samuel helps you learn, star the repo so others can find it.
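To put the per-mode rates from the cost table in perspective, here is a rough estimator. It is only a sketch under stated assumptions: the rates are the approximate figures quoted in the table, the modes are assumed to be billed independently and simply summed, and voice conversation is left out because the table gives no number for it. The function name `estimate_session_cost` is made up for illustration.

```python
# Approximate rates copied from the cost table (USD); all are rough figures.
WAKE_WORD_PER_MIN = 0.006
AMBIENT_PER_MIN = (0.02, 0.05)  # (low, high) end of the quoted range
BOOK_PER_PAGE = 0.01


def estimate_session_cost(wake_minutes, ambient_minutes, pages):
    """Return a (low, high) USD estimate for one session.

    Assumes the three modes are additive and excludes voice conversation,
    which is billed at standard Realtime API pricing.
    """
    fixed = wake_minutes * WAKE_WORD_PER_MIN + pages * BOOK_PER_PAGE
    low = fixed + ambient_minutes * AMBIENT_PER_MIN[0]
    high = fixed + ambient_minutes * AMBIENT_PER_MIN[1]
    return round(low, 2), round(high, 2)


# Three hours of ambient teaching while watching video, nothing else.
print(estimate_session_cost(0, 180, 0))
```

Under these assumptions, the 3 hours of daily video mentioned earlier would cost roughly $3.60 to $9.00 per day in ambient teaching alone.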
This analysis was written by the Genesis Park editorial team with the help of AI. The original text is available through the source link.