Show HN: I built a local coding agent using Apple Intelligence

hackernews | 🔬 Research
#afm #ai deals #anthropic #apple intelligence #claude #llama #openai #swift #on-device ai #coding agents
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

A developer has released 'Junco', an open-source coding agent that runs entirely on Apple Silicon machines. The tool ships as a single binary and uses the 3B Apple Foundation Model (AFM) together with the Neural Engine to generate 40-80+ tokens per second, even fully offline. Compared with state-of-the-art AI models it is weaker at reasoning, has less knowledge, and makes frequent mistakes, but it offers a compelling advantage: sensitive source code and personal data stay on the device, with no cloud costs or API dependencies.

Full text

I built a fully on-device coding agent called Junco, using Apple Intelligence, the Apple Foundation Model (AFM), in Swift. Why? To learn, but also to build a tool I want to use myself. Why call it "Junco"? Because nothing else was named "Junco," it fits the Swift-related bird theme, and, to be honest, it's kind of junky compared to Claude Code. Does Junco work? Kind of! Honestly, not well. It has some very rough edges, but you can try it yourself. Install Junco with a single command:

curl -fsSL https://raw.githubusercontent.com/LastByteLLC/junco/master/install.sh | bash

What is Junco?

Junco is a free and open-source (MIT-licensed), fully on-device coding agent built in Swift for Swift. It's a single Mach-O binary that includes a terminal user interface (TUI) and leverages Apple Intelligence's LanguageModelSession, or other on-device models via Ollama and AnyLanguageModel.

────────────────────────────────────────────────────────────
junco v0.6.0 — on-device AI coding agent
Domain: Swift / Apple │ Git: branch: master | 14 files changed
Dir: ~/Documents/GitHub/junco
Files: 148 │ Reflections: 61
Model: Apple Foundation Models (Neural Engine)
/help for commands │ @file to target │ exit to quit
────────────────────────────────────────────────────────────

Why build Junco?

I was inspired by Ivan Magda's swift-claude-code and his 8-part series on developing a Claude Code-like command line interface (CLI) tool in Swift that covers the agentic loop, tool calls, subagents, and task planning. Generally, local models are worse at reasoning, worse at following instructions, more likely to hallucinate, and lack up-to-date world knowledge when compared to state-of-the-art (SOTA) frontier models. Nonetheless, local models like Gemma 4 and Qwen 3.5 continue to get better. Local models have their own benefits:

- Work offline
- Better privacy
- Lower energy use
- No subscription fees

Claude Code launched just over a year ago in February 2025. Now we're in a Code Overload.
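To make the core dependency concrete, here is a minimal sketch of Apple's FoundationModels API, the LanguageModelSession mentioned above. This is my own illustration, not code from Junco, and it assumes macOS 26+ with Apple Intelligence enabled:

```swift
import FoundationModels

@main
struct AFMDemo {
    static func main() async throws {
        // Bail out early if Apple Intelligence is disabled or unsupported.
        guard SystemLanguageModel.default.availability == .available else {
            print("Apple Foundation Model unavailable on this machine")
            return
        }

        // A session keeps the transcript; instructions steer the model's role.
        let session = LanguageModelSession(
            instructions: "You are a terse Swift coding assistant."
        )
        let answer = try await session.respond(
            to: "Write a one-line Swift expression that reverses a String."
        )
        print(answer.content)
    }
}
```

The entire agent loop is, at bottom, repeated calls like this with tool output folded back into the session transcript.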
I believe local models are on the same, albeit delayed, exponential trajectory.

Why fully local?

Claude Code and Codex are incredibly powerful, but they are limited by cost, privacy, and connectivity. A fully local coding agent has several advantages:

- Unit economics: build once, sell many times, without the up-front capital for advanced GPUs or the operating expense of inference-as-a-service
- Trade secret preservation: local inference means intellectual property isn't revealed to providers like Anthropic or OpenAI
- Data privacy: protected health information (PHI), personally identifiable information (PII), and other sensitive data doesn't get sent to AI providers

In high-stakes industries like defense, high-frequency trading, and medical research, sending source code to a third-party API is often a serious offense. Cloud inference always has a marginal cost, either per token or amortized into a subscription. A local agent can run air-gapped, with no fear that it will train on your trade secrets or leak your .env secrets. Imagine agentic coding on an intercontinental flight without WiFi, or automatic agentic code review on a self-hosted GitHub runner (without paying $15-25 per pull request).

Why not use OpenCode?

OpenCode is great. I use it to reverse-engineer Android APIs, for JavaScript-to-TypeScript migrations, and much more. However, it's not designed to work with the AFM's minuscule 4K context window. In my limited testing, OpenCode with the Apple Foundation Model almost never succeeded without overflowing the context window, and it rarely produced valid code.

Junco at a glance

- Completely free and open source under the MIT license
- A single signed & notarized Mach-O binary (~9 MB)
- Written entirely in Swift 6.2+
- Works exclusively on macOS 26+ with Apple Silicon (M1+)
- Trained specifically to edit and generate Swift code

Lessons & Limitations

Apple Intelligence isn't very intelligent.
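One place this bites immediately is the 4K context window noted earlier: every prompt has to be budgeted before the model sees it. A hedged sketch of one way an agent might do that; the 4-characters-per-token heuristic and the helper names are illustrative assumptions, not Junco's actual strategy or the AFM's real tokenizer:

```swift
import Foundation

// Rough heuristic: ~4 characters per token for English and source code.
// An assumption for illustration only, not the AFM's real tokenizer.
func estimatedTokens(_ text: String) -> Int {
    max(1, text.count / 4)
}

/// Trim a file excerpt so instructions + excerpt + reply headroom fit
/// inside a small context window (4,096 tokens for the AFM).
func budgetedExcerpt(of file: String,
                     instructions: String,
                     contextWindow: Int = 4_096,
                     replyHeadroom: Int = 1_024) -> String {
    let available = contextWindow - replyHeadroom - estimatedTokens(instructions)
    guard estimatedTokens(file) > available else { return file }
    // Keep the head of the file; a real agent might instead keep the
    // region around the cursor or the symbols the user named.
    return String(file.prefix(available * 4))
}
```

With a frontier model you can throw whole files at the context; at 4K tokens, deciding what to leave out is most of the engineering.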
In my experience, Apple's 3B model performs worse than just about any similarly sized model. So why work with it?

- It comes pre-installed. Go to Settings > Apple Intelligence & Siri > Enable, and that's it! This is a huge win for user onboarding.
- It's highly optimized. I get ~40-80+ tokens per second on an M4 Air. The AFM uses less energy and responds faster than most local 2-4B INT4-quantized models in Ollama.
- It can be taught. You can train a custom adapter for the AFM, augmenting its style, format, and (to some extent) knowledge.

So what did I learn?

Ask not what your model can do for you

Ask what you can do for your model. Help local models help you: train a text classifier, use named-entity recognition (NER), parse prompts with regular expressions, and search the web. Don't ask a tiny model to do everything: tiny models are slow, they hallucinate, and they lack world knowledge.

This screenshot is from maclocal-api, using the Llama.cpp Web UI. Clearly the AFM has a knowledge cutoff before the 2024 US Presidential Election. Set the model up for success by giving i
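The "help your model" advice can be made concrete: run cheap, deterministic tools over the prompt before the 3B model ever sees it. A sketch using Apple's NaturalLanguage framework for the NER step, my own illustration rather than Junco code:

```swift
import NaturalLanguage

/// Extract person, place, and organization names from a prompt so a
/// tiny model can be handed them explicitly instead of inferring them.
func namedEntities(in prompt: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = prompt
    var entities: [String] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    tagger.enumerateTags(in: prompt.startIndex..<prompt.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: options) { tag, range in
        if let tag, [.personalName, .placeName, .organizationName].contains(tag) {
            entities.append(String(prompt[range]))
        }
        return true
    }
    return entities
}
```

The extracted names can then be injected into the prompt, or used to trigger a web search, so the small model spends its limited capacity on the actual coding task.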

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
