HN View: Running Gemma 4 on an iPhone 13 Pro
📦 Open Source
#hardware/semiconductors
#gemma
#ios
#litert
#swift
Source: hackernews · Summarized and analyzed by Genesis Park
Summary
LiteRTLMSwift, a community-developed Swift package, makes it possible to run LiteRT-LM models such as Gemma 4 E2B on devices running iOS 17.0 or later. Beyond text generation, the package exposes multimodal capabilities (image and audio understanding) and streaming through an async/await interface; loading a model requires at least 6 GB of RAM and the increased-memory-limit entitlement.
Body
LiteRTLMSwift is a Swift package for running LiteRT-LM models on iOS. It wraps Google's C API in a clean async/await Swift interface and supports text generation, vision (image understanding), audio (speech/sound understanding), and streaming with models like Gemma 4 E2B.

Note: This is a community project, not an official Google product. The included CLiteRTLM.xcframework is built from Google's open-source LiteRT-LM C API (Apache 2.0).

### Requirements

- iOS 17.0+
- Xcode 16+
- iPhone 13 Pro or later (6 GB+ RAM required for Gemma 4 E2B)
- increased-memory-limit entitlement (model loading needs ~4 GB of RAM)

### How to add the increased-memory-limit entitlement

In Xcode: select your app target > Signing & Capabilities > + Capability > search "Increased Memory Limit". Or add it manually to your .entitlements file:

```xml
<key>com.apple.developer.kernel.increased-memory-limit</key>
<true/>
```

Without this entitlement the system may kill your app during model loading.

### Installation

Add to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/mylovelycodes/LiteRTLM-Swift.git", from: "0.1.0")
],
targets: [
    .target(
        name: "YourApp",
        dependencies: [
            .product(name: "LiteRTLMSwift", package: "LiteRTLM-Swift")
        ]
    )
]
```

Or in Xcode: File > Add Package Dependencies > paste the repo URL > add LiteRTLMSwift to your target.

### Quick start

A complete end-to-end example:

```swift
import LiteRTLMSwift

// 1. Download model (~2.6 GB, only needed once)
let downloader = ModelDownloader()
try await downloader.download() // defaults to Gemma 4 E2B from HuggingFace

// 2. Load engine
let engine = LiteRTLMEngine(modelPath: downloader.modelPath)
try await engine.load() // takes ~5-10s on first launch

// 3. Generate text
let response = try await engine.generate(
    prompt: "<start_of_turn>user\nWhat is Swift?<end_of_turn>\n<start_of_turn>model\n",
    temperature: 0.7,
    maxTokens: 256
)
print(response)

// 4. Vision (image understanding)
let imageData = try Data(contentsOf: photoURL)
let caption = try await engine.vision(
    imageData: imageData, // JPEG, PNG, or HEIC
    prompt: "Describe this photo.",
    maxTokens: 512
)
print(caption)

// 5. Audio (speech/sound understanding)
let audioData = try Data(contentsOf: audioURL)
let transcript = try await engine.audio(
    audioData: audioData, // WAV, FLAC, or MP3
    prompt: "Transcribe this audio.",
    maxTokens: 512
)
print(transcript)
```

Important: Text generation (`generate`, `generateStreaming`, `openSession`) requires Gemma 4's turn marker format in the prompt (see Prompt Format). Vision, audio, and multimodal methods take plain text prompts; the Conversation API handles formatting internally.

### Streaming

```swift
for try await chunk in engine.generateStreaming(
    prompt: "<start_of_turn>user\nTell me a story.<end_of_turn>\n<start_of_turn>model\n"
) {
    print(chunk, terminator: "")
}
```

### Multi-image understanding

```swift
let answer = try await engine.visionMultiImage(
    imagesData: [image1Data, image2Data],
    prompt: "Compare these two photos.",
    maxTokens: 1024
)
```

### Audio

Supports WAV, FLAC, and MP3. Audio is automatically resampled to 16 kHz mono internally.

```swift
let audioData = try Data(contentsOf: recordingURL)

// Transcription (default format: .wav)
let text = try await engine.audio(
    audioData: audioData,
    prompt: "Transcribe this audio."
)

// MP3 file
let mp3Data = try Data(contentsOf: mp3URL)
let summary = try await engine.audio(
    audioData: mp3Data,
    prompt: "Summarize what is being said.",
    format: .mp3,
    maxTokens: 1024
)
```

### Multimodal (audio + vision)

Analyze audio and images together in a single query:

```swift
let response = try await engine.multimodal(
    audioData: [audioTrackData],
    imagesData: [keyframeData],
    prompt: "Does the speaker's description match what's shown in the image?"
)
```
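The vision call takes raw image bytes, so images from UIKit (for example, from the photo picker) first have to be encoded. The following is a minimal sketch, not part of the package, built only on the `engine.vision(imageData:prompt:maxTokens:)` call shown above and UIKit's `jpegData(compressionQuality:)`:

```swift
import UIKit
import LiteRTLMSwift

enum ImageEncodingError: Error { case jpegEncodingFailed }

/// Sketch: encode a UIImage as JPEG bytes and ask the loaded engine to describe it.
/// `engine` is assumed to be a LiteRTLMEngine that has already been load()-ed.
func describe(_ image: UIImage, with engine: LiteRTLMEngine) async throws -> String {
    // vision() accepts JPEG, PNG, or HEIC data; JPEG keeps the payload small.
    guard let jpeg = image.jpegData(compressionQuality: 0.9) else {
        throw ImageEncodingError.jpegEncodingFailed
    }
    return try await engine.vision(
        imageData: jpeg,
        prompt: "Describe this photo.",
        maxTokens: 512
    )
}
```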
### Multi-turn chat (persistent session)

For multi-turn conversations, use the persistent session API. The KV cache is preserved across turns, reducing time-to-first-token from ~20s to ~1-2s on follow-up messages.

```swift
// Open a persistent session
try await engine.openSession(temperature: 0.7, maxTokens: 512)

// First turn: full prefill (~15-20s TTFT)
for try await chunk in engine.sessionGenerateStreaming(
    input: "<start_of_turn>user\nHello!<end_of_turn>\n<start_of_turn>model\n"
) {
    print(chunk, terminator: "")
}

// Second turn: incremental prefill (~1-2s TTFT)
for try await chunk in engine.sessionGenerateStreaming(
    input: "<end_of_turn>\n<start_of_turn>user\nTell me more.<end_of_turn>\n<start_of_turn>model\n"
) {
    print(chunk, terminator: "")
}

// Clean up when done
engine.closeSession()
```
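Because a session only receives the new content each turn, the caller has to keep track of whether the previous model turn still needs to be closed. The `ChatSession` helper below is a minimal sketch, not part of the package; it assumes the session methods shown above and the Gemma turn-marker format described in the Prompt Format section:

```swift
import LiteRTLMSwift

/// Sketch: a tiny wrapper that manages one persistent session and builds the
/// incremental turn-marker input for each user message.
final class ChatSession {
    private let engine: LiteRTLMEngine
    private var isFirstTurn = true

    init(engine: LiteRTLMEngine) {
        self.engine = engine
    }

    func start() async throws {
        try await engine.openSession(temperature: 0.7, maxTokens: 512)
        isFirstTurn = true
    }

    /// Streams the model's reply for one user message.
    func send(_ message: String, onChunk: (String) -> Void) async throws {
        // First turn: open the user turn directly.
        // Later turns: close the previous model turn before opening a new user turn.
        let prefix = isFirstTurn ? "" : "<end_of_turn>\n"
        isFirstTurn = false
        let input = prefix
            + "<start_of_turn>user\n\(message)<end_of_turn>\n<start_of_turn>model\n"

        for try await chunk in engine.sessionGenerateStreaming(input: input) {
            onChunk(chunk)
        }
    }

    func end() {
        engine.closeSession()
    }
}
```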
### SwiftUI integration

`ModelDownloader` is `@Observable`, so you can bind directly in SwiftUI:

```swift
import SwiftUI
import LiteRTLMSwift

struct DownloadView: View {
    @State private var downloader = ModelDownloader()

    var body: some View {
        switch downloader.status {
        case .notStarted:
            Button("Download Model (\(downloader.totalBytesDisplay))") {
                Task { try await downloader.download() }
            }
        case .downloading(let progress):
            ProgressView(value: progress)
            Text("\(downloader.downloadedBytesDisplay) / \(downloader.totalBytesDisplay)")
            Button("Pause") { downloader.pause() }
        case .paused:
            Button("Resume") { Task { try await downloader.download() } }
        case .completed:
            Text("Model ready!")
        case .failed(let msg):
            Text("Error: \(msg)")
            Button("Retry") { Task { try await downloader.download() } }
        }
    }
}
```

```swift
struct EngineView: View {
    @State private var engine: LiteRTLMEngine

    init() {
        let path = ModelDownloader().modelPath
        _engine = State(initialValue: LiteRTLMEngine(modelPath: path))
    }

    var body: some View {
        Group {
            switch engine.status {
            case .notLoaded:
                Button("Load Model") { Task { try await engine.load() } }
            case .loading:
                ProgressView("Loading model...")
            case .ready:
                Text("Ready for inference!")
            case .error(let msg):
                Text("Error: \(msg)")
            }
        }
    }
}
```

### API reference

**LiteRTLMEngine methods**

| Method | Description |
|---|---|
| `init(modelPath:backend:)` | Create engine. `backend`: `"cpu"` (default, recommended) or `"gpu"` (experimental, Metal) |
| `load()` | Load the `.litertlm` model. Call once, reuse across inferences |
| `unload()` | Free model memory |
| `generate(prompt:temperature:maxTokens:)` | One-shot text generation. Prompt must use Gemma turn markers |
| `generateStreaming(prompt:temperature:maxTokens:)` | Streaming text generation |
| `vision(imageData:prompt:temperature:maxTokens:maxImageDimension:)` | Single-image understanding. Plain text prompt |
| `visionMultiImage(imagesData:prompt:temperature:maxTokens:maxImageDimension:)` | Multi-image understanding |
| `audio(audioData:prompt:format:temperature:maxTokens:)` | Audio understanding (WAV, FLAC, MP3). Plain text prompt |
| `multimodal(audioData:audioFormat:imagesData:prompt:temperature:maxTokens:maxImageDimension:)` | Combined audio + vision inference |
| `openSession(temperature:maxTokens:)` | Open persistent session for multi-turn chat (KV cache reuse) |
| `sessionGenerateStreaming(input:)` | Stream generation using persistent session |
| `closeSession()` | Close persistent session, free KV cache |

**LiteRTLMEngine properties**

| Property | Type | Description |
|---|---|---|
| `status` | `Status` | `.notLoaded`, `.loading`, `.ready`, or `.error(String)` |
| `isReady` | `Bool` | Whether the engine is ready for inference |

**ModelDownloader methods**

| Method | Description |
|---|---|
| `init(modelsDirectory:)` | Create downloader. Default path: `~/Library/Application Support/LiteRTLM/Models/` |
| `download(from:)` | Download model from URL. Defaults to `defaultModelURL` (HuggingFace) |
| `pause()` | Pause download. Resume data is persisted to disk |
| `cancel()` | Cancel download and discard resume data |
| `deleteModel()` | Delete the downloaded model file |

**ModelDownloader properties**

| Property | Type | Description |
|---|---|---|
| `status` | `DownloadStatus` | Current download state |
| `progress` | `Double` | 0.0 to 1.0 |
| `isDownloaded` | `Bool` | Whether the model file exists on disk |
| `modelPath` | `URL` | Full path to the model file (use with `LiteRTLMEngine(modelPath:)`) |

### Prompt Format

The Session API (text generation) requires Gemma 4's native turn marker format. The Conversation API (vision, audio, multimodal) does NOT; just pass plain text.

```
<start_of_turn>user
Your message here<end_of_turn>
<start_of_turn>model
```

With system prompt:

```
<start_of_turn>system
You are a helpful assistant.<end_of_turn>
<start_of_turn>user
Hello!<end_of_turn>
<start_of_turn>model
```

Multi-turn (for a persistent session, only send the NEW content each turn):

```
# First turn input:
<start_of_turn>user
Hello!<end_of_turn>
<start_of_turn>model

# Second turn input (note the closing marker from the previous model turn):
<end_of_turn>
<start_of_turn>user
Tell me more.<end_of_turn>
<start_of_turn>model
```
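Hand-writing these markers is error-prone, so it can help to assemble them in one place. The `gemmaPrompt` helper below is a hypothetical sketch, not part of the package, and assumes the turn-marker format shown above:

```swift
/// Sketch: build a one-shot prompt in the turn-marker format above,
/// with an optional system turn.
func gemmaPrompt(user: String, system: String? = nil) -> String {
    var prompt = ""
    if let system {
        prompt += "<start_of_turn>system\n\(system)<end_of_turn>\n"
    }
    prompt += "<start_of_turn>user\n\(user)<end_of_turn>\n<start_of_turn>model\n"
    return prompt
}

// Usage with the engine API shown earlier:
// let reply = try await engine.generate(prompt: gemmaPrompt(user: "What is Swift?"))
```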
### Architecture

```
┌──────────────────────────────────────────────┐
│                   Your App                   │
├──────────────────────────────────────────────┤
│                LiteRTLMSwift                 │
│  ┌─────────────────┐  ┌──────────────────┐   │
│  │ LiteRTLMEngine  │  │ ModelDownloader  │   │
│  │                 │  │                  │   │
│  │ .generate()     │  │ .download()      │   │
│  │ .vision()       │  │ .pause()         │   │
│  │ .audio()        │  │ .cancel()        │   │
│  │ .multimodal()   │  │                  │   │
│  │ .openSession()  │  │                  │   │
│  └────────┬────────┘  └──────────────────┘   │
│           │                                  │
│           │ Serial DispatchQueue             │
│           │ (thread safety)                  │
├───────────┼──────────────────────────────────┤
│       CLiteRTLM.xcframework (C API)          │
│           │                                  │
│   Session API            Conversation API    │
│   (text in/out)          (multimodal JSON)   │
│                                              │
│   For text generation    For vision / audio /│
│   Raw prompt format      multimodal inference│
└──────────────────────────────────────────────┘
```

- Session API: raw text prompts via `InputData`. You control the prompt format. Used by `generate()`, `generateStreaming()`, `openSession()`.
- Conversation API: JSON-based messages with image/audio file paths. Handles image decode/resize/patchify and audio decode/resample/mel-spectrogram internally. Used by `vision()`, `visionMultiImage()`, `audio()`, `multimodal()`.
- All C API calls are serialized on a single DispatchQueue for thread safety. LiteRT-LM supports only one active session at a time.

### Building the xcframework yourself

This repo ships a prebuilt CLiteRTLM.xcframework. If you want to build it yourself (e.g. to pick up upstream fixes or try the GPU backend), follow the steps below.

| Tool | Version | Install |
|---|---|---|
| Bazel | 7.6.1 | `brew install bazelisk` (auto-downloads the correct version) |
| Xcode | 16+ | Mac App Store |
| Disk space | ~20 GB | Bazel build cache |

```bash
# Clones LiteRT-LM source automatically and builds the xcframework
./scripts/build-xcframework.sh

# Or point to an existing local checkout
./scripts/build-xcframework.sh ~/Dev/LiteRT-LM
```

The script will:
- Clone (or use an existing) google-ai-edge/LiteRT-LM source
- Build `libLiteRTLMEngine.dylib` for `ios_arm64` (device) and `ios_sim_arm64` (simulator)
- Package both into `Frameworks/LiteRTLM.xcframework`

To build manually instead:

```bash
git clone https://github.com/google-ai-edge/LiteRT-LM.git
cd LiteRT-LM
bazel build --config=ios_arm64 //c:libLiteRTLMEngine.dylib
```

Output: `bazel-bin/c/libLiteRTLMEngine.dylib`

The Bazel build target is defined in `c/BUILD`:
- `linkshared = True` + `linkstatic = True`: produces a self-contained dylib with all C++ deps statically linked
- `-Wl,-exported_symbol,_litert_lm_*`: only exports the public C API symbols

```bash
# Save the device dylib first (Bazel overwrites bazel-bin between configs)
cp bazel-bin/c/libLiteRTLMEngine.dylib /tmp/libLiteRTLMEngine-device.dylib

bazel build --config=ios_sim_arm64 //c:libLiteRTLMEngine.dylib
cp bazel-bin/c/libLiteRTLMEngine.dylib /tmp/libLiteRTLMEngine-sim.dylib
```

Available iOS configs in `.bazelrc`:

| Config | Architecture | Use Case |
|---|---|---|
| `ios_arm64` | arm64 | Physical device |
| `ios_sim_arm64` | arm64 | Apple Silicon simulator |
| `ios_x86_64` | x86_64 | Intel Mac simulator |
| `ios_arm64e` | arm64e | A12+ with pointer auth |

Each architecture needs to be wrapped in a `.framework` bundle before creating the xcframework:

```bash
# Device framework
mkdir -p /tmp/ios-arm64/CLiteRTLM.framework/{Headers,Modules}
cp /tmp/libLiteRTLMEngine-device.dylib /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM
install_name_tool -id "@rpath/CLiteRTLM.framework/CLiteRTLM" /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM

# Simulator framework
mkdir -p /tmp/ios-arm64-simulator/CLiteRTLM.framework/{Headers,Modules}
cp /tmp/libLiteRTLMEngine-sim.dylib /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
install_name_tool -id "@rpath/CLiteRTLM.framework/CLiteRTLM" /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
```

Copy headers (from the LiteRT-LM source `c/` directory):

```bash
for DIR in /tmp/ios-arm64 /tmp/ios-arm64-simulator; do
  cp c/engine.h "$DIR/CLiteRTLM.framework/Headers/"
  cp c/litert_lm_logging.h "$DIR/CLiteRTLM.framework/Headers/"
done
```

Create `module.modulemap` and `Info.plist` (same for both):

```bash
for DIR in /tmp/ios-arm64 /tmp/ios-arm64-simulator; do
  cat > "$DIR/CLiteRTLM.framework/Modules/module.modulemap" <<EOF
framework module CLiteRTLM {
  header "engine.h"
  header "litert_lm_logging.h"
  export *
}
EOF

  cat > "$DIR/CLiteRTLM.framework/Info.plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleExecutable</key>
    <string>CLiteRTLM</string>
    <key>CFBundleIdentifier</key>
    <string>com.google.CLiteRTLM</string>
    <key>CFBundleInfoDictionaryVersion</key>
    <string>6.0</string>
    <key>CFBundleName</key>
    <string>CLiteRTLM</string>
    <key>CFBundlePackageType</key>
    <string>FMWK</string>
    <key>CFBundleVersion</key>
    <string>1.0</string>
    <key>MinimumOSVersion</key>
    <string>13.0</string>
</dict>
</plist>
EOF
done
```

Ad-hoc code sign:

```bash
codesign --force --sign - /tmp/ios-arm64/CLiteRTLM.framework/CLiteRTLM
codesign --force --sign - /tmp/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
```

Create the xcframework:

```bash
xcodebuild -create-xcframework \
  -framework /tmp/ios-arm64/CLiteRTLM.framework \
  -framework /tmp/ios-arm64-simulator/CLiteRTLM.framework \
  -output Frameworks/LiteRTLM.xcframework
```
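The rebuilt xcframework then has to be referenced by the Swift package as a binary target. The snippet below is only a hypothetical sketch of such a declaration; the target names and path are assumptions, not necessarily what this repo's actual `Package.swift` uses:

```swift
// swift-tools-version:5.9
// Hypothetical sketch: wiring a locally built xcframework in as a binary target.
import PackageDescription

let package = Package(
    name: "LiteRTLM-Swift",
    platforms: [.iOS(.v17)],
    products: [
        .library(name: "LiteRTLMSwift", targets: ["LiteRTLMSwift"])
    ],
    targets: [
        // Prebuilt (or locally rebuilt) C framework
        .binaryTarget(
            name: "CLiteRTLM",
            path: "Frameworks/LiteRTLM.xcframework"
        ),
        // Swift wrapper that links against the binary target
        .target(
            name: "LiteRTLMSwift",
            dependencies: ["CLiteRTLM"]
        )
    ]
)
```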
Verify the build:

```bash
# Check architectures
file Frameworks/LiteRTLM.xcframework/ios-arm64/CLiteRTLM.framework/CLiteRTLM
# -> Mach-O 64-bit dynamically linked shared library arm64
file Frameworks/LiteRTLM.xcframework/ios-arm64-simulator/CLiteRTLM.framework/CLiteRTLM
# -> Mach-O 64-bit dynamically linked shared library arm64 (simulator)

# Check exported symbols
nm -gU Frameworks/LiteRTLM.xcframework/ios-arm64/CLiteRTLM.framework/CLiteRTLM | grep litert_lm
# Should list all litert_lm_* public API functions
```

### Troubleshooting

| Issue | Solution |
|---|---|
| `no such package '@build_bazel_apple_support'` | Run `bazel sync` to fetch external dependencies |
| Xcode SDK not found | Ensure Xcode is selected: `sudo xcode-select -s /Applications/Xcode.app` |
| Build takes very long | The first build downloads ~10 GB of deps. Subsequent builds use the cache |
| Undefined symbols at link time | Make sure you're using the `//c:libLiteRTLMEngine.dylib` target, not `//c:engine` |
| Code signing errors | Use ad-hoc signing (`--sign -`) for development; real signing happens at app archive |

### License

MIT License. See LICENSE. The CLiteRTLM.xcframework contains code from Google's LiteRT-LM project, licensed under the Apache License 2.0.
This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available via the source link.