Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

hackernews | 📦 open source
#apple silicon #fine-tuning #gemma #lora #multimodal #hardware/semiconductors
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

An open-source toolkit optimized for Apple Silicon (MPS) lets you fine-tune Google's Gemma models on text, images, and audio directly on a Mac, with no NVIDIA GPU or expensive H100 server. The tool supports image-captioning and speech-recognition training from local CSV files, and it can also stream terabytes of training data from cloud sources such as GCS or BigQuery, sidestepping the limited storage of the device. It is well suited to training domain-specific vision and audio models for fields such as medicine and law, and to building on-device AI pipelines where private data is processed only locally.

Body

Fine-tune Gemma on text, images, and audio — on your Mac, on data that doesn't fit on your Mac.

- 🖼️ Image + text LoRA — captioning and VQA on local CSV.
- 🎙️ Audio + text LoRA — the only Apple-Silicon-native path that does this.
- 📝 Text-only LoRA — instruction or completion on CSV.
- ☁️ Stream from GCS / BigQuery — train on terabytes without filling your SSD.
- 🍎 Runs on Apple Silicon — MPS-native, no NVIDIA box required.

Source: github.com/mattmireles/gemma-tuner-multimodal (public).

| | This | MLX-LM | Unsloth | axolotl |
|---|---|---|---|---|
| Fine-tune Gemma (text-only CSV) | ✅ | ✅ | ✅ | ✅ |
| Fine-tune Gemma image + text (caption / VQA CSV) | ✅ | | | |
| Fine-tune Gemma audio + text | ✅ | ❌ | ❌ | |
| Runs on Apple Silicon (MPS) | ✅ | ✅ | ❌ | ❌ |
| Stream training data from cloud | ✅ | ❌ | ❌ | |
| No NVIDIA GPU required | ✅ | ✅ | ❌ | ❌ |

If you want to fine-tune Gemma on text, images, or audio without renting an H100 or copying a terabyte of data to your laptop, this is the only toolkit that does all three modalities on Apple Silicon.

Text-only fine-tuning (instruction or completion on CSV) is supported: set `modality = text` in your profile and use local CSV splits under `data/datasets//`. See Text-only fine-tuning below.

Image + text fine-tuning (captioning or VQA on local CSV) uses `modality = image`, `image_sub_mode`, and `image_token_budget`; see Image fine-tuning below. v1 is local CSV only (same constraint as text-only).

Under the hood: Hugging Face Gemma checkpoints + PEFT LoRA, supervised fine-tuning in `gemma_tuner/models/gemma/finetune.py`, exported as a merged HF / SafeTensors tree by `gemma_tuner/scripts/export.py` (a sketch of this pattern follows below). For Core ML conversion and GGUF inference tooling, see README/guides/README.md — this repo's training path is Gemma-only by design.

Deeper reading: README/guides/README.md · README/specifications/Gemma3n.md

Use cases:

- Domain-specific ASR — fine-tune on medical dictation, legal depositions, call-center recordings, or any field where off-the-shelf Whisper / Gemma mishears the jargon.
- Domain-specific vision — captioning or VQA on receipts, charts, screenshots, manufacturing defects, medical imagery — any visual domain where generic models hallucinate.
- Document & screen understanding — train on screenshot → structured-output pairs for UI agents, OCR-adjacent pipelines, or chart QA.
- Accent, dialect, and low-resource language adaptation — adapt a base Gemma model to underrepresented voices and languages with your own labeled audio.
- Multimodal assistants — extend Gemma's text reasoning with image or audio grounding for transcription, captioning, and Q&A pipelines.
- Private, on-device pipelines — train and run entirely on your Mac. Data never leaves the machine; weights never touch a third-party API.

If your data lives in GCS or BigQuery, you can do all of this on a laptop without copying terabytes locally — the dataloader streams shards on demand (a streaming sketch also follows below).
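To make the training path concrete, here is a minimal sketch of the general pattern the post describes (Hugging Face checkpoint + PEFT LoRA with MPS-first device selection). This is not the repo's actual `finetune.py`; the LoRA rank, alpha, and `target_modules` below are illustrative assumptions.

```python
# Minimal sketch of the HF-checkpoint + PEFT-LoRA pattern the post describes.
# NOT the repo's gemma_tuner/models/gemma/finetune.py; the LoRA hyperparameters
# and target_modules are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# MPS -> CUDA -> CPU fallback, mirroring what utils/device.py is said to do.
device = (
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)

base = "google/gemma-3n-E2B-it"  # one of the checkpoints the default config ships
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Attach LoRA adapters instead of training all ~2B base weights.
lora = LoraConfig(
    r=16,                                                    # assumed rank
    lora_alpha=32,                                           # assumed alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora).to(device)
model.print_trainable_parameters()  # only a small fraction is trainable
```

A supervised fine-tuning loop would then consume the CSV-derived batches; PEFT's `merge_and_unload()` is the usual mechanism behind the kind of "merged HF / SafeTensors" export the post mentions, though the repo's exact export code is not shown.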
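And here is a sketch of what on-demand shard streaming can look like. The post does not show the real dataloader, so the use of fsspec/gcsfs, the `gs://` bucket layout, and the column names are all assumptions for illustration.

```python
# Hedged sketch of streaming CSV shards from GCS instead of copying them to
# disk first. fsspec/gcsfs, the bucket path, and the column names are
# illustrative assumptions, not the repo's actual dataloader.
import csv

import fsspec  # pip install fsspec gcsfs
from torch.utils.data import IterableDataset


class StreamingCsvShards(IterableDataset):
    """Yield rows one shard at a time, so local SSD usage stays near zero."""

    def __init__(self, shard_urls):
        self.shard_urls = shard_urls

    def __iter__(self):
        for url in self.shard_urls:
            with fsspec.open(url, "rt") as f:  # lazily opens gs:// objects
                for row in csv.DictReader(f):
                    yield row["text"], row["label"]  # hypothetical columns


# Assumed shard layout; substitute your own bucket and naming scheme.
shards = [f"gs://my-bucket/train-{i:05d}.csv" for i in range(1000)]
dataset = StreamingCsvShards(shards)
```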
Training targets Gemma multimodal (text + image + audio) checkpoints loaded via `base_model` in `config/config.ini` and routed to `gemma_tuner/models/gemma/finetune.py`. The default file ships these `[model:…]` entries (LoRA on top of the Hub weights):

| Model key (in `config/config.ini`) | Hugging Face `base_model` | Notes |
|---|---|---|
| gemma-4-e2b-it | google/gemma-4-E2B-it | Gemma 4 instruct, ~2B — requires requirements/requirements-gemma4.txt (see Installation) |
| gemma-4-e4b-it | google/gemma-4-E4B-it | Gemma 4 instruct, ~4B — requires Gemma 4 stack |
| gemma-4-e2b | google/gemma-4-E2B | Gemma 4 base — requires Gemma 4 stack |
| gemma-4-e4b | google/gemma-4-E4B | Gemma 4 base — requires Gemma 4 stack |
| gemma-3n-e2b-it | google/gemma-3n-E2B-it | Gemma 3n instruct, ~2B — default on the base pip install -e . pin |
| gemma-3n-e4b-it | google/gemma-3n-E4B-it | Gemma 3n instruct, ~4B |

Add your own `[model:your-name]` section with `group = gemma` and a compatible `base_model` if you need another any-to-any Gemma 3n / Gemma 4 E2B–E4B checkpoint (see the example sketch after the component table below). Larger Gemma 4 weights on Hugging Face (for example 26B or 31B class) use a different Transformers architecture than this trainer's AutoModelForCausalLM audio path — they are not supported here yet. Wizard time and memory hints come from `gemma_tuner/wizard/base.py` (ModelSpecs).

| Piece | Role |
|---|---|
| `gemma_tuner/cli_typer.py` | Canonical CLI (`gemma-macos-tuner`). Imports core.bootstrap early so MPS env vars exist before Torch wakes up. |
| `gemma_tuner/core/ops.py` | Dispatches prepare → scripts.prepare_data, finetune → scripts.finetune, evaluate → scripts.evaluate, export → scripts.export. |
| `gemma_tuner/scripts/finetune.py` | Router: only models whose name contains gemma → `gemma_tuner/models/gemma/finetune.py`. |
| `gemma_tuner/utils/device.py` | MPS → CUDA → CPU selection, sync helpers, memory hints. |
| `gemma_tuner/utils/dataset_utils.py` | CSV loads, patches, blacklist/protection semantics. |
| `gemma_tuner/wizard/` | Questionary + Rich UI; training is spawned with `python -m gemma_tuner.main finetune …` from the repo root (see gemma_tuner/ |
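Since the post describes the `[model:…]` schema only in prose, here is a small example of what a custom entry could look like, read back with Python's standard `configparser`. Only `group` and `base_model` are keys the post actually names; the section name is hypothetical.

```python
# Hedged sketch of a custom [model:...] entry for config/config.ini.
# Only `group` and `base_model` are keys the post names; the section
# name "my-gemma" is hypothetical.
import configparser

EXAMPLE_SECTION = """
[model:my-gemma]
group = gemma
base_model = google/gemma-3n-E4B-it
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE_SECTION)

section = cfg["model:my-gemma"]
assert section["group"] == "gemma"
print(section["base_model"])  # -> google/gemma-3n-E4B-it
```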

This analysis was produced by the Genesis Park editorial team with the help of AI. The original post is available via the source link.
