# Show HN: TRELLIS.2 image-to-3D running on Apple Silicon – no Nvidia GPU needed


## Summary

Microsoft's latest model, TRELLIS.2, has been ported to the PyTorch MPS backend on Apple Silicon, making image-to-3D conversion possible on a Mac without an Nvidia GPU. On an M4 Pro it generates a mesh with over 400K vertices in about 3.5 minutes, exported as OBJ and GLB files. The port implements sparse 3D convolution by building and caching a spatial hash over the active voxels, and performs mesh extraction with Python dictionaries in place of the CUDA hashmap.

## Full Text

Run TRELLIS.2 image-to-3D generation natively on Mac. This is a port of Microsoft's TRELLIS.2, a state-of-the-art image-to-3D model, from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required. It generates 400K+ vertex meshes from single images in ~3.5 minutes on an M4 Pro. Output includes vertex-colored OBJ and GLB files ready for use in 3D applications.

### Requirements

- macOS on Apple Silicon (M1 or later)
- Python 3.11+
- 24GB+ unified memory recommended (the 4B model is large)
- ~15GB disk space for model weights (downloaded on first run)

### Quick Start

```bash
# Clone this repo
git clone https://github.com/shivampkumar/trellis-mac.git
cd trellis-mac

# Log into HuggingFace (needed for gated model weights)
hf auth login

# Request access to these gated models (usually instant approval):
#   https://huggingface.co/facebook/dinov3-vitl16-pretrain-lvd1689m
#   https://huggingface.co/briaai/RMBG-2.0

# Run setup (creates venv, installs deps, clones & patches TRELLIS.2)
bash setup.sh

# Activate the environment
source .venv/bin/activate

# Generate a 3D model from an image
python generate.py path/to/image.png
```

Output files are saved to the current directory (or use `--output` to specify a path).

### Usage

```bash
# Basic usage
python generate.py photo.png

# With options
python generate.py photo.png --seed 123 --output my_model --pipeline-type 512

# All options
python generate.py --help
```

| Option | Default | Description |
|---|---|---|
| `--seed` | `42` | Random seed for generation |
| `--output` | `output_3d` | Output filename (without extension) |
| `--pipeline-type` | `512` | Pipeline resolution: `512`, `1024`, `1024_cascade` |

### How It Works

TRELLIS.2 depends on several CUDA-only libraries. This port replaces them with pure-PyTorch and pure-Python alternatives:

| Original (CUDA) | Replacement | Purpose |
|---|---|---|
| `flex_gemm` | `backends/conv_none.py` | Sparse 3D convolution via gather-scatter |
| `o_voxel._C` hashmap | `backends/mesh_extract.py` | Mesh extraction from dual voxel grid |
| `flash_attn` | PyTorch SDPA | Scaled dot-product attention for sparse transformers |
| `cumesh` | Stub (graceful skip) | Hole filling, mesh simplification |
| `nvdiffrast` | Stub | Differentiable rasterization (texture export) |

Additionally, all hardcoded `.cuda()` calls throughout the codebase were patched to use the active device instead (a minimal example appears at the end of this section).

**Sparse 3D Convolution** (`backends/conv_none.py`): Implements submanifold sparse convolution by building a spatial hash of active voxels, gathering neighbor features for each kernel position, applying weights via matrix multiplication, and scatter-adding the results back. Neighbor maps are cached per tensor to avoid redundant computation. (Sketched below.)

**Mesh Extraction** (`backends/mesh_extract.py`): Reimplements `flexible_dual_grid_to_mesh` using Python dictionaries instead of CUDA hashmap operations. Builds a coordinate-to-index lookup table, finds the connected voxels for each edge, and triangulates quads using normal-alignment heuristics. (Sketched below.)

**Attention** (patched `full_attn.py`): Adds an SDPA backend to the sparse attention module. It pads variable-length sequences into batches, runs `torch.nn.functional.scaled_dot_product_attention`, then unpads the results. (Sketched below.)
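To make the gather-scatter approach concrete, here is a minimal sketch of a submanifold sparse 3D convolution, assuming an unbatched `(N, 3)` coordinate layout; the function name `sparse_conv3d` and the tensor shapes are illustrative, not the port's actual API:

```python
import torch

def sparse_conv3d(coords, feats, weight):
    """Submanifold sparse 3D conv sketch: outputs only at active voxels.

    coords: (N, 3) integer tensor of active voxel coordinates
    feats:  (N, C_in) features at those voxels
    weight: (3, 3, 3, C_in, C_out) dense 3x3x3 kernel
    """
    # Spatial hash: coordinate tuple -> row index.
    table = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = feats.new_zeros(len(feats), weight.shape[-1])
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                # Gather: for each active voxel, look up the neighbor at this offset.
                src, dst = [], []
                for i, (x, y, z) in enumerate(coords.tolist()):
                    j = table.get((x + dx, y + dy, z + dz))
                    if j is not None:
                        src.append(j)
                        dst.append(i)
                if not src:
                    continue
                # Apply this kernel tap as one dense matmul ...
                contrib = feats[torch.tensor(src)] @ weight[dx + 1, dy + 1, dz + 1]
                # ... and scatter-add the results back to the output rows.
                out.index_add_(0, torch.tensor(dst), contrib)
    return out
```

Because the `(src, dst)` neighbor maps depend only on the coordinates, caching them per tensor lets every layer that shares the same sparsity pattern skip the Python-level neighbor search.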
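The dictionary-based lookup that stands in for the CUDA hashmap can be sketched as below. The `dual_grid_quads` helper and the single `+z` edge direction are simplifying assumptions; the real extractor handles all three axes and then splits each quad into triangles with its normal-alignment heuristic:

```python
import numpy as np

def dual_grid_quads(voxel_coords):
    """Sketch of dual-grid face extraction with a plain Python dict.

    voxel_coords: (N, 3) integer array of active voxel coordinates.
    Returns an (M, 4) array of voxel indices, one quad per +z edge.
    """
    # Coordinate-to-index lookup table: the CUDA hashmap replacement.
    index = {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}
    # The four voxels surrounding a +z edge differ by these xy offsets.
    ring = [(0, 0), (1, 0), (1, 1), (0, 1)]
    quads = []
    for (x, y, z) in index:
        corners = [index.get((x + dx, y + dy, z)) for dx, dy in ring]
        # Emit a face only when all four voxels around the edge are active.
        if all(c is not None for c in corners):
            quads.append(corners)
    return np.asarray(quads)
```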
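The pad/attend/unpad path of the SDPA backend can be sketched as follows, assuming each sequence arrives as an `(L_i, H, D)` tensor; `sdpa_varlen` is a made-up name for illustration:

```python
import torch
import torch.nn.functional as F

def sdpa_varlen(qs, ks, vs):
    """Pad variable-length sequences into one batch, run SDPA, unpad.

    qs, ks, vs: lists of (L_i, H, D) tensors, one entry per sequence.
    """
    lengths = [q.shape[0] for q in qs]
    L = max(lengths)

    def pad(xs):
        # Zero-pad the length dim, then move heads first: (B, H, L, D).
        return torch.stack(
            [F.pad(x, (0, 0, 0, 0, 0, L - x.shape[0])) for x in xs]
        ).transpose(1, 2)

    q, k, v = pad(qs), pad(ks), pad(vs)
    # Boolean mask so padded key positions get zero attention weight.
    mask = torch.zeros(len(qs), 1, 1, L, dtype=torch.bool, device=q.device)
    for b, n in enumerate(lengths):
        mask[b, ..., :n] = True
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    # Unpad: back to per-sequence (L_i, H, D) tensors.
    return [out[b, :, :n].transpose(0, 1) for b, n in enumerate(lengths)]
```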
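Finally, the `.cuda()` patching mentioned above amounts to resolving the device once and routing tensors through it; a minimal sketch:

```python
import torch

# Resolve the active device once instead of assuming CUDA.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

x = torch.randn(8, 3)
x = x.to(device)  # before the patch: x.cuda()
```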
### Performance

Benchmarks on an M4 Pro (24GB), pipeline type `512`:

| Stage | Time |
|---|---|
| Model loading | ~45s |
| Image preprocessing | ~5s |
| Sparse structure sampling | ~15s |
| Shape SLat sampling | ~90s |
| Texture SLat sampling | ~50s |
| Mesh decoding | ~30s |
| **Total** | **~3.5 min** |

Memory usage peaks at around 18GB of unified memory during generation.

### Limitations

- **No texture export**: texture baking requires `nvdiffrast` (a CUDA-only differentiable rasterizer). Meshes export with vertex colors only.
- **Hole filling disabled**: mesh hole filling requires `cumesh` (CUDA). Meshes may have small holes.
- **Slower than CUDA**: the pure-PyTorch sparse convolution is ~10x slower than the CUDA `flex_gemm` kernel and is the main bottleneck.
- **No training support**: inference only.

### License

The porting code in this repository (backends, patches, scripts) is released under the MIT License. Upstream model weights are subject to their own licenses:

- TRELLIS.2: MIT License
- DINOv3: Meta custom license (gated; review before commercial use)
- RMBG-2.0: CC BY-NC 4.0 (non-commercial; commercial use requires a license from BRIA)
