LeWorldModel: A Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

hackernews | 🔬 Research
#e2e #jepa #leworldmodel #review #machine-learning #joint-embedding
Source: hackernews · Summarized and analyzed by Genesis Park

Summary

LeWorldModel (LeWM) is the first stable JEPA (Joint-Embedding Predictive Architecture) that trains end-to-end from raw pixels, improving on the fragility of prior approaches without complex extra machinery. Using only two loss terms, it cuts the number of tunable loss hyperparameters from six to one, and with roughly 15M parameters it can be trained on a single GPU in a few hours. LeWM plans up to 48x faster than foundation-model-based world models, remains competitive across diverse 2D and 3D control tasks, and also succeeds at detecting physically implausible events.

Full text

Computer Science > Machine Learning

Title: LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
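The abstract's two-term objective can be sketched roughly as follows. The paper's exact regularizer form is not given here, so the mean/covariance penalty toward a standard normal below is an illustrative assumption, and `lam` stands in for the single tunable loss hyperparameter the abstract mentions; `prediction_loss`, `gaussian_regularizer`, and `lewm_loss` are hypothetical names.

```python
import numpy as np

def prediction_loss(z_pred, z_next):
    # Next-embedding prediction loss: MSE between the predictor's output
    # and the encoder's embedding of the actual next observation.
    return np.mean((z_pred - z_next) ** 2)

def gaussian_regularizer(z):
    # Assumed form of the Gaussian regularizer: penalize deviation of the
    # batch's empirical mean from 0 and empirical covariance from identity,
    # pushing embeddings toward N(0, I) to prevent representation collapse.
    mu = z.mean(axis=0)
    cov = np.cov(z, rowvar=False)
    return np.sum(mu ** 2) + np.sum((cov - np.eye(z.shape[1])) ** 2)

def lewm_loss(z_pred, z_next, z_batch, lam=1.0):
    # Total objective: prediction term plus lam * regularizer.
    # lam is the single tunable loss hyperparameter.
    return prediction_loss(z_pred, z_next) + lam * gaussian_regularizer(z_batch)
```

A degenerate (collapsed) encoder that maps everything to a constant would zero the prediction term but pay a large regularizer penalty, which is how the two terms together can avoid collapse without EMA targets or pre-trained encoders.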

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
