Show HN: Quantum-PULSE – A compress-then-encrypt vault for LLM training data

hackernews | 🔬 Research
#llm #rest api #review #demo #data compression #encryption #pipeline
Original source: hackernews · Summary and analysis by Genesis Park

Summary

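The open-source project QUANTUM-PULSE is a specialized vault for storing LLM training data securely. It combines compression with a cross-corpus Zstd dictionary, AES-256-GCM encryption, and SHA3-256 Merkle tree integrity verification in a single pipeline. Unlike compression-only tools, it includes both security features. In the project's own benchmark it reaches a 95.51× compression ratio (about 52% better than gzip-9) while running roughly 3× faster than zstd-L22 (553 ms vs 1,644 ms), which makes it the fastest of the high-compression pipelines tested and the only one with encryption and integrity checks. Users can seal data through a CLI command or the REST API and store it locally or in cloud backends such as AWS S3 and Google Cloud Storage (GCS).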

Body

Extreme-density encrypted data vault for LLM training pipelines.

MsgPack + Zstd-L22 + corpus dictionary + AES-256-GCM + SHA3-256 Merkle trees + REST API

pip install quantum-pulse → qp seal dataset.json --offline → 39× compression + AES-256-GCM

    pip install quantum-pulse

Or clone for the full server + Docker setup:

    git clone https://github.com/Naveenub/quantum-pulse.git
    cd quantum-pulse
    cp .env.example .env
    docker-compose up -d

QUANTUM-PULSE is an open-source compress-then-encrypt vault built specifically for LLM training data. Every blob is compressed with a cross-corpus Zstd dictionary, encrypted with AES-256-GCM, integrity-verified with a SHA3-256 Merkle tree, and stored in your chosen backend, all through a single API call or CLI command.

General-purpose compressors such as gzip, zstd, and brotli only compress. QUANTUM-PULSE also:

- Trains a shared dictionary across your corpus so every shard benefits from every other shard's patterns
- Encrypts each blob with a per-record key derived via PBKDF2 + HKDF
- Verifies integrity via SHA3-256 Merkle trees on unseal, so corrupted data fails verification instead of passing silently
- Groups related shards into MasterPulses with cross-shard deduplication
- Exposes a virtual mount so training scripts read vaulted data without ever decrypting to disk

Cryptographic tools earn trust through scrutiny, not marketing. QUANTUM-PULSE is open source because:

- Crypto needs public review: before anyone puts real training data through this pipeline, the implementation should be auditable. Security through obscurity is not security.
- Community builds better benchmarks: ML engineers working with real datasets will find edge cases no synthetic corpus can simulate. Submit yours to benchmarks/community/.
- Adoption precedes monetization: if this solves a real problem at scale, a hosted version becomes a natural next step. Demand should be proven, not assumed.

Self-hosting a secure vault means managing keys, uptime, and backups yourself.
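The Merkle-tree integrity check named in the feature list can be illustrated with a small standard-library example. This is a minimal sketch, not QUANTUM-PULSE's actual implementation: the chunk size, the `0x00`/`0x01` domain-separation prefixes, the rule of promoting an unpaired node, and the function names `split_chunks`, `merkle_root`, and `verify` are all assumptions made for illustration.

```python
import hashlib
import hmac


def _sha3(data: bytes) -> bytes:
    """SHA3-256 digest of the given bytes."""
    return hashlib.sha3_256(data).digest()


def split_chunks(blob: bytes, size: int = 4096) -> list[bytes]:
    """Split a blob into fixed-size chunks (the chunk size is an assumption)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def merkle_root(chunks: list[bytes]) -> bytes:
    """Compute a SHA3-256 Merkle root over the chunks.

    Leaves are hashed with a 0x00 prefix and inner nodes with a 0x01 prefix
    (hypothetical domain separation). An unpaired node is promoted unchanged
    to the next level.
    """
    if not chunks:
        return _sha3(b"\x00")
    level = [_sha3(b"\x00" + chunk) for chunk in chunks]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_sha3(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def verify(chunks: list[bytes], expected_root: bytes) -> bool:
    """Recompute the root and compare it in constant time."""
    return hmac.compare_digest(merkle_root(chunks), expected_root)
```

Changing any byte in any chunk changes the recomputed root, so `verify` returns `False` and the corruption surfaces at read time instead of reaching the training job.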
A managed version of QUANTUM-PULSE is in planning:

- Zero-ops: no MongoDB to run, no passphrase rotation to script
- Metered billing: pay per GiB sealed, not per seat
- Compliance-ready: audit log export, key rotation SLA, SOC 2 roadmap
- Same open protocol: data sealed via the API can always be unsealed with the self-hosted version; no lock-in

Interested in early access? Open a GitHub Discussion or star the repo to signal demand.

| Algorithm | Ratio | vs gzip | Time | Enc | Int |
|---|---|---|---|---|---|
| snappy | 12.03× | −80.9% | 0.7 ms | ✗ | ✗ |
| lz4 | 33.80× | −46.2% | 0.4 ms | ✗ | ✗ |
| gzip-9 | 62.86× | baseline | 9.1 ms | ✗ | ✗ |
| zstd-L3 | 76.19× | +21.2% | 0.7 ms | ✗ | ✗ |
| QUANTUM-PULSE (fastest secure) | 95.51× | +51.9% | 553.4 ms | ✓ | ✓ |
| zstd-L22+MsgPack | 96.60× | +53.7% | 1173.4 ms | ✗ | ✗ |
| zstd-L22 | 99.58× | +58.4% | 1644.3 ms | ✗ | ✗ |
| brotli-11 | 112.95× | +79.7% | 1354.2 ms | ✗ | ✗ |

Enc = AES-256-GCM encryption. Int = SHA3-256 Merkle integrity.

| Claim | Evidence |
|---|---|
| Fastest high-compression pipeline with security | 553 ms vs 1173 ms (zstd+mp), 1354 ms (brotli), 1644 ms (zstd-L22) |
| Only option with both encryption + integrity | Every other row shows ✗/✗ |
| 3× faster than zstd-L22 vanilla | Dictionary eliminates per-shard pattern re-discovery |
| Brotli-11 wins raw ratio | 112× vs 95×, but 2.4× slower, no security at all |

Reproduce: python scripts/benchmark_compare.py
Full details: BENCHMARKS.md

CLI quick start (offline):

    pip install quantum-pulse

    # Generate a strong passphrase
    qp keygen

    # Seal a file
    qp seal dataset.json --passphrase "yourpassphrase16+" --offline
    # → dataset.qp (AES-256-GCM encrypted · SHA3-256 Merkle signed)

    # Recover it, byte-perfect
    qp unseal dataset.qp --passphrase "yourpassphrase16+" --offline --output recovered.json

    # Benchmark
    qp benchmark --passphrase "yourpassphrase16+"

Full server + Docker:

    git clone https://github.com/Naveenub/quantum-pulse.git
    cd quantum-pulse
    cp .env.example .env    # set QUANTUM_PASSPHRASE and QUANTUM_API_KEYS
    docker-compose up -d

    # Seal via API
    curl -X POST http://localhost:8747/pulse/seal \
      -H "X-API-Key: my-api-key" \
      -H "Content-Type: application/json" \
      -d '{"payload": {"text": "hello world", "tokens": [1,2,3]}}'

    # Seal via CLI
    qp seal dataset.json --tag version=v1
    qp unseal

AWS S3 backend:

    pip install quantum-pulse aioboto3

    # .env
    QUANTUM_STORAGE_BACKEND=s3
    QUANTUM_S3_BUCKET=my-training-data-bucket
    QUANTUM_S3_REGION=us-east-1
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY or instance profile

    docker-compose up -d

MinIO / LocalStack (local dev):

    QUANTUM_STORAGE_BACKEND=s3
    QUANTUM_S3_BUCKET=my-bucket
    QUANTUM_S3_ENDPOINT_URL=http://localhost:9000

Google Cloud Storage backend:

    pip install quantum-pulse gcloud-aio-storage aiohttp

    # .env
    QUANTUM_STORAGE_BACKEND=gcs
    QUANTUM_GCS_BUCKET=my-gcs-bucket
    QUANTUM_GCS_SERVICE_FILE=/path/to/service-account.json
    # or use Application Default Credentials

CLI reference:

    qp keygen                        # generate strong passphrase
    qp seal dataset.json --tag v1    # seal (needs MongoDB)
    qp seal dataset.json --passphrase "p1
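The `curl` call to `/pulse/seal` above can also be issued from Python. The sketch below uses only the standard library and mirrors the URL, headers, and `{"payload": ...}` body shown in that command. The function names `build_seal_request` and `seal` are hypothetical, and the assumption that the server answers with a JSON body is not confirmed by the source.

```python
import json
import urllib.request


def build_seal_request(
    payload: dict,
    api_key: str,
    base_url: str = "http://localhost:8747",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /pulse/seal endpoint."""
    body = json.dumps({"payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/pulse/seal",
        data=body,
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def seal(payload: dict, api_key: str, timeout: float = 10.0) -> dict:
    """Send the seal request and decode the reply.

    Assumes the server replies with JSON; the response schema is not
    documented in the excerpt, so the result is returned as a plain dict.
    """
    request = build_seal_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server started with docker-compose):
# result = seal({"text": "hello world", "tokens": [1, 2, 3]}, "my-api-key")
```

Separating request construction from sending keeps the example testable without a running server; the API key must match one of the values configured in `QUANTUM_API_KEYS`.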

This analysis was written by the Genesis Park editorial team with the help of AI. The original post is available through the source link.
