Show HN: Seamless – Content-addressed computation caching for Python and bash
hackernews
📦 Open source
#bash
#caching
#devtool
#hn
#python
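Source: hackernews · Summarized and analyzed by Genesis Park
Summary
Seamless is a tool that provides caching, which avoids redundant computation by having you declare the inputs and outputs of a computational pipeline, and remote deployment without code changes. It supports both Python functions and command-line code. Because work is identified by checksum, identical work returns its result immediately, and researchers can share results efficiently. Version 1.x, built on a new architecture, has been released and can be installed with pip as seamless-suite.
Body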
Seamless: define your computation once, then cache it, scale it, share it.

Most computational pipelines are already reproducible: the same inputs produce the same outputs. Wrap your code as a step with declared inputs and outputs, and Seamless gives you caching (never recompute what you've already computed) and remote deployment (run on a cluster without changing your code).

Remote execution also acts as a reproducibility test: if your wrapped code runs on a clean worker and produces the same result, it is reproducible. If not, Seamless has helped you find the problem, whether it's a missing input, an undeclared dependency, or a sensitivity to platform or library versions.

Seamless wraps both Python and command-line code. In Python, `direct` runs a function immediately; `delayed` records the function for deferred or remote execution. From the shell, `seamless-run` wraps any command as a Seamless transformation, with no Python required. In both cases, the transformation is identified by the checksum of its code and inputs: identical work always produces the same identity.

Sharing works at two levels. The lightweight path is to exchange checksums: if two researchers have computed the same transformation, they already have the same result, so no data transfer is needed. The concrete path is to share the `seamless.db` file, a portable SQLite database that maps transformation checksums to result checksums. Copy it to a colleague, a cluster, or a publication archive, and every cached result travels with it. Combined, these two paths let a lab build up a shared computation cache that grows over time and never recomputes what anyone has already computed.

This is Seamless 1.x, running on a new code architecture. Seamless 0.x offered an interactive, notebook-first workflow experience with reactive cells, Jupyter widget integration, filesystem mounting, and collaborative web interfaces. These features are being ported to the new architecture.
If your work is primarily interactive or exploratory, you can use the legacy version today, or watch this space for updates.

    pip install seamless-suite

This installs all standard Seamless components. For a minimal install, the core user-facing packages are:

| Package | Import | Provides |
|---|---|---|
| seamless-core | `import seamless` | `Checksum`, `Buffer`, cell types, buffer cache |
| seamless-transformer | `from seamless.transformer import direct, delayed, parallel` | `direct`, `delayed`, `parallel`, `parallel_async`, `TransformationList`, `seamless-run`, `seamless-upload`, `seamless-download` |
| seamless-config | `import seamless.config` | `seamless.config.init()`, `seamless.config.set_nparallel()`, `seamless-init` |

All source code is on GitHub: seamless-core, seamless-transformer, seamless-config, seamless-remote, seamless-jobserver, seamless-dask

In Python:

    from seamless.transformer import direct

    @direct
    def add(a, b):
        return a + b

    add(2, 3)  # runs the function, returns 5
    add(2, 3)  # cache hit: returns 5 instantly

From the shell:

    export SEAMLESS_CACHE=~/.seamless/cache   # global persistent caching
    seamless-run 'seq 1 10 | tac && sleep 5'  # runs, caches result
    seamless-run 'seq 1 10 | tac && sleep 5'  # cache hit: instant

Full documentation, including getting-started guides, cluster setup, remote execution, and reference API, is at:

Seamless includes an agent skill (`seamless-adoption`) for AI coding assistants. It guides assessment of codebase fit and planning/executing ports, covering both the Python face (`direct`/`delayed`) and the Unix face (`seamless-run`). See skills/seamless-adoption/SKILL.md.
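To make the identity rule described earlier more concrete (a transformation is identified by the checksum of its code and inputs, and a SQLite file maps transformation checksums to result checksums), here is a minimal standard-library sketch of the general technique. It is not Seamless's implementation: the SHA-256 hash, the JSON serialization, the table layout, and the helper names `checksum`, `open_cache`, `transformation_checksum`, and `run_cached` are all assumptions made for illustration, and the real `seamless.db` schema may look quite different.

```python
import hashlib
import json
import sqlite3


def checksum(data: bytes) -> str:
    """Content address of a byte buffer (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open a toy cache database. The schema is hypothetical, not that of seamless.db."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "tf_checksum TEXT PRIMARY KEY, result_checksum TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buffers ("
        "checksum TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def transformation_checksum(code: str, func_name: str, inputs: dict) -> str:
    """Identity of a unit of work: depends only on the code and the input values."""
    payload = {
        "code": checksum(code.encode()),
        "entry": func_name,
        "inputs": {
            name: checksum(json.dumps(value, sort_keys=True).encode())
            for name, value in inputs.items()
        },
    }
    return checksum(json.dumps(payload, sort_keys=True).encode())


def run_cached(conn, code, func_name, inputs):
    """Return (result, cache_hit). The code runs only when its identity is unknown."""
    tf = transformation_checksum(code, func_name, inputs)
    row = conn.execute(
        "SELECT b.data FROM results r "
        "JOIN buffers b ON b.checksum = r.result_checksum "
        "WHERE r.tf_checksum = ?",
        (tf,),
    ).fetchone()
    if row is not None:
        return json.loads(row[0]), True

    namespace = {}
    exec(code, namespace)  # sketch only: run trusted code, never untrusted input
    result = namespace[func_name](**inputs)

    data = json.dumps(result, sort_keys=True).encode()
    result_cs = checksum(data)
    with conn:  # commit both rows in one transaction
        conn.execute("INSERT OR REPLACE INTO buffers VALUES (?, ?)", (result_cs, data))
        conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)", (tf, result_cs))
    return result, False


ADD_SOURCE = "def add(a, b):\n    return a + b\n"

cache = open_cache()
print(run_cached(cache, ADD_SOURCE, "add", {"a": 2, "b": 3}))  # computed: (5, False)
print(run_cached(cache, ADD_SOURCE, "add", {"a": 2, "b": 3}))  # cache hit: (5, True)
```

Because the lookup key depends only on content, anyone who opens a copy of the same database file with the same code and inputs would get a cache hit without running anything, which is the idea behind the sharing paths described earlier. For simplicity the sketch keeps result buffers in the same file as the checksum mapping; that is a choice of this example, not a claim about how Seamless stores buffers.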
This analysis was written by the Genesis Park editorial team with the help of AI. The original can be found via the source link.