A compression gateway cut Codex's input token costs by 49.5% (benchmark).

Source: hackernews · Summarized and analyzed by Genesis Park

Summary

In a benchmark on the same codebase with the GPT-5.4 model, Codex routed through Edgee's compression gateway used 49.5% fewer fresh input tokens than Codex running on its own, cutting total cost by 35.6%. By preventing redundant retransmission of unneeded context, it raised the cache hit rate from 76.1% to 85.4% with no drop in output quality. The result is less token waste in agent sessions and significantly better cost efficiency, without developers changing how they work.

Full Text

Stop Paying Codex to Re-Read Context

Codex is excellent right up until it starts dragging around too much context. That is where the waste shows up: more input tokens, more spend, and less room to keep going without friction. We wanted to measure what happens when you put Edgee's compression layer in front of Codex. Same repo. Same model. Same benchmark flow. One run with Codex alone, one run with Codex routed through Edgee. The difference was not subtle.

The Benchmark

We ran a controlled benchmark using our open-source compression-lab. The setup was simple:

- Two isolated Codex sessions on the same codebase
- One baseline run with plain Codex
- One run with Codex routed through Edgee's compression gateway
- Same benchmark workflow and task sequence
- Same model: gpt-5.4

The goal was to compare what it costs Codex to do the same kind of work when context is compressed before it hits the model.

The Results

| Metric | Codex | Codex + Edgee | Improvement |
|---|---|---|---|
| Input tokens | 1,136,974 | 573,881 | −49.5% |
| Input cached tokens | 3,622,656 | 3,358,848 | −7.28% |
| Total cost | $4.0024 | $2.5784 | −35.6% |
| Cache hit rate | 76.1% | 85.4% | +9.3 points |

Codex + Edgee cut input token usage almost in half. That matters because fresh input is the expensive part of an agent session. It is the cost of hauling the full conversation and tool context back into the model over and over again. Edgee reduces that overhead before the request is sent, so Codex spends less budget re-reading old context and more budget doing useful work. The result is straightforward: lower spend, smaller prompts, and a much more efficient session.

Why Edgee Wins

Codex alone consumed 1.14 million fresh tokens in this benchmark. Codex + Edgee consumed 574 thousand. That is a reduction of 563,093 fresh tokens in a single session. This is the key point: Edgee is not trying to make Codex "shorter." It is making Codex carry less redundant context into each request.
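The headline figures all follow from the raw counts in the results table; a quick sanity check in plain Python, using only the numbers published above:

```python
# Figures from the benchmark table (baseline Codex vs. Codex + Edgee).
fresh_tokens = {"codex": 1_136_974, "codex+edgee": 573_881}
cached_tokens = {"codex": 3_622_656, "codex+edgee": 3_358_848}
cost_usd = {"codex": 4.0024, "codex+edgee": 2.5784}

# Fresh-input and total-cost reductions.
input_cut = 1 - fresh_tokens["codex+edgee"] / fresh_tokens["codex"]
cost_cut = 1 - cost_usd["codex+edgee"] / cost_usd["codex"]

# Cache hit rate = cached input / (cached + fresh input).
def hit_rate(run: str) -> float:
    return cached_tokens[run] / (cached_tokens[run] + fresh_tokens[run])

print(f"fresh input: -{input_cut:.1%}")  # -49.5%
print(f"total cost:  -{cost_cut:.1%}")   # -35.6%
print(f"cache hits:  {hit_rate('codex'):.1%} -> {hit_rate('codex+edgee'):.1%}")  # 76.1% -> 85.4%
```

The per-session saving is likewise just $4.0024 − $2.5784 = $1.4240, which is where the "$1.42 saved" figure comes from.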
The model still produces full answers. In fact, the Edgee run generated slightly more output tokens than the baseline, which is a useful signal that compression is not truncating behavior or starving the model of context. So the tradeoff here is not quality for savings. It is redundancy for savings.

More Frugal, Not Just Cheaper

The cost result is already strong: 35.6% cheaper per session, with $1.42 saved on this run alone. But the more important number is the input footprint. Edgee reduced fresh input tokens by 49.5%. That means the model had to ingest dramatically less repeated context to get through the same benchmark flow. This is what frugality looks like in practice:

- fewer fresh tokens sent to the API
- a higher cache hit rate
- less context bloat over time
- lower cost without an obvious quality penalty

The cache numbers reinforce that. Codex alone had a 76.1% cache hit rate. Codex + Edgee reached 85.4%. When a larger share of total context is served from cache instead of being resent as fresh input, the economics get better fast.

What "More Performant" Means Here

We are not using "performance" loosely here; we mean it in the way developers actually care about: how efficiently the system completes work. In this benchmark, Codex + Edgee was more performant because it delivered the same benchmark work pattern with:

- about half the fresh tokens
- substantially better cache efficiency
- materially lower cost

That is better performance per unit of spend. We did not measure latency in this run, so this is not a claim about response-time speed. It is a claim about workload efficiency. For agentic coding sessions, that is often the metric that matters most.

Why This Matters For Teams

Once coding agents become part of everyday engineering work, the waste compounds. If one session saves $1.42, then:

- 100 sessions save about $142
- 1,000 sessions save about $1,424

And that is just the direct API bill.
It does not count the workflow benefit of keeping contexts leaner and sessions cleaner as tasks get longer and more complex. The broader point is simple: if your coding assistant keeps resending bloated context, you are paying for redundancy. Edgee removes that redundancy at the gateway layer, without asking developers to change how they work.

A Note On Scope

This benchmark is based on a single Codex baseline run and a single Codex + Edgee run. So the right conclusion is not "this exact percentage will hold for every repo and every workload." The right conclusion is that the signal is strong:

- nearly 50% less fresh input
- 35.6% lower cost
- a clearly better cache-efficiency profile

That is more than enough to justify broader testing, and it is exactly why we are continuing to expand this benchmark suite.

Bottom Line

If you are using Codex heavily, the waste is in the context. Edgee attacks that waste directly. In this benchmark, Codex + Edgee was:

- 49.5% lighter on fresh token usage
- 35.6% cheaper per session
- meaningfully more cache-efficient

Same coding
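The post does not disclose how Edgee's compression works internally, but the core idea it describes (stop resending context the model has already seen) can be illustrated with a deliberately simplified sketch. Everything below is a hypothetical assumption for illustration, not Edgee's actual design: the class name, the hashing scheme, and block-level deduplication are all invented here.

```python
import hashlib

class DedupGateway:
    """Hypothetical illustration: a session-scoped proxy that forwards
    only context blocks it has not already sent to the model."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # hashes of blocks already forwarded

    def compress(self, context_blocks: list[str]) -> list[str]:
        fresh = []
        for block in context_blocks:
            digest = hashlib.sha256(block.encode("utf-8")).hexdigest()
            if digest not in self._seen:  # only brand-new blocks count as fresh input
                self._seen.add(digest)
                fresh.append(block)
        return fresh

gateway = DedupGateway()
# Turn 1: everything is new, so everything is forwarded.
turn1 = gateway.compress(["system prompt", "contents of main.py"])
# Turn 2: the repeated blocks are dropped; only the new message goes through fresh.
turn2 = gateway.compress(["system prompt", "contents of main.py", "user: fix the bug"])
print(len(turn1), "->", len(turn2))  # 2 -> 1
```

A real gateway would also have to ensure the model still sees a coherent prompt, for example by letting provider-side prompt caching serve the elided blocks, which is presumably where the improved cache hit rate in the benchmark comes from.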

This analysis was produced by the Genesis Park editorial team with AI assistance. The original article is available via the source link.
