Harness engineering: leveraging Codex in an agent-first world

hackernews | 💼 Business
#ai agents #dev tips #automated coding #ai models #codex #tip #agents #automated code generation #harness engineering
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

Over five months, a small engineering team built and shipped an internal beta product with zero manually-written code: every line, from application logic and tests to CI configuration, documentation, and tooling, was produced by Codex, with humans steering through prompts. The repository grew to roughly a million lines of code across about 1,500 merged pull requests, averaging 3.5 PRs per engineer per day, in an estimated one-tenth of the time hand-writing would have taken. The central lesson is that the engineering job shifts from writing code to designing environments, specifying intent, and building feedback loops, including agent-to-agent review and tooling that makes the UI, logs, and metrics directly legible to the agent.

Full Text

Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code. The product has internal daily users and external alpha testers. It ships, deploys, breaks, and gets fixed. What’s different is that every line of code—application logic, tests, CI configuration, documentation, observability, and internal tooling—has been written by Codex. We estimate that we built this in about 1/10th the time it would have taken to write the code by hand. Humans steer. Agents execute.

We intentionally chose this constraint so we would build what was necessary to increase engineering velocity by orders of magnitude. We had weeks to ship what ended up being a million lines of code. To do that, we needed to understand what changes when a software engineering team’s primary job is no longer to write code, but to design environments, specify intent, and build feedback loops that allow Codex agents to do reliable work. This post is about what we learned by building a brand new product with a team of agents—what broke, what compounded, and how to maximize our one truly scarce resource: human time and attention.

The first commit to an empty repository landed in late August 2025. The initial scaffold—repository structure, CI configuration, formatting rules, package manager setup, and application framework—was generated by Codex CLI using GPT‑5, guided by a small set of existing templates. Even the initial AGENTS.md file that directs agents how to work in the repository was itself written by Codex (a hypothetical sketch of such a file follows below). There was no pre-existing human-written code to anchor the system. From the beginning, the repository was shaped by the agent.

Five months later, the repository contains on the order of a million lines of code across application logic, infrastructure, tooling, documentation, and internal developer utilities. Over that period, roughly 1,500 pull requests have been opened and merged by a small team of just three engineers driving Codex. This translates to an average throughput of 3.5 PRs per engineer per day, and, surprisingly, throughput has increased as the team has grown to its current seven engineers. Importantly, this wasn’t output for output’s sake: the product has been used by hundreds of users internally, including daily internal power users. Throughout the development process, humans never directly contributed any code. This became a core philosophy for the team: no manually-written code.

The lack of hands-on human coding introduced a different kind of engineering work, focused on systems, scaffolding, and leverage. Early progress was slower than we expected, not because Codex was incapable, but because the environment was underspecified. The agent lacked the tools, abstractions, and internal structure required to make progress toward high-level goals. The primary job of our engineering team became enabling the agents to do useful work. In practice, this meant working depth-first: breaking down larger goals into smaller building blocks (design, code, review, test, etc.), prompting the agent to construct those blocks, and using them to unlock more complex tasks.
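The article does not reproduce the repository's AGENTS.md. As a purely illustrative sketch of what such a file tends to contain (every section and command below is hypothetical, not the team's actual setup), it might look like:

```markdown
# AGENTS.md — hypothetical example, not the file from the article

## Setup
- Install dependencies with `npm install` before doing anything else.

## Conventions
- Run the formatter and linter before committing; CI enforces both.
- Keep each pull request scoped to a single concern.

## Review loop
- After opening a PR, review your own diff locally, then request agent reviews.
- Iterate on reviewer feedback until every agent reviewer is satisfied.
```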
When something failed, the fix was almost never “try harder.” Because the only way to make progress was to get Codex to do the work, human engineers always stepped into the task and asked: “what capability is missing, and how do we make it both legible and enforceable for the agent?”

Humans interact with the system almost entirely through prompts: an engineer describes a task, runs the agent, and allows it to open a pull request. To drive a PR to completion, we instruct Codex to review its own changes locally, request additional specific agent reviews both locally and in the cloud, respond to any human- or agent-given feedback, and iterate in a loop until all agent reviewers are satisfied (effectively this is a Ralph Wiggum Loop). Codex uses our standard development tools directly (gh, local scripts, and repository-embedded skills) to gather context without humans copying and pasting into the CLI. Humans may review pull requests, but aren’t required to. Over time, we’ve pushed almost all review effort towards being handled agent-to-agent.

As code throughput increased, our bottleneck became human QA capacity. Because the fixed constraint has been human time and attention, we’ve worked to add more capabilities to the agent by making things like the application UI, logs, and app metrics themselves directly legible to Codex. For example, we made the app bootable per git worktree, so Codex could launch and drive one instance per change. We also wired the Chrome DevTools Protocol into the agent runtime and created skills for working with DOM snapshots, screenshots, and navigation. This enabled Codex to reproduce bugs, validate fixes, and reason about UI behavior directly.

We did the same for observability tooling. Logs, metrics, and traces are exposed to Codex via
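The post links to the “Ralph Wiggum Loop” pattern without showing what drives it. Below is a minimal sketch of the idea, assuming a non-interactive `codex exec`-style invocation and using `gh pr view` to read reviewer comments; treating every comment as unresolved feedback is a deliberate simplification, and none of this is the team's actual harness:

```python
import json
import subprocess

MAX_ITERATIONS = 10  # guardrail so the loop cannot spin forever


def run_codex(prompt: str) -> None:
    """Run Codex non-interactively on the current repo (assumes a `codex exec` CLI)."""
    subprocess.run(["codex", "exec", prompt], check=True)


def collect_review_feedback() -> list[str]:
    """Gather comments on the current PR via the GitHub CLI.
    Simplification: every comment is treated as actionable feedback."""
    out = subprocess.run(
        ["gh", "pr", "view", "--json", "comments"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [c["body"] for c in json.loads(out)["comments"]]


def ralph_wiggum_loop(task: str) -> None:
    run_codex(task)  # initial attempt: implement the task and open a PR
    for _ in range(MAX_ITERATIONS):
        run_codex("Review your own diff locally and fix any issues you find.")
        feedback = collect_review_feedback()
        if not feedback:  # all agent reviewers satisfied -> done
            return
        run_codex("Address this reviewer feedback:\n" + "\n".join(feedback))
```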
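“Bootable per git worktree” is described but not shown. One plausible way to get there, sketched under the assumption of an `npm run dev` entry point and a hash-the-path port scheme (both assumptions, not details from the article), is to derive a stable port from the worktree path so parallel instances never collide:

```python
import hashlib
import subprocess


def worktree_root() -> str:
    """Absolute path of the current git worktree."""
    return subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def worktree_port(root: str, base: int = 3000, span: int = 1000) -> int:
    """Hash the worktree path to a stable port in [base, base + span)."""
    digest = hashlib.sha256(root.encode()).digest()
    return base + int.from_bytes(digest[:4], "big") % span


def boot_instance() -> subprocess.Popen:
    """Start one app instance bound to this worktree's own port (command assumed)."""
    root = worktree_root()
    port = worktree_port(root)
    return subprocess.Popen(
        ["npm", "run", "dev", "--", "--port", str(port)], cwd=root
    )
```

Because the port is a pure function of the path, an agent can boot, test, and tear down one instance per change without coordinating port assignments.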
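The Chrome DevTools Protocol wiring is also described only at a high level. The sketch below uses real CDP methods (`Page.navigate`, `Page.captureScreenshot`) over a raw WebSocket; the `websocket-client` dependency, the `--remote-debugging-port=9222` launch flag, and the target app URL are assumptions about a local setup, not the team's runtime:

```python
import base64
import itertools
import json
import time
import urllib.request

from websocket import create_connection  # pip install websocket-client

_ids = itertools.count(1)


def cdp(ws, method: str, **params):
    """Send one CDP command and block until its matching response arrives."""
    msg_id = next(_ids)
    ws.send(json.dumps({"id": msg_id, "method": method, "params": params}))
    while True:
        reply = json.loads(ws.recv())
        if reply.get("id") == msg_id:  # events carry no "id" and are skipped
            return reply.get("result", {})


# Attach to the first page target of a Chrome launched with
# --remote-debugging-port=9222 (an assumed local setup).
targets = json.load(urllib.request.urlopen("http://localhost:9222/json"))
page = next(t for t in targets if t["type"] == "page")
ws = create_connection(page["webSocketDebuggerUrl"])

cdp(ws, "Page.enable")
cdp(ws, "Page.navigate", url="http://localhost:3000")
time.sleep(2)  # crude wait; a real harness would wait for Page.loadEventFired

shot = cdp(ws, "Page.captureScreenshot")  # returns {"data": <base64 PNG>}
with open("screenshot.png", "wb") as f:
    f.write(base64.b64decode(shot["data"]))
ws.close()
```

A screenshot saved this way is exactly the kind of artifact an agent can inspect to reproduce a bug or validate a UI fix.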

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
