From Spaghetti to a Main Bus: Refactoring an AI Agent Orchestrator Using Elm
#claude
#open-source
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
The author describes rebuilding the AI agent orchestration system from Part I, moving it from an inefficient "spaghetti" structure to a "main bus" design. Drawing on how factories scale in the game Factorio, the author tears down the existing system, redesigns it from first principles, and analyzes the mistakes that wasted millions of Claude tokens along the way.
Full text
The Factory Must Grow (Part II): From Spaghetti AI Agent Orchestrator to a Main Bus

tl;dr: In Part I, I built the factory: an orchestration system that runs AI agents like workers on a production line. Part II tears the original system down and rebuilds it from first principles. This post is about The Big Rejig and the mistakes that burned millions of Claude tokens. Part I post here.

The First Factory Worked

For those who've played the game, your first Factorio factory works. Iron moves on the conveyor belts, copper is delivered to assemblers, and circuit boards come out the other side. You added one thing at a time, solved one problem at a time, and the factory hums.

Then you decide to scale it. You try to add a new conveyor belt and realise the belt you need runs straight through the middle of three other assemblers. You try to decouple and reroute the belt. Then that reroute cuts off a different production line. You try to fix that. The fix introduces a bottleneck for another belt. You fix that. Hours later, your factory looks like patchwork. The foundation works, but improving it is like open-chest surgery. Every Factorio player hits this wall at some point.

It's spaghetti. Kabelsalat. Belts criss-crossing, inserter arms reaching across each other, production lines skirting the walls of multiple factories. The factory grew, and the spaghetti grew with it.

The same problem awaits when you add AI agents. The orchestration system I created runs agents like workers on a production line. They pick up issues from a "queue", write code, open PRs and address review comments, while a central orchestrator decides who runs when and what state every issue is in. It started small (dev). Then I added a second workflow (PR reviews). Then a third (PRDs). Each one added an if-statement case here and a fallback there. The factory worked, but touching it became open-chest surgery (again).

The fix is the "main bus".
Take out the spaghetti and replace it with a centralised spine of parallel belts, each carrying resources in one direction. New production lines branch off the bus. They get what they need, return their outputs, and never cut through the bus. The bus itself doesn't know what the branches are doing. After my refactor, state and business logic are cleanly decoupled!

The Spaghetti Era

Spaghetti is what happens when you don't have state-machine discipline. Each bug that surfaces is another way imperative dispatching with fallbacks rots over time. When I first described the orchestrator in Part I, it was polling a board, picking up issues, spawning workers. The pipeline was understandable. But there were more than a dozen ways it could fail, and that was not sustainable.

State transitions lived inside imperative dispatching code. The orchestrator would pick up an issue, check conditions, mutate the issue's state, start a worker, check more conditions, mutate the state again. All of it happened in imperative code. There was no single place where "what should happen next" was described. State transitions happened gradually, across several functions that called each other in ways that were hard to trace.

The silent fallbacks were the worst offenders. A switch statement that handled five states but left out two. A default fallback that logged a warning and just moved on. There were also silently dropped issues. The next tick would check the issue, perhaps find partial state, and move on to the next candidate. Only a manual audit surfaced those dropped issues. Imagine an employee who threw your work briefs in the bin. That's a pretty bad employee.

Stuck states at least were visible. The worker may have finished the work, but the issue hadn't moved forward. The orchestrator had seen the worker finish, but a condition somewhere hadn't been met, so the state transition never completed. The issue was stuck in purgatory.

Then there were workers that silently dropped state during retries.
A continuation run (an agent resuming after its turn limit, put in place to avoid runaway infinite work) would pick up mid-task, make progress, and then write back state that was missing fields an earlier run had populated. Oops. As far as it was concerned, it did its job: no error, no warning.

Finally, there was "PR ping-pong", where an issue would cycle between "needs review" and "needs fix" indefinitely because the "fix" and the "review" workflows had slightly different conditions for what a resolved review comment looked like. Each thought the other was wrong, so they just kept handing the issue back and forth, burning precious Claude tokens.

Most of these bugs came down to an unanticipated default fallback in some switch statement. The factory grew faster than the imperative switch statements.

The Main Bus

In a spaghetti architecture, "what should happen" and "how it's done" are tangled together. In a main-bus architecture they're decoupled. What inspired me to do a big refactor and bring order to my AI agent orchestration was the Elm architecture (popularised by the Redux library in the React ecosystem). A pure function (a "reducer", which I also call "the bus" throughout this post) decides what should happen. An interpreter does the work. This is the new, simple architecture. I spent a couple of weeks thinking about and rewriting the architecture.

The spaghetti factory's problems were: outputs feeding back into inputs, state changes happening in many different places, and no single place where "what is the factory doing" is clear. The main-bus architecture (inspired by the Elm architecture) solves this by separating two things that spaghetti factories mix: the trigger and the action (deciding what to do vs doing it).

Deciding what to do is done by a pure function called a "reducer". In the codebase it is literally a function named "decide()". This is the bus.
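The post doesn't show the orchestrator's actual types, but the decide()/interpreter split can be sketched like this. All state, event, and effect names below are invented for illustration:

```typescript
// Hypothetical state, event, and effect types for a single issue.
// Effects are plain data: the reducer describes work, it never does work.
type State =
  | { kind: "todo" }
  | { kind: "in-progress"; workerId: string }
  | { kind: "needs-review"; prNumber: number }
  | { kind: "done" };

type Event =
  | { kind: "worker-started"; workerId: string }
  | { kind: "pr-opened"; prNumber: number }
  | { kind: "review-approved" };

type Effect =
  | { kind: "post-comment"; body: string }
  | { kind: "notify"; message: string };

// The bus: a pure function. No clock, no I/O. Same inputs, same outputs.
function decide(state: State, event: Event): [State, Effect[]] {
  if (state.kind === "todo" && event.kind === "worker-started") {
    return [{ kind: "in-progress", workerId: event.workerId }, []];
  }
  if (state.kind === "in-progress" && event.kind === "pr-opened") {
    return [
      { kind: "needs-review", prNumber: event.prNumber },
      [{ kind: "notify", message: `PR #${event.prNumber} is ready for review` }],
    ];
  }
  if (state.kind === "needs-review" && event.kind === "review-approved") {
    return [{ kind: "done" }, [{ kind: "post-comment", body: "Approved, closing." }]];
  }
  // Any other combination keeps the state and does nothing -- explicitly.
  return [state, []];
}

// The interpreter performs effects and decides nothing. The real side
// effects (GitHub calls, spawning agents) are injected so the loop
// itself stays testable.
interface Runtime {
  postComment(body: string): void;
  notify(message: string): void;
}

function interpret(effects: Effect[], rt: Runtime): void {
  for (const e of effects) {
    if (e.kind === "post-comment") rt.postComment(e.body);
    else rt.notify(e.message);
  }
}
```

An orchestrator tick then becomes: feed the stored state and the incoming event to decide(), persist the returned state, and hand the effect list to the interpreter.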
It takes the current state of an issue and an incoming event, then returns the next state and a list of things to perform ("side effects"). It does not perform the side effects itself; that's not the bus's job. It does not check the clock or call any external system. It produces a description of what should happen next, as data (JSON values that the codebase calls "effects").

The interpreter performs the side effects. It takes the list of side effects, goes through them, and does the actual work. It writes to GitHub, spawns a worker AI agent, sends a notification. The interpreter does not decide anything. It just performs what the bus told it to do.

The original orchestrator I built mixed both of these, but now they are separated. The bus decides what should be performed and the interpreter acts. Responsibilities are clearly divided. The new flow is:

state + event → reducer → [next state, side effects] → interpreter → actions

The immediate benefit was explainability. When something went wrong, there were only a couple of places to look. If a decision is wrong, that's a bus bug. If the execution is wrong, that's an interpreter bug. In the spaghetti factory I had no idea where to look. The main bus has exactly two places to look. Implicitly, this saves lots of tokens too.

Deterministic

Especially with AI, I need the same inputs to always produce the same outputs. The reducer has no clock and no I/O. It's a pure function that performs no actions and no complex logic. It cannot call outside APIs. Given the same state and the same event, it will always produce the same next state and the same side-effects list.

In the spaghetti era, to debug "Issue #45 got stuck", you'd have to look at the logs, try to reconstruct the sequence of events, and wonder whether the retry happened before or after the state was written. Even with good logs, this is hard, very hard. State may or may not have changed when the bug occurred.
You had to reconstruct it, if you were lucky enough to have the logs. With a deterministic bus, debugging looks different. The event log is append-only: one record per decision, immutable, nothing ever overwritten. To understand why Issue #45 bugged out, you just replay the event log against the initial state. If something went wrong, you can see exactly which event triggered it, exactly what state the bus saw, and what it decided. This is called "event sourcing". The log is the source of truth and you can reconstruct the state of the world with it.

The log is also a test harness, in the sense that you can use it to verify that the bus behaves correctly. Use the events to assert the next state. A test for a new state transition is simple: initial state, input event, expected next state, then call the function. Deterministic flows give you consistency.

You also gain "totality": a defined output for every possible input. The spaghetti era was full of partial functions, such as switches that handled the common cases and left the rest to a default. Every partial function is a potential bug if an input arrives that wasn't anticipated. The reducer must handle every event type, and the TypeScript compiler enforces this: the build fails if you add a new event type and don't add a handler for it. This cuts down on runtime errors massively. The "missing branch" bug that caused dropped issues is now fixed, because the code won't compile without the case being handled.

In addition to event types, we have totality of state types too. Not every combination of fields is a valid state. A GitHub issue cannot be both "in progress" and "waiting for worker" at the same time. In the spaghetti era, invalid states were possible because state was mutated incrementally and mutations could produce a partial result. In the main-bus architecture, the state types make that impossible, and that reduces bugs.
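As a sketch of what this buys you (minimal invented types again; the `never` assignment is the standard TypeScript idiom for compiler-enforced exhaustiveness):

```typescript
// A minimal reducer over an issue's lifecycle, illustrating both ideas:
// event sourcing (state is a fold over the log) and compile-time totality.
type State = "todo" | "in-progress" | "done";
type Event = { kind: "picked-up" } | { kind: "verdict-done" };

function decide(state: State, event: Event): State {
  switch (event.kind) {
    case "picked-up":
      return "in-progress";
    case "verdict-done":
      return "done";
    default: {
      // Exhaustiveness check: if a new Event variant is added without a
      // case above, `event` no longer narrows to `never` here and the
      // build fails. No silent default fallback can exist.
      const unhandled: never = event;
      return unhandled;
    }
  }
}

// Event sourcing: replaying the append-only log reconstructs the state.
function replay(initial: State, log: Event[]): State {
  return log.reduce(decide, initial);
}
```

A state-transition test then reduces to: initial state, input event, assert the returned state. And debugging a stuck issue is just replaying its stored event log from the initial state.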
Declarative

People who know me know I like declarative programming. Each workflow in my orchestration system has its own spec file (a Markdown file with a YAML header). The spec declares the states the workflow can be in, the events that trigger transitions, and the conditions that gate each transition. This file (e.g. for the dev workflow) is the contract between the orchestrator and the worker.

The state machine's type is generated from the spec. The code does not define which states are valid; that's the spec's job. This decoupling, again, is intentional. It means you can't write a transition to a bogus state the spec doesn't declare, because that state's type doesn't exist in the built code.

The spec file is the first thing you edit when a workflow needs to change. You add a new state to the spec, regenerate the types, and then fix the compile errors that tell you everywhere the new state needs to be handled. The spec change propagates through the codebase structurally. The spec is the documentation and the implementation is derived from it. The implementation won't compile if it drifts from the spec.

An example: suppose I add a new GitHub Issue state called "needs-human" to a workflow's spec and save the file. The next compile breaks in a couple of places: the function that picks an event handler doesn't recognise "needs-human" and the "verdict" table doesn't list it. I work through the couple of errors and the new state is wired up end to end. Without spec-driven types, "needs-human" would have been a string in one switch statement that quietly fell through everywhere.

At its most basic, a workflow spec looks something like this:

```yaml
tracker:
  kind: github
  status_field: Status
  active_states: [Todo, In Progress]
  terminal_states: [Done, Abandoned]
  verdict_map:
    DONE: Done
    FAILED: Abandoned
agent:
  max_concurrent_agents: 3
  max_turns: 20
---
You are working on issue {{ issue.id }}: {{ issue.title }}.
Steps:
1. Read the description above.
2. Plan first: write plan.md before any code.
3. Commit small, push often.
4. Write the result of the run to .verdict (DONE or FAILED).
```

The spec is declarative, with its state types, enum state lists, and so on. The YAML on top declares the state machine: which board states are live, which are terminal, and what each worker verdict should transition to. The Markdown below it is the instructions the worker follows.

The hot-reload behaviour in the orchestration system applies to the workflow spec file as well. Changes to the spec regenerate types. Changes to the types force handler updates, and bad updates will not compile.

What I Left Out

The new main-bus architecture makes the orchestration (PRD drafting, dev, PR reviews, marketing) more trustworthy in production. I plan to write more about this AI agent orchestrator. In Part III, I'll discuss how I brought Toyota production principles to my AI agent production line. The architecture in this post gives you the structure. The next posts will be about what happens when the structure gets stress-tested.

The Factory Caught Itself

Last week, a worker hit a turn limit and stopped. The reducer finalised the issue to "abandoned". A few seconds later a retry timer fired and asked the dispatcher to start a fresh worker on that same issue. The dispatch guard checked the state, saw "abandoned" and refused. The refusal was the right decision. The architecture said: "I will not dispatch a worker on an issue I have already given up on". But the refusal tripped a global lock and halted all workflows. I had to clear the lock by hand to bring the factory back.

To fix this, I cancelled pending retries on every terminal transition. This also future-proofs the system: a newly added terminal state can't forget to clear stale timers. The architecture caught the bad dispatch. Failing loud beats corrupting state any day.

A cleaner architecture doesn't mean fewer bugs. It means more debuggable bugs. Dropped issues became impossible by design. Silent state drops became compile errors.
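The retry fix can be sketched as a wrapper over the reducer's output, so the cleanup rides on the transition itself rather than on any particular terminal state. The names here are invented; the post doesn't show the actual implementation:

```typescript
// Hypothetical effect vocabulary: the interpreter knows how to cancel
// an issue's pending retry timers when it sees this effect.
type Effect =
  | { kind: "cancel-timers"; issueId: number }
  | { kind: "notify"; message: string };

type StateKind = "todo" | "in-progress" | "done" | "abandoned";

// Terminal states declared in one place (in the real system this would
// come from the workflow spec's terminal_states list).
const TERMINAL: ReadonlySet<StateKind> = new Set(["done", "abandoned"]);

// Append a cancel-timers effect to *every* transition into a terminal
// state. A newly added terminal state gets the cleanup for free, so no
// future state can "forget" to clear stale timers.
function withTerminalCleanup(
  issueId: number,
  nextState: StateKind,
  effects: Effect[],
): Effect[] {
  return TERMINAL.has(nextState)
    ? [...effects, { kind: "cancel-timers", issueId }]
    : effects;
}
```

Because the rule lives on the transition, the "abandoned issue with a live retry timer" incident cannot recur for any terminal state, present or future.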
When something breaks, there are two places to look, and the event log tells you which. The factory has grown since I started it, and now the architecture has grown too. There'll be more things I'll need to improve and more Claude tokens I'll end up burning. But the next bugs will be loud and findable, and the logs will help me debug them. That's the next generation of factory I need.

I write about building with AI at blog.mariohayashi.com. Follow along if this is useful to you. If you're working through similar problems, I'd love to hear from you! Feel free to follow me on Twitter: @logicalicy.
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.