Swival is the AI agent I wanted.
Original source: hackernews · Summarized and analyzed by Genesis Park
Summary
Swival was built to address the data-leakage and inefficiency problems of existing AI agents, strengthening privacy and security through secret encryption and local-model optimization. It also supports an agent-to-agent communication protocol and has built-in code quality verification, aiming to provide an open, trustworthy development environment.
Full text
Swival is a CLI AI agent, and Swival 1.0.0 has just been tagged. People are going to ask the obvious question: why build a new agent when Codex, Claude Code and Opencode already exist, are well established, and already look good enough for most people? Because I wanted an agent that fixes the things existing agents still get wrong in actual daily use.

Privacy, local models, and not leaking secrets to a provider

Current agents are built around genuinely incredible models. But I still don't trust companies such as Anthropic with my data. For open source work, fine. For closed-source work, or anything sensitive or personal, I think the default posture of most current tools just isn't acceptable.

Using an AI agent inevitably leaks internal information. Sometimes a lot of it. That includes access tokens, internal project names, URLs, company names, and all the little bits of context that look harmless until they aren't. Mitigating that risk is something I have cared about for a long time. I even gave a Zigtoberfest talk about that exact topic.

So I wanted an agent that does two things properly.

First, it needs mitigations for leaking secrets to providers. That means transparently encrypting secrets before sending them to providers, then decrypting them locally so that models can still reason about them without actually seeing them. And it means being able to block and redact specific strings such as internal project names, company names and URLs. The fact that current agents, including the ones heavily used by corporations, still don't ship these features is, to me, irresponsible and unacceptable. Swival has transparent secret encryption and outbound LLM filters specifically to reduce how much information leaves your machine.

Second, it needs to work well with open source models. Not as a checkbox. Actually well. Local models have predictable behavior and predictable cost. They don't suddenly get worse because a provider changed something.
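The outbound-filter idea described above — replacing known secrets with opaque placeholders before a prompt leaves the machine, and restoring them locally in the model's reply — can be sketched in a few lines. This is a hypothetical illustration of the technique, not Swival's actual implementation:

```python
import secrets

class OutboundFilter:
    """Swap sensitive strings for placeholders before a prompt is sent
    to a provider, and restore them in the response locally."""

    def __init__(self, sensitive: list[str]):
        # Map each secret to a random, meaningless token the model can
        # still refer to, but never actually sees.
        self.aliases = {s: f"SECRET_{secrets.token_hex(4)}" for s in sensitive}

    def redact(self, text: str) -> str:
        for secret, alias in self.aliases.items():
            text = text.replace(secret, alias)
        return text

    def restore(self, text: str) -> str:
        for secret, alias in self.aliases.items():
            text = text.replace(alias, secret)
        return text

f = OutboundFilter(["ghp_live_token_123", "project-hermes"])
prompt = f.redact("Use ghp_live_token_123 to clone project-hermes")
assert "ghp_live_token_123" not in prompt   # nothing sensitive leaves the machine
assert "project-hermes" in f.restore(prompt)  # locally, the original values come back
```

A real tool would also have to handle substrings, encodings and streamed output, but the placeholder round-trip is the core of the idea.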
Local models also don't suddenly get more expensive because pricing moved. You control them, not a third party. And of course, they don't leak sensitive data to anyone.

Open source models are getting good fast. Gemma 4, Qwen 3.5, and GLM-5.1 make that pretty obvious. Plenty of exciting models are uploaded to Hugging Face every day. At the same time, efficient local inference is turning into a basic requirement for modern devices, and it's only going to improve. Apple M5 chips are a good illustration of where this is going.

No, local models aren't a replacement for everything. But they're already good enough for a lot of work, they have a bright future, and they can be fine-tuned. Even modern small models have surprisingly strong agentic capabilities.

The frustrating part is that most agents are still optimized and tested mainly for frontier models. Even the ones that advertise support for many providers and local models usually behave badly with local models. Tools are used poorly. Context is managed poorly. Everything gets slower. Output quality gets unreliable. Then people blame the model instead of the tool.

I wanted an agent that performs well with any model, from large frontier models all the way down to small local models with a small context window that anyone can run on their own machine. And if it fails, I want the first instinct to be improving the agent so that it helps the model deliver as much as it can, not immediately declaring the model useless. That's one of the main reasons Swival isn't just for frontier models.

A lot of that comes from excellent context management. Swival has a /compact command, but in practice it's rarely needed, if at all. The agent keeps trying to deliver regardless of the constraints, and the context window isn't something you should have to babysit during a long session.

And when I want to test new models, there's nothing more convenient than Hugging Face CPU-less inference. So I wanted that to be trivial as well.
With Swival, it's as easy as:

swival --provider huggingface --model zai-org/GLM-5.1

Agent-to-Agent is too useful to remain niche

The A2A protocol is great. People who actually use it know how powerful it is, and they usually don't want to go back to a single isolated agent. Unfortunately, for most people, A2A is still one of those things they have vaguely heard about but never really use, mostly because mainstream tools such as Claude Code still don't support it.

That's a shame, because A2A changes what an agent can be. With A2A, you can run multiple agents with different configurations and different models, and let tasks naturally reach the right one. So instead of stuffing documentation and skills into one local agent, you can have a dedicated documentation agent with direct access to the material, while other agents don't need to carry all of that context. Then, when an agent needs to know how to do something, it asks the documentation agent to research it and return a concise, accurate answer instead of dumping blind grep results. And of course, that specialized documentation agent can run a small, cheap, local model.

That's the larger idea. Don't restrict yourself to one model, or even a tiny set of related models. Use as many models as you want, including open source ones, depending on what needs to be done.

I wanted A2A to be simple enough that people would actually use it. Swival comes with built-in support for A2A and can act both as a server and as a client. You can set up a network of specialized agents in minutes.

Open source, readable, powerful, and not a circus

Existing agents have become ridiculously bloated over time. Claude Code is enormous. It flickers. It crashes. New versions keep adding gadgets I don't care about while making the whole thing even heavier. I don't want a kitchen sink. I want a small, reliable tool. Swival is lean, fully open source, doesn't depend on any company, and isn't optimized around a specific provider.
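To make the documentation-agent setup described above concrete: an A2A server advertises itself through an agent card, a small JSON document a client fetches to discover the agent's skills before delegating tasks to it. The sketch below shows roughly what such a card could contain — the field names follow the general shape of the A2A spec, and the endpoint and skill are hypothetical, not anything Swival actually emits:

```python
import json

# A minimal agent card for a hypothetical documentation agent.
# An A2A client would fetch something like this from the server's
# well-known URL before routing questions to it.
agent_card = {
    "name": "docs-agent",
    "description": "Answers questions about internal project documentation.",
    "url": "http://localhost:8800",  # hypothetical local endpoint
    "capabilities": {"streaming": False},
    "skills": [
        {
            "id": "doc-lookup",
            "name": "Documentation lookup",
            "description": "Research a topic in the docs, return a concise answer.",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

A coding agent acting as an A2A client can then forward its "how do I…" questions to docs-agent and keep its own context small.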
Swival is totally free and open, and has nothing to sell. It's written in Python because Python is simple, readable and maintainable. Nothing is obfuscated. Anyone can read the code, understand it, and modify it for their own needs.

And this isn't a toy. It's a workhorse. A boring one, which is exactly what I want here. It focuses on the tools a developer actually needs, not on gimmicks. But it still has the features you would expect from a modern agent, and then some.

Benchmarking needs a real environment

I also wanted to run benchmarks. I wanted a tool to evaluate models, settings, skills, MCP servers, and similar pieces on real-world tasks, in an environment that actually resembles how a user works. A lot of benchmarking tools aren't designed that way. They either assume tools optimized for specific models, or they provide an environment that doesn't feel much like the real thing.

And if you want to learn anything useful from evaluations, you need traces. Detailed ones. Accurate ones. You need to be able to look at what happened and understand how a model behaved under different conditions. Swival comes with strong reporting features. Combined with calibra, you can compare traces, diff them, and run evaluations that are actually meaningful.

Evaluating many configurations can burn through a lot of tokens. That's yet another reason I cared so much about making the agent work well with open source models running locally. For evaluations, cost is often more important than wall-clock speed. Small models are great.

The real problem is that models produce terrible code and then confidently tell you everything is fine

Watching a model generate code is impressive. It's hard not to be impressed when you type a prompt and a feature, or sometimes a whole project, appears in one shot. And the final report is always soothing. Everything is done. Everything works. Of course. The reality is that AI-generated code is almost always poor quality. It may compile. It may appear to work.
But from a correctness perspective, it's often terrible. You may be very happy with the code generated by Claude Code with Opus 4.6 max pro high thinking max, and maybe even want to deploy it to production, merge it into open source projects, or write triumphant blog posts about it. But there's a good chance that the code is inefficient, buggy, hard to maintain, and going to cost you later.

There's a trivial experiment anyone can try. Ask your favorite agent to generate code, or even just a plan. Then, in a separate environment, ask another AI agent, even one running the same model, to review that code or plan. It's very likely to find issues immediately. Sometimes critical ones.

As much as I like AI, in my own projects I refuse pull requests blindly generated by tools such as Claude Code for exactly that reason. And in a company context, I wouldn't deploy that output to production either.

There are two ways to significantly improve quality and confidence.

First, write the tests first, then force the agent not to declare the task complete until the tests pass. The tests don't even have to be part of the application's formal test suite. A simple shell script with curl commands can be enough. What matters is that this becomes a contract the agent has to satisfy. That's much stricter than a prompt, because a contract can't be hand-waved away or interpreted creatively.

Second, use a loop with an LLM-as-a-judge. Let one agent write code, documentation or a plan. Then let another agent review that work against the original instructions, and force the implementer to retry until the reviewer thinks it's correct.

Swival makes both of these approaches trivial because they're built in. Before starting a task, you can give the agent a script that will act as a reviewer. That reviewer can be another swival instance with a custom configuration.
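The implement–review–retry loop described above is simple to express. Here is a minimal, model-agnostic sketch — `generate` and `review` stand in for calls to two agent instances and are hypothetical, not part of Swival's API:

```python
def judged_loop(task, generate, review, max_rounds=5):
    """Run an implementer/reviewer loop until the reviewer approves.

    generate(task, feedback) -> str    produces a draft (code, plan, docs)
    review(task, draft) -> (bool, str) returns (approved, feedback)
    """
    feedback = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        approved, feedback = review(task, draft)
        if approved:
            return draft
    raise RuntimeError("reviewer never approved the work")

# Toy stand-ins: the "implementer" forgets error handling until told.
def generate(task, feedback):
    return "code with error handling" if feedback else "code"

def review(task, draft):
    ok = "error handling" in draft
    return ok, None if ok else "add error handling"

result = judged_loop("write the upload endpoint", generate, review)
print(result)  # → code with error handling
```

The reviewer sees only the task and the draft, not the implementer's chat history — which is precisely why it stays honest.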
Even more simply, you can start with --self-review, and tasks will be reviewed by the same instance and the same model in a dedicated context. There's nothing else to wire together.

One of the most interesting things to watch is how bad the initial output of an LLM agent can be, especially for code, and how honest and picky a model can suddenly become when it's reviewing its own output without realizing it. After a couple of iterations, the code, plan or documentation is often far better than the first attempt.

This is also one of the main reasons I wrote a new agent at all. I don't want to use AI to generate a mountain of code just so I can brag about productivity if the result is unreliable, insecure and unmaintainable. I wanted an agent that optimizes for quality rather than raw time savings. It can be slow. It can be expensive. But I want the output to be something I can trust and deploy to production.

Long sessions shouldn't make the agent dumber

Another thing I wanted was continuity. I wanted an agent that remembers what I did before, and what it did before. When I come back to the same project the next day, I want the agent to remember prior work without filling the live context with junk. Swival does that in a way that feels much more natural than in other agents.

I also wanted it to stop making the same mistakes twice. So Swival has a /learn command: at the end of a session, the agent can reflect on the issues it ran into and write concise instructions about how to avoid repeating them. And once those learnings exist, it will keep updating them automatically. That has turned out to be much more effective than premade agent skills. Or, more accurately, it's an extremely effective way to produce the right skills, because the agent discovers what it actually needs from real sessions instead of from speculation.

Modern features, but without the usual mess

Skills, MCP, parallel subagents and similar capabilities are table stakes for serious agent use now.
Swival supports all of that, of course. But I also wanted it to avoid the usual security and reliability mistakes. So tool and MCP output is explicitly tagged as untrusted in order to reduce prompt-injection risk. And markdown comments are ignored, so what you see in a rendered skill isn't different from what the agent actually interprets.

There's another common failure mode I have always found silly. If an MCP command or tool returns a large output, many agents either stuff the whole thing into the context window or fail in some awkward way. I wanted an agent that handles that properly. Swival writes large outputs to a temporary file and lets the agent access them in chunks later instead.

I also wanted a clean way to share files such as agent memories and AGENTS.md across multiple devices working on the same project, without committing them into a Git repository. Swival has lifecycle hooks specifically for that sort of workflow.

Arbitrary commands are easy too. In ~/.config/swival/commands/, you can place either scripts or plain files. Then ! command_name will inject either the content of the file or the output of the script into the prompt. Yes, other agents have versions of this. But I wanted it to be trivial from a user perspective. Not five overlapping systems with five different names for basically the same thing. Just one simple mechanism.

The same goes for shell command inspection and rewriting. I didn't want people to need to learn some complicated generic hook system. In Swival, enabling command middleware is safe and straightforward.

And more importantly, I wanted the agent to be usable programmatically, not just from the CLI. I also didn't want that to require a separate SDK with its own worldview and its own behavior. Everything the CLI agent does should be accessible in a consistent way from Python code. This is why Swival can be used as a CLI, but also as a library.
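The command-expansion mechanism described above — a name under ~/.config/swival/commands/ resolving either to a file's contents or to a script's output — takes very little code. A rough sketch of the idea, not Swival's implementation (the directory layout is the one from the text; using the executable bit to distinguish scripts from plain files is an assumption):

```python
import stat
import subprocess
from pathlib import Path

COMMANDS_DIR = Path("~/.config/swival/commands").expanduser()

def expand_command(name: str, commands_dir: Path = COMMANDS_DIR) -> str:
    """Resolve `! name` to the text injected into the prompt:
    executable files are run and their stdout captured,
    plain files are read verbatim."""
    path = commands_dir / name
    if path.stat().st_mode & stat.S_IXUSR:  # executable -> run it
        return subprocess.run(
            [str(path)], capture_output=True, text=True, check=True
        ).stdout
    return path.read_text()  # plain file -> inject its contents
```

Under this sketch, a non-executable todo.md would be injected verbatim by ! todo.md, while an executable git-report script would contribute whatever it prints.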
As a library, it exposes a very simple API so anyone can build custom agents, or more general applications, on top of a batteries-included agentic environment.

Small things matter

Some of the things I cared about aren't glamorous. They're just the kind of rough edges that make a daily tool annoying. For example, markdown rendering for LLM output looks nice, but I dislike the fact that copy-pasting rendered output often strips the markdown markers. I also don't love the idea of an agent accidentally deleting files. These are small things. But they matter if you actually use the tool every day.

Swival renders LLM markdown output while preserving the formatting tags. So the output looks good, but can still be copied and pasted without losing the markdown. And even in full YOLO mode, it has safety guards against dangerous commands, plus built-in support for the AgentFS copy-on-write filesystem overlay. Also, when a file is deleted using Swival's own tools, it isn't actually deleted. It's moved to a Trash directory instead. I have never personally seen an agent delete the wrong file. But if it ever does happen, I want recovery to be possible.

Why I use it

At this point, I use Swival almost exclusively. It's reliable, and I'm happy with the output I get from it. I use open source models as m
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.