What Happened in There? Tamper-Resistant Audit Trails for AI Agents

Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

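Autonomous AI agents have broad access to a user's system, yet they can leave false logs about what they did, so their own account cannot be trusted. A log written by the agent itself can become a tool of the attack if the agent is compromised, which means it cannot serve as reliable evidence. What is needed is a tamper-resistant audit trail whose integrity holds even when the audited process itself is assumed to be the adversary.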

Body

If you run an autonomous AI agent on your machine, you are giving a language model permission to open files, run commands, touch your filesystem, and reach out to the network. You know it's dangerous, but you have to trust it to do the right thing. You also have to trust it to tell you the truth about what it did, and quite often agents are outright liars.

So: what actually happened during that session?

Most tooling hands you a log file. A log file is a story the program tells about itself. If the program is compromised, or if the agent has managed to write somewhere it shouldn't, the log becomes part of the attack surface. "I didn't run `rm -rf $HOME`" is not evidence when the same process that might have run `rm -rf $HOME` is the one writing the log entry that says so.

An audit trail for an untrusted process has to survive a very specific adversary: the process it is auditing. That means:

- The audit writer cannot be the audited process.
- The record has to be structured so that after-the-fact edits are detectable, not just discouraged.
- The record has to be bound to the binary that actually ran, not just "a command called `node`".
- A third party, not the host that produced the log, has to be able to verify all of the above.

This post describes how nono does each of these, walks through the crypto well enough that "Merkle tree" and "inclusion proof" finally make sense, and shows why the combination makes nono the strongest audit system for AI agent execution today, anywhere on planet Earth. (That last bit is a bold claim. I apologise, I'm a bit biased, and I promise to be objective in my analysis from now on.)

nono runs untrusted agent commands inside an OS-enforced sandbox (Landlock on Linux, Seatbelt on macOS). The defining structural boundary for audit purposes is the one between two processes:

- A supervisor (the parent): trusted, unsandboxed, owns policy and auditing.
- A child: the untrusted agent, fully sandboxed before `exec`.

The kernel mediates every boundary crossing between them. Capability requests (e.g. "the agent tried to `openat(/etc/shadow)`") are trapped by a seccomp BPF filter and delivered to the supervisor as seccomp-notify events. The supervisor decides; the kernel enforces.

The audit trail lives entirely in the supervisor. The sandboxed child does not write its own audit log. It cannot open the audit file, it cannot ptrace the supervisor, and it has no shared memory with it. The child generates events by doing things; the supervisor is the sole recorder of what happened.

Before we get to what nono does with those events, a short detour on the data structure that does all the work.

A cryptographic hash (nono uses SHA-256) takes arbitrary bytes and returns a 32-byte fingerprint. Two properties matter:

- Change even one bit of the input and the hash changes unpredictably.
- You cannot (in practice) construct two different inputs that hash to the same output.

For example:

    printf "hello world" | shasum -a 256
    b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
    printf "hello evil" | shasum -a 256
    b5e70ecd9dc3fd9459eaaf2adb3a51644351b76114f5f7bd8438d0ea6c53d481

A single hash is good for proving "this exact blob was not modified." But we have many events: thousands of capability decisions per session. If you hash them all into one blob, then to prove that one event happened you have to hand over every event again.

A Merkle tree fixes this. You:

- Hash each event individually. These are the leaves.
- Pair the leaves and hash each pair together. This gives you the next level up.
- Keep pairing and hashing until you end up with a single hash at the top: the Merkle root.

The root is one 32-byte value that cryptographically commits to every leaf underneath it. Change any leaf and the root changes. Truncate the tree and the root changes. Reorder the leaves and the root changes. Change one bit in any artifact and the root changes.

P.S. If you have never read the Bitcoin whitepaper, you should; don't be put off by crypto bros and math. It is a masterpiece of clear writing, brilliant design and engineering. The Merkle tree section is only a few paragraphs, but the whole paper is worth the read. P.P.S. I am in fact Satoshi Nakamoto; you learned it here. I just don't have any proof (boom-tiss).

Here is the quiet superpower of these trees. If I give you the Merkle root and later claim "event #3 really was in the log", I don't have to show you the whole log to prove it. I just have to show you:

- Event #3 itself
- The sibling hash at each level going up to the root (a handful of 32-byte values, about log₂ N of them)

You re-hash upward. If you land on the same root you already know, event #3 was definitely in the tree. Nothing else about the tree had to be revealed.

This is called an inclusion proof or audit proof. It is the same trick Certificate Transparency uses to prove a TLS cert is in a public log without downloading the whole log. It is the same trick I used when I first built Sigstore Rekor, which proves that a software artifact was captured in the transparency log without downloading the whole thing. And it is the same trick nono uses to prove "this event was really recorded in the audit log, and the log wasn't edited after the fact."

The other trick nono layers on top is hash chaining. In addition to the Merkle tree, every event in the log also includes the hash of the previous event's chain head. So:

    chain[0] = H(leaf[0])
    chain[i] = H(chain[i-1] ‖ leaf[i])

This adds a second, complementary integrity property. The Merkle root proves "the set of events is exactly this." The chain proves "and they happened in exactly this order, one after another." An attacker who tries to delete, reorder, or splice events has to break both at once. That is insanely hard to do, unless you can tap the sun's entire energy output for the compute needed to break the hash function itself.

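
To make the tree, the inclusion proof, and the chain concrete, here is a minimal Python sketch. It is not nono's code: the one-byte tags, the rule of promoting an unpaired node unchanged, and every function name are assumptions chosen for illustration, and the events are stand-in byte strings.

```python
import hashlib

# Hypothetical one-byte tags, so a leaf hash can never be mistaken for an
# internal-node hash or a chain hash. nono's real tags and encoding may differ.
LEAF, NODE, CHAIN = b"\x00", b"\x01", b"\x02"


def h(tag: bytes, *parts: bytes) -> bytes:
    """SHA-256 over a context tag followed by the concatenated parts."""
    return hashlib.sha256(tag + b"".join(parts)).digest()


def merkle_levels(events):
    """Return every level of the tree, leaves first, root level last.

    Assumes at least one event. An unpaired node at the end of a level is
    promoted unchanged (one common convention, assumed here).
    """
    level = [h(LEAF, e) for e in events]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(NODE, level[i], level[i + 1]))
            else:
                nxt.append(level[i])
        level = nxt
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # a promoted node has no sibling at this level
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(event, proof, root):
    """Re-hash upward from one event and compare against the known root."""
    acc = h(LEAF, event)
    for sibling, sibling_is_left in proof:
        acc = h(NODE, sibling, acc) if sibling_is_left else h(NODE, acc, sibling)
    return acc == root


def chain_heads(events):
    """chain[0] = H(leaf[0]); chain[i] = H(chain[i-1] || leaf[i])."""
    heads = []
    for e in events:
        leaf = h(LEAF, e)
        heads.append(h(CHAIN, heads[-1], leaf) if heads else h(CHAIN, leaf))
    return heads


events = [f'{{"seq":{i},"type":"CapabilityDecision"}}'.encode() for i in range(5)]
levels = merkle_levels(events)
root = levels[-1][0]
proof = inclusion_proof(levels, 3)
print("event #3 included:", verify_inclusion(events[3], proof, root))
print("forged event included:", verify_inclusion(b"forged", proof, root))
```

The verifier only needs the event, the proof, and a root it already trusts; it never sees the other events.
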
Every hash operation uses domain separation: a constant string mixed in so that a leaf hash can never be confused with an internal-node hash or a chain hash. nono tags every domain: "nono.audit.event.alpha", "nono.audit.chain.alpha", "nono.audit.merkle.alpha". This is a small detail with a big payoff: it structurally prevents length-extension and type-confusion attacks between the three hashing contexts. This is the same construction recommended by RFC 6962 (Certificate Transparency). It is standard, well-analyzed cryptography, not novel crypto.

Every supervised session records the following artifacts in a session-specific directory outside the sandbox, under `~/.nono/audit/sessions//` (the location is configurable):

| File | What it is |
|---|---|
| `session.json` | Session metadata: command, timings, exit code, executable identity, filesystem Merkle roots (pre/post), audit integrity summary |
| `audit-events.ndjson` | The append-only event log: one JSON line per event, each line carrying its own leaf hash and chain hash |
| `audit-attestation.bundle` | Optional DSSE/in-toto signed attestation over the Merkle root |

The event log captures, at minimum:

- `SessionStarted` and `SessionEnded` boundaries
- Every `CapabilityDecision`: the path the child tried to open, whether it was approved or denied, and by which rule
- `UrlOpen` events from the agent
- `Network` events from the outbound proxy (DNS resolution, hostname, approval decision)

Every event carries a monotonically increasing sequence number, the canonical JSON bytes that were hashed (so verification binds to exact bytes, not a re-serialization), the leaf hash, and the chain hash. The whole log closes with an `AuditIntegritySummary` recording `{ hash_algorithm: "sha256", event_count, chain_head, merkle_root }` in `session.json`.

An event log that says `exec("node")` is not useful evidence.
`node` is just a name: it means whatever `$PATH` happened to resolve it to, whatever binary happened to be sitting on disk there, in whatever state someone left it. Before the supervisor `exec`s the child, it does two things:

- Canonicalizes the executable path. `$PATH` resolution, symlink following, all of it, collapsed to the single absolute path the kernel will actually load.
- Computes the SHA-256 of that binary's bytes.

Both are recorded in session metadata as an `ExecutableIdentity { resolved_path, sha256 }`. This means the audit trail is bound not to a command-line string but to the exact bytes of the executable that ran. The Merkle root commits over session metadata alongside the events, so tampering with the recorded binary identity invalidates the root.

This is a narrower claim than full runtime provenance: shared libraries, interpreters, and scripts passed as arguments are not (yet) covered. But it's a meaningful claim: "a binary with this SHA-256 was launched at this path, and here is every capability it requested while running."

For a session with writable capabilities, nono also constructs a second, independent Merkle tree, this one over the contents of the user-granted writable paths.

- Before the agent starts, the supervisor walks the tracked paths and computes a Merkle root over (path, file_content_hash) leaves. This is the pre-root.
- After the agent exits, it walks them again and computes the post-root.

Both roots are recorded in session metadata. If the pre- and post-roots differ, the filesystem changed. If they're identical, it is cryptographically provable, not merely asserted, that the contents of the tracked paths are the same after the session as before it.

Crucially, this works even when full rollback is not enabled. This is a recent change in nono: previously, you either paid for full content-addressable snapshots or you got nothing. Now, audit-only sessions get a lightweight `AuditSnapshotState` that computes the roots without storing object copies.
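
As a rough illustration of both ideas (pinning the executable to its bytes, and comparing pre- and post-session roots), here is a small Python sketch. It is an approximation under stated assumptions, not nono's implementation: the function names, the leaf encoding, and the pairing rule are hypothetical, and it ignores symlinks, permissions, and other metadata.

```python
import hashlib
import os
import shutil
import sys
import tempfile


def sha256_file(path):
    """Stream a file through SHA-256 and return the raw digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.digest()


def executable_identity(command):
    """Resolve a command via PATH and symlinks, then hash the bytes found there."""
    found = shutil.which(command)
    if found is None:
        raise FileNotFoundError(command)
    resolved = os.path.realpath(found)
    return {"resolved_path": resolved, "sha256": sha256_file(resolved).hex()}


def tree_root(tracked_dir):
    """Merkle root over (relative path, content hash) leaves, sorted by path."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(tracked_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, tracked_dir).encode()
            entries.append((rel, sha256_file(full)))
    entries.sort()
    # Hypothetical leaf encoding: tag, length-prefixed path, content hash.
    level = [
        hashlib.sha256(b"\x00" + len(rel).to_bytes(4, "big") + rel + content).digest()
        for rel, content in entries
    ]
    if not level:
        return hashlib.sha256(b"").hexdigest()  # assumed convention for "no files"
    while len(level) > 1:
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


print(executable_identity(sys.executable)["resolved_path"])

with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "notes.txt"), "w") as f:
        f.write("before")
    pre_root = tree_root(workdir)
    with open(os.path.join(workdir, "notes.txt"), "w") as f:
        f.write("after")  # stands in for whatever the agent did
    post_root = tree_root(workdir)
    print("tracked paths changed:", pre_root != post_root)
```

Only the two 32-byte roots need to be stored, which is why this kind of check can run without keeping copies of the files.
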
You get tamper-evident proof of change-or-no-change for free.

Hashes alone tell you "the log hasn't been edited relative to the root you already have." They don't tell you the root itself wasn't swapped out. That's what signatures are for.

When invoked with `--audit-sign-key`, the supervisor does the following at session finalization:

- Builds an in-toto statement with:
  - `subject.name = "audit-session:"`
  - `subject.digest.sha256 =`
  - `predicateType = "https://nono.sh/attestation/audit-session/alpha"`
  - `predicate =` JSON payload with session_id, command, timings, exit code, event count, chain head, executable identity, signer key id
- Signs that statement with a PKCS#8 private key loaded through nono's trust service.
- Wraps the signature in a DSSE envelope (the standard format used by Sigstore and in-toto).
- Writes it to `audit-attestation.bundle` in the session directory.

The signing key can come from a file, a Kubernetes secret, the system keyring, 1Password, Apple Keychain, an environment variable, or any other trust-service-supported secret reference, and the secrets never hit disk in plaintext. Over time we will bring in HSMs and cloud KMSes, but for now the flexibility is there for users to roll their own vault-based solution if they want.

Critically, the key only ever materialises inside the trusted supervisor process. It is loaded from the secret store directly into supervisor memory, used once to sign the root, and zeroised. The sandboxed child has no path to it: there is no shared memory, ptrace is blocked, and the IPC socket carries only length-prefixed JSON messages parsed by serde, so there is no channel through which the child can read supervisor memory. The child is not merely denied filesystem access to the key; the key is not present in any address space the child can reach. From the child's perspective, the signing key is on a different machine.

The signature is over the Merkle root plus bound context.
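
For readers who have not met DSSE, the sketch below shows the general shape of wrapping an in-toto statement in a DSSE envelope. The pre-authentication encoding (`pae`) follows the published DSSE format; everything else is an assumption for illustration. In particular, `build_statement` guesses that a session id follows `audit-session:` and that the subject digest is the Merkle root, and HMAC stands in for the PKCS#8 key signature nono actually uses, because the Python standard library has no asymmetric signing.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"  # media type for in-toto statements


def build_statement(session_id, merkle_root_hex, predicate):
    """Hypothetical layout: subject name and digest contents are assumptions."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{
            "name": f"audit-session:{session_id}",
            "digest": {"sha256": merkle_root_hex},
        }],
        "predicateType": "https://nono.sh/attestation/audit-session/alpha",
        "predicate": predicate,
    }


def pae(payload_type, payload):
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    t = payload_type.encode()
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def seal(statement, sign, keyid):
    """Serialize the statement, sign its PAE, and wrap both in an envelope."""
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode()
    signature = sign(pae(PAYLOAD_TYPE, payload))
    return {
        "payload": base64.b64encode(payload).decode(),
        "payloadType": PAYLOAD_TYPE,
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(signature).decode()}],
    }


def check(envelope, verify, expected_root_hex):
    """Validate the signature, then confirm the signed root is the expected one."""
    payload = base64.b64decode(envelope["payload"])
    message = pae(envelope["payloadType"], payload)
    if not any(verify(message, base64.b64decode(s["sig"]))
               for s in envelope["signatures"]):
        return False
    statement = json.loads(payload)
    return statement["subject"][0]["digest"]["sha256"] == expected_root_hex


# Stand-in signer: HMAC with a demo key, NOT the asymmetric signature nono uses.
DEMO_KEY = b"demo-key-not-a-real-secret"


def demo_sign(message):
    return hmac.new(DEMO_KEY, message, hashlib.sha256).digest()


def demo_verify(message, signature):
    return hmac.compare_digest(demo_sign(message), signature)


root_hex = hashlib.sha256(b"pretend merkle root input").hexdigest()
bundle = seal(build_statement("demo-session", root_hex, {"event_count": 5}),
              demo_sign, keyid="demo")
print("attestation verifies:", check(bundle, demo_verify, root_hex))
```

Because the signature covers the payload type and the exact payload bytes, changing either one after signing makes verification fail.
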
So to forge a session, an attacker in the child must:

- Rewrite `audit-events.ndjson` (they can't: the audit directory is outside their accessible filesystem thanks to kernel-enforced sandboxing)
- And recompute a Merkle root they like
- And sign that root with the attestation key, which they do not have: it exists only in the supervisor's memory, which they cannot read

A single command ties the whole thing together:

    nono audit verify [--public-key-file key.pem] [--json]

This runs three independent checks:

1. Event log integrity. Re-parse `audit-events.ndjson`, recompute every leaf hash from the canonical event bytes, re-chain, rebuild the Merkle tree, and confirm that every recorded `prev_chain`, `leaf_hash`, `chain_hash`, and the final root match. Any single altered byte, in any event, anywhere, fails this step.
2. Global ledger inclusion. Every finalized session on a host also commits a single row into `~/.nono/audit/ledger.ndjson`, a hash-chained global ledger across sessions. `verify` computes this session's digest (a domain-separated SHA-256 over the canonical session payload) and proves it is present, at the right position, in an internally consistent chain. This is what makes "the host pretended this session never happened" detectable, not just "the host edited this session."
3. Attestation verification. If an attestation bundle exists, `verify` loads it, validates the DSSE signature, and confirms the signed statement's Merkle root matches the session's. With `--public-key-file`, verification is pinned to a specific expected signer key, preventing the "attacker rewrites the session and the embedded public key in session metadata" failure mode. Without it, verification uses the key the session itself claims, which catches honest corruption but not a coordinated-rewrite adversary.

Only if all three pass does `nono audit verify` return VERIFIED. Anything else returns MISMATCH with a specific reason.

So, let's bring the threat model together.
Suppose an attacker has full code execution as the sandboxed agent. What can they actually do to the record?

| Attack | What structurally stops it |
|---|---|
| Write to the audit file directly | Nope. The audit directory is not in the child's allowed set. The child literally cannot access it. |
| Reach into the supervisor to alter the in-memory record | No shared memory, ptrace blocked, IPC is length-prefixed JSON parsed by serde in memory-safe Rust. |
| Edit events after the session ends | Every event's leaf hash feeds the Merkle root; any edit changes the root; the root is signed. |
| Truncate the log | Chain hash and event count are recorded in the summary; truncation is detected on verify. |
| Delete the session entirely | Nope. It's stored outside the sandbox and is inaccessible to the child. |
| Replace the binary that ran with one of the same name | `ExecutableIdentity` commits the SHA-256 of the actual file loaded; change it and the signed root becomes invalid. |
| Swap the public key in session metadata so a forged sig verifies | `--public-key-file` pins verification to an expected key; out-of-band key distribution beats in-band swaps. |
| Forge a new session entirely | Requires the attestation signing key, which lives in the operator's keystore and can only be accessed via API from the supervisor process. |

The one attack this does not structurally prevent is a host attacker who has both a

This analysis was written by the Genesis Park editorial team with the help of AI. The original article is available through the source link.
