LLM Proxy for Agent Containers

hackernews | 📦 open source
#ai #anthropic #claude #kubernetes #llm #security #agents #containers
Original source: hackernews · Summarized and analyzed by Genesis Park

Summary

The post introduces Tightbeam, a Kubernetes communications controller that acts as an LLM proxy for AI agent containers. Rather than handing API keys to agents, it isolates credentials inside ephemeral Job pods: a per-workspace controller serves gRPC, watches CRDs, and owns conversation history, while stateless LLM Jobs and Channel Jobs read their secrets from kubelet-mounted volumes. The design gives credential isolation, an audit point, and centralized conversation control.

Body

Tightbeam is a Kubernetes communications controller for agent workspaces. It manages LLM calls and channel connections via the controller + Job pattern, and credentials never leave ephemeral Job pods. There are three components:

- Controller -- a k8s controller, one per workspace namespace. Serves gRPC. Watches TightbeamModel and TightbeamChannel CRDs. Creates and manages LLM Jobs and Channel Jobs. Owns conversation history (PVC-backed NDJSON).
- LLM Job -- a stateless Job pod. Connects to the controller via gRPC, pulls a turn assignment (long-poll), reads the API key from a kubelet-mounted Secret, calls the LLM provider, and streams the response back. Keepalive is session-scoped: the Job loops on GetTurn until an idle timeout fires, then exits.
- Channel Job -- holds an outbound connection to a messaging platform (Discord, Slack). The bot token is mounted by kubelet. Forwards inbound messages to the controller, receives agent responses, and sends them to the channel.

The controller is the only gRPC server; everything else connects back to it as a client.

AI agents running in containers need to call LLM APIs, but giving them API keys means:

- Credential exposure -- a compromised agent leaks your API key
- No audit trail -- the agent calls whatever it wants with your credentials
- No conversation control -- the agent manages its own context window

Tightbeam solves this by isolating credentials inside ephemeral Job pods. The controller never sees API keys: it references k8s Secrets by name in Job specs, and kubelet mounts them into the pod. The agent runtime (Transponder) knows nothing about keys, models, or providers. Use Airlock for MCP tool isolation; use Tightbeam for LLM API isolation.

```
             gRPC
Transponder ───────> Controller ───────> Conversation Log (PVC)
                         │
                         │ creates k8s Jobs
             ┌───────────┤
             │           │
          LLM Job    Channel Job
        (api key     (bot token
         mounted)     mounted)
             │           │
             v           v
      Anthropic API   Discord/Slack
```

The controller watches CRDs to know which models and channels are available.
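The kubelet-mount pattern above means each Secret key appears as a plain file under the mount path, so the Job can read credentials without talking to the k8s API. A minimal sketch in Python (the project itself is Rust; `load_llm_config` and the demo directory are illustrative names, not Tightbeam APIs):

```python
from pathlib import Path
import tempfile

def load_llm_config(mount_dir) -> dict:
    """Read a kubelet-mounted Secret: each Secret key appears as a
    file under the mount path, with the value as the file contents."""
    return {
        f.name: f.read_text().strip()
        for f in Path(mount_dir).iterdir()
        if f.is_file()
    }

# Demo with a temp directory standing in for the real mount path.
mount = Path(tempfile.mkdtemp())
(mount / "provider").write_text("anthropic\n")
(mount / "model").write_text("claude-sonnet-4-20250514\n")
(mount / "api-key").write_text("sk-example\n")

cfg = load_llm_config(mount)
print(cfg["provider"], cfg["model"])  # anthropic claude-sonnet-4-20250514
```

Because the file is the only place the key exists, a compromised controller process has nothing to leak; only the short-lived Job pod ever holds the credential.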
When a turn arrives, the controller enqueues a TurnAssignment. The LLM Job pulls it via GetTurn (a blocking long-poll), calls the LLM, and streams results back via StreamTurnResult. The controller appends the response to conversation history and forwards events to the caller.

A TightbeamModel declares an available LLM model; the controller creates LLM Jobs from these:

```yaml
apiVersion: tightbeam.dev/v1
kind: TightbeamModel
metadata:
  name: claude-sonnet
  namespace: workspace-my-ws
spec:
  provider: anthropic
  model: claude-sonnet-4-20250514
  description: "Fast, capable model for code tasks"
  maxTokens: 8192
  secretName: llm-anthropic-key
  image: ghcr.io/calebfaruki/tightbeam-llm-job:latest
  idleTimeout: 300
```

The secretName references a k8s Secret containing provider, model, api-key, and optionally max-tokens as individual keys. Kubelet mounts it into the LLM Job at /run/secrets/llm/.

A TightbeamChannel declares a channel connection; the controller creates Channel Jobs from these:

```yaml
apiVersion: tightbeam.dev/v1
kind: TightbeamChannel
metadata:
  name: discord-bot
  namespace: workspace-my-ws
spec:
  type: discord
  secretName: discord-bot-token
  image: ghcr.io/calebfaruki/tightbeam-channel-discord:latest
  targetModel: claude-sonnet
```

The gRPC API is a single service, tightbeam.v1.TightbeamController, defined at crates/tightbeam-proto/proto/tightbeam/v1/tightbeam.proto.

| RPC | Caller | Description |
|---|---|---|
| GetTurn | LLM Job | Long-poll. Blocks until a turn is ready. The Job sets its gRPC deadline as the idle timeout. |
| StreamTurnResult | LLM Job | Streams response chunks (content deltas, tool calls) back to the controller. |
| Turn | Transponder | Sends messages, receives streaming LLM response events. |
| ListModels | Transponder | Returns available models from CRDs. |
| ChannelStream | Channel Job | Bidirectional stream. Inbound user messages in, agent responses out. |
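The GetTurn keepalive described above (block until work arrives, exit once an idle timeout passes with no turns) can be sketched without gRPC as a queue read with a deadline. A Python sketch under that assumption; `run_llm_job` and the queue are illustrative stand-ins, not Tightbeam code:

```python
import queue

def run_llm_job(turns: queue.Queue, idle_timeout: float, handle) -> int:
    """Loop like an LLM Job: pull turn assignments until idle_timeout
    elapses with no work, then return (the pod would exit)."""
    handled = 0
    while True:
        try:
            # Stands in for a blocking GetTurn with a gRPC deadline.
            turn = turns.get(timeout=idle_timeout)
        except queue.Empty:
            return handled  # idle timeout fired: session over
        handle(turn)        # call the provider, stream results back
        handled += 1

q = queue.Queue()
q.put({"messages": [{"role": "user", "content": "hi"}]})
done = run_llm_job(q, idle_timeout=0.1, handle=lambda t: None)
print(done)  # 1
```

The idle timeout is what makes the Job "session-scoped": the pod (and the credential mounted into it) lives only as long as the conversation stays active.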
A turn proceeds as follows:

- Transponder calls Turn with new messages
- Controller appends the messages to conversation history
- Controller builds a TurnAssignment from the full history and enqueues it
- The LLM Job's GetTurn resolves with the assignment
- The LLM Job calls the LLM provider and streams chunks via StreamTurnResult
- Controller forwards chunks as TurnEvents on the Turn response stream
- Controller appends the assistant message to the conversation log
- If tool_use: the transponder executes tools locally and sends the results in a new Turn
- If end_turn/max_tokens: the turn is complete

The core message shapes:

```proto
message Message {
  string role = 1;
  repeated ContentBlock content = 2;
  repeated ToolCall tool_calls = 3;
  optional string tool_call_id = 4;
  optional bool is_error = 5;
  optional string agent = 6;
}

message TurnAssignment {
  optional string system = 1;
  repeated ToolDefinition tools = 2;
  repeated Message messages = 3;
  ModelConfig model_config = 4;
}

message TurnResultChunk {
  oneof chunk {
    ContentDelta content_delta = 1;
    ToolUseStart tool_use_start = 2;
    ToolUseInput tool_use_input = 3;
    TurnComplete complete = 4;
    TurnError error = 5;
  }
}
```

ToolDefinition.parameters_json and ToolCall.input_json are JSON strings, not protobuf Struct. ImageBlock.data is raw bytes, not base64. The LLM Job handles provid
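The PVC-backed NDJSON history the controller owns is append-only: one JSON object per line, replayed to rebuild the full message list for the next TurnAssignment. A sketch under that assumption (file layout and helper names are illustrative, not the controller's actual Rust implementation):

```python
import json
import tempfile
from pathlib import Path

def append_message(log: Path, message: dict) -> None:
    """Append one message to the conversation log as a single NDJSON line."""
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")

def load_history(log: Path) -> list:
    """Replay the log to reconstruct the conversation history."""
    if not log.exists():
        return []
    with log.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo: a temp file stands in for the PVC-backed log.
log = Path(tempfile.mkdtemp()) / "conversation.ndjson"
append_message(log, {"role": "user", "content": "hello"})
append_message(log, {"role": "assistant", "content": "hi there"})
print(len(load_history(log)))  # 2
```

Append-only NDJSON suits this flow well: each turn adds lines without rewriting the file, and a controller restart can rebuild state by replaying the log from the PVC.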

This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.
