AegisFlow – An Open-Source AI Gateway with a Built-in Policy Engine, Written in Go

Original source: Hacker News · Summarized and analyzed by Genesis Park

Summary

**AegisFlow** is an open-source AI gateway written in Go that lets you manage multiple LLM providers, including OpenAI and Anthropic, through a single entry point. To reduce vendor lock-in and strengthen security, it provides routing, rate limiting, PII detection, and related features, and it reports throughput of over 58,000 requests per second on a MacBook Air M1. It also integrates with OpenTelemetry and Prometheus, providing a unified control plane that makes cost, latency, and other operational metrics transparent.

Article

Open-Source AI Gateway + Policy + Observability Control Plane. Route, secure, observe, and control all your AI traffic from a single gateway.

AegisFlow is a production-grade AI gateway built in Go that sits between your applications and LLM providers. It gives you a single control plane to manage routing, security policies, rate limiting, cost tracking, and observability across OpenAI, Anthropic, Ollama, and any OpenAI-compatible provider.

Point any OpenAI SDK at AegisFlow by changing one line:

```python
# Before
client = OpenAI(api_key="sk-...")

# After - all traffic now flows through AegisFlow
client = OpenAI(base_url="http://localhost:8080/v1", api_key="aegis-test-default-001")
```

Teams running AI in production face real problems:

- Vendor lock-in -- different SDKs, different formats, different billing
- No fallback -- when OpenAI goes down, your product goes down
- Blind spots -- no visibility into cost, latency, or failure patterns
- Security gaps -- prompt injection, PII leakage, no tenant isolation
- No governance -- no central policy for who can use what models

AegisFlow solves all of these with a single, lightweight Go binary.
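The fallback problem above (when the primary provider goes down, so does your product) is what the gateway's routing layer addresses. A minimal sketch of priority routing with automatic fallback, in plain Python rather than AegisFlow's actual Go internals, and with hypothetical provider callables:

```python
# Minimal sketch of priority routing with fallback: try providers in
# order and fall through to the next one when a call fails.
class ProviderError(Exception):
    pass

def route_with_fallback(providers, request):
    """providers: list of (name, callable) in priority order."""
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except ProviderError as exc:
            errors.append((name, exc))  # record the failure, fall through
    raise ProviderError(f"all providers failed: {errors}")

# Simulated outage: the primary raises, the fallback answers.
def openai_down(req):
    raise ProviderError("503 from upstream")

def ollama_ok(req):
    return {"content": "ok from fallback"}

name, resp = route_with_fallback(
    [("openai", openai_down), ("ollama", ollama_ok)],
    {"messages": [{"role": "user", "content": "test"}]},
)
print(name, resp["content"])  # ollama ok from fallback
```

A production router would add retries with exponential backoff and a circuit breaker around each provider, as the feature list below describes.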
Benchmarked on MacBook Air M1 (8GB RAM) with the full middleware pipeline enabled (auth, rate limiting, policy engine, routing, usage tracking):

| Metric | Value |
|---|---|
| Throughput | 58,000+ requests/sec |
| p50 Latency | 1.1 ms |
| p95 Latency | 4.2 ms |
| p99 Latency | 7.3 ms |
| Memory | ~29 MB RSS after 10K requests |
| Binary Size | ~15 MB |

```shell
$ hey -n 10000 -c 100 -m POST \
  -H "Content-Type: application/json" \
  -H "X-API-Key: bench-key" \
  -d '{"model":"mock","messages":[{"role":"user","content":"test"}]}' \
  http://localhost:8080/v1/chat/completions

Requests/sec: 58,308
Latency (p50): 1.1ms | (p95): 4.2ms | (p99): 7.3ms
10,000/10,000 succeeded
```

**Unified API**
- Single OpenAI-compatible API for all providers
- Support for OpenAI, Anthropic, Ollama, and any OpenAI-compatible endpoint
- Streaming (SSE) and non-streaming support
- Request/response normalization across providers

**Routing & Resilience**
- Route by model name, cost, latency, or custom strategy
- Automatic fallback when primary provider fails
- Retry with exponential backoff
- Circuit breaker to avoid cascading failures
- Priority, round-robin, and least-latency strategies

**Rate Limiting**
- Per-tenant and per-user rate limits
- Sliding window algorithm (requests/minute, tokens/minute)
- In-memory (default) or Redis-backed for distributed deployments
- 429 responses with Retry-After headers

**Policy Engine**
- Input policies: block prompt injection attempts, detect PII before it reaches providers
- Output policies: filter harmful or unwanted content in responses
- Keyword blocklist, regex patterns, and PII detection (email, SSN, credit card)
- Per-policy actions: `block`, `warn`, or `log`
- Extensible filter interface for custom policies

**WASM Plugins**
- Custom policy filters in any WASM-compatible language (Go, Rust, TinyGo, AssemblyScript)
- Sandboxed execution via wazero runtime (pure Go, no CGo)
- Configurable per-plugin timeout and error handling (`on_error: block/allow`)
- Example plugin with ABI documentation included

**Observability**
- OpenTelemetry traces with per-request spans (provider, model, latency, tokens, status)
- Prometheus metrics endpoint (`/metrics`)
- Structured JSON logging (powered by Zap)
- Exporters: stdout (development), OTLP/gRPC (production)

**Usage & Cost Tracking**
- Token counting and cost estimation per request
- Per-tenant usage aggregation
- Admin API for querying usage data
- Foundation for budget alerts and billing integration

**Multi-tenancy**
- API key-based tenant identification
- Per-tenant rate limits, model access controls, and policies
- Tenant isolation at the gateway level
- Support for multiple API keys per tenant

```mermaid
graph TB
    Client[Client / OpenAI SDK] -->|POST /v1/chat/completions| Gateway
    subgraph AegisFlow["AegisFlow Gateway (single Go binary)"]
        Gateway[HTTP Server<br/>chi router]
        Auth[Auth Middleware<br/>API key + tenant]
        RL[Rate Limiter<br/>sliding window]
        PolicyIn[Policy Engine<br/>input check]
        Router[Router<br/>model matching + strategy]
        PolicyOut[Policy Engine<br/>output check]
        Usage[Usage Tracker<br/>tokens + cost]
        Telemetry[Telemetry<br/>OTel + Prometheus]
        Gateway --> Auth --> RL --> PolicyIn --> Router
        Router --> PolicyOut --> Usage --> Telemetry
    end
    Router -->|priority / round-robin / fallback| Providers
    subgraph Providers["Provider Adapters"]
        OpenAI[OpenAI]
        Anthropic[Anthropic]
        Ollama[Ollama]
        Mock[Mock Provider]
    end
    Telemetry -->|traces| OTel[OTel Collector]
    Telemetry -->|metrics| Prom[Prometheus]
    subgraph Storage["Storage (optional)"]
        Redis[(Redis<br/>rate limits)]
        PG[(PostgreSQL<br/>config + audit)]
    end
    RL -.->|optional| Redis
```

```
Request --> Auth --> Rate Limit --> Policy (input) --> Route --> Provider --> Policy (output) --> Usage --> Response
                                        |                                        |
                                   BLOCK (403)                              BLOCK (403)
                                   if violated                              if violated
```

- Control plane / data plane separation -- config management is separate from request handling
- Provider abstraction -- one interface, any provider. Adding a new provider = implementing 6 methods
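The per-tenant sliding-window limiter described above can be sketched in a few lines. This is an illustrative in-memory version in Python (AegisFlow's is Go, and the class and method names here are ours):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per key."""
    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:  # evict expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller would respond 429 with a Retry-After header
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=2, window=60.0)
print(limiter.allow("tenant-a", now=0.0))   # True
print(limiter.allow("tenant-a", now=1.0))   # True
print(limiter.allow("tenant-a", now=2.0))   # False: window is full
print(limiter.allow("tenant-a", now=61.0))  # True: earliest hits expired
```

A Redis-backed variant, as the feature list mentions, would keep the same algorithm but store the timestamps in a shared sorted set so multiple gateway instances see one window.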
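The input-policy stage (regex patterns and PII detection for email, SSN, and credit card, with per-policy block/warn/log actions) can be illustrated with a small sketch. The regexes below are deliberately simplified stand-ins, not AegisFlow's actual patterns:

```python
import re

# Simplified PII patterns for illustration; real detectors are stricter.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def check_input(text, action="block"):
    """Return (verdict, findings); verdict is 'block', 'warn', or 'allow'."""
    findings = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if not findings:
        return "allow", []
    # A 'log' action records the finding but lets the request through.
    return (action if action in ("block", "warn") else "allow"), findings

print(check_input("ping me at a@b.io"))               # ('block', ['email'])
print(check_input("hello world"))                     # ('allow', [])
print(check_input("ssn 123-45-6789", action="warn"))  # ('warn', ['ssn'])
```

In the gateway pipeline, a `block` verdict at this stage maps to the 403 response shown in the request-flow diagram, before any tokens are sent to a provider.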

This analysis was produced by the Genesis Park editorial team with the help of AI. The original can be found via the source link.
