Digital Tap AI: an OSS agent that detects and stops idle cloud clusters
hackernews
📰 News
#ai models
#llama
#mistral
Source: hackernews · Summarized and analyzed by Genesis Park
Summary
Digital Tap AI is an open-source AI agent that automatically detects idle resources in cloud clusters to cut costs. It runs locally without API keys, integrates with Ollama-based LLMs, and ships five agents: idle detection (20-35% savings), cluster management (20-40%), cost-anomaly detection (5-15%), right-sizing (10-25%), and scheduling (15-30%). The full platform (digitaltap.ai) adds multi-cloud support for Databricks, EMR, Dataproc, and more, plus enterprise features such as team dashboards, Slack/Teams alerts, and SSO/RBAC.
Full text
Save 40%+ on cloud compute with local AI agents. No API keys. No cloud dependencies. Runs entirely on your machine.

Digital Tap AI agents continuously analyze your cloud compute infrastructure and find waste: idle clusters burning money, oversized instances, cost anomalies, and scheduling opportunities. This open-source edition includes 5 agents that work with any local LLM via Ollama. No cloud API keys required. No data leaves your machine.

| Agent | What it does | Typical savings |
|---|---|---|
| 🔍 Idle Detection | Finds clusters running with no workload | 20-35% |
| ⚡ Cluster Manager | Automatically hibernates/stops idle clusters | 20-40% |
| 📊 Cost Anomaly | Detects unexpected spend spikes | 5-15% |
| 📐 Right-Sizing | Recommends optimal instance types | 10-25% |
| 🕐 Scheduler | Suggests start/stop schedules based on usage patterns | 15-30% |

⚡ The Cluster Manager doesn't just detect, it acts. Dry-run by default, one flag to enforce. Configurable policies, grace periods, exclusion lists.

Want the full platform? digitaltap.ai adds multi-cloud support, team dashboards, Slack/Teams alerts, and more.

Install Ollama and pull a model:

```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model (llama3 recommended, mistral also works great)
ollama pull llama3
```

Install Digital Tap AI:

```shell
pip install digitaltap-ai

# Or from source:
git clone https://github.com/CruiseAI/digitaltap-oss.git
cd digitaltap-ai-oss
pip install -e .
```

Run it:

```shell
# Scan: analyze all clusters (read-only)
digitaltap scan --demo

# Manage: detect AND act on idle clusters (dry-run by default)
digitaltap manage --demo

# Manage: actually hibernate idle clusters
digitaltap manage --demo --enforce

# Manage with custom policy
digitaltap manage --demo --enforce \
  --idle-threshold 30 \
  --grace-period 10 \
  --protect stream-processing \
  --protect ml-inference-api
```

```
🔮 Digital Tap AI — Open Source Edition
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📡 Collecting cluster data (mock mode)...
   Found 12 clusters

⚡ Mode: DRY-RUN
⚡ Detects idle clusters and automatically hibernates/stops them
   Policy: idle ≥ 15m, CPU …
```

Custom collectors plug in by implementing a `collect()` method that returns `ClusterInfo` objects:

```python
    async def collect(self) -> list[ClusterInfo]:
        # Fetch your cluster data
        return [ClusterInfo(id="c1", name="my-cluster", ...)]
```

Architecture:

```
┌─────────────────────────────────────────────┐
│                  CLI / API                  │
├─────────────────────────────────────────────┤
│             Agent Orchestrator              │
│   ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│   │   Idle   │ │   Cost   │ │  Right-  │    │
│   │Detection │ │ Anomaly  │ │  Sizing  │ ...│
│   └────┬─────┘ └────┬─────┘ └────┬─────┘    │
│        └────────────┼────────────┘          │
│              ┌──────┴──────┐                │
│              │  Local LLM  │                │
│              │  (Ollama)   │                │
│              └─────────────┘                │
├─────────────────────────────────────────────┤
│               Data Collectors               │
│   ┌──────┐ ┌───────────┐ ┌─────┐ ┌──────┐   │
│   │ Mock │ │Databricks │ │ AWS │ │Custom│   │
│   └──────┘ └───────────┘ └─────┘ └──────┘   │
└─────────────────────────────────────────────┘
```

Two modes:

- `digitaltap scan`: read-only analysis across all agents
- `digitaltap manage`: detect and act on idle clusters (dry-run by default, `--enforce` to act)

Each agent:

- Collects cluster metrics via pluggable collectors
- Analyzes data using rules + LLM reasoning
- Recommends specific actions with estimated savings
- Acts (Cluster Manager only): hibernates/stops clusters with a full audit log

The Cluster Manager uses a policy engine with configurable thresholds, grace periods, and exclusion lists. Every action is logged with a timestamp and reason; nothing happens silently.

The LLM is used for nuanced analysis: understanding workload patterns, generating natural-language explanations, and making judgment calls that pure threshold-based rules miss. All agents work without an LLM (rule-based fallback); the LLM just makes recommendations smarter.

Configuration lives in `digitaltap.yaml`:

```yaml
# digitaltap.yaml
llm:
  provider: ollama
  model: llama3                  # or mistral, codellama, etc.
  base_url: http://localhost:11434

agents:
  idle_detection:
    enabled: true
    idle_threshold_minutes: 15
  cluster_manager:
    enabled: true
    idle_threshold_minutes: 15   # idle time before action
    cpu_threshold: 0.05          # CPU % below which cluster is "idle"
    grace_period_minutes: 5      # extra buffer after threshold
    default_action: hibernate    # hibernate | stop
    enforce: false               # true = take action, false = dry-run
    protected_clusters:          # never touch these
      - stream-processing
      - ml-inference-api
    protected_tags:
      protected: "true"          # skip clusters with this tag
    protected_workspaces: []     # skip entire workspaces
  cost_anomaly:
    enabled: true
    spike_threshold: 1.5         # 50% above baseline
  right_sizing:
    enabled: true
    utilization_threshold: 0.3
  scheduler:
    enabled: true
    min_schedule_savings_pct: 20

collector:
  type: mock                     # mock | databricks | aws
```

The agents can also be used directly from Python:

```python
import asyncio

from digitaltap.agents import IdleDetectionAgent, CostAnomalyAgent
from digitaltap.collectors.mock import MockCollector
from digitaltap.llm.ollama import OllamaLLM

async def main():
    llm = OllamaLLM(model="llama3")
    collector = MockCollector()
    clusters = await collector.collect()

    agent = IdleDetectionAgent(llm=llm)
    findings = await agent.analyze(clusters)

    for f in findings:
        print(f"{f.severity}: {f.cluster_name} — {f.recommendation}")
        print(f"  Estimated savings: ${f.estimated_savings_per_hour:.2f}/hr")

asyncio.run(main())
```

We welcome contributions! See CONTRIBUTING.md (coming soon). Areas we'd love help with:

- Additional collectors (GCP Dataproc, Azure HDInsight, Kubernetes)
- More agent strategies (spot instance optimization, reserved capacity planning)
- Dashboard UI
- Prometheus/Grafana integration
- Better LLM prompts for analysis

Apache License 2.0 (see LICENSE). This is the open-source core.
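The Cluster Manager's policy engine (idle threshold, CPU threshold, grace period, protected lists, audited decisions) can be sketched in a few lines. This is a hypothetical illustration whose `Cluster` fields and `should_hibernate`/`audit` helpers merely mirror the `digitaltap.yaml` options; it is not the project's real API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Cluster:
    name: str
    idle_minutes: float
    cpu_utilization: float        # 0.0 - 1.0
    tags: dict = field(default_factory=dict)

def should_hibernate(cluster, *, idle_threshold_minutes=15,
                     cpu_threshold=0.05, grace_period_minutes=5,
                     protected_clusters=(), protected_tag="protected"):
    """Return a (decision, reason) pair so every outcome is explainable."""
    if cluster.name in protected_clusters:
        return False, "protected cluster"
    if cluster.tags.get(protected_tag) == "true":
        return False, "protected tag"
    if cluster.cpu_utilization >= cpu_threshold:
        return False, "cpu above threshold"
    required = idle_threshold_minutes + grace_period_minutes
    if cluster.idle_minutes < required:
        return False, f"idle {cluster.idle_minutes:.0f}m < {required}m"
    return True, f"idle {cluster.idle_minutes:.0f}m >= {required}m"

def audit(cluster, decision, reason):
    # Timestamped entry in the spirit of "nothing happens silently".
    return {"time": datetime.now(timezone.utc).isoformat(),
            "cluster": cluster.name, "hibernate": decision, "reason": reason}
```

A cluster idle for 45 minutes at 1% CPU would pass the 15m + 5m policy and be flagged for hibernation, while anything on the protected list is skipped with an explicit reason in the audit entry.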
Digital Tap AI (the full platform) adds:

- ⚡ Auto-remediation: automatically hibernate, resize, and schedule
- 🌐 Multi-cloud: Databricks, EMR, Dataproc, Synapse, and more
- 👥 Team dashboards: per-team cost attribution and savings tracking
- 🔔 Alerts: Slack, Teams, PagerDuty, email
- 📈 Historical analytics: trend analysis and forecasting
- 🔒 Enterprise features: SSO, RBAC, audit logs, SOC2
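For a concrete reading of the open-source edition's cost-anomaly setting (`spike_threshold: 1.5`, i.e. 50% above baseline), a minimal trailing-baseline comparison might look like the sketch below. The function name and windowing choice are assumptions for illustration, not the project's actual detector.

```python
def detect_spikes(daily_spend, spike_threshold=1.5, window=7):
    """Flag days whose spend exceeds spike_threshold x the trailing mean."""
    anomalies = []
    for i in range(window, len(daily_spend)):
        baseline = sum(daily_spend[i - window:i]) / window
        if baseline > 0 and daily_spend[i] > spike_threshold * baseline:
            # (day index, observed spend, trailing baseline)
            anomalies.append((i, daily_spend[i], baseline))
    return anomalies

# A steady $100/day followed by a $180 day trips the 1.5x rule.
print(detect_spikes([100.0] * 7 + [180.0]))  # [(7, 180.0, 100.0)]
```

A $120 day against the same $100 baseline stays under the 1.5x line and is not flagged, which matches the intent of catching unexpected spikes rather than ordinary variation.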
This analysis was produced by the Genesis Park editorial team with the help of AI. The original post is available via the source link.