Show HN: A security scanner for AI Agent Skills

hackernews | 2026년 4월 10일 16:42 | 📦 오픈소스

#ai agent #llm #review #scanner #security #show hn

원문 출처: hackernews · Genesis Park에서 요약 및 분석

요약

SkillWard는 정적 분석, LLM 평가, 샌드박스 검증을 결합하여 AI 에이전트 스킬의 잠재적 보안 위협을 식별하는 스캐너입니다. 5,000개의 실제 스킬을 대상으로 한 테스트 결과 약 25%가 안전하지 않은 것으로 나타났으며, 단순 리뷰로는 발견하기 어려운 런타임 위협을 포착해냈습니다. 이 도구는 수상한 스킬을 고립된 Docker 컨테이너에서 실행하여 자격 증명 탈취나 데이터 유출 같은 구체적인 악의적 행위에 대한 증거를 확보합니다.

본문

SkillWard is a security scanner for AI Agent Skills that combines static analysis, LLM evaluation, and sandbox verification to comprehensively identify potential risks in Agent Skills. Highlights · Architecture · UI · Benchmark · Quick Start · Structure · English | 中文 "Five scanners on 238,180 Skills showed highly inconsistent results, only 0.12% were flagged by all five, with individual flag rates ranging from 3.79% to 41.93%." — Holzbauer et al., Malicious Or Not: Adding Repository Context to Agent Skill Classification, 2026 SkillWard enables security review of AI Agent Skills before they are published or deployed, reducing the potential risks of Agent usage. Beyond static analysis and LLM evaluation, it executes suspicious Skills in isolated Docker sandboxes, replacing uncertain warnings with runtime evidence. Across 5,000 real-world Skills, ~25% were flagged as unsafe; among the ~38% suspicious samples that entered the sandbox, ~one-third revealed runtime threats that review-only pipelines could not catch. We ran two existing open-source scanning tools on the same dataset as reference baselines (see Comparison for details). Here are three real-world cases: - Unique Detection: Threats missed by other tools, precisely caught by SkillWard — see ai-skill-scanner - Low False Positives: Compliant content wrongly blocked by other tools, correctly cleared by SkillWard — see roku - Deeper Analysis: For threats all tools detect, SkillWard provides more complete risk tracing and evidence — see amber-hunter - Three-Stage Security Coverage - Static analysis, LLM evaluation, and sandbox execution turn obvious threats and ambiguous warnings into high-confidence decisions - Autonomous Sandbox Execution - An in-container Agent provisions environments, installs dependencies, repairs common failures, and drives Skills end-to-end with up to 99% deployment success - Runtime Security Guard - A purpose-built Guard monitors Agent runtime behavior, capturing clear evidence for exfiltration, suspicious network access, sensitive writes, and hidden credential risks - Ready Out of the Box, Extensible on Demand - Single-skill or batch scans, Quick Scan / Sandbox Scan / Deep Trace modes, tunable via environment variables, LLM provider configuration, and Docker settings - Evidence-Rich Results - Every scan returns real-time logs, three-stage findings, threat evidence, and remediation guidance that security and platform teams can act on immediately SkillWard uses a static + dynamic three-stage analysis approach: Stage A · Static Analysis: Runs in seconds, catches known malicious patterns and suspicious signals Scans Skill code and configuration using YARA rules and regex to identify known malicious patterns (credential theft, code injection, etc.), validates that a Skill's declared permissions and capabilities match its actual code behavior, and detects hidden files, encoding obfuscation, prompt poisoning, and other suspicious characteristics. Stage B · LLM Evaluation: Semantic reasoning to judge intent and assign safety confidence Adds semantic reasoning on top of static signals. Skills that can be confidently classified are resolved here; Skills that remain uncertain advance to Stage C for sandbox verification. Stage C · Sandbox Verification: Actually runs suspicious Skills, leaving hidden risks nowhere to hide An in-container Agent executes the Skill end-to-end, with a custom Guard monitoring throughout. Pre-planted honeypot decoys lure malicious Skills into revealing credential theft, data exfiltration, supply chain attacks, and other hidden behavior. SkillWard UI provides a clean, intuitive web interface, supporting single or batch Skill submission, three scan modes (Quick Scan / Sandbox Scan / Deep Trace), and comprehensive scan results display. | Single Skill Scan | Batch Scan | | Report Overview + Three-Stage Analysis | Threat Details + Detection Evidence + Recommendations | Each report includes: Analysis Results (three-stage verdicts, confidence scores, threat levels), Issue Location (file path, line number, highlighted code snippets), and Remediation Suggestions (actionable security recommendations). We evaluated SkillWard on a real-world AI Agent Skills dataset containing Skills collected from ClawHub and known-malicious samples curated from security communities. Combining YARA rules, regex-based static analysis, and LLM semantic evaluation, all Skills are quickly triaged: safe ~49%, unsafe ~13%, suspicious ~38%, where suspicious Skills are escalated to Stage C for sandbox verification. After executing this batch of suspicious Skills end-to-end inside an isolated Docker sandbox, roughly one-third revealed potential threats that neither static analysis nor LLM evaluation could catch, including: - Credential exfiltration that only surfaces along the execution path - Persistence backdoors via crontab /SSH / startup scripts - Postinstall supply-chain attacks triggered during package installation - Outbound exfiltration chains identifiable only after correlating multi-step operations Stage C verdict breakdown for these suspicious Skills: | Level | Meaning | % of suspicious | |---|---|---| | safe | Confirmed safe after sandbox verification | ~69% | | medium risk | Medium-risk behavior (undeclared external requests, env-var harvesting, etc.) | ~17% | | high risk | High-risk behavior (credential theft, persistence backdoors, remote code execution, etc.) | ~14% | Across all stages: Stage A + B directly blocked ~13% unsafe Skills, and ~38% suspicious Skills entered the sandbox; among those suspicious Skills, ~17% were judged medium risk and ~14% were judged high risk. | Pattern | Occurrences | |---|---| | Credential theft (API keys, passwords, private keys) | 36% | | Undeclared external network requests | 24% | Env var / .env harvesting | 15% | | Remote code download and execution | 9% | | Persistence backdoor (crontab / SSH / startup) | 8% | | Supply chain and privilege escalation | 8% | For detailed case studies and comparison, see How does SkillWard address this challenge? above. Requirements: Python 3.10+ / Docker (sandbox) / Node.js 18+ (UI mode) # Clone the repository git clone https://github.com/Fangcun-AI/SkillWard.git cd SkillWard # Install dependencies pip install -r requirements.txt && pip install -e ./skill-scanner # Pull Docker sandbox image docker pull fangcunai/skillward:amd64 # Intel/AMD docker pull fangcunai/skillward:arm64 # Apple Silicon/ARM # Configure environment variables (.env.example lists all available options — fill in as needed) cp guardian-api/.env.example guardian-api/.env For detailed configuration, see Configuration Guide # Full pipeline (static + LLM + sandbox) python guardian-api/guardian.py /path/to/skills-dir -o ./output --enable-after-tool --parallel 4 -v # Stage A + B only (static + LLM, no Docker required) python guardian-api/guardian.py /path/to/skills-dir --stage pre-scan -o ./output -v # Stage C only (Docker sandbox) python guardian-api/guardian.py /path/to/skills-dir --stage runtime -o ./output --enable-after-tool --parallel 4 # Scan specific Skills only python guardian-api/guardian.py /path/to/skills-dir -s skill-a,skill-b -o ./output # Quick test run (first 10 Skills) python guardian-api/guardian.py /path/to/skills-dir -n 10 -o ./output # Increase sandbox timeout for complex Skills python guardian-api/guardian.py /path/to/skills-dir --timeout 900 --prep-timeout 600 -o ./output For more options and usage details, see CLI Guide Tip Optional: Launch Web UI cd guardian-api && python guardian_api.py # API server cd guardian-ui && npm install && npm run dev # Frontend SkillWard/ ├── docs/ # Documentation (config, CLI, cases, comparison) ├── guardian-api/ # Backend: scanning pipeline & API server │ ├── guardian.py # Core three-stage scanning engine │ └── guardian_api.py # FastAPI server (SSE streaming) ├── guardian-ui/ # Frontend: Next.js web dashboard ├── skill-scanner/ # Static analysis engine (15 analyzers) ├── models/ # Data model definitions ├── services/ # Business logic services ├── utils/ # Utility functions ├── resources/ # Banner, screenshots, demo assets ├── requirements.txt ├── README.md └── README_CN.md | Guide | Description | |---|---| | Configuration | Quick start, LLM model providers, sandbox security monitoring, optional tuning | | CLI Guide | Full command-line reference, common usage, and output files | | Showcase | Real-world detection cases, how SkillWard catches threats in public Skills | | Comparison | Side-by-side analysis with two open-source scanning tools |

원문 보기 (hackernews)

Genesis Park 편집팀이 AI를 활용하여 작성한 분석입니다. 원문은 출처 링크를 통해 확인할 수 있습니다.

요약

본문

관련 저널 읽기