# Building Cybersecurity AI Agents with CAI: A Deep Dive into the Open-Source Framework Powering Bug Bounties and CTFs

## Meta

- **Category:** Security
- **Tags:** AI agents, cybersecurity, CAI framework, bug bounty, red teaming, LLM security
- **Focus keyword:** CAI cybersecurity AI framework
- **Meta description:** Deep dive into CAI (Cybersecurity AI) — the open-source framework used in Dragos OT CTF, HackerOne, and HackTheBox. Learn how to build AI agents for reconnaissance, exploitation, and defense with tools, guardrails, and multi-agent orchestration.
- **Slug:** cai-cybersecurity-ai-framework
- **Status:** draft
- **Date:** 2026-04-06
- **Cross-references:**
  - OWASP Top 10 for Agentic Applications 2026: https://hmmnm.com/owasp-top-10-agentic-applications-2026/
  - Prompt Injection in 2026: https://hmmnm.com/prompt-injection-attacks-2026/
  - Red Teaming LLM Applications: https://hmmnm.com/red-teaming-llm-applications/
  - Zero Trust Architecture for AI Systems: https://hmmnm.com/zero-trust-architecture-ai-systems/

## Content

What Is CAI?

CAI (Cybersecurity AI) is an open-source framework developed by Alias Robotics that lets security professionals build specialized AI agents for offensive and defensive security tasks. Unlike general-purpose AI agent frameworks (LangChain, CrewAI, AutoGen), CAI is built specifically for cybersecurity — with built-in tools for reconnaissance, exploitation, privilege escalation, and guardrails against prompt injection.

The framework is backed by peer-reviewed research. In their arXiv paper (2504.06017), lead author Víctor Mayoral-Vilches and colleagues present the first formal classification of autonomy levels in cybersecurity and demonstrate that CAI agents can solve CTF challenges up to 3,600× faster than humans in specific tasks, averaging 11× faster overall.

But academic benchmarks only tell part of the story. CAI has been battle-tested in real-world scenarios: a top-10 finish in the Dragos OT CTF 2025, a collaboration with HackerOne’s engineering team (whose production deduplication agent was inspired by CAI’s Retester), and vulnerability discoveries in industrial robots, IoT devices, and enterprise platforms like Mercado Libre.

Why Not Just Use ChatGPT?

Here’s the core problem CAI solves: asking an LLM “find vulnerabilities in this system” produces generic advice. CAI gives the model structured access to actual security tools — port scanners, CVE databases, shell commands — and orchestrates multi-step workflows where agents hand off tasks to each other based on specialization.

The result isn’t a chatbot that talks about security. It’s an agent that performs security tasks — with human oversight built in at every critical decision point.


How CAI Works: Architecture Overview

CAI follows an agent-based architecture built on five core primitives:

Agents

An Agent in CAI is a named entity with instructions, a model, optional tools, and optional handoffs. Think of it as a specialized team member:

from cai.sdk.agents import Agent, Runner, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

model = OpenAIChatCompletionsModel(
    model="openai/gpt-4o-mini",
    openai_client=AsyncOpenAI(),
)

recon_agent = Agent(
    name="Recon Specialist",
    instructions=(
        "You are a reconnaissance agent. Use your tools to investigate "
        "targets. Always summarize findings with risk ratings. "
        "Once you have enough data, hand off to the Risk Analyst."
    ),
    tools=[check_ip_reputation, scan_open_ports, lookup_cve],
    model=model,
)

The key design choice: agents are stateless between runs. All context comes from the current conversation and the tools available. This makes agents composable and predictable — you know exactly what an agent can do by looking at its tool list.

Tools

Tools are Python functions exposed to agents via the @function_tool decorator or the FunctionTool class. This is where CAI diverges sharply from general frameworks — the tools are real security operations, not API wrappers:

from cai.sdk.agents import function_tool

@function_tool
def check_ip_reputation(ip_address: str) -> str:
    """Check if an IP address is known to be malicious.

    Args:
        ip_address: The IPv4 address to look up.
    """
    # In production, this queries ThreatConnect, VirusTotal,
    # or your internal threat intelligence platform
    threat_feeds = query_threat_intel(ip_address)
    if threat_feeds.hits:
        return (
            f"⚠️ {ip_address} is MALICIOUS — seen in "
            f"{len(threat_feeds.hits)} campaigns. "
            f"Last seen: {threat_feeds.latest}. "
            f"Recommend: block and investigate."
        )
    return f"✅ {ip_address} appears clean."

@function_tool
def lookup_cve(cve_id: str) -> str:
    """Look up CVE details including CVSS score, affected versions, and fix."""
    # Queries NVD API in production
    cve_data = nvd_api.fetch(cve_id.upper())
    if not cve_data:
        return f"CVE {cve_id} not found."
    return (
        f"CVE: {cve_id}\n"
        f"Severity: {cve_data.cvss} ({cve_data.severity})\n"
        f"Product: {cve_data.product}\n"
        f"Description: {cve_data.description}\n"
        f"Fix: {cve_data.fix}"
    )

The function docstring is critical — the LLM reads it to decide when and why to call the tool. Write them as instructions, not descriptions.
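For example, here is the difference on a hypothetical WHOIS tool (shown without the decorator so it runs standalone; the function names and wording are illustrative):

```python
def whois_lookup_weak(domain: str) -> str:
    """Performs a WHOIS lookup."""  # a description - gives the model no guidance
    ...

def whois_lookup(domain: str) -> str:
    """Look up WHOIS registration data for a domain.

    Use this when you need the registrar, creation date, or abuse
    contact for a domain found during reconnaissance. Do NOT use it
    for bare IP addresses - use check_ip_reputation for those.

    Args:
        domain: A registered domain name, e.g. "example.com".
    """
    ...

# The second docstring tells the model when to call the tool and when
# not to - that is what "instructions, not descriptions" means in practice.
```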

Handoffs

Handoffs let agents transfer control to other agents when they’ve completed their part of a workflow. This is the mechanism behind CAI’s multi-agent pipelines:

recon_specialist = Agent(
    name="Recon Specialist",
    instructions="Gather intel. Hand off to Risk Analyst when done.",
    tools=[check_ip_reputation, scan_open_ports, lookup_cve],
    model=model,
)

risk_analyst = Agent(
    name="Risk Analyst",
    instructions=(
        "You receive recon findings. Produce a structured assessment:\n"
        "1. Executive summary\n"
        "2. Critical findings with CVSS ratings\n"
        "3. Overall risk rating (Critical/High/Medium/Low)\n"
        "4. Prioritized remediation steps\n"
        "Be concise but thorough."
    ),
    model=model,
)

# Recon hands off to Risk Analyst after gathering data
recon_specialist.handoffs = [risk_analyst]

The Runner handles the handoff automatically — when the recon agent decides it has enough data, it transfers context to the risk analyst, who receives the full conversation history and produces the assessment.

Agent-as-Tool Orchestration

For hierarchical delegation (rather than sequential handoffs), CAI supports treating an agent as a tool that another agent can invoke:

cve_expert = Agent(
    name="CVE Expert",
    instructions=(
        "Given a CVE ID, provide a detailed technical breakdown: "
        "affected versions, attack vector, CVSS score, "
        "proof-of-concept status, and specific remediation steps."
    ),
    tools=[lookup_cve],
    model=model,
)

lead_agent = Agent(
    name="Security Lead",
    instructions=(
        "You coordinate security assessments. Use recon tools for scanning "
        "and consult the CVE Expert for deep vulnerability analysis. "
        "Synthesize findings into a consolidated brief."
    ),
    tools=[
        check_ip_reputation,
        scan_open_ports,
        cve_expert.as_tool(
            tool_name="consult_cve_expert",
            tool_description=(
                "Consult the CVE Expert for deep vulnerability analysis "
                "on a specific CVE ID."
            ),
        ),
    ],
    model=model,
)

This pattern is powerful for complex assessments where a lead agent needs to delegate specialized subtasks without losing control of the overall workflow. The lead decides when to consult the expert, not the other way around.

Guardrails

CAI includes built-in defenses against prompt injection — a critical feature for security tools that process untrusted input. The InputGuardrail system runs heuristic checks before the agent processes any request:

from cai.sdk.agents import (
    InputGuardrail, GuardrailFunctionOutput, RunContextWrapper
)

async def detect_prompt_injection(
    ctx: RunContextWrapper, agent, input_text: str
) -> GuardrailFunctionOutput:
    """Multi-layer guardrail against prompt injection."""
    suspicious_patterns = [
        "ignore previous instructions",
        "ignore all instructions",
        "you are now",
        "disregard your",
        "system prompt override",
        "act as if you have no restrictions",
    ]
    text_lower = input_text.lower()

    # Layer 1: Pattern matching
    for pattern in suspicious_patterns:
        if pattern in text_lower:
            return GuardrailFunctionOutput(
                output_info={
                    "reason": f"Injection detected: '{pattern}'",
                    "layer": "pattern_match"
                },
                tripwire_triggered=True,
            )

    # Layer 2: Structural anomaly detection
    # Check for encoding tricks (base64, unicode, zero-width)
    if has_encoding_artifacts(input_text):
        return GuardrailFunctionOutput(
            output_info={
                "reason": "Encoding artifacts detected in input",
                "layer": "structural"
            },
            tripwire_triggered=True,
        )

    # Layer 3: Context anomaly
    # Is this input wildly different from the expected domain?
    if is_off_domain(input_text, expected_domain="cybersecurity"):
        return GuardrailFunctionOutput(
            output_info={
                "reason": "Input falls outside expected domain",
                "layer": "context"
            },
            tripwire_triggered=True,
        )

    return GuardrailFunctionOutput(
        output_info={"reason": "Input passed all guardrail layers"},
        tripwire_triggered=False,
    )

guarded_agent = Agent(
    name="Guarded Security Agent",
    instructions="You are a cybersecurity assistant with input validation.",
    model=model,
    input_guardrails=[
        InputGuardrail(guardrail_function=detect_prompt_injection),
    ],
)

The research paper on CAI’s guardrail system (arXiv: 2508.21669) describes a four-layer defense architecture — input validation, structural analysis, behavioral monitoring, and output sanitization. The heuristic approach shown here is the first layer; production deployments add LLM-based classifiers and behavioral analysis.
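The guardrail above calls two helpers: has_encoding_artifacts (shown later in this article) and is_off_domain, which is left undefined. A minimal keyword-overlap sketch of the latter might look like this; it is an illustration only, and a production deployment would use an embedding or LLM-based classifier instead:

```python
# Hypothetical sketch of the is_off_domain helper; the keyword set and
# threshold are illustrative, not part of the CAI SDK.
DOMAIN_KEYWORDS = {
    "cybersecurity": {
        "vulnerability", "cve", "exploit", "port", "scan", "firewall",
        "malware", "patch", "credential", "injection", "recon", "cvss",
    },
}

def is_off_domain(text: str, expected_domain: str, min_hits: int = 1) -> bool:
    """Flag input that shares too few keywords with the expected domain."""
    keywords = DOMAIN_KEYWORDS.get(expected_domain, set())
    words = {w.strip(".,:;!?()") for w in text.lower().split()}
    return len(words & keywords) < min_hits
```

With min_hits=1, a request like "write me a poem about summer" trips the guardrail, while "scan port 443 for the CVE" passes.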


Real-World Performance: The Numbers

CAI’s claims are backed by both academic evaluation and production deployment:

| Metric | Result | Source |
|---|---|---|
| CTF speed improvement | Up to 3,600× faster than humans | arXiv:2504.06017 |
| Average speed improvement | 11× faster overall | arXiv:2504.06017 |
| Dragos OT CTF 2025 | Top 10 overall, Rank 1 during hours 7-8 | Case study |
| HackTheBox ranking | Top 30 Spain, Top 500 worldwide (within 1 week) | HTB profile |
| Security testing cost reduction | Average 156× cheaper than manual testing | arXiv:2504.06017 |
| CTF defense patching | 54.3% patching success rate | arXiv:2510.17521 |
| Bug bounty discoveries | CVSS 4.3-7.5 by non-professionals | arXiv:2504.06017 |

The Dragos OT CTF result is particularly significant. Operational Technology (OT) security is a domain where AI agents face unique challenges — proprietary protocols, air-gapped networks, and safety-critical systems. CAI completed 32 of 34 challenges and maintained a 37% velocity advantage over top human teams, demonstrating that AI agents can operate effectively in environments very different from standard IT security.


Building a Complete Security Assessment Pipeline

Let’s move beyond the toy examples and build a practical pipeline that reflects how CAI is actually used in bug bounty and security assessment workflows:

Step 1: Environment Setup

import subprocess, sys, os

subprocess.check_call([
    sys.executable, "-m", "pip", "install", "-q",
    "cai-framework", "python-dotenv"
])

from dotenv import load_dotenv
load_dotenv()

from cai.sdk.agents import (
    Agent, Runner, OpenAIChatCompletionsModel,
    function_tool, handoff, RunContextWrapper,
    FunctionTool, InputGuardrail, GuardrailFunctionOutput,
)
from openai import AsyncOpenAI

MODEL = os.environ.get("CAI_MODEL", "openai/gpt-4o-mini")

def model(model_id=None):
    return OpenAIChatCompletionsModel(
        model=model_id or MODEL,
        openai_client=AsyncOpenAI(),
    )

Step 2: Specialized Security Tools

Production-grade tools should integrate with real security infrastructure, not return hardcoded responses:

import hashlib, json, base64, re

@function_tool
def analyze_http_headers(url: str) -> str:
    """Analyze security headers of a web application.

    Args:
        url: The target URL (e.g., https://example.com).
    """
    import urllib.request
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            headers = dict(resp.headers)
    except Exception as e:
        return f"Error fetching headers: {e}"

    security_headers = {
        "strict-transport-security": "HSTS",
        "content-security-policy": "CSP",
        "x-content-type-options": "X-Content-Type-Options",
        "x-frame-options": "X-Frame-Options",
        "referrer-policy": "Referrer-Policy",
        "permissions-policy": "Permissions-Policy",
    }

    findings = []
    score = 0
    total = len(security_headers)

    for header, name in security_headers.items():
        if header in headers:
            score += 1
            findings.append(f"✅ {name}: {headers[header]}")
        else:
            findings.append(f"❌ {name}: MISSING")

    # Check for dangerous headers
    if "server" in headers:
        findings.append(f"⚠️ Server header exposed: {headers['server']}")

    if "x-powered-by" in headers:
        findings.append(f"⚠️ Technology leaked: {headers['x-powered-by']}")

    if score == total:
        grade = "A"
    elif score >= 4:
        grade = "B"
    elif score >= 3:
        grade = "C"
    else:
        grade = "D"
    return (
        f"Header Security Score: {score}/{total} (Grade: {grade})\n"
        + "\n".join(findings)
    )

@function_tool
def compute_hash(text: str, algorithm: str = "sha256") -> str:
    """Compute a cryptographic hash for integrity verification.

    Args:
        text: The text or string to hash.
        algorithm: Hash algorithm (md5, sha1, sha256, sha512).
    """
    algo = algorithm.lower()
    if algo not in hashlib.algorithms_available:
        return f"Error: unsupported algorithm '{algo}'"
    h = hashlib.new(algo)
    h.update(text.encode())
    return f"{algo}({text!r}) = {h.hexdigest()}"

@function_tool
def decode_data(encoded_string: str, encoding: str = "base64") -> str:
    """Decode encoded data (base64, hex, url).

    Args:
        encoded_string: The encoded string to decode.
        encoding: Encoding type (base64, hex, url).
    """
    try:
        if encoding == "base64":
            decoded = base64.b64decode(encoded_string).decode("utf-8")
        elif encoding == "hex":
            decoded = bytes.fromhex(encoded_string).decode("utf-8")
        elif encoding == "url":
            from urllib.parse import unquote
            decoded = unquote(encoded_string)
        else:
            return f"Unsupported encoding: {encoding}"
        return f"Decoded ({encoding}): {decoded}"
    except Exception as e:
        return f"Decode error: {e}"

Step 3: The Multi-Agent Pipeline

Here’s a practical assessment pipeline that mirrors how security teams actually work:

# Agent 1: Automated Scanner
scanner = Agent(
    name="Automated Scanner",
    instructions=(
        "You are an automated security scanner. Run all available "
        "reconnaissance tools against the target. Document every "
        "finding with severity ratings. Be thorough — missing a "
        "low-severity issue that leads to a critical one is a failure."
    ),
    tools=[analyze_http_headers, check_ip_reputation, scan_open_ports],
    model=model(),
)

# Agent 2: Vulnerability Analyst
analyst = Agent(
    name="Vulnerability Analyst",
    instructions=(
        "You receive scan results from the Automated Scanner. "
        "For each finding: determine exploitability, check for known "
        "CVEs, estimate potential impact, and prioritize remediation. "
        "Use the CVE lookup tool for known vulnerabilities. "
        "Flag anything that could lead to data exfiltration or RCE."
    ),
    tools=[lookup_cve, compute_hash],
    model=model(),
)

# Agent 3: Report Generator
reporter = Agent(
    name="Report Generator",
    instructions=(
        "You receive analysis from the Vulnerability Analyst. "
        "Produce a structured security assessment report:\n"
        "1. Executive Summary (3-5 sentences for leadership)\n"
        "2. Findings (grouped by severity: Critical → Low)\n"
        "3. Each finding includes: description, evidence, "
        "   CVSS score, exploitability, and remediation\n"
        "4. Risk Matrix summary\n"
        "5. Recommended next steps\n"
        "Write in professional security assessment language."
    ),
    model=model(),
)

# Chain: Scanner → Analyst → Reporter
scanner.handoffs = [analyst]
analyst.handoffs = [reporter]

# Run the full pipeline
result = await Runner.run(
    scanner,
    "Perform a full security assessment on target 203.0.113.42. "
    "Check IP reputation, scan ports, analyze HTTP headers, "
    "and look up CVE-2024-3094 and CVE-2021-44228.",
    max_turns=25,
)
print(result.final_output)

This three-agent pipeline — scanner, analyst, reporter — reflects a real security assessment workflow. Each agent has a clear role, specific tools, and defined handoff criteria. The max_turns=25 limit prevents runaway agent loops, which is a practical safeguard in production.


Guardrails in Production: Beyond Pattern Matching

The MarkTechPost tutorial shows a basic keyword-matching guardrail. In production, CAI’s research describes a four-layer defense system:

Layer 1: Input Validation

Heuristic pattern matching (as shown above) — fast, deterministic, catches the most common injection patterns. This runs on every input before it reaches the model.
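Exact substring checks miss trivial variants like extra whitespace or mixed case; normalizing the input before matching closes some of that gap (a small illustrative extension, not CAI's implementation):

```python
import re

SUSPICIOUS_PATTERNS = [
    "ignore previous instructions",
    "ignore all instructions",
    "system prompt override",
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace runs so spacing tricks can't evade matching."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def matches_injection(text: str) -> bool:
    normalized = normalize(text)
    return any(pattern in normalized for pattern in SUSPICIOUS_PATTERNS)

print(matches_injection("Ignore   PREVIOUS\n\tinstructions, please"))  # True
print(matches_injection("Please scan 10.0.0.5 for open ports"))        # False
```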

Layer 2: Structural Analysis

Check the shape of the input, not just its content:

import re, unicodedata

def has_encoding_artifacts(text: str) -> bool:
    """Detect encoding tricks used to evade input filters."""
    # Zero-width characters
    zero_width = sum(
        1 for c in text
        if c in ('\u200b', '\u200c', '\u200d', '\u200e', '\u200f', '\ufeff')
    )
    if zero_width > 3:
        return True

    # Unicode homoglyphs (characters that look like ASCII but aren't)
    suspicious = 0
    for c in text:
        if ord(c) > 127:
            normalized = unicodedata.normalize('NFKD', c)
            if normalized != c:
                suspicious += 1
    if suspicious > len(text) * 0.1:
        return True

    # Base64-like strings longer than 50 chars
    b64_pattern = re.findall(r'[A-Za-z0-9+/]{50,}={0,2}', text)
    if b64_pattern:
        return True

    return False

Layer 3: Behavioral Monitoring

Track what the agent is doing across turns. An attacker might use individually innocent requests that cumulatively push the agent toward a compromised state:

class SessionMonitor:
    """Track agent behavior across a session for anomaly detection."""

    def __init__(self):
        self.turns = []
        self.tools_called = []
        self.risk_score = 0.0

    def record_turn(self, user_input: str, agent_response: str, tools: list):
        self.turns.append({
            "input": user_input,
            "response": agent_response,
            "tools": tools,
        })
        self.tools_called.extend(tools)
        self._update_risk_score()

    def _update_risk_score(self):
        """Recalculate session risk based on behavioral patterns."""
        score = 0.0

        # Escalation pattern: increasing tool calls per turn
        if len(self.turns) >= 3:
            recent_counts = [len(t["tools"]) for t in self.turns[-3:]]
            if recent_counts == sorted(recent_counts) and recent_counts[-1] > 3:
                score += 0.3  # Escalating tool usage

        # Scope creep: requesting access to new resource types
        resource_types = set()
        for tool in self.tools_called:
            if "database" in tool.lower():
                resource_types.add("database")
            if "file" in tool.lower():
                resource_types.add("filesystem")
            if "network" in tool.lower():
                resource_types.add("network")
        if len(resource_types) > 2:
            score += 0.2  # Expanding attack surface

        # Repetitive probing: same tool called on different targets
        if len(self.tools_called) > 10:
            from collections import Counter
            counts = Counter(self.tools_called)
            most_common = counts.most_common(1)[0]
            if most_common[1] > len(self.tools_called) * 0.5:
                score += 0.2  # Potential enumeration attack

        self.risk_score = min(score, 1.0)

    def should_block(self) -> bool:
        """Block the session if risk exceeds threshold."""
        return self.risk_score >= 0.7

Layer 4: Output Sanitization

Even if input validation and behavioral monitoring both fail, the final layer checks the agent’s output before it reaches the user or triggers any downstream action:

def sanitize_agent_output(output: str) -> str:
    """Remove or redact sensitive information from agent output."""
    sensitive_patterns = [
        (r'-----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----[\s\S]*?-----END (RSA |EC |DSA )?PRIVATE KEY-----', '[PRIVATE KEY REDACTED]'),
        (r'\b(?:sk|pk|api|token|secret|password)[\w_-]*\s*[=:]\s*\S+', '[CREDENTIAL REDACTED]'),
        (r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', lambda m: redact_ip(m.group())),
    ]

    result = output
    for pattern, replacement in sensitive_patterns:
        result = re.sub(pattern, replacement, result, flags=re.IGNORECASE)

    return result

def redact_ip(ip: str) -> str:
    """Redact IP address but preserve the subnet for context."""
    octets = ip.split('.')
    if len(octets) == 4:
        return f"{octets[0]}.{octets[1]}.***.***"
    return "[IP REDACTED]"

Multi-Turn Context and Streaming

Security assessments rarely complete in a single exchange. CAI supports multi-turn conversations where the agent retains context from previous turns:

advisor = Agent(
    name="Security Advisor",
    instructions=(
        "You are a senior security advisor. Reference prior context "
        "in your responses. When asked about remediation, provide "
        "specific, actionable steps with code examples where appropriate."
    ),
    model=model(),
)

# Turn 1
msgs = [{"role": "user", "content": "We found open Redis (6379) on production."}]
r1 = await Runner.run(advisor, msgs)
# Agent explains the risk: unauthorized access, data exfiltration, RCE

# Turn 2 — agent has full context from turn 1
msgs2 = r1.to_input_list() + [
    {"role": "user", "content": "How do we secure it without downtime?"}
]
r2 = await Runner.run(advisor, msgs2)
# Agent provides specific Redis AUTH configuration and firewall rules

# Turn 3
msgs3 = r2.to_input_list() + [
    {"role": "user", "content": "Give me the one-line Redis config to enable auth."}
]
r3 = await Runner.run(advisor, msgs3)

The to_input_list() method serializes the full conversation history, including tool calls and results, so the agent maintains context across turns. This is essential for iterative security workflows where findings build on each other.

For interactive applications, CAI also supports streaming output:

streaming_agent = Agent(
    name="Streaming Security Agent",
    instructions="You are a cybersecurity educator.",
    model=model(),
)

stream = Runner.run_streamed(
    streaming_agent,
    "Explain the CIA triad with real-world examples for each principle."
)

async for event in stream.stream_events():
    if event.type == "raw_response_event":
        if hasattr(event.data, "delta") and isinstance(event.data.delta, str):
            print(event.data.delta, end="", flush=True)

CTF Challenge Automation

One of CAI’s most impressive capabilities is automating CTF challenge pipelines. Here’s how a three-agent chain solves a crypto challenge end-to-end:

@function_tool
def read_challenge(name: str) -> str:
    """Read a CTF challenge description and hints."""
    challenges = {
        "crypto_101": {
            "description": "Decode this Base64 string to find the flag: "
                          "Q0FJe2gzMTEwX3cwcjFkfQ==",
            "hint": "Standard Base64 decoding. Flag format: CAI{...}",
            "category": "Cryptography",
            "difficulty": "Beginner",
        }
    }
    ch = challenges.get(name.lower())
    return json.dumps(ch, indent=2) if ch else f"Challenge '{name}' not found."

@function_tool
def submit_flag(flag: str) -> str:
    """Submit a CTF flag for validation."""
    expected = "CAI{h3110_w0r1d}"
    if flag.strip() == expected:
        return f"🏆 CORRECT! Flag accepted: {expected}"
    # Don't reveal the correct flag — that's a security antipattern
    return "❌ Incorrect. Try again."

# Three-agent CTF pipeline
ctf_recon = Agent(
    name="CTF Recon",
    instructions="Read the challenge. Identify the attack vector. Hand off to Exploit.",
    tools=[read_challenge],
    model=model(),
)

ctf_exploit = Agent(
    name="CTF Exploit",
    instructions="Decode the data to extract the flag. Hand off to Validator.",
    tools=[decode_data],
    model=model(),
)

ctf_validator = Agent(
    name="Flag Validator",
    instructions="Submit the candidate flag. Report result.",
    tools=[submit_flag],
    model=model(),
)

ctf_recon.handoffs = [ctf_exploit]
ctf_exploit.handoffs = [ctf_validator]

result = await Runner.run(
    ctf_recon,
    "Solve the 'crypto_101' challenge. Read it, decode the flag, submit it.",
    max_turns=15,
)
print(result.final_output)
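As a sanity check, the challenge payload really does decode to the expected flag without any agent involved:

```python
import base64

# Payload from the crypto_101 challenge above
payload = "Q0FJe2gzMTEwX3cwcjFkfQ=="
flag = base64.b64decode(payload).decode("utf-8")
print(flag)  # CAI{h3110_w0r1d}
```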

In the Dragos OT CTF, CAI used this pattern (at much greater complexity) to solve 32 of 34 challenges — including reverse engineering, forensics, and OT-specific problems involving industrial protocols like Modbus and DNP3.


CAI vs. General-Purpose Agent Frameworks

| Feature | CAI | LangChain | CrewAI | AutoGen |
|---|---|---|---|---|
| Domain focus | Cybersecurity-specific | General-purpose | General-purpose | General-purpose |
| Built-in security tools | Yes (recon, exploit, privesc) | No | No | No |
| Prompt injection guardrails | Four-layer system | Basic | None | None |
| CTF proven | Dragos OT CTF, HTB | No | No | No |
| Human-in-the-loop | Built-in HITL | Manual implementation | Manual implementation | Manual implementation |
| Research-backed | 24+ peer-reviewed papers | Community-driven | Community-driven | Microsoft Research |
| Model support | 300+ models | Multi-provider | Multi-provider | Multi-provider |
| Bug bounty ready | Yes (HackerOne validated) | No | No | No |

The key differentiator isn’t the agent orchestration — any framework can chain agents together. It’s the domain-specific tooling and guardrails. When you’re building a security agent, having pre-built tools for CVE lookups, port scanning, and header analysis — plus guardrails specifically designed to catch security-relevant prompt injection — saves enormous development time and reduces the attack surface of the agent itself.


Production Deployment Considerations

Before deploying CAI agents in production, consider these practical concerns:

API Key Security

Never hardcode API keys. Use environment variables with python-dotenv or a secrets manager. CAI’s setup expects OPENAI_API_KEY in the environment, but supports any OpenAI-compatible endpoint (including local models via Ollama):

# .env file
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://api.openai.com/v1  # Or your proxy/gateway
CAI_MODEL=openai/gpt-4o-mini

# For local models (no API key sent externally)
# OPENAI_BASE_URL=http://localhost:11434/v1
# CAI_MODEL=ollama/llama3

Sandboxing Tool Execution

CAI agents with access to shell commands or network tools must run in sandboxed environments. The framework itself doesn’t enforce sandboxing — that’s the operator’s responsibility. Use Docker containers with restricted network access, seccomp profiles, and resource limits.
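A starting point with Docker (the flags are illustrative, and the seccomp profile, network, and image names are placeholders; tune the limits to your tooling):

```shell
# Run the agent in a locked-down container: no capabilities, read-only
# root filesystem, isolated network, bounded CPU/memory/process count.
docker run --rm \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=cai-seccomp.json \
  --read-only --tmpfs /tmp \
  --network cai-assessment-net \
  --memory 2g --cpus 2 --pids-limit 256 \
  -e OPENAI_API_KEY \
  cai-agent:latest
```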

Logging and Audit Trail

Every agent action should be logged for accountability. CAI supports tracing via Phoenix (CAI_TRACING=true), but for security assessments, you’ll want immutable audit logs:

import hashlib, datetime

class SecurityAuditLog:
    """Immutable audit log for agent actions."""

    def __init__(self, log_path="agent_audit.jsonl"):
        self.log_path = log_path

    def log_action(self, agent_name: str, action: str, target: str,
                   result: str, risk_level: str):
        entry = {
            "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
            "agent": agent_name,
            "action": action,
            "target": target,
            "result_hash": hashlib.sha256(result.encode()).hexdigest(),
            "risk_level": risk_level,
            "entry_hash": "",  # Computed after to make entry tamper-evident
        }
        entry["entry_hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()

        with open(self.log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
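The per-entry hash makes tampering detectable after the fact. A matching verifier (a hypothetical companion to the class above) recomputes each entry's hash the same way it was created:

```python
import hashlib, json

def verify_entry(entry: dict) -> bool:
    """Recompute an audit entry's hash and compare it to the stored value."""
    stored = entry["entry_hash"]
    # The original hash was computed with entry_hash set to ""
    candidate = dict(entry, entry_hash="")
    recomputed = hashlib.sha256(
        json.dumps(candidate, sort_keys=True).encode()
    ).hexdigest()
    return stored == recomputed

# Build an entry the same way SecurityAuditLog.log_action does
entry = {
    "timestamp": "2025-11-01T12:00:00Z",
    "agent": "Automated Scanner",
    "action": "port_scan",
    "target": "203.0.113.42",
    "result_hash": hashlib.sha256(b"scan output").hexdigest(),
    "risk_level": "medium",
    "entry_hash": "",
}
entry["entry_hash"] = hashlib.sha256(
    json.dumps(entry, sort_keys=True).encode()
).hexdigest()

print(verify_entry(entry))        # True
entry["target"] = "203.0.113.99"  # simulate tampering
print(verify_entry(entry))        # False
```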

Cost Control

Multi-agent pipelines with tool calls can consume tokens rapidly. Set explicit limits:

# Limit per-run token usage
result = await Runner.run(
    agent,
    task,
    max_turns=10,  # Maximum agent-tool interaction cycles
)

# Monitor cumulative usage
print(f"Turns used: {len(result.raw_responses)}")
print(f"Tokens consumed: {result.usage.total_tokens}")
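max_turns caps interaction cycles, but not spend per cycle. If your gateway reports per-run token counts, a cumulative budget guard is easy to add between runs (a hypothetical helper, not part of the CAI SDK):

```python
class TokenBudget:
    """Abort a multi-run pipeline once cumulative token usage exceeds a cap."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, tokens: int) -> None:
        """Add one run's token count; raise if the budget is blown."""
        self.used += tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: {self.used}/{self.max_tokens}"
            )

budget = TokenBudget(max_tokens=50_000)
budget.record(12_000)  # e.g. result.usage.total_tokens from the scanner run
budget.record(30_000)  # analyst run - 42,000 used, still under budget
# A further budget.record(10_000) would raise and halt the pipeline.
```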

Authenticity Verification: What We Checked

Any article about a security tool should verify its claims independently. Here’s what we confirmed:

  • arXiv paper (2504.06017): Verified. Authored by Víctor Mayoral-Vilches et al., published April 2025, revised April 9. The 3,600× speedup and 11× average claims are in the abstract.
  • GitHub repository: Verified at github.com/aliasrobotics/cai. Active development, real commits, real issues.
  • Dragos OT CTF 2025: Verified via Alias Robotics case study. Top-10 finish, Rank 1 during hours 7-8, 32/34 challenges solved.
  • HackerOne collaboration: Verified via case study. CAI’s Retester agent inspired HackerOne’s production deduplication agent.
  • HackTheBox ranking: Referenced in the arXiv paper and HTB profile (rank 2268644).
  • CAI Fluency paper (2508.13588): Verified. Published August 2025.
  • Guardrails paper (2508.21669): Verified. Four-layer defense architecture.
  • CAIBench (2510.24317): Verified. Meta-benchmark for evaluating cybersecurity AI agents.

What we couldn’t independently verify: The exact 156× cost reduction figure and specific HTB ranking numbers (platform rankings change frequently). These come from the paper’s self-reported data. The Dragos CTF results and HackerOne collaboration are independently documented by third parties.


Limitations and Honest Critique

CAI is impressive, but no framework is perfect. Here are the gaps we identified:

Dependency on LLM quality. CAI’s effectiveness is fundamentally bounded by the underlying model. A weak model produces weak security analysis. The framework adds structure and tools, but it can’t make a model smarter than it is. The paper’s benchmarks used specific model configurations — your mileage with a cheaper or older model will vary.

Sandboxing is your problem. CAI doesn’t enforce execution sandboxing. If you give an agent a tool that runs shell commands, it will run shell commands — including ones that an attacker might trick it into running via prompt injection. The guardrails help, but defense-in-depth requires infrastructure-level sandboxing that the framework doesn’t provide.

Tool quality determines output quality. The tutorial’s scan_open_ports uses random.seed() to simulate port scans. In production, you need real tools (nmap, Nuclei, custom scanners) connected to real infrastructure. The framework doesn’t magically create good tools — it orchestrates the ones you build.
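For reference, a production-grade port scan tool usually just shells out to real infrastructure. A hedged sketch wrapping nmap (assumes nmap is on PATH; the tool and helper names are illustrative):

```python
import ipaddress
import subprocess

def scan_open_ports_real(target: str, top_ports: int = 100) -> str:
    """Run an actual nmap TCP scan instead of returning simulated results."""
    try:
        ipaddress.ip_address(target)  # accept only a bare IPv4/IPv6 address
    except ValueError:
        return f"Refusing to scan non-IP target: {target!r}"
    proc = subprocess.run(
        ["nmap", "-Pn", "--top-ports", str(top_ports), target],
        capture_output=True, text=True, timeout=300,
    )
    return proc.stdout or proc.stderr
```

Because the argument list is passed without a shell, a malicious target string cannot inject extra commands, and the IP check rejects it before nmap ever runs.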

Ethical boundaries are soft. CAI includes ethical guidelines in its documentation and disclaimer, but nothing technically prevents misuse. The framework is open-source and free for research. For commercial use, Alias Robotics requires a license — but the code is already out there. This is a tension inherent to all dual-use security tools.

Vendor lock-in risk. While CAI supports 300+ models through OpenAI-compatible APIs, the agent tool system and handoff protocol are CAI-specific. Migrating to another framework means rewriting your tool definitions and agent orchestration logic.


Key Takeaways

  1. CAI is legitimate and battle-tested. Peer-reviewed research, real CTF performance, and production deployment at HackerOne validate the framework’s capabilities. This isn’t vaporware.
  2. The architecture is sound. Agents + Tools + Handoffs + Guardrails is the right abstraction for security automation. The framework handles orchestration so you can focus on domain expertise.
  3. Guardrails are multi-layered, not just pattern matching. The four-layer defense system (input validation, structural analysis, behavioral monitoring, output sanitization) reflects real security engineering principles, not just keyword filters.
  4. Multi-agent handoffs enable realistic workflows. The scanner → analyst → reporter pattern mirrors how actual security teams operate. Agent-as-tool orchestration adds hierarchical delegation for complex assessments.
  5. Production deployment requires infrastructure. CAI handles agent logic, but you’re responsible for sandboxing, audit logging, cost control, and API key management. Plan for these from the start.
  6. The framework democratizes security testing. Non-professionals discovering CVSS 4.3-7.5 vulnerabilities is both the promise and the risk. Use it responsibly.

Getting Started

# Install CAI
pip install cai-framework python-dotenv

# Set your API key
export OPENAI_API_KEY=your-key-here

# Or use a local model with Ollama
export OPENAI_BASE_URL=http://localhost:11434/v1
export CAI_MODEL=ollama/llama3

# Run the interactive CLI
cai

Resources: