AIRTA Red Team
Open source and free to use. A red team tool for LLM and LLM-driven application security — run it behind the firewall on internal staging, or use it for authorized black-box assessments from the outside. Generate adversarial suites from security playbooks, execute them against live targets via browser UI or HTTP API, and assess whether each attack was exploited or mitigated.
Authorized assessments only. Test systems you have explicit permission to assess.
What is AIRTA Red Team?
Open source and free: AIRTA Red Team is published on GitHub with no license fee. Clone the repo, run python start.py, and start testing — on your laptop, in CI, or behind the firewall. You bring your own LLM API key for generation and assessment; the tool itself costs nothing.
Playbook-driven adversarial testing: Generate attack suites from security playbooks - OWASP LLM, OWASP Agent, MITRE ATLAS, jailbreak techniques, system prompt exfiltration, and multimodal file-upload vectors - tailored to your target domain.
Flexible deployment: Run AIRTA Red Team on your internal network against staging or pre-production targets, or point it at external endpoints for black-box pentest-style assessments. Same pipeline, same playbooks — you choose the access model.
Live target execution: Run attacks against registered targets through a browser-based Playwright runner or direct HTTP API transports. Discovery handles authentication, file uploads, and component mapping automatically.
Expert assessment: Playbook-specific expert models plus a judge model determine whether each attack was exploited or mitigated, producing structured evidence in attack_log.json and pipeline_report.json.

The Pipeline
Generate
Create adversarial test suites from security playbooks using zero-shot, jailbreak, multimodal, and other strategies.
Discover
Connect your target, map UI selectors or API transports, and configure authentication.
Run
Execute attacks against live targets via browser automation or HTTP API - prompts, uploads, and tool calls included.
Assess
Playbook experts and a judge model score each result as exploited or mitigated, with severity per category.
Export
Push pipeline_report.json to AIRTA Systems for review, regression tracking, and compliance workflows.
Who This Is For
Red Teams
Map the target, discover and authenticate, generate attacks, run the suite, assess results, and export findings - a complete offensive pipeline for LLM and agentic applications.
Whitehats & Pentesters
Run the same pipeline on customer staging environments. Every prompt, response, and assessment is captured in structured JSON artifacts ready for client delivery.
AppSec & MLsec
Regression runs per release. Compare category_rollup across builds to catch regressions in jailbreak resistance, prompt injection defences, and multimodal upload handling.
Security Playbooks
Tests are organized by playbook and category, with optional vector_type and payload fields for multimodal delivery.
| Playbook | Focus |
|---|---|
| OWASP LLM | LLM01–LLM10 prompt injection, data leakage, excessive agency |
| OWASP Agent | ASI01–ASI10 agentic application security |
| MITRE ATLAS | ML kill-chain tactics and adversarial ML techniques |
| Jailbreak Core | DAN, encoding, injection, crescendo, and persona attacks |
| System Prompt Exfil | Direct, audit framing, format coercion, indirect file exfil |
| Custom Playbooks | LLM powered playbooks for your specific use case |
Attack Strategies
zero_shot - Single-message attacks for baseline detection floor.
multi_shot - Multi-turn pressure to wear down defences.
jailbreak - Jailbreak-focused techniques from the jailbreak core playbook.
multimodal - File-upload tests with vector_type and payload generators (PDF hidden text, polyglot files, OCR injection). Works with any security playbook.
few_shot, iterative, chain_of_thought - Additional adversarial shaping strategies for deeper coverage.
What AIRTA Red Team Delivers
- Free and open source — Full pipeline, web UI, playbooks, and CLI — no subscription required. Fork, extend, and run on your own infrastructure.
- Structured attack suites - JSON test files with playbook, categories, prompts, and optional multimodal payloads - editable in the web UI.
- Full attack evidence - Prompts, uploads, responses, and tool calls captured in
attack_log.jsonfor every run. - Expert + judge assessment - Playbook-specific experts score each attack; a judge model produces
pipeline_report.jsonwithcategory_rollupand severity per prompt. - Browser and API execution - Playwright browser-bot for UI targets;
api_documentandapi_multiparttransports for headless API testing. - Domain-grounded attacks - Optional company and component playbooks tailor prompts to your target's business context.
- AIRTA Systems export - Push assessment results into your compliance programme via the bulk import API for review and regression tracking.
Behind the Firewall or Black Box
AIRTA Red Team is not limited to a single testing posture. Internal red teams can run it behind the firewall against dev and staging environments with full network access. External pentesters and whitehats can use the same tool for authorized black-box runs against production or customer-facing endpoints — with no assumption of internal access.
In both cases, assessment focuses on observable runtime behaviour — prompts, uploads, responses, and tool calls — not source-code SAST. That complements static analysis by validating whether deployed defences hold under adversarial pressure.
Run locally in Docker with python start.py for the web UI, or drive the full pipeline from the CLI with main.py. The project is open source on GitHub and free to use. Pair with DVAIA for hands-on LLM vulnerability labs, or connect your own staging targets.
Get started free on GitHub, or request a free no-commitment discovery call.
Part of the End-to-End Framework
AIRTA Red Team is open source and free to use for offensive security testing of LLMs and agentic applications. Pair it with AIRTA (AI Risk Testing Agent) for compliance-aligned pre-deployment testing, and AILP for continuous post-market monitoring. Export red team findings to the AIRTA Systems dashboard for a complete security picture.