AIRTA Red Team

Open source and free to use. A red team tool for LLM and LLM-driven application security — run it behind the firewall on internal staging, or use it for authorized black-box assessments from the outside. Generate adversarial suites from security playbooks, execute them against live targets via browser UI or HTTP API, and assess whether each attack was exploited or mitigated.

Authorized assessments only. Test systems you have explicit permission to assess.

What is AIRTA Red Team?

Open source and free: AIRTA Red Team is published on GitHub with no license fee. Clone the repo, run python start.py, and start testing — on your laptop, in CI, or behind the firewall. You bring your own LLM API key for generation and assessment; the tool itself costs nothing.

Playbook-driven adversarial testing: Generate attack suites from security playbooks - OWASP LLM, OWASP Agent, MITRE ATLAS, jailbreak techniques, system prompt exfiltration, and multimodal file-upload vectors - tailored to your target domain.

Flexible deployment: Run AIRTA Red Team on your internal network against staging or pre-production targets, or point it at external endpoints for black-box pentest-style assessments. Same pipeline, same playbooks — you choose the access model.

Live target execution: Run attacks against registered targets through a browser-based Playwright runner or direct HTTP API transports. Discovery handles authentication, file uploads, and component mapping automatically.

Expert assessment: Playbook-specific expert models plus a judge model determine whether each attack was exploited or mitigated, producing structured evidence in attack_log.json and pipeline_report.json.

AIRTA Red Team dashboard - test management with jailbreak and prompt injection suites

The Pipeline

1
Generate

Create adversarial test suites from security playbooks using zero-shot, jailbreak, multimodal, and other strategies.

2
Discover

Connect your target, map UI selectors or API transports, and configure authentication.

3
Run

Execute attacks against live targets via browser automation or HTTP API - prompts, uploads, and tool calls included.

4
Assess

Playbook experts and a judge model score each result as exploited or mitigated, with severity per category.

5
Export

Push pipeline_report.json to AIRTA Systems for review, regression tracking, and compliance workflows.

Who This Is For

Red Teams

Map the target, discover and authenticate, generate attacks, run the suite, assess results, and export findings - a complete offensive pipeline for LLM and agentic applications.

Whitehats & Pentesters

Run the same pipeline on customer staging environments. Every prompt, response, and assessment is captured in structured JSON artifacts ready for client delivery.

AppSec & MLsec

Regression runs per release. Compare category_rollup across builds to catch regressions in jailbreak resistance, prompt injection defences, and multimodal upload handling.

Security Playbooks

Tests are organized by playbook and category, with optional vector_type and payload fields for multimodal delivery.

PlaybookFocus
OWASP LLMLLM01–LLM10 prompt injection, data leakage, excessive agency
OWASP AgentASI01–ASI10 agentic application security
MITRE ATLASML kill-chain tactics and adversarial ML techniques
Jailbreak CoreDAN, encoding, injection, crescendo, and persona attacks
System Prompt ExfilDirect, audit framing, format coercion, indirect file exfil
Custom PlaybooksLLM powered playbooks for your specific use case

Attack Strategies

zero_shot - Single-message attacks for baseline detection floor.

multi_shot - Multi-turn pressure to wear down defences.

jailbreak - Jailbreak-focused techniques from the jailbreak core playbook.

multimodal - File-upload tests with vector_type and payload generators (PDF hidden text, polyglot files, OCR injection). Works with any security playbook.

few_shot, iterative, chain_of_thought - Additional adversarial shaping strategies for deeper coverage.

What AIRTA Red Team Delivers

  • Free and open source — Full pipeline, web UI, playbooks, and CLI — no subscription required. Fork, extend, and run on your own infrastructure.
  • Structured attack suites - JSON test files with playbook, categories, prompts, and optional multimodal payloads - editable in the web UI.
  • Full attack evidence - Prompts, uploads, responses, and tool calls captured in attack_log.json for every run.
  • Expert + judge assessment - Playbook-specific experts score each attack; a judge model produces pipeline_report.json with category_rollup and severity per prompt.
  • Browser and API execution - Playwright browser-bot for UI targets; api_document and api_multipart transports for headless API testing.
  • Domain-grounded attacks - Optional company and component playbooks tailor prompts to your target's business context.
  • AIRTA Systems export - Push assessment results into your compliance programme via the bulk import API for review and regression tracking.

Behind the Firewall or Black Box

AIRTA Red Team is not limited to a single testing posture. Internal red teams can run it behind the firewall against dev and staging environments with full network access. External pentesters and whitehats can use the same tool for authorized black-box runs against production or customer-facing endpoints — with no assumption of internal access.

In both cases, assessment focuses on observable runtime behaviour — prompts, uploads, responses, and tool calls — not source-code SAST. That complements static analysis by validating whether deployed defences hold under adversarial pressure.

Run locally in Docker with python start.py for the web UI, or drive the full pipeline from the CLI with main.py. The project is open source on GitHub and free to use. Pair with DVAIA for hands-on LLM vulnerability labs, or connect your own staging targets.


Get started free on GitHub, or request a free no-commitment discovery call.


Part of the End-to-End Framework

AIRTA Red Team is open source and free to use for offensive security testing of LLMs and agentic applications. Pair it with AIRTA (AI Risk Testing Agent) for compliance-aligned pre-deployment testing, and AILP for continuous post-market monitoring. Export red team findings to the AIRTA Systems dashboard for a complete security picture.