AI Security Tools and Frameworks
Offensive Testing Tools
Garak (NVIDIA)
Garak is an open-source LLM vulnerability scanner maintained by NVIDIA. It probes language models for known vulnerabilities including prompt injection, data leakage, hallucination, and toxic output generation. Garak supports multiple LLM backends (OpenAI, Hugging Face, local models) and produces structured reports of discovered vulnerabilities.
Best for: Automated red teaming of LLM applications. Running systematic vulnerability assessments across multiple attack categories. Integrating into CI/CD pipelines for pre-deployment security checks.
Limitations: Automated scans miss creative or context-specific attacks. Garak finds known vulnerability patterns but does not generate novel attacks. Human red teaming remains essential alongside automated tools.
Microsoft Counterfit
Counterfit is a command-line tool for assessing the security of ML models. It supports multiple attack frameworks (ART, TextAttack) and can target models through APIs or local inference. Originally built for Microsoft's internal AI red team, it was open-sourced to support broader adoption.
Best for: Adversarial robustness testing of classification models. Evaluating model vulnerability to evasion attacks across different attack algorithms. Comparing model robustness across architectures.
Adversarial Robustness Toolbox (ART) by IBM
ART is a comprehensive Python library for ML security. It implements attacks (evasion, poisoning, extraction, inference) and defenses (adversarial training, input preprocessing, certified defenses) across multiple ML frameworks. ART is the most mature adversarial ML library available.
Best for: Research and experimentation with adversarial attacks and defenses. Building custom adversarial testing pipelines. Evaluating model robustness with standardized attack implementations.
TextAttack
TextAttack specializes in adversarial attacks on NLP models. It generates adversarial text examples through synonym substitution, character perturbation, and semantic-preserving transformations. Useful for testing text classifiers, sentiment analysis models, and content moderation systems.
Defensive Tools
NVIDIA NeMo Guardrails
NeMo Guardrails is an open-source toolkit for adding programmable safety and security controls to LLM applications. It defines guardrails as rules that control input validation, output filtering, topic restrictions, and fact-checking. Guardrails are defined in a domain-specific language (Colang) that is accessible to both engineers and non-technical stakeholders.
Best for: Adding structured safety controls to LLM applications. Implementing topic restrictions (keeping the model on-topic). Building fact-checking and hallucination detection into LLM workflows. Companies building production LLM applications that need auditable safety controls.
LLM Guard (Protect AI)
LLM Guard provides input and output scanning for LLM applications. It detects prompt injection attempts, PII in inputs and outputs, toxic content, and other security-relevant patterns. It can be deployed as a middleware layer between the user and the LLM.
Best for: Adding a security layer to existing LLM applications without rewriting the application. PII detection and redaction in LLM inputs and outputs. Quick deployment of baseline security controls.
Rebuff
Rebuff is a self-hardening prompt injection detection framework. It uses multiple detection methods (heuristic analysis, LLM-based classification, vectorDB for known attacks) and improves over time as it encounters new injection attempts. The self-hardening approach means the system gets stronger with each detected attack.
Best for: Prompt injection detection that improves over time. Teams that want a detection system that learns from their specific attack patterns. Applications where prompt injection is the primary security concern.
Prompt Armor
Prompt Armor provides API-based prompt injection detection for LLM applications. It scans user inputs for injection patterns before they reach the model, providing a detection score and classification. Commercial product with a free tier for evaluation.
Governance Frameworks
NIST AI Risk Management Framework (AI RMF)
The NIST AI RMF is the foundational governance framework for AI risk management in the United States. It organizes AI risk management into four core functions:
- Govern: Establishing organizational policies, roles, and accountability for AI risk management
- Map: Identifying and categorizing AI risks for specific systems and contexts
- Measure: Quantifying and assessing identified risks through testing and evaluation
- Manage: Prioritizing and acting on risk findings through controls and monitoring
AI security engineers use the AI RMF to structure security programs, communicate with leadership about AI risk posture, and satisfy federal compliance requirements. The companion Playbook provides specific actions and measures for implementing each function.
MITRE ATLAS
MITRE ATLAS (Adversarial Threat Landscape for Artificial Intelligence Systems) is a knowledge base of adversarial tactics and techniques specific to AI systems. Modeled after MITRE ATT&CK, it provides a structured taxonomy for AI threats including reconnaissance, resource development, initial access, ML model access, and exfiltration specific to ML systems.
Best for: Threat modeling AI systems using a standardized framework. Red team exercise planning. Communicating AI-specific threats using language that security organizations already understand from ATT&CK.
OWASP Top 10 for LLM Applications
Covered in our dedicated OWASP LLM guide, this framework identifies the ten most critical risks for LLM applications. It is the most widely referenced checklist for LLM security assessments.
Google SAIF (Secure AI Framework)
Google's Secure AI Framework provides a conceptual model for securing AI systems. It identifies six core elements: expanding security foundations to AI, extending detection and response to AI, automating defenses, harmonizing platform-level controls, adapting controls for AI, and contextualizing AI system risks. SAIF is less prescriptive than NIST AI RMF but provides useful strategic guidance.
Open-Source vs Commercial
The AI security tooling market is still forming. Open-source tools are more mature and more widely adopted than commercial alternatives for most use cases. This is partly because the field is young enough that commercial products have not had time to establish market dominance, and partly because the security community has a strong open-source culture.
Commercial tools (Lakera, HiddenLayer, Protect AI, Robust Intelligence) are emerging and offer enterprise features like managed deployment, SLAs, compliance reporting, and integration support. For large enterprises that need vendor support and compliance documentation, commercial tools are becoming viable. For smaller teams and individual practitioners, open-source tools provide equivalent or superior technical capabilities.
AI security engineers are expected to be proficient with open-source tools and aware of the commercial landscape. Building custom tooling to fill gaps that neither open-source nor commercial tools address is a core part of the job.
Get the AISec Brief
Weekly career intelligence for AI Security Engineers. Salary trends, who's hiring, threat landscape shifts, and certification updates. Free.