LLM red-teaming. Agentic blast-radius. Regulator-ready.
AI systems introduce attack surfaces that standard application security controls do not address. Prompt injection, model theft, indirect instruction hijacking via documents and tools, and agentic privilege escalation require purpose-built assessment methodologies. We assess, harden, and govern AI systems from the model layer through the application interface — and against regulatory obligations that are now enforceable.
OWASP LLM Top 10MITRE ATLASNIST AI RMFEU AI ActISO/IEC 42001
ATTACK SURFACE
Where AI breaks differently from web.
LLM Application Security
- Direct prompt injection — user-supplied input overriding system instructions, role confusion, and safety bypass techniques
- Indirect prompt injection — adversarial instructions embedded in documents, web content, emails, or tool outputs retrieved and processed by the model without user awareness
- Data exfiltration via output — extraction of system prompts, RAG context, other users' data, and confidential configuration through crafted queries
- Jailbreaks & safety bypass — multi-turn manipulation, roleplay exploitation, context window stuffing, and encoding evasion
- Function / tool call abuse — injection of malicious tool arguments, tool selection manipulation, and chained tool misuse
- Insecure output handling — LLM-generated content rendered in unsafe contexts (XSS, SQLi, path traversal downstream)
Agentic AI Security
- Agent privilege escalation — agents acquiring capabilities or access beyond their intended scope through instruction manipulation or tool chaining
- Multi-agent trust boundaries — in orchestrator/sub-agent architectures, assessment of trust assumptions between agents and the impact of one compromised agent on others
- Persistent memory poisoning — injecting adversarial content into agent memory stores that influences future decisions or actions
- Tool authorisation controls — whether agents can be induced to invoke tools with unintended arguments or outside authorised scope
- Human-in-the-loop bypass — techniques to circumvent approval workflows and safety checkpoints in agentic pipelines
- Blast radius assessment — mapping the real-world impact of a fully compromised agent given its tool access and permissions
RAG & Data Pipeline Security
- Vector database security — access controls, tenant isolation, and the risk of cross-tenant data retrieval in multi-tenant RAG deployments
- Document ingestion security — PII/PHI detection in preprocessing, metadata leakage, and malicious document content that survives into the index
- Retrieval manipulation — adversarial documents crafted to be preferentially retrieved and override legitimate context
- Embedding model integrity — validation that embedding models have not been substituted or tampered with in the pipeline
- Query-time injection — user input crafted to manipulate the retrieval query and surface attacker-controlled content
Model & Supply Chain Security
- Model provenance & integrity — cryptographic signing, provenance tracking (in-toto / SLSA for ML), and registry access controls
- Data poisoning — assessment of training and fine-tuning data pipeline integrity and controls against adversarial data injection
- Model theft & extraction — API-level model replication attacks and intellectual property exfiltration through repeated querying
- Membership inference — determining whether specific data was present in training data, with privacy and compliance implications
- Dependency & plugin security — SBOM for ML dependencies, scanning of LLM plugins and integrations, and third-party model risk
CONTROLS & RED TEAMING
Defensive engineering and continuous evaluation.
Controls & Guardrails
- Gateway policies — input/output filtering, content classification, rate limits, and tool scope enforcement at the API gateway layer
- Context governance — allowlists for retrievable document sets, chunk-level metadata access controls, and tenant isolation enforcement
- PII / PHI redaction — pre-query sanitisation and post-response scrubbing for regulated data types
- Prompt/response observability — telemetry, audit logging, and anomaly detection across all model interactions
- Secrets hygiene — ensuring API keys, credentials, and system prompt contents cannot be extracted via model outputs
Red Teaming & Safety Evals
- Adversarial prompt libraries — curated attack sets covering OWASP LLM Top 10 categories with application-specific adaptations
- Automated safety evals — regression test suites integrated into CI/CD to detect guardrail regressions on model or prompt updates
- Manual red team campaigns — chained, multi-turn attack scenarios including indirect injection via realistic document payloads
- Model drift monitoring — detecting behavioural changes in updated models that weaken previously validated safety properties
- Multimodal testing — vision, audio, and code inputs used as injection vectors where the application accepts non-text inputs
GOVERNANCE & REGULATION
Documentation regulators will accept.
NIST AI RMF
- AI risk register and control catalogue mapped to Govern, Map, Measure, and Manage functions
- Model cards and system cards documenting intended use, limitations, and known risks
- AI risk profile and organisational risk tolerance documentation
EU AI Act
- Risk classification — assessment of whether systems fall under prohibited, high-risk, limited-risk, or minimal-risk categories
- High-risk system obligations — risk management system, data governance, technical documentation, human oversight, and accuracy/robustness requirements
- GPAI model obligations — transparency, copyright policy, and systematic risk assessment for general-purpose AI models
ISO/IEC 42001 & Privacy
- ISO/IEC 42001 AI management system gap assessment and implementation roadmap
- Data retention and consent documentation for training and inference data
- Privacy impact assessment for AI systems processing personal data
- Export controls and cross-border data transfer obligations for model deployment
DELIVERABLES
What ships at the end.
Assessment
- AI system threat model with MITRE ATLAS technique mapping
- OWASP LLM Top 10 assessment report with findings, evidence, and remediation guidance
- Agentic system blast radius analysis and privilege boundary assessment
- RAG pipeline security review with data flow analysis
Engineering
- Gateway and guardrail configuration with policy-as-code rules
- Automated safety eval suite with CI/CD integration instructions
- Observability configuration: prompt/response logging and anomaly detection rules
- SBOM for ML dependencies with vulnerability assessment
Governance
- AI risk register mapped to NIST AI RMF or EU AI Act as applicable
- Model card and system card templates populated for assessed systems
- EU AI Act risk classification memo with obligations analysis
- Executive summary and remediation roadmap
Get a written proposal
Send scope + timeline. Detailed SoW within 1 business day.
Open the form →
Email a senior practitioner
Direct line for scoping questions. NDA available on request before you share details.
hello@grillisecurity.com →
Active incident?
24/7 incident line. Triage call + retainer set-up inside the hour for new engagements.
+372 5610 1641 →
