Active AI Guardrails (AAIG) are the safety layer between user queries and AI responses. Every message is evaluated against your configured guardrail rules before the assistant responds. If any rule is violated, the response is blocked — protecting your organization from unsafe, non-compliant, or off-topic interactions.
How Guardrails Work
Each guardrail contains a natural-language rule that a language model evaluates independently. Guardrails run concurrently (up to three at a time) to minimize response latency, and each result is recorded in the audit trail according to that guardrail's reporting level.
Three key principles govern how guardrails operate:
- Independent evaluation: each guardrail is evaluated by a separate LLM call, so rules never interfere with one another. A safety guardrail cannot affect how a compliance guardrail evaluates the same message.
- Tripwire blocking: if any single guardrail fails, the entire response is blocked. There is no "majority rules"; one violation is enough to prevent delivery.
- Concurrent execution: guardrails run in parallel to keep response times low. The system does not finish one guardrail before starting the next. Instead, it keeps up to three evaluations in flight at once.
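The three principles can be illustrated with a short Python sketch built on `asyncio`. It is not AAIG's implementation. `evaluate_rule` is a hypothetical stand-in for the per-guardrail LLM call; here it is a toy keyword check on rules of the form "no <word>". The semaphore mirrors the documented limit of three concurrent evaluations.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    rule: str
    passed: bool


async def evaluate_rule(rule: str, message: str) -> Verdict:
    """Hypothetical stand-in for one independent LLM call per guardrail.

    A real deployment would ask a language model whether `message`
    violates `rule`. Here, a rule of the form "no <word>" fails if the
    word appears anywhere in the message.
    """
    await asyncio.sleep(0.01)  # simulate model latency
    banned = rule.removeprefix("no ").lower()
    return Verdict(rule=rule, passed=banned not in message.lower())


async def run_guardrails(
    rules: list[str], message: str, max_concurrent: int = 3
) -> tuple[bool, list[Verdict]]:
    """Evaluate every rule concurrently, with at most `max_concurrent` in flight.

    Tripwire semantics: the message is allowed only if every verdict passes.
    """
    slots = asyncio.Semaphore(max_concurrent)

    async def guarded(rule: str) -> Verdict:
        async with slots:  # wait for a free slot, then evaluate independently
            return await evaluate_rule(rule, message)

    # gather() returns results in the same order as `rules`
    verdicts = await asyncio.gather(*(guarded(r) for r in rules))
    allowed = all(v.passed for v in verdicts)
    return allowed, list(verdicts)


if __name__ == "__main__":
    rules = ["no threat", "no password", "no competitor", "no ssn"]
    allowed, verdicts = asyncio.run(run_guardrails(rules, "What is my password?"))
    print("allowed" if allowed else "blocked")
    for v in verdicts:
        print(v.rule, "pass" if v.passed else "FAIL")
```

This sketch waits for every verdict before deciding, so each result is available for auditing. Whether AAIG stops outstanding evaluations as soon as one guardrail fails is not specified here.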
Five Categories
Guardrails are organized into five categories, each targeting a different type of risk:
Safety guardrails prevent harmful, abusive, or dangerous content from entering or leaving the system.
Example rules:
- Block requests that contain threats or harassment
- Prevent the assistant from providing medical, legal, or financial advice
- Flag messages that discuss self-harm
Compliance guardrails enforce regulatory and organizational policy requirements.
Example rules:
- Ensure responses include required disclaimers for regulated topics
- Block discussion of topics outside the organization's mandate
- Enforce data handling policies for sensitive information
Quality guardrails maintain the standard and relevance of AI responses.
Example rules:
- Reject queries that are too vague to produce a useful response
- Ensure responses stay on-topic for the organization's domain
- Block nonsensical or spam-like input
Privacy guardrails protect personally identifiable information (PII) and sensitive data.
Example rules:
- Block messages containing Social Security numbers, credit card numbers, or addresses
- Prevent the assistant from requesting personal information
- Redact or reject queries that expose sensitive employee data
Custom guardrails address organization-specific requirements that don't fit the other categories.
Example rules:
- Restrict responses to a specific language or dialect
- Enforce branding guidelines in AI-generated text
- Block discussion of competitors or sensitive internal topics
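If you manage guardrails as configuration, the five categories map naturally onto an enumeration. The following sketch is illustrative only. `Category`, `Guardrail`, `by_category`, and the guardrail names are hypothetical, while the rule texts come from the examples above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SAFETY = "safety"
    COMPLIANCE = "compliance"
    QUALITY = "quality"
    PRIVACY = "privacy"
    CUSTOM = "custom"


@dataclass(frozen=True)
class Guardrail:
    name: str           # hypothetical identifier shown to admins
    category: Category
    rule: str           # natural-language rule the LLM evaluates


def by_category(guardrails: list[Guardrail]) -> dict[Category, list[Guardrail]]:
    """Group guardrails by category, listing every category even if it is empty."""
    grouped: dict[Category, list[Guardrail]] = defaultdict(list)
    for g in guardrails:
        grouped[g.category].append(g)
    return {c: grouped[c] for c in Category}


guardrails = [
    Guardrail("no-threats", Category.SAFETY, "Block requests that contain threats or harassment"),
    Guardrail("disclaimers", Category.COMPLIANCE, "Ensure responses include required disclaimers for regulated topics"),
    Guardrail("no-spam", Category.QUALITY, "Block nonsensical or spam-like input"),
    Guardrail("no-pii", Category.PRIVACY, "Block messages containing Social Security numbers, credit card numbers, or addresses"),
    Guardrail("no-competitors", Category.CUSTOM, "Block discussion of competitors or sensitive internal topics"),
]

for category, items in by_category(guardrails).items():
    print(f"{category.value}: {[g.name for g in items]}")
```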
Priority and Reporting
Each guardrail has two additional configuration options:
- Priority (0–10): indicates criticality for admin triage. Higher-priority guardrails appear first in dashboards and alerts. Priority does not affect evaluation order; guardrails are evaluated concurrently regardless of priority.
- Reporting level: one of the three levels below, which controls what is written to the audit trail.
| Level | What is recorded |
|---|---|
| None | Nothing; the guardrail is still evaluated, but no audit entries are written |
| Alerts | Violations only (failed evaluations) |
| All | Every evaluation result, pass or fail |
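Reporting levels and priorities can be sketched as follows. The names `ReportingLevel`, `GuardrailConfig`, `should_record`, and `dashboard_order` are hypothetical. The sketch encodes only two things: which results the table says get logged, and the rule that priority changes display order but not evaluation order.

```python
from dataclasses import dataclass
from enum import Enum


class ReportingLevel(Enum):
    NONE = "none"      # evaluate silently, write no audit entries
    ALERTS = "alerts"  # log only violations
    ALL = "all"        # log every result, pass or fail


@dataclass(frozen=True)
class GuardrailConfig:
    name: str
    priority: int               # 0-10; higher means more critical for triage
    reporting: ReportingLevel

    def __post_init__(self) -> None:
        if not 0 <= self.priority <= 10:
            raise ValueError(f"priority must be 0-10, got {self.priority}")


def should_record(level: ReportingLevel, passed: bool) -> bool:
    """Decide whether one evaluation result is written to the audit trail."""
    if level is ReportingLevel.ALL:
        return True
    if level is ReportingLevel.ALERTS:
        return not passed
    return False  # ReportingLevel.NONE


def dashboard_order(configs: list[GuardrailConfig]) -> list[GuardrailConfig]:
    """Sort for display only; evaluation still runs concurrently regardless of priority."""
    return sorted(configs, key=lambda c: c.priority, reverse=True)


configs = [
    GuardrailConfig("tone", 2, ReportingLevel.NONE),
    GuardrailConfig("pii", 9, ReportingLevel.ALL),
    GuardrailConfig("disclaimers", 6, ReportingLevel.ALERTS),
]
print([c.name for c in dashboard_order(configs)])
```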
Guardrails and Groups
Guardrails are assigned to groups, not applied globally. This means different user populations can have different safety rules. For example:
- An internal employees group might have relaxed quality guardrails but strict compliance rules
- A public access group might have aggressive safety and privacy guardrails
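- An admin group might be assigned only a minimal set of guardrails for testing purposes
A user's effective guardrails are the union of all guardrails from every group they belong to. Group membership never removes guardrails: an admin who also belongs to the public access group is still subject to every public access guardrail.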
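Because effective guardrails are a union, resolving them is a set operation. In this sketch the group and guardrail names are hypothetical and loosely mirror the examples above. An unknown group is assumed to contribute no guardrails.

```python
def effective_guardrails(user_groups: list[str], group_guardrails: dict[str, set[str]]) -> set[str]:
    """Union of the guardrails assigned to every group the user belongs to."""
    effective: set[str] = set()
    for group in user_groups:
        # Assumption: a group with no assignments contributes nothing
        effective |= group_guardrails.get(group, set())
    return effective


group_guardrails = {
    "internal-employees": {"compliance-strict", "quality-relaxed"},
    "public-access": {"safety-aggressive", "privacy-aggressive"},
    "admin": {"safety-baseline"},
}

print(sorted(effective_guardrails(["public-access", "admin"], group_guardrails)))
```

Adding the admin group here only adds `safety-baseline`. It cannot remove the aggressive public access guardrails, because a union only ever grows.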
False Positive Tracking
Not every guardrail trigger is a real violation. AskRAI lets you mark individual triggers as false positives in the conversation logs. This lets you:
- Identify guardrails that fire too aggressively
- Track false positive rates over time in the guardrail analytics
- Refine guardrail prompts to reduce false triggers without weakening protection
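A per-guardrail false positive rate can be computed from marked triggers. This sketch assumes a hypothetical `Trigger` record for each logged guardrail failure. It defines the rate as the share of triggers that reviewers marked as false positives (strictly speaking, a false discovery rate). AskRAI's analytics may define or compute this rate differently. The 0.25 review threshold is an arbitrary example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    guardrail: str
    false_positive: bool  # set when a reviewer marks the trigger in the conversation log


def false_positive_rates(triggers: list[Trigger]) -> dict[str, float]:
    """Fraction of each guardrail's triggers that reviewers marked as false positives."""
    total = Counter(t.guardrail for t in triggers)
    marked = Counter(t.guardrail for t in triggers if t.false_positive)
    return {name: marked[name] / count for name, count in total.items()}


def too_aggressive(rates: dict[str, float], threshold: float = 0.25) -> list[str]:
    """Guardrails whose false positive rate exceeds a (hypothetical) review threshold."""
    return sorted(name for name, rate in rates.items() if rate > threshold)


log = [
    Trigger("vague-query", True),
    Trigger("vague-query", True),
    Trigger("vague-query", False),
    Trigger("pii", False),
    Trigger("pii", False),
    Trigger("pii", True),
    Trigger("pii", False),
]
rates = false_positive_rates(log)
print(rates)
print(too_aggressive(rates))
```

To follow rates over time, bucket the triggers by day or week before calling `false_positive_rates`.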
Next Steps
- Guardrails — create, edit, and monitor guardrails in the admin console
- Confidence & Escalation — understand what happens after guardrails pass
- Governance & Audit — learn how guardrail evaluations feed into the audit trail
- Sandbox — test guardrail behavior before deploying to production