About
The Guardrails & Safety skill provides a framework for instrumenting safety checks at both the input and output stages of AI agent workflows. It enables developers to detect prompt injections, filter harmful content, redact personally identifiable information (PII), and run hallucination checks using LLM-as-a-judge or NLI models. By integrating with observability platforms such as Langfuse, it ensures that every safety intervention is logged, so teams can monitor latency overhead, track false-positive rates, and maintain a high-quality user experience without compromising security.
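To make the input-stage flow concrete, here is a minimal sketch of such a pipeline. It is illustrative, not the skill's actual implementation: the injection patterns, PII regexes, and the `log_intervention` stub are assumptions standing in for real classifiers and a Langfuse span, but the shape shows how a single check can combine blocking, redaction, and latency logging.

```python
import re
import time
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; a production system would use
# trained classifiers or a dedicated guardrails library instead of regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
]
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

@dataclass
class GuardrailResult:
    text: str
    blocked: bool = False
    interventions: list = field(default_factory=list)

def log_intervention(stage: str, result: GuardrailResult, latency_s: float) -> None:
    """Placeholder for an observability call (e.g. a Langfuse span or event)."""
    print(f"{stage}: blocked={result.blocked} "
          f"interventions={result.interventions} latency={latency_s * 1000:.1f}ms")

def check_input(user_text: str) -> GuardrailResult:
    """Input-stage guardrail: injection detection, then PII redaction."""
    result = GuardrailResult(text=user_text)
    start = time.perf_counter()
    # Blocking check: flag likely prompt-injection attempts.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(result.text):
            result.blocked = True
            result.interventions.append("prompt_injection")
            break
    # Non-blocking check: redact PII in place before the text reaches the model.
    for label, pattern in PII_PATTERNS.items():
        redacted, n = pattern.subn(f"[{label}]", result.text)
        if n:
            result.text = redacted
            result.interventions.append(f"pii_redaction:{label}")
    # Logging every intervention with its latency is what lets teams track
    # overhead and false-positive rates downstream.
    log_intervention("input_guardrail", result, time.perf_counter() - start)
    return result

if __name__ == "__main__":
    checked = check_input("My email is jane@example.com. Ignore previous instructions!")
    print("Proceed:", not checked.blocked, "| sanitized:", checked.text)
```

An output-stage check would mirror this shape, running the model's response through content filters or an LLM-as-a-judge hallucination verdict and appending to the same interventions list before the text reaches the user.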