01Detects harmful content across multiple categories (e.g., violence, hate speech, sexual content)
02Identifies prompt injection attempts and jailbreak techniques to prevent model manipulation
03Detects language with confidence scores and enforces language policies
04Returns structured assessments including threat categories and confidence scores
05Provides a 'Request Reject' boolean indicating policy decisions
060 GitHub stars