Which benchmarks are supported?

It supports leading datasets including HarmBench, JailbreakBench, ToxiGen, TruthfulQA, RobustBench, and AdvGLUE for comprehensive AI evaluation.

Does this skill help with OWASP compliance?

Yes, it maps results to OWASP LLM 2025 categories such as Prompt Injection (LLM01), Sensitive Data Disclosure (LLM02), and Model Poisoning (LLM04).

Can I use this for vision models?

Yes, the skill includes specific configurations for LLM, vision, multimodal, and embedding model types.

What metrics does it provide?

It provides detailed metrics such as Attack Success Rate (ASR), Toxicity Scores, Accuracy Disparity, and Robust Accuracy to help compare model performance.

What is the benchmark-datasets skill used for?

It is used to run standardized security and safety tests against AI models to identify vulnerabilities like jailbreaks, toxicity, and adversarial weaknesses.

AI Security & Safety Benchmarks

Name: AI Security & Safety Benchmarks
Author: pluginagentmarketplace

bypluginagentmarketplace

•

セキュリティとテスト

Evaluates AI model security, robustness, and safety using standardized datasets like HarmBench, JailbreakBench, and AdvGLUE.

概要

This skill provides a comprehensive framework for running industry-standard benchmarks against AI models to assess vulnerabilities, bias, and safety risks. By integrating datasets such as HarmBench for harmful behaviors, JailbreakBench for prompt injection defense, and RobustBench for adversarial robustness, it enables security researchers and developers to quantify an AI system's resistance to attacks and alignment with safety standards. It bridges the gap between model development and security auditing by providing structured mappings to the OWASP Top 10 for LLMs and the NIST AI Risk Management Framework.

主な機能

Adversarial Robustness Assessment (RobustBench, AdvGLUE)
Jailbreak & Prompt Injection Testing (JailbreakBench, AdvBench)
1 GitHub stars
Standardized Safety Evaluation (HarmBench, ToxiGen, TruthfulQA)
Privacy & Data Extraction Audits (Membership Inference, Model Inversion)
Mapping to OWASP LLM 2025 and NIST AI RMF standards

ユースケース

Red teaming Large Language Models (LLMs) before production deployment
Auditing AI models for compliance with safety, bias, and truthfulness requirements
Benchmarking the adversarial robustness of vision, multimodal, and embedding models

What are Skills?·How to Install

Install with 🐟 Skill.Fish

npx skillfish add pluginagentmarketplace/custom-plugin-ai-red-teaming benchmark-datasets

For use in Claude.ai and ChatGPT

Download Skill

GitHub

概要