Introduction
This skill provides a comprehensive framework for red teaming machine learning training pipelines by simulating advanced adversarial attacks. It enables security analysts to evaluate how models respond to label flipping, backdoor triggers, and sophisticated clean-label attacks that are often undetectable by standard validation. By integrating this skill into the development lifecycle, teams can identify critical weaknesses in data ingestion, fine-tuning, and RLHF processes, ultimately mapping vulnerabilities to industry-standard frameworks like OWASP LLM and MITRE ATLAS.
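As a concrete illustration of the simplest attack mentioned above, the sketch below shows how a label-flipping poisoning run might be simulated against a training set. This is a minimal, self-contained example; the `flip_labels` helper and its parameters are hypothetical names chosen for illustration, not part of the skill's actual API.

```python
import random

def flip_labels(labels, num_classes, flip_rate, seed=0):
    """Simulate a label-flipping attack: reassign a fraction of labels
    to a different (incorrect) class. Returns the poisoned label list
    and the indices that were altered, so a red-team harness can later
    measure how much the flips degraded model accuracy."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    poisoned = list(labels)
    n_flip = int(len(labels) * flip_rate)
    flipped_idx = rng.sample(range(len(labels)), n_flip)
    for i in flipped_idx:
        # Pick any class other than the current (true) one.
        choices = [c for c in range(num_classes) if c != poisoned[i]]
        poisoned[i] = rng.choice(choices)
    return poisoned, flipped_idx

# Example: poison 30% of a tiny 3-class label set.
clean = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
poisoned, idx = flip_labels(clean, num_classes=3, flip_rate=0.3)
```

In a real assessment the flipped indices would be kept secret from the training team, and the evaluator would compare clean-model and poisoned-model metrics to quantify the pipeline's sensitivity to this attack class.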