About
This skill integrates OpenAI's CLIP (Contrastive Language-Image Pre-Training) model into the Claude Code environment, allowing developers to implement sophisticated multimodal capabilities without task-specific training. By leveraging a model trained on 400 million image-text pairs, it enables high-performance zero-shot classification, cross-modal retrieval, and semantic image search. It is an essential tool for projects requiring automated content moderation, visual question answering, or any application where images need to be understood through the lens of natural language.
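The zero-shot classification described above works by embedding both the image and a set of candidate captions into a shared vector space, then scoring each caption by its cosine similarity to the image. The sketch below illustrates just that scoring step with hand-made toy vectors; the embeddings, labels, and logit scale are illustrative stand-ins, not outputs of the real CLIP encoders.

```python
import numpy as np

# Toy stand-ins for CLIP embeddings: in the real model these come from the
# image and text encoders; here they are hypothetical 4-d vectors.
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],   # embedding for "a photo of a cat"
    [0.0, 1.0, 0.0, 0.0],   # embedding for "a photo of a dog"
    [0.0, 0.0, 1.0, 0.0],   # embedding for "a photo of a car"
])
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

def normalize(v):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# CLIP turns scaled cosine similarities into a probability distribution
# over the candidate captions via a softmax.
logit_scale = 100.0  # stands in for CLIP's learned temperature
logits = logit_scale * normalize(text_embs) @ normalize(image_emb)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

best = labels[int(np.argmax(probs))]
print(best)  # caption whose embedding is most similar to the image embedding
```

The same normalized-embedding dot product drives cross-modal retrieval and semantic image search: instead of ranking captions against one image, you rank a corpus of image embeddings against one text query.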