Implements state-of-the-art computer vision pipelines using YOLO26, SAM 3, and Vision Language Models for real-time detection and spatial reasoning.
The Computer Vision Expert skill equips Claude with the specialized knowledge required to architect and optimize high-performance vision systems as of 2026. It focuses on integrating NMS-free architectures like YOLO26 for low-latency detection, leveraging SAM 3 for promptable text-to-mask segmentation, and utilizing Vision Language Models (VLMs) for semantic scene understanding. This skill is essential for developers building autonomous systems, industrial inspection tools, or spatial computing applications that require bridging classical geometry with modern deep learning and edge-optimized deployment.
Key Features
- Monocular depth estimation and 3D scene reconstruction patterns
- End-to-end NMS-free object detection with YOLO26 for reduced latency
- Text-to-mask promptable segmentation with Segment Anything 3 (SAM 3)
- Visual grounding and reasoning with VLMs such as Florence-2 and Qwen2-VL
- Edge-first optimization for ONNX, TensorRT, and NPU hardware
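The depth-to-3D pattern mentioned above rests on classical pinhole geometry: once a monocular network (or any depth sensor) produces a metric depth map, each pixel can be back-projected into camera-space 3D coordinates using the camera intrinsics. A minimal sketch, with the function name and camera parameters chosen here for illustration:

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into an (H*W, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A flat wall 2 m away, seen by a 640x480 camera with a 500 px focal length.
depth = np.full((480, 640), 2.0)
points = backproject_depth(depth, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(points.shape)         # (307200, 3)
print(points[:, 2].mean())  # 2.0
```

The same back-projection is the bridge between learned depth and downstream reconstruction or SLAM: accumulated point clouds from successive frames feed registration and mapping stages.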
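To make the latency claim concrete: "NMS-free" means the detector's head emits one box per object directly, eliminating the greedy non-maximum suppression pass that classical detectors run after inference. The sketch below implements that classical NMS step (all names are mine, not from any YOLO codebase) so it is clear what an end-to-end architecture removes from the critical path:

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Classical post-processing: keep the highest-scoring box, drop
    overlapping duplicates, repeat. This serial loop is what NMS-free
    detection heads eliminate."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))  # [0, 2] — the near-duplicate box 1 is suppressed
```

Because the loop is data-dependent and serial, it is awkward to fuse into a single accelerator graph; dropping it yields a fixed-shape, fully exportable model, which is why NMS-free heads pair well with the ONNX/TensorRT/NPU deployment targets listed above.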
Use Cases
- Building spatial awareness and SLAM pipelines for autonomous robotics
- Optimizing vision models for high-speed inference on low-power edge devices
- Developing real-time industrial inspection systems with zero-shot part segmentation
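The edge-optimization use case usually comes down to quantization: storing weights as int8 rather than float32 cuts memory bandwidth roughly 4x and unlocks NPU integer units. A minimal sketch of symmetric per-tensor int8 quantization, the simplest scheme that TensorRT, ONNX Runtime, and most NPU toolchains support (the helper names here are illustrative, not any runtime's API):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor for accuracy checks."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(q.dtype, err < scale)  # int8 True — rounding error stays under one quant step
```

Production toolchains add per-channel scales and activation calibration on representative images, but the accuracy check shown here (max reconstruction error versus the quantization step) is the same sanity test you would run before deploying a quantized detector to an edge device.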