Implements cutting-edge vision systems using YOLO26, Segment Anything 3, and Vision Language Models for real-time spatial intelligence.
This skill equips Claude with advanced expertise in 2026-era computer vision architectures, enabling the design and optimization of high-performance pipelines. It specializes in NMS-free real-time detection with YOLO26, promptable text-to-mask segmentation via SAM 3, and sophisticated visual reasoning using state-of-the-art Vision Language Models (VLMs). Whether you are building autonomous systems, industrial inspection tools, or spatial awareness engines, this skill provides the architectural guidance needed to bridge classical geometry with modern deep learning for both edge and cloud deployment.
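To illustrate why an NMS-free head simplifies real-time pipelines: with one-to-one label assignment during training, each object yields at most one confident prediction, so decoding reduces to a plain confidence threshold instead of a non-maximum-suppression pass. The sketch below is a hedged illustration of that post-processing step; the `(N, 6)` box layout and threshold value are illustrative assumptions, not the actual YOLO26 output format.

```python
import numpy as np

def decode_nms_free(raw_preds: np.ndarray, conf_thresh: float = 0.25) -> np.ndarray:
    """Decode detections from a hypothetical NMS-free head.

    raw_preds: (N, 6) array of [x1, y1, x2, y2, confidence, class_id].
    With one-to-one assignment there is no duplicate-box problem, so a
    single confidence threshold replaces the NMS stage entirely.
    """
    keep = raw_preds[:, 4] >= conf_thresh
    return raw_preds[keep]

# Illustrative raw output: two confident boxes plus one low-confidence
# near-duplicate that the threshold alone discards.
preds = np.array([
    [ 10.0,  10.0, 100.0, 100.0, 0.92, 0.0],
    [200.0,  50.0, 260.0, 140.0, 0.81, 2.0],
    [ 12.0,  11.0,  98.0,  99.0, 0.04, 0.0],
])
print(decode_nms_free(preds).shape[0])  # -> 2
```

Dropping the NMS stage also removes a data-dependent, hard-to-parallelize step, which is part of what makes such heads attractive for ONNX/TensorRT export.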
Key Features
1. Advanced visual reasoning and semantic scene understanding with VLMs
2. Text-guided and zero-shot segmentation using SAM 3 and SAM 3D
3. Optimization for edge devices via ONNX, TensorRT, and NPU-specific pipelines
4. End-to-end real-time object detection with NMS-free YOLO26 architectures
5. High-precision monocular depth estimation and 3D scene reconstruction
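The depth-estimation capability above rests on classical pinhole geometry: once a model predicts a dense depth map, it can be back-projected into a 3D point cloud given the camera intrinsics. Below is a minimal, self-contained sketch of that back-projection; the intrinsics and the constant depth map are toy assumptions for illustration.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a dense depth map (H, W) into an (H*W, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map at a constant 4 m, principal point at the image center.
depth = np.full((2, 2), 4.0)
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(pts.shape)  # (4, 3)
```

In a spatial-intelligence pipeline, this geometric step is the bridge between a learned depth estimator and downstream reconstruction or SLAM modules.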
Use Cases
1. Developing high-speed industrial inspection systems with real-time detection and precise mask refinement
2. Building spatial awareness and SLAM capabilities for autonomous robotics or drones
3. Creating visual question-answering systems that extract structured data from complex images