Builds and optimizes state-of-the-art visual intelligence systems using YOLO26, SAM 3, and advanced vision-language models.
This skill transforms Claude into a senior vision systems architect specializing in 2026-era SOTA technologies. It provides expert guidance on implementing NMS-free detection with YOLO26, promptable segmentation via SAM 3, and complex visual reasoning using VLMs. Whether you are designing real-time spatial analysis for robotics or optimizing vision pipelines for edge deployment on NPUs, this skill bridges the gap between modern deep learning and classical geometric calibration to deliver high-performance, production-grade vision solutions.
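The "spatial analysis" and "geometric calibration" side of the skill ultimately rests on the pinhole camera model: given intrinsics and a metric depth, any pixel can be back-projected to a 3D point in the camera frame. A minimal sketch, assuming a standard pinhole model; the intrinsic values (`fx`, `fy`, `cx`, `cy`) are illustrative, not taken from any specific camera:

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) at metric depth -> 3D point
    (X, Y, Z) in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics for a 640x480 camera (illustrative values only).
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A pixel 100 px right of the principal point, at 2 m depth.
pt = backproject(420.0, 240.0, 2.0, fx, fy, cx, cy)
print(pt)  # → (0.4, 0.0, 2.0)
```

Pairing this with a monocular depth estimator (feature 1 below) turns a single RGB frame into a sparse 3D reconstruction; the calibration quality of `fx`, `fy`, `cx`, `cy` directly bounds the metric accuracy.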
Key Features
1. Monocular Depth Estimation and 3D Reconstruction
2. Text-to-Mask Segmentation with SAM 3
3. Visual Grounding and Reasoning with VLMs
4. High-Precision Sub-pixel Camera Calibration
5. YOLO26 NMS-Free Detection & Edge Optimization
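To see what "NMS-free" removes, it helps to look at the classical post-processing step it replaces. Traditional detectors emit many overlapping candidate boxes per object and rely on greedy non-maximum suppression (NMS) to deduplicate them; an end-to-end NMS-free head produces one box per object directly, eliminating this sequential, latency-sensitive step from the edge-deployment path. A minimal sketch of classical greedy NMS (generic illustration, not YOLO26's actual pipeline):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and drop
    any remaining box that overlaps it above `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

# Two near-duplicate detections of one object, plus one distinct object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

The `while` loop is inherently sequential and data-dependent, which is exactly why it is awkward to compile onto NPUs; an NMS-free detection head sidesteps it entirely.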
Use Cases
1. Optimizing vision models for deployment on mobile and IoT edge devices.
2. Implementing zero-shot semantic segmentation for complex scene understanding.
3. Developing real-time industrial inspection systems with low-latency object detection.