Implements cutting-edge vision systems using YOLO26, Segment Anything 3, and Vision Language Models for real-time spatial intelligence.
This skill equips Claude with advanced expertise in 2026-era computer vision architectures, enabling the design and optimization of high-performance pipelines. It specializes in NMS-free real-time detection with YOLO26, promptable text-to-mask segmentation via SAM 3, and sophisticated visual reasoning using state-of-the-art Vision Language Models (VLMs). Whether you are building autonomous systems, industrial inspection tools, or spatial awareness engines, this skill provides the architectural guidance needed to bridge classical geometry with modern deep learning for both edge and cloud deployment.
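To illustrate why an NMS-free head simplifies real-time pipelines: with one-to-one label assignment during training, each object yields at most one confident prediction, so decoding reduces to a plain confidence threshold instead of a non-maximum-suppression pass. The sketch below is a hedged illustration of that post-processing step; the `(N, 6)` box layout and threshold value are illustrative assumptions, not the actual YOLO26 output format.

```python
import numpy as np

def decode_nms_free(raw_preds: np.ndarray, conf_thresh: float = 0.25) -> np.ndarray:
    """Decode detections from a hypothetical NMS-free head.

    raw_preds: (N, 6) array of [x1, y1, x2, y2, confidence, class_id].
    With one-to-one assignment there is no duplicate-box problem, so a
    single confidence threshold replaces the NMS stage entirely.
    """
    keep = raw_preds[:, 4] >= conf_thresh
    return raw_preds[keep]

# Illustrative raw output: two confident boxes plus one low-confidence
# near-duplicate that the threshold alone discards.
preds = np.array([
    [ 10.0,  10.0, 100.0, 100.0, 0.92, 0.0],
    [200.0,  50.0, 260.0, 140.0, 0.81, 2.0],
    [ 12.0,  11.0,  98.0,  99.0, 0.04, 0.0],
])
print(decode_nms_free(preds).shape[0])  # -> 2
```

Dropping the NMS stage also removes a data-dependent, hard-to-parallelize step, which is part of what makes such heads attractive for ONNX/TensorRT export.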
Key Features
1. Advanced visual reasoning and semantic scene understanding with VLMs
2. Text-guided and zero-shot segmentation using SAM 3 and SAM 3D
3. Optimization for edge devices via ONNX, TensorRT, and NPU-specific pipelines
4. End-to-end real-time object detection with NMS-free YOLO26 architectures
5. High-precision monocular depth estimation and 3D scene reconstruction
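The depth-estimation capability above rests on classical pinhole geometry: once a model predicts a dense depth map, it can be back-projected into a 3D point cloud given the camera intrinsics. Below is a minimal, self-contained sketch of that back-projection; the intrinsics and the constant depth map are toy assumptions for illustration.

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a dense depth map (H, W) into an (H*W, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map at a constant 4 m, principal point at the image center.
depth = np.full((2, 2), 4.0)
pts = depth_to_points(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(pts.shape)  # (4, 3)
```

In a spatial-intelligence pipeline, this geometric step is the bridge between a learned depth estimator and downstream reconstruction or SLAM modules.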
Use Cases
1. Developing high-speed industrial inspection systems with real-time detection and precise mask refinement
2. Building spatial awareness and SLAM capabilities for autonomous robotics or drones
3. Creating visual question-answering systems that extract structured data from complex images