Provides standardized API patterns and implementation guidance for Meta's Segment Anything Model 3 (SAM3) across image and video tasks.
The SAM3 Image Segmentation skill enables Claude to provide expert-level guidance for implementing Segment Anything Model 3, a unified foundation model for promptable segmentation. It covers essential patterns for text prompts (open-vocabulary), geometric bounding boxes, and SAM1-style interactive point prompts. Developers can leverage this skill to implement complex computer vision workflows, including multi-GPU video tracking, efficient batched inference using the DataPoint API, and performance optimizations for modern GPU architectures.
Key Features
- Advanced video tracking implementation for consistent object identification across frames
- Efficient batched inference patterns using DataPoint and FindQueryLoaded structures
- Interactive mask refinement techniques using logit-based feedback loops
- Comprehensive patterns for text, box, and point-based promptable segmentation
- GPU optimization guidance including bfloat16 and TensorFloat32 configurations
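To make the interactive-refinement pattern concrete, here is a minimal sketch of the logit-feedback loop. The `Sam3Predictor` class below is a hypothetical mock, not the real SAM3 API; it imitates the segment-anything convention of `(point_coords, point_labels, mask_input)` arguments and `(mask, score, low_res_logits)` returns, where each round's logits are fed back as `mask_input` in the next round.

```python
class Sam3Predictor:
    """Mock predictor standing in for a real model (hypothetical API).

    Real SAM-family predictors return binary masks, quality scores, and
    low-resolution logits; feeding those logits back via `mask_input`
    lets the model refine its previous estimate instead of starting over.
    """

    def predict(self, point_coords, point_labels, mask_input=None):
        # Track how many refinement rounds have occurred via the logits dict.
        refinement = 0 if mask_input is None else mask_input["round"] + 1
        mask = {"points": list(point_coords), "round": refinement}
        score = min(0.5 + 0.1 * refinement, 0.99)  # toy "quality" score
        low_res_logits = {"round": refinement}
        return mask, score, low_res_logits


def refine(predictor, clicks, rounds=3):
    """Iteratively refine a mask: each round adds one more user click and
    passes the previous round's logits back into the predictor."""
    logits, mask, score = None, None, 0.0
    for i in range(1, rounds + 1):
        coords = [c for c, _ in clicks[:i]]
        labels = [l for _, l in clicks[:i]]
        mask, score, logits = predictor.predict(coords, labels, mask_input=logits)
    return mask, score


# Label 1 = foreground click, 0 = background click (segment-anything convention).
clicks = [((120, 80), 1), ((200, 150), 1), ((40, 40), 0)]
mask, score = refine(Sam3Predictor(), clicks)
```

The key design point is that refinement is stateful only through the logits: the caller accumulates clicks and replays the full prompt each round, which keeps the predictor itself stateless.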
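The bfloat16/TensorFloat32 guidance can be sketched as a standard PyTorch configuration fragment. These switches (`allow_tf32`, `torch.autocast`) are real PyTorch APIs; whether SAM3's reference code uses exactly this recipe is an assumption.

```python
import torch

# Allow TensorFloat32 matmuls and convolutions: a large speedup on
# Ampere-and-newer GPUs at slightly reduced precision.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Run inference under bfloat16 autocast. bf16 keeps fp32's dynamic range,
# so unlike fp16 it rarely needs loss scaling or overflow handling.
with torch.autocast("cuda", dtype=torch.bfloat16):
    with torch.inference_mode():
        ...  # model forward passes / predictor calls go here
```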
Use Cases
- Building interactive annotation tools for medical, satellite, or consumer imagery
- Implementing zero-shot object detection and segmentation using natural language
- Developing high-performance video processing pipelines for automated tracking and masking
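The zero-shot, text-prompted use case follows a simple find-then-filter pattern. The `segment_by_text` helper, the `find` method, and the detection dictionaries below are hypothetical illustrations of that pattern, not the actual SAM3 interface.

```python
def segment_by_text(model, image, phrase, score_threshold=0.5):
    """Return every detected instance matching the open-vocabulary
    `phrase` whose confidence clears `score_threshold`."""
    detections = model.find(image, phrase)  # hypothetical model call
    return [d for d in detections if d["score"] >= score_threshold]


class MockModel:
    """Stand-in model emitting canned detections for the demo."""

    def find(self, image, phrase):
        # A real model would return per-instance masks and scores.
        return [
            {"phrase": phrase, "score": 0.92, "mask": "mask-a"},
            {"phrase": phrase, "score": 0.31, "mask": "mask-b"},
        ]


hits = segment_by_text(MockModel(), image=None, phrase="red car")
# Only the high-confidence instance survives the threshold.
```

Thresholding on the model's confidence score is what makes the pattern practical for automated pipelines: downstream tracking or masking stages only ever see instances the detector is reasonably sure about.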