소개
The CUDA Optimization skill provides specialized expertise for auditing and enhancing GPU-accelerated code within the Claude Code environment. It automatically analyzes CUDA kernels to detect uncoalesced memory accesses, warp divergence, and suboptimal thread configurations while suggesting architecture-specific improvements for modern NVIDIA generations like Ampere, Hopper, and Ada Lovelace. By integrating deep profiling knowledge, it helps developers transition from functional GPU code to highly efficient, production-grade parallel algorithms through actionable code transformations and bottleneck identification.