About
This skill provides standardized implementations and workflows for integrating Flash Attention (v1, v2, and v3) into Transformer-based architectures. It enables AI researchers and engineers to achieve up to 4x speedups and up to 20x memory reduction by exploiting IO-aware tiling and recomputation. The skill covers PyTorch native Scaled Dot Product Attention (SDPA), the standalone flash-attn library, and specialized H100 FP8 optimizations, making it essential for projects involving long-context sequences or tight GPU memory constraints.
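
As a minimal illustration of the PyTorch-native SDPA route, the sketch below calls `torch.nn.functional.scaled_dot_product_attention` and restricts backend selection to the Flash kernel via `torch.backends.cuda.sdp_kernel`. The tensor shapes, dtype, and the context-manager API are assumptions about the reader's PyTorch version (2.0–2.2 style API) and hardware, not a definitive part of the skill.

```python
# Minimal sketch: Flash Attention via PyTorch native SDPA.
# Assumes PyTorch >= 2.0 and an NVIDIA GPU with a Flash-capable kernel;
# shapes and dtypes are illustrative only.
import torch
import torch.nn.functional as F

batch, n_heads, seq_len, head_dim = 2, 8, 4096, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# SDPA expects a (batch, heads, seq, head_dim) layout.
q = torch.randn(batch, n_heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn(batch, n_heads, seq_len, head_dim, device=device, dtype=dtype)
v = torch.randn(batch, n_heads, seq_len, head_dim, device=device, dtype=dtype)

if device == "cuda":
    # Force the Flash backend (PyTorch 2.0-2.2 API; newer releases expose
    # torch.nn.attention.sdpa_kernel instead).
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
else:
    # CPU fallback uses the math backend: numerically equivalent, just slower.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # (batch, n_heads, seq_len, head_dim)
```

For the standalone flash-attn route, the analogous call (if the library is installed) is `flash_attn_func(q, k, v, causal=True)`, which expects a (batch, seq, heads, head_dim) layout in fp16 or bf16; the H100 FP8 path builds on the same interface.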