01. Linear O(n) complexity for processing million-token sequences
02. Hardware-aware design utilizing optimized CUDA kernels
03. Support for Mamba-1 and Mamba-2 multi-head architectures
04. HuggingFace integration for loading and fine-tuning pretrained models
05. 384 GitHub stars
06. Inference optimization with no KV cache requirement
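
The linear-time and no-KV-cache points both follow from the recurrent form of a state space model, which a toy example can make concrete. The sketch below is not Mamba's selective scan or its CUDA kernel. It is a plain diagonal linear recurrence in pure Python, and the names `ssm_step` and `ssm_scan` and the fixed parameters `a`, `b`, `c` are hypothetical, chosen only for illustration. It shows that each token costs a fixed amount of work and that the only memory carried between tokens is a fixed-size state vector.

```python
from typing import List, Sequence, Tuple


def ssm_step(
    h: Sequence[float],
    x: float,
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> Tuple[List[float], float]:
    """One recurrent step of a toy diagonal SSM (illustrative, not Mamba itself).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    """
    h_next = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
    y = sum(ci * hi for ci, hi in zip(c, h_next))
    return h_next, y


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Process a whole sequence with one pass: O(n) steps, O(state) carried memory."""
    h = [0.0] * len(a)  # fixed-size state; it does not grow with sequence length
    ys: List[float] = []
    for x in xs:
        h, y = ssm_step(h, x, a, b, c)
        ys.append(y)
    return ys, h


if __name__ == "__main__":
    # Hypothetical parameters for a 2-dimensional state.
    a = [0.5, 0.9]
    b = [1.0, 0.1]
    c = [1.0, 1.0]
    ys, h = ssm_scan([1.0, 0.0, 2.0, 1.0], a, b, c)
    print("outputs:", ys)
    print("final state size:", len(h))
```

During generation, only `h` has to be kept between calls to `ssm_step`, so memory stays constant as the sequence grows. An attention layer, by contrast, keeps keys and values for every previous token. In Mamba the parameters additionally depend on the input, which this sketch leaves out.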