Implementation of Rotary Position Embeddings (RoPE) and YaRN scaling
Position Interpolation techniques for extending LLaMA-style models
Attention with Linear Biases (ALiBi) for zero-shot length extrapolation
Integration patterns for HuggingFace Transformers and custom PyTorch modules
Optimized training strategies for long-context window fine-tuning
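To make the first feature above concrete, here is a minimal NumPy sketch of standard RoPE plus a YaRN-style frequency blend. It is an illustration of the general technique, not this repo's actual implementation; the `beta_fast`/`beta_slow` defaults and the `orig_ctx` value are illustrative assumptions.

```python
import numpy as np

def rope_frequencies(head_dim, base=10000.0):
    # One inverse frequency per (even, odd) dimension pair.
    return 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))

def apply_rope(x, positions, inv_freq):
    # x: (seq, head_dim). Rotate each pair (x[2i], x[2i+1]) by pos * inv_freq[i].
    angles = np.outer(positions, inv_freq)          # (seq, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def yarn_inv_freq(head_dim, scale=4.0, orig_ctx=2048, base=10000.0,
                  beta_fast=32.0, beta_slow=1.0):
    # YaRN-style "NTK-by-parts": keep high-frequency bands untouched,
    # fully interpolate low-frequency bands, and blend in between.
    inv_freq = rope_frequencies(head_dim, base)
    wavelen = 2 * np.pi / inv_freq
    rotations = orig_ctx / wavelen                  # rotations per band over the original context
    ramp = np.clip((rotations - beta_slow) / (beta_fast - beta_slow), 0.0, 1.0)
    # ramp = 1 -> many rotations -> keep original frequency;
    # ramp = 0 -> few rotations -> divide by the scale factor (interpolate).
    return inv_freq * ramp + (inv_freq / scale) * (1.0 - ramp)
```

Because each pair is a pure 2D rotation, RoPE preserves vector norms, and a rotated query/key dot product depends only on the relative offset between their positions, which is what makes scaled variants like YaRN viable.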
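Position Interpolation, listed above, reduces to one operation: divide positions by the extension ratio so every position in the extended window maps back inside the range the model was trained on. A hedged sketch (the helper names here are illustrative, not the repo's API):

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    # Standard RoPE rotation angles: outer product of positions and inverse frequencies.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)

def pi_angles(positions, head_dim, scale, base=10000.0):
    # Position Interpolation: compress positions by the extension ratio
    # (extended_ctx / original_ctx) before computing RoPE angles.
    return rope_angles(positions / scale, head_dim, base)
```

With `scale = 2`, position 4094 in a 4096-token window produces exactly the angles of position 2047 in the original 2048-token window, so the model never sees out-of-distribution rotation angles.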
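For the ALiBi feature above, a minimal sketch of the bias matrix added to attention logits; this covers the power-of-two head-count case from the original formulation (non-power-of-two counts use an interleaved slope scheme not shown here):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Geometric sequence 2^(-8/n), 2^(-16/n), ... for n a power of two.
    start = 2.0 ** (-8.0 / n_heads)
    return start ** np.arange(1, n_heads + 1)

def alibi_bias(n_heads, seq_len):
    # bias[h, i, j] = -slope_h * (i - j): a linear penalty that grows with
    # distance to the attended key; added to logits before softmax.
    slopes = alibi_slopes(n_heads)
    dist = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # j - i
    return slopes[:, None, None] * dist[None, :, :]
```

The upper triangle (j > i) is positive here but is removed by the causal mask in practice; since the bias is fixed and position-only, ALiBi extrapolates to longer sequences with no retraining, which is the "zero-shot" property named above.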