Tree-based attention mechanisms for parallel candidate verification
Medusa multi-head architecture integration for up to 3.6x throughput
Lossless inference acceleration compatible with standard transformers and vLLM
Speculative decoding with draft models for 2x speedups
Jacobi iteration-based lookahead decoding for zero-training optimization
384 GitHub stars
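
To make the "lossless" and "draft model" items concrete, here is a minimal sketch of greedy speculative decoding. It is an illustration under stated assumptions, not the implementation used by any library named above: the names `speculative_decode` and `greedy_decode`, the `NextToken` type, and the parameter `k` are all hypothetical, and each model is reduced to a plain function that returns its greedy next token. A real system would score all drafted positions in one batched target forward pass and would usually sample rather than take the argmax.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: maps a token context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: ordinary one-token-at-a-time greedy decoding with the target."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes, the target verifies."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks each proposed position in order. Written as a
        #    loop for clarity; in practice this is a single batched pass.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != tok:
                break  # first mismatch: discard the rest of the proposal

        out.extend(accepted)
    return out
```

Because every kept token is the one the target itself would have chosen, the output matches `greedy_decode` exactly; any speedup comes only from how many drafted tokens are accepted per verification step. This sketch omits the extra "bonus" token that many implementations take from the target when the whole proposal is accepted.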