384 GitHub stars

- Hardware-accelerated inference for Apple Silicon (Metal), AMD (ROCm), and Intel GPUs
- Minimal dependency footprint with a pure C/C++ implementation
- Advanced GGUF quantization support (1.5-bit to 8-bit) for reduced memory usage
- OpenAI-compatible server mode for seamless API integration
- Support for a wide range of models, including Llama 3, Mistral, Mixtral, and Phi-3
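Because the server mode speaks the OpenAI wire format, existing OpenAI client code can be pointed at the local endpoint with little or no change. A minimal sketch of building a chat-completions request body (the endpoint URL, port, and model name below are placeholders, not values confirmed by this document; check the project's server docs for the actual defaults):

```python
import json

def build_chat_request(prompt: str, model: str = "local-model") -> str:
    """Build a request body in the OpenAI chat-completions wire format.

    The model name is a placeholder; local OpenAI-compatible servers
    often ignore or override this field.
    """
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }
    return json.dumps(body)

payload = build_chat_request("Why is the sky blue?")
print(payload)
```

The resulting JSON can then be sent with any HTTP client to the server's chat-completions route (commonly `/v1/chat/completions` on OpenAI-compatible servers), so tooling built against the OpenAI API works against the local model unchanged.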