Configurable parallel decoding and context management
Direct model downloads and execution from Hugging Face
Interactive GGUF model execution via llama-cli
OpenAI-compatible API hosting with llama-server
Session safety monitoring and idle-state verification
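
Since llama-server exposes an OpenAI-compatible HTTP API, clients can talk to it with a standard chat-completions request body. A minimal sketch, assuming a server running locally (the endpoint path follows the OpenAI convention; the port, model name, and temperature below are illustrative assumptions, not values from this document):

```python
import json

# Assumed endpoint for a locally hosted llama-server instance
# (port 8080 is an illustrative choice, not a documented default here).
ENDPOINT = "http://localhost:8080/v1/chat/completions"

def chat_request(prompt: str, model: str = "local-gguf-model") -> dict:
    """Build an OpenAI-compatible chat-completion request body.

    The body could then be POSTed to ENDPOINT as JSON, e.g. with
    urllib.request or any HTTP client.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

if __name__ == "__main__":
    # Print the request body a client would send; no network call is made.
    print(json.dumps(chat_request("Hello!"), indent=2))
```

Because the request shape matches OpenAI's API, existing OpenAI client libraries can typically be pointed at such a server by overriding only the base URL.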