Lemonade SDK
Serve, benchmark, and deploy large language models (LLMs) on various hardware platforms.
소개
The Lemonade SDK facilitates serving, benchmarking, and deploying large language models (LLMs) on diverse hardware like CPU, GPU, and NPU. It comprises a server interface using the OpenAI API for local LLM integration, a Python API for seamless LLM integration into Python applications, and a CLI tool for LLM experimentation. The CLI offers prompting, accuracy measurement, benchmarking (time-to-first-token and tokens per second), and memory usage profiling capabilities.
주요 기능
- OpenAI API compatible server for local LLMs
- Python API for easy integration into Python applications
- CLI for LLM prompting, accuracy measurement, benchmarking, and profiling
- NPU acceleration
- Supports PyTorch, ONNX, and GGUF frameworks
- 36 GitHub stars
사용 사례
- Benchmarking LLM performance on different hardware
- Deploying LLMs locally for privacy and cost efficiency
- Integrating LLMs into Python applications using a high-level API