Serve, benchmark, and deploy large language models (LLMs) on various hardware platforms.
The Lemonade SDK helps you serve, benchmark, and deploy large language models (LLMs) on a range of hardware, including CPUs, GPUs, and NPUs. It has three parts: a server that exposes an OpenAI-compatible API for integrating local LLMs, a Python API for embedding LLMs in Python applications, and a CLI for experimenting with LLMs. The CLI supports prompting, accuracy measurement, benchmarking (time to first token and tokens per second), and memory usage profiling.
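
To make the two benchmark metrics concrete, here is a minimal sketch of how time to first token and tokens per second could be derived from per-token arrival timestamps. It does not use the Lemonade API: `BenchmarkResult` and `summarize_generation` are hypothetical names, and the definitions (decode throughput measured over the tokens after the first) are an assumption, so the SDK's own numbers may be computed differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical container for the two metrics named above."""
    time_to_first_token_s: float
    tokens_per_second: float


def summarize_generation(
    request_start_s: float, token_times_s: Sequence[float]
) -> BenchmarkResult:
    """Summarize one generation from token arrival timestamps (in seconds).

    Assumed definitions (not taken from the Lemonade SDK):
      - time to first token: first token arrival minus request start
      - tokens per second: tokens after the first, divided by the time
        elapsed between the first and the last token
    """
    if not token_times_s:
        raise ValueError("at least one token timestamp is required")

    ttft = token_times_s[0] - request_start_s

    decode_tokens = len(token_times_s) - 1
    decode_time = token_times_s[-1] - token_times_s[0]
    # With a single token there is no decode phase to measure.
    tps = decode_tokens / decode_time if decode_time > 0 else 0.0

    return BenchmarkResult(time_to_first_token_s=ttft, tokens_per_second=tps)


if __name__ == "__main__":
    # Five tokens: the first arrives 0.5 s after the request,
    # the remaining four arrive over the following 1.0 s.
    result = summarize_generation(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
    print(result)
```

Separating the first token from the rest keeps prompt-processing latency out of the throughput figure, which is why the two metrics are usually reported side by side.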