About
The Lemonade SDK makes it easier to serve, benchmark, and deploy large language models (LLMs) on a range of hardware, including CPUs, GPUs, and NPUs. It has three components: a server that exposes an OpenAI-compatible API so applications can use locally running LLMs, a Python API for integrating LLMs directly into Python applications, and a CLI for experimenting with LLMs. The CLI supports prompting, accuracy measurement, benchmarking (time to first token and tokens per second), and memory-usage profiling.
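The two benchmarking metrics can be illustrated with a small calculation. The sketch below is not the Lemonade CLI's implementation. It assumes you have a request start time and the arrival timestamp of each generated token, all taken from the same clock. It uses one common convention: time to first token (TTFT) is the delay from request start to the first token, and tokens per second is decode-phase throughput, meaning the tokens generated after the first one divided by the time spent generating them. Lemonade's exact definitions may differ. The names `compute_metrics` and `GenerationMetrics` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationMetrics:
    ttft_s: float             # seconds from request start to first token
    tokens_per_second: float  # decode-phase throughput (assumed convention)


def compute_metrics(request_start: float, token_times: Sequence[float]) -> GenerationMetrics:
    """Compute TTFT and decode-phase tokens/s from per-token arrival timestamps.

    Assumption: token_times are increasing timestamps (in seconds) at which
    each output token arrived, on the same clock as request_start.
    """
    if not token_times:
        raise ValueError("at least one token is required")
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        # A single token leaves no decode interval to measure throughput over.
        return GenerationMetrics(ttft_s=ttft, tokens_per_second=0.0)
    decode_time = token_times[-1] - token_times[0]
    # Tokens produced after the first one, divided by the time spent producing them.
    tps = (len(token_times) - 1) / decode_time if decode_time > 0 else float("inf")
    return GenerationMetrics(ttft_s=ttft, tokens_per_second=tps)


if __name__ == "__main__":
    # Hypothetical timestamps: request at t=0.0, first token at 0.5 s,
    # then one more token every 0.1 s (11 tokens in total).
    times = [0.5 + 0.1 * i for i in range(11)]
    m = compute_metrics(0.0, times)
    print(f"TTFT: {m.ttft_s:.2f} s, TPS: {m.tokens_per_second:.1f}")
```

Excluding the first token from the throughput figure keeps prompt processing, which TTFT already captures, from being counted twice. In a real measurement, timestamps would typically come from a monotonic clock such as `time.perf_counter()`.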