About
This skill provides guidance for deploying and optimizing local Large Language Models (LLMs), with a focus on privacy and performance. It covers integrating tools such as llama.cpp and Ollama into applications, with patterns for model quantization, memory management, and secure API design. Its security-first approach helps developers guard against vulnerabilities such as prompt injection and denial of service, while streaming and efficient context management keep response latency low.
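As a rough illustration of the denial-of-service and context-management concerns mentioned above, the following minimal sketch shows request guards that could sit in front of a local model server. It is not taken from the skill itself: the names `validate_prompt`, `clamp_max_tokens`, and `trim_history`, the `Message` type, and all three limits are hypothetical, and character counts are used here only as a crude stand-in for token counts.

```python
from dataclasses import dataclass

# All limits below are illustrative assumptions, not values from the skill.
MAX_PROMPT_CHARS = 8_000       # cap on a single user message
MAX_OUTPUT_TOKENS = 512        # ceiling on requested generation length
CONTEXT_BUDGET_CHARS = 16_000  # budget for retained chat history


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def validate_prompt(text: str) -> str:
    """Reject empty or oversized input before it reaches the model."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("prompt is empty")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds the configured size limit")
    return cleaned


def clamp_max_tokens(requested: int) -> int:
    """Keep a caller-supplied generation length inside a fixed range."""
    return max(1, min(requested, MAX_OUTPUT_TOKENS))


def trim_history(
    messages: list[Message], budget: int = CONTEXT_BUDGET_CHARS
) -> list[Message]:
    """Keep system messages plus the most recent turns that fit the budget."""
    system = [m for m in messages if m.role == "system"]
    others = [m for m in messages if m.role != "system"]
    remaining = budget - sum(len(m.content) for m in system)
    kept: list[Message] = []
    # Walk backwards from the newest turn and stop at the first one
    # that no longer fits.
    for m in reversed(others):
        if len(m.content) > remaining:
            break
        kept.append(m)
        remaining -= len(m.content)
    return system + list(reversed(kept))
```

Bounding input size, output length, and retained history limits how much memory and compute a single request can consume. These guards do not address prompt injection, which needs separate handling such as keeping untrusted text out of system instructions.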