01 Multimodal instruction tuning for conversational image analysis
02 Visual Question Answering (VQA) and detailed image captioning
03 Quantization support (4-bit and 8-bit) for reduced VRAM usage
04 Seamless integration with CLIP and Vicuna/LLaMA architectures
05 3,983 GitHub stars
06 Support for multi-turn image-based dialogue and context retention
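
To put the quantization item in rough numbers, the sketch below estimates the memory needed for model weights alone at 16-bit, 8-bit, and 4-bit precision. It is back-of-the-envelope arithmetic, not a measurement of this project: the 7B and 13B parameter counts are assumed example sizes, the helper names are hypothetical, and activations, the vision encoder, the KV cache, and quantization metadata are all ignored, so real VRAM usage will be higher.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: memory ~= parameter count * bits per weight / 8 bytes.
# This ignores activations, KV cache, the vision encoder, and the small
# overhead that quantization formats add for scales and zero points.

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / GIB


def compare_precisions(num_params: float) -> dict:
    """Return {bits: GiB} for 16-bit, 8-bit, and 4-bit weights."""
    return {
        bits: round(weight_memory_gib(num_params, bits), 2)
        for bits in (16, 8, 4)
    }


if __name__ == "__main__":
    # Hypothetical model sizes chosen only for illustration.
    for label, num_params in (("7B", 7e9), ("13B", 13e9)):
        print(label, compare_precisions(num_params))
```

Under these assumptions, 8-bit weights take half the memory of 16-bit weights and 4-bit weights take a quarter, which is where the reduced VRAM usage comes from.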