DINO-X: Visual Perception API for LLMs | Object Detection & Captioning