01Custom PromQL query execution for advanced cluster health metrics
025 GitHub stars
03Real-time vLLM performance analysis (latency, throughput, error rates)
04Cluster-wide GPU inventory and utilization monitoring
05Distributed tracing for inference requests via Tempo integration
06Cross-domain signal correlation across logs, metrics, and traces