What metrics can I analyze for vLLM models?

You can analyze p50/p95/p99 latency, requests per second, input/output token rates, error rates, and queue depth.
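
The skill builds its queries for you from natural language. To see roughly what these metrics look like at the Prometheus level, here is a minimal Python sketch that turns each one into a PromQL expression. It assumes the metric names exported by recent vLLM releases (`vllm:e2e_request_latency_seconds`, `vllm:prompt_tokens_total`, `vllm:generation_tokens_total`, `vllm:num_requests_waiting`) and a `model_name` label. Both can vary between versions. The helper names are hypothetical, and these are not the skill's own queries. Error-rate queries are left out because the right metric depends on the deployment.

```python
from typing import Dict, Optional

# Assumed vLLM metric name; check your vLLM version's /metrics output.
LATENCY = "vllm:e2e_request_latency_seconds"


def latency_quantile(q: float, window: str = "5m", model: Optional[str] = None) -> str:
    """PromQL for the q-th quantile of end-to-end request latency, in seconds."""
    if not 0.0 < q < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    # Optional filter on the (assumed) model_name label.
    selector = f'{{model_name="{model}"}}' if model else ""
    return (
        f"histogram_quantile({q}, "
        f"sum by (le) (rate({LATENCY}_bucket{selector}[{window}])))"
    )


def vllm_queries(window: str = "5m") -> Dict[str, str]:
    """Map a short name for each metric to a PromQL expression."""
    queries = {f"p{round(q * 100)}_latency_s": latency_quantile(q, window) for q in (0.5, 0.95, 0.99)}
    queries.update(
        {
            # Completed requests per second, from the latency histogram's counter.
            "requests_per_s": f"sum(rate({LATENCY}_count[{window}]))",
            "input_tokens_per_s": f"sum(rate(vllm:prompt_tokens_total[{window}]))",
            "output_tokens_per_s": f"sum(rate(vllm:generation_tokens_total[{window}]))",
            # Requests currently waiting to be scheduled.
            "queue_depth": "sum(vllm:num_requests_waiting)",
        }
    )
    return queries
```

Each expression can be pasted into a Prometheus-compatible query UI to cross-check what the skill reports.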

Can I use this skill to deploy or modify AI models?

No, this is a query-driven, read-only analysis skill. To deploy models, you should use the /model-deploy skill.

How does it help with troubleshooting slow requests?

It integrates with Tempo for distributed tracing, allowing you to view span waterfalls and identify exactly which service or operation is causing delays.
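
A span waterfall reveals the culprit once you look at each span's self time: its duration minus the time covered by its child spans. The Python sketch below computes that for a list of spans and reports the slowest operation. The `Span` record is a simplified, hypothetical shape, not the OTLP-based format Tempo actually returns, so real trace data would need to be mapped into it first. The example trace is also made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    # Hypothetical, simplified span record (not Tempo's wire format).
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_ms: float
    end_ms: float


def self_times(spans: List[Span]) -> Dict[str, float]:
    """Return each span's exclusive time: its duration minus time spent in children."""
    children: Dict[str, List[Span]] = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)

    result: Dict[str, float] = {}
    for s in spans:
        # Clip child intervals to the parent's window and sort by start time.
        intervals = sorted(
            (max(c.start_ms, s.start_ms), min(c.end_ms, s.end_ms))
            for c in children.get(s.span_id, [])
        )
        covered = 0.0
        cur: Optional[Tuple[float, float]] = None
        for start, end in intervals:
            if end <= start:
                continue  # child lies entirely outside the parent window
            if cur is None or start > cur[1]:
                if cur is not None:
                    covered += cur[1] - cur[0]
                cur = (start, end)
            else:
                # Concurrent children overlap: merge so time is not double-counted.
                cur = (cur[0], max(cur[1], end))
        if cur is not None:
            covered += cur[1] - cur[0]
        result[s.span_id] = (s.end_ms - s.start_ms) - covered
    return result


def slowest_operation(spans: List[Span]) -> Tuple[str, str, float]:
    """Return (service, operation, self_time_ms) for the span with the most self time."""
    times = self_times(spans)
    worst = max(spans, key=lambda s: times[s.span_id])
    return worst.service, worst.operation, times[worst.span_id]


EXAMPLE_TRACE = [
    Span("a", None, "api-gateway", "POST /v1/completions", 0.0, 1000.0),
    Span("b", "a", "vllm-predictor", "generate", 50.0, 950.0),
    Span("c", "b", "vllm-predictor", "tokenize", 60.0, 80.0),
]

if __name__ == "__main__":
    print(slowest_operation(EXAMPLE_TRACE))  # ('vllm-predictor', 'generate', 880.0)
```

The gateway span lasts 1000 ms, but only 100 ms of that is its own work; self time attributes the bulk of the delay to `generate`, which is what a waterfall view makes visible.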

Does this skill work on any Kubernetes cluster?

Not out of the box. The skill is optimized specifically for Red Hat OpenShift AI and requires the AI Observability MCP server to be deployed on the cluster.

AI Observability for OpenShift

Name: AI Observability for OpenShift
Author: RHEcosystemAppEng

Analytics & Monitoring

Analyzes AI model inference performance, GPU utilization, and Red Hat OpenShift cluster health through query-driven diagnostics.

This skill provides a comprehensive monitoring suite for AI workloads running on Red Hat OpenShift AI. It enables developers to perform read-only analysis of vLLM performance metrics such as latency, throughput, and token rates, while also providing deep visibility into GPU inventory and power usage. By integrating tools like Tempo for distributed tracing and Korrel8r for cross-domain signal correlation, it allows users to diagnose slow inference requests and correlate errors across the entire infrastructure stack using natural language commands within Claude Code.

Key Features

01 Custom PromQL query execution for advanced cluster health metrics

02 Real-time vLLM performance analysis (latency, throughput, error rates)

03 Cluster-wide GPU inventory and utilization monitoring

04 Distributed tracing for inference requests via Tempo integration

05 Cross-domain signal correlation across logs, metrics, and traces
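
The custom PromQL and GPU monitoring features come down to queries against the cluster's Prometheus-compatible API. As a rough illustration outside the skill, the Python sketch below sends an instant query to the standard Prometheus HTTP endpoint `/api/v1/query` and flattens the result. Several details are assumptions about your cluster: the NVIDIA DCGM exporter metric names (`DCGM_FI_DEV_GPU_UTIL`, `DCGM_FI_DEV_POWER_USAGE`), the `Hostname` label, and the base URL (for example, a Thanos Querier route in `openshift-monitoring`). The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Example queries over DCGM exporter metrics, assumed to be scraped by the
# cluster's monitoring stack. Label names such as "Hostname" depend on the exporter setup.
GPU_UTIL_BY_NODE = "avg by (Hostname) (DCGM_FI_DEV_GPU_UTIL)"
GPU_POWER_BY_NODE = "sum by (Hostname) (DCGM_FI_DEV_POWER_USAGE)"


def instant_query(base_url: str, promql: str, token: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Run a PromQL instant query via the Prometheus HTTP API (GET /api/v1/query)."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def vector_by_label(payload: Dict[str, Any], label: str) -> Dict[str, float]:
    """Flatten an instant-vector response into {label value: sample value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return {r["metric"].get(label, ""): float(r["value"][1]) for r in data["result"]}
```

With a bearer token (for example, from `oc whoami -t`), `vector_by_label(instant_query(base_url, GPU_UTIL_BY_NODE, token), "Hostname")` would return average GPU utilization per node. Keeping the HTTP call and the parsing separate lets the parsing be checked against canned responses.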

Use Cases

01 Correlating inference errors with underlying OpenShift cluster health events

02 Identifying performance bottlenecks in model inference latency

03 Monitoring GPU resource availability and power usage across nodes
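
Korrel8r itself correlates signals by following rules between signal domains. The Python sketch below is not Korrel8r's algorithm but a much simpler stand-in that conveys the idea behind the first use case: pair each inference error with cluster events seen on the same node shortly before it. The `Signal` record, the 120-second default window, and node-level matching are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    # Hypothetical normalized record for an inference error or a cluster event.
    ts: float   # seconds since the Unix epoch
    node: str   # node the signal was observed on
    kind: str   # e.g. "InferenceError", "OOMKilled", "NodeNotReady"


def correlate(errors: List[Signal], events: List[Signal], window_s: float = 120.0) -> Dict[Signal, List[Signal]]:
    """For each error, list cluster events on the same node in the window_s seconds before it."""
    matches: Dict[Signal, List[Signal]] = {}
    for err in errors:
        matches[err] = [
            ev for ev in events
            if ev.node == err.node and 0.0 <= err.ts - ev.ts <= window_s
        ]
    return matches
```

A nested loop is fine for a handful of signals; for large event streams, sorting events per node and binary-searching on timestamps would scale better.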

How to Install

Install with 🐟 Skill.Fish

npx skillfish add rhecosystemappeng/agentic-collections ai-observability

For use in Claude.ai and ChatGPT
