Queries and analyzes AI model evaluation results stored in MLflow through natural language.
This skill lets Claude interact directly with MLflow tracking servers and is optimized for the NVIDIA NeMo Evaluator workflow. Developers can search for experiment runs by invocation ID, compare performance metrics across models, and drill down into specific artifacts such as configuration files, evaluation logs, and runtime statistics. By leveraging the MLflow MCP server, the skill turns raw evaluation data into actionable insights without leaving the terminal or coding environment, streamlining the post-evaluation analysis phase of the machine learning lifecycle.
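For orientation, the natural-language queries the skill handles map onto MLflow's Python search API. The sketch below shows the rough equivalent of "find the run with invocation ID a1b2c3d4"; the tracking URI, experiment name, tag key `invocation_id`, and the hex value are all assumptions for illustration, not part of the skill's documented interface.

```python
# Minimal sketch of searching runs by an invocation-ID tag, assuming a local
# MLflow tracking server and a tag named "invocation_id" (both hypothetical).
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumed tracking server

runs = mlflow.search_runs(
    experiment_names=["nemo-evaluator"],               # hypothetical experiment
    filter_string="tags.invocation_id = 'a1b2c3d4'",   # hypothetical hex ID
)
# search_runs returns a pandas DataFrame; inspect the matching runs.
print(runs[["run_id", "status", "start_time"]])
```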
Key Features
1. Search runs by unique hex invocation IDs and custom tags
2. Access detailed logs from clients, servers, and Slurm jobs for debugging
3. Compare model performance metrics across different experiment sets
4. Natural language querying of MLflow tracking servers via MCP
5. Retrieve and inspect artifacts including YAML configs and JSON metrics (see the sketch after this list)
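The artifact-retrieval feature above can be pictured with MLflow's client API. In this hedged sketch, the run ID and artifact names (`config.yaml`) are hypothetical placeholders; only `list_artifacts` and `download_artifacts` are real MLflow calls.

```python
# Sketch: list a run's artifacts, then pull down a single config file.
# The tracking URI, run ID, and artifact path are assumed for illustration.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://localhost:5000")  # assumed server

run_id = "0123456789abcdef"  # hypothetical run ID from a prior search
for artifact in client.list_artifacts(run_id):
    print(artifact.path, artifact.file_size)

local_path = mlflow.artifacts.download_artifacts(
    run_id=run_id,
    artifact_path="config.yaml",  # hypothetical artifact name
)
print(open(local_path).read())
```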
Use Cases
1. Analyzing benchmark results across multiple LLM model checkpoints (sketched below)
2. Fetching specific configuration files from historical successful runs for reproducibility
3. Investigating evaluation performance by inspecting runtime memory and latency stats
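As a rough illustration of the first use case, benchmark metrics for several checkpoints can be pulled into one table with a single search. The experiment name, the `model_checkpoint` tag, and the `accuracy` metric are assumptions for this sketch; real NeMo Evaluator runs will use their own keys.

```python
# Sketch: compare a benchmark metric across checkpoints in one DataFrame,
# assuming runs are tagged with a "model_checkpoint" tag (hypothetical).
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumed tracking server

df = mlflow.search_runs(
    experiment_names=["nemo-evaluator"],            # hypothetical experiment
    filter_string="tags.model_checkpoint != ''",    # hypothetical tag key
    order_by=["metrics.accuracy DESC"],             # hypothetical metric name
)
print(df[["tags.model_checkpoint", "metrics.accuracy"]])
```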