Automates the execution, monitoring, and debugging of large-scale LLM evaluations using the NeMo Evaluator framework.
This skill provides a comprehensive interface for managing AI model benchmarks through the nemo-evaluator-launcher CLI. It streamlines the entire evaluation lifecycle: launching multi-node Slurm jobs, tracking live progress, diagnosing execution failures, and extracting performance metrics, so that language models can be assessed reproducibly and at scale across diverse cluster infrastructure.
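To orient the workflow, the sketch below shows a minimal launch-and-monitor cycle from the command line. The `run` and `status` subcommands follow the launcher's Hydra-style CLI, but the config directory, config name, and invocation-ID placeholder are illustrative assumptions; verify the exact flags with `nemo-evaluator-launcher --help`.

```bash
# Minimal lifecycle sketch (config name and paths are hypothetical).

# Launch an evaluation described by a Hydra-style YAML config at
# ./configs/my_slurm_eval.yaml (hypothetical file).
nemo-evaluator-launcher run \
  --config-dir ./configs \
  --config-name my_slurm_eval

# Track live progress using the invocation ID printed at launch.
nemo-evaluator-launcher status <invocation_id>
```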
Key Features
1. Automated evaluation launching with custom YAML configuration support
2. Comprehensive debugging tools for failed Slurm jobs and cluster logs (see the log-retrieval sketch after this list)
3. Built-in support for Hugging Face cache management and Slurm job pairs
4. Intelligent artifact management and result analysis via remote sync
5. Real-time status monitoring and live progress tracking of evaluation runs
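As a concrete example of the debugging workflow in feature 2, the sketch below inspects a failed job and retrieves its logs and artifacts using only standard Slurm and SSH tooling; the job ID, hostname, and remote path are placeholders for your environment.

```bash
# Diagnose a failed Slurm job and pull its logs locally
# (job ID, hostname, and paths are placeholders).
JOB_ID=123456
CLUSTER=login.my-cluster.example.com
REMOTE_RUN_DIR=/scratch/evals/runs/$JOB_ID

# Show final state and exit codes for the job and its steps.
ssh "$CLUSTER" sacct -j "$JOB_ID" --format=JobID,State,ExitCode,Elapsed

# Sync the run directory (stdout/stderr, partial artifacts) for inspection.
rsync -avz "$CLUSTER:$REMOTE_RUN_DIR/" "./debug/$JOB_ID/"

# Scan for common failure signatures in the retrieved logs.
grep -rniE 'error|traceback|out of memory' "./debug/$JOB_ID" | head -n 40
```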
Use Cases
1. Troubleshooting failed evaluation runs via automated log analysis and SSH-based artifact retrieval
2. Benchmarking new LLMs on large-scale clusters using Slurm job scheduling
3. Automating the resume and status-check workflow for long-running AI model evaluations (see the polling sketch after this list)
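The status-check half of use case 3 can be scripted as a simple polling loop. The `status` subcommand is part of the launcher's CLI, but the strings matched against its output below are assumptions; adapt them to what your version actually prints.

```bash
# Poll a long-running evaluation until it leaves an active state.
# The matched status strings ('running', 'pending') are assumptions.
INVOCATION_ID=abc12345   # placeholder: printed by `run` at launch
while true; do
  STATUS="$(nemo-evaluator-launcher status "$INVOCATION_ID")"
  echo "$STATUS"
  echo "$STATUS" | grep -qiE 'running|pending' || break
  sleep 300   # check every five minutes
done
```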