HPC/AI Cluster Tool
byedwardsp
0Queries Kubernetes to retrieve GPU node information and cluster topologies for Azure HPC/AI clusters.
关于
This minimal Model Context Protocol (MCP) server is specifically engineered for Azure HPC/AI clusters, providing essential tools for Kubernetes introspection. It enables users to query cluster nodes for detailed GPU information, including their status and labels, as well as crucial InfiniBand-related topology data like agentpool, pkey, and torset. Built with `fastmcp` and utilizing synchronous `subprocess` calls for `kubectl`, it offers a straightforward approach to accessing vital cluster insights. A helper script is also included for local, in-process execution and testing of its functionalities.
主要功能
- List GPU nodes with their names, labels, and Ready/NotReady status.
- 0 GitHub stars
- Retrieve InfiniBand-related topology labels (agentpool, pkey, torset) for each node.
- Execute server tools locally for in-process testing and development without an MCP host.
使用案例
- Monitor GPU node status and health within Azure HPC/AI environments.
- Gather InfiniBand topology details for network configuration, optimization, or troubleshooting.
- Develop and test Kubernetes introspection tools for HPC/AI clusters.