Can I use this for real-time web APIs?

Yes, Modal supports creating web endpoints via FastAPI, ASGI, or WSGI with features like dynamic batching and sub-second cold starts for production-grade performance.

Do I need to write YAML or Dockerfiles to use this skill?

No, Modal is Python-native, meaning you define your environment, dependencies, and hardware requirements directly in your Python code.

Which GPUs are available through this Modal skill?

It supports a wide range of NVIDIA GPUs including T4, L4, A10G, L40S, A100 (40GB/80GB), H100, H200, and the latest Blackwell (B200) architecture.

What is Modal?

Modal is a serverless cloud platform that lets developers run code in the cloud with on-demand access to GPUs, without managing virtual machines or Kubernetes.

Does it support persistent storage?

Yes, the skill includes patterns for using Modal Volumes to persist model weights, datasets, and cache files across different function executions.

Modal Serverless GPU

Name: Modal Serverless GPU
Author: Orchestra-Research
GitHub stars: 3,983
Category: Cloud Infrastructure

Deploys and scales machine learning workloads on high-performance serverless GPUs using a Python-native framework.

This skill provides patterns for running ML research and production workloads on the Modal platform without infrastructure management or YAML configuration. Developers define GPU-accelerated functions, persistent storage volumes, and auto-scaling web endpoints directly in Python. It suits tasks ranging from rapid model prototyping and batch inference to deploying AI APIs with sub-second cold starts and pay-per-second pricing.

Key Features

01 3,983 GitHub stars

02 Fast deployment of ML models as REST APIs using FastAPI or ASGI

03 Automatic scaling from zero to hundreds of concurrent GPU instances

04 Python-native infrastructure definition without YAML or Dockerfiles

05 Built-in support for persistent Volumes and secure Secret management

06 On-demand access to a wide range of GPUs including H100, A100, and L40S

Use Cases

01 Running massive batch processing jobs for data augmentation or model inference

02 Deploying large language models (LLMs) as auto-scaling production APIs

03 Scheduling periodic ML training or fine-tuning tasks using serverless cron jobs


Install with 🐟 Skill.Fish

npx skillfish add orchestra-research/ai-research-skills modal

The skill can also be downloaded for use in Claude.ai and ChatGPT.