Implements Amazon SageMaker asynchronous inference patterns for long-running workloads and large payloads using S3-based I/O.
This skill provides comprehensive guidance and production-ready patterns for implementing SageMaker Asynchronous Inference, a critical architecture for workloads that exceed standard real-time limits. It covers the entire implementation lifecycle, including CDK infrastructure setup with scale-to-zero capabilities, TypeScript client implementation for S3-based polling, and robust Lambda integration. This is an essential tool for developers building generative AI applications, high-resolution image processing, or data-intensive models where processing times exceed 60 seconds or payloads are larger than 6MB.
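The S3-based polling pattern described above can be sketched as a small TypeScript helper. This is a minimal illustration, not the skill's actual implementation: `checkOutput` is a hypothetical callback standing in for an S3 `HeadObject`/`GetObject` check on the async endpoint's output location, and all option names are assumptions.

```typescript
// Resolves to the result body once the output object appears in S3,
// or null while it does not yet exist (hypothetical contract).
type CheckOutput = (outputS3Uri: string) => Promise<string | null>;

// Poll the S3 output location with exponential backoff until the result
// object appears or the overall deadline passes.
async function pollForResult(
  outputS3Uri: string,
  checkOutput: CheckOutput,
  opts: { initialDelayMs?: number; maxDelayMs?: number; timeoutMs?: number } = {},
): Promise<string> {
  const {
    initialDelayMs = 1_000,
    maxDelayMs = 30_000,
    timeoutMs = 15 * 60 * 1_000, // async inference runs can take up to ~15 minutes
  } = opts;
  const deadline = Date.now() + timeoutMs;
  let delay = initialDelayMs;
  while (Date.now() < deadline) {
    const result = await checkOutput(outputS3Uri);
    if (result !== null) return result;
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 2, maxDelayMs); // exponential backoff, capped
  }
  throw new Error(`Timed out waiting for ${outputS3Uri}`);
}
```

In a real client, `checkOutput` would wrap the AWS SDK's S3 calls against the `OutputLocation` returned by the async invocation; injecting it as a callback keeps the retry logic testable without AWS credentials.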
Key Features
- Asynchronous polling and SNS notification patterns for result retrieval
- Scale-to-zero auto-scaling configuration for maximum cost efficiency
- Robust error handling and exponential backoff retry logic
- S3-integrated I/O handling for payloads exceeding 6MB
- End-to-end CDK infrastructure templates for endpoint deployment
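The scale-to-zero configuration can be expressed with the AWS CLI as follows. This is an illustrative fragment, not the skill's CDK template: the endpoint name `my-async-endpoint`, variant `AllTraffic`, capacity limits, and target value are placeholders to adapt.

```shell
# Register the async endpoint's variant as a scalable target with
# minimum capacity 0 so idle periods cost nothing.
aws application-autoscaling register-scalable-target \
  --service-namespace sagemaker \
  --resource-id endpoint/my-async-endpoint/variant/AllTraffic \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --min-capacity 0 \
  --max-capacity 2

# Scale on queue depth rather than request rate: target-track the
# ApproximateBacklogSizePerInstance metric that async endpoints emit.
aws application-autoscaling put-scaling-policy \
  --service-namespace sagemaker \
  --resource-id endpoint/my-async-endpoint/variant/AllTraffic \
  --scalable-dimension sagemaker:variant:DesiredInstanceCount \
  --policy-name backlog-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 5.0,
    "CustomizedMetricSpecification": {
      "MetricName": "ApproximateBacklogSizePerInstance",
      "Namespace": "AWS/SageMaker",
      "Dimensions": [{"Name": "EndpointName", "Value": "my-async-endpoint"}],
      "Statistic": "Average"
    }
  }'
```

Tracking backlog size per instance is what allows the endpoint to wake from zero instances when requests queue up and drain back down when the queue empties.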
Use Cases
- Processing high-resolution media or large documents that exceed standard API timeout limits
- Executing long-running generative AI tasks with response times between 1 and 15 minutes
- Building cost-optimized ML pipelines that scale down to zero instances during idle periods