About
This skill provides patterns for multi-GPU parallel processing in high-performance computing (HPC) environments that use CuPy and Jupyter notebooks. It addresses three recurring problems: GPU thread explosion, crashes caused by nested executors, and Jupyter's suppression of output printed from worker threads. It does so by enforcing a strict 'one task per GPU' rule through dynamic, queue-based allocation. Automated memory cleanup and progress reporting from the main thread keep complex data pipelines stable, avoiding cumulative memory pressure and silent kernel failures.
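The 'one task per GPU' rule described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the names `run_on_gpus`, `work_fn`, and `gpu_ids` are hypothetical, and the sketch assumes that each task is a single callable that does not start its own executor. A queue of free GPU IDs acts as the allocator, the thread pool is capped at one worker per GPU, and all progress lines are printed from the main thread. CuPy is treated as optional so the scheduling logic can be tried on a machine without a GPU.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed

try:
    import cupy as cp  # optional: without a usable CuPy the sketch runs CPU-only
    cp.cuda.runtime.getDeviceCount()
except Exception:
    cp = None


def run_on_gpus(tasks, work_fn, gpu_ids):
    """Run work_fn(task, gpu_id) for every task, at most one task per GPU at a time.

    Hypothetical helper for illustration. Returns results in task order.
    """
    tasks = list(tasks)
    free_gpus = queue.Queue()
    for gpu_id in gpu_ids:
        free_gpus.put(gpu_id)

    def worker(task):
        gpu_id = free_gpus.get()  # blocks until some GPU is free
        try:
            if cp is None:
                return gpu_id, work_fn(task, gpu_id)
            with cp.cuda.Device(gpu_id):  # current device is per-thread in CuPy
                try:
                    return gpu_id, work_fn(task, gpu_id)
                finally:
                    # Return cached blocks so memory does not pile up across tasks.
                    cp.get_default_memory_pool().free_all_blocks()
        finally:
            free_gpus.put(gpu_id)  # hand the GPU back even if the task failed

    results = [None] * len(tasks)
    # One flat pool, sized to the GPU count; work_fn should not nest another pool.
    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        futures = {pool.submit(worker, t): i for i, t in enumerate(tasks)}
        for done, fut in enumerate(as_completed(futures), start=1):
            index = futures[fut]
            gpu_id, value = fut.result()
            results[index] = value
            # Printed from the main thread, so the notebook shows it.
            print(f"[{done}/{len(tasks)}] task {index} finished on GPU {gpu_id}")
    return results
```

A call might look like `run_on_gpus(chunks, process_chunk, gpu_ids=[0, 1, 2, 3])`, where `process_chunk(chunk, gpu_id)` is whatever per-task function the pipeline needs. Returning the GPU ID to the queue in a `finally` block is a deliberate choice here: a failed task would otherwise leave its GPU permanently marked as busy and stall the remaining work.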