About
This skill integrates the HuggingFace Accelerate library into your workflow, letting you scale PyTorch training scripts from a single CPU or GPU to multi-GPU, TPU, and multi-node clusters with only four changed lines of code. It abstracts away device placement, mixed-precision training (FP16, BF16, FP8), and distributed strategies such as DeepSpeed ZeRO and FSDP behind a consistent interface, and its interactive configuration tools eliminate manual launcher setup and low-level boilerplate.
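As a sketch of that four-line pattern, the snippet below wraps a toy training loop with Accelerate. The `Accelerator()` constructor, `prepare()`, and `accelerator.backward()` calls are the library's documented entry points; the tiny linear model, optimizer, and random dataset are placeholders invented purely for illustration.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # line 1-2: import and create the Accelerator

# Placeholder model, optimizer, and data for illustration only
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(
        torch.randn(64, 10), torch.randint(0, 2, (64,))
    ),
    batch_size=8,
)
loss_fn = torch.nn.CrossEntropyLoss()

# line 3: let Accelerate handle device placement and wrapping
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    accelerator.backward(loss)  # line 4: replaces loss.backward()
    optimizer.step()
```

On multiple devices, the same script would typically be configured once with `accelerate config` and then started with `accelerate launch` (e.g. `accelerate launch train.py`, where `train.py` stands in for your script) rather than plain `python`.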