About
This skill provides specialized guidance for the end-to-end LLM training lifecycle, ranging from simple LoRA finetuning to massive-scale distributed training. It helps developers navigate complex frameworks like DeepSpeed, Accelerate, and TRL while implementing critical memory-saving optimizations such as QLoRA, Flash Attention, and gradient checkpointing. Whether you are performing instruction tuning with SFT or aligning models via DPO, this skill helps you select the right architecture and configuration for your specific hardware and model size.
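
To illustrate the kind of setup this covers, below is a minimal sketch of a QLoRA-style SFT run with TRL, combining 4-bit quantization, a LoRA adapter, Flash Attention, and gradient checkpointing. It assumes recent versions of transformers, peft, trl, bitsandbytes, and datasets; the model name, dataset, and hyperparameters are illustrative placeholders, not prescribed defaults.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization (QLoRA-style) to shrink the base model's memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Placeholder base model; swap in whatever checkpoint fits your hardware.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
    torch_dtype=torch.bfloat16,
)

# LoRA adapter: only these low-rank matrices are trained; the 4-bit base stays frozen.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Example instruction-tuning dataset used in TRL docs; replace with your own data.
dataset = load_dataset("trl-lib/Capybara", split="train")

# Gradient checkpointing trades recomputation for activation memory.
training_args = SFTConfig(
    output_dir="qlora-sft-out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

For larger models or multi-GPU runs, the same trainer can be launched under Accelerate or DeepSpeed (e.g. with a ZeRO config) without changing the training code itself; the skill's guidance focuses on choosing among these options for a given hardware budget.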