Guides the implementation of production-grade server management, process monitoring, and scaling strategies for high-availability environments.
This skill equips Claude with advanced operational logic for managing production environments, moving beyond simple command execution to strategic infrastructure decision-making. It provides frameworks for process management using tools like PM2 and systemd, structured logging, and observability. By focusing on core principles like zero-downtime reloads, security auditing, and intelligent scaling, this skill helps developers build and maintain stable server architectures that prioritize auto-recovery and resource efficiency. It acts as a Site Reliability Engineering (SRE) companion that teaches best practices rather than just memorizing shell commands.