Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.
Rating: 7.0
Installs: 0
Category: Machine Learning
Excellent skill for LLM pruning with comprehensive coverage of modern techniques (Wanda, SparseGPT, N:M sparsity). The description clearly explains when to use pruning and what outcomes to expect. Task knowledge is strong, with complete, runnable code examples for all major methods, calibration strategies, and production pipelines. The structure is logical, with quick start, core concepts, strategies, and best practices sections. The skill is moderately novel: pruning concepts are well established, but implementing one-shot methods like Wanda and SparseGPT with proper calibration data handling and hardware-aware N:M patterns would take a CLI agent significant effort and domain knowledge. Minor improvement opportunities: more explicit error handling, and integrating the content of the wanda.md reference file.
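To make the N:M and Wanda terminology above concrete, here is a toy sketch in plain Python. It is not taken from the skill's SKILL.md; the function names `column_norms` and `prune_n_m`, and the parameters `keep` and `group`, are hypothetical and chosen only for illustration. It assumes the commonly described Wanda score, |weight| times the L2 norm of the matching input feature over calibration samples, and falls back to plain magnitude pruning when no norms are supplied.

```python
import math

def column_norms(calib):
    """L2 norm of each input feature across the calibration rows."""
    n_features = len(calib[0])
    return [math.sqrt(sum(row[j] ** 2 for row in calib)) for j in range(n_features)]

def prune_n_m(weights, feature_norms=None, keep=2, group=4):
    """Zero all but the `keep` highest-scoring weights in every run of
    `group` consecutive weights per row (keep=2, group=4 gives 2:4).
    Score is |w| when feature_norms is None, else |w| * norm (Wanda-style)."""
    pruned = []
    for row in weights:
        if len(row) % group != 0:
            raise ValueError("row length must be a multiple of group")
        norms = feature_norms if feature_norms is not None else [1.0] * len(row)
        new_row = list(row)
        for start in range(0, len(row), group):
            idx = range(start, start + group)
            # Rank positions in this group from highest to lowest score.
            ranked = sorted(idx, key=lambda j: abs(row[j]) * norms[j], reverse=True)
            for j in ranked[keep:]:
                new_row[j] = 0.0
        pruned.append(new_row)
    return pruned

# One output row with 8 input features (illustrative values only).
W = [[0.9, -0.1, 0.4, 0.05, -0.3, 0.8, 0.2, -0.7]]
# Two calibration samples; features 1 and 3 carry large activations.
X = [[0.1, 3.0, 0.2, 4.0, 1.0, 1.0, 1.0, 1.0],
     [0.1, 4.0, 0.1, 3.0, 1.0, 1.0, 1.0, 1.0]]

magnitude_pruned = prune_n_m(W)
wanda_pruned = prune_n_m(W, column_norms(X))
print(magnitude_pruned)
print(wanda_pruned)
```

In the first group of four, magnitude pruning keeps the two largest weights, while the activation-aware score keeps the two small weights that sit on high-activation features. A real pipeline would collect the norms from layer inputs on calibration text and would need a sparse kernel on supporting hardware to turn the 2:4 pattern into an actual speedup.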