Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.
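The skill's own code is not shown on this page, but the simplest technique it lists, one-shot magnitude pruning at 50% sparsity, can be sketched as follows. This is an illustrative NumPy sketch, not the skill's actual implementation; the function name and threshold logic are assumptions.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that roughly
    `sparsity` fraction of entries become zero (one-shot, no retraining)."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)          # number of weights to drop
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold     # keep only weights above threshold
    return weights * mask
```

In practice the Wanda and SparseGPT methods the description mentions replace this pure-magnitude criterion with activation-aware scores computed from a small calibration set, which is what preserves accuracy at 50% sparsity.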
Rating: 8.1
Installs: 0
Category: Machine Learning
Excellent skill covering model pruning techniques with clear, actionable guidance. The description accurately reflects comprehensive capabilities including Wanda, SparseGPT, and N:M sparsity methods. Task knowledge is strong with working code examples, complete implementation patterns, and production-ready pipelines. Structure is well-organized with logical flow from quick start to advanced strategies, though some sections are verbose. Novelty is moderate-to-good: while pruning concepts are established, implementing one-shot methods like Wanda and SparseGPT with proper calibration is non-trivial for a CLI agent and would require significant token expenditure to derive independently. The skill meaningfully reduces complexity by packaging activation-aware pruning logic, layer-wise strategies, and hardware-optimized N:M patterns. Performance benchmarks and best practices add substantial practical value. Minor improvements could include more concise core sections and clearer delineation between beginner and advanced content.
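The review highlights activation-aware pruning with calibration as the non-trivial part. A minimal sketch of the Wanda scoring idea, weight magnitude times the L2 norm of the corresponding input activation channel, pruned per output row, might look like the following. The function name, shapes, and NumPy implementation are illustrative assumptions; the actual skill presumably operates on framework tensors layer by layer.

```python
import numpy as np

def wanda_prune(W: np.ndarray, X: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Wanda-style one-shot pruning sketch.

    W: (out_features, in_features) weight matrix of a linear layer.
    X: (n_samples, in_features) calibration activations feeding that layer.
    Scores each weight as |W_ij| * ||X_j||_2 and zeros the lowest-scoring
    weights independently within each output row.
    """
    act_norm = np.linalg.norm(X, axis=0)           # per-input-channel L2 norm
    scores = np.abs(W) * act_norm                  # (out, in) importance scores
    k = int(W.shape[1] * sparsity)                 # weights to drop per row
    pruned = W.copy()
    if k > 0:
        idx = np.argsort(scores, axis=1)[:, :k]    # lowest-score columns per row
        np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

The per-row comparison group is the detail that distinguishes this from global magnitude pruning: a small weight on a high-norm activation channel can outrank a larger weight on a quiet channel, which is why a calibration pass is required before pruning.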