GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware or Apple Silicon, or when you need flexible 2–8 bit quantization without GPU requirements.
Rating: 7.6
Installs: 0
Category: AI & LLM
Excellent skill documentation for GGUF quantization. The description clearly explains when to use this skill versus alternatives, making it easy for a CLI agent to decide when to invoke it. Task knowledge is comprehensive with complete workflows, code examples, and command sequences for conversion, quantization, and deployment across different hardware platforms. Structure is very clear with logical sections, tables for quick reference, and appropriate use of reference files for advanced topics. The skill provides significant value by consolidating complex llama.cpp workflows that would otherwise require extensive token usage and trial-and-error. Minor room for improvement in making the decision criteria even more explicit for agent invocation.
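The conversion-and-quantization workflow the review refers to can be sketched as below. This is a minimal illustration, not the skill's own content: the model directory, output filenames, and prompt are placeholders, and the scripts assume a built llama.cpp checkout (`convert_hf_to_gguf.py`, `llama-quantize`, `llama-cli`).

```shell
# Sketch of a typical llama.cpp quantization pipeline; paths are placeholders.

# 1. Convert a Hugging Face model directory to GGUF at 16-bit precision.
python convert_hf_to_gguf.py ./my-model \
    --outfile my-model-f16.gguf --outtype f16

# 2. Quantize the f16 GGUF down to 4-bit (Q4_K_M trades little quality for ~4x smaller size).
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M

# 3. Run inference on CPU, or Metal automatically on Apple Silicon.
./llama-cli -m my-model-Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 64
```

Other quantization presets (e.g. Q2_K through Q8_0) follow the same pattern, differing only in the type argument to `llama-quantize`.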