Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
Rating: 8.1
Installs: 0
Category: AI & LLM
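As a minimal sketch of the kind of inference this skill sets up, here is a hedged example using the llama-cpp-python bindings; the model path and prompt are illustrative and assume a GGUF file has already been downloaded:

```python
# Minimal CPU/GPU inference with llama-cpp-python (pip install llama-cpp-python).
# The model path below is hypothetical; any GGUF-quantized model works.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # illustrative local file
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers if the wheel was built with Metal/CUDA/ROCm
)

out = llm("Q: What is GGUF quantization? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```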
Excellent skill for CPU and non-NVIDIA LLM inference. The description clearly states when to use llama.cpp versus alternatives (TensorRT-LLM, vLLM), making it easy for a CLI agent to decide whether to apply it. Task knowledge is comprehensive, with concrete installation, quantization, and hardware-acceleration commands. Structure is good, with a well-organized main file and logical separation of reference material. Novelty is solid: while a CLI agent could eventually work out llama.cpp setup on its own, this skill consolidates platform-specific commands (Metal/CUDA/ROCm), quantization trade-offs, and performance benchmarks that would otherwise take significant exploration and token usage to discover. A minor deduction on novelty: basic llama.cpp usage is relatively straightforward once discovered.
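To illustrate the platform-specific offload decision the review credits to the skill, a hedged sketch follows. The backend install commands in the comments reflect llama-cpp-python's documented CMAKE_ARGS build flags, but exact flag names vary by version; the helper function and paths are hypothetical:

```python
# Sketch: pick a GPU offload setting per platform, mirroring the skill's
# Metal/CUDA/ROCm guidance. Assumes llama-cpp-python was installed with a
# matching backend, e.g. (flags are version-dependent):
#   CMAKE_ARGS="-DGGML_METAL=on"   pip install llama-cpp-python  # Apple Silicon
#   CMAKE_ARGS="-DGGML_CUDA=on"    pip install llama-cpp-python  # NVIDIA
#   CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python  # AMD ROCm
import platform
from llama_cpp import Llama

def load_model(path: str) -> Llama:
    # On Apple Silicon, offloading all layers to Metal is nearly always a win;
    # elsewhere, offload only if an accelerated wheel is actually installed.
    on_apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"
    return Llama(model_path=path, n_ctx=2048,
                 n_gpu_layers=-1 if on_apple_silicon else 0)

llm = load_model("./models/mistral-7b-instruct.Q5_K_M.gguf")  # hypothetical path
```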