Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use it for JSON/regex outputs, constrained decoding, and agentic workflows with tool calls, or when you need inference that is up to 5× faster than vLLM on workloads with shared prefixes. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
Rating: 8.7 · Installs: 0 · Category: AI & LLM
Excellent skill documentation for SGLang, with comprehensive coverage of structured generation, RadixAttention prefix caching, and agentic workflows. The description clearly communicates when to use SGLang versus alternatives (vLLM, TensorRT-LLM), making it easy for a CLI agent to invoke appropriately. Task knowledge is outstanding, with complete code examples for JSON/regex outputs, function calling, multi-turn conversations, and deployment patterns. The structure is very clean, moving logically from quick start to advanced features, with well-organized reference files for deep dives. The skill also addresses a genuinely novel and complex use case: structured generation with automatic prefix caching provides 5-10× speedups that would be extremely difficult for a CLI agent to replicate manually. Minor room for improvement: the error-handling and troubleshooting scenarios could be expanded slightly, but overall this is production-ready documentation for a high-value inference optimization skill.
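
As a quick illustration of the structured-output pattern the description refers to, here is a minimal sketch using SGLang's frontend language. It assumes a locally launched SGLang server on port 30000; the model path, the `city_info` function, and the JSON regex are hypothetical examples rather than part of this skill.

```python
# Minimal sketch of regex-constrained JSON generation with SGLang.
# Assumes a server launched with (hypothetical model path):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
import sglang as sgl


@sgl.function
def city_info(s, city):
    s += f"Describe {city} as JSON.\n"
    # Constrain decoding to a fixed JSON shape via regex (hypothetical schema).
    s += sgl.gen(
        "answer",
        max_tokens=128,
        regex=r'\{"name": "[\w\s]+", "country": "[\w\s]+", "population": [0-9]+\}',
    )


sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

# Repeated calls that share a prompt prefix are cached automatically by
# RadixAttention, which is where the prefix-sharing speedups come from.
state = city_info.run(city="Paris")
print(state["answer"])
```

Because the shared prompt prefix is cached server-side, a batch of such calls (for example, looping over many cities) pays the prefill cost for the common prefix only once.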