tensorrt-llm

by davila7
75 Favorites · 345 Upvotes · 0 Downvotes

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
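To make the description concrete, here is a minimal offline-inference sketch using TensorRT-LLM's high-level LLM API; the model name, prompt, and sampling parameters are placeholders chosen for illustration, not values taken from this skill:

```python
# Minimal offline-inference sketch with TensorRT-LLM's high-level LLM API.
# Assumes `pip install tensorrt-llm` on a supported NVIDIA GPU; the model
# name below is a placeholder, not something this skill prescribes.
from tensorrt_llm import LLM, SamplingParams

def main():
    # Loading a Hugging Face model triggers the TensorRT engine build/load.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(max_tokens=64, temperature=0.8)
    for out in llm.generate(["Explain in-flight batching in one sentence."], params):
        print(out.outputs[0].text)

if __name__ == "__main__":
    main()
```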

Tags: inference optimization

Rating: 8.1
Installs: 0
Category: AI & LLM

Quick Review

An excellent skill for TensorRT-LLM optimization, with a clear description and comprehensive task knowledge covering installation, inference patterns, and serving examples. Well structured, with a concise main file and referenced guides for advanced topics. The description accurately captures when to use this skill versus alternatives (vLLM, llama.cpp), and the key features (quantization, multi-GPU, batching) are covered with working code examples. The novelty score is moderate: although TensorRT-LLM setup requires expertise, many deployment scenarios can be handled by simpler tools or direct CLI usage, so the skill adds value primarily for complex multi-GPU and quantization workflows that would otherwise require extensive documentation review.
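For the multi-GPU and quantization workflows the review singles out, a hedged sketch of how they are typically expressed through the same LLM API follows; treat the exact import path of the quantization config classes and the model name as assumptions about recent TensorRT-LLM releases, not as this skill's code:

```python
# Sketch of FP8 quantization plus 2-way tensor parallelism via the LLM API.
# QuantConfig/QuantAlgo are assumed to live under tensorrt_llm.llmapi in
# recent releases; the model name is a placeholder.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",           # placeholder model
    tensor_parallel_size=2,                              # shard weights across 2 GPUs
    quant_config=QuantConfig(quant_algo=QuantAlgo.FP8),  # FP8 quantization
)
for out in llm.generate(["Hello"], SamplingParams(max_tokens=16)):
    print(out.outputs[0].text)
```

For the serving side, recent TensorRT-LLM releases also ship a `trtllm-serve` entry point that exposes an OpenAI-compatible HTTP endpoint, which is the usual path from a script like the above to production serving.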

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 9
Novelty: 6

GitHub Signals

18,073 · 1,635 · 132 · 71
Last commit: today

Publisher

davila7 (Skill Author)



Related Skills

rag-architect (Jeffallan) · 7.0
prompt-engineer (Jeffallan) · 7.0
fine-tuning-expert (Jeffallan) · 6.4
mcp-developer (Jeffallan) · 6.4