llava

by davila7

145 Favorites · 296 Upvotes · 0 Downvotes

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

Tag: multimodal

Rating: 8.1
Installs: 0
Category: AI & LLM
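
The description above covers model loading and image-grounded chat. As a rough illustration of that workflow (not taken from the skill's SKILL.md), here is a minimal single-turn visual question answering sketch, assuming the Hugging Face transformers LLaVA integration, the llava-hf/llava-1.5-7b-hf checkpoint, and a CUDA GPU:

# Minimal sketch only; the skill's own examples may use a different setup.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not specified by this page
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Vicuna-style prompt format used by llava-1.5: the <image> placeholder is
# expanded into CLIP image features by the processor/model.
image = Image.open("photo.jpg")
prompt = "USER: <image>\nWhat is happening in this picture? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))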

Quick Review

Excellent skill documentation for the LLaVA vision-language model. The description clearly explains the capabilities (visual instruction tuning, image conversations, VQA), making it easy for a CLI agent to understand when to invoke it. Task knowledge is comprehensive, with complete code examples for loading models, single- and multi-turn conversations, and common use cases. The structure is logical, with clear sections and a helpful comparison table of alternatives. The skill provides meaningful value by packaging complex vision-language inference (multi-step setup, conversation management, quantization) that would otherwise cost an agent significant tokens to implement from scratch. Minor deductions: novelty is moderate, since some simpler vision tasks could be handled by lighter tools, and the structure could benefit from extracting some code patterns into separate files for very large implementations.
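
The review's point about quantization and conversation management can be made concrete with a hedged sketch, again assuming the llava-hf/llava-1.5-7b-hf checkpoint rather than anything the skill itself specifies: 4-bit loading via bitsandbytes, and a second turn built by replaying the earlier exchange in the prompt.

# Hedged sketch only; the skill's actual code patterns may differ.
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

image = Image.open("photo.jpg")

# Multi-turn chat is prompt accumulation: earlier turns are replayed verbatim
# and the <image> token appears once, in the first user turn.
first_turn = (
    "USER: <image>\nDescribe the scene. "
    "ASSISTANT: A dog catching a frisbee in a park."  # illustrative first answer
)
prompt = first_turn + " USER: What breed might the dog be? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))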

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 8
Novelty: 7

GitHub Signals

18,073 · 1,635 · 132 · 71
Last commit: 0 days ago

Publisher

davila7 (Skill Author)

Related Skills

rag-architect · Jeffallan · 7.0
prompt-engineer · Jeffallan · 7.0
fine-tuning-expert · Jeffallan · 6.4
mcp-developer · Jeffallan · 6.4