TacoSkill LAB
© 2026 TacoSkill LAB

nemo-curator

7.6

by zechenzhangAGI

51 Favorites
166 Upvotes
0 Downvotes

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

data curation

Rating: 7.6
Installs: 0
Category: AI & LLM

Quick Review

Excellent skill documentation for GPU-accelerated LLM data curation. The description is comprehensive and actionable for CLI agents: it clearly specifies the use cases (web scrapes, deduplication, multi-modal datasets) and when to use alternatives. Task knowledge is outstanding, with complete code examples for all major operations (quality filtering, deduplication variants, PII redaction, multi-modal processing), performance benchmarks, and cost analysis. The structure is well organized, with clear sections, a quick start, common patterns, and references to external guides. The skill addresses a genuinely complex domain (GPU-accelerated data curation at TB scale) that a CLI agent could replicate only with significant token usage and expertise, which makes it highly novel. The 16× speedup claim, concrete benchmarks, and production use cases (Nemotron-4) establish strong value. One minor improvement would be a more explicit function/module index, given the breadth of capabilities, though the current logical flow is still very clear.

LLM Signals

Description coverage: 10
Task knowledge: 10
Structure: 9
Novelty: 9

GitHub Signals

891
74
19
2
Last commit 0 days ago

Publisher

zechenzhangAGI

Skill Author

Related Skills

rag-architect (by Jeffallan, 7.0)
prompt-engineer (by Jeffallan, 7.0)
fine-tuning-expert (by Jeffallan, 6.4)
mcp-developer (by Jeffallan, 6.4)