Benchmark of ~10,000 real-world JSON schemas for evaluating LLM structured output generation across efficiency, coverage, and quality dimensions. Integrates with Hugging Face datasets; supports custom engine integration and leaderboard submission.
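To make "coverage" and "custom engine integration" concrete, here is a minimal sketch of how an engine could be scored against a set of schemas. It is an illustration under stated assumptions, not the benchmark's actual harness or API. The `Engine` signature, the helpers `conforms` and `coverage`, the metric names `supported_rate` and `compliance_rate`, and `toy_engine` are all hypothetical. `conforms` checks only the top-level `type` and `required` keywords; a real evaluation would use a full JSON Schema validator.

```python
import json
from typing import Callable, Iterable, Optional

# Hypothetical engine interface (an assumption, not the benchmark's real API):
# given a JSON schema, return generated text, or None if the engine rejects
# the schema (for example, because it uses an unsupported keyword).
Engine = Callable[[dict], Optional[str]]


def conforms(instance, schema: dict) -> bool:
    """Toy conformance check covering only top-level "type" and "required".

    A real harness would use a complete JSON Schema validator instead.
    """
    type_map = {"object": dict, "array": list, "string": str,
                "boolean": bool, "null": type(None)}
    expected = schema.get("type")
    if expected in type_map and not isinstance(instance, type_map[expected]):
        return False
    if expected == "object":
        # Every required property must be present in the generated object.
        return all(key in instance for key in schema.get("required", []))
    return True


def coverage(engine: Engine, schemas: Iterable[dict]) -> dict:
    """Fraction of schemas the engine accepts, and fraction it answers correctly."""
    schemas = list(schemas)
    supported = valid = 0
    for schema in schemas:
        output = engine(schema)
        if output is None:
            continue  # the engine declined this schema
        supported += 1
        try:
            if conforms(json.loads(output), schema):
                valid += 1
        except json.JSONDecodeError:
            pass  # malformed JSON counts as a failure
    total = len(schemas) or 1  # avoid division by zero on an empty set
    return {"supported_rate": supported / total,
            "compliance_rate": valid / total}


def toy_engine(schema: dict) -> Optional[str]:
    """Hypothetical stand-in engine: rejects any schema using "pattern"."""
    if "pattern" in json.dumps(schema):
        return None
    return '{"name": "x"}'
```

Separating "the engine accepted the schema" from "the output actually conforms" reflects the idea that an engine can claim support for a schema and still produce invalid output. Efficiency and quality would need separate measurements, such as generation latency and downstream task accuracy.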
This page was last edited on 2026-03-03.