HalluHard

Hard multi-turn hallucination benchmark for evaluating language models across domains.

A multi-turn hallucination benchmark that evaluates LLMs across diverse domains and is designed to maximize the difficulty of eliciting hallucinations. It is installed via Pixi and ships scripts for generating model responses, judging claims via web scraping (or a coding_direct mode), and creating reports; multiple models and CLI-based configuration are supported.
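The three-step workflow (generation, judgment, reporting) might look like the following shell session. This is a hedged sketch: the Pixi task names (`generate`, `judge`, `report`) and all flags are assumptions for illustration, not the repository's documented commands.

```shell
# Install the environment declared in the project's pixi.toml
pixi install

# 1. Generate multi-turn responses for a model (task name and flags hypothetical)
pixi run generate --model some-model --output responses/

# 2. Judge each extracted claim, via web scraping or the coding_direct mode
pixi run judge --input responses/ --mode web_scraping   # or: --mode coding_direct

# 3. Aggregate the judgments into a report
pixi run report --input judgments/ --out report.html
```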

Tags: AI Safety · Benchmark · Large Language Model

Key facts
  • Maturity: 2026 · Proposal
  • Support (C4DT): Inactive
  • Support (Lab): Active · Technical

Machine Learning and Optimization Laboratory
Prof. Martin Jaggi

The Machine Learning and Optimization Laboratory works on machine learning, optimization algorithms, and text understanding, as well as several application domains.

This page was last edited on 2026-03-03.