Name:
CRoW
Description:
Benchmarking Commonsense Reasoning in Real-World Tasks
Professor — Lab:
Antoine Bosselut — Natural Language Processing Lab

Home page:
CRoW
Layman description:
CRoW is a benchmark that tests how well computer models can apply common sense across six different language-related tasks. The benchmark is built by taking examples from existing datasets and rewriting them in ways that violate common sense, then checking whether models can tell the valid versions from the corrupted ones. The results show that these models remain far from human performance at using common sense in real-world tasks.
Technical description:
CRoW is a manually-curated, multi-task benchmark that evaluates the ability of models to apply commonsense reasoning in the context of six real-world NLP tasks. It is constructed using a multi-stage data collection pipeline that rewrites examples from existing datasets using commonsense-violating perturbations. The study reveals a significant performance gap when NLP systems are evaluated on CRoW compared to humans, indicating that commonsense reasoning is far from being solved in real-world task settings.
Project status:
active — entered showcase: 2024-02-20 — entry updated: 2024-02-20

Source code:
Personal GitHub - last commit: 2023-12-14
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Toolset
Programming language:
Python