Name:
distilled counterfactual data
Description:
Automated generation of high-quality counterfactual data.
Professor — Lab:
Antoine Bosselut — Natural Language Processing Lab

Layman description:
DISCO is a system that automatically creates alternative versions of training examples, which helps machines learn more reliably. It uses a language model, similar to how autocorrect works, to propose these alternatives. When tested, machines trained with these alternatives performed better, especially at language inference, that is, judging whether one statement follows from another.
Technical description:
DISCO (DIStilled COunterfactual Data) is a method for automatically generating high-quality counterfactual data at scale. It uses a large general language model to generate phrasal perturbations, which are then filtered by a task-specific teacher model to distill high-quality counterfactual data. The method has been applied to natural language inference tasks, demonstrating improved robustness and generalization across distributions.
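The generate-then-filter idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's code: the names `NLIExample`, `distill_counterfactuals`, `toy_generate`, and `toy_teacher` are hypothetical, the choice to perturb the premise is an assumption, and the generator and teacher are trivial stand-ins for a large language model and a trained task-specific classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class NLIExample:
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


# Hypothetical interfaces (assumptions, not the project's API):
# a generator proposes perturbed premises for a target label,
# a teacher returns (predicted label, confidence) for a sentence pair.
Generator = Callable[[NLIExample, str], Iterable[str]]
Teacher = Callable[[str, str], Tuple[str, float]]


def distill_counterfactuals(
    example: NLIExample,
    target_label: str,
    generate: Generator,
    teacher: Teacher,
    min_confidence: float = 0.5,
) -> List[NLIExample]:
    """Keep only the perturbations that the teacher assigns to target_label."""
    kept = []
    for premise in generate(example, target_label):
        if premise == example.premise:
            continue  # an unchanged premise is not a counterfactual
        predicted, confidence = teacher(premise, example.hypothesis)
        if predicted == target_label and confidence >= min_confidence:
            kept.append(NLIExample(premise, example.hypothesis, target_label))
    return kept


# Toy stand-ins so the sketch runs without any model.
def toy_generate(example: NLIExample, target_label: str) -> List[str]:
    return [
        example.premise,
        example.premise.replace("is sleeping", "is running"),
        example.premise.replace("A man", "A person"),
    ]


def toy_teacher(premise: str, hypothesis: str) -> Tuple[str, float]:
    if "running" in premise:
        return ("contradiction", 0.9)
    return ("entailment", 0.8)


seed = NLIExample("A man is sleeping on a bench.", "A man is resting.", "entailment")
result = distill_counterfactuals(seed, "contradiction", toy_generate, toy_teacher)
print(result)
```

In this toy run, the unchanged premise is skipped, the "is running" variant is kept because the stand-in teacher labels it a contradiction, and the "A person" variant is discarded because the teacher still predicts entailment. The confidence threshold `min_confidence` is an illustrative parameter rather than a setting taken from the project.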
Project status:
inactive — entered showcase: 2024-02-20 — entry updated: 2024-02-20

Source code:
Personal GitHub - last commit: 2023-07-27
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Framework
Programming language:
Python
License:
MIT