Name:
LAMEN
Description:
Evaluating language models through negotiation tasks
Professor — Lab:
Robert West — Data Science Lab

Home page:
LAMEN
Layman description:
This project introduces a new way to test the decision-making abilities of AI language models by having them negotiate with each other. By designing various negotiation scenarios, such as dividing pizza slices or deciding on the amount of cheese, the researchers can evaluate how well the models reach agreements, maximize their own benefit, and cooperate when necessary. The study also examines how faithfully the models follow their own reasoning and instructions, providing insights into their reliability and alignment with human values.
Technical description:
The project proposes structured negotiations as a dynamic benchmark for evaluating language model (LM) agents. The negotiation framework consists of a game setting, issues to negotiate, and optional preference weights; complex games can be designed by increasing the number of issues, mixing issue types, and adding non-uniform preferences. The benchmark jointly evaluates performance metrics (utility and completion rate) and alignment metrics (faithfulness and instruction-following) in both self-play and cross-play settings.
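As a rough illustration of the framework described above, the sketch below models a two-issue negotiation game and computes an agent's utility from an agreed outcome. All names (the example issues, agents "A"/"B", the `utility` function) are hypothetical, not taken from the project's actual code:

```python
# Minimal sketch, assuming issues map option labels to per-agent payoffs
# in [0, 1], and each agent holds preference weights over issues.

issues = {
    "pizza_slices": {  # distributive issue: one agent's gain is the other's loss
        "4-0": {"A": 1.0, "B": 0.0},
        "2-2": {"A": 0.5, "B": 0.5},
        "0-4": {"A": 0.0, "B": 1.0},
    },
    "cheese": {  # compatible issue: both agents prefer the same option
        "extra": {"A": 1.0, "B": 1.0},
        "none":  {"A": 0.0, "B": 0.0},
    },
}

# Optional non-uniform preference weights (each agent's weights sum to 1).
weights = {
    "A": {"pizza_slices": 0.7, "cheese": 0.3},
    "B": {"pizza_slices": 0.5, "cheese": 0.5},
}

def utility(agent: str, agreement: dict) -> float:
    """Weighted payoff an agent receives from an agreed outcome."""
    return sum(
        weights[agent][issue] * issues[issue][option][agent]
        for issue, option in agreement.items()
    )

agreement = {"pizza_slices": "2-2", "cheese": "extra"}
print(utility("A", agreement))  # 0.7*0.5 + 0.3*1.0 = 0.65
print(utility("B", agreement))  # 0.5*0.5 + 0.5*1.0 = 0.75
```

Mixing a distributive issue (pizza slices) with a compatible one (cheese), as above, is one way the benchmark scales game difficulty; the completion-rate metric would then track whether the agents reach any agreement at all.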
Project status:
active — entered showcase: 2024-05-03 — entry updated: 2024-05-03

Source code:
Lab GitHub — last commit: 2024-02-04
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Toolset
Programming language:
Python