Name:
Entity Insertion in Wikipedia
Description:
Multilingual entity insertion in Wikipedia articles
Professor — Lab:
Robert West — Data Science Lab

Layman description:
Automatically adding relevant links to entities in Wikipedia articles across different languages is a challenging task. This project addresses it by processing data from Wikipedia dumps and training machine learning models. The data processing extracts information such as articles, links, and mentions from the dumps. The modeling code trains models to rank candidate text spans for inserting an entity link. The models are evaluated against various baselines, including keyword matching and language models. This helps improve the quality and consistency of Wikipedia by suggesting relevant entity links across multiple languages.
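To illustrate the kind of data the processing step extracts (a minimal, hypothetical sketch rather than the project's actual pipeline), the snippet below pulls link targets and their surface-form mentions out of an article's wikitext using the mwparserfromhell library:

    import mwparserfromhell

    def extract_links(wikitext: str):
        """Return (target article, mention text) pairs found in a page's wikitext."""
        parsed = mwparserfromhell.parse(wikitext)
        pairs = []
        for link in parsed.filter_wikilinks():
            target = str(link.title).strip()
            # The mention is the displayed text if given, otherwise the target itself.
            mention = str(link.text).strip() if link.text else target
            pairs.append((target, mention))
        return pairs

    # Example: one link whose displayed mention differs from its target article.
    print(extract_links("The [[Swiss Alps|Alps]] span eight countries."))
    # [('Swiss Alps', 'Alps')]

Such (target, mention) pairs, together with their surrounding context, are the raw material from which candidate insertion spans can be built.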
Technical description:
The project proposes a framework for inserting entities into Wikipedia articles across multiple languages. It processes Wikipedia dumps to extract training data and trains models for entity insertion. The key components are: 1) a data-processing pipeline that extracts relevant data from Wikipedia dumps; 2) modeling code that trains entity-insertion models using a ranking loss or a pointwise loss; 3) benchmarking code that evaluates the models against baselines such as BM25, EntQA, and GPT language models.
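As a rough illustration of the two training objectives mentioned above (a sketch under assumed tensor names and shapes, not the project's actual training code), a margin ranking loss pushes the model to score the gold insertion span above other candidate spans, while a pointwise loss treats each span as an independent binary decision:

    import torch
    import torch.nn as nn

    # Assumed setup: a scoring model emits one relevance score per candidate span.
    # pos_scores: scores of the gold spans, shape (batch,)
    # neg_scores: scores of sampled negative spans, shape (batch,)
    ranking_loss = nn.MarginRankingLoss(margin=1.0)

    def ranking_step(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
        # target = 1 means the first argument should be ranked above the second.
        target = torch.ones_like(pos_scores)
        return ranking_loss(pos_scores, neg_scores, target)

    # Pointwise alternative: classify each span as "insert here" vs. "don't insert".
    pointwise_loss = nn.BCEWithLogitsLoss()

    def pointwise_step(span_scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # span_scores and labels share shape (batch,); labels are 0/1.
        return pointwise_loss(span_scores, labels.float())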
Project status:
active — entered showcase: 2024-04-16 — entry updated: 2024-04-16

Source code:
Lab GitHub - last commit: 2024-04-15
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Experiments
Programming language:
Jupyter Notebook