Entity Insertion in Wikipedia
Multilingual entity insertion in Wikipedia articles
Automatically adding relevant links to entities in Wikipedia articles across different languages is a challenging task. This project addresses it by processing data from Wikipedia dumps and training machine learning models. The data processing step extracts information such as articles, links, and mentions from the dumps. The modeling code trains models to rank candidate text spans where an entity link could be inserted. The models are evaluated against several baselines, such as keyword matching and language models. By suggesting relevant entity links across multiple languages, the project helps improve the quality and consistency of Wikipedia.
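To make the "links and mentions" extraction step concrete, here is a minimal sketch of pulling internal links out of raw wikitext. It is not the project's pipeline. It assumes plain `[[Target]]` / `[[Target|anchor]]` markup, and it deliberately ignores templates, nested links, and redirects. The names `extract_mentions`, `normalize_title`, `Mention`, and `SKIPPED_PREFIXES` are hypothetical.

```python
import re
from typing import List, NamedTuple

# Matches [[Target]] or [[Target|anchor text]]. Simplified: templates,
# nested links, and link trails (e.g. [[cat]]s) are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

# Hypothetical, non-exhaustive list of namespaces that are not entity links.
SKIPPED_PREFIXES = ("file:", "image:", "category:")


class Mention(NamedTuple):
    target: str  # normalized title of the linked article
    anchor: str  # visible mention text
    start: int   # character offset of the link markup in the wikitext


def normalize_title(title: str) -> str:
    # MediaWiki treats underscores as spaces; most Wikipedias also
    # capitalize the first letter of a title.
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def extract_mentions(wikitext: str) -> List[Mention]:
    mentions = []
    for match in LINK_RE.finditer(wikitext):
        raw_target = match.group(1)
        target = raw_target.split("#")[0]  # drop section anchors
        if not target.strip() or target.strip().lower().startswith(SKIPPED_PREFIXES):
            continue  # same-page section links and non-article namespaces
        anchor = match.group(2) if match.group(2) is not None else raw_target
        mentions.append(Mention(normalize_title(target), anchor.strip(), match.start()))
    return mentions


text = "The [[Eiffel_Tower|tower]] is in [[Paris]]. [[File:X.jpg|thumb]] See [[paris#History|history]]."
for mention in extract_mentions(text):
    print(mention)
```

Each `Mention` pairs a target entity with the text span that links to it. Data of this shape is what a model would learn from when ranking candidate spans for a new link.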
The project proposes a framework for inserting entities into Wikipedia articles across multiple languages. It processes Wikipedia dumps to extract data and trains models for entity insertion. The key components are:
1) A data processing pipeline that extracts relevant data from Wikipedia dumps.
2) Modeling code for training entity insertion models with either a ranking loss or a pointwise loss.
3) Benchmarking code that evaluates the models against baselines such as BM25, EntQA, and GPT language models.
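The two training objectives differ in how they treat the candidate spans of one article. The sketch below illustrates that difference under stated assumptions and does not reproduce the project's code. For the ranking loss it uses a listwise softmax cross-entropy, chosen here only as an example, because the source does not say which ranking loss is used. For the pointwise loss it uses binary cross-entropy on raw scores (logits). The function names are hypothetical.

```python
import math
from typing import Sequence


def pointwise_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy: each candidate span is judged independently.

    `scores` are raw model outputs (logits); `labels` are 1 for the span where
    the link belongs and 0 otherwise.
    """
    total = 0.0
    for s, y in zip(scores, labels):
        # Numerically stable log(1 + exp(s)) - y * s
        total += max(s, 0.0) + math.log1p(math.exp(-abs(s))) - y * s
    return total / len(scores)


def ranking_loss(scores: Sequence[float], positive_index: int) -> float:
    """Listwise softmax cross-entropy (assumed variant of a ranking loss).

    The correct span only needs to outscore the other candidates of the
    same article; absolute score values do not matter.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]


# Hypothetical scores for three candidate spans; span 0 is the correct one.
scores = [2.0, 0.5, -1.0]
print(pointwise_loss(scores, [1, 0, 0]))
print(ranking_loss(scores, positive_index=0))
```

A pointwise objective pushes every score toward a fixed 0/1 target. A ranking objective only compares candidates within the same article, which matches how the model is used at inference time: pick the best span among the candidates.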
Status: active | Entered showcase: 2024-04-16 | Entry updated: 2024-04-16
This project has not yet been evaluated by the C4DT Factory team.
We will be happy to evaluate it upon request.
Experiments
Jupyter Notebook