Inactive

TiME

Training tiny monolingual language models via knowledge distillation from larger models.

Knowledge distillation pipeline for training tiny monolingual encoders from XLM-R or HPLT teacher models. Supports downstream evaluation on part-of-speech (POS) tagging, lemmatization, dependency parsing, and named-entity recognition (NER). Includes checkpoint selection scripts, distillation pipelines, and evaluation tooling.

2026 Proposal, Deep Neural Networks, Natural Language

Maturity

Support

C4DT

Lab

Technical

Source code: Lab GitHub
Last commit: 2025-10-18

Data Science Lab

Prof. Robert West

Our research aims to make sense of large amounts of data. The data we analyze is frequently collected on the Web, e.g., from server logs, social media, wikis, online news, and online games. We distill heaps of raw data into meaningful insights by developing and applying algorithms and techniques in areas including social and information network analysis, machine learning, computational social science, data mining, natural language processing, and human computation.

This page was last edited on 2026-03-03.