Name:
Megatron-LLM
Description:
Large language model training library
Professor — Lab:
Martin Jaggi — Machine Learning and Optimization Laboratory

Layman description:
Megatron-LLM is a software library that allows researchers and developers to train and fine-tune large language models, which are powerful AI systems that can understand and generate human-like text. It supports various model architectures and enables training on regular hardware by distributing the workload across multiple machines. The library offers advanced features to improve model performance and integrates with popular tools for tracking training progress and sharing models.
Technical description:
Megatron-LLM enables pre-training and fine-tuning of large language models (LLMs) at scale. It supports architectures such as Llama, Llama 2, Code Llama, Falcon, and Mistral. The library allows training of large models (up to 70B parameters) on commodity hardware by combining tensor, pipeline, and data parallelism. It also provides features such as grouped-query attention, rotary position embeddings, BF16/FP16 training, and integration with the Hugging Face hub and Weights & Biases (WandB).
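To give a flavour of one of the features mentioned above, the sketch below illustrates grouped-query attention in plain PyTorch: several query heads share a single key/value head, which reduces key/value memory at large model sizes. This is a conceptual sketch only, not code taken from the Megatron-LLM library; the function name, tensor shapes, and head counts are illustrative assumptions.

# Conceptual sketch of grouped-query attention (GQA): several query heads
# share one key/value head. Illustrative only, not Megatron-LLM code.
import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), with n_kv_heads < n_q_heads
    group_size = q.shape[1] // k.shape[1]
    # Repeat each key/value head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# Hypothetical shapes: 8 query heads sharing 2 key/value heads.
batch, seq, head_dim = 2, 16, 64
q = torch.randn(batch, 8, seq, head_dim)
k = torch.randn(batch, 2, seq, head_dim)
v = torch.randn(batch, 2, seq, head_dim)
out = grouped_query_attention(q, k, v)  # shape: (2, 8, 16, 64)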
Project status:
active — entered showcase: 2024-04-12 — entry updated: 2024-04-12

Source code:
Lab GitHub - last commit: 2023-12-03
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Application
Programming language:
Python
License:
various