Name:
Megatron-LLM
Description:
Large language model training library
Professor — Lab:
Martin Jaggi — Machine Learning and Optimization Laboratory

Layman description:
Megatron-LLM is a software library that allows researchers and developers to train and fine-tune large language models, which are powerful AI systems that can understand and generate human-like text. It supports various model architectures and enables training on regular hardware by distributing the workload across multiple machines. The library offers advanced features to improve model performance and integrates with popular tools for tracking training progress and sharing models.
Technical description:
Megatron-LLM enables pre-training and fine-tuning of large language models (LLMs) at scale. It supports architectures such as Llama, Llama 2, Code Llama, Falcon, and Mistral. The library allows training of large models (up to 70B parameters) on commodity hardware by combining tensor, pipeline, and data parallelism. It also provides features such as grouped-query attention, rotary position embeddings, BF16/FP16 training, and integration with the Hugging Face hub and Weights & Biases (WandB).
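To give a flavour of one of the features mentioned above, the sketch below illustrates grouped-query attention in plain PyTorch: several query heads share a single key/value head, which reduces key/value memory at large model sizes. This is a conceptual sketch only, not code taken from the Megatron-LLM library; the function name, tensor shapes, and head counts are illustrative assumptions.

# Conceptual sketch of grouped-query attention (GQA): several query heads
# share one key/value head. Illustrative only, not Megatron-LLM code.
import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), with n_kv_heads < n_q_heads
    group_size = q.shape[1] // k.shape[1]
    # Repeat each key/value head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# Hypothetical shapes: 8 query heads sharing 2 key/value heads.
batch, seq, head_dim = 2, 16, 64
q = torch.randn(batch, 8, seq, head_dim)
k = torch.randn(batch, 2, seq, head_dim)
v = torch.randn(batch, 2, seq, head_dim)
out = grouped_query_attention(q, k, v)  # shape: (2, 8, 16, 64)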
Project status:
active — entered showcase: 2024-04-12 — entry updated: 2024-04-12

Source code:
Lab GitHub - last commit: 2023-12-03
Code quality:
This project has not yet been evaluated by the C4DT Factory team. We will be happy to evaluate it upon request.
Project type:
Application
Programming language:
Python
License:
various