A benchmarking suite for comparing optimizers (Adam, SGD, Muon, Scion, etc.) on LLM pretraining with Llama and MoE architectures. It supports configurable training parameters, WandB logging, multi-GPU and CPU setups, and includes guidelines for extending the suite to new models or datasets.
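A common way to make such a suite extensible to new optimizers is a string-keyed registry plus a config object. The following is a minimal, dependency-free sketch of that pattern; all names here (`TrainConfig`, `register_optimizer`, the config fields) are illustrative assumptions, not this project's actual API, and a real implementation would return e.g. `torch.optim` instances from the factories.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical config sketch: the suite's real flag names and schema are
# not specified in the description, so these fields are assumptions.
@dataclass
class TrainConfig:
    model: str = "llama"               # "llama" or "moe"
    optimizer: str = "adam"            # e.g. "adam", "sgd", "muon", "scion"
    lr: float = 3e-4
    batch_size: int = 32
    wandb_project: Optional[str] = None  # WandB logging enabled when set

# Registry pattern: adding an optimizer means registering a factory
# under a string key, so the training loop never hard-codes choices.
OPTIMIZERS: Dict[str, Callable[[Any, TrainConfig], Any]] = {}

def register_optimizer(name: str):
    def wrap(factory):
        OPTIMIZERS[name] = factory
        return factory
    return wrap

@register_optimizer("sgd")
def make_sgd(params, cfg: TrainConfig):
    # Placeholder factory; a real suite would construct
    # torch.optim.SGD(params, lr=cfg.lr) here.
    return ("sgd", cfg.lr)

def build_optimizer(params, cfg: TrainConfig):
    if cfg.optimizer not in OPTIMIZERS:
        raise KeyError(
            f"unknown optimizer {cfg.optimizer!r}; known: {sorted(OPTIMIZERS)}"
        )
    return OPTIMIZERS[cfg.optimizer](params, cfg)
```

With this shape, extending the benchmark to a new optimizer (say, Muon) is a single `@register_optimizer("muon")` factory, and sweeps over optimizers reduce to iterating over config values.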
This page was last edited on 2026-03-03.