Preference alignment framework for LLMs that implements DPO, RLHF, and related techniques. Uses Hydra for configuration management; supports SLURM cluster scheduling, CUDA systems, training and evaluation pipelines, and environment variable management. Pre-commit hooks are included for code quality.
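For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO loss, written in plain Python for illustration. It is not taken from this framework's code: the function name `dpo_loss`, its parameter names, and the default `beta=0.1` are assumptions made for this example, and the inputs are assumed to be summed log-probabilities of a full response under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Hypothetical helper for illustration only. Each argument is the summed
    log-probability of a response (chosen or rejected) under the policy or
    the reference model; beta scales the implicit reward.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch on the sign
    # so exp() is only ever called on a non-positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree, the margin is zero and the loss equals log 2; the loss falls below that as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference.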
This page was last edited on 2026-03-03.