Name:
Garfield
Description:
System support for Byzantine machine learning
Professor — Lab:
Rachid Guerraoui, Distributed Computing Lab

Layman description:
Machine learning models are now usually trained in a distributed fashion, because large models and huge datasets do not scale on a single machine. Spreading the work over many machines inevitably raises the probability that something fails somewhere in the network. Garfield is a library that keeps training correct and convergent despite such failures. It can be used with a variety of ML applications and architectures.
Technical description:
Garfield is a library for building Byzantine machine learning (ML) applications on top of popular frameworks such as TensorFlow and PyTorch. We show how to use Garfield to build ML applications with different architectures: single server, multiple workers (SSMW); multiple servers, multiple workers (MSMW); and fully decentralized.
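To make the idea concrete, here is a minimal sketch of the SSMW setting: one server aggregates the gradients reported by several workers, one of which is Byzantine. It does not use Garfield's API. The function names `average` and `coordinate_wise_median` and the sample gradients are hypothetical, chosen only for illustration. The coordinate-wise median is one well-known Byzantine-resilient aggregation rule and stands in here for whatever rule an application would actually choose.

```python
from statistics import median


def average(gradients):
    """Plain averaging: a single bad gradient can shift the result arbitrarily."""
    n = len(gradients)
    return [sum(coords) / n for coords in zip(*gradients)]


def coordinate_wise_median(gradients):
    """Robust aggregation: take the median of each coordinate across workers."""
    return [median(coords) for coords in zip(*gradients)]


# Illustrative data (assumption): four honest workers report similar gradients,
# one Byzantine worker reports arbitrary values.
honest = [[0.9, -2.1], [1.0, -2.0], [1.1, -1.9], [1.0, -2.0]]
byzantine = [[1e6, -1e6]]
gradients = honest + byzantine

mean_update = average(gradients)                   # dominated by the Byzantine worker
robust_update = coordinate_wise_median(gradients)  # stays close to the honest gradients

print("mean:  ", mean_update)
print("median:", robust_update)
```

With a minority of Byzantine workers, the median of each coordinate stays within the range of the honest values, whereas the average can be pulled arbitrarily far by a single faulty worker.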
Papers:
Relevant papers:
Project status:
inactive — entered showcase: 2021-01-20 — entry updated: 2024-03-22

Factory Development:
Started in Spring 2021, demo in Autumn.
C4DT Contact:
C4DT team

Source code:
Lab GitHub - last commit: 2021-09-24
Code quality:
Prototype
Project type:
Library
Programming language:
Python, CUDA, C++
License:
MIT