SPD

Single-pass detection classifier for jailbreaking inputs in large language models.

SPD is a single-pass jailbreak detection classifier for LLMs that extends JailbreakBench. It is trained on more than 4,000 samples generated with the GCG, AutoDAN, and PAIR attack methods and accompanies a TMLR 2025 paper. The project provides training and evaluation scripts as well as a unified detection interface.

2026 Proposal, AI Safety, Attack, Large Language Model
Key facts
  • Maturity
  • Support: C4DT Inactive, Lab Unknown
  • Technical

Laboratory for Information and Inference Systems

Prof. Volkan Cevher

At LIONS, we are concerned with optimized information extraction from signals or data volumes. We therefore develop mathematical theory and computational methods for information recovery from highly incomplete data.

This page was last edited on 2026-03-03.