Do Language Models Share Unsafe Directions in Activation Space?
Mohamad Zbib
KAUST - AUB