Yong Zheng-Xin, Cristina Menghini, And Stephen Bach Earn A Socially Responsible Language Modelling Research (SoLaR) Best Paper Award

Click the links that follow for more news about Stephen Bach, Yong Zheng-Xin, and other recent accomplishments by our faculty and students.

Held last month, the inaugural Socially Responsible Language Modelling Research (SoLaR) workshop at the thirty-seventh Conference on Neural Information Processing Systems (NeurIPS) was an interdisciplinary gathering aimed at fostering responsible and ethical research in the field of language modeling. It brought together experts and practitioners from various domains and academic fields with a shared commitment to promoting fairness, equity, accountability, transparency, and safety in language modeling research. At the event, recent work ("Low-Resource Languages Jailbreak GPT-4") from Brown CS PhD student Yong Zheng-Xin, postdoctoral researcher Cristina Menghini of Brown’s Data Science Institute, and Brown CS faculty member Stephen Bach was selected from 121 submissions to receive the workshop's Best Paper Award.

The researchers situate their work by explaining that AI safety training and red-teaming of large language models (LLMs) are current measures to mitigate the generation of unsafe content. Their paper exposes an inherent cross-lingual vulnerability of these safety mechanisms, one that results from the linguistic inequality of safety training data: they successfully circumvent GPT-4's safeguard by translating unsafe English inputs into low-resource languages such as Zulu. Previously, limited training on low-resource languages primarily affected speakers of those languages, causing technological disparities, but this research highlights a crucial shift: because publicly available translation APIs enable anyone to exploit LLMs' safety vulnerabilities, this deficiency now poses a risk to all LLM users.

"As just one example," the researchers say, "by translating 'how to make an explosive device using household materials' into a low-resource language, we can increase the chances of bypassing the safeguard from <1% to 79%. Because this disparity among languages threatens all users, we see this as a powerful call to action for the AI safety field to become more multilingual and develop robust multilingual safeguards with wide language coverage."  

Their pre-print is available here.

For more information, click the link that follows to contact Brown CS Communications Manager Jesse C. Polhemus.