On localizing and deleting toxic memories in large language models


Conference paper


Anubrata Das, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, Rahul Gupta
NAACL Findings, 2025

Cite

APA
Das, A., Kumar, M., Mehrabi, N., Ramakrishna, A., Rumshisky, A., Chang, K.-W., … Gupta, R. (2025). On localizing and deleting toxic memories in large language models. In NAACL Findings.


Chicago/Turabian
Das, Anubrata, Manoj Kumar, Ninareh Mehrabi, Anil Ramakrishna, Anna Rumshisky, Kai-Wei Chang, Aram Galstyan, Morteza Ziyadi, and Rahul Gupta. “On Localizing and Deleting Toxic Memories in Large Language Models.” In NAACL Findings, 2025.


MLA
Das, Anubrata, et al. “On Localizing and Deleting Toxic Memories in Large Language Models.” NAACL Findings, 2025.


BibTeX

@inproceedings{das2025a,
  title = {On localizing and deleting toxic memories in large language models},
  year = {2025},
  author = {Das, Anubrata and Kumar, Manoj and Mehrabi, Ninareh and Ramakrishna, Anil and Rumshisky, Anna and Chang, Kai-Wei and Galstyan, Aram and Ziyadi, Morteza and Gupta, Rahul},
  booktitle = {NAACL Findings}
}