Welcome to my personal site! I am an academic researcher and assistant professor of computer science (CS) working on the interpretability and transparency of language models (LMs) and other neural networks, with the goal of increasing their reliability, safety, and performance. I also focus on providing natural language explanations for users of LMs.
Previously, I was a postdoctoral researcher at the Allen Institute for AI (Ai2) and the University of Washington, working with Ashish Sabharwal and Hannaneh Hajishirzi. Before that, I received my M.S. and Ph.D. degrees in CS from Georgia Tech, where I was advised by Mark Riedl.
Short Bio for Talks
Recent Updates
- Fall 2025: Starting as an assistant professor in the CS department at the University of Maryland. Go Terps!
- Summer 2025: Congrats to Alec Bunn on having a paper accepted to the ACL 2025 GEM workshop.
- Summer 2025: Invited talk at the University of British Columbia.
- Summer 2025: Our Mechanistic Interpretability Benchmark (MIB) was accepted to ICML 2025. Check out the website for all the benchmark resources. I am also organizing the Actionable Interpretability Workshop at ICML. See you in Vancouver!
- Spring 2025: Two papers accepted to ICLR 2025, one as a spotlight.
- Winter 2024: Selected as a Rising Star in Machine Learning.
- Winter 2024: Recognized as an outstanding area chair at EMNLP 2024.
- Winter 2024: Paper proposing a taxonomy for model noncompliance accepted at NeurIPS 2024 Datasets and Benchmarks.
- Winter 2024: Attending EMNLP in Miami. Check out our two Findings papers and our position paper on mechanistic interpretability at the BlackBoxNLP workshop. Slides from my talk at BlackBoxNLP: here.
- Fall 2024: Attending COLM in Philly.
- Fall 2024: Selected as a Rising Star in Generative AI.
- Summer 2024: Check out our NAACL 2024 tutorial on Explanation in the Era of Large Language Models. See you in Mexico City!
- Spring 2024: One paper, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, will appear at ACL 2024.
- Spring 2024: Gave a guest lecture in the graduate-level LLMs course at Washington University in St. Louis.
- Winter 2023: Recording of the keynote Sarthak Jain and I gave ("Is Attention = Explanation? Past, Present, and Future") at the Big Picture workshop is available here (from 1:57).
- Winter 2023: Gave a talk to the Washington State Senate's Environment, Energy, and Technology Committee on "What is AI?".
- Winter 2023: Recognized as a top reviewer at NeurIPS 2023.
- Winter 2023: Two papers at EMNLP 2023, and giving a keynote at the Big Picture workshop with Sarthak Jain. See you in Singapore!
- Winter 2023: Selected as a Rising Star in EECS.
- Winter 2023: Self-Refine published at NeurIPS.
- Fall 2023: Talks at UC Irvine, UCSD, and USC.
- Summer 2023: Slides and recording (from 7:45) of my keynote at the ACL Natural Language Reasoning and Structured Explanations workshop are available.
- Summer 2023: Received an Outstanding Area Chair award (top 1.5% of reviewers and chairs) at ACL 2023.
- Spring 2023: Quoted in this article about language model interpretability.
- Fall 2022: Talks at various NLP groups at the University of Washington (Tsvetshop, H2Lab, and Treehouse).
- Fall 2022: Two Findings papers at EMNLP 2022.
- Fall 2022: Co-organizing the BlackBoxNLP workshop at EMNLP 2022. See you in Abu Dhabi!