Fall 2025: CMSC848R Selected Topics in Information Processing; Language Model Interpretability

Course website coming soon.

This course focuses on state-of-the-art methods for interpreting language models and understanding their learned behaviors. We will discuss approaches centered on both understanding models’ internal mechanisms/representations and attributing behaviors back to the training data. We will focus on model tendencies including hallucination, factuality, memorization, and explanation/reasoning elicitation. If time allows, we will discuss recent developments in ameliorating learned behaviors, such as model editing, unlearning, and steering. This is primarily a seminar course focused on paper readings and presentations.

I am a UMD undergrad. Can I have permission to enroll in your seminar?

The course is primarily designed for graduate students doing ML/NLP research. It will revolve around student presentations and group discussion of cutting-edge research papers, and thus requires deep familiarity with Transformer architectures and with current standard NLP training and inference methods. The undergraduate prerequisites are CMSC470 Intro to NLP and one or more of {CMSC421 Intro to AI, CMSC422 Intro to ML, CMSC472 Intro to Deep Learning} or equivalent courses. Alternatively, if you have done ML or NLP research with a professor in the department, that experience can qualify in lieu of the courses. If this applies to you, please email me with your transcript and a brief description of any relevant research experience.