Scaling Interpretability for LLM Agents | Gabriele Sarti
Natural Language Processing, Academic
Code · Slides
Date: May 14, 2026
Event: Seminar at the Algorithmic Alignment Group, MIT
Location: MIT CSAIL, Boston, MA, USA
Natural Language Processing
Interpretability
Sequence-to-sequence
Language Modeling
Feature Attribution
Retrieval-augmented Generation
NDIF
Mechanistic Interpretability
Agents
Goal-directedness
Related
Scaling Interpretability for LLM Agents
Attribution: Tracing Influence to Inputs and Model Components
Interpretability for Language Models: Current Trends and Applications
Interpreting Context Usage in Generative Language Models