Scaling Interpretability for LLM Agents | Gabriele Sarti
Natural Language Processing, Academic
Code · Slides
Date: May 14, 2026
Event: Seminar at the Algorithmic Alignment Group, MIT
Location: MIT CSAIL, Boston, MA, USA
Natural Language Processing
Interpretability
Sequence-to-sequence
Language Modeling
Feature Attribution
Retrieval-augmented Generation
NDIF
Mechanistic Interpretability
Agents
Goal-directedness
Related
Scaling Interpretability for LLM Agents
Attribution: Tracing Influence to Inputs and Model Components
Interpretability for Language Models: Current Trends and Applications
Interpreting Context Usage in Generative Language Models