Sparse Autoencoders | Gabriele Sarti

From Insights to Impact: Actionable Interpretability for Neural Machine Translation

This presentation summarizes the main contributions of my PhD thesis. It advocates a user-centric perspective on interpretability research, aiming to translate theoretical advances in model understanding into practical benefits in trustworthiness and transparency for end users of neural machine translation systems.