Gabriele Sarti

PhD Student in Natural Language Processing

CLCG, University of Groningen

About me

Welcome to my website! 👋 I am a PhD student at the Natural Language Processing group (GroNLP 🐮) & the InCLoW research team at the University of Groningen. I’m also a member of the InDeep consortium, working on user-centric interpretability for multilingual generation and machine translation. My supervisors are Arianna Bisazza, Malvina Nissim and Grzegorz Chrupała.

Previously, I was a research intern at Amazon Translate NYC, a research scientist at Aindo, a Data Science MSc student at the University of Trieste and a co-founder of the AI Student Society.

My research focuses on interpretability for generative language models, with a particular interest in operationalizing advances in model understanding for the benefit of users. For this reason, I lead the development of robust open-source interpretability software to enable reproducible analyses of model behaviors. I am also excited about human-computer interaction, and in particular how human behavioral signals can improve human-AI collaboration.

Your (anonymous) constructive feedback is always welcome! 🙂

Interests

  • Conditional Language Generation
  • Deep Learning Interpretability
  • Human-AI Collaboration
  • Uncertainty Estimation

🗞️ News


  • PECoRe was accepted to ICLR 2024, and I presented it in Vienna! 🎉 I also co-organized the first Mechanistic Interpretability social at ICLR together with Nikhil Prakash, and we had more than 100 attendees!

Selected Publications


Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

We evaluate unsupervised word-level quality estimation (WQE) methods for machine translation, focusing on their robustness to human …

Steering Large Language Models for Machine Translation Personalization

We evaluate prompting- and steering-based methods for machine translation personalization in the literary domain.

QE4PE: Word-level Quality Estimation for Human Post-Editing

We investigate the impact of word-level quality estimation on MT post-editing with 42 professional post-editors.

Multi-property Steering of Large Language Models with Dynamic Activation Composition

We propose Dynamic Activation Composition, an adaptive approach for multi-property activation steering of LLMs.

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

MIRAGE uses model internals for faithful answer attribution in retrieval-augmented generation applications.

Blog posts


ICLR 2020 Trends: Better & Faster Transformers for Natural Language Processing

A summary of promising directions from ICLR 2020 for better and faster pretrained Transformer language models.

Recent & Upcoming Talks

Interpreting Latent Features in Large Language Models
QE4PE: Word-level Quality Estimation for Human Post-Editing
Interpretability for Language Models: Current Trends and Applications

Projects


Attributing Context Usage in Language Models

An interpretability framework to detect and attribute context usage in language models' generations.

Inseq: An Interpretability Toolkit for Sequence Generation Models

An open-source library to democratize access to model interpretability for sequence generation models.

Contrastive Image-Text Pretraining for Italian

The first CLIP model pretrained on the Italian language.

Covid-19 Semantic Browser

A semantic browser for SARS-CoV-2 and COVID-19 powered by neural language models.

AItalo Svevo: Letters from an Artificial Intelligence

Generating letters with a neural language model in the style of Italo Svevo, a famous Italian writer of the 20th century.

Histopathologic Cancer Detection with Neural Networks

A journey into the state of the art of histopathologic cancer detection approaches.