Gabriele Sarti

PhD in Natural Language Processing

CLCG, University of Groningen

Biography

Welcome to my website! 👋 I am a PhD student at the Computational Linguistics Group of the University of Groningen. I am part of the NWO-funded project InDeep: Interpreting Deep Learning Models for Text and Sound, focusing on interpretability for neural machine translation. I am supervised by Arianna Bisazza and Malvina Nissim.

Previously, I was a research scientist at Aindo, a student in the Data Science MSc at the University of Trieste & SISSA, and a founding member of the AI Student Society. My master’s thesis with the ItaliaNLP Lab in Pisa studied linguistic complexity using gaze recordings and neural language models.

My research focuses on interpretability for NLP models, with particular attention to benefiting end users and leveraging human behavioral signals. I am also passionate about social applications of machine learning, ethical AI, and open source collaboration.

Interests

  • Conditional Text Generation
  • Interpretability for Deep Learning
  • Representation Learning for NLP
  • Computational Psycholinguistics

Education

  • PhD in NLP, 2021 - Ongoing

    University of Groningen, NL

  • MSc in Data Science, 2020

    University of Trieste & SISSA, IT

  • DEC in Software Management, 2018

    Cégep de Saint-Hyacinthe, CA

Selected Publications

Contrastive Language-Image Pre-training for the Italian Language

We present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs.

That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models

This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how …

Blog posts

ICLR 2020 Trends: Better & Faster Transformers for Natural Language Processing

A summary of promising directions from ICLR 2020 for better and faster pretrained transformer language models.

Recent & Upcoming Talks

Neural Language Models: the New Frontier of Natural Language Understanding
The Literary Ordnance: When the Writer is an AI
Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

Projects

Contrastive Image-Text Pretraining for Italian

The first CLIP model pretrained on the Italian language.

Covid-19 Semantic Browser

A semantic browser for SARS-CoV-2 and COVID-19 powered by neural language models.

AItalo Svevo: Letters from an Artificial Intelligence

Generating letters with a neural language model in the style of Italo Svevo, a famous Italian writer of the 20th century.

Histopathologic Cancer Detection with Neural Networks

A journey into the state of the art of histopathologic cancer detection approaches.