CLIP is a multimodal model that learns to represent images and text jointly in the same embedding space. In this project, we propose the first CLIP model trained on Italian data; in this context, Italian can be considered a low-resource language. Using a few techniques, we were able to fine-tune a SOTA Italian CLIP model with only 1.4 million training samples.
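To illustrate what "representing images and text jointly in the same space" enables, here is a minimal sketch of zero-shot image-text matching with a CLIP-style model via the Hugging Face `transformers` library. The `openai/clip-vit-base-patch32` checkpoint, the example image URL, and the candidate captions are stand-ins used only for illustration; an Italian checkpoint would be substituted in practice.

```python
# Sketch: score an image against candidate captions with a CLIP-style model.
# Checkpoint, image URL, and captions below are illustrative placeholders.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image and candidate captions (placeholders for Italian text).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
captions = ["una foto di un gatto", "una foto di un cane"]

# Both modalities are encoded into the shared embedding space and compared.
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # similarity over captions
print(dict(zip(captions, probs[0].tolist())))
```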
For more information, refer to our demo.