This work describes a self-supervised data augmentation approach for improving the performance of learning models when only a moderate amount of labeled data is available.
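The section does not spell out the augmentation mechanism here; as a purely illustrative sketch, the snippet below shows one common form of self-supervised augmentation, pseudo-labeling, where a model trained on the small labeled set annotates unlabeled data and its confident predictions are folded back into training. All data, names, and the 0.9 confidence threshold are assumptions for illustration, not the method described in this work.

```python
# Hypothetical pseudo-labeling sketch (not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Small labeled set and a larger pool of unlabeled examples (synthetic stand-ins).
X_labeled = rng.normal(size=(100, 20))
y_labeled = (X_labeled[:, 0] > 0).astype(int)
X_unlabeled = rng.normal(size=(1000, 20))

# Train an initial model on the labeled data only.
model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# Pseudo-label the unlabeled pool and keep only confident predictions.
proba = model.predict_proba(X_unlabeled)
confident = proba.max(axis=1) >= 0.9  # assumed threshold
X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
y_aug = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])

# Retrain on the augmented training set.
model_aug = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```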
We investigate whether and how different probing-model architectures affect the measured performance of Italian transformers in encoding a wide spectrum of linguistic features.
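As a minimal sketch of what such a comparison can look like, assuming frozen sentence representations are already extracted from an Italian transformer (for instance via the Hugging Face transformers library), the example below contrasts a linear probe with a small MLP probe on the same features. The random embeddings and the synthetic "linguistic feature" labels are placeholders, not data from this study.

```python
# Hypothetical probing-architecture comparison (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 768))                     # stand-in for frozen [CLS] vectors
labels = (embeddings[:, :10].sum(axis=1) > 0).astype(int)    # stand-in linguistic feature

# Two probe architectures over the same representations.
probes = {
    "linear": LogisticRegression(max_iter=1000),
    "mlp": MLPClassifier(hidden_layer_sizes=(100,), max_iter=500),
}

# Differences in cross-validated accuracy indicate how much the probe itself,
# rather than the underlying representation, contributes to the score.
for name, probe in probes.items():
    scores = cross_val_score(probe, embeddings, labels, cv=5)
    print(f"{name} probe accuracy: {scores.mean():.3f}")
```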