In recent years, Transformer-based language models have achieved remarkable progress on most language generation and understanding tasks. However, the internal computations of these models remain largely opaque due to their highly nonlinear structure, hindering their adoption in mission-critical applications that require trustworthiness and transparency guarantees. This presentation will introduce interpretability methods used to trace the predictions of language models back to their inputs and discuss how these can be used to gain insights into model biases and behaviors. Concrete examples of language model attributions, produced with the Inseq interpretability library, will be shown throughout.
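To make the idea of tracing predictions back to inputs concrete, the following is a minimal sketch of one popular attribution method, integrated gradients, applied to a toy one-layer model. All names and values here are illustrative; this is not Inseq's implementation, only the kind of computation such libraries perform.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w):
    # Toy stand-in for a network: sigmoid of a dot product.
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)))

def grad(x, w):
    # Analytic gradient of sigmoid(w . x) with respect to the input x.
    p = model(x, w)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, w, baseline=None, steps=100):
    # Riemann-sum (midpoint) approximation of integrated gradients:
    # attr_i = (x_i - b_i) * integral over a in [0,1] of grad_i(b + a*(x - b)).
    if baseline is None:
        baseline = [0.0] * len(x)
    attrs = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point, w)
        for i in range(len(x)):
            attrs[i] += (x[i] - baseline[i]) * g[i] / steps
    return attrs

# Hypothetical input and weights for illustration.
x = [1.0, 2.0, -1.0]
w = [0.5, -0.3, 0.8]
attrs = integrated_gradients(x, w)

# Completeness property: attributions sum to f(x) - f(baseline),
# so each attrs[i] is that input's share of the prediction shift.
delta = model(x, w) - model([0.0, 0.0, 0.0], w)
print(attrs, sum(attrs), delta)
```

The completeness check at the end is what makes such attributions interpretable as a decomposition of the model's output rather than an arbitrary saliency score.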
With the astounding advances of artificial intelligence in recent years, interpretability research has emerged as a fundamental effort to ensure the development of robust AI systems aligned with human values. In this talk, two perspectives on AI interpretability will be presented alongside two case studies in natural language processing. The first study leverages behavioral data and probing tasks to examine how linguistic complexity is perceived by humans and encoded in language models. The second introduces a user-centric interpretability perspective for neural machine translation, aimed at improving post-editing productivity and enjoyment. The need for such application-driven approaches will be emphasized in light of current challenges in faithfully evaluating advances in this field of study.
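A probing task, as used in the first case study, trains a simple classifier to predict a linguistic property from a model's internal representations; high probe accuracy suggests the property is encoded. Below is a self-contained toy sketch with synthetic "representation" vectors standing in for real hidden states; the data, dimensions, and property are all hypothetical.

```python
import math
import random

random.seed(0)

def make_example():
    # Hypothetical setup: a binary property (e.g. high vs. low
    # complexity) is linearly encoded in the first dimension of a
    # 5-dimensional representation, plus Gaussian noise.
    label = random.randint(0, 1)
    vec = [2.0 * label - 1.0 + random.gauss(0.0, 0.3)]
    vec += [random.gauss(0.0, 1.0) for _ in range(4)]
    return vec, label

data = [make_example() for _ in range(200)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Linear probe: logistic regression trained with plain SGD.
w = [0.0] * 5
b = 0.0
lr = 0.5
for epoch in range(20):
    for x, y in data:
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        err = p - y
        b -= lr * err
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]

# Probe accuracy on the training set: near-perfect accuracy here
# indicates the property is linearly recoverable from the vectors.
acc = sum(
    (sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == y
    for x, y in data
) / len(data)
print(acc)
```

In practice such probes are evaluated on held-out data and compared against control tasks, since a sufficiently expressive probe can fit properties that the model does not genuinely use.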