In recent years, Transformer-based language models have achieved remarkable progress in most language generation and understanding tasks. However, the internal computations of these models are difficult to interpret due to their highly nonlinear structure, which hinders their use in mission-critical applications requiring trustworthiness and transparency guarantees. This presentation will introduce interpretability methods for tracing the predictions of language models back to their inputs and discuss how these methods can be used to gain insights into model biases and behaviors. Throughout the presentation, several concrete examples of language model attributions will be presented using the Inseq interpretability library.
This talk introduces the Inseq toolkit for interpreting sequence generation models. Its usage is illustrated through examples of state-of-the-art approaches for interpreting language models, such as contrastive attribution, tuned lenses, and causal mediation analysis.
This thesis presents a model-driven study of multiple phenomena associated with linguistic complexity and of how they are encoded in the learned representations of neural language models.