Language Modeling

Explaining Language Models with Inseq

In recent years, Transformer-based language models have achieved remarkable progress in most language generation and understanding tasks. However, the internal computations of these models are hardly interpretable due to their highly nonlinear structure, hindering their usage for mission-critical applications requiring trustworthiness and transparency guarantees. This presentation will introduce interpretability methods used for tracing the predictions of language models back to their inputs and discuss how these can be used to gain insights into model biases and behaviors. Several concrete examples of language model attributions will be presented throughout the presentation using the Inseq interpretability library.

Post-hoc Interpretability for Language Models

This talk discusses the challenges of interpreting generative language models and presents Inseq, a toolkit for interpreting sequence generation models. The usage of Inseq is illustrated with examples introducing state-of-the-art approaches for interpreting language models such as contrastive attribution. Finally, the PECoRe framework is presented as a mean to evaluate the plausibility of context usage in language models.

Post-hoc Interpretability for NLG & Inseq: an Interpretability Toolkit for Sequence Generation Models

Post-hoc Interpretability for Neural Language Models

Explaining Neural Language Models from Internal Representations to Model Predictions

As language models become increasingly complex and sophisticated, the processes leading to their predictions are growing increasingly difficult to understand. Research in NLP interpretability focuses on explaining the rationales driving model predictions and is crucial for building trust and transparency in the usage of these systems in real-world scenarios. In this laboratory, we will explore various techniques for analyzing Neural Language Models, such as feature attribution methods and diagnostic classifiers. Besides common approaches to inspect models’ internal representations, we will also introduce prompting techniques to elicit model responses and motivate their usage as alternative methods for the behavioral study of model generations.

Post-hoc Interpretability for Neural Language Models

In recent years, Transformer-based language models have achieved remarkable progress in most language generation and understanding tasks. However, the internal computations of these models are hardly interpretable due to their highly nonlinear structure, hindering their usage for mission-critical applications requiring trustworthiness and transparency guarantees. This presentation will introduce interpretability methods used for tracing the predictions of language models back to their inputs and discuss how these can be used to gain insights into model biases and behaviors. Throughout the presentation, several concrete examples of language model attributions will be presented using the Inseq interpretability library.

Inseq: An Interpretability Toolkit for Sequence Generation Models

This talk introduces the Inseq toolkit for interpreting sequence generation models. The usage of Inseq is illustrated with examples introducing state-of-the-art approaches for interpreting language models such as contrastive attribution, tuned lenses and causal mediation analysis.

Explaining Language Models with Inseq

Post-hoc Interpretability for Language Models

Post-hoc Interpretability for NLG & Inseq: an Interpretability Toolkit for Sequence Generation Models

Post-hoc Interpretability for Neural Language Models

Explaining Neural Language Models from Internal Representations to Model Predictions

Post-hoc Interpretability for Neural Language Models

Inseq: An Interpretability Toolkit for Sequence Generation Models

Advanced XAI Techniques and Inseq: An Interpretability Toolkit for Sequence Generation Models

Interpreting Neural Language Models for Linguistic Complexity Assessment

ICLR 2020 Trends: Better & Faster Transformers for Natural Language Processing