We propose Dynamic Activation Composition, an adaptive approach for multi-property activation steering of LLMs.
MIRAGE uses model internals for faithful answer attribution in retrieval-augmented generation applications.
IT5 models are the first encoder-decoder Transformers pretrained on more than 40 billion Italian words.
This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architecture.
We propose DecoderLens, a method to interpret the iterative refinement of representations in encoder-decoder Transformer models.
We introduce PECoRe, an interpretability framework for identifying context dependence in language model generations.
We introduce Retrieval and Attribute-Marking enhanced Prompting (RAMP) to perform attribute-controlled MT with multilingual LLMs.
We analyze input contributions of character-level MT models and show how they modulate word- and character-level information.
We present Inseq, a Python library to democratize access to interpretability analyses of sequence generation models.
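As a concrete illustration of the Inseq workflow, the sketch below loads a generative model together with a feature attribution method and visualizes input importance scores for its generation. The calls follow Inseq's documented API (inseq.load_model, attribute, show); the model name, attribution method, and prompt are illustrative choices, not a prescribed setup.

```python
import inseq

# Load a Hugging Face model together with a feature attribution method.
model = inseq.load_model("gpt2", "saliency")

# Attribute the model's generation for the prompt: each generated token
# receives importance scores over the input and previously generated tokens.
out = model.attribute("Interpretability of sequence generation models is")

# Render the attribution scores as a heatmap (rich output in notebooks or console).
out.show()
```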
An interpretability framework to detect and attribute context usage in language models' generations.