We evaluate unsupervised word-level quality estimation (WQE) methods for machine translation, focusing on their robustness to human label variation.
We evaluate prompting- and steering-based methods for machine translation personalization in the literary domain.
We investigate the impact of word-level quality estimation on MT post-editing with 42 professional post-editors.
We propose DecoderLens, a method to interpret the iterative refinement of representations in encoder-decoder Transformer models.
We introduce PECoRe, an interpretability framework for identifying context dependence in language model generations.
We introduce Retrieval and Attribute-Marking enhanced Prompting (RAMP) to perform attribute-controlled MT with multilingual LLMs.
We analyze input contributions in character-level MT models and show how they modulate word- and character-level information.
We present Inseq, a Python library to democratize access to interpretability analyses of sequence generation models.
DivEMT is a publicly available post-editing study of neural machine translation over a typologically diverse set of target languages.