Building accurate translation-tailored large language models with language-aware instruction tuning

Large language models (LLMs) exhibit remarkable capabilities across natural language processing tasks such as machine translation, but their large number of parameters incurs significant inference costs. Previous studies have therefore tried to train moderately sized, translation-tailored LLMs by fine-tuning them on translation data. However, for zero-shot translation directions, i.e., language pairs absent from the fine-tuning data, these models often ignore the instruction and produce off-target translations, that is, output in the wrong target language, and this problem remains unsolved.
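The off-target failure mode described above can be measured automatically with language identification. Below is a minimal sketch, not taken from the paper, that flags off-target outputs using the off-the-shelf `langdetect` library; the simulated model output and the German-to-Ukrainian zero-shot direction are illustrative assumptions.

```python
# A minimal sketch, not the paper's method: flag off-target translations,
# i.e., model outputs that are not in the requested target language,
# with off-the-shelf language identification (pip install langdetect).
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # make langdetect's predictions deterministic


def is_off_target(output_text: str, expected_lang: str) -> bool:
    """Return True if the detected language differs from the requested one."""
    return detect(output_text) != expected_lang


# Hypothetical example: German -> Ukrainian is a zero-shot direction if the
# fine-tuning data covered only English-centric pairs. The simulated output
# below ignores the instruction and answers in English instead of Ukrainian.
model_output = "Good morning, how are you doing today?"
print(is_off_target(model_output, expected_lang="uk"))  # True: off-target
```

In practice, such a check is typically run over a full test set to compute an off-target rate per translation direction, the kind of metric used to quantify how often instruction-tuned models drift into the wrong language.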