Unsupervised Machine Translation: A technique for translating text between languages without relying on parallel data.
Unsupervised machine translation (UMT) is an emerging field in natural language processing that aims to translate text between languages without the need for parallel data, which consists of pairs of sentences in the source and target languages. This is particularly useful for low-resource languages, where parallel data is scarce or unavailable. UMT leverages monolingual data and unsupervised learning techniques to train translation models, overcoming the limitations of traditional supervised machine translation methods that rely on large parallel corpora.
Recent research in UMT has explored various strategies to improve translation quality. One approach is pivot translation, in which text is translated from the source to a distant target language through a chain of intermediate languages, so that each hop involves a closer language pair and unsupervised alignment becomes easier. Another method initializes unsupervised neural machine translation (UNMT) with synthetic bilingual data generated by unsupervised statistical machine translation (USMT), then improves it incrementally with back-translation. Researchers have also investigated how data size and domain affect the performance of unsupervised MT and transfer learning.
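The multi-hop idea behind pivot translation can be illustrated with a toy sketch. The word-level dictionaries below are hypothetical stand-ins for learned unsupervised models; real systems translate whole sentences with neural models, but the chaining logic is the same:

```python
# Toy sketch of pivot translation: instead of translating a distant pair
# directly, chain translators through a pivot language that is closer to
# both ends. The dictionaries are hypothetical stand-ins for real models.

src_to_pivot = {"gato": "cat", "negro": "black"}     # e.g. Spanish -> English
pivot_to_tgt = {"cat": "Katze", "black": "schwarz"}  # e.g. English -> German

def translate(sentence, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in sentence.split())

def pivot_translate(sentence, hops):
    """Apply a chain of translation tables, one hop at a time."""
    for table in hops:
        sentence = translate(sentence, table)
    return sentence

print(pivot_translate("gato negro", [src_to_pivot, pivot_to_tgt]))
# -> "Katze schwarz"
```

The benefit in practice is that each hop only needs an unsupervised alignment between two relatively similar languages, which is easier to learn than a direct alignment between distant ones.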
Cross-lingual supervision has also been proposed to enhance UMT by leveraging weakly supervised signals from high-resource language pairs for zero-resource translation directions. This allows for the joint training of unsupervised translation directions within a single model, resulting in significant improvements in translation quality. Furthermore, extract-edit approaches have been developed to avoid the accumulation of translation errors during training by extracting and editing real sentences from target monolingual corpora.
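The extract step of extract-edit approaches can be sketched as a retrieval problem: given a rough candidate translation, find the closest real sentence in the target monolingual corpus, so that training signals come from genuine target-language text rather than accumulated model errors. The corpus, the candidate, and the word-overlap similarity below are illustrative simplifications of the learned similarity used in the actual method:

```python
# Toy sketch of the "extract" step: retrieve a real sentence from the
# target monolingual corpus that is closest to a rough candidate
# translation. Word-set overlap stands in for a learned similarity;
# corpus and candidate are hypothetical.

target_corpus = [
    "the cat sat on the mat",
    "a dog ran in the park",
    "the black cat slept",
]

def extract(candidate, corpus):
    """Return the real corpus sentence with the largest word overlap."""
    cand = set(candidate.split())
    return max(corpus, key=lambda s: len(cand & set(s.split())))

rough = "black cat sat mat"  # noisy output of an early-stage model
print(extract(rough, target_corpus))
# -> "the cat sat on the mat"
```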
Practical applications of UMT include translating content for low-resource languages, enabling communication between speakers of different languages, and providing translation services in domains where parallel data is limited. One company leveraging UMT is Unbabel, which combines artificial intelligence with human expertise to provide fast, scalable, and high-quality translations for businesses.
In conclusion, unsupervised machine translation offers a promising solution for translating text between languages without relying on parallel data. By leveraging monolingual data and unsupervised learning techniques, UMT has the potential to overcome the limitations of traditional supervised machine translation methods and enable translation for low-resource languages and domains.

Unsupervised Machine Translation Further Reading
1. Unsupervised Pivot Translation for Distant Languages. Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu. http://arxiv.org/abs/1906.02461v3
2. Unsupervised Neural Machine Translation Initialized by Unsupervised Statistical Machine Translation. Benjamin Marie, Atsushi Fujita. http://arxiv.org/abs/1810.12703v1
3. Zero-Shot Language Transfer vs Iterative Back Translation for Unsupervised Machine Translation. Aviral Joshi, Chengzhi Huang, Har Simrat Singh. http://arxiv.org/abs/2104.00106v1
4. Cross-lingual Supervision Improves Unsupervised Neural Machine Translation. Mingxuan Wang, Hongxiao Bai, Hai Zhao, Lei Li. http://arxiv.org/abs/2004.03137v3
5. Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation. Jiawei Wu, Xin Wang, William Yang Wang. http://arxiv.org/abs/1904.02331v1
6. An Effective Approach to Unsupervised Machine Translation. Mikel Artetxe, Gorka Labaka, Eneko Agirre. http://arxiv.org/abs/1902.01313v2
7. Machine Translation with Unsupervised Length-Constraints. Jan Niehues. http://arxiv.org/abs/2004.03176v1
8. Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation. Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, Jonathan May. http://arxiv.org/abs/1906.05683v1
9. Multilingual Unsupervised Neural Machine Translation with Denoising Adapters. Ahmet Üstün, Alexandre Bérard, Laurent Besacier, Matthias Gallé. http://arxiv.org/abs/2110.10472v1
10. Explicit Cross-lingual Pre-training for Unsupervised Machine Translation. Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma. http://arxiv.org/abs/1909.00180v1

Unsupervised Machine Translation Frequently Asked Questions
What is unsupervised machine translation?
Unsupervised machine translation (UMT) is a technique in natural language processing that translates text between languages without relying on parallel data, which consists of pairs of sentences in the source and target languages. This approach is particularly useful for low-resource languages, where parallel data is scarce or unavailable. UMT leverages monolingual data and unsupervised learning techniques to train translation models, overcoming the limitations of traditional supervised machine translation methods that require large parallel corpora.
How do unsupervised translation algorithms work?
Unsupervised translation algorithms work by leveraging monolingual data in both the source and target languages. They typically use unsupervised learning techniques, such as denoising autoencoders, adversarial training, or cross-lingual embedding alignment, to learn the underlying structure and patterns in each language, and then refine the model with iterative back-translation. In this way they learn to map source-language sentences to target-language sentences without ever relying on parallel data.
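One concrete ingredient behind this mapping is aligning the two languages' embedding spaces. As a minimal sketch, assuming a small set of matched word pairs (which fully unsupervised systems obtain without supervision, e.g. via adversarial training or identical strings), an orthogonal map between the spaces can be fit in closed form with the Procrustes solution; the embeddings below are synthetic:

```python
import numpy as np

# Minimal sketch of cross-lingual embedding alignment via orthogonal
# Procrustes. X holds source-language word embeddings (one column per
# word), Y the corresponding target-language embeddings. The data is
# synthetic: Y is an exact rotation of X, so the fit is exact here.

rng = np.random.default_rng(0)
d, n = 4, 10
X = rng.normal(size=(d, n))                  # source embeddings
Q, _ = np.linalg.qr(rng.normal(size=(d, d))) # hidden "true" rotation
Y = Q @ X                                    # target embeddings

# Procrustes: W = U V^T, where U S V^T is the SVD of Y X^T, minimizes
# ||W X - Y||_F over orthogonal matrices W.
U, _, Vt = np.linalg.svd(Y @ X.T)
W = U @ Vt

print(np.allclose(W @ X, Y))  # the learned map recovers the rotation
```

With noisy real embeddings the recovery is only approximate, and the aligned spaces serve as the starting point for word-by-word translation and subsequent back-translation refinement.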
What are the 4 types of machine translation in NLP?
There are four main types of machine translation in natural language processing:
1. Rule-based machine translation (RBMT): This approach uses linguistic rules and dictionaries to translate text between languages. It relies on expert knowledge of the source and target languages to create these rules.
2. Statistical machine translation (SMT): This method uses statistical models to learn the relationship between the source and target languages based on parallel data. It generates translations by selecting the most probable target language sentence given the source language sentence.
3. Neural machine translation (NMT): This approach uses deep learning techniques, such as recurrent neural networks (RNNs) or transformers, to learn the mapping between the source and target languages. NMT models can generate more fluent and accurate translations compared to SMT.
4. Unsupervised machine translation (UMT): As discussed earlier, UMT translates text between languages without relying on parallel data. It leverages monolingual data and unsupervised learning techniques to train translation models.
Is machine translation supervised?
Machine translation can be either supervised or unsupervised. Supervised machine translation, such as statistical machine translation (SMT) and neural machine translation (NMT), relies on parallel data to learn the relationship between the source and target languages. In contrast, unsupervised machine translation (UMT) does not require parallel data and instead leverages monolingual data and unsupervised learning techniques to train translation models.
What are the challenges in unsupervised machine translation?
Unsupervised machine translation faces several challenges, including:
1. Lack of parallel data: UMT relies on monolingual data, making it difficult to learn the relationship between the source and target languages directly.
2. Lower translation quality: UMT models often produce less accurate translations compared to supervised methods, especially for distant language pairs or complex sentences.
3. Domain adaptation: UMT models may struggle to adapt to new domains or genres, as they rely on the monolingual data available during training.
4. Scalability: Training UMT models can be computationally expensive, especially for large-scale applications or when dealing with multiple languages.
How can unsupervised machine translation be improved?
Recent research has explored various strategies to improve unsupervised machine translation, such as:
1. Pivot translation: Translating a source language to a distant target language through multiple hops, making unsupervised alignment easier.
2. USMT initialization: Initializing unsupervised neural machine translation (UNMT) with synthetic bilingual data generated by unsupervised statistical machine translation (USMT), followed by incremental improvement using back-translation.
3. Cross-lingual supervision: Leveraging weakly supervised signals from high-resource language pairs for zero-resource translation directions, allowing for joint training of unsupervised translation directions within a single model.
4. Extract-edit approaches: Avoiding the accumulation of translation errors during training by extracting and editing real sentences from target monolingual corpora.
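The back-translation loop that ties several of these strategies together can be sketched schematically. The "models" below are hypothetical word-substitution tables and "training" merely absorbs word pairs from synthetic data; real systems update neural models, but the data flow is the same:

```python
# Schematic of iterative back-translation, the refinement loop used by
# most UMT systems. Each direction's model is trained on synthetic
# parallel data produced by the opposite direction's model, using only
# monolingual corpora. All data here is hypothetical.

def translate(sentence, table):
    return " ".join(table.get(w, w) for w in sentence.split())

def train(table, pairs):
    """Toy 'training': absorb word pairs seen in synthetic parallel data."""
    for src, tgt in pairs:
        table.update(zip(src.split(), tgt.split()))

src_mono = ["hund schläft"]   # monolingual source data (e.g. German)
tgt_mono = ["dog sleeps"]     # monolingual target data (e.g. English)
src2tgt = {}                  # no source->target model yet
tgt2src = {"dog": "hund", "sleeps": "schläft"}  # weak seed, e.g. from
                              # unsupervised dictionary induction

for _ in range(2):  # each round can improve both directions
    # back-translate target monolingual text -> synthetic (src, tgt) pairs
    synthetic = [(translate(t, tgt2src), t) for t in tgt_mono]
    train(src2tgt, synthetic)
    # and symmetrically for the other direction
    synthetic = [(translate(s, src2tgt), s) for s in src_mono]
    train(tgt2src, synthetic)

print(translate("hund schläft", src2tgt))
# -> "dog sleeps"
```

The key point is that neither direction ever sees real parallel data: each model bootstraps the other from monolingual text alone.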
What are some practical applications of unsupervised machine translation?
Practical applications of unsupervised machine translation include:
1. Translating content for low-resource languages, where parallel data is scarce or unavailable.
2. Enabling communication between speakers of different languages, especially in situations where supervised translation models are not available or not accurate enough.
3. Providing translation services in domains where parallel data is limited, such as legal, medical, or technical texts.
4. Assisting businesses in expanding their global reach by translating websites, marketing materials, and customer support content without relying on parallel data.