- Multilingual BERT, or M-BERT for short, is used to obtain representations of text in many different languages, not just English.
- Similar to BERT, M-BERT is also trained with the masked language modeling and next-sentence prediction tasks, but instead of using only English-language Wikipedia text, M-BERT is trained on Wikipedia text in 104 different languages (a short sketch of obtaining M-BERT representations follows this list).
- M-BERT transfers better between languages that share word order (SVO-SVO, SOV-SOV) than between languages with different word order (SVO-SOV, SOV-SVO).
- Mixing or alternating different languages in a conversation is called code-switching. In transliteration, instead of writing text in the source language script, we use the target language script.
- The XLM model is pre-trained using the causal language modeling, masked language modeling, and translation language modeling tasks.
- Translation language modeling (TLM) is an interesting pre-training strategy. In causal language modeling and masked language modeling, we train the model on monolingual data, but in TLM we train it on cross-lingual data, that is, parallel data consisting of the same text in two different languages (a sketch of how a TLM input can be constructed follows this list).
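
To make the first point concrete, here is a minimal sketch of obtaining M-BERT representations for sentences in different languages. It assumes the Hugging Face transformers library and the publicly available bert-base-multilingual-cased checkpoint; the choice of checkpoint and of the [CLS] hidden state as a sentence-level representation are illustrative assumptions, not requirements.

```python
# A minimal sketch: obtaining M-BERT representations for text in different
# languages, assuming the Hugging Face transformers library and the
# bert-base-multilingual-cased checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentences = [
    "Paris is a beautiful city",    # English
    "Paris est une belle ville",    # French
    "París es una ciudad hermosa",  # Spanish
]

with torch.no_grad():
    for sentence in sentences:
        inputs = tokenizer(sentence, return_tensors="pt")
        outputs = model(**inputs)
        # Take the hidden state of the [CLS] token as a sentence representation
        cls_representation = outputs.last_hidden_state[:, 0, :]
        print(sentence, cls_representation.shape)
```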

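The following sketch shows how a TLM training example can be put together: a parallel English-French sentence pair is concatenated into one sequence, and tokens are randomly masked in both languages so that the model can attend across languages to predict the masked words. Whitespace tokenization, the special tokens, and the masking rate are simplifications assumed here for illustration, not the exact XLM preprocessing.

```python
# A simplified sketch of constructing a translation language modeling (TLM)
# input from a parallel sentence pair. The tokenization and special-token
# layout are assumptions for illustration only.
import random

MASK, SEP = "[MASK]", "[SEP]"

def make_tlm_example(src_sentence, tgt_sentence, mask_prob=0.15, seed=0):
    """Concatenate a parallel sentence pair and randomly mask tokens in both
    languages; return the masked sequence and the prediction targets."""
    rng = random.Random(seed)
    tokens = src_sentence.split() + [SEP] + tgt_sentence.split()
    masked, labels = [], []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)    # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)   # not a prediction target
    return masked, labels

masked, labels = make_tlm_example(
    "I love Paris",   # English sentence
    "J'aime Paris",   # its French translation
    mask_prob=0.3,
)
print(masked)
print(labels)
```

Because the unmasked tokens of one language remain visible while the model predicts the masked tokens of the other, this objective nudges the model toward aligned cross-lingual representations.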