Monday, December 23, 2024

Google Translate adding support for 110 new languages, including Cantonese

Google Translate is getting its “largest expansion ever” today with 110 new languages.

Google credits its PaLM 2 large language model (released in 2023, before Gemini) with making this possible:

PaLM 2 was a key piece to the puzzle, helping Translate more efficiently learn languages that are closely related to each other, including languages close to Hindi, like Awadhi and Marwadi, and French creoles like Seychellois Creole and Mauritian Creole. 

These additions cover more than 614 million speakers, “opening up translations for around 8% of the world’s population.” It is Google’s largest expansion of African languages to date, with a quarter of today’s additions coming from Africa.

Some are major world languages with over 100 million speakers. Others are spoken by small communities of Indigenous people, and a few have almost no native speakers but active revitalization efforts.


  • Afar is a tonal language spoken in Djibouti, Eritrea and Ethiopia. Of all the languages in this launch, Afar had the most volunteer community contributions.
  • Cantonese has long been one of the most requested languages for Google Translate. Because Cantonese often overlaps with Mandarin in writing, it is tricky to find data and train models.
  • Manx is the Celtic language of the Isle of Man. It almost went extinct with the death of its last native speaker in 1974. But thanks to an island-wide revival movement, there are now thousands of speakers.
  • NKo is a standardized form of the West African Manding languages that unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today.
  • Punjabi (Shahmukhi) is the variety of Punjabi written in Perso-Arabic script (Shahmukhi), and is the most spoken language in Pakistan. 
  • Tamazight (Amazigh) is a Berber language spoken across North Africa. Although there are many dialects, the written form is generally mutually understandable. It’s written in Latin script and Tifinagh script, both of which Google Translate supports. 
  • Tok Pisin is an English-based creole and the lingua franca of Papua New Guinea. If you speak English, try translating into Tok Pisin — you might be able to make out the meaning!

Moving forward, Google wants to “support even more language varieties and spelling conventions over time.” The broader goal is to “build AI models that will support the 1,000 most spoken languages around the world.”
