Monday, December 23, 2024

Google bets on African languages, including Dyula, Wolof, Baoulé and Tamazight

“Sran ng’ɔ bo alɛ’n i jɔ’n, ɔ diman alɛ sɔ’n wie.” This Baoulé proverb is now translatable with Google Translate: “He who declares war does not participate in it.”

Since its new update on June 27, the software from the American giant has been able to translate 110 new languages, including Breton and Occitan, as well as 31 African languages, among them Tamazight (Berber), Afar, Wolof, Dyula and Baoulé. According to Google, these languages represent 200 million speakers on the continent.

“Today, you can photograph a label in Mandarin and see it translated by Google Lens into Dyula,” said Abdoulaye Diack, program manager at Google’s artificial intelligence (AI) lab in Accra, Ghana, who wants to “bring communities together” with this new service.

Establishing these translation models was a major challenge, given the lack of available resources. Half of the content written on the internet is in English. French accounts for just 3%, and the many African languages for less than 1%. “There are blogs and news sites in Swahili, Hausa and Wolof, but many African languages have predominantly oral uses,” Diack explained. “So the first task was to identify the written sources available.” In addition to these sites, some major texts have been translated into almost every language on the planet, including the Universal Declaration of Human Rights, the Bible and the Quran.

‘An incentive effect’

Google’s teams then worked with linguists from several faculties, such as the University of Ghana, and NGOs to accumulate data on all the targeted languages. This data was then used to train Google’s AI, a learning model called PaLM 2, which has already been tested with almost 400 languages. “Artificial intelligence is like a child,” Diack summarized. “The more data the model receives, the more it learns, and the better the result.”

Google’s partners in the target communities – including organizations defending endangered languages and researchers – were asked to evaluate and improve the first AI translations until they reached a sufficient quality and quantity to launch the update. “This process takes several years. The results are not perfect, but they are satisfactory enough to be usable,” explained Diack. “There are bound to be mistakes, but it will be useful for a lot of people.”

This view is shared by Ivorian linguist Jérémie N’Guessan Kouadio, co-author of a French-Baoulé dictionary, whom Le Monde asked to test the new Google Translate update. “The Baoulé language is inseparable from its orality,” he said. “To improve the result, we’d need, for example, to be able to render tones, those phonemes that can change the meaning of a word, which we note with diacritical marks below the syllable. Take ‘sa’: If I pronounce it with a high tone, it means ‘the hand.’ But with a low tone, it means ‘thus.’ All the languages of Côte d’Ivoire work like that, including Dyula.”

Despite his reservations, N’Guessan Kouadio acknowledged that the software “has its uses.” “For years, people have been trying to convince Africans – and Ivorians – that they can speak French or English, but also speak and write in their mother tongue,” said the researcher. “I think software like this will have an incentive effect, particularly on young people in the diaspora who have drifted away from their language of origin.”

Speech recognition and synthesis

Professional uses are also conceivable. The African languages previously added (five in 2020 and 10 in 2022, including Bambara, Lingala and Twi) are available as open source through Application Programming Interfaces (APIs), which allow other software to connect to a Google program or service. The software could also facilitate the work of human interpreters, predicted Yao Kanga Tanoh, from Côte d’Ivoire, whose translation orders mainly concern administrative documents: “Of course, I’ll have to rework the result, but a machine translation will save me a lot of time.”

The Silicon Valley giant has no intention of stopping there. It has set itself the medium-term goal of integrating a thousand languages, prioritized according to several criteria: the number of speakers, the feasibility of the project in terms of the abundance of written resources, but also the desire of the relevant community. “People had been asking us for Wolof for years,” said Diack. His team also intends to develop a speech recognition and synthesis system for the recently added languages, as already exists for the previous ones. With this technology, a telephone will be able to instantly repeat a French sentence in Baoulé, a particularly useful option for illiterate speakers.

Google also claims to want to immortalize endangered languages, largely not used by younger generations. One of these is the N’Ko language, invented in 1949 by Guinean writer Solomana Kanté, with its unique alphabet designed to empower Mandingo communities by providing them with their own writing system.

Translation of an original article published in French on lemonde.fr; the publisher may only be liable for the French version.
