Mind your language: The battle for linguistic diversity in AI

UN
23 Mar 2025, 17:30 GMT+10

For two years, one international organization under the umbrella of the UN has been leading a relentless campaign in the corridors of global digital diplomacy. Its mission? To bring linguistic diversity to English-dominated artificial intelligence.

With his signature geeky glasses and TED-Talk-style headset, Sundar Pichai looked straight out of a Silicon Valley incubator.

That Monday, February 10, Google's chief executive took the stage at the Artificial Intelligence Action Summit in Paris. From the Grand Palais podium, he heralded a new golden age of innovation.

"Using AI techniques, we added over 110 new languages to Google Translate last year, spoken by half a billion people around the world," said the tech mogul, his eyes fixed on his notes. "That brings our total to 249 languages, including 60 African languages more to come."

Delivered in a monotone, his statement barely registered among the summit's attendees, an assembly of world leaders, researchers, NGOs, and tech executives.

Permanent Mission of Canada

But for advocates of linguistic diversity in artificial intelligence, Mr. Pichai's words marked a quiet victory, one achieved after two years of intense, behind-the-scenes negotiations in the arcane world of digital diplomacy.

"It shows the message is getting through and tech companies are listening," said Joseph Nkalwo Ngoula, digital policy advisor at the UN mission of the International Organisation of La Francophonie, in New York.

Linguistic divide

Mr. Pichai's speech was a far cry from the linguistic missteps of early generative AI, a branch of artificial intelligence capable of creating original content, from text to images, music and animation.

When OpenAI launched ChatGPT in 2022, non-English speakers quickly discovered its limitations.

A query in English would generate a detailed, informative response. The same prompt in French? Two paragraphs, followed by a sheepish apology: "Sorry, I haven't been trained on that," or, "my model isn't updated beyond this date."

The gap stems from the intricate mechanics of AI tools, which rely on so-called large language models (LLMs) like GPT-4, Meta's Llama, or Google's Gemini to digest vast troves of internet data that help them understand and generate text.

But the internet itself is overwhelmingly Anglophone. While only 20 per cent of the world's population speaks English at home, nearly half of the training data for major AI models is in English.

Even today, ChatGPT's responses in French, Portuguese, or Spanish, though improved, remain less illuminating than their English counterparts.

UN Photo/Elma Okic

Sharper focus

"The volume of available information in English is much greater, but its also more up to date," said Mr. Nkalwo Ngoula. By default, AI models are conceived, trained, and deployed in English, leaving other languages struggling to catch up.

The divide isn't just quantitative. AI, when deprived of robust training in any given language, starts to "hallucinate," generating incorrect or absurd answers with unsettling authority, much like an overconfident friend bluffing his way through trivia night.

A classic AI hallucination consists of responding to a request for biographical details about a famous person by inventing a Nobel Prize or coming up with an odd parallel career, as in this example generated by ChatGPT, at the behest of :

: Who is Victor Hugo?

Black box

"Its a black box absorbing data," Mr. Nkalwo Ngoula explained. "The results might be formally coherent and logically structured, but factually, they can be wildly inaccurate."

Beyond factual errors, AI tends to flatten linguistic richness. Chatbots struggle with regional accents and language variations, such as Quebecois French or Creole languages spoken in Haiti and the French Caribbean.

AI-generated French often feels sanitized, stripped of its stylistic nuances.

"Molire, Lopold Sdar Senghor, Aim Csaire, Mongo Beti - theyd all be turning in their graves if they saw how A.I. writes French today," joked Mr. Nkalwo Ngoula.

The issue runs deeper in multilingual countries, as in the diplomat's native Cameroon, where youth commonly speak Camfranglais, a hybrid of French, English, Pidgin, and local languages.

"I doubt young people could ask an AI something in Camfranglais and get a meaningful response," he said. Expressions like "Je yamo ce pays" (I love this country) or "Rponds-moi sharp-sharp" (Answer me quickly) would likely leave A.I. models bewildered.

UN Photo/Loey Felipe

Shadow Campaign of La Francophonie

Mr. Nkalwo Ngoula's organization, La Francophonie, which brings together 93 states and governments around the use of French and represents more than 320 million people worldwide, has made this linguistic gap a centerpiece of its digital strategy.

The group's efforts culminated in last year's UN Global Digital Compact, a framework for AI governance adopted by the Member States. From 2023 onward, La Francophonie leveraged its diplomatic network, including the influential Francophone Ambassadors' Group at the UN, to ensure linguistic diversity became a core principle in AI policymaking.

Along the way, unexpected allies emerged. Lusophone and Hispanic advocacy groups joined the fight, and even Washington sided with their cause. "The US defended language inclusion in AI development," Mr. Nkalwo Ngoula noted.

Their push paid off. The final Global Digital Compact explicitly recognizes cultural and linguistic diversity, an issue that had initially been buried under broader discussions on accessibility. "Our goal was to bring it to the forefront," he said.

The movement even reached Silicon Valley. At the UN Summit of the Future in September 2024, where the Compact was officially adopted, Sundar Pichai, Google's CEO, surprised many by emphasizing the need for AI to provide access to global knowledge in multiple languages.

"Were working toward 1,000 of the worlds most spoken languages," he pledged a commitment he reaffirmed in Paris months later.

Limits of the Global Digital Compact

Despite these gains, challenges remain. Chief among them is visibility. "Francophone content is often buried by platform algorithms," Mr. Nkalwo Ngoula warned.

Streaming giants like Netflix, YouTube, and Spotify prioritize popularity, meaning English-language content dominates search results.

"If linguistic diversity were truly considered, a French-speaking user should see French-language films at the top of their recommendations," he argued.

The overwhelming dominance of English in AI training data is another hurdle sidestepped by the Compact, which also omits any reference to UNESCO's Convention on Cultural Diversity, an oversight that, according to Mr. Nkalwo Ngoula, should be rectified.

"Linguistic diversity must be the backbone of digital advocacy for La Francophonie," Nkalwo Ngoula insisted.

Given the pace of AI development, those changes can't come a moment too soon.
