Feature… Not Made For Us: Global AI translation tools distort Nigerian languages

There are over 7,000 languages spoken in the world, but global prestige acquired through colonialism, military conquests, and economic prowess has helped a select few dominate the global stage. From international conference halls to digital platforms, how far one’s voice carries depends on one’s proficiency in certain languages. Thousands of languages have been silenced, even in the new world of artificial intelligence.

AI-powered translation tools such as Google Translate, DeepL, and Amazon Translate are designed to aid communication across cultures and facilitate research in the areas of linguistics and cultural studies. However, these tools are extending inequalities into the future. Nigerian languages, like thousands of languages in the Global South, are either absent from them or poorly represented.

Adebayo Wisdom, a final-year student of Linguistics at the University of Ibadan, shares his frustration over the limited use of artificial intelligence in his department. He acknowledges the transformative significance of AI in his field of study, but laments that he and his colleagues who are exploring Nigerian languages do not get as much value from the tools as their counterparts who major in European languages.

“As a Linguistics student who explores speech sounds in phonology, Google Translate is not a credible tool for translating our indigenous languages because it doesn’t get the tonal features. For instance, it mixes up Yoruba words like igbá, ìgbà, ìgbáá, and igba,” he said.

It’s a similar experience for Sakeena Kareem, a double Honors student of Communication and Language Arts, and European Studies. She majors in German and uses tools such as DeepL, Google Translate, and NotebookLM.

She said translating to and from her mother tongue, Yoruba, would enhance her understanding, but “these tools do not even support Nigerian indigenous languages, apart from Google Translate, which is not reliable.”

Igbo, Yoruba Languages Are Poorly Translated By Google Translate

An experiment by this writer confirms the unreliability of Google Translate in translating Nigerian indigenous languages. Sentences and proverbs in the Yoruba and Igbo languages were entered into Google Translate, and most of them were either poorly translated or mistranslated outright. For instance, it translated the Yoruba saying “Bí ò̩tá e̩ni bá pa odù ò̩yà, wó̩n á so̩ pé ọmọ ẹlé̩bó̩ró̩ ló pa” to “If one’s enemy kills the river, they will say that he killed the son of a thief”. This makes no sense and is plainly misleading. The correct translation is: “If one’s enemy kills an aged bush rat, they would say he killed a small millipede.”

In another instance, the saying “Pípẹ́ ni ọdẹ ńpẹ́, ọdẹ kìí sọnù” was translated to “The hunt is long, the prey is never lost.” The correct translation is: “The hunter can only be late (to return from a hunt), he can’t go missing (in the wild).” Here, Google Translate came close on the first clause, but it treated the two clauses in isolation and missed the cultural context. Taken alone, “the hunt is long” is a defensible literal rendering of “Pípẹ́ ni ọdẹ ńpẹ́”, but it becomes misleading in the translation of the full sentence.

The prompts in Igbo were also mistranslated. The saying “Ọkukọ si na nkpu ya n’eti aburọ ka ya laputa ya. O ka ha nu onu ya” was translated by Google Translate to “The rooster crowed and shouted to bring him back. He let them hear his voice.” The correct translation is “The fowl said that her shouts are not so you would help her but so you could hear her”. In another instance, the saying “Nkpi si na ka ha go shiri ya ọfọ ogonogo, ka ha gọ wa ru ya ọfọ ndu” was wrongly translated to “The goat said, ‘Let them go and give him a big meal, let them go and give him a life meal.’” However, the correct translation is “The he-goat said you should say prayers of long life and not height for him.”

The results of the same experiment for Hausa were near-accurate. Linguists and language students who have tested Google Translate with Hausa say this near-accuracy is limited to written text, and attribute it to Hausa not being a tonal language like Igbo and Yoruba. “In comparison to other Nigerian languages, Hausa does a little better in machine translation because it is in the Afro-Asiatic language family, which has some well-resourced languages like Arabic. Its phonological structure is also simpler than Igbo, Yoruba, and many African languages,” explained Sadiq Abubakar, a Hausa language instructor and translator.

Meanwhile, these mistranslations and other forms of linguistic inequality suffered by Nigerian and African languages in global AI translation tools, as well as in Large Language Models (LLMs), have real-life consequences for linguists, language students, researchers and cross-cultural communicators.

Quadri Yahaya, a 300-level student of linguistics at the University of Abuja, said the poor representation of Nigerian languages in global LLMs makes research frustrating. He noted that he had tried translating Yoruba words with Google Translate and ChatGPT, but “when it comes to African languages, I find it difficult to do quick research with LLMs. That means going through the route of reviewing research papers, for what LLMs could have helped with easily, if only they were African-centric.”

Just as Adebayo pointed out that they misinterpret tones in speech, these AI tools also misconstrue contexts and cultural meanings in translating Nigerian languages. Sometimes, they bastardise our languages.

Rasheed Adeniyi, a Yoruba language instructor and academic, explained that the tonal nature of the Yoruba language makes it difficult for popular translation and Gen AI tools to translate Yoruba sentences or give accurate information on the language.

“As you know, the Yoruba language is a tonal language. We have the low tone, the middle tone and the high tone. So it is very difficult for AI or machine learning to understand that. Likewise, the context is also important. For example, if I say Igbá, ìgbà, igba. They are of the same spelling, but have different pronunciation and different meanings. So these are some of the challenges that make it difficult for AI to understand.”
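The collision Adeniyi describes can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a text pipeline that discards diacritics before processing; nothing in this report establishes that any particular translation tool works this way. The function name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove combining marks (which include Yoruba tone marks) from a word."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Unicode category "Mn" covers combining marks; dropping them keeps base letters only.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# The three spellings cited above (lowercased), which differ only in tone marks.
words = ["igbá", "ìgbà", "igba"]

# Assumption: a hypothetical pipeline that ignores tone marks.
collapsed = {strip_diacritics(w) for w in words}

print(len(words))   # 3 distinct written forms go in
print(collapsed)    # {'igba'}: only one form comes out
```

Once the marks are gone, the three words are indistinguishable, so any system that cannot rely on them has to guess the intended meaning from context alone.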

He said he doesn’t encourage his students in the diaspora to use AI at all, “because the information they would get from there is likely to be wrong.” He added that “learning through AI can only help someone who already knows a large part of the language.”

Illusion of Inclusion

While DeepL and Amazon Translate do not support Nigerian languages at all, Google Translate boasts of including languages from the Global South. As outlined on the Google Cloud Translation languages page, the multilingual neural machine translation tool developed by Google supports over 200 languages as of 2025, and the list includes Nigeria’s three major languages: Hausa, Igbo, and Yoruba.

However, Nigerian indigenous languages and many other African languages are translated using the general Neural Machine Translation (NMT) model. Experts say that because the model relies on large volumes of high-quality data, it performs poorly when translating low-resource languages like Igbo and Yoruba. Researchers specifically note serious inaccuracies in the NMT model’s performance when translating complex sentences, idiomatic expressions, and culturally nuanced language.

In contrast, high-resource languages, particularly European languages, are paired under custom models for automated translation. Hence, they perform better when translating from and to one another.

Low Resources, Inadequate Digital Data

Meanwhile, experts say English is the default language of Generative AI because it is a highly resourced language. Only about 100 of the 7,000 languages spoken in the world have a moderate to substantial amount of natural language processing (NLP) resources, and just about 20 languages are considered high-resource. Languages such as English, Chinese (Mandarin), Spanish, French, Dutch, Japanese, and Arabic are notable high-resource languages that have extensive datasets, linguistic research, dictionaries, corpora, annotated data, and technological support such as speech recognition and machine translation systems, enabling robust AI and NLP development.

On the other hand, most languages spoken in Asia and Africa, such as Yoruba, Igbo, Somali, Swahili, Tigrinya, Kinyarwanda, Thai, and Myanmar, are low-resource languages with insufficient digital linguistic resources to effectively support computational tasks like machine translation, text understanding, and language generation. Although new multilingual models like mT5, BLOOM, and xLLMs‑100 are emerging to reduce this inequality, experts say their performance on low-resource languages is still poor due to data scarcity and tokenisation challenges.
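One plausible ingredient of the tokenisation challenge can be sketched in Python. The experts cited do not specify the mechanism, so this is an assumption: the sketch only counts how many code points and UTF-8 bytes marked Yoruba letters occupy, which matters for tokenisers that operate on bytes. The function name utf8_footprint is hypothetical.

```python
import unicodedata


def utf8_footprint(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for text in composed (NFC) form."""
    composed = unicodedata.normalize("NFC", text)
    return len(composed), len(composed.encode("utf-8"))


# The same three letters without and with Yoruba under-dots.
plain = utf8_footprint("omo")
marked = utf8_footprint("ọmọ")
# A single letter carrying both an under-dot and a low-tone mark:
# U+1ECD (o with dot below) followed by U+0300 (combining grave accent).
toned = utf8_footprint("\u1ecd\u0300")

print(plain, marked, toned)  # (3, 3) (3, 7) (2, 5)
```

A three-letter word more than doubles in byte length once its under-dots are written, and a single toned letter takes five bytes. A byte-level tokeniser that has seen little Yoruba text may therefore split such words into more pieces than comparable unmarked words.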

There is also a severe shortage of both labeled and unlabeled data for the low-resource languages. Existing data is often mislabeled, insufficient, or inappropriate for NLP tasks. Experts say available data is often limited to religious texts, legal documents, or Wikipedia articles, which do not reflect everyday language usage or sociocultural nuances.

The Inequality Is Not In The Tools But In Data Representation

Nigerian linguist, writer and scholar Kola Tubosun noted that African languages are poorly represented in LLMs and translation tools because of a dearth of African data, “particularly in digital form”. He stressed that while the Nigerian indigenous languages are rich in literature, most of the literature that explores the intricacies of the languages is not available in digital formats and is therefore difficult to use in training AI language models.

Ayantola Alayande, a researcher at the Global Center on AI Governance, discourages looking at the challenge from a victim perspective. He argues that “the inequality is not in the tools but in the data representation.” This is because African data often constitutes less than 1% of the total dataset employed in global LLMs and NLP systems.

Alayande, who described the situation as a chicken-and-egg problem, explained that the datasets on which global AI tools are built do not contain sufficient data on Nigerian or African languages, hence their poor performance in these languages.

The PhD candidate at the University of Oxford makes a case for AI/Data sovereignty and capabilities in Africa, “because we don’t have to use OpenAI; we don’t have to use LLaMA 3; we don’t have to use Google Gemini and so on.”

He continues: “I think the conversation should be broader about capabilities and sovereignty on the African continent… Even today, you have so many people building local language tools, right? I don’t expect people to start to say, ‘why is this model not performing well in French?’ Because initially, that’s not the primary, that’s not the primary target of the product.”

For Tubosun, “the focus shouldn’t be in the ‘representation’ for its own sake,” but in its usefulness to Africans. He stressed that our focus should be on how AI tools can be employed to solve real-life problems and drive educational and technological growth.

“If all we end up with is a chance for more colonial exploitation, because our languages and cultures are now more accessible, then what’s the point of that representation? Can a non-English speaker of any Nigerian language use the modern tools of technology to solve their problems? That should be the goal, and we can arrive there in many ways — which include improved education in our local languages, in English, and in technology education from an early age, which can lead to scaled competence with the current and future iterations of these technologies with our needs in mind.”

Local Initiatives Trying to Bridge the Gap

Meanwhile, some Nigerian startups are working on bridging the gap, particularly by building datasets. As Tubosun and Alayande have identified, the inequality stems from poor data representation. Notable initiatives in Nigeria include Awarri and HausaNLP, as well as Masakhane for African languages.

Awarri is developing Nigeria’s first multilingual Large Language Model (LLM), aimed at promoting local language representation. Founded in 2019 by Nigerian-British robotics engineer Silas Adekunle, Awarri says it wishes to democratise AI in Africa through “collection and annotation of high-quality contextualised and localised data (i.e., native intelligence).”

The natural language data it is collecting, its vision document says, would include all forms of text, audio and video in different dialects and accents, as well as “geographic data, Demographic data, Cultural data such as arts, music, history, traditions, customs, and beliefs etc, Agricultural data such as farming practices, agricultural inputs, soil data etc, and other environmental data such as natural resources, plant, and animal species etc.”

HausaNLP is focused on building Natural Language Processing (NLP) resources for the Hausa language. It is developing an open-source repository of datasets and tools that would be able to perform multiple NLP tasks, including text classification, machine translation, speech recognition, question answering, and named entity recognition.

Various other startups and nonprofits are working in different intersections of Nigerian languages and technology, including NaijaNLP, Yorubanames, and Data Science Nigeria, among others. However, these endeavours are yet to produce products that can serve as alternatives to global translation or generative AI tools.

Alayande commends these initiatives but acknowledges challenges in data computation and digitisation, the availability of local talent, and funding. He stressed that the LLM and NLP value chain is complex and expensive, and takes time to bear fruit.

For Africa to efficiently join the AI revolution, according to Alayande, the conversation should be more about data sovereignty as well as efficient digitisation and computation. Tubosun advised that the continent and her people have to be clear on the desired goals for joining the AI wave. “What are we trying to achieve? Improvement in educational outcomes? Increased language use? Better integration of African languages in technology? Improved technological literacy? More language interoperability among African cultures, etc. Each will need different strategies.” Without a sense of clarity on purpose, our inclusion may be unhelpful.

By: Oluwatobi Odeyinka

This report was produced with support from the Centre for Journalism Innovation and Development (CJID) and Luminate.
