Arabic AI Systems: Seizing the Opportunities

24 July 2024

As we’ve already explored, there is no shortage of challenges in building AI systems for the Arabic language. So why are massive corporations sinking significant resources into the task? Simply put, there are too many opportunities to ignore. 

The previous post delved into the challenges that Samsung, an international electronics giant, faced when building Arabic AI systems for Galaxy AI. But the serious effort the company put into overcoming those issues reflects the potential that Samsung, and many other corporate giants, see in perfecting Arabic AI. What’s more, business advocates in the MENA region believe a proactive attitude toward emerging technologies is essential to keep their countries from lagging behind.

The problem? The vast majority of large language models fueling AI systems are trained on text in Latin-alphabet languages, mostly English.

“Making access to AI tools exclusive to those who can speak specific languages could prevent disadvantaged cross-sections of societies from reaping the benefits of AI,” Mohammed Soliman, director of strategic technologies and the cyber security program at the Middle East Institute in Washington DC, told CNN. “[These LLMs] lack awareness of other cultures, adversely affecting the user experience for people of diverse backgrounds.”

But companies are spending big money to solve those problems. Microsoft, for instance, plans to invest $1.5 billion in Abu Dhabi-based company G42, promising “world-leading standards for safe, trusted and responsible AI.”

“Microsoft’s investment deepens the reciprocal commitment to this strategic partnership,” the company stated in its blog post. “G42 will run its AI applications and services on Microsoft Azure and partner to deliver advanced AI solutions to global public sector clients and large enterprises.”

The Washington Post reports that geopolitical considerations also factor into these investment decisions. Dominating investment within the MENA region could reduce “China’s influence in the Gulf region amid rising technological competition with the United States.”

“[Coordinating business and national security interests] includes collaborating with countries like the UAE, a global player in cutting-edge technology, and working toward verifiable commitments on how these technologies should be safely developed, protected, and deployed,” U.S. Department of Commerce spokesperson Brittany Caplin told the Washington Post. “When responsibly managed, investments like the one announced today have the potential to further innovation in digital technologies around the world.” 

According to the Washington Post, the G42 deal with Microsoft began unfolding in late 2022. With an official partnership announced, G42 told the Post it has busied itself removing Chinese components from its technology while incorporating more of Microsoft’s.

“Through Microsoft’s strategic investment, we are advancing our mission to deliver cutting-edge AI technologies at scale,” Peng Xiao, G42 chief executive officer, said in the deal announcement. “This partnership significantly enhances our international market presence, combining G42’s unique AI capabilities with Microsoft’s robust global infrastructure. Together, we are not only expanding our operational horizons but also setting new industry standards for innovation.”

According to CNN, Jais is another AI system being developed with an eye toward Arabic’s unique challenges. A collaboration between Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence, Silicon Valley-based Cerebras Systems, and G42 subsidiary Inception, Jais is trained on both English and Arabic datasets, substantially improving its accuracy and clarity over systems trained solely on English datasets. That training also makes the system far more adept at handling different Arabic dialects, according to Timothy Baldwin, acting provost and professor of natural language processing at Mohamed bin Zayed University of Artificial Intelligence.

“There’s certainly room for improvement there, but the focus has been more on the robustness in terms of being able to understand if we do have more informal inputs to the model,” he told CNN.