A group of engineers, researchers and a Silicon Valley-based chip company collaborated to release advanced Arabic language software that can power generative AI applications.
The new large language model, called Jais, contains 13 billion parameters and was trained on a large batch of data combining Arabic and English, a portion of which is computer code.
The group, which included academics and engineers, embarked on the project in part because, they said, there are few large language models that are bilingual.
The new language model was created with the help of supercomputers produced by the Silicon Valley-based Cerebras Systems, which designs dinner-plate-sized chips that compete with Nvidia’s powerful AI hardware. Nvidia’s chips are in short supply, which has driven companies around the world to seek alternatives.
Named after the highest peak in the United Arab Emirates, Jais is a collaboration between Cerebras, Mohamed bin Zayed University of Artificial Intelligence and a subsidiary of the Abu Dhabi-based tech conglomerate G42 called Inception, which focuses on AI.
Because there is not enough Arabic data to train a model of Jais’ size, the computer code within the English language data helped train the model’s ability to reason, according to Mohamed bin Zayed University of Artificial Intelligence professor Timothy Baldwin.
“(Code) gives the model a big leg up in terms of reasoning abilities, because it spells out the (logical) steps,” Baldwin told Reuters.
Jais will be available via an open source license.
The group trained the Jais model on a Cerebras supercomputer called a Condor Galaxy built in partnership with G42. This year Cerebras announced it had agreed to build three such units with G42, with the first scheduled to arrive this year and two additional units to be delivered in 2024.
“This model was trained, from start to finish, of 13 billion (parameters), in three and a half days,” Cerebras CEO Andrew Feldman said. “But there was months of work before that.”