Friday, September 20, 2024

Microsoft finds success with Mixture of Experts technique in Phi-3.5


Microsoft has announced a new family of language models. The Phi-3.5 line comprises three models, among them, for the first time, one that uses the Mixture of Experts (MoE) technique. That technique brings the model close to the level of GPT-4o mini.

Microsoft has made Phi-3.5 available on Hugging Face in three variants: Phi-3.5-vision, Phi-3.5-MoE and Phi-3.5-mini. With this series, Microsoft is experimenting with the Mixture of Experts technique for the first time, and the approach appears to be paying off. Phi-3.5-MoE scores higher than Llama-3.1-8B, Gemma-2-9B and Gemini-1.5 Flash on the most commonly used AI benchmarks, even though the MoE model is built from much smaller expert models of 3.8B parameters each.

The Mixture of Experts technique combines multiple models, called "experts," within a single network; in this case there are sixteen. During inference, however, the model routes each input to only two experts at a time, so it uses just 6.6 billion active parameters.
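The routing idea can be illustrated with a short sketch. Below is a minimal top-2 MoE layer in PyTorch: the sixteen experts and the two active experts per token follow the article, but the layer dimensions, the gating details and the class name TopTwoMoELayer are assumptions for illustration, not Microsoft's actual implementation.

```python
# Minimal sketch of top-2 Mixture of Experts routing (illustrative only).
# 16 experts and 2 active experts per token follow the article; all sizes
# and gating details here are assumptions, not Phi-3.5-MoE's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopTwoMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward network per "expert".
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(num_experts)
        )
        # The router scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                           # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
        out = torch.zeros_like(x)
        # Only the two selected experts run per token, so only a fraction of
        # the layer's total parameters is "active" for any given input.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


layer = TopTwoMoELayer()
tokens = torch.randn(4, 512)
print(layer(tokens).shape)  # torch.Size([4, 512])
```

The sketch makes the trade-off visible: all sixteen experts count toward the model's total size, but each token only pays the compute cost of the two experts the router selects.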

Another advantage of the technique lies in training the LLM: training requires less computing power and can be done with cheaper hardware. Phi-3.5-MoE was trained on 4.9 trillion tokens using 512 H100 GPUs. The mini model in the Phi-3.5 family had the same computing power available and was trained on 3.4 trillion tokens.

Text and images

Finally, Phi-3.5-vision was trained on 500 billion tokens using 256 A100 GPUs. The result is a 4.2B model. Its notable feature is the ability to process both text and images; images or videos can therefore be given as input.

All three models have a context window of 128K tokens. They are available through Hugging Face under an MIT license, so developers can use the AI models as Microsoft releases them or adapt them to their own needs.
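Because the models are published on Hugging Face, they can be pulled in with the standard transformers workflow. The snippet below is a hedged sketch for the mini model: the model id "microsoft/Phi-3.5-mini-instruct", the prompt and the generation settings are assumptions, so check the model card for the exact id and recommended usage.

```python
# Hedged sketch: loading a Phi-3.5 model from Hugging Face with transformers.
# The model id below is an assumption; consult the model card for the exact
# id, prompt format and hardware requirements.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # place weights on available hardware
    torch_dtype="auto",       # use the dtype recommended by the checkpoint
    trust_remote_code=True,   # harmless if the architecture is already built in
)

messages = [{"role": "user", "content": "Explain Mixture of Experts in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same pattern applies to the MoE and vision variants, although the vision model additionally expects image inputs via its processor, as described on its model card.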

