Microsoft released the Phi-3.5 family of artificial intelligence (AI) models on Tuesday as the successor to the Phi-3 models introduced in April. The new release comprises the Phi-3.5 Mixture of Experts (MoE), Phi-3.5 Vision, and Phi-3.5 Mini models. These are instruct models, so rather than working as a typical conversational AI, they require users to provide specific instructions to get the desired output. The open-source AI models are available to download from the tech giant’s Hugging Face listings.
Microsoft Releases Phi-3.5 AI Models
The release of the new AI models was announced by Microsoft executive Weizhu Chen in a post on X (formerly known as Twitter). The Phi-3.5 models offer upgraded capabilities over their predecessors, but the architecture, dataset, and training methods largely remain the same. The Mini model has been updated with multilingual support, while the MoE and Vision models are new additions to the AI model family.
Coming to technicalities, the Phi-3.5 Mini has 3.8 billion parameters. It uses the same tokeniser (a tool that breaks down text into smaller units) as Phi-3 and a dense decoder-only transformer architecture. The model accepts only text as input and supports a context window of 128,000 tokens. The company claims it was trained on 3.4 trillion tokens between June and August, and that its knowledge cutoff is October 2023.
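For those curious about what using an instruct model looks like in practice, here is a minimal sketch based on the Hugging Face transformers library. The listing name "microsoft/Phi-3.5-mini-instruct" is an assumption based on Microsoft's naming pattern and should be verified against the actual Hugging Face page.

```python
# Minimal sketch: loading Phi-3.5 Mini via Hugging Face transformers.
# The listing name below is assumed; check the actual Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3.5-mini-instruct"  # assumed listing name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Instruct models expect a chat-style prompt rather than free-form text.
messages = [
    {"role": "user", "content": "Summarise the benefits of small language models."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```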
One key highlight of this model is that it now supports multiple new languages, including Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, and Ukrainian.
The Phi-3.5 Vision AI model has 4.2 billion parameters and includes an image encoder that allows it to process the information within an image. It has the same 128,000-token context window as the Mini model and accepts both text and images as input. It was trained on 500 billion tokens of data between July and August, and has a text knowledge cutoff of March 2024.
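A hedged sketch of how an image-plus-text prompt might be passed to the Vision model is shown below. The listing name and the <|image_1|> prompt convention are assumptions carried over from the earlier Phi-3 Vision release; the model card on Hugging Face is the authoritative reference.

```python
# Hedged sketch: sending an image and a text prompt to Phi-3.5 Vision.
# Listing name and prompt format are assumed from Phi-3 Vision conventions.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"  # assumed listing name
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True, device_map="auto"
)

image = Image.open("chart.png")
# Phi-3 style vision prompts reference attached images via <|image_N|> tags.
prompt = "<|user|>\n<|image_1|>\nDescribe this chart.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```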
Finally, the Phi-3.5 MoE AI model has 16×3.8 billion parameters. However, only 6.6 billion of them are active when using two experts. Notably, MoE is a technique where a model is divided into multiple sub-networks (experts), and a routing mechanism activates only a small subset of them for each input, improving the efficiency and accuracy of the model. This model was trained on 4.9 trillion tokens of data between April and August, and it has a knowledge cutoff date of October 2023.
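To illustrate the idea, here is a toy sketch of top-2 expert routing in PyTorch. This is not Microsoft's actual implementation, only a generic example of the technique: a router scores 16 experts per token, and only the two highest-scoring experts run, so most parameters stay inactive.

```python
# Illustrative Mixture-of-Experts layer with top-2 routing (generic sketch,
# not Microsoft's implementation): 16 experts, 2 active per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                        # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # pick the top 2 experts
        weights = F.softmax(weights, dim=-1)           # normalise their weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique().tolist():
                mask = idx[:, k] == e  # tokens routed to expert e at rank k
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64]); only 2 of 16 experts ran per token
```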
On performance, Microsoft shared benchmark scores for each of the models. Based on the data shared, the Phi-3.5 MoE outperforms both Gemini 1.5 Flash and GPT-4o mini on the SQuALITY benchmark, which tests readability and accuracy when summarising a long block of text, effectively stressing the AI model's long context window.
However, it should be mentioned that this is not an entirely fair comparison, since MoE models use a different architecture and require more storage space and more sophisticated hardware to run. Separately, the Phi-3.5 Mini and Vision models have also outperformed competing AI models in the same segment on some metrics.
Those interested in trying out the Phi-3.5 AI models can access them via the company's Hugging Face listings. Microsoft said these models use flash attention, which requires users to run them on advanced GPUs. The company has tested them on Nvidia A100, A6000, and H100 GPUs.
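As a rough sketch of what that requirement means in code, the transformers library exposes a standard option for selecting the flash attention kernel at load time. The listing name is again an assumption; the flag itself requires the separate flash-attn package and a supported GPU such as those Microsoft tested on.

```python
# Hedged sketch: requesting the flash attention kernel when loading the model.
# attn_implementation="flash_attention_2" is a standard transformers option;
# it needs the flash-attn package and a supported GPU (e.g. A100/H100).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",        # assumed listing name
    torch_dtype=torch.bfloat16,               # flash attention needs fp16/bf16
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```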