Meta Platforms has created smaller versions of its Llama artificial intelligence models that can run on smartphones and tablets, opening new possibilities for AI beyond data centers.
The company today announced compressed versions of its Llama 3.2 1B and 3B models that run up to four times faster while taking up less than half the storage of the original versions. According to Meta's testing, these smaller models perform nearly as well as their full-size counterparts.
The advancement uses a compression technique called quantization, which stores the numbers that make up a model at lower precision, so the calculations that power it need less memory and computing power. Meta combined two methods: Quantization-Aware Training with LoRA adaptors (QLoRA) to maintain accuracy, and SpinQuant to improve portability.
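Quantization is easiest to see on a handful of numbers. The Python sketch below is an illustrative assumption, not Meta's implementation: it shows plain symmetric, groupwise 4-bit rounding (a group size of 4 and one floating-point scale per group, both chosen here for readability), the basic step that methods such as QAT and SpinQuant refine. It does not model QLoRA's training or SpinQuant's rotations.

```python
# Minimal sketch of symmetric, groupwise 4-bit weight quantization.
# Assumptions (illustrative only, not Meta's pipeline): group size of 4,
# signed 4-bit integer codes in [-8, 7], one float scale per group.
from typing import List, Tuple

QMIN, QMAX = -8, 7  # range of a signed 4-bit integer


def quantize_groupwise(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Map each group of float weights to 4-bit integer codes plus one scale."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Choose the scale so the largest-magnitude weight maps to QMAX.
        scale = max_abs / QMAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            codes.append(max(QMIN, min(QMAX, q)))  # clamp into the 4-bit range
    return codes, scales


def dequantize_groupwise(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights by multiplying each code by its group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.12, -0.50, 0.33, 0.07, 1.40, -0.90, 0.05, 0.60]
    codes, scales = quantize_groupwise(weights)
    restored = dequantize_groupwise(codes, scales)
    print("codes:   ", codes)
    print("restored:", [round(r, 3) for r in restored])
```

Each weight now needs 4 bits instead of the 16 used by the common BF16 format, plus a small per-group scale, which is where most of the size savings come from. The cost is rounding error of at most half a scale step per weight, which is what accuracy-preserving techniques like QLoRA aim to compensate for.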
This technical achievement addresses a key problem: running advanced AI without massive computing power. Until now, sophisticated AI models have generally required data centers and specialized hardware.
Tests on OnePlus 12 Android phones showed the compressed models were 56% smaller and used 41% less memory while processing text more than twice as fast. The models can handle inputs of up to 8,000 tokens, enough for most mobile apps.
Tech giants race to define AI’s mobile future
Meta’s release intensifies a strategic battle among tech giants to control how AI runs on mobile devices. While Google and Apple take careful, controlled approaches to mobile AI — keeping it tightly integrated with their operating systems — Meta’s strategy is markedly different.
By open-sourcing these compressed models and partnering with chip makers Qualcomm and MediaTek, Meta bypasses traditional platform gatekeepers. Developers can build AI applications without waiting for Google’s Android updates or Apple’s iOS features. This move echoes the early days of mobile apps, when open platforms dramatically accelerated innovation.
The partnerships with Qualcomm and MediaTek are particularly significant. These companies power most of the world's Android phones, including devices in emerging markets where Meta sees growth potential. By optimizing its models for these widely used processors, Meta ensures its AI can run efficiently on phones across different price points, not just premium devices.
The decision to distribute through both Meta’s Llama website and Hugging Face, the increasingly influential AI model hub, shows Meta’s commitment to reaching developers where they already work. This dual distribution strategy could help Meta’s compressed models become the de facto standard for mobile AI development, much as TensorFlow and PyTorch became standards for machine learning.
The future of AI in your pocket
Meta’s announcement today points to a larger shift in artificial intelligence: the move from centralized to personal computing. While cloud-based AI will continue to handle complex tasks, these new models suggest a future where phones can process sensitive information privately and quickly.
The timing is significant. Tech companies face mounting pressure over data collection and AI transparency. Meta’s approach — making these tools open and running them directly on phones — addresses both concerns. Your phone, not a distant server, could soon handle tasks like document summarization, text analysis, and creative writing.
This mirrors other pivotal shifts in computing. Just as processing power moved from mainframes to personal computers, and computing moved from desktops to smartphones, AI appears ready for its own transition to personal devices. Meta’s bet is that developers will embrace this change, creating applications that blend the convenience of mobile apps with the intelligence of AI.
Success isn’t guaranteed. These models still need powerful phones to run well. Developers must weigh the benefits of privacy against the raw power of cloud computing. And Meta’s competitors, particularly Apple and Google, have their own visions for AI’s future on phones.
But one thing is clear: AI is breaking free from the data center, one phone at a time.