It is simply not sustainable to keep training more advanced AI models using current energy technology. We need models to be trained faster, cheaper, and in more environmentally friendly ways. Google DeepMind has now shared new research on JEST (Joint Example Selection Training), a way of training AI models that is 13 times faster and 10 times more power-efficient than current techniques.
As the AI industry grows, so do concerns about the environmental impact of the data centers required to train these sophisticated models. The JEST method arrives just in time, addressing the escalating energy demands of AI training. By significantly reducing computational overhead, JEST could help shrink the carbon footprint of AI advancements.
Traditional AI training methods typically focus on individual data points, which can be time-consuming and computationally expensive. JEST innovates by shifting the focus to entire batches of data. Here’s a simplified breakdown of the JEST process:
- Small Model Training: A smaller AI model is trained to evaluate and grade the quality of data from high-quality sources.
- Batch Ranking: This model then ranks data batches based on their quality.
- Large Model Training: The ranked batches are used to train a larger model, selecting only the most suitable data for efficient learning.
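The three steps above can be sketched in a few lines of Python. This is a toy illustration, not DeepMind's implementation: the function names, the batch sizes, and the "clean" flag used as a stand-in quality score are all invented here.

```python
import random

def score_with_small_model(batch):
    # Toy stand-in for the small grading model: score a batch by the
    # fraction of examples flagged as clean. A real scorer would run a
    # pretrained model over the batch.
    return sum(1 for ex in batch if ex["clean"]) / len(batch)

def jest_style_pipeline(dataset, batch_size=4, keep_ratio=0.5):
    # 1. Chunk the dataset into candidate batches.
    batches = [dataset[i:i + batch_size]
               for i in range(0, len(dataset), batch_size)]
    # 2. Rank the batches by the small model's quality score.
    ranked = sorted(batches, key=score_with_small_model, reverse=True)
    # 3. Keep only the top-scoring batches for large-model training.
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]

data = [{"clean": random.random() > 0.3} for _ in range(32)]
selected = jest_style_pipeline(data)
print(len(selected))  # 4 of the 8 candidate batches survive
```

The large model then trains only on `selected`, which is where the efficiency gain comes from: fewer, better batches.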
By utilizing a smaller model to filter and select high-quality data, the larger model can be trained more effectively, leading to significant performance improvements.
JEST’s efficiency stems from its ability to evaluate batches of data rather than individual examples. This method leverages multimodal contrastive learning, which looks at how different types of data (like text and images) interact with each other. By scoring entire batches and selecting the most learnable subsets, JEST accelerates the training process.
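One way to picture the batch-level scoring is a toy contrastive loss over paired image and text embeddings, where each image should match its own caption. The 2-D embeddings below are invented for illustration; a real system would use a trained multimodal encoder.

```python
import math

def contrastive_losses(img_embs, txt_embs):
    # Per-example contrastive (InfoNCE-style) loss: pair (i, i) is the
    # positive match, all other captions in the batch are negatives.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    losses = []
    for i, img in enumerate(img_embs):
        sims = [dot(img, txt) for txt in txt_embs]
        log_z = math.log(sum(math.exp(s) for s in sims))
        losses.append(log_z - sims[i])  # -log softmax of the matched pair
    return losses

# Toy 2-D embeddings: pair 0 is well aligned, pair 1 is misaligned.
imgs = [[1.0, 0.0], [0.0, 1.0]]
txts = [[1.0, 0.0], [0.2, 0.0]]
per_example = contrastive_losses(imgs, txts)
batch_score = sum(per_example)  # one scalar per candidate batch
```

Scoring the batch as a whole, rather than each example in isolation, is what lets JEST pick out the most learnable subsets.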
The method can be broken down into two main components:
- Learnability Scoring: This involves using both a learner model (the main model being trained) and a reference model (a pretrained smaller model). Comparing the learner's loss against the reference model's loss highlights batches that are still challenging for the learner yet known to be learnable, prioritizing the most informative data.
- Batch Selection: JEST uses an efficient algorithm inspired by Gibbs sampling to choose the best batches for training. This approach not only speeds up the process but also ensures that the selected batches provide the most learning value.
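Both components can be illustrated with a toy learnability score. Note the hedge: the greedy top-k selection below is a deliberately simplified, deterministic stand-in for the paper's Gibbs-sampling-inspired procedure, and the dictionary-based "losses" are invented for the example.

```python
def learnability(example, learner_loss, reference_loss):
    # High learner loss (still hard for the model being trained) combined
    # with low reference loss (known to be learnable) marks informative data.
    return learner_loss(example) - reference_loss(example)

def select_subbatch(candidates, learner_loss, reference_loss, target_size=4):
    # Deterministic greedy simplification: keep the top-scoring examples.
    # JEST itself samples sub-batches with a blockwise, Gibbs-sampling-style
    # procedure; only the scoring idea is reproduced faithfully here.
    scored = sorted(candidates,
                    key=lambda e: learnability(e, learner_loss, reference_loss),
                    reverse=True)
    return scored[:target_size]

# Toy losses: dictionary fields stand in for real model evaluations.
pool = [{"id": i, "learner": 0.1 * i, "ref": 0.05 * (i % 3)}
        for i in range(8)]
chosen = select_subbatch(pool, lambda e: e["learner"], lambda e: e["ref"],
                         target_size=3)
print([e["id"] for e in chosen])  # [7, 6, 5]
```

Examples the reference model already handles cheaply but the learner still struggles with float to the top, which is exactly the prioritization described above.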
DeepMind’s experiments with JEST have shown remarkable results. The method achieves state-of-the-art performance with significantly fewer training iterations and lower computational costs. For instance, JEST matches the performance of existing models with up to 13 times fewer training iterations and 10 times less energy consumption.
These improvements are not just incremental—they represent a substantial leap forward in making AI training more sustainable and scalable. By reducing the energy required for training, JEST not only cuts costs but also helps address the pressing issue of AI’s environmental impact. According to an analysis by the Electric Power Research Institute, data centers could consume between 4.6% and 9.1% of US electricity by 2030.
However, the researchers note some limitations of their approach. For example, JEST still relies on having access to smaller, well-curated datasets to guide the selection process. Developing methods to automatically infer optimal reference distributions remains an open challenge.
Nevertheless, the dramatic efficiency improvements demonstrated by JEST point to significant headroom for optimizing AI training. As models grow ever larger and more energy-intensive, such innovations will likely prove crucial for sustainable scaling of artificial intelligence capabilities.