Researchers from Google have unveiled a new framework called Conditioned Language Policy (CLP) that promises to revolutionise the finetuning of language models by enabling them to balance multiple conflicting objectives efficiently.
The framework addresses the limitations of traditional single-objective finetuning methods, which often require multiple expensive runs to achieve the desired balance between conflicting goals such as creativity and safety.
CLP leverages techniques from multi-task training and parameter-efficient finetuning to create steerable language models that can dynamically adjust to different objectives during inference without the need for retraining.
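To make the idea of training across many objective trade-offs concrete, here is a minimal sketch of how a weighting over two rewards might be sampled and used to combine them into a single training signal. It assumes a simple linear weighted sum and a Dirichlet sampling distribution; the function names and reward values are hypothetical and are not taken from the paper.

```python
import random


def sample_weighting(num_objectives, rng, concentration=1.0):
    """Draw non-negative weights that sum to 1 (a Dirichlet sample built
    from normalised gamma draws). The distribution is an assumption."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarise(rewards, weights):
    """Linear scalarisation: weighted sum of the per-objective rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


rng = random.Random(0)
weights = sample_weighting(2, rng)   # e.g. [creativity weight, safety weight]
rewards = [0.8, 0.3]                 # hypothetical reward scores for one response
combined = scalarise(rewards, weights)
print(weights, combined)
```

In a multi-task setup of this kind, a fresh weighting would be drawn for each training batch so that one model sees many different trade-offs rather than a single fixed one.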
The key advantage of CLP lies in its ability to combine multiple reward weightings through a parameter-space conditioning mechanism, resulting in models that not only outperform existing methods but also exhibit superior steerability. This allows users to select from diverse outputs that best meet their needs, enhancing both model quality and flexibility.
Unlike traditional methods that require separate models for different objectives, CLP uses a single model adaptable to various reward weightings, significantly reducing computational overhead and simplifying deployment.
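One way to picture parameter-space conditioning is as a single set of shared parameters plus one small adjustment per objective, blended according to the chosen weighting. The sketch below is a toy illustration under that assumption, using plain lists in place of real model weights; the names `condition_parameters`, `shared` and `deltas` are hypothetical, and this is not the paper's implementation.

```python
def condition_parameters(shared, per_objective_deltas, weights):
    """Return shared + sum_i weights[i] * per_objective_deltas[i].

    Toy stand-in for conditioning a model's parameters on a reward
    weighting; real models would apply this to tensors, not lists.
    """
    conditioned = list(shared)
    for weight, delta in zip(weights, per_objective_deltas):
        for j, value in enumerate(delta):
            conditioned[j] += weight * value
    return conditioned


shared = [0.5, -1.0, 2.0]            # parameters common to all objectives
deltas = [
    [0.2, 0.0, -0.4],                # hypothetical adjustment for objective 1
    [-0.2, 0.6, 0.0],                # hypothetical adjustment for objective 2
]

objective_one_only = condition_parameters(shared, deltas, [1.0, 0.0])
balanced = condition_parameters(shared, deltas, [0.5, 0.5])
print(objective_one_only, balanced)
```

Because the weighting is only an input to this blend, switching between trade-offs at inference time means recomputing the blend rather than loading or training a separate model.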
The CLP framework has significant implications for various applications, including summarisation, conversational agents, and encoding social norms. By enabling language models to balance multiple objectives effectively, CLP can enhance the flexibility and usability of these models in real-world scenarios.
The researchers acknowledge that while CLP offers robust performance across different conditions, further evaluations, including human assessments and red-teaming, are necessary to mitigate potential risks associated with more flexible language models. Future research directions include exploring other conditioning mechanisms, automated tuning of weight sampling distributions, and addressing non-linear reward scalarisation.
Google has been steadily releasing AI models and frameworks that aim to simplify AI development. Most recently, at Google I/O Connect, the company expanded access to the multimodal AI model Gemini 1.5 Pro and the Gemma 2 family of open models for Indian developers.
CLP advances language model finetuning by providing a flexible, efficient method for balancing multiple objectives. The resulting models can adapt to different needs, potentially leading to more capable AI systems.