Thursday, February 13, 2025

Don’t Sleep On Your B-Roll, AI Giants Are Buyers (Guest Column)


What makes a “good” generative AI product? How does one generative AI product distinguish itself from another? As more AI products hit the market and competition intensifies, these are the questions many are asking, and the answers depend in large part on the quality of the content used to train the AI models that power these products.

For an AI model to generate the desired outputs, it must first be trained on large content sets that are comprehensive, accurate and reliable. Many models are trained on content that is available on the internet but has not necessarily been vetted, which can lead to problems in product development. If multiple AI models are trained on the same readily available content, for example, their algorithms may begin to make comparable inferences and generate similar or even identical outputs, making the models and the outputs they produce less unique, and thus less valuable, to end users.

In addition, using training content that hasn’t been evaluated in any manner, including with respect to its provenance, can create risk from a number of angles. For instance, does using certain content for AI training implicate the intellectual property rights of third parties? These complex IP issues are currently being considered by the courts. And what about bias or other inaccuracies in the content? How is that risk accounted for?

This has opened the door for a growing licensing market where AI developers can procure licensed training content from more trusted sources. Amid an overcrowded AI product market, AI developers are looking not only to mitigate risks during the training process but also to differentiate their AI models from peer technologies. By training their models on content and data that is different, and perhaps has never been used for training before, they aim to create algorithms that produce unique, and therefore more valuable, outputs.

At the same time, content creators and owners are seeking ways to monetize their content in the AI marketplace, and to limit the use of their intellectual property for the training of AI models without their express permission.

Quality content usually comes at a premium, and the AI training space is no different. A relatively untapped source of quality content that will further the goals of both AI developers and content creators and owners is B-roll and other archived footage. B-roll and archived content provide a fresh revenue stream for content owners who previously had no way to monetize this material. And for AI developers, this content presents a large cache of potentially unique training material with less risk.

Content creators on platforms like YouTube, Instagram and TikTok are already participating in this valuable market by selling their unused video content to AI developers. According to Bloomberg News, these deals often net thousands of dollars each, with AI developers paying an estimated $1 to $4 per minute of footage. The value of this footage is determined by a variety of factors, such as the quality of the video, whether it is unique, and how readily available it is to other AI developers.  
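For a rough sense of what those reported rates imply, the back-of-the-envelope calculation below pairs Bloomberg’s reported $1-to-$4-per-minute range with a hypothetical 500-hour library; the library size is an illustrative assumption, not a figure from any actual deal.

# Back-of-the-envelope estimate of licensing revenue for an archive of B-roll.
# The library size is a hypothetical example; the $1-$4 per minute range is
# the estimate reported by Bloomberg News for these deals.
LIBRARY_HOURS = 500                 # hypothetical archive of unused footage
RATE_LOW, RATE_HIGH = 1.0, 4.0      # reported dollars per minute of footage

minutes = LIBRARY_HOURS * 60
print(f"Low estimate:  ${minutes * RATE_LOW:,.0f}")
print(f"High estimate: ${minutes * RATE_HIGH:,.0f}")
# Under these assumptions, a 500-hour library would gross roughly $30,000 to $120,000.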

This licensing model offers benefits to both the content owner and the AI developer licensee. On the one hand, content owners can now monetize unused or underutilized assets while still maintaining control of their content through selective licensing. They can negotiate contractual provisions that limit use cases and protect against unwanted use or reproduction of premium content. For example, content creators may seek to restrict the use of their content in AI models deployed for purposes that could harm their own business or advance the business of a competitor.

On the other hand, for AI developers, this licensed content offers four key advantages. The first is differentiation from competitors through exclusive, non-public training content in a landscape where many AI models were trained on the same public data. This can provide a competitive advantage to AI developers looking to create the next cutting-edge AI model.

The second is maintaining quality assurance by using human-generated content. This helps to avoid the growing problem of AI-generated content contamination, where AI models are trained on content previously generated by AI, thereby reducing the quality and originality of the outputs.

The third is mitigation of potential security risks through content with known data provenance — a detailed record of the data’s origin, creation process, transformations, and the individuals involved in the data’s manipulation. This helps to prevent malicious training data from compromising their AI models.
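To make the concept concrete, a provenance record for a licensed clip might capture fields along the lines of the sketch below; the structure and field names are illustrative only, not an industry standard.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    """Illustrative sketch of the metadata a licensee might require for each clip."""
    asset_id: str                    # internal identifier for the licensed clip
    source: str                      # who created the footage and where it came from
    created_on: str                  # original capture date
    chain_of_custody: List[str] = field(default_factory=list)   # who has handled the file
    transformations: List[str] = field(default_factory=list)    # edits, transcodes, redactions
    rights_cleared: bool = False     # whether releases and clearances are on file

# Hypothetical example entry for a single licensed clip.
record = ProvenanceRecord(
    asset_id="clip-0001",
    source="Example Creator LLC",
    created_on="2023-06-14",
    chain_of_custody=["creator", "licensing agent", "AI developer"],
    transformations=["transcoded", "faces blurred"],
    rights_cleared=True,
)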

The fourth is that licensing content for training provides significantly greater certainty with regard to intellectual property and other third-party risk concerns. These benefits of exclusive content come at a price, namely increased development and training costs.

Recently making headlines is DeepSeek, a Chinese AI model similar to ChatGPT that has quickly become one of the most downloaded apps in the world. Although comparable to other AI models on the market, DeepSeek is making a name for itself because of its low development and training costs. There is speculation, however, that DeepSeek was trained using distillation, a fairly common technique in which knowledge is extracted from larger, more capable AI models, allowing a new model to be trained at a fraction of the cost. DeepSeek’s popularity notwithstanding, the practice of distillation underscores the value of seeking exclusive content and data for training, so that an AI model can differentiate itself from others on the market by producing unique outputs from an algorithm trained on unique data.
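For readers curious about the mechanics, distillation in its textbook form trains a smaller “student” model to imitate the output distribution of a larger “teacher.” The sketch below, written with PyTorch and toy stand-in models, illustrates the general technique only; it is not a description of how DeepSeek or any particular model was actually built.

import torch
import torch.nn.functional as F

# Toy stand-ins: in practice the teacher is a large pretrained model
# and the student is a much smaller one being trained cheaply.
teacher = torch.nn.Linear(128, 10)
student = torch.nn.Linear(128, 10)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

temperature = 2.0                      # softens the teacher's probabilities
inputs = torch.randn(32, 128)          # a batch of synthetic training inputs

with torch.no_grad():
    teacher_probs = F.softmax(teacher(inputs) / temperature, dim=-1)

student_log_probs = F.log_softmax(student(inputs) / temperature, dim=-1)

# The student is trained to reproduce the teacher's output distribution,
# which is how the technique extracts value from the larger model's training.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
optimizer.zero_grad()
loss.backward()
optimizer.step()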

This newly developing B-roll licensing market represents a unique common ground for AI developers and content owners to create a mutually advantageous relationship. To take advantage of this opportunity, content creators and owners need to develop procedures for reviewing and identifying suitable assets, incorporate clear licensing restrictions to protect premium content, ensure proper rights clearance, and establish pricing strategies based on factors like exclusivity, quality, and type of content. 

Meanwhile, AI developers should focus on establishing due diligence procedures for content verification and quality assessment, mechanisms for handling potential content replication in outputs, and scalable data management systems for managing multiple licensing relationships and securing the content. If both sides of the equation take these steps, it could foster an environment where both content owners and AI developers thrive.
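As one small illustration of what such due diligence tooling might involve, the sketch below fingerprints incoming files so that duplicates, or material already ingested from another source, can be flagged before entering a training set; it is a minimal example, not a production system, and the function names are hypothetical.

import hashlib
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Return a SHA-256 hash of the file, used as a simple content fingerprint."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_new_assets(incoming: list[Path], seen: set[str]) -> list[Path]:
    """Keep only files whose fingerprints have not already been licensed or ingested."""
    fresh = []
    for path in incoming:
        fp = fingerprint(path)
        if fp not in seen:
            seen.add(fp)
            fresh.append(path)
    return fresh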

Adrian Perry and Robyn Polashuk are partners at Covington & Burling and co-chairs of its Entertainment and Media Industry group.  
