Sunday, January 19, 2025

Google bids for multimodal AI leadership


With some forecasts calling for the multimodal artificial intelligence market to grow more than 35% annually over the next few years, Google LLC is betting it can claim pole position.

The company’s cloud computing unit recently asserted that multimodal AI, which combines text, images, video, audio and other unstructured data with generative AI processing, will be one of the top five AI trends of 2025.

At the heart of Google’s multimodal AI strategy is BigQuery, a data warehouse that the company is repositioning as a data lakehouse that can collect and analyze a multitude of data types, said Yasmeen Ahmad, product executive for strategy and outbound product management in data, analytics & AI at Google.

“BigQuery was the original lakehouse before the term even existed,” she said. “We built a relational engine to support all the structured analysis enterprise customers wanted to do.”

Google’s Ahmad says multimodal AI can unlock insights from previously unusable data. Photo: SiliconANGLE

Google estimates that 90% of enterprise data is unstructured, Ahmad said in an interview with SiliconANGLE. By combining technologies such as image and voice recognition with structured data to ground retrieval-augmented generation, organizations can unlock insights from previously unusable data, she said.
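To make the idea concrete, here is a minimal sketch, not Google's implementation, of how unstructured signals (call-transcript snippets produced by speech recognition) might be stitched together with a structured record into a retrieval-augmented prompt. All function names, fields and data below are invented for illustration:

```python
# Illustrative sketch only: blend unstructured transcript snippets with a
# structured account record to build a retrieval-augmented prompt.
# All field names and data are made up.
def build_rag_prompt(question: str, transcript_snippets: list[str],
                     account_record: dict) -> str:
    context_parts = [f"Transcript excerpt: {s}" for s in transcript_snippets]
    context_parts.append(
        "Account record: " + ", ".join(f"{k}={v}" for k, v in account_record.items())
    )
    context = "\n".join(context_parts)
    return f"Use only the context below to answer.\n\n{context}\n\nQuestion: {question}"

prompt = build_rag_prompt(
    "Why did this customer call twice last week?",
    ["Caller reported a billing error on invoice 1042.",
     "Caller asked about a refund timeline."],
    {"plan": "premium", "tenure_months": 18, "open_tickets": 2},
)
print(prompt)
```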

One such organization is the restaurant chain Wendy’s Co. It’s testing an application that combines BigQuery, Google’s Vision AI and Gemini to analyze video footage of drive-through traffic and identify bottlenecks. Data from the observed video images is combined with staffing and scheduling information to optimize staffing levels. “This isn’t just video analysis,” Ahmad said. “Video data sits alongside operational data in one unified platform.”
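A rough illustration of the kind of cross-modal join Ahmad describes, assuming hypothetical BigQuery tables `demo.drive_thru_events` (rows emitted by a Vision AI pipeline) and `demo.staffing_schedule`; the schema is invented, not Wendy’s actual setup:

```python
# Hypothetical sketch: join Vision AI-derived drive-through events with
# staffing data inside BigQuery. Table and column names are illustrative only.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  TIMESTAMP_TRUNC(e.event_time, HOUR) AS hour,
  COUNT(*) AS cars_detected,
  ANY_VALUE(s.staff_on_shift) AS staff_on_shift,
  SAFE_DIVIDE(COUNT(*), ANY_VALUE(s.staff_on_shift)) AS cars_per_worker
FROM `demo.drive_thru_events` AS e      -- rows derived from video by Vision AI
JOIN `demo.staffing_schedule` AS s
  ON TIMESTAMP_TRUNC(e.event_time, HOUR) = s.shift_hour
GROUP BY hour
ORDER BY cars_per_worker DESC
"""

for row in client.query(query).result():
    print(row.hour, row.cars_detected, row.staff_on_shift, row.cars_per_worker)
```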

United Parcel Service Inc. built a dashboard that uses truck-mounted sensor data to optimize delivery routes, issuing specific instructions to drivers in real time. Bell Canada is using AI-generated transcripts of calls to its contact center to train a coaching assistant that delivers feedback to agents.

Multimodal AI can enable retailers to gather customer sentiment from multiple sources such as call centers, social media reviews and mobile app feedback and feed it to a generative AI engine to discover new segments for targeted marketing campaigns. “This combination of multimodal data and AI enables a level of personalization and scalability that was previously unattainable,” Ahmad said.
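As a hedged sketch of that pattern, the snippet below pools invented feedback from several channels and asks Gemini, via the google-generativeai SDK, to propose candidate segments. The prompt, model choice and data are illustrative, not a production pipeline:

```python
# Hypothetical sketch: pool customer feedback from several channels and ask
# Gemini to propose marketing segments. Requires `pip install google-generativeai`
# and an API key; all feedback text here is invented.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

feedback = {
    "call_center": ["Long hold times, but the agent resolved my issue."],
    "social_media": ["Love the new app layout; checkout is much faster."],
    "app_reviews": ["Wish there were more vegetarian options nearby."],
}

# Flatten the channels into one labeled corpus for the model.
corpus = "\n".join(f"[{src}] {text}" for src, items in feedback.items() for text in items)
prompt = ("Group the following customer feedback into candidate marketing "
          f"segments and name each segment:\n{corpus}")

response = model.generate_content(prompt)
print(response.text)
```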

Gemini can run directly on top of BigQuery’s data foundation, eliminating the need for data transfers. That speeds up application development. Ahmad said many organizations are now rolling out pilots within a few weeks.
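What “directly on top of BigQuery” can look like in practice is sketched below, using BigQuery ML’s ML.GENERATE_TEXT over a hypothetical remote Gemini model (`demo.gemini_model`, created over a Vertex AI connection) and an invented transcripts table; the rows never leave the warehouse:

```python
# Sketch of running Gemini in place over BigQuery rows via BigQuery ML.
# `demo.gemini_model` and `demo.call_transcripts` are hypothetical names.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `demo.gemini_model`,  -- remote model pointing at a Gemini endpoint
  (
    SELECT CONCAT('Summarize this support call: ', transcript) AS prompt
    FROM `demo.call_transcripts`
    LIMIT 5
  ),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output)
)
"""

for row in client.query(query).result():
    print(row.summary)
```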

Most early applications are for internal use, as organizations exercise caution about turning generative AI loose on their customers. But there is no shortage of opportunities behind the firewall, Ahmad said. “The low-hanging fruit is where customers have lots of data they’ve collected for a long time, and they haven’t been able to do anything with it,” she said. “With BigQuery’s multimodal data foundation, integration with Vision AI and Gemini, it’s easy to make something happen.”

