AI Trend

Google Moves First in Multimodal AI Integration

Dong-A Ilbo | Updated 2026.03.05
Following the music-generation AI Lyria 3, Google adds an AI music agent for editing
Aiming at a ‘multimodal ecosystem’… Meta, OpenAI, and others are also moving fast
Google has recently launched its music-generation artificial intelligence (AI) model “Lyria 3,” moving to build a multimodal ecosystem spanning video, images, and music. With a single click, users will be able to create everything from songs and music videos to album covers at once, and even distribute them via the YouTube platform. Industry observers forecast that Google’s influence within the AI content ecosystem will expand further as a result.

According to the information technology (IT) industry on the 4th, Google has accelerated the sophistication of its music AI services by acquiring Producer AI, a developer of “AI music agents,” following the recent release of Lyria 3. Producer AI will serve as an intermediary platform that converts user requirements into audio tracks based on the Lyria model. Even after a track is generated, the agent can apply specific user requests such as “Turn up only the drum sound in the song I just made” or “Change the chorus to a female vocal.”

Music content lends itself to monetization thanks to its broad range of commercial applications, including advertising and dramas. Demand is not limited to consumers; corporate demand is also substantial, making the business-to-business (B2B) market a viable target. Market research firm Market.us projected that the AI music generation market will grow rapidly from USD 294 million (approximately KRW 423.5 billion) in 2023 to USD 2.66 billion (approximately KRW 3.8317 trillion) by 2032.

In addition, through this acquisition, Google is pursuing a strategy focused on building a multimodal ecosystem encompassing video, images, and music. Multimodal AI refers to systems that can simultaneously understand and process various forms of data, such as text, images, audio, and video. Google previously released the video-generation AI “Veo 3” last year and unveiled the image-generation AI “NanoBanana.” Both AIs run on Google’s AI chatbot “Gemini.” With the addition of Lyria 3, users will be able to produce high-quality songs, music videos, and album covers entirely within Gemini, and distribute the resulting content via YouTube.

As AI-based content businesses expand, not only Google but other global big tech companies such as Meta and OpenAI are also locked in competition to grow their multimodal ecosystems. Meta is reportedly planning to launch “Mango” (project name), which specializes in image and video generation, in the first half of this year (January–June). OpenAI released its video-generation AI “Sora 2” in September last year and launched “GPT Image 1.5,” an advanced version of its image-generation AI “DALL·E.” As with Google’s tools on Gemini, all of these run on OpenAI’s AI chatbot “ChatGPT.” OpenAI is reportedly planning to unveil its AI music generator, currently under development, in the first quarter of this year.

In South Korea, both Naver and Kakao are developing multimodal AI models that can understand and generate various forms of data. Kakao in January unveiled the “Kanana Template,” which transforms images shared via its KakaoTalk messenger into videos. Naver is also advancing its solutions that automatically generate images and promotional videos based on its large-scale AI model “HyperCLOVA X.” South Korean AI startup Upstage recently decided to acquire the portal site “Daum” to secure large volumes of image, video, and text data.

Choi Ji-won

AI-translated with ChatGPT. Provided as is; original Korean text prevails.