Panasonic HD develops multimodal generative AI “OmniFlow” which enables Any-to-Any generation between text, image, and audio | Innovations/Technologies | Company | Press Releases (2025)

Osaka, Japan, June 4, 2025 – Panasonic Holdings Co., Ltd. (Panasonic HD) and Panasonic R&D Company of America (PRDCA), in collaboration with researchers at the University of California, Los Angeles (UCLA), have developed OmniFlow, a multimodal generative AI that can freely convert between different data formats such as text, images, and audio (hereinafter referred to as “Any-to-Any” generation).

In recent years, multimodal generative AI that converts between different data formats has been actively researched. However, training usually requires data covering every pairing of the formats to be handled, so the cost of acquiring data rises as the number of formats increases. By flexibly combining generative AIs specialized for individual format pairs (text ↔ audio, text ↔ image), OmniFlow can learn a high-precision Any-to-Any model from only a small amount of data that contains all three modalities together (text ↔ audio ↔ image), which significantly reduces the cost of creating training data. (Fig. 1)

In recognition of its technical advances, this work has been accepted at CVPR 2025, a top international conference on AI and computer vision. It will be presented at the main conference, to be held in Nashville, USA, from June 11 to June 15, 2025.

■Details of the technology

Panasonic HD and PRDCA are conducting research on multimodal generative AI. In recent years, multimodal generative AI that handles audio in addition to text and images has been attracting attention, but there are few ways to obtain data that contains text, images, and audio together, and increasing the variety of such data is costly.

Solving this problem is key to accelerating the use of multimodal generative AI, and it has been actively researched in recent years. A method has recently been proposed that can learn even when the training data does not fully cover every combination of the formats to be handled, but it works by averaging the input data, which leaves considerable room for improvement in expressive ability.
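The limitation of averaging can be illustrated with a toy example. The sketch below is not taken from OmniFlow or from the earlier method; the feature vectors and the helper names `average` and `concatenate` are hypothetical. It only shows that two different input combinations can collapse to the same averaged vector, while keeping the features side by side preserves the difference.

```python
def average(features):
    """Fuse equal-length feature vectors by element-wise averaging."""
    n = len(features)
    return [sum(values) / n for values in zip(*features)]


def concatenate(features):
    """Fuse feature vectors by placing them side by side."""
    return [value for feature in features for value in feature]


# Hypothetical 2-dimensional features for two different (text, audio) inputs.
text_a, audio_a = [1.0, 0.0], [0.0, 1.0]
text_b, audio_b = [0.0, 1.0], [1.0, 0.0]

avg_a = average([text_a, audio_a])
avg_b = average([text_b, audio_b])
cat_a = concatenate([text_a, audio_a])
cat_b = concatenate([text_b, audio_b])

print(avg_a == avg_b)  # the two inputs are indistinguishable after averaging
print(cat_a == cat_b)  # they remain distinct when concatenated
```

In this toy case a model that only sees the averaged vector cannot tell the two inputs apart, which is one way to read the loss of expressive ability described above.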

In contrast, we have developed OmniFlow, which extends flow matching*, an existing framework used for image generation. By concatenating and jointly processing the features of the three data formats during the generation process, it can learn complex relationships between the data that averaging cannot capture. (Fig. 2)

* A technique that uses a flow to find the optimal conversion path between arbitrary data.
It has been attracting attention in recent years because it has been adopted in various generative models, including image generation models.
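As a rough illustration of the idea behind flow matching, the sketch below uses one common formulation, a straight-line path between a noise sample and a data sample. This choice is an assumption, as are all function names and values; the release does not say which formulation OmniFlow uses. In training, a neural network would be fitted to predict the velocity along the path. Here a hypothetical oracle that already knows the true velocity stands in for that network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; the regression target in training."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(x0, velocity_fn, steps=10):
    """Generate a sample by following the velocity field from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]  # starting sample
data = [0.2, -1.0, 0.5, 3.0]                        # hypothetical target sample


def oracle_velocity(x, t):
    """Stand-in for a trained network: returns the true path velocity."""
    return target_velocity(noise, data)


generated = euler_integrate(noise, oracle_velocity)
print(generated)
```

With the oracle velocity, the integration carries the noise sample to the data sample. A trained model would approximate this velocity from the current point and time alone.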

A major advantage of OmniFlow is that AIs specialized in text-to-image and text-to-audio generation can easily be connected into a single multimodal generative AI. (Fig. 3) Because each specialized AI is already good at generating its own data format, high multimodal performance was achieved without training on a large amount of data containing all modalities.

In the evaluation experiments, performance on the “text→image” and “text→audio” generation tasks was compared with existing methods. (Fig. 4) The results confirmed that OmniFlow performs best among both any-to-any methods (Generalist) and methods specialized for each task. We also found that the amount of data required to train OmniFlow can be reduced to as little as 1/60 of that needed by other any-to-any methods.
