LG AI Research Institute announced on the 9th that it has unveiled “EXAONE 4.5,” a multimodal artificial intelligence (AI) model that can understand and reason over text and images simultaneously. Multimodal AI refers to AI that can understand and process multiple forms of data, such as text, images, audio, and video, at the same time.
EXAONE 4.5 combines an internally developed vision encoder with a large language model (LLM) to interpret text and images together. It is strong at reading and analyzing the complex materials used in real industrial settings, such as contracts, technical drawings, and financial statements. LG AI Research Institute explained that the model is a developmental step toward enabling its proprietary foundation model “K-EXAONE” to handle a wider variety of data formats.
The model has also shown competitive performance. Its average score across five benchmarks in the science, technology, engineering, and mathematics (STEM) domains was 77.3 points, exceeding OpenAI’s “GPT-5 mini” (73.5 points), Anthropic’s “Claude Sonnet 4.5” (74.6 points), and Alibaba’s “Qwen3” (77.0 points). It also surpassed GPT-5 mini and Claude Sonnet 4.5 in average score across 13 visual-capability benchmarks. LG AI Research Institute said, “AI has now reached a level where it can understand the context of images and text together and answer questions.”
Lee Min-a