Persona AI Unveils Dialect-Savvy Voice Tech Amid Sovereign AI Race

Dong-A Ilbo | Updated 2026.01.19

As global competition for hegemony in artificial intelligence (AI) intensifies, countries are accelerating efforts to build “Sovereign AI” based on their own languages, data, and infrastructure.

Sovereign AI goes beyond simply possessing AI; it refers to AI sovereignty that enables a country to control and operate its linguistic, cultural, and industrial data independently, without external dependence. In particular, speech AI is regarded as a core technology that directly determines language sovereignty.

Against this backdrop, PersonaAI (CEO Yoo Seung-jae, hereafter Persona AI) has unveiled “SSTT (Sovereign AI Speech to Text),” a next-generation speech AI model that precisely reflects the characteristics of the Korean language, after two years of development. According to the company, SSTT not only performs speech recognition but also achieves the highest level of precision on speech data in Korea.

The new model was trained on a Korean speech dataset of more than 40 million utterances (over 50,000 hours of audio data), with 13,200 hours, roughly one-quarter of the total training volume, allocated to dialect data. This enables it to accurately distinguish regional dialects and unique vocabulary across five major regions: Gyeongsang, Jeolla, Chungcheong, Gangwon, and Jeju. The training data also includes strong dialects that are difficult for AI to recognize, unique regional expressions, and the vocal characteristics of speakers aged 60 and above, enabling communication that spans both generations and regions.
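The dataset figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted in the article; the function names are illustrative, and because the totals are reported as lower bounds ("more than", "over"), the results are approximations rather than official statistics.

```python
# Figures as quoted in the article. The totals are stated as lower bounds,
# so everything derived from them is approximate.
TOTAL_UTTERANCES = 40_000_000   # "more than 40 million utterances"
TOTAL_HOURS = 50_000            # "over 50,000 hours of audio data"
DIALECT_HOURS = 13_200          # hours allocated to dialect data


def dialect_share(dialect_hours: float, total_hours: float) -> float:
    """Fraction of the training audio that is dialect data."""
    return dialect_hours / total_hours


def mean_utterance_seconds(total_hours: float, total_utterances: int) -> float:
    """Average utterance length in seconds implied by the totals."""
    return total_hours * 3600 / total_utterances


if __name__ == "__main__":
    share = dialect_share(DIALECT_HOURS, TOTAL_HOURS)
    seconds = mean_utterance_seconds(TOTAL_HOURS, TOTAL_UTTERANCES)
    print(f"Dialect share: {share:.1%}")
    print(f"Mean utterance length: {seconds:.1f} s")
```

On these figures the dialect share comes to about 26%, consistent with the "one-quarter" description, and the implied average utterance is a few seconds long.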

Going beyond existing speech recognition, which focuses on standard Korean, the model is designed to recognize Korean dialects and perform speaker diarization, both in real time and offline. It supports pre-processing features such as noise and reverberation reduction, automatic gain control (AGC) for far-field recognition, deep learning-based voice activity detection, and speaker change point detection, bringing together a suite of high-quality speech technologies.
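To give a rough sense of what two of these pre-processing steps do, here is a minimal sketch of automatic gain control and voice activity detection. It is not Persona AI's implementation: the article describes a deep learning-based detector, whereas this example substitutes a simple energy threshold, and every function name and parameter value here is an assumption chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square level of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def is_speech(frame, threshold=0.02):
    """Energy-based voice activity check (a stand-in for a learned detector)."""
    return rms(frame) >= threshold


def apply_agc(frame, target_rms=0.1, max_gain=10.0):
    """Scale a frame toward a target level, capping the gain so that
    very quiet input is not amplified without limit."""
    level = rms(frame)
    if level == 0.0:
        return list(frame)
    gain = min(target_rms / level, max_gain)
    return [x * gain for x in frame]


def preprocess(frames):
    """Keep only frames judged to contain speech, then normalize their level."""
    return [apply_agc(f) for f in frames if is_speech(f)]


if __name__ == "__main__":
    silence = [0.0] * 160
    far_field_speech = [0.03, -0.03] * 80   # quiet, as if spoken from a distance
    kept = preprocess([silence, far_field_speech])
    print(len(kept), round(rms(kept[0]), 3))
```

Detection runs before gain control here so that background noise is not boosted into the speech range. A production system would typically smooth the gain over time instead of adjusting it frame by frame.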

Existing speech recognition models (STT, Speech to Text) are core technologies that convert sound into text, but they have had limitations in industrial settings due to low recognition accuracy caused by differences in dialect, intonation, and speech speed. As a result, market adoption has been slow even in sectors with high demand for speech recognition, such as call centers, public civil services, and medical and manufacturing sites.

Persona AI’s SSTT can separate up to 20 speakers, achieving a dramatic performance improvement compared with existing technologies that typically handled only 4 to 5 speakers. It can precisely distinguish “who said what” even in multi-party simultaneous conversations, expanding its potential use cases to meeting transcription, on-site monitoring and control, and multi-user interfaces.

This level of technological advancement is viewed as a key component in preparing for the era of Physical AI. Most Physical AI devices, including robots, kiosks, industrial equipment, and autonomous systems, are expected to be controlled and interacted with primarily via voice. In this process, reliance on foreign speech models from specific countries or companies could create structural risks related to data sovereignty, security, and service continuity.

Industry observers consider Persona AI’s next-generation speech AI model to be a highly important strategic asset from a Sovereign AI perspective. Large-scale speech models that can accurately recognize Korean, including regional dialects, are technologies that are difficult to replace externally in the short term, and are seen as directly linked to securing AI sovereignty at the national level.

Persona AI develops AI models and also provides industry-specific solutions, focusing on AICC (AI Contact Center) and Generative AI (Gen AI). Recently, the company won a CES 2026 Innovation Award, following last year’s achievement, marking a three-crown win for the second consecutive year and demonstrating its technological competitiveness on the international stage. It is also developing VLA (Vision-Language-Action) technology, regarded as the core engine of Physical AI, and is presenting a next-generation operating architecture that connects robots, devices, and AI.

A Persona AI representative stated, “In the competition for Sovereign AI, the most important factor is not simply the size of the model, but how deeply it understands the national language and real industrial environments,” adding, “SSTT is a core model that can serve as a practical foundation for Korean-style Sovereign AI.”

Choi Yong-seok

AI-translated with ChatGPT. Provided as is; original Korean text prevails.
