SqueezeBits wins over Big Tech with AI compression and optimization technology
Shrinking model size without performance loss… also tuning computation order to chip characteristics
A global technology embraced by both AI service providers and AI design companies
Founded by chip design experts… “Set to become key infrastructure in the NPU era”
Kim Hyoung-joon, CEO of SqueezeBits, explains at the company’s Seoul Gangnam office on the 11th of last month how its AI compression and optimization technology will become critical infrastructure in the coming neural processing unit (NPU) era. Photo by reporter Heo Jin-seok, jameshur@donga.com
“We can cut the cloud server fees of AI service companies that currently spend from several hundred million to several billion KRW a month by as much as 90%. We are one of the few startups worldwide that reduce the ‘size’ of AI models and increase their speed so they can run optimally on any AI semiconductor.”
This is how 32-year-old SqueezeBits CEO Kim Hyoung-joon described the company’s business model at the AI compression and optimization startup’s office in Seoul’s Gangnam district on the 11th of last month.
AI grows more intelligent with each passing year, but the volume of computation and data it must handle grows with it, making it heavier. As AI becomes heavier, it slows down and cloud costs climb.
Hardware companies that produce AI semiconductors also have a dilemma. They need to persuade clients that using their semiconductors will allow existing AI services to be operated economically. Kim said, “We can improve AI so it runs efficiently in line with each semiconductor’s characteristics,” adding, “Thanks to this technology, we are receiving strong interest from both global AI service providers and AI semiconductor design companies.”
● AI on a diet… One server doing the work of 10
The problem SqueezeBits aims to solve is clear: compressing and optimizing AI services so they run more efficiently. Behind the large language model (LLM) boom triggered by ChatGPT lurk infrastructure costs that are difficult to bear: Nvidia’s latest GPU servers cost several hundred million KRW each. If AI programs are optimized, a single server can deliver the performance of 10. Kim said, “There are now many organizations building AI. But running it efficiently is a different problem altogether.”
What, then, does AI optimization actually mean? Kim explained, “When a computer slows down, people who do not understand hardware just reboot it blindly. But an expert who understands computer architecture closes unnecessary programs and tidies up the memory so it runs smoothly again. AI optimization is similar to that.”
The “hardware-based optimization” Kim describes is more than a simple analogy. AI semiconductors operate on two main axes: compute (calculation) and memory (storage). No matter how fast the calculations are, if the speed of fetching data (memory) is slow, the semiconductor sits idle. This is the so-called bottleneck phenomenon. SqueezeBits’ technology targets this gap. For example, during the brief wait time while data is being fetched from memory, it assigns other calculations in advance to the idle compute units. It is like a chef who, instead of standing idle while waiting for water to boil, preps ingredients to reduce total cooking time.
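The chef analogy can be sketched as a toy prefetching loop. This is purely illustrative, not SqueezeBits’ actual implementation: the function names, the simulated fetch delay, and the batch contents are all assumptions. A background thread fetches the next batch of data while the current batch is being computed, so the compute side is not left idle.

```python
import queue
import threading
import time

def fetch(batch_id):
    """Stand-in for a slow memory fetch (simulated with a short sleep)."""
    time.sleep(0.01)
    return list(range(batch_id, batch_id + 4))

def compute(batch):
    """Stand-in for arithmetic on the fetched data."""
    return sum(x * x for x in batch)

def run_overlapped(num_batches):
    """Prefetch batch i+1 on a background thread while batch i is computed."""
    prefetched = queue.Queue(maxsize=1)   # holds the one batch fetched ahead

    def producer():
        for i in range(num_batches):
            prefetched.put(fetch(i))
        prefetched.put(None)              # sentinel: nothing left to fetch

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while (batch := prefetched.get()) is not None:
        results.append(compute(batch))    # compute overlaps the next fetch
    return results
```

With fetch and compute overlapped this way, total time approaches the longer of the two phases rather than their sum, which is the point of keeping compute units busy during memory waits.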
SqueezeBits CEO Kim Hyoung-joon speaks as a panelist in a session on optimizing the performance of large inference models at “AI Infrastructure Summit 2025,” held in September last year at the Santa Clara Convention Center in California. Provided by SqueezeBits
● The magic of hardware-tailored optimization
SqueezeBits possesses a range of compression and optimization technologies. Compression makes AI models smaller and lighter, while optimization ensures that these smaller models run quickly and efficiently.
One of the key technologies for compressing AI models is “quantization.” It compresses data expressed in long 32-bit units into shorter 8-bit or 4-bit units. Kim said, “When shopping at a supermarket, you do not calculate everything precisely down to the last 10 KRW. You round to 1,000 KRW or 100 KRW units and the result is not significantly different. The technology lies in simplifying AI data while keeping performance degradation to zero or to a minimum.”
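The rounding idea behind Kim’s supermarket analogy can be shown with a minimal symmetric-quantization sketch. The helper names here are hypothetical, and real quantizers handle much more (per-channel scales, zero-points, calibration data); this only illustrates the core trade of precision for size.

```python
def quantize(weights, bits=8):
    """Symmetric quantization: map floats onto signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.003, 0.5]
codes, scale = quantize(weights, bits=8)        # each code fits in one byte
restored = dequantize(codes, scale)
# Rounding error per weight is bounded by scale / 2
```

Each 32-bit float collapses to a single signed byte, cutting memory size and traffic by 4x, while every dequantized value stays within half a quantization step of the original.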
Compression also involves a technique called pruning. This method cuts away unnecessary connections among the myriad neural networks in an AI model that do not significantly affect its output. The principle is similar to trimming branches so a tree can grow better.
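The branch-trimming principle can be sketched as magnitude pruning, the simplest variant of the idea. This is illustrative only: production pruning is typically structured (removing whole channels or heads) and followed by fine-tuning to recover accuracy.

```python
def prune(weights, ratio=0.5):
    """Magnitude pruning: zero out the given fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * ratio)
    # Indices of the n_prune entries closest to zero
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:n_prune])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
# Pruning half zeroes the three weights nearest zero; the large ones survive
```

The zeroed connections can then be skipped entirely at inference time, which is where the speed and memory savings come from.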
Among optimization technologies, a representative example is graph compilation, which transforms the computation graph: it rearranges complex, intertwined computation orders into a sequence that the hardware prefers and can process most quickly.
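Graph compilers apply many such transforms; one of the simplest to illustrate is operator fusion, where a chain of elementwise operations is merged so each value is read and written once instead of once per operation. A toy sketch, with hypothetical names and no relation to any specific compiler:

```python
def fuse_elementwise(ops):
    """Fuse a chain of elementwise ops into one function, so each element is
    visited once and no intermediate buffers are written back to memory."""
    def fused(x):
        for op in ops:
            x = op(x)
        return x
    return fused

def run_fused(data, ops):
    """Apply the fused chain in a single pass over the data."""
    f = fuse_elementwise(ops)
    return [f(x) for x in data]

# Chain: multiply by 2, then add 1, applied in one traversal
double_then_inc = [lambda x: x * 2, lambda x: x + 1]
# run_fused([1, 2, 3], double_then_inc) -> [3, 5, 7]
```

Running the chain unfused would materialize an intermediate list after each op; fusing removes those round trips to memory, which ties directly back to the bottleneck between compute and memory described above.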
SqueezeBits combines all these techniques to make processing up to 10 times faster. Kim said, “There was an AI service provider that needed to generate 10 images in 10 seconds and was using 10 GPUs. After applying our solution, it achieved the same speed with a single GPU.”
● “Even Intel comes to us”… Already profitable
SqueezeBits has presented more than 70 papers on compression and optimization at several of the world’s most prestigious AI conferences. Its customers include leading conglomerates such as Naver, Kakao, Krafton, LG, KT and LG U+. Global semiconductor company Intel is a partner. SqueezeBits has been recognized for its ability to optimize Intel’s AI accelerator “Gaudi,” earning Intel Partner Alliance Gold status. It has also formed a strategic alliance with Korean AI startup Rebellions and is taking a leading role in expanding the domestic NPU ecosystem.
Could large companies such as Google or Samsung not internalize such optimization technologies themselves? Kim is confident. “They could do it themselves. But we do it better. Very few teams globally understand everything from the hardware’s lowest layer to the algorithm’s highest layer. It is more efficient for them to use us than to build it in-house, which is why global conglomerates are entrusting us with this work.”
● The determination of a POSTECH alumnus: “Do what others cannot”
CEO Kim Hyoung-joon is a first-cohort graduate of the Department of Creative IT Engineering at POSTECH (Pohang University of Science and Technology). As an undergraduate, he enjoyed “creating something from nothing,” drafting departmental rules himself instead of following a fixed curriculum and organizing the student council. He had dreamed of founding a startup from early on. He continually reflected on the idea that there would be no chance of success if his capabilities were only on par with others. “I felt a technological barrier to entry was necessary. So I entered graduate school and majored in AI semiconductor design. I believed that understanding hardware was essential to optimize software to its limits,” he said.
In March 2022, he founded SqueezeBits together with four others: graduate school colleagues who had studied AI semiconductor design and professors from POSTECH and Seoul National University who provided technical advice. At first, the company focused on “edge AI” compression for devices such as CCTV systems and robots. Although there was strong demand for the technology, successful commercialization was a different matter. Clients wanted to save significant costs with SqueezeBits’ technology, but adverse market conditions made it difficult.
The opportunity came with the boom in generative AI. As more companies suffered under high server costs, compression and optimization technologies came into the spotlight. Upon its founding, SqueezeBits secured seed investment from Naver D2SF and POSTECH Holdings, which recognized its potential. In 2024, it attracted pre-Series A funding from investors including Kakao Ventures.
SqueezeBits expects the NPU market to expand further. Kim said, “As the NPU market that will replace Nvidia GPUs grows, SqueezeBits’ role will also expand,” adding, “Our solution is proving that existing AI can run sufficiently fast and efficiently on domestic NPUs.” SqueezeBits has been profitable since 2024, its third year of operation.
Looking ahead, Kim said of SqueezeBits’ future, “We will become an essential intermediary that breaks down the technological barriers between AI semiconductors and service companies so that anyone can use AI efficiently.”
ⓒ dongA.com. All rights reserved. Reproduction, redistribution, or use for AI training prohibited.