The Man Behind Unitree: Insights on AI, Adoption, and Growth
As China’s humanoid industry enters the spotlight, these translated pieces featuring Unitree’s founder share his candid thoughts on technology and on what it will take to reach robotics’ “ChatGPT moment”

Editor’s note: China’s robotics industry is having a moment. Over the past year, its robots have moved from factory floors and research labs into the public eye: dancing on national television, sparring in matches, and drawing attention at international trade shows. Among the most visible players is Hangzhou-based Unitree Robotics, known for its agile quadrupeds and humanoid models like the H1 and R1. Its founder, Wang Xingxing, was the youngest entrepreneur to join Chinese President Xi Jinping’s high-profile business symposium in February this year. In July, the company entered pre-IPO counselling with CITIC Securities, signaling its intent to raise capital for the next stage of growth.
This issue of Asia Tech Lens brings you highlights from two recent pieces featuring Wang Xingxing in Chinese media: a report from 36Kr and an exclusive interview with People’s Daily, the official newspaper of the Communist Party.
We chose to translate these pieces because they shed light on where China’s humanoid sector stands today, the challenges ahead, and how one of its most prominent entrepreneurs is preparing for what he predicts will be robotics’ “ChatGPT moment” within the next five years.
Questioning the Hype Around VLA
The much-hyped Vision-Language-Action (VLA) framework has become a go-to approach in robotics circles.
At its core, VLA is meant to help robots connect what they see (vision) with what they are told (language) and then decide what to do (action). It is designed to let robots understand their environment, interpret human instructions, and perform tasks without being manually programmed for each one. These models are trained on vast datasets of images, text, and robot actions, which enables them to generate robot actions directly from visual observations and text-based commands.
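To make the input/output shape of a VLA model concrete, here is a deliberately toy sketch: an image and a text instruction are each turned into a feature vector, the two are concatenated, and a single linear layer maps them to an action vector. The encoders, the weights, and the three-value action (for example x, y, gripper) are all hypothetical stand-ins; real VLA models use large pretrained vision and language backbones trained on robot data, and nothing here reflects how Unitree or any specific lab builds theirs.

```python
import zlib
from typing import List

# Toy VLA sketch: vision features + language features -> action vector.
# All encoders and weights are hypothetical and purely illustrative.

def encode_image(pixels: List[List[float]]) -> List[float]:
    """Hypothetical vision encoder: mean brightness of each row of a grayscale image."""
    return [sum(row) / len(row) for row in pixels]

def encode_text(instruction: str, dim: int = 8) -> List[float]:
    """Hypothetical language encoder: hashed bag of words in a fixed-size vector.
    zlib.crc32 is used (instead of hash()) so results are stable across runs."""
    vec = [0.0] * dim
    for word in instruction.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    return vec

def vla_policy(pixels: List[List[float]], instruction: str,
               weights: List[List[float]]) -> List[float]:
    """Concatenate vision and language features, then apply one linear layer.
    Each row of `weights` produces one action value."""
    features = encode_image(pixels) + encode_text(instruction)
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

# Usage: a 4x4 image (4 vision features) plus 8 text features = 12 inputs,
# mapped to a hypothetical 3-value action.
image = [[0.5] * 4 for _ in range(4)]
weights = [[0.1] * 12 for _ in range(3)]
print(vla_policy(image, "pick up the cup", weights))
```

In a trained model the weights would be learned from paired observations, instructions, and demonstrated actions; here they are fixed placeholders, which is exactly why such a system is only as good as the data behind it.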
Wang is blunt about VLA: he sees it as “a relatively simplistic architecture”. His criticism centers on insufficient data: embodied AI lacks the quantity and quality of real-world data needed for effective VLA deployment.
In his view, the bottleneck isn’t just in the volume or quality of data, but in the models themselves: “Current models are not good enough, and not unified enough.” Without better models and richer datasets, VLA will have a hard time handling the messiness of real-world environments.
Hedging Bets: VLA vs. Video-Based Approach
Wang isn’t rejecting VLA entirely, though. Unitree is experimenting with AI-enhanced VLA models, but the company is leaning more toward video-driven methods to develop its robots’ cognitive architecture. Inspired by Google’s video-driven world model released last year, Unitree tried a similar approach: generate a video of “a robot tidying up a room” and then use that video to train a robot to actually do the task.
Wang believes that video-based paths might advance faster and converge more reliably than VLA, but notes a cost:
“High video quality demands result in excessive GPU consumption.”
“Unitree’s Model Team Is Actually Not Small”
Wang has often emphasized that Unitree’s core strength lies in robot hardware rather than the “brain”, which has led many to assume that “Unitree doesn’t work on robot brains.”
At the World Robot Conference (WRC), he tried to correct that perception, stating that while Unitree remains cautious about model investment, “the number of people in the model team is quite large, but small compared to big AI companies.”
He also believes that team size does not necessarily guarantee success. “More resources, more money, more people cannot guarantee the best, most innovative technological breakthroughs. A small-to-mid-sized team also has the potential to create better models, though pressure would be intense.”
Tackling Compute Limitations with Distributed Clusters
Wang sees decentralized computing as a way forward: “In a factory housing 100 robots, a local distributed server cluster could reduce communication latency and improve performance.” (Latency is the short delay between sending a signal and receiving a response. In robotics, lower latency means faster reaction times: a robot can detect an obstacle and change course almost instantly.)
He points to power limits as a fundamental constraint:
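A quick back-of-envelope calculation shows why latency matters for a moving robot: until a response comes back, the robot keeps moving on its previous plan. The sketch below compares the distance covered during one round trip to a nearby cluster versus a distant data centre. The walking speed (1.5 m/s) and the two latency figures (20 ms and 150 ms) are illustrative assumptions, not numbers from Wang or Unitree.

```python
def distance_during_latency(speed_m_s: float, latency_ms: float) -> float:
    """Distance in metres a robot travels before a response arrives,
    given its speed in m/s and the round-trip latency in milliseconds."""
    return speed_m_s * latency_ms / 1000.0

# Assumed, illustrative values only.
SPEED = 1.5  # walking speed in m/s
for label, latency in [("local cluster", 20.0), ("remote data centre", 150.0)]:
    print(f"{label}: {distance_during_latency(SPEED, latency):.3f} m")
```

Under these assumptions the robot covers about 0.03 m before a local cluster can respond, versus about 0.225 m with the remote round trip, which is the kind of gap that separates stopping short of an obstacle from bumping into it.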
“To put it simply, a humanoid robot can only carry 100 watts of computing power at maximum. On average, it can only handle 10 watts.”
In the future, he envisions central computing centers in each neighborhood or factory, with robots connected locally to reduce delays and costs.
On Continuous Learning and AI Development
“I personally believe that continuous learning is crucial, for everyone, including myself. Many young people excel in AI because they started studying it early in school, giving them an edge. Maintaining a student-like openness to learning is key. For example, I make time daily to read research papers and write code.
It is important to accurately identify the big direction of research. If you’ve correctly identified where the next breakthrough will come or where the most promising developments lie, you can direct resources into that area. Research and development timelines can also be greatly shortened if you’ve identified the right avenues of exploration.
Half of your energy should go into identifying what to learn, and the other half into the actual learning. Don’t invest everything into something before exploring all your options. You should use the quickest way to find the most promising path before committing to it.”
Unitree Robots: Entertainment and Beyond
From dancing robots at the Spring Festival Gala to robot fights at the World Artificial Intelligence Conference (WAIC) and WRC, many perceive Unitree’s robots as entertaining rather than functional. Wang admitted that getting robots into factories or households to perform work isn’t realistic yet.

For now, performances are the more attainable application, and none has made a bigger impact than the dance by the company’s humanoid robots at the CCTV Spring Festival Gala, which far exceeded expectations, even those of the team and the directors. Back at the 2022 Winter Olympics, Unitree’s “Xiao Niu” robot also went viral, but this time the impact was cultural.
Internally, however, most of Unitree’s team is focused on figuring out how to make robots actually work.
“Getting robots to work is a huge AI-model challenge, and our current implementations aren’t ideal.”
Wang hopes robots will not be limited to single-function tasks but will instead be versatile and multifunctional.
When Will Robots Have Their “ChatGPT Moment”?
“Incremental progress is healthier than sudden leaps; it allows society to adapt. Hardware is improving (cost, reliability), but the bottleneck is AI capability. It is kind of like the years leading up to the explosion of ChatGPT. Everyone understands what direction we need to take, where we need to develop, it’s just that no one’s gotten there yet.”
Wang estimates the “ChatGPT moment” for robots could arrive in as little as two to three years, or in three to five years at the latest.
“Imagine a venue where humanoid robots roam freely and you can ask any robot to do something, and it just does it. That’s the tipping point.”
Adoption, Safety, and Public Perception
“In three to five years, robots will likely become common in service settings (malls, streets), but home use faces safety and ethical hurdles. I predict that within a decade, society will experience an AI and robotics revolution, and it will totally change the landscape of life.
I’m less worried about robots developing ‘hostile consciousness’ than about bad actors misusing them. AI has no emotions; it is totally rational and logical. Robots lack human fears: a damaged limb is replaceable, not a permanent injury.
Some people may not like robots. Regulations can designate robot-free zones (like drone restrictions) to address concerns, while robots can be free roaming in other areas.”
AI and Robotics: Interdependence and Global Competition
“In the past, AI was confined to single-use tasks: recognizing a single word, a face, or a color. But now AI is multi-functional. It is AI for everything, one size fits all. But it is still limited to linguistic, image-based, or digital functions. Its applications in the real world, doing physical work, are still limited, and so is its ability to increase the efficiency of physical labor.
A multi-function AI needs a multi-function robot. As long as AI keeps improving, robotics will keep improving as well. The two go hand in hand.
In the last two to three years, there have been glimmers of this development. The realm of imagination for AI is expanding, and people are optimistic. The attention of major players like governments, Elon Musk and other large tech companies is increasing the pace and optimism of the entire industry.
The market is logical. There are more people who are into humanoid robots than those who are against them. For example, if you have two robots and one looks pleasing and the other doesn’t, most people will like the one that looks pleasing. This is reality.”
Advice for the Industry and Broader Vision
“Stay open-minded. Every sector has hype cycles; focus on sustainable progress. Every sector has ups and downs. I think we have to accept these fluctuations and look to the future with confidence. Competition is good, and many new companies have entered the fray in recent years. We want the industry to grow and be competitive, because that’s what will really be good for our company.
At the same time, it is crucial to nurture more talent, because the growth of the industry will depend on the strength of our talent. So our greatest hope is for more people to join the industry and be interested in learning the skills needed so that we can push our industry further together.”
“AI and robotics could surpass the impact of electricity or steam engines, freeing humanity from labor, solving hunger, and elevating living standards. The future of development and tech is intimately tied to the youth. Information and technology will thrive in the cities that manage to attract young people.”
Our Take
Wang Xingxing’s words are both pragmatic and ambitious. He tempers industry hype with caution about technical bottlenecks, yet speaks with conviction about a coming “ChatGPT moment” for robotics. Whether the next three to five years deliver the leap he predicts will depend on the foundations being laid today. For Wang, those foundations are clear: better models, richer data, faster infrastructure, and a talent pool ready to turn spectacle into substance.