Two AI Phones. Two Access Models. One Critical Difference.
Samsung’s Galaxy S26 with Gemini relies on permission-based integrations, while ByteDance’s Doubao phone demonstrated UI-driven automation. The real divide is how AI gains control of the device.
Two models are emerging in the race to build “AI phones.”
One works through approved integrations with apps and operating systems, relying on APIs and permissions that developers explicitly grant.
The other works by reading the screen and acting through the interface like a human user, tapping through apps even without formal integrations.
Both approaches can complete the same task. But they represent fundamentally different ways for AI to gain authority on a device.
That difference, how the AI gets access, may ultimately determine which model actually scales.
One Task, Two Paths
Say you tell your phone: “Get me a ride to the hotel.”
Two AI phones can do it.
In one approach, the assistant connects to ride-hailing apps through official integrations. It uses permissions granted by the user and the app developer. The system stays inside Android’s normal app boundaries, and the user can intervene or confirm sensitive steps.
In the other approach, the AI assistant reads what is on the screen and taps through apps like a person would, often using accessibility features. Because it works through the interface itself, it can move across apps even when those apps do not provide official integrations.
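The interface-level approach can be pictured as an agent that sees only the tree of on-screen elements, roughly what an accessibility service exposes, and acts by tapping labels. The following is a minimal illustrative sketch, not any vendor's actual implementation; the screen tree and labels are invented for the example.

```python
# Illustrative sketch of interface-level automation: the agent sees only a
# tree of on-screen elements (as an accessibility service might expose) and
# acts by "tapping" labels. No app-specific API is involved.

def find_node(tree, label):
    """Depth-first search for the first element whose label matches."""
    if tree.get("label") == label:
        return tree
    for child in tree.get("children", []):
        hit = find_node(child, label)
        if hit:
            return hit
    return None

def tap(node, log):
    log.append(f"tap: {node['label']}")

# A mock ride-hailing screen; a real agent would read this from the OS.
screen = {
    "label": "home",
    "children": [
        {"label": "Search destination"},
        {"label": "Confirm pickup"},
    ],
}

actions = []
for step in ["Search destination", "Confirm pickup"]:
    node = find_node(screen, step)
    if node is None:  # the UI changed under the agent: the brittle case
        break
    tap(node, actions)

print(actions)  # → ['tap: Search destination', 'tap: Confirm pickup']
```

The strength and the weakness are the same property: nothing in the loop depends on the app cooperating, but nothing protects the agent if a label moves or a pop-up appears.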
That second model entered the spotlight in December 2025, when ByteDance introduced its Doubao assistant on ZTE’s Nubia M153 handset. The device demonstrated an AI agent capable of navigating apps through UI automation.
When Samsung launched the Galaxy S26 with Gemini in February 2026, comparisons quickly followed. Some Chinese tech media even dubbed it the “global version” of the Doubao phone.
It’s a catchy headline. But it collapses a crucial difference.
Prof Bo An, President’s Chair Professor and Head of Division of Artificial Intelligence at the College of Computing and Data Science, NTU Singapore, draws a clear line: “The fundamental difference is where the AI assistant operates.”
One approach works through official system functions and app integrations. The other operates at the GUI level, interpreting what appears on the screen and interacting with apps the way a person would.
Architecture Determines Control
For many researchers, the real question in the AI phone debate is not features, but system design.
Robert Dahlke, Managing Partner at German firm TNG Technology Consulting, says the core issue is the “role the AI agent plays within the system.”
Dahlke, who also works on building leading open-source large language models at TNG, explains that in some implementations the AI agent effectively acts “as a substitute for the human user.”
“The agent gains privileges close to those of a human user and can tap, navigate the interface, and execute cross-application tasks on the phone in the same way a person would.”
That makes it one of the most powerful ways to enable AI on a device, and also one of the most permissive. The challenge, Dahlke says, is control.
When people use a phone, actions happen step by step. In a fully agentic setup, the AI can replace the user in executing those actions, reducing the control layers that normally sit between intention and execution.
Other models, including Samsung’s pairing with Gemini, take a more constrained approach.
In those designs, the functions an AI agent can perform are defined through “explicit permission mechanisms,” such as APIs, MCP or other structured integrations. This establishes clear capability boundaries and ensures the AI operates only within authorized functions.
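A permission-based design can be sketched as a registry of explicitly exposed functions, each gated by a scope the user has granted. This is a minimal illustration of the capability-boundary idea, not a real API; the tool names and scopes are assumptions made for the example.

```python
# Sketch of a permission-based integration model: the agent can call only
# functions an app has explicitly registered, and each call is checked
# against the scopes the user granted. Names are illustrative only.

ALLOWED_TOOLS = {
    "book_ride": {"scope": "rides.book"},
    "get_eta":   {"scope": "rides.read"},
}

user_grants = {"rides.read"}  # the user granted read access only

def call_tool(name, grants):
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        return "error: no such integration"       # no UI fallback exists
    if tool["scope"] not in grants:
        return "blocked: permission not granted"  # hard capability boundary
    return f"ok: {name} executed"

print(call_tool("get_eta", user_grants))    # → ok: get_eta executed
print(call_tool("book_ride", user_grants))  # → blocked: permission not granted
```

The trade-off is the mirror image of UI automation: the agent can do nothing an app has not exposed, but everything it does do stays inside a boundary the system can enforce.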
Dahlke notes that similar high-authority agents are already being explored in desktop environments. Tools such as OpenClaw, a popular experimental agent system, are sometimes run only on employees’ separate machines, allowing organizations to contain potential risks.
“AI agents may operate with high levels of authority, but they should do so within controlled environments.”
Smartphones make that balance harder to maintain. They are deeply personal devices containing banking information, private communications, identity credentials, and payment systems.
Allowing an autonomous agent to operate freely on such a device, Dahlke warns, could introduce risks such as unintended actions or accidental transactions.
The real challenge, he says, is “finding the right balance between expanding AI capabilities and maintaining system control.”
Reliability is the Real Benchmark
Even before governance questions arise, there is a practical challenge: reliability.
Complex tasks require what researchers call “long-horizon planning.” The agent has to break one goal into many smaller steps, then keep track of what it has already done as it moves across screens and apps.
Prof Bo An explains that this is where problems begin to accumulate.
“Probabilistic errors compound with each action.”
As tasks stretch across multiple screens, systems must implement robust error-recovery mechanisms and maintain strong state management to track progress and user intent.
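The compounding effect is easy to quantify. If each UI action succeeds independently with probability p, an n-step task completes with probability p^n, so even a high per-step success rate decays quickly over long tasks. The 98% figure below is an assumption chosen for illustration, not a measured benchmark.

```python
# Why long-horizon tasks fail more often than any single step suggests:
# with per-step success probability p, n independent steps all succeed
# with probability p**n.

p = 0.98  # assumed per-step success rate, for illustration only
for n in (5, 20, 50):
    print(f"{n:2d} steps -> {p**n:.3f} task success")
```

At p = 0.98, a 5-step task still succeeds about 90% of the time, but a 50-step task drops to roughly 36%, which is why error recovery and state tracking matter more than raw per-step accuracy.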
The difficulty is that mobile interfaces are not designed for automation. Buttons move, pop-ups appear, and security checks interrupt flows.
Those variations can easily confuse an AI agent relying on screen interpretation, causing it to tap the wrong button or get stuck halfway through a task.
Pradeep Reddy Varakantham, Professor of Computer Science in the School of Computing and Information Systems at Singapore Management University, says the UI-driving approach can be “brittle and not as reliable.”
He adds that if apps worry their data is being used without explicit permission, “they may start behaving adversarially and start adding UI features to fool the OS agent.”
In other words, interface automation risks becoming a cat-and-mouse game between AI agents and app developers.
Accountability Is the Harder Problem
Reliability is only part of the challenge. The deeper issue is accountability.
Ding Xuhua, Professor of Computer Science and Co-Director, Centre on Security, Mobile Applications and Cryptography, Singapore Management University, calls it a trust boundary problem.
“When it is at the OS level, you have to fully trust it as it becomes the boss of the phone,” he says.
If the AI remains an application, the operating system can still regulate its behavior.
But when the agent sits deeper in the system, the bar for oversight rises significantly.
“If the AI system is at the OS level, it becomes much harder to reliably attain accountability and auditability,” Ding explains.
In practice, that means ensuring that the system can record what the agent did, reconstruct workflows after the fact, and verify that sensitive actions were properly authorized.
Such safeguards become especially important in services involving banking, government systems, and identity verification, where unclear access models can quickly become unacceptable.
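One common pattern for the auditability Ding describes is an append-only action log with a hash chain, so workflows can be reconstructed afterwards and tampering detected. The sketch below is a minimal illustration of that pattern; the field names and actions are invented, and a production system would need far more (secure storage, signing, policy checks).

```python
# Sketch of agent auditability: every sensitive action is appended to a
# hash-chained log, so the sequence can be replayed and tampering detected.
import hashlib
import json

def append_entry(log, action, authorized):
    prev = log[-1]["hash"] if log else "0" * 64
    entry = {"action": action, "authorized": authorized, "prev": prev}
    entry["hash"] = hashlib.sha256(
        (prev + json.dumps(entry, sort_keys=True)).encode()
    ).hexdigest()
    log.append(entry)

def verify(log):
    prev = "0" * 64
    for e in log:
        body = {k: e[k] for k in ("action", "authorized", "prev")}
        expected = hashlib.sha256(
            (prev + json.dumps(body, sort_keys=True)).encode()
        ).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, "open_banking_app", authorized=True)
append_entry(log, "transfer_funds", authorized=True)
print(verify(log))  # → True
```

Rewriting any recorded action breaks the chain, so `verify` fails, which is the minimal property an after-the-fact audit needs.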
Ding does not argue that interface automation should never exist. But he says it must operate under strict guardrails and clear accountability frameworks.
Without those, trust becomes difficult to establish.
The Ecosystem Decides What Scales
Beyond architecture and reliability lies another factor: the ecosystem.
Kyle Chan, a fellow at the Brookings Institution’s China Center and a prominent tech voice on X covering the AI race between China and the US, argues that platform partnerships may ultimately matter more than raw AI capability.
“Google’s approach to AI agents for Android is more durable than ByteDance’s. Google is making a longer-term investment in app partnerships. ByteDance is trying to rush ahead without permission and has already run into walls.”
“Google’s approach is more comparable to Huawei’s agent-to-agent framework approach. Because both Google and Huawei build mobile operating systems, they can integrate AI agents far more deeply into their devices.”
He further compares the competition to an earlier platform race.
“Building out a network of app partners will likely be even more important than sheer agentic AI performance. This is like the race to become the next Apple app store, but even bigger.”
That comparison matters because distribution layers shape which services users actually adopt. If everyday tasks increasingly begin with an AI assistant, that layer could influence which apps are called by default and who ultimately owns the user relationship.
For now, two models are taking shape in the AI phone race.
One is built around approved integrations and defined permissions.
The other relies on interface-level automation and broader system access.
Both approaches promise powerful automation.
But where trust, traceability, and responsibility matter—in banking, government services, and identity systems—the difference in access models becomes decisive.
An AI phone becomes truly viable only when existing services are willing to work with it, tasks complete reliably, and actions can be traced when something goes wrong. That is the standard these systems will ultimately have to meet.
Related Reading On Asia Tech Lens
Why ByteDance’s AI Phone Hit a Wall: Security, Fair Play, and the Economics of Attention
The rollout showed how quickly platforms push back when a phone-level agent starts acting across apps without approved access.
The Chinese New Year AI Gateway War: The Big Four’s Fight for Daily Habit
How China’s tech giants are competing to become the default AI gateway, and why habit, distribution and control of the user layer matter as much as model power.
What Tencent’s “Yuanbao PAI” Reveals About Its AI Strategy
A consumer AI read on how Tencent is using product design, distribution and social mechanics to make AI part of everyday behavior.
Why Smartphone Prices Could Rise in 2026 as RAM Costs Surge
Why the AI phone race is also a hardware story, with rising memory costs starting to reshape smartphone pricing and device trade-offs.
Agentic AI Can Act. Singapore’s New Rulebook Says: Prove You Can Stop It
A governance companion on why autonomy needs limits, visibility and the ability to step in when something goes wrong.