Two AI Phones. Two Access Models. One Critical Difference.
Samsung’s Galaxy S26 with Gemini relies on permission-based integrations, while ByteDance’s Doubao phone demonstrates UI-driven automation. The real divide is how AI gains control of the device.
Two models are emerging in the race to build “AI phones.”
One works through approved integrations with apps and operating systems, relying on APIs and permissions that developers explicitly grant.
The other works by reading the screen and acting through the interface like a human user, tapping through apps even without formal integrations.
Both approaches can complete the same task. But they represent fundamentally different ways for AI to gain authority on a device.
That difference, how the AI gains access, may ultimately determine which model actually scales.
One Task, Two Paths
Say you tell your phone: “Get me a ride to the hotel.”
Two AI phones can do it.
In one approach, the assistant connects to ride-hailing apps through official integrations. It uses permissions granted by the user and the app developer. The system stays inside Android’s normal app boundaries, and the user can intervene or confirm sensitive steps.
In the other approach, the AI assistant reads what is on the screen and taps through apps like a person would, often using accessibility features. Because it works through the interface itself, it can move across apps even when those apps do not provide official integrations.
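The contrast can be sketched in a few lines of code. This is an illustrative sketch only: every class and method name below is hypothetical, not any vendor’s real API. The point is where authority comes from, a granted scope in one case, the ability to see and tap the screen in the other.

```python
class ApiAssistant:
    """Permission-based path: acts only through scopes the user granted."""

    def __init__(self, granted_scopes):
        self.granted_scopes = set(granted_scopes)

    def book_ride(self, destination):
        # The capability boundary is checked before anything happens.
        if "ride_hailing" not in self.granted_scopes:
            raise PermissionError("ride_hailing scope not granted")
        # A structured call the app developer explicitly exposed.
        return {"action": "request_ride", "dropoff": destination, "via": "api"}


class UiAssistant:
    """UI-driven path: reads the screen and taps like a person would."""

    def book_ride(self, screen_elements, destination):
        # No scope check: authority comes from seeing and tapping the interface.
        taps = []
        for label in ("destination field", "Confirm ride"):
            if label not in screen_elements:  # button moved or renamed?
                raise LookupError(f"could not find '{label}' on screen")
            taps.append(label)
        return {"action": "request_ride", "dropoff": destination,
                "via": "ui", "taps": taps}
```

Note that the API path fails closed (no grant, no action), while the UI path fails only when the screen stops matching what the agent expects.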
That second model entered the spotlight in December 2025, when ByteDance introduced its Doubao assistant on ZTE’s Nubia M153 handset. The device demonstrated an AI agent capable of navigating apps through UI automation.
When Samsung launched the Galaxy S26 with Gemini in February 2026, comparisons quickly followed. Some Chinese tech media even dubbed it the “global version” of the Doubao phone.
It’s a catchy headline. But it collapses a crucial difference.
Prof Bo An, President’s Chair Professor and Head of Division of Artificial Intelligence at the College of Computing and Data Science, NTU Singapore, draws a clear line: “The fundamental difference is where the AI assistant operates.”
One approach works through official system functions and app integrations, while the other operates at the GUI level, interacting with apps the way a person would.
Architecture Determines Control
For many researchers, the real question in the AI phone debate is not features, but system design.
Robert Dahlke, Managing Partner at German firm TNG Technology Consulting, says the core issue is the “role the AI agent plays within the system.”
Dahlke, who also works on building leading open-source large language models at TNG, explains that in some implementations the AI agent effectively acts “as a substitute for the human user.”
“The agent gains privileges close to those of a human user and can tap, navigate the interface, and execute cross-application tasks on the phone in the same way a person would.”
That makes it one of the most powerful ways to enable AI on a device, and also one of the most permissive. The challenge, Dahlke says, is control.
When people use a phone, actions happen step by step. In a fully agentic setup, the AI can replace the user in executing those actions, reducing the control layers that normally sit between intention and execution.
Samsung’s Gemini integration and similar systems take a more constrained approach.
In those designs, the functions an AI agent can perform are defined through “explicit permission mechanisms,” such as APIs, MCP or other structured integrations. This establishes clear capability boundaries and ensures the AI operates only within authorized functions.
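A minimal sketch of what such an explicit permission mechanism looks like in practice: the agent can only invoke functions that were registered with a declared capability, and only when the user has granted that capability. The names are illustrative, not a real MCP server or Android API.

```python
class ToolRegistry:
    """Agent-facing registry: unregistered functions are simply unreachable."""

    def __init__(self):
        self._tools = {}  # tool name -> (required capability, function)

    def register(self, name, capability, fn):
        self._tools[name] = (capability, fn)

    def call(self, name, granted_capabilities, *args):
        if name not in self._tools:
            # Not integrated means not reachable, unlike a UI-driven agent.
            raise KeyError(f"no such tool: {name}")
        capability, fn = self._tools[name]
        if capability not in granted_capabilities:
            raise PermissionError(f"capability '{capability}' not granted")
        return fn(*args)
```

The design choice here is that the boundary is enforced by the system, not by the agent’s judgment: an agent cannot reach a function that was never registered, no matter what it infers from the screen.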
Professor Wei Lu, also from NTU’s College of Computing and Data Science, says the issue is not just what the AI can do, but what happens when the user no longer directs each step. In a conventional smartphone setup, the user interacts with individual apps while the operating system enforces the rules around what each app can access. A permission-based assistant still works within that structure.
But when an AI system begins interpreting a user’s broader goal and carrying out the intermediate steps on its behalf, responsibility becomes harder to trace.
As Wei puts it, “Whether that concierge faithfully represents the user’s intent—and who is accountable when it does not—is precisely what this architectural difference makes harder to determine.”
Dahlke notes that similar high-authority agents are already being explored in desktop environments. Tools such as the popular experimental agent system OpenClaw are sometimes run only on employees’ separate machines, allowing organizations to contain potential risks.
“AI agents may operate with high levels of authority, but they should do so within controlled environments.”
Smartphones make that balance harder to maintain. They are deeply personal devices containing banking information, private communications, identity credentials, and payment systems.
Allowing an autonomous agent to operate freely on such a device, Dahlke warns, could introduce risks such as unintended actions or accidental transactions.
The real challenge, he says, is “finding the right balance between expanding AI capabilities and maintaining system control.”
Reliability Is the Real Benchmark
Even before governance questions arise, there is a practical challenge: reliability.
Complex tasks require what researchers call “long-horizon planning.” The agent has to break one goal into many smaller steps, then keep track of what it has already done as it moves across screens and apps.
Prof Bo An explains that this is where problems begin to accumulate.
“Probabilistic errors compound with each action.”
As tasks stretch across multiple screens, systems must implement robust error-recovery mechanisms and maintain strong state management to track progress and user intent.
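The compounding effect is easy to quantify. If each UI action succeeds independently with probability p, a task needing n sequential actions completes with probability p to the power n. The numbers below are illustrative, not measured from any real agent:

```python
def task_success(p_step: float, n_steps: int) -> float:
    """Probability that n independent sequential steps all succeed."""
    return p_step ** n_steps

# A 98%-reliable step still fails often over a long horizon:
print(round(task_success(0.98, 1), 3))    # single tap: 0.98
print(round(task_success(0.98, 20), 3))   # 20-step task: ~0.668
print(round(task_success(0.98, 50), 3))   # 50-step task: ~0.364
```

This is why error recovery matters more than raw per-step accuracy: without a way to detect and retry failed steps, even a very reliable agent completes only about two thirds of 20-step tasks.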
The difficulty is that mobile interfaces are not designed for automation. Buttons move, pop-ups appear, and security checks interrupt flows.
Those variations can easily confuse an AI agent relying on screen interpretation, causing it to tap the wrong button or get stuck halfway through a task.
Chai Yeow Yeoh, Senior Specialist in Cybersecurity at Singapore Polytechnic’s School of Computing, frames the divide as one between a protocol-driven model and a vision-driven one.
“The main technological difference between these two methods lies in the interaction layer.”
One model, he says, uses a protocol-driven approach that lets AI communicate with apps through structured code and APIs in the background. The other, by contrast, relies on a vision-driven method in which the AI interprets the front end by reading pixels and simulating human actions on the screen.
To simplify, Yeoh says, “consider an AI agent that works mainly through app and system permissions as having backstage access,” while the UI-driven agent is “a highly advanced robot standing in front of the phone.”
That distinction has practical consequences. APIs behave more like contracts. They usually remain stable even when an app’s visual design changes. A UI-driven model is more fragile. Move a button, change a color, or insert a new prompt, and the automation can fail.
Yeoh notes that “APIs act like contracts that don’t change every time an app’s appearance alters.” By contrast, if a developer moves a Submit button or changes its color, “the AI’s visual model might fail to recognize it, leading to failed automations.”
Pradeep Reddy Varakantham, Professor of Computer Science in the School of Computing and Information Systems at Singapore Management University, says the UI-driven approach can be “brittle and not as reliable.”
He adds that if apps worry their data is being used without explicit permission, “they may start behaving adversarially and start adding UI features to fool the OS agent.”
In other words, interface automation risks becoming a cat-and-mouse game between AI agents and app developers.
Accountability Is the Harder Problem
Reliability is only part of the challenge. The deeper issue is accountability.
Ding Xuhua, Professor of Computer Science and Co-Director of the Centre on Security, Mobile Applications and Cryptography at Singapore Management University, calls it a trust-boundary problem.
“When it is at the OS level, you have to fully trust it as it becomes the boss of the phone,” he says.
If the AI remains an application, the operating system can still regulate its behavior.
But when the agent sits deeper in the system, the bar for oversight rises significantly.
“If the AI system is at the OS level, it becomes much harder to reliably attain accountability and auditability,” Ding explains.
That matters because when something goes wrong, the question is not just whether the AI made a mistake. It is whether the system can clearly show what it did, why it did it, and whether the user properly authorized the action in the first place.
As Prof Wei Lu from NTU notes, “that interpretation is not itself an auditable record in the same way that a permission grant is.”
In other words, when a user approves a goal instead of a specific step, more of the execution path is determined by the AI itself. That can make it harder to reconstruct decisions afterward and harder to assign responsibility when outcomes go wrong.
Such safeguards become especially important in services involving banking, government systems, and identity verification, where unclear access models can quickly become unacceptable.
Ding does not argue that interface automation should never exist. But he says it must operate under strict guardrails and clear accountability frameworks.
Yeoh says systems with this level of autonomy should not run without stronger user oversight. “An AI agent should operate under a Human-in-the-Loop framework that emphasizes observability and final decision-making.”
In practice, that means users should be able to see what the agent is doing, stop it immediately, and explicitly approve high-risk actions before they are completed.
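A human-in-the-loop gate of that kind can be sketched simply: low-risk actions execute and are logged, while high-risk actions block until the user explicitly approves. The action categories and function names here are illustrative assumptions, not any shipping framework.

```python
# Hypothetical set of action types that require explicit user approval.
HIGH_RISK = {"payment", "send_message", "delete_data"}

def execute(action, params, approve):
    """Run an agent action under human oversight.

    approve: a callable that asks the user and returns True or False.
    Every action, executed or blocked, returns an auditable log entry.
    """
    log = {"action": action, "params": params}
    if action in HIGH_RISK and not approve(action, params):
        log["status"] = "blocked_by_user"
        return log
    log["status"] = "executed"
    return log
```

Returning a log entry for every decision, including refusals, is what makes the agent observable after the fact, which is exactly the auditability the permission-based model is praised for above.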
Without those, trust becomes difficult to establish.
Privacy and Exposure
The access model also shapes what the AI is able to observe.
“When an AI agent can operate across multiple apps, it raises concerns about privacy and data exposure,” Yeoh says.
A UI-driven model may require broad visibility into whatever appears on the screen. A permission-based model is narrower by design, accessing only the fields or functions explicitly made available through approved interfaces.
As Yeoh puts it, the Samsung/Gemini approach follows “the principle of Least Privilege by only accessing specific data fields through secure APIs.”
That distinction matters because privacy risk begins not only when an agent acts, but when it sees more than it needs to.
The Ecosystem Decides What Scales
Beyond architecture and reliability lies another factor: the ecosystem.
Kyle Chan, a researcher and fellow at the Brookings Institution’s China Center and a prominent tech voice on X covering the US-China AI race, argues that platform partnerships may ultimately matter more than raw AI capability.
“Google’s approach to AI agents for Android offers a more durable approach than the one by ByteDance. Google is making a longer-term investment in app partnerships. ByteDance is trying to rush ahead without permission and has already run into walls.”
“Google’s approach is more comparable to Huawei’s agent-to-agent framework approach. Because both Google and Huawei build mobile operating systems, they can integrate AI agents far more deeply into their devices.”
He further compares the competition to an earlier platform race.
“Building out a network of app partners will likely be even more important than sheer agentic AI performance. This is like the race to become the next Apple app store, but even bigger.”
That comparison matters because distribution layers shape which services users actually adopt. If everyday tasks increasingly begin with an AI assistant, that layer could influence which apps are called by default and who ultimately owns the user relationship.
Wei Lu says the assistant layer could become more powerful as users shift from opening apps directly to simply stating goals.
When that happens, the competitive question is no longer just which app performs a task best, but which systems the assistant chooses to call first.
One model asks platforms to participate. The other tries to work around them.
That distinction helps explain why app ecosystems may accept one approach more readily than the other—and why scaling an AI phone may depend as much on developer cooperation as on the model itself.
For now, the AI phone race is coalescing around two access models: approved integrations with defined permissions, and interface-level automation with broader system access. Both promise powerful automation.
But in areas where trust, traceability, and responsibility matter, such as banking, government services, and identity systems, that architectural difference becomes decisive.
An AI phone becomes truly viable only when existing services are willing to work with it, tasks complete reliably, and actions can be traced when something goes wrong. That is the standard these systems will ultimately have to meet.
Related Reading On Asia Tech Lens
Why ByteDance’s AI Phone Hit a Wall: Security, Fair Play, and the Economics of Attention
The rollout showed how quickly platforms push back when a phone-level agent starts acting across apps without approved access.
The Chinese New Year AI Gateway War: The Big Four’s Fight for Daily Habit
How China’s tech giants are competing to become the default AI gateway, and why habit, distribution and control of the user layer matter as much as model power.
What Tencent’s “Yuanbao PAI” Reveals About Its AI Strategy
A consumer AI read on how Tencent is using product design, distribution and social mechanics to make AI part of everyday behavior.
Why Smartphone Prices Could Rise in 2026 as RAM Costs Surge
Why the AI phone race is also a hardware story, with rising memory costs starting to reshape smartphone pricing and device trade-offs.
Agentic AI Can Act. Singapore’s New Rulebook Says: Prove You Can Stop It
A governance companion on why autonomy needs limits, visibility and the ability to step in when something goes wrong.