Your AI Agent Has a UX Problem

What Apple's Research on Computer Use Agents Means for Your Business

I watched a demo last week where an AI agent booked a flight, reserved a hotel, and rented a car. All by itself. Impressive, right?

Then it bought the wrong insurance, upgraded to a suite nobody asked for, and sent a confirmation email to the wrong contact. The person watching had no idea any of it happened until after the fact.

That's the state of computer use agents right now. The technology works. The experience around it doesn't.

A new research paper out of Apple and Carnegie Mellon, "Mapping the Design Space of User Experience for Computer Use Agents" (Cheng et al., 2026), finally puts structure around what I've been telling clients for months: the hard part isn't getting AI to click buttons. The hard part is making sure humans can actually work with these things without losing control of their own business.

Here's what you need to know.

What Are Computer Use Agents, and Why Should You Care?

Computer use agents are AI systems that interact with your software the same way a human would. They click buttons, fill out forms, scroll pages, move data between systems. Think of them as a virtual employee who can operate any application with a screen.

This isn't theoretical. Anthropic's Claude, OpenAI's Operator, Google's Project Mariner, and several others are already shipping these capabilities. If you're running a business doing between $2M and $35M in revenue, you're going to encounter these tools within the next 12 months. Probably sooner.

The promise is obvious: hand off repetitive browser and desktop tasks to an agent, and get your time back. The reality is messier. Because right now, most of these agents are built by engineers who care deeply about making the model work, and not nearly enough about making the experience work for the person who has to trust it.

What the Research Actually Found

The Apple/CMU team did something useful. They reviewed nine existing computer use agents, interviewed eight UX and AI practitioners, and then ran a Wizard-of-Oz study with 20 participants where a human pretended to be an AI agent while researchers watched how people reacted.

What came out of it is a taxonomy, a structured map of everything that matters for the user experience of these agents. Four big areas, 21 specific dimensions. I'm going to translate the ones that matter most for your business.

1. User Queries: How You Talk to the Agent

The research found that people talk to agents differently depending on what they're trying to do. When you have a specific task, like "book me a flight to Denver on March 15th," you want the agent to execute and come back when it's done. When you're exploring, like "help me figure out where to go for vacation," you want the agent to collaborate with you, suggest options, and check in.

This matters because most agents today handle both situations the same way. They either execute everything blindly or ask you to confirm every single step. Neither works.

The other big finding here: ambiguity. When your instruction could mean two different things, like "find the Springfield listing" when there are six Springfields on screen, what should the agent do? Some people wanted it to just pick the most likely option. Others wanted it to stop and ask. The right answer depends on how much is at stake.

For your business, this means any agent you deploy needs to understand context. A $50 decision? Just pick. A $5,000 purchase? Ask first.
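
If you want to see how simple that rule can be, here's a rough sketch in Python. The $500 cutoff and the function are mine, not the paper's; the point is that "ask or act" is just a policy you can set per task.

```python
# Illustrative sketch: resolve an ambiguous instruction based on what's at stake.
# The $500 threshold and function names are hypothetical, not from the paper.

ASK_THRESHOLD_USD = 500  # below this, the agent picks the most likely option

def resolve_ambiguity(options: list[str], estimated_value_usd: float) -> str | None:
    """Return a choice for low-stakes decisions, or None to signal 'stop and ask'."""
    if estimated_value_usd < ASK_THRESHOLD_USD and options:
        return options[0]  # assume options are ranked most-likely first
    return None  # high stakes or nothing to rank: pause and ask the user

choice = resolve_ambiguity(["Springfield, IL", "Springfield, MO"], estimated_value_usd=50)
if choice is None:
    print("Agent pauses and asks which Springfield you meant.")
else:
    print(f"Agent proceeds with {choice}.")
```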

2. Explainability: Knowing What the Agent Is Doing

Here's where it gets interesting. Every participant in the study said they wanted to see the agent's actions on screen. The cursor moving, the buttons being clicked, the forms being filled. It made them feel like they were still in charge.

But almost nobody wanted to watch the whole time.

That's the tension. People want visibility, but they don't want to babysit. If you're staring at the agent doing its thing for 20 minutes, you haven't saved any time at all. You've just turned yourself into a supervisor for a robot.

What people actually wanted was a summary at the end, notifications when something went wrong, and detailed explanations only at the moments where the agent had to make a judgment call. One participant put it well: "I don't want to see it clicking a button when there's only one button. I want to know when it made a decision."

For operations leaders, this is a design principle worth remembering: show people the decisions, not the clicks.
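
Here's a toy sketch of what "decisions, not clicks" could look like in practice. The log format is made up for illustration; the idea is to surface only the steps where the agent had more than one real option.

```python
# Illustrative sketch: summarize judgment calls, not routine clicks.
# The action log structure here is hypothetical; real agent logs will differ.

actions = [
    {"step": "open booking page", "alternatives": 1},
    {"step": "pick 7:05am flight over 9:30am (cheaper, but earlier)", "alternatives": 2},
    {"step": "click 'Continue'", "alternatives": 1},
    {"step": "decline seat upgrade ($89)", "alternatives": 2},
]

# A "decision" is any step where the agent had more than one viable option.
decisions = [a["step"] for a in actions if a["alternatives"] > 1]

print("Decisions the agent made for you:")
for d in decisions:
    print(f"  - {d}")
```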

3. User Control: The Ability to Intervene

This is the section that should keep you up at night if you're deploying any kind of AI agent in your business.

Participants were consistent on one point: they never want an agent to take an irreversible action without asking first. Buying something? Ask. Sending a message to a client? Ask. Deleting data? Definitely ask.

But they were equally consistent on the flip side: don't ask me about every little thing. Navigating to a page, filling in a search filter, scrolling through results? Just do it.

The research identified tiers of risk that determine how much control users want. Low-risk actions like browsing and searching? Let the agent run. Medium-risk actions like configuring settings? Show a plan first. High-risk actions like payments and communications? Full stop, get permission.

One participant described the ideal as: "I don't want it to tell me anything until it gets to the point where I have to click finish." That's the sweet spot. Handle the boring stuff, pause before anything consequential.
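
If your team ever builds or configures one of these, the tier logic can be as plain as a lookup table. A minimal sketch, with example tier assignments of my own (not from the study):

```python
# Illustrative sketch of a three-tier control policy.
# The tier assignments are examples; define your own per workflow.

RISK_TIERS = {
    "navigate": "low", "search": "low", "scroll": "low",
    "change_settings": "medium", "fill_form": "medium",
    "pay": "high", "send_message": "high", "delete_data": "high",
}

def control_for(action: str) -> str:
    tier = RISK_TIERS.get(action, "high")  # unknown actions default to the strictest tier
    return {
        "low": "run automatically",
        "medium": "show the plan, then proceed",
        "high": "stop and get explicit permission",
    }[tier]

for action in ["search", "fill_form", "pay"]:
    print(f"{action}: {control_for(action)}")
```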

The study also found something practical about error recovery: people don't just want to stop the agent when it messes up. They want to rewind to a specific step and let it try again from there. Like an undo button, but for an entire workflow. One person asked for a visual decision tree showing every choice the agent made, so they could point to exactly where it went wrong and say, "go back to this point and take the other path."
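
Here's a rough sketch of what that rewind could look like under the hood. It's a toy in-memory version of my own, not anything from the paper; a real agent would snapshot browser or application state rather than a dictionary.

```python
import copy

# Illustrative sketch: checkpoint each step so a run can be rewound and retried.

class Workflow:
    def __init__(self):
        self.checkpoints = []  # (step_name, state snapshot taken before that step)

    def checkpoint(self, name: str, state: dict) -> None:
        self.checkpoints.append((name, copy.deepcopy(state)))

    def rewind_to(self, name: str) -> dict:
        """Return the state captured just before the named step; drop later checkpoints."""
        for i, (step, snapshot) in enumerate(self.checkpoints):
            if step == name:
                self.checkpoints = self.checkpoints[:i]
                return snapshot
        raise KeyError(f"no checkpoint named {name!r}")

wf = Workflow()
state = {"cart": []}

wf.checkpoint("search flights", state)
state["cart"].append("flight")
wf.checkpoint("add insurance", state)
state["cart"].append("insurance")       # the step the user wants undone

state = wf.rewind_to("add insurance")   # "go back to this point and take the other path"
print(state)                            # {'cart': ['flight']}
```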

4. Mental Models: What People Think These Agents Can Do

This might be the most important finding for anyone rolling out AI tools to a team.

People don't understand what these agents can and can't do. And when they don't understand the boundaries, one of two things happens: they either don't trust it at all and never use it, or they trust it too much and get burned.

The research found that watching the agent work in real time helped people calibrate their expectations. When they could see the agent scrolling through six options on a page, they understood it was only looking at those six, not every option on the internet. That's a massive difference in expectation.

Participants also cared a lot about scope. What apps can the agent access? What data can it see? Can it open my health app? Can it read my messages? People wanted explicit controls, similar to the permission prompts you get when installing a phone app. "Does this agent have access to your email? Your calendar? Your payment information?"
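
In practice, that scope can be an explicit, reviewable declaration you sign off on before the agent runs. A hypothetical example (the app names and structure are mine, not from the research):

```python
# Illustrative sketch: an explicit scope declaration reviewed before deployment.
# Treat this as a starting checklist, not a standard format.

AGENT_SCOPE = {
    "allowed_apps": ["browser", "spreadsheet"],
    "blocked_apps": ["email", "messages", "health"],
    "data_access": {"calendar": "read_only", "payment_info": "none"},
}

def can_open(app: str) -> bool:
    return app in AGENT_SCOPE["allowed_apps"] and app not in AGENT_SCOPE["blocked_apps"]

for app in ["browser", "messages"]:
    print(f"Can the agent open {app}? {'yes' if can_open(app) else 'no'}")
```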

For your business: before you hand your team an AI agent, make sure they know what it can see, what it can do, and what it can't. The five minutes you spend on that upfront will save you weeks of either non-adoption or disaster recovery.

So What Does This Mean for Your Business?

Here's the practical takeaway. Computer use agents are coming to your stack whether you plan for them or not. Your team members are already experimenting with browser automation, AI assistants, and agent-based workflows on their own. Snow melts from the edges: adoption starts at the fringes of your organization long before it shows up in a plan.

The question isn't whether to adopt these tools. It's whether you build the organizational infrastructure to use them well. That means:

  • Defining which tasks are appropriate for agent delegation and which aren't. Not everything should be handed to AI. Payments, client communications, data deletion, anything with real consequences needs a human checkpoint.

  • Setting clear scope boundaries. Which systems can the agent access? What data is off-limits? This is your governance framework, and you need it before deployment, not after the first incident.

  • Training your team on how to work with agents, not just how to use them. The difference is huge. Using an agent means typing a prompt and hoping for the best. Working with an agent means understanding its limitations, knowing when to intervene, and being able to recover when things go sideways.

  • Choosing tools that give you visibility and control. If an agent can't show you what it did, can't let you undo its actions, and can't pause before irreversible steps, it's not ready for your business. Period.

What I'm Watching

The research paper calls out something I think is going to define the next 18 months of AI tool development: the tension between autonomy and control.

Every vendor wants to show you a demo where the agent does everything by itself. It's impressive. It makes for great marketing. But the study showed that users don't actually want a fully autonomous agent. They want a capable one that knows when to check in.

Think about the best executive assistant you've ever worked with. They didn't ask you about every paper clip. But they also didn't book a $15,000 conference room without checking. They had judgment about when to act and when to pause. That's what these agents need, and we're not there yet.

The companies that figure out this balance first, with agents that are useful without being dangerous, visible without being annoying, and capable without being opaque, are the ones whose tools will actually get used. The rest will join the graveyard of software that works great in demos and collects dust in the real world.

The First Step

Pick one repetitive, low-risk process your team does every week. Something boring and browser-based, like pulling data from a website into a spreadsheet, or filling out the same form across three different systems.

Try running it with a computer use agent. Watch what happens. Note where it gets confused, where it makes assumptions, where it needs your input. That 30-minute experiment will teach you more about the current state of AI agents than any demo or white paper.

Then ask yourself: would I trust this to run while I'm in a meeting? If the answer is no, you've found the gap between where the technology is and where it needs to be.

That gap is where the real work lives.

This is what I cover in detail in The Modern Digital Business Blueprint. If you want a structured approach to leveraging AI and frontier technologies across your business, with hands-on guidance and a group of peers doing the same work, the next cohort opens soon.

If you're not already reading Signal to Scale, that's where I share tools and approaches like this every Friday. [Subscribe here]

Source: Cheng, R., Liang, J.T., Schoop, E., & Nichols, J. (2026). "Mapping the Design Space of User Experience for Computer Use Agents." 31st International Conference on Intelligent User Interfaces (IUI '26), Paphos, Cyprus.