Commando – Oleksii

A pattern I worked out for myself - what an AI interface might look like once chat stops being enough.

If you’d like to play with it - commando-beryl.vercel.app.

Where this started

After a few months of using AI tools every day, I noticed chat had stopped working for me. Not because chat is bad - for a single question, it’s fine. But the more I worked with agents that took longer than a few seconds to finish anything, the more I felt the shape of the interface fighting me.

If I wanted two things running at once, I had to open two tabs. If I wanted to come back to a thread three days later, I had to scroll up through everything. If I had four tasks going across different tools - one in Cursor, one in a chatbot, one in a Figma plugin, one in Slack - there was no place where I could see all of them in one view. The work was parallel. The interface was still pretending it was a conversation.

So I started asking: if chat is the wrong shape, what’s the right one? That’s what this is. Me trying to figure out what I actually wanted from something I was using every day.

Looking for a shape

It took a few directions before something stuck.

The first thing I tried was just a better chat - threads, branching, multiple panes. It helped a little, but it still felt like trying to make a messaging app do something it wasn’t built for. The core metaphor was still wrong.

What I kept circling back to was real-time strategy games. In something like StarCraft, the player runs dozens of units across a map at once. There’s a tight vocabulary - selections, hotkeys, unit abilities - and good players move through it incredibly fast. The screen is dense but stays readable. The language is small enough to memorize, and after a while you stop looking at the UI.

That kind of compressed vocabulary felt like exactly what was missing from chat. Once I started looking at AI interfaces through that lens, the analogy was hard to put down.

The shape I ended up with

After a few iterations, the prototype settled on three pieces.

There are artifacts - units of work that live on the canvas. A file, a link, a PR, a Figma file, something an agent produced earlier. Each has a position, a type, a state, and a small preview.

There are agents - the ones that actually do the work. Each has a color, an identity, a configurable set of actions, and a memory. Several can be running in parallel.

And there are actions - what an agent does to your selection. Refactor, Slides, Convert, Test. The set isn’t fixed - you can add your own.

The loop is short. Pick some artifacts. Click an agent. Click an action. The agent gets to work, the card shows it’s busy, and when it’s done the result appears nearby on the canvas. No prompt textbox. No chat. You point at things and tell agents what to do.
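One way to picture the three pieces and the loop is as a small data model. This is a sketch under assumptions, not the prototype's code: every name here (`Artifact`, `Agent`, `Action`, `Command`, `dispatch`) is hypothetical, agent memory is reduced to a list of strings, and the "nearby" offset is arbitrary.

```typescript
// Hypothetical names throughout; the prototype's real data model is not shown in this write-up.
type ArtifactState = "idle" | "busy" | "done";

interface Artifact {
  id: string;
  type: string; // e.g. "file", "link", "pr", "figma"
  position: { x: number; y: number };
  state: ArtifactState;
  preview: string;
}

interface Action {
  name: string; // e.g. "Slides", "Convert"
}

interface Agent {
  id: string;
  color: string;
  actions: Action[]; // configurable per agent
  memory: string[]; // simplified: a log of past commands
}

interface Command {
  agentId: string; // exactly one executor per command
  action: string;
  inputIds: string[];
}

// One pass through the loop: selection + agent + action -> a new card near the first source.
function dispatch(
  agent: Agent,
  actionName: string,
  selection: Artifact[]
): { command: Command; result: Artifact } {
  const [source] = selection;
  if (!source) throw new Error("select at least one artifact");

  const action = agent.actions.filter((a) => a.name === actionName)[0];
  if (!action) throw new Error(`${agent.id} has no action ${actionName}`);

  const command: Command = {
    agentId: agent.id,
    action: action.name,
    inputIds: selection.map((a) => a.id),
  };
  agent.memory.push(`${action.name}(${command.inputIds.join(",")})`);

  const result: Artifact = {
    id: `${source.id}:${action.name.toLowerCase()}`,
    type: action.name.toLowerCase(),
    position: { x: source.position.x + 40, y: source.position.y + 40 }, // arbitrary "nearby" offset
    state: "busy", // the card shows it's busy until the agent finishes
    preview: "",
  };
  return { command, result };
}
```

Nothing in the sketch takes a prompt string. The only inputs are a selection, an agent, and an action name, which is the point of the loop.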

It sounds simple written down, but I went back and forth on every part of it before it settled. The rest of this is the questions I had to work through.

Did actions need to be classified?

I noticed early on that not all actions did the same kind of thing. Some took inputs and made something new. Some took one thing and reshaped it. Some modified things in place. At first I treated them all the same and just let agents do whatever - but then it bothered me that you couldn’t tell from looking at the dock what was about to happen to your canvas.

Out of that came the taxonomy. I split actions into three classes and gave each a color.

Producers take what you’ve selected and make something new. Most actions ended up here - Slides, Document, Deploy, Test.

Transformers take one artifact and convert it into a different format. Same content, different shape. Convert is the clearest example.

Mutators modify an artifact in place and keep a version history on the card. Refactor on code was what I had in mind here.

[Example action card: 🚀 Deploy - Producer]

What clicked for me was how much predicting the color did on its own. Once you know what each color means, you know what will happen before you click. It saved a lot of cognitive load I hadn’t realized was there until I removed it.
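The taxonomy can be read as a lookup table: an action name maps to a class, and the class maps to a color and a predictable effect on the canvas. The sketch below assumes this shape. The specific colors are placeholders, since the write-up doesn't say which color belongs to which class, and `CLASS_INFO`, `ACTION_CLASS`, and `describeAction` are hypothetical names.

```typescript
type ActionClass = "producer" | "transformer" | "mutator";

interface ClassInfo {
  color: string; // placeholder colors, not the prototype's actual palette
  effect: string;
}

const CLASS_INFO: Record<ActionClass, ClassInfo> = {
  producer: { color: "green", effect: "adds a new artifact; sources stay untouched" },
  transformer: { color: "blue", effect: "adds a converted copy of one artifact" },
  mutator: { color: "orange", effect: "adds a new version on the same card" },
};

// Classes for the actions named in the text.
const ACTION_CLASS: Record<string, ActionClass> = {
  Slides: "producer",
  Document: "producer",
  Deploy: "producer",
  Test: "producer",
  Convert: "transformer",
  Refactor: "mutator",
};

// What the dock color is meant to tell you before you click.
function describeAction(actionName: string): string {
  const cls = ACTION_CLASS[actionName];
  if (!cls) return `${actionName}: unknown class`;
  const info = CLASS_INFO[cls];
  return `${actionName} (${cls}, ${info.color}): ${info.effect}`;
}
```

For example, `describeAction("Refactor")` reports a mutator that adds a version to the existing card, while `describeAction("Deploy")` reports a producer that leaves its sources alone.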

What changed along the way

I ended up reversing a handful of decisions.

Multi-agent selection. I started with the idea that one command could target several agents at once. It sounded powerful. In practice, every interesting question - which agent owned a failure, whose palette took precedence, how to attribute the output - had no clean answer. I pinned each command to one executor and the problems went away. Parallelism still works. It just comes from running several commands at the same time, not from one command across many agents. I was hesitant to drop the idea, but I haven’t missed it since.

Universal action palette. At first every agent had access to every action. That made the agents interchangeable, which kind of defeated the point of having more than one. I split it - every agent gets a standard core for discoverability, specialized actions are opt-in.
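The core-plus-opt-in split might look something like the following. It is only an illustration: the write-up doesn't list which actions belong to the standard core, so the contents of `CORE_ACTIONS` are an assumption, and `AgentConfig` and `paletteFor` are hypothetical names.

```typescript
// Hypothetical core set; the write-up doesn't say which actions are standard.
const CORE_ACTIONS: readonly string[] = ["Document", "Convert"];

interface AgentConfig {
  name: string;
  optIn: string[]; // specialized actions enabled for this agent only
}

// An agent's palette = shared core (for discoverability) + its own opt-ins, without duplicates.
function paletteFor(agent: AgentConfig): string[] {
  const palette: string[] = [...CORE_ACTIONS];
  for (const action of agent.optIn) {
    if (palette.indexOf(action) === -1) palette.push(action);
  }
  return palette;
}
```

Two agents with different `optIn` lists share the first entries of their palettes and differ after that, which keeps them discoverable without making them interchangeable.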

Replace-source mutators. The first version of Refactor just overwrote the source. Cleaner conceptually. But it broke the “sources stay untouched” rule the rest of the system relied on. Mutators got versioning instead - v1, v2, v3 stacked on the same card, original preserved. Slightly more state on the canvas. A lot more trust.
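A minimal sketch of the versioning rule, assuming a card simply holds an ordered list of versions. `Card`, `Version`, `createCard`, and `mutate` are hypothetical names; the prototype's real structure isn't shown.

```typescript
interface Version {
  label: string; // "v1", "v2", ...
  content: string;
}

interface Card {
  id: string;
  versions: Version[]; // the first entry is the original and is never overwritten
}

function createCard(id: string, content: string): Card {
  return { id, versions: [{ label: "v1", content }] };
}

// Apply a mutator: stack a new version on the card instead of replacing the source.
function mutate(card: Card, transform: (content: string) => string): Card {
  const latest = card.versions[card.versions.length - 1];
  if (!latest) throw new Error("card has no versions");
  const next: Version = {
    label: `v${card.versions.length + 1}`,
    content: transform(latest.content),
  };
  // Return a new card so earlier snapshots of it stay valid too.
  return { ...card, versions: [...card.versions, next] };
}
```

Calling `mutate` twice on a fresh card yields v1, v2, v3 on the same card, with v1 still holding the original content.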

Universal canvas. A single infinite canvas would have turned into spaghetti after twenty tasks. I added workspaces. Each canvas is a project - named, persistent, separate. Switching contexts means switching canvases. Spatial memory only works when the contexts are actually distinct.

How chains came in

A pattern surfaced from actually using the prototype. I’d fire an action, then immediately want to fire another on its output. Pause, breathe, find the new card, select it, click again. That came up often enough that the friction started to bug me.

So I added chaining. After firing an action, a brief time window opens. While it’s open, any action you click joins the queue, shown as numbered pills above the dock. When the running action finishes, the next one runs - on its output if the previous was a producer or transformer, on the same selection if it was a mutator. Queueing something new resets the time window, so you control how long the chain gets.
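The chaining rules above can be sketched as a small queue. To keep the example deterministic, timestamps are passed in rather than read from a timer, and the steps run synchronously. The window length (`WINDOW_MS`) is an assumed value, since the text only says the window is brief, and `Chain`, `enqueue`, and `execute` are hypothetical names.

```typescript
type ActionClass = "producer" | "transformer" | "mutator";

interface QueuedAction {
  name: string;
  cls: ActionClass;
}

const WINDOW_MS = 1500; // assumed value; the write-up only says "a brief time window"

class Chain {
  private queue: QueuedAction[] = [];
  private windowClosesAt: number;

  // Firing the first action opens the window.
  constructor(first: QueuedAction, now: number) {
    this.queue.push(first);
    this.windowClosesAt = now + WINDOW_MS;
  }

  // Returns true if the action joined the chain; each successful enqueue resets the window.
  enqueue(action: QueuedAction, now: number): boolean {
    if (now > this.windowClosesAt) return false;
    this.queue.push(action);
    this.windowClosesAt = now + WINDOW_MS;
    return true;
  }

  // Run the queue in order. `run` stands in for the agent call and returns the id of its output.
  execute(selection: string[], run: (action: string, inputs: string[]) => string): string[] {
    let inputs = selection;
    const trace: string[] = [];
    for (const action of this.queue) {
      const output = run(action.name, inputs);
      trace.push(`${action.name}(${inputs.join(",")})`);
      // Producers and transformers hand their output to the next step;
      // mutators leave the selection as it was.
      if (action.cls !== "mutator") inputs = [output];
    }
    return trace;
  }
}
```

In this sketch, a chain of Slides then Convert runs Convert on the deck that Slides produced, while a chain of Refactor then Test runs Test on the same selection Refactor was given.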

It’s a small feature, but it changed how the prototype felt. What had been a sequence of individual commands started feeling like one fluid instruction.

Speed turned out to be load-bearing

I didn’t set out thinking speed was the point. But somewhere during prototyping I noticed that the moment Commando started feeling right was the moment I stopped reaching for the mouse.

So I added shortcuts. Every agent has a number on the dock, every action has a letter. Selection happens on the canvas - the one mouse move you actually need - and everything else lives on the keyboard. Hit 2 for the second agent, S to fire Slides on the selection, then C to queue Convert behind it. Three keys, and a multi-step pipeline is dispatched.
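As an illustration of the key scheme, a key sequence can be folded into dispatched commands: a digit picks the agent, and each letter after it fires or queues an action for that agent. The bindings and agent names below are made up for the example, and `interpret` is a hypothetical helper, not the prototype's handler.

```typescript
// Hypothetical bindings: digits pick an agent by dock position, letters fire an action.
const AGENT_KEYS: Record<string, string> = { "1": "coder", "2": "designer", "3": "writer" };
const ACTION_KEYS: Record<string, string> = { s: "Slides", c: "Convert", r: "Refactor", t: "Test" };

interface Dispatch {
  agent: string;
  action: string;
}

// Fold a key sequence into commands. Letters pressed before any agent is chosen are ignored.
function interpret(keys: string[]): Dispatch[] {
  let agent: string | undefined;
  const dispatched: Dispatch[] = [];
  for (const key of keys) {
    const k = key.toLowerCase();
    const nextAgent = AGENT_KEYS[k];
    const action = ACTION_KEYS[k];
    if (nextAgent !== undefined) {
      agent = nextAgent;
    } else if (action !== undefined && agent !== undefined) {
      dispatched.push({ agent, action });
    }
  }
  return dispatched;
}
```

With these bindings, `interpret(["2", "S", "C"])` yields Slides and then Convert, both on the second agent, which matches the three-key pipeline described above.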

Compare that to writing the same intent in chat - “hey, take these files and turn them into a deck, then convert it to PDF.” A sentence of typing, every time, with no muscle memory carried over from the last time. In Commando, the same intent is a selection and three keys, and the third time you do it your fingers already know the sequence.

The visual side of the dock is for learning. The keyboard side is for working. Once you know what each agent does and what each color means, you stop looking at the dock at all. You’re just playing it.

Where it fits

The harder problem was working out what Commando actually is, in a landscape where AI agents already live in plenty of places.

The problem wasn’t any one of those places in isolation. It was the same one from earlier - several agents running across different surfaces, no place to see them together or trace what came from where. The agents were parallel. My view of them was not.

Commando, as I started to see it, isn’t another surface to add to the pile. It’s a place for the parts of the work that don’t have a natural home - orchestration, the cross-tool view, the sense of what’s actually in flight. The other tools keep doing what they do. Commando is the layer above them that lets you see the whole picture at once.

I’m not sure yet how clean that boundary stays in practice. But that’s where I landed for now.

What I take from this

This was the first time I’d designed a UX pattern as the deliverable, not as a means to something else. I started by drawing screens, and at some point realized the screens weren’t the design. The rules underneath them were - which pieces exist, how they combine, what each visual choice tells you. The screens were just where you saw the result. That shift surprised me, and I expect it’ll change how I approach harder interaction problems from here.

The metaphor itself was another surprise. Half-borrowing RTS visuals would have been costume. What made it actually work was committing to the mechanics - hotkeys doing real selection, status living on the object, results appearing physically near their sources. Going in, I wouldn’t have predicted how much depended on that.

What surprised me most, in the end, was how much of the design work turned out to be positioning. The version of Commando that goes anywhere isn’t “a new way to interact with AI.” That’s too broad and fights with too much. It’s the layer above other tools for people who already use several AI agents and don’t have a way to manage what comes back. Working out that framing took as long as designing the pattern.

What I’m still working out is whether Commando is a real tool, or just a prototype that feels good in controlled conditions. Right now it works for me - the gestures feel right, the rhythm clicks, the canvas earns its space. But that’s prototype-level evidence - my workflow, my data, no real demands on it yet.

Real tools have to survive things prototypes don’t - friction that shows up over weeks of use, edge cases I can’t anticipate, sustained work rather than demo sessions. Whether Commando holds up, I genuinely don’t know yet.

Maybe one day it does grow into something real, and then we’ll know. For now - thanks for reading this far.