From User Interface to User Intent
Something fundamental is changing in how humans interact with software.
For decades, we designed interfaces—buttons, menus, forms—and trained users to navigate them. Success meant making interfaces "intuitive." But what if the interface disappeared entirely? What if users just stated what they wanted, and software figured out how to do it?
This isn't speculation. It's happening now. And it changes everything about how we build software.
The Old Paradigm: User Interface
Traditional software interaction follows this model:
User → Interface → Action
- Users navigate through explicit interfaces (menus, buttons, forms)
- Every action requires manual initiation (click, type, submit)
- Workflows are prescribed (step 1 → step 2 → step 3)
- Users must know WHERE to go and WHAT to click
- The interface is the bottleneck between intent and execution
Example: Booking a Hotel (Traditional UX)
Let's walk through what this looks like in practice:
- Open travel website
- Click "Hotels" in navigation menu
- Enter destination city in search box
- Select check-in date from calendar picker
- Select check-out date from calendar picker
- Click "Search" button
- Review list of 50+ hotels
- Click on preferred hotel
- Select room type from dropdown
- Click "Book Now"
- Fill out guest information form (8 fields)
- Fill out payment form (16 fields)
- Click "Confirm Booking"
- Wait for email confirmation
Total: 14 manual steps, each requiring the user to know exactly what to do next.
The design challenge: Make these 14 steps feel smooth. Reduce friction. Optimize button placement. Minimize form fields. A/B test checkout flow.
This is "User Interface thinking": The user must navigate the interface the developers designed.
The New Paradigm: User Intent
Now consider a fundamentally different model:
User Intent → Agent → Orchestrated Actions
- Users state intent conversationally ("I need a hotel in Chicago Tuesday night")
- AI agents act autonomously (search, compare, book, confirm)
- Workflows are adaptive (agent remembers preferences, anticipates needs)
- Users describe WHAT they want; agents figure out HOW
- Conversation replaces navigation
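To make the new model concrete, here is a minimal sketch of an intent-driven loop in Python. Every name in it (`parse_intent`, `plan_steps`, the tool registry, the sample preferences) is a hypothetical placeholder, not a real framework; the point is the shape: intent in, plan, orchestrated tool calls, memory updated.

```python
# Minimal sketch of "User Intent -> Agent -> Orchestrated Actions".
# parse_intent, plan_steps, and the tools are illustrative stubs, not a real API.
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    args: dict

def parse_intent(utterance: str) -> dict:
    # A real agent would use an LLM here; this stub just tags a hotel request.
    return {"goal": "book_hotel", "city": "Chicago", "night": "Tuesday"}

def plan_steps(intent: dict, memory: dict) -> list[Step]:
    # The agent decides HOW: search first, then book, using remembered preferences.
    prefs = memory.get("preferences", {})
    return [
        Step("search_hotels", {"city": intent["city"], **prefs}),
        Step("book_room", {"night": intent["night"]}),
    ]

TOOLS = {
    "search_hotels": lambda **kw: f"searched hotels with {kw}",
    "book_room": lambda **kw: f"booked a room for {kw['night']}",
}

def handle_request(utterance: str, memory: dict) -> list[str]:
    intent = parse_intent(utterance)                    # WHAT the user wants
    steps = plan_steps(intent, memory)                  # HOW the agent will do it
    results = [TOOLS[s.tool](**s.args) for s in steps]  # autonomous execution
    memory["last_intent"] = intent                      # context for next time
    return results

memory = {"preferences": {"room": "king, non-smoking, quiet floor"}}
print(handle_request("I need a hotel in Chicago next Tuesday night", memory))
```

Notice what is missing: there is no button, form, or prescribed flow; the user's sentence is the entire interface.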
Example: Booking a Hotel (Agentic UX)
The same goal, achieved differently:
User: "I need a hotel in Chicago next Tuesday night for a client meeting downtown."
Agent: "Found 3 options near downtown. Based on your preferences, I recommend the Hilton Garden Inn—quiet floor available, $189/night, free breakfast. Your usual king bed non-smoking room?"
User: "Yes, book it."
Agent: "Done. Confirmation sent to your email. Added to calendar. Uber scheduled for Tuesday 8am to O'Hare. Need anything else?"
Total: two conversational exchanges (four short messages) replacing 14 manual steps.
What the agent did autonomously:
- ✅ Remembered user preferences (quiet rooms, king bed, non-smoking)
- ✅ Inferred need for transportation (scheduled Uber without being asked)
- ✅ Integrated with calendar automatically
- ✅ Understood context (client meeting = business district location)
This is "User Intent thinking": The user expresses goals; the agent orchestrates execution.
What Makes This Possible? The Five Powers
Agentic AI can do this because it possesses five fundamental capabilities that, when combined, enable autonomous orchestration:
1. 👁️ See — Visual Understanding
What it means:
- Process images, screenshots, documents, videos
- Extract meaning from visual context
- Navigate interfaces by "seeing" them
- Understand diagrams and visual data
Example:
- Claude Code reading error screenshots to debug issues
- AI extracting data from invoices and receipts
- Agents clicking buttons by visually locating them on screen
2. 👂 Hear — Audio Processing
What it means:
- Understand spoken requests (voice interfaces)
- Transcribe and analyze conversations
- Detect sentiment and tone
- Process audio in real-time
Example:
- Voice assistants understanding natural speech
- Meeting transcription and summarization
- Customer service AI detecting frustration in tone
3. 🧠 Reason — Complex Decision-Making
What it means:
- Analyze tradeoffs and constraints
- Make context-aware decisions
- Chain multi-step reasoning (if X, then Y, then Z)
- Learn from outcomes
Example:
- Agent choosing optimal hotel based on price, location, and preferences (sketched below)
- AI debugging code by reasoning through error causes
- Financial agents evaluating investment opportunities
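The first example in that list, choosing an optimal hotel from competing factors, can be reduced to a simple scoring function. The weights, fields, and sample hotels below are invented for illustration; a real agent would reason over far richer context than three numbers.

```python
# Illustrative only: the weights, fields, and sample data are invented.
hotels = [
    {"name": "Hilton Garden Inn", "price": 189, "miles_from_meeting": 0.4, "quiet_floor": True},
    {"name": "Budget Stay",       "price": 99,  "miles_from_meeting": 6.0, "quiet_floor": False},
    {"name": "Luxury Tower",      "price": 420, "miles_from_meeting": 0.2, "quiet_floor": True},
]

def score(hotel: dict, prefers_quiet: bool) -> float:
    # Lower price and shorter distance are better; a quiet floor is a bonus
    # when the user's remembered preferences ask for one.
    s = -hotel["price"] / 100 - hotel["miles_from_meeting"] * 2
    if prefers_quiet and hotel["quiet_floor"]:
        s += 3
    return s

best = max(hotels, key=lambda h: score(h, prefers_quiet=True))
print(best["name"])   # -> Hilton Garden Inn
```

The design point: "reasoning" here means weighing tradeoffs against remembered preferences, not following a fixed branch of if-statements chosen by a UI.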
4. ⚡ Act — Execute and Orchestrate
What it means:
- Call APIs and use tools autonomously
- Perform actions across multiple systems
- Coordinate complex workflows
- Retry and adapt when things fail (sketched below)
Example:
- Claude Code writing files, running tests, committing to Git
- Travel agents booking flights and hotels
- E-commerce agents processing orders and tracking shipments
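"Retry and adapt when things fail" is what separates acting from merely calling an API once. Below is one minimal retry-with-fallback pattern; the booking functions are fake stubs and the backoff numbers are arbitrary, so treat it as a sketch of the idea rather than production error handling.

```python
# Sketch of "retry and adapt": try a tool, back off, then switch strategy.
# The booking functions are fake stubs used only to illustrate the pattern.
import time

def book_primary(hotel: str) -> str:
    raise TimeoutError("booking API timed out")        # simulate a flaky tool

def book_fallback(hotel: str) -> str:
    return f"booked {hotel} via fallback channel"

def act_with_retry(action, fallback, arg: str, attempts: int = 3) -> str:
    for attempt in range(1, attempts + 1):
        try:
            return action(arg)                          # normal path
        except Exception:
            time.sleep(0.1 * attempt)                   # brief backoff before retrying
    return fallback(arg)                                # adapt: change strategy

print(act_with_retry(book_primary, book_fallback, "Hilton Garden Inn"))
```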
5. 💾 Remember — Maintain Context and Learn
What it means:
- Store user preferences and history (sketched below)
- Recall previous interactions
- Build domain knowledge over time
- Adapt behavior based on feedback
Example:
- Agent remembering you prefer quiet hotel rooms
- AI assistants referencing previous conversations
- Personal AI learning your communication style
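A first approximation of "remember" can be as small as a preference store the agent reads before planning and writes after each interaction. The JSON file and field names below are assumptions made for illustration; real agents typically use dedicated memory services or vector stores, but the read-before, write-after pattern is the same.

```python
# Minimal sketch of agent memory backed by a JSON file (path and fields are illustrative).
import json
from pathlib import Path

MEMORY_PATH = Path("user_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"preferences": {}, "history": []}

def remember(memory: dict, key: str, value: str) -> None:
    memory["preferences"][key] = value                # e.g. "room" -> "quiet floor, king bed"
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

memory = load_memory()
remember(memory, "room", "quiet floor, king bed, non-smoking")
print(memory["preferences"])                          # recalled on the next booking
```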
How the Five Powers Combine
Individually, each power is useful but limited.
Combined, they create something transformational: autonomous orchestration.
Hotel booking example breakdown:
- Hear: User speaks request ("Find me a hotel in Chicago")
- Reason: Analyzes requirements (location, timing, context)
- Remember: Recalls user prefers quiet rooms, king beds, downtown proximity
- Act: Searches hotels, compares options, filters by criteria
- See: Reads hotel websites, reviews, location maps
- Reason: Evaluates best option considering all factors
- Act: Books room, schedules transportation, updates calendar
- Remember: Stores this interaction to improve future bookings
The result: A multi-step workflow orchestrated autonomously, adapting to context and user needs.
The Key Insight
As Sandeep Alur from Microsoft states:
"We're moving from large language models to large action models where AI doesn't just respond, it acts, orchestrates, and remembers."
The key word is orchestrate—agents coordinate complex workflows of many actions, like a conductor leading a digital orchestra.
🎓 Expert Insight
Understanding the Five Powers isn't just academic—it transforms how you write specifications. When you know AI can "see" diagrams, you'll include visual specs. When you know it "remembers" context, you'll build on previous conversations. The Five Powers aren't constraints; they're capabilities you orchestrate through clear intent.
What This Means for Building AI-Driven Applications
The design challenge shifts from "How do we make this interface intuitive?" to "How do we make this agent understand intent accurately?"
The Skill Shift
What mattered in the Interface era:
- UI/UX design (visual hierarchy, information architecture)
- Frontend frameworks (React, Vue, Angular)
- Form validation and input handling
- CSS and responsive design
- Click-through testing
What matters in the Intent era:
- Intent modeling (understanding user goals from natural language)
- Context management (memory, personalization, preferences)
- Agent orchestration (coordinating multi-step workflows)
- Specification writing (clear, testable intent descriptions)
- Evaluation design (how do you test "understanding"?)
- Behavioral testing (does agent respond appropriately to variations?)
The skill that matters most: Clear specification writing.
But the nature of specs changes:
- Before: "When user clicks button X, do Y"
- Now: "When user expresses intent Z (in any phrasing), agent understands and acts appropriately"
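One way to make an intent-era specification concrete and testable is a behavioral test: many phrasings, one expected intent. The `parse_intent` stub below is a hypothetical stand-in for whatever model or agent you actually call; the shape of the spec, not the implementation, is the point.

```python
# Sketch of a behavioral spec: many phrasings, one expected intent.
# parse_intent is a toy stand-in so the test runs; a real system would call an LLM.
import re

def parse_intent(utterance: str) -> str:
    return "book_hotel" if re.search(r"hotel|room|stay", utterance, re.I) else "unknown"

CASES = [
    ("I need a hotel in Chicago Tuesday night", "book_hotel"),
    ("Find me somewhere to stay downtown",       "book_hotel"),
    ("Book a room near the convention center",   "book_hotel"),
    ("What's the weather in Chicago?",           "unknown"),
]

def test_intent_understanding():
    for utterance, expected in CASES:
        assert parse_intent(utterance) == expected, utterance

test_intent_understanding()
print("all phrasings mapped to the expected intent")
```

Notice that the spec lists variations of WHAT the user might say rather than WHICH button they click; growing this case list over time is how "understanding" gets evaluated.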
Brief Context: The Evolution to Agentic AI
Understanding where we are helps explain why this shift is happening now.
AI evolved through roughly three phases:
Phase 1: Predictive AI
- What it did: Analyzed historical data to forecast outcomes
- Limitation: Could only predict, not create or act
Phase 2: Generative AI
- What it does: Creates new content from patterns
- Limitation: Generates when prompted, but doesn't take action
Phase 3: Agentic AI
- What it does: Takes autonomous action to achieve goals
- Breakthrough: AI shifts from tool to teammate—from responding to orchestrating
The key difference: Earlier AI waited for commands. Agentic AI initiates, coordinates, and completes workflows autonomously.
This evolution unlocked the Five Powers working together, making the UX→Intent paradigm shift possible.
💬 AI Colearning Prompt
Reimagine a workflow you do regularly: Think of something you do often—expense reporting, scheduling meetings, planning projects. Describe it to your AI partner and ask: "How would this work as an agentic experience? What would I say? What would the agent need to understand and remember? Where might it misunderstand my intent?" Let your AI help you discover the difference between automation (scripted steps) and agency (understanding intent).
🤝 Practice Exercise
Ask your AI: Pick any task you do regularly (booking travel, managing emails, tracking expenses). Describe your current manual process to your AI partner, then work together to design an agentic version. Ask the AI: "What would make this truly intent-driven rather than just automated?" Let the AI challenge your thinking—this is co-learning in action.
Try With AI
Use your AI companion (ChatGPT web, Claude Code, Gemini CLI) to explore these concepts:
Prompt 1: Reimagine a Workflow as Agentic
I want to reimagine a manual workflow as agentic. Here's what I currently do [describe
a multi-step task you do regularly, like expense reporting, email management, project
planning, etc.].
Help me reimagine this as an agentic experience:
1. What would I say to an agent to express my intent?
2. What would the agent need to understand?
3. What actions would it take autonomously?
4. What would the agent need to remember for next time?
5. Where might it fail or misunderstand?
Let's discover together: What makes this agentic vs. just automated?
What you're learning: Intent modeling—thinking in goals and context rather than steps and clicks.
Prompt 2: Identify the Five Powers in Action
Let's analyze a real agentic system (like Claude Code, travel booking agents, or customer
service AI). For the system we choose, help me identify concrete examples of each power:
1. SEE: How does it process visual information?
2. HEAR: How does it understand natural language input?
3. REASON: What decisions does it make autonomously?
4. ACT: What actions can it take across systems?
5. REMEMBER: What context does it maintain?
Then let's discover: How do these five powers COMBINE to enable orchestration? What would
break if one power was missing?
What you're learning: System analysis—understanding how capabilities combine to create emergent behavior.
Prompt 3: Design Intent Disambiguation
Pick a simple user intent like "I want to travel to New York." This is ambiguous—it could
mean many things. Work with your AI to design disambiguation:
1. What questions would an agent need to ask to clarify intent?
2. What context would help avoid asking obvious questions?
3. What reasonable assumptions could the agent make?
4. How would you specify "understanding" for this intent?
Let's discover: What makes intent specification different from interface specification?
What you're learning: Specification thinking for agentic systems—clarity in the face of ambiguity.