From User Interface to User Intent
Something fundamental is changing in how humans interact with software.
For decades, we designed interfaces—buttons, menus, forms—and trained users to navigate them. Success meant making interfaces "intuitive." But what if the interface disappeared entirely? What if users just stated what they wanted, and software figured out how to do it?
This isn't speculation. It's happening now. And it changes everything about how we build software.
The Old Paradigm: User Interface
Traditional software interaction follows this model:
User → Interface → Action
- Users navigate through explicit interfaces (menus, buttons, forms)
- Every action requires manual initiation (click, type, submit)
- Workflows are prescribed (step 1 → step 2 → step 3)
- Users must know WHERE to go and WHAT to click
- The interface is the bottleneck between intent and execution
Example: Booking a Hotel (Traditional UX)
Let's walk through what this looks like in practice:
- Open travel website
- Click "Hotels" in navigation menu
- Enter destination city in search box
- Select check-in date from calendar picker
- Select check-out date from calendar picker
- Click "Search" button
- Review list of 50+ hotels
- Click on preferred hotel
- Select room type from dropdown
- Click "Book Now"
- Fill out guest information form (8 fields)
- Fill out payment form (16 fields)
- Click "Confirm Booking"
- Wait for email confirmation
Total: 14 manual steps, each requiring the user to know exactly what to do next.
The design challenge: Make these 14 steps feel smooth. Reduce friction. Optimize button placement. Minimize form fields. A/B test checkout flow.
This is "User Interface thinking": The user must navigate the interface the developers designed.
The New Paradigm: User Intent
Now consider a fundamentally different model:
User Intent → Agent → Orchestrated Actions
- Users state intent conversationally ("I need a hotel in Chicago Tuesday night")
- AI agents act autonomously (search, compare, book, confirm)
- Workflows are adaptive (agent remembers preferences, anticipates needs)
- Users describe WHAT they want; agents figure out HOW
- Conversation replaces navigation
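To make the new model concrete, here is a minimal sketch of an intent-driven loop in Python. Every name in it (`parse_intent`, `plan_steps`, the tool registry, the sample preferences) is a hypothetical placeholder, not a real framework; the point is the shape: intent in, plan, orchestrated tool calls, memory updated.

```python
# Minimal sketch of "User Intent -> Agent -> Orchestrated Actions".
# parse_intent, plan_steps, and the tools are illustrative stubs, not a real API.
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    args: dict

def parse_intent(utterance: str) -> dict:
    # A real agent would use an LLM here; this stub just tags a hotel request.
    return {"goal": "book_hotel", "city": "Chicago", "night": "Tuesday"}

def plan_steps(intent: dict, memory: dict) -> list[Step]:
    # The agent decides HOW: search first, then book, using remembered preferences.
    prefs = memory.get("preferences", {})
    return [
        Step("search_hotels", {"city": intent["city"], **prefs}),
        Step("book_room", {"night": intent["night"]}),
    ]

TOOLS = {
    "search_hotels": lambda **kw: f"searched hotels with {kw}",
    "book_room": lambda **kw: f"booked a room for {kw['night']}",
}

def handle_request(utterance: str, memory: dict) -> list[str]:
    intent = parse_intent(utterance)                    # WHAT the user wants
    steps = plan_steps(intent, memory)                  # HOW the agent will do it
    results = [TOOLS[s.tool](**s.args) for s in steps]  # autonomous execution
    memory["last_intent"] = intent                      # context for next time
    return results

memory = {"preferences": {"room": "king, non-smoking, quiet floor"}}
print(handle_request("I need a hotel in Chicago next Tuesday night", memory))
```

Notice what is missing: there is no button, form, or prescribed flow; the user's sentence is the entire interface.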
Example: Booking a Hotel (Agentic UX)
The same goal, achieved differently:
User: "I need a hotel in Chicago next Tuesday night for a client meeting downtown."
Agent: "Found 3 options near downtown. Based on your preferences, I recommend the Hilton Garden Inn—quiet floor available, $189/night, free breakfast. Your usual king bed non-smoking room?"
User: "Yes, book it."
Agent: "Done. Confirmation sent to your email. Added to calendar. Uber scheduled for Tuesday 8am to O'Hare. Need anything else?"
Total: two conversational exchanges (four short messages) replacing 14 manual steps.
What the agent did autonomously:
- ✅ Remembered user preferences (quiet rooms, king bed, non-smoking)
- ✅ Inferred need for transportation (scheduled Uber without being asked)
- ✅ Integrated with calendar automatically
- ✅ Understood context (client meeting = business district location)
This is "User Intent thinking": The user expresses goals; the agent orchestrates execution.
What Makes This Possible? The Five Powers
Agentic AI can do this because it possesses five fundamental capabilities that, when combined, enable autonomous orchestration:
1. 👁️ See — Visual Understanding
What it means:
- Process images, screenshots, documents, videos
- Extract meaning from visual context
- Navigate interfaces by "seeing" them
- Understand diagrams and visual data
Example:
- Claude Code reading error screenshots to debug issues
- AI extracting data from invoices and receipts
- Agents clicking buttons by visually locating them on screen
2. 👂 Hear — Audio Processing
What it means:
- Understand spoken requests (voice interfaces)
- Transcribe and analyze conversations
- Detect sentiment and tone
- Process audio in real-time
Example:
- Voice assistants understanding natural speech
- Meeting transcription and summarization
- Customer service AI detecting frustration in tone
3. 🧠 Reason — Complex Decision-Making
What it means:
- Analyze tradeoffs and constraints
- Make context-aware decisions
- Chain multi-step reasoning (if X, then Y, then Z)
- Learn from outcomes
Example:
- Agent choosing optimal hotel based on price, location, and preferences (sketched below)
- AI debugging code by reasoning through error causes
- Financial agents evaluating investment opportunities
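The first example in that list, choosing an optimal hotel from competing factors, can be reduced to a simple scoring function. The weights, fields, and sample hotels below are invented for illustration; a real agent would reason over far richer context than three numbers.

```python
# Illustrative only: the weights, fields, and sample data are invented.
hotels = [
    {"name": "Hilton Garden Inn", "price": 189, "miles_from_meeting": 0.4, "quiet_floor": True},
    {"name": "Budget Stay",       "price": 99,  "miles_from_meeting": 6.0, "quiet_floor": False},
    {"name": "Luxury Tower",      "price": 420, "miles_from_meeting": 0.2, "quiet_floor": True},
]

def score(hotel: dict, prefers_quiet: bool) -> float:
    # Lower price and shorter distance are better; a quiet floor is a bonus
    # when the user's remembered preferences ask for one.
    s = -hotel["price"] / 100 - hotel["miles_from_meeting"] * 2
    if prefers_quiet and hotel["quiet_floor"]:
        s += 3
    return s

best = max(hotels, key=lambda h: score(h, prefers_quiet=True))
print(best["name"])   # -> Hilton Garden Inn
```

The design point: "reasoning" here means weighing tradeoffs against remembered preferences, not following a fixed branch of if-statements chosen by a UI.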
4. ⚡ Act — Execute and Orchestrate
What it means:
- Call APIs and use tools autonomously
- Perform actions across multiple systems
- Coordinate complex workflows
- Retry and adapt when things fail (sketched below)
Example:
- Claude Code writing files, running tests, committing to Git
- Travel agents booking flights and hotels
- E-commerce agents processing orders and tracking shipments
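"Retry and adapt when things fail" is what separates acting from merely calling an API once. Below is one minimal retry-with-fallback pattern; the booking functions are fake stubs and the backoff numbers are arbitrary, so treat it as a sketch of the idea rather than production error handling.

```python
# Sketch of "retry and adapt": try a tool, back off, then switch strategy.
# The booking functions are fake stubs used only to illustrate the pattern.
import time

def book_primary(hotel: str) -> str:
    raise TimeoutError("booking API timed out")        # simulate a flaky tool

def book_fallback(hotel: str) -> str:
    return f"booked {hotel} via fallback channel"

def act_with_retry(action, fallback, arg: str, attempts: int = 3) -> str:
    for attempt in range(1, attempts + 1):
        try:
            return action(arg)                          # normal path
        except Exception:
            time.sleep(0.1 * attempt)                   # brief backoff before retrying
    return fallback(arg)                                # adapt: change strategy

print(act_with_retry(book_primary, book_fallback, "Hilton Garden Inn"))
```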
5. 💾 Remember — Maintain Context and Learn
What it means:
- Store user preferences and history (sketched below)
- Recall previous interactions
- Build domain knowledge over time
- Adapt behavior based on feedback
Example:
- Agent remembering you prefer quiet hotel rooms
- AI assistants referencing previous conversations
- Personal AI learning your communication style
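A first approximation of "remember" can be as small as a preference store the agent reads before planning and writes after each interaction. The JSON file and field names below are assumptions made for illustration; real agents typically use dedicated memory services or vector stores, but the read-before, write-after pattern is the same.

```python
# Minimal sketch of agent memory backed by a JSON file (path and fields are illustrative).
import json
from pathlib import Path

MEMORY_PATH = Path("user_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {"preferences": {}, "history": []}

def remember(memory: dict, key: str, value: str) -> None:
    memory["preferences"][key] = value                # e.g. "room" -> "quiet floor, king bed"
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

memory = load_memory()
remember(memory, "room", "quiet floor, king bed, non-smoking")
print(memory["preferences"])                          # recalled on the next booking
```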
How the Five Powers Combine
Individually, each power is useful but limited.
Combined, they create something transformational: autonomous orchestration.
Hotel booking example breakdown:
- Hear: User speaks request ("Find me a hotel in Chicago")
- Reason: Analyzes requirements (location, timing, context)
- Remember: Recalls user prefers quiet rooms, king beds, downtown proximity
- Act: Searches hotels, compares options, filters by criteria
- See: Reads hotel websites, reviews, location maps
- Reason: Evaluates best option considering all factors
- Act: Books room, schedules transportation, updates calendar
- Remember: Stores this interaction to improve future bookings
The result: A multi-step workflow orchestrated autonomously, adapting to context and user needs.
The Key Insight
As Sandeep Alur from Microsoft states:
"We're moving from large language models to large action models where AI doesn't just respond, it acts, orchestrates, and remembers."
The key word is orchestrate—agents coordinate complex workflows of many actions, like a conductor leading a digital orchestra.
🎓 Expert Insight
Understanding the Five Powers isn't just academic—it transforms how you write specifications. When you know AI can "see" diagrams, you'll include visual specs. When you know it "remembers" context, you'll build on previous conversations. The Five Powers aren't constraints; they're capabilities you orchestrate through clear intent.
What This Means for Building AI-Driven Applications
The design challenge shifts from "How do we make this interface intuitive?" to "How do we make this agent understand intent accurately?"
The Skill Shift
What mattered in the Interface era:
- UI/UX design (visual hierarchy, information architecture)
- Frontend frameworks (React, Vue, Angular)
- Form validation and input handling
- CSS and responsive design
- Click-through testing
What matters in the Intent era:
- Intent modeling (understanding user goals from natural language)
- Context management (memory, personalization, preferences)
- Agent orchestration (coordinating multi-step workflows)
- Specification writing (clear, testable intent descriptions)
- Evaluation design (how do you test "understanding"?)
- Behavioral testing (does agent respond appropriately to variations?)
The skill that matters most: Clear specification writing.
But the nature of specs changes:
- Before: "When user clicks button X, do Y"
- Now: "When user expresses intent Z (in any phrasing), agent understands and acts appropriately"
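One way to make an intent-era specification concrete and testable is a behavioral test: many phrasings, one expected intent. The `parse_intent` stub below is a hypothetical stand-in for whatever model or agent you actually call; the shape of the spec, not the implementation, is the point.

```python
# Sketch of a behavioral spec: many phrasings, one expected intent.
# parse_intent is a toy stand-in so the test runs; a real system would call an LLM.
import re

def parse_intent(utterance: str) -> str:
    return "book_hotel" if re.search(r"hotel|room|stay", utterance, re.I) else "unknown"

CASES = [
    ("I need a hotel in Chicago Tuesday night", "book_hotel"),
    ("Find me somewhere to stay downtown",       "book_hotel"),
    ("Book a room near the convention center",   "book_hotel"),
    ("What's the weather in Chicago?",           "unknown"),
]

def test_intent_understanding():
    for utterance, expected in CASES:
        assert parse_intent(utterance) == expected, utterance

test_intent_understanding()
print("all phrasings mapped to the expected intent")
```

Notice that the spec lists variations of WHAT the user might say rather than WHICH button they click; growing this case list over time is how "understanding" gets evaluated.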
Brief Context: The Evolution to Agentic AI
Understanding where we are helps explain why this shift is happening now.
AI evolved through roughly three phases:
Phase 1: Predictive AI
- What it did: Analyzed historical data to forecast outcomes
- Limitation: Could only predict, not create or act
Phase 2: Generative AI
- What it does: Creates new content from patterns
- Limitation: Generates when prompted, but doesn't take action
Phase 3: Agentic AI
- What it does: Takes autonomous action to achieve goals
- Breakthrough: AI shifts from tool to teammate—from responding to orchestrating
The key difference: Earlier AI waited for commands. Agentic AI initiates, coordinates, and completes workflows autonomously.
This evolution unlocked the Five Powers working together, making the UX→Intent paradigm shift possible.
💬 AI Colearning Prompt
Reimagine a workflow you do regularly: Think of something you do often—expense reporting, scheduling meetings, planning projects. Describe it to your AI partner and ask: "How would this work as an agentic experience? What would I say? What would the agent need to understand and remember? Where might it misunderstand my intent?" Let your AI help you discover the difference between automation (scripted steps) and agency (understanding intent).
🤝 Practice Exercise
Ask your AI: Pick any task you do regularly (booking travel, managing emails, tracking expenses). Describe your current manual process to your AI partner, then work together to design an agentic version. Ask the AI: "What would make this truly intent-driven rather than just automated?" Let the AI challenge your thinking—this is co-learning in action.
Try With AI
Use your AI companion (ChatGPT web, Claude Code, Gemini CLI) to explore these concepts:
Prompt 1: Reimagine a Workflow as Agentic
I want to reimagine a manual workflow as agentic. Here's what I currently do [describe
a multi-step task you do regularly, like expense reporting, email management, project
planning, etc.].
Help me reimagine this as an agentic experience:
1. What would I say to an agent to express my intent?
2. What would the agent need to understand?
3. What actions would it take autonomously?
4. What would the agent need to remember for next time?
5. Where might it fail or misunderstand?
Let's discover together: What makes this agentic vs. just automated?
What you're learning: Intent modeling—thinking in goals and context rather than steps and clicks.
Prompt 2: Identify the Five Powers in Action
Let's analyze a real agentic system (like Claude Code, travel booking agents, or customer
service AI). For the system we choose, help me identify concrete examples of each power:
1. SEE: How does it process visual information?
2. HEAR: How does it understand natural language input?
3. REASON: What decisions does it make autonomously?
4. ACT: What actions can it take across systems?
5. REMEMBER: What context does it maintain?
Then let's discover: How do these five powers COMBINE to enable orchestration? What would
break if one power was missing?
What you're learning: System analysis—understanding how capabilities combine to create emergent behavior.
Prompt 3: Design Intent Disambiguation
Pick a simple user intent like "I want to travel to New York." This is ambiguous—it could
mean many things. Work with your AI to design disambiguation:
1. What questions would an agent need to ask to clarify intent?
2. What context would help avoid asking obvious questions?
3. What reasonable assumptions could the agent make?
4. How would you specify "understanding" for this intent?
Let's discover: What makes intent specification different from interface specification?
What you're learning: Specification thinking for agentic systems—clarity in the face of ambiguity.