
Agent-User Interaction Protocol: When the frontend got an AI protocol

Written by Roi Dannon | Jan 7, 2026 10:46:45 PM

An emerging AI agent protocol.

At Via, we are continually exploring innovative ways to enhance our operational efficiency and support systems. Recently, a team of AI enthusiasts, comprising both frontend and backend engineers, demonstrated the power of the Agent-User Interaction Protocol (AG-UI) in a successful internal experiment. We built a specialized chatbot assistant for our back-office rider management system, designing it to leverage the protocol's core capabilities. By utilizing features like Front-end Tool Calls for rapid web app navigation and Streaming Chat for real-time data delivery, we enabled our operations agents to perform more complex tasks and resolve service requests more quickly and effectively.

As part of this team, I found the protocol and its systematic approach to AI-user interaction fascinating. I wanted to share my journey in studying the core abilities this protocol gives us, especially when developing modern, web-based chat products.

What is the AG-UI protocol?

The AG-UI protocol acts as the crucial connection layer between an LLM-based AI agent and the user’s interface (like a web browser).

Developed by the creators of CopilotKit, it formalizes the emerging behaviors observed in how AI agents interact with users. Understanding AG-UI is essential for productizing AI features, especially for web-based chat experiences, where the agent interacts with the user directly on a web page.

The protocol places the AI agent at the center of three key interaction types:

  • Agent-to-Tools (MCP Protocol): A standard for connecting AI applications to external systems where the agent performs actions with external services or internal back-end systems.
  • Agent-to-Other-Agents (A2A): A standard designed to enable seamless communication and collaboration between AI agents. 
  • Agent-to-User Interface (AG-UI): How the agent delivers information and receives input from the end-user.

The general AI architecture diagram is the following:

[Diagram: AG-UI protocol architecture showing the agent, tools, user interface, and communication channels. The user's browser communicates with the agent through the AG-UI protocol, while the AI agent talks to other AI agents through A2A and activates tools and APIs through the MCP protocol.]

Underlying protocols: WebSocket vs SSE.

While the AG-UI protocol defines the event structure for AI-user interaction, the actual transport layer for features like streaming chat often relies on foundational web technologies like WebSockets or Server-Sent Events (SSE). The core difference lies in directionality: WebSockets provide a bidirectional, full-duplex channel, allowing the AI agent and the user interface to send data to each other simultaneously over a single connection. This is necessary for real-time applications requiring two-way command and control. 

In contrast, Server-Sent Events (SSE) establish a unidirectional connection, designed only for the server (the agent) to push continuous data streams to the client (the user interface). SSE is generally simpler to implement when the client’s role is purely to consume the server’s stream of updates, such as the initial text generation from an LLM.
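To make the difference concrete, here is a minimal sketch of both transports using the standard browser APIs. The endpoint URLs are hypothetical, and AG-UI itself does not mandate either transport:

```typescript
// Server-Sent Events: a one-way stream the browser only consumes.
// The endpoint URLs here are hypothetical.
const sse = new EventSource("/agent/stream");
sse.onmessage = (event: MessageEvent) => {
  console.log("agent pushed:", event.data); // server -> client only
};

// WebSocket: a single full-duplex channel; either side can send at any time.
const ws = new WebSocket("wss://example.com/agent");
ws.onopen = () => {
  ws.send(JSON.stringify({ type: "USER_INPUT", text: "Hi" })); // client -> server
};
ws.onmessage = (event: MessageEvent) => {
  console.log("agent replied:", event.data); // server -> client
};
```

For a chat product whose client mostly consumes a stream of agent events, the simpler SSE setup is often enough; user input can still travel to the server over ordinary HTTP requests.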

[Diagram: Bidirectional WebSocket communication compared to unidirectional Server-Sent Events (SSE) within the AG-UI protocol, highlighting two-way vs. one-way streaming.]

An example: AG-UI as applied to the Via Operator Console.

To truly understand the capabilities of AG-UI, it helps to see it in action. In this section, we will walk through the protocol’s features by exploring an experimental implementation within the Via Operator Console (VOC).

The VOC is a central hub from which operators — whether in transit agencies, businesses, or schools — manage their live microtransit and paratransit operations, helping riders connect with rides. By examining how an AI agent integrates into this complex environment, we can see exactly how AG-UI structures real-time communication, navigation, and safety.

Streaming Chat.

The most immediately visible feature of the AG-UI Protocol is Streaming Chat. In a high-paced environment like the VOC, operators need immediate feedback.

Imagine a scenario where a human operator requests that a ride be booked on behalf of a rider. Instead of waiting for the agent to process the entire request in silence, the agent uses Streaming Chat to deliver the response in real time, word-by-word. This immediate feedback confirms that the system is complying with the request and allows the operator to verify booking details as they are generated.

[Figure: The events emitted by the server in a simple chat-stream interaction.]

This is managed through a sequence of events that the agent server sends to the user's browser (a code sketch of handling them follows the list):

  • Lifecycle Event, RUN_STARTED: The agent has received the user’s input and begins processing.
  • Text Message Event, TEXT_MESSAGE_START: An indication that the LLM has finished its initial thinking and is ready to send the response.
  • Text Message Event, TEXT_MESSAGE_CONTENT: The actual text streams in, often broken down word-by-word. The front-end developer decides how to buffer and display this (e.g., word-by-word or in chunks like paragraphs).
  • Text Message Event, TEXT_MESSAGE_END: The full text message is complete.
  • Lifecycle Event, RUN_FINISHED: The agent has finished its turn, and the system is ready for the user’s next request.
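To illustrate how a frontend might consume this sequence, here is a minimal sketch that buffers the streamed text into a single chat message. The event names match the list above; payload fields such as `delta` and `messageId` are my assumptions about the event shapes, not a reference:

```typescript
// Buffering AG-UI text events into a single chat message. Payload fields
// such as `delta` and `messageId` are assumptions about the event shapes.
type TextEvent =
  | { type: "RUN_STARTED" }
  | { type: "TEXT_MESSAGE_START"; messageId: string }
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TEXT_MESSAGE_END"; messageId: string }
  | { type: "RUN_FINISHED" };

let buffer = "";

function onTextEvent(event: TextEvent, render: (text: string) => void): void {
  switch (event.type) {
    case "TEXT_MESSAGE_START":
      buffer = ""; // a new assistant message begins
      break;
    case "TEXT_MESSAGE_CONTENT":
      buffer += event.delta; // append the streamed chunk
      render(buffer);        // re-render word-by-word, or throttle into chunks
      break;
    case "TEXT_MESSAGE_END":
      render(buffer); // the full message is complete
      break;
  }
}
```

Because the buffer re-renders on every content event, the operator sees the booking confirmation grow word-by-word instead of waiting in silence.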

Front-end Tool Calls.

Beyond conversation, an effective agent within the VOC needs to navigate the dashboard. Front-end Tool Calls allow the AI agent to send instructions directly to the browser to operate local tools, such as changing the page URL, modifying CSS, or switching tabs.

To see the value of this, consider a typical workflow: A rider calls the support center with a question about a past trip. The operator quickly verifies the rider's identity, then simply asks the agent, "Show me John Doe's ride history." Instead of responding with text, the agent utilizes a Front-end Tool Call to instantly "zip" the operator’s view from the main dashboard directly to the "Rider Rides" tab, pre-filtered for John Doe. This automation streamlines the interaction, allowing the operator to execute complex navigation steps instantly using free-form natural language.

[Figure: The events emitted by the server in a simple front-end tool call interaction.]

The event flow for a tool call involves:

  • Lifecycle Event, RUN_STARTED
  • Tool Call Event, TOOL_CALL_START: The server signals to the browser its intent to run a tool on the frontend; in this case, a change-background-color tool.
  • Tool Call Event, TOOL_CALL_ARGS: One or more events passing the tool's specific parameters (e.g., color: red).
  • Tool Call Event, TOOL_CALL_END: An indication that the front-end tool call is complete.

Crucially, when implementing the agent integration, we have the flexibility to decide whether the agent should first tell the user what it is about to do in a text message, or execute the tool call directly.
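As a sketch of how the browser side might honor these events, the snippet below keeps a registry of local tools and accumulates streamed arguments until TOOL_CALL_END fires. The registry, the changeBackgroundColor tool, and the assumption that arguments arrive as JSON string fragments are all illustrative, not prescribed by the protocol:

```typescript
// Dispatching AG-UI tool-call events to local front-end tools. Field names
// and the JSON-fragment argument format are assumptions for illustration.
type ToolCallEvent =
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "TOOL_CALL_END"; toolCallId: string };

// Hypothetical registry of tools the browser agrees to expose to the agent.
const frontendTools: Record<string, (args: Record<string, unknown>) => void> = {
  changeBackgroundColor: (args) => {
    document.body.style.backgroundColor = String(args.color); // e.g., { color: "red" }
  },
};

const pendingCalls = new Map<string, { name: string; argsJson: string }>();

function onToolCallEvent(event: ToolCallEvent): void {
  switch (event.type) {
    case "TOOL_CALL_START": // the agent announces which local tool it wants
      pendingCalls.set(event.toolCallId, { name: event.toolCallName, argsJson: "" });
      break;
    case "TOOL_CALL_ARGS": // arguments may arrive in several partial chunks
      pendingCalls.get(event.toolCallId)!.argsJson += event.delta;
      break;
    case "TOOL_CALL_END": { // arguments are complete: parse and run the tool
      const call = pendingCalls.get(event.toolCallId)!;
      frontendTools[call.name]?.(JSON.parse(call.argsJson));
      pendingCalls.delete(event.toolCallId);
      break;
    }
  }
}
```

Keeping the registry explicit is also a natural place to enforce the compatibility and permission constraints discussed below: a tool simply isn't registered on clients that disallow it.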

A note on compatibility: Front-end tool calls are client-dependent. The same instruction (e.g., change background color) might be implemented differently or even disallowed across different browsers like Chrome, Firefox, or Safari.

State Sharing.

The AG-UI protocol’s support for Agent Steering and State Sharing is crucial for enabling a true human-in-the-loop experience within the VOC.

Consider a high-stakes scenario: An operator instructs the agent to "Cancel a batch of unconfirmed rides." The agent begins the run by announcing a Back-end Tool Call in a text message: "Preparing to cancel rides." As it transitions into an "executing_tool_call" state, it transmits a detailed plan listing the specific rides to be removed.

Because of constant state sharing, the front-end registers this intent immediately. If the operator notices that the list is incorrect — perhaps it includes an active booking — they can instantly trigger an interrupt command. This changes the agent’s state to “halted” before the final, irreversible backend tool call is executed, demonstrating the protocol’s fine-grained control over system safety.
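The protocol does not dictate how the interrupt reaches the agent, but when the run streams over a plain HTTP request, one simple approach is to tie the stream's lifetime to an AbortController: aborting the request halts the client's consumption and signals the disconnect to the server. The /agent/run endpoint below is hypothetical:

```typescript
// A sketch of wiring an operator-facing interrupt, assuming the run is
// streamed over a plain HTTP request. The /agent/run endpoint is hypothetical.
let controller: AbortController | null = null;

async function startRun(userInput: string): Promise<void> {
  controller = new AbortController();
  const response = await fetch("/agent/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: userInput }),
    signal: controller.signal, // ties the stream's lifetime to the controller
  });
  // ...read response.body and dispatch AG-UI events as they arrive...
}

// Wired to a "Stop" button: aborts the request mid-run, halting the stream
// before the irreversible back-end tool call executes.
function onInterruptClicked(): void {
  controller?.abort();
}
```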

[Figure: The events emitted by the server sharing its state with the user.]

In the figure above, the event flow for a state share includes:

  • Lifecycle Event, RUN_STARTED
  • State Management Event, STATE_SNAPSHOT: The server shares its current state with the user; in this case, the agent is in a "re-organizing directory" state.
  • State Management Event, STATE_DELTA: The state changes to preparing a disk-drive operation, a potentially alarming state for the user.

In this case, the user would probably want to stop the agent process immediately to prevent data loss.
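Client-side, mirroring the agent's state might look like the sketch below: STATE_SNAPSHOT replaces the whole state object, while STATE_DELTA is assumed to carry JSON Patch (RFC 6902) operations for incremental updates. The state fields and the "disk" check are illustrative:

```typescript
import { applyPatch, Operation } from "fast-json-patch"; // RFC 6902 JSON Patch helpers

// Mirroring the agent's state on the client. STATE_SNAPSHOT replaces the
// whole state; STATE_DELTA is assumed to carry JSON Patch operations.
type AgentState = { status: string; [key: string]: unknown };

let agentState: AgentState = { status: "idle" };

type StateEvent =
  | { type: "STATE_SNAPSHOT"; snapshot: AgentState }
  | { type: "STATE_DELTA"; delta: Operation[] };

function onStateEvent(event: StateEvent): void {
  if (event.type === "STATE_SNAPSHOT") {
    // Full replacement, e.g. { status: "re-organizing directory" }
    agentState = event.snapshot;
  } else {
    // Incremental change, e.g. a patch moving status to a disk-drive operation
    agentState = applyPatch(agentState, event.delta).newDocument;
  }
  if (agentState.status.includes("disk")) {
    // Surface the alarming state so the operator can interrupt in time
    console.warn("Agent entered a risky state:", agentState.status);
  }
}
```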

What’s next?

The AG-UI protocol provides a clear blueprint for developers building AI chat agents. The separation of messaging events, front-end tool calls, and state management gives product teams fine-grained control over the user experience.

This article covered only a small portion of the event types the AG-UI protocol currently supports, and it didn't address the hefty list of roadmap features the CopilotKit team has planned. I encourage the reader to visit the official documentation: https://docs.ag-ui.com/introduction.