At Via, we are continually exploring innovative ways to enhance our operational efficiency and support systems. Recently, a team of AI enthusiasts — comprising both frontend and backend engineers — demonstrated the power of the Agent-User Interaction Protocol (AG-UI) in a successful internal experiment. We built a specialized chatbot assistant for our back-office rider management system, designing it to leverage the protocol's core capabilities. By utilizing features like Front-end Tool Calls for rapid web app navigation and Streaming Chat for real-time data delivery, we enabled our operations agents to perform more complex tasks and resolve service requests more quickly and effectively.
As part of this team, I found the protocol and its systematic approach to AI-user interaction fascinating. I wanted to share my journey in studying the core abilities this protocol gives us, especially when developing modern, web-based chat products.
The AG-UI protocol acts as the crucial connection layer between an LLM-based AI agent and the user’s interface (like a web browser).
Developed by the creators of CopilotKit, it formalizes the emerging behaviors observed in how AI agents interact with users. Understanding AG-UI is essential for productizing AI features, especially for web-based chat experiences, where the agent interacts with the user directly on a web page.
The protocol places the AI agent at the center of three key interaction types:

- User interaction: the user's browser communicates with the agent through the AG-UI protocol.
- Agent-to-agent interaction: the agent talks to other AI agents through A2A.
- Tool use: the agent activates tools and APIs through the MCP protocol.

The general AI architecture diagram is the following:

AG-UI protocol architecture diagram showing the agent, its tools, the user interface, and the communication channels between them.
While the AG-UI protocol defines the event structure for AI-user interaction, the actual transport layer for features like streaming chat often relies on foundational web technologies like WebSockets or Server-Sent Events (SSE). The core difference lies in directionality: WebSockets provide a bidirectional, full-duplex channel, allowing the AI agent and the user interface to send data to each other simultaneously over a single connection. This is necessary for real-time applications requiring two-way command and control.
In contrast, Server-Sent Events (SSE) establish a unidirectional connection, designed only for the server (the agent) to push continuous data streams to the client (the user interface). SSE is generally simpler to implement when the client’s role is purely to consume the server’s stream of updates, such as the initial text generation from an LLM.
Diagram comparing bidirectional WebSocket communication to unidirectional Server‑Sent Events (SSE) within the AG‑UI protocol, highlighting two‑way vs one‑way streaming
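To make the transport concrete, here is a minimal sketch of how a browser client might consume an agent's SSE stream. The /agent/stream endpoint and the payload shape are assumptions for illustration; AG-UI defines the events themselves, not the transport.

```typescript
// A minimal SSE consumer sketch. The /agent/stream endpoint and the
// { type: string } payload shape are assumptions for illustration.
const source = new EventSource("/agent/stream");

source.onmessage = (event: MessageEvent<string>) => {
  // Each SSE message carries one JSON-encoded AG-UI event.
  const agentEvent = JSON.parse(event.data) as { type: string };
  console.log("received event:", agentEvent.type);
};

source.onerror = () => {
  // EventSource reconnects automatically; close the stream once the run ends.
  source.close();
};
```

Because SSE is one-way, client-initiated actions such as the interrupt discussed later would travel over a separate HTTP request, whereas a WebSocket could carry both directions on a single connection.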
To truly understand the capabilities of AG-UI, it helps to see it in action. In this section, we will walk through the protocol’s features by exploring an experimental implementation within the Via Operator Console (VOC).
The VOC is a central hub from which operators — whether in transit agencies, businesses, or schools — manage their live microtransit and paratransit operations, helping riders connect with rides. By examining how an AI agent integrates into this complex environment, we can see exactly how AG-UI structures real-time communication, navigation, and safety.
The most immediately visible feature of the AG-UI Protocol is Streaming Chat. In a high-paced environment like the VOC, operators need immediate feedback.
Imagine a scenario where a human operator requests that a ride be booked on behalf of a rider. Instead of waiting for the agent to process the entire request in silence, the agent uses Streaming Chat to deliver the response in real time, word-by-word. This immediate feedback confirms that the system is complying with the request and allows the operator to verify booking details as they are generated.
This is managed through a sequence of events that the agent server sends to the user's browser:

- TEXT_MESSAGE_START: announces that a new assistant message is beginning.
- TEXT_MESSAGE_CONTENT: delivers each incremental chunk (delta) of the generated text.
- TEXT_MESSAGE_END: marks the message as complete.
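A minimal client-side handler for this sequence might look like the following sketch. The messageId and delta fields follow the AG-UI event shapes, while the render helper is a hypothetical stand-in for whatever UI framework you use.

```typescript
// Sketch: assembling a streamed assistant message from AG-UI text events.
// The render() helper is hypothetical; swap in your framework's state update.
type TextEvent =
  | { type: "TEXT_MESSAGE_START"; messageId: string }
  | { type: "TEXT_MESSAGE_CONTENT"; messageId: string; delta: string }
  | { type: "TEXT_MESSAGE_END"; messageId: string };

const messages = new Map<string, string>();

function handleTextEvent(event: TextEvent): void {
  switch (event.type) {
    case "TEXT_MESSAGE_START":
      messages.set(event.messageId, ""); // open an empty message bubble
      break;
    case "TEXT_MESSAGE_CONTENT":
      // Append each delta so the operator sees the reply word by word.
      messages.set(event.messageId, (messages.get(event.messageId) ?? "") + event.delta);
      render(event.messageId, messages.get(event.messageId) ?? "");
      break;
    case "TEXT_MESSAGE_END":
      // Message complete: the operator can now verify the booking details.
      break;
  }
}

function render(messageId: string, text: string): void {
  // Hypothetical DOM update; a real app would use framework state instead.
  const el = document.getElementById(messageId);
  if (el) el.textContent = text;
}
```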
Beyond conversation, an effective agent within the VOC needs to navigate the dashboard. Front-end Tool Calls allow the AI agent to send instructions directly to the browser to operate local tools, such as changing the page URL, modifying CSS, or switching tabs.
To see the value of this, consider a typical workflow: A rider calls the support center with a question about a past trip. The operator quickly verifies the rider's identity, then simply asks the agent, "Show me John Doe's ride history." Instead of responding with text, the agent utilizes a Front-end Tool Call to instantly "zip" the operator’s view from the main dashboard directly to the "Rider Rides" tab, pre-filtered for John Doe. This automation streamlines the interaction, allowing the operator to execute complex navigation steps instantly using free-form natural language.
The event flow for a tool call involves:

- TOOL_CALL_START: names the tool the agent wants the browser to run.
- TOOL_CALL_ARGS: streams the call's arguments as the agent generates them.
- TOOL_CALL_END: signals that the arguments are complete, so the front end can execute the tool and report the result back to the agent.
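Once the arguments are fully assembled, the front end can dispatch the call locally. The sketch below assumes a hypothetical navigate_to_rider_rides tool; the tool name, argument shape, and route are illustrative, not part of the protocol.

```typescript
// Sketch: executing a front-end tool call once its arguments are complete.
// The tool name and the /riders/... route are hypothetical examples.
interface CompletedToolCall {
  toolCallId: string;
  toolName: string;
  args: string; // JSON accumulated from the streamed TOOL_CALL_ARGS deltas
}

function executeFrontendTool(call: CompletedToolCall): void {
  switch (call.toolName) {
    case "navigate_to_rider_rides": {
      const { riderId } = JSON.parse(call.args) as { riderId: string };
      // "Zip" the operator's view straight to the pre-filtered rides tab.
      window.location.assign(`/riders/${encodeURIComponent(riderId)}/rides`);
      break;
    }
    default:
      console.warn(`unknown front-end tool: ${call.toolName}`);
  }
}
```

Keeping the tool registry on the client is what makes these calls "front-end": the agent only names the tool and supplies arguments, while the browser decides how (or whether) to execute it.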
Crucially, when implementing the agent integration, we have the flexibility to decide whether the agent should first tell the user what it's about to do with a text message, or execute the tool call directly.
A note on compatibility: Front-end tool calls are client-dependent. The same instruction (e.g., change background color) might be implemented differently or even disallowed across different browsers like Chrome, Firefox, or Safari.
The AG-UI protocol’s support for Agent Steering and State Sharing is crucial for enabling a true human-in-the-loop experience within the VOC.
Consider a high-stakes scenario: An operator instructs the agent to "Cancel a batch of unconfirmed rides." The agent begins the run by announcing a Back-end Tool Call with a text message: “Preparing to cancel rides.” As it transitions into an “executing_tool_call” state, it transmits a detailed plan listing the specific rides to be removed.
Because of constant state sharing, the front-end registers this intent immediately. If the operator notices that the list is incorrect — perhaps it includes an active booking — they can instantly trigger an interrupt command. This changes the agent’s state to “halted” before the final, irreversible backend tool call is executed, demonstrating the protocol’s fine-grained control over system safety.
In the above example, the event flow for a state share includes:

- Text message events announcing the plan (“Preparing to cancel rides.”).
- A STATE_SNAPSHOT or STATE_DELTA event that moves the shared state to “executing_tool_call” and carries the list of rides slated for cancellation.
- The tool call events for the irreversible back-end cancellation, unless an interrupt halts the run first.

In this case, the operator would want to stop the agent process immediately to prevent data loss.
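As an illustration, a client-side guard over the shared state might look like the sketch below. The state shape (status, ridesToCancel) and the /agent/interrupt endpoint are assumptions for this example; AG-UI standardizes how state is shared via STATE_SNAPSHOT and STATE_DELTA events, while the interrupt wiring is application-specific.

```typescript
// Sketch: watching shared state and halting the run before an irreversible
// back-end call. The state shape and /agent/interrupt endpoint are
// assumptions for illustration.
interface SharedState {
  status: "idle" | "executing_tool_call" | "halted";
  ridesToCancel: string[];
}

async function onStateUpdate(
  state: SharedState,
  activeRideIds: Set<string>,
): Promise<void> {
  const unsafe = state.ridesToCancel.filter((id) => activeRideIds.has(id));
  if (state.status === "executing_tool_call" && unsafe.length > 0) {
    // Interrupt before the cancellation executes; the agent's state
    // then moves to "halted".
    await fetch("/agent/interrupt", { method: "POST" });
  }
}
```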
The AG-UI protocol provides a clear blueprint for developers building AI chat agents. The separation of messaging events, front-end tool calls, and state management gives product teams fine-grained control over the user experience.
This article covered only a small portion of the event types the AG-UI protocol currently supports, and it didn't address the hefty list of roadmap features the CopilotKit team is planning. I encourage you to visit the official documentation: https://docs.ag-ui.com/introduction.