Chat, streaming, and message edits
This page explains behavior and integration choices for chat/streaming routes listed in the main index. See app/server/routers/chat.py and conversation.py for the implementation.
The three stream POST entry points
| Endpoint | Auth | When to use |
|---|---|---|
| /api/chat | Session middleware (unless allowlisted) | Requires agent_id in JSON; prepare_session fills tools/skills from the saved agent. |
| /api/stream | Same | StreamRequest with require_agent_id=False: if agent_id is present it is used; otherwise the server resolves the agent from saved defaults / session. Typical product flow: selected agent + this endpoint. |
| /api/web-stream | Same | Backed by StreamManager for re-entrancy: a new request for the same session_id stops the old stream first. Good for web tabs and reconnect UX. |
Shared validation: messages must be non-empty. user_id can be injected from the session if omitted.
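The shared validation can be mirrored on the client so that obviously invalid requests never leave the caller. The following is a minimal sketch, not the server's code: the helper name build_stream_payload is hypothetical, the role/content message shape is an assumption, and so is sending session_id in the body.

```python
from typing import Any, Optional


def build_stream_payload(
    messages: list[dict[str, Any]],
    session_id: str,
    agent_id: Optional[str] = None,
    user_id: Optional[str] = None,
    require_agent_id: bool = False,
) -> dict[str, Any]:
    """Build a JSON body for a stream endpoint (hypothetical client helper)."""
    # Mirrors the documented rule: messages must be non-empty.
    if not messages:
        raise ValueError("messages must be non-empty")
    # /api/chat requires agent_id; /api/stream does not.
    if require_agent_id and not agent_id:
        raise ValueError("agent_id is required for this endpoint")

    payload: dict[str, Any] = {"messages": messages, "session_id": session_id}
    if agent_id:
        payload["agent_id"] = agent_id
    # user_id may be omitted; the server can inject it from the session.
    if user_id:
        payload["user_id"] = user_id
    return payload
```

Pass require_agent_id=True when targeting /api/chat and leave it at the default for /api/stream.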
User input optimization
- POST /api/chat/optimize-input: BaseResponse with structured data from the service.
- POST /api/chat/optimize-input/stream: text/plain where each line is one JSON object from the stream.
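A line-per-JSON-object stream like this one can be consumed with a small generator. This is a sketch under the assumption that the caller already has an iterable of decoded text lines; the name iter_ndjson is hypothetical, and skipping blank lines is a defensive choice rather than documented server behavior.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Tolerate blank keep-alive or trailing lines (assumption).
            continue
        yield json.loads(line)
```

With an HTTP client that exposes the response body line by line, the line iterator can be passed straight in.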
UserInputOptimizeRequest.user_id is optional; the server reads the current user from request.state.user_claims.
Resume and active sessions
- GET /api/stream/resume/{session_id}: continues from last_index in StreamManager. If there are no new chunks, you may get a stream_end line with resume_fallback: true.
- GET /api/stream/active_sessions: SSE of currently active streaming session ids for the UI.
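A client resuming a stream needs to tell "more chunks arrived" apart from "nothing new, fall back". The sketch below assumes the terminating line is identified by a type field equal to stream_end, by analogy with other typed events in the stream; that field name and the helper consume_resume are assumptions, not confirmed API.

```python
import json
from typing import Any, Iterable


def consume_resume(lines: Iterable[str]) -> tuple[list[dict[str, Any]], bool]:
    """Return (events received before stream_end, resume_fallback flag)."""
    events: list[dict[str, Any]] = []
    fell_back = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumption: the end marker carries "type": "stream_end".
        if event.get("type") == "stream_end":
            fell_back = bool(event.get("resume_fallback", False))
            break
        events.append(event)
    return events, fell_back
```

What to do when the flag is true (for example, reloading the conversation from stored history) is left to the caller.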
Rerun vs edit
- POST /api/conversations/{session_id}/rerun-stream: does not require messages in the request body. The server loads the last user turn and re-runs the stream. Optional RerunStreamRequest fields override agent_id, sub-agents, mode, etc. Same text/plain stream shape as web-stream.
- POST /api/conversations/{session_id}/edit-last-user-message: updates the stored last user message; call rerun-stream afterwards if you need a new model response.
rerun-stream also accepts optional guidance_content and guidance_id. The UI uses this for “apply guidance now”: it interrupts the running stream, deletes the pending guidance item, and reruns the session with guidance_content appended as a new user message.
Runtime guidance messages
A running session can receive guidance messages. The message first enters the session pending queue. Before the next LLM request, the agent consumes it, writes it to MessageManager, includes it in the current LLM context, and streams it back as a regular role=user message. The returned message includes metadata.guidance_id, so the frontend can remove the matching guidance chip.
Use this when a user sends follow-up guidance while the assistant is already running, for example “prioritize the failing test first”. This is different from interrupt: it does not stop the session and does not immediately force a new model request.
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/sessions/{session_id}/inject-user-message | Queue one guidance user message for the running session. |
| GET | /api/sessions/{session_id}/inject-user-message | List queued guidance messages that have not been consumed yet. |
| PATCH | /api/sessions/{session_id}/inject-user-message/{guidance_id} | Edit one queued guidance message before it is consumed. |
| DELETE | /api/sessions/{session_id}/inject-user-message/{guidance_id} | Delete one queued guidance message before it is consumed. |
Add guidance
Request body:
{
  "content": "Please prioritize the failing test first",
  "guidance_id": "optional-client-id",
  "metadata": {
    "source_ui": "guidance_area"
  }
}
guidance_id is optional. If omitted, the runtime generates one. metadata is optional and is merged into the eventual user MessageChunk.metadata.
Successful responses use the standard BaseResponse wrapper. The important data shape is:
{
  "session_id": "sess_123",
  "guidance_id": "optional-client-id",
  "accepted": true
}
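The request side of adding guidance can be sketched as two small client helpers. The names build_guidance_body and inject_path are hypothetical, and treating whitespace-only content as empty is an assumption; the source only states that empty content is rejected.

```python
from typing import Any, Mapping, Optional


def inject_path(session_id: str) -> str:
    """Path of the guidance queue endpoint for one session."""
    return f"/api/sessions/{session_id}/inject-user-message"


def build_guidance_body(
    content: str,
    guidance_id: Optional[str] = None,
    metadata: Optional[Mapping[str, Any]] = None,
) -> dict[str, Any]:
    """Build the POST body; optional fields are omitted when not supplied."""
    # Assumption: whitespace-only content is treated like empty content.
    if not content or not content.strip():
        raise ValueError("content must be non-empty")
    body: dict[str, Any] = {"content": content}
    if guidance_id is not None:
        body["guidance_id"] = guidance_id  # otherwise the runtime generates one
    if metadata:
        body["metadata"] = dict(metadata)
    return body
```

Rejecting empty content locally avoids a round trip for a request that would be refused anyway.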
List, edit, and delete pending guidance
GET returns:
{
  "session_id": "sess_123",
  "items": [
    {
      "guidance_id": "g1",
      "content": "Please prioritize the failing test first",
      "status": "pending",
      "timestamp": 1761700000.123
    }
  ]
}
PATCH body:
{"content": "Please prioritize the failing test and summarize the fix"}
PATCH returns {"updated": true} in data; DELETE returns {"deleted": true} in data.
Client handling notes
- Keep a local guidance chip keyed by guidance_id after POST succeeds.
- Remove the chip when the stream emits a normal role=user message whose metadata.guidance_id matches.
- Treat a 4xx on PATCH/DELETE as “already consumed or missing” unless the error is clearly validation-related.
- content must be non-empty; empty content is rejected.
Once a guidance message is consumed, it is a normal persisted user message and can no longer be edited through the pending-guidance endpoints.
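The client handling notes can be sketched as a small in-memory tracker. The class GuidanceChips and its method names are hypothetical, and the sketch assumes that streamed messages expose role and metadata as top-level keys. Deciding whether a 4xx is validation-related is left to the caller, because the specific status codes are not stated here.

```python
from typing import Any


class GuidanceChips:
    """Tracks pending guidance chips keyed by guidance_id (illustrative only)."""

    def __init__(self) -> None:
        self._chips: dict[str, str] = {}

    def on_post_accepted(self, data: dict[str, Any], content: str) -> None:
        # Keep a chip only once the server has accepted the guidance.
        if data.get("accepted"):
            self._chips[data["guidance_id"]] = content

    def on_stream_event(self, event: dict[str, Any]) -> bool:
        """Remove the matching chip; return True if one was removed."""
        if event.get("role") != "user":
            return False
        guidance_id = (event.get("metadata") or {}).get("guidance_id")
        return self._chips.pop(guidance_id, None) is not None

    def on_mutation_error(
        self, guidance_id: str, status_code: int, validation_error: bool = False
    ) -> None:
        # A non-validation 4xx on PATCH/DELETE means consumed or missing.
        if 400 <= status_code < 500 and not validation_error:
            self._chips.pop(guidance_id, None)

    def pending(self) -> list[str]:
        return list(self._chips)
```

Keying everything on guidance_id means the tracker never needs to compare message text, which may have been edited through PATCH before consumption.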
tool_progress events in the stream
In addition to regular message events, the NDJSON stream from any of the three endpoints above may also contain tool_progress events that carry incremental output from a tool while it is still running:
{"type":"tool_progress","tool_call_id":"call_abc","text":"...","stream":"stdout","closed":false,"ts":1761700000.123}
- UI-only. These events never enter session history / MessageManager / the LLM context.
- Clients aggregate by tool_call_id into the corresponding tool card; closed: true marks the end of the live stream.
- Downstream consumers that don’t care can simply ignore lines with type=tool_progress; the existing protocol is fully preserved.
- To disable: set SAGE_TOOL_PROGRESS_ENABLED=false on the server.
See Architecture · §12 Tool live-progress channel.
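Aggregating tool_progress events by tool_call_id might look like the sketch below. The class ToolProgressBuffer is hypothetical. It assumes that text chunks are concatenated in arrival order and does not separate output by the stream field; a real UI may want to keep stdout and stderr apart.

```python
from typing import Any


class ToolProgressBuffer:
    """Accumulates live tool output per tool_call_id (illustrative only)."""

    def __init__(self) -> None:
        self.text: dict[str, str] = {}
        self.closed: set[str] = set()

    def feed(self, event: dict[str, Any]) -> bool:
        """Consume a tool_progress event; return False for any other event."""
        if event.get("type") != "tool_progress":
            return False  # regular message events are handled elsewhere
        call_id = event["tool_call_id"]
        # Append the incremental chunk to whatever was buffered so far.
        self.text[call_id] = self.text.get(call_id, "") + event.get("text", "")
        if event.get("closed"):
            self.closed.add(call_id)  # end of the live stream for this call
        return True
```

Because feed returns False for everything else, it can sit in front of the existing message handling without changing it.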
Interrupt
POST /api/sessions/{session_id}/interrupt stops a running session at the engine level. That is different from the automatic stop inside web-stream / rerun-stream when the same session is replaced.