Language

Chat, streaming, and message edits

This page explains behavior and integration choices for chat/streaming routes listed in the main index. See app/server/routers/chat.py and conversation.py for the implementation.

The three stream `POST` entry points

Endpoint	Auth	When to use
`/api/chat`	Session middleware (unless allowlisted)	Requires `agent_id` in JSON; `prepare_session` fills tools/skills from the saved agent.
`/api/stream`	Same	`StreamRequest` with `require_agent_id=False`—if `agent_id` is present it is used, otherwise the server resolves the agent from saved defaults / session. Typical product flow: selected agent + this endpoint.
`/api/web-stream`	Same	Backed by `StreamManager` for re-entrancy: a new request for the same `session_id` stops the old stream first. Good for web tabs and reconnect UX.

Shared validation: messages must be non-empty. user_id can be injected from the session if omitted.

User input optimization

POST /api/chat/optimize-input: BaseResponse with structured data from the service.
POST /api/chat/optimize-input/stream: text/plain where each line is one JSON object from the stream.

UserInputOptimizeRequest.user_id is optional; the server reads the current user from request.state.user_claims.

Resume and active sessions

GET /api/stream/resume/{session_id}: continues from last_index in StreamManager. If no new chunks, you may get a stream_end line with resume_fallback: true.
GET /api/stream/active_sessions: SSE of currently active streaming session ids for the UI.

Rerun vs edit

POST /api/conversations/{session_id}/rerun-stream: does not require messages in the request body. The server loads the last user turn and re-runs the stream. Optional RerunStreamRequest fields override agent_id, sub-agents, mode, etc. Same text/plain stream shape as web-stream.
POST /api/conversations/{session_id}/edit-last-user-message updates the stored last user message; call rerun-stream after if you need a new model response.

rerun-stream also accepts optional guidance_content and guidance_id. The UI uses this for “apply guidance now”: it interrupts the running stream, deletes the pending guidance item, and reruns the session with guidance_content appended as a new user message.

Runtime guidance messages

A running session can receive guidance messages. The message first enters the session pending queue. Before the next LLM request, the agent consumes it, writes it to MessageManager, includes it in the current LLM context, and streams it back as a regular role=user message. The returned message includes metadata.guidance_id, so the frontend can remove the matching guidance chip.

Use this when a user sends follow-up guidance while the assistant is already running, for example “prioritize the failing test first”. This is different from interrupt: it does not stop the session and does not immediately force a new model request.

Method	Endpoint	Purpose
`POST`	`/api/sessions/{session_id}/inject-user-message`	Queue one guidance user message for the running session.
`GET`	`/api/sessions/{session_id}/inject-user-message`	List queued guidance messages that have not been consumed yet.
`PATCH`	`/api/sessions/{session_id}/inject-user-message/{guidance_id}`	Edit one queued guidance message before it is consumed.
`DELETE`	`/api/sessions/{session_id}/inject-user-message/{guidance_id}`	Delete one queued guidance message before it is consumed.

Add guidance

Request body:

{
  "content": "Please prioritize the failing test first",
  "guidance_id": "optional-client-id",
  "metadata": {
    "source_ui": "guidance_area"
  }
}

guidance_id is optional. If omitted, the runtime generates one. metadata is optional and is merged into the eventual user MessageChunk.metadata.

Successful responses use the standard BaseResponse wrapper. The important data shape is:

{
  "session_id": "sess_123",
  "guidance_id": "optional-client-id",
  "accepted": true
}

List, edit, and delete pending guidance

GET returns:

{
  "session_id": "sess_123",
  "items": [
    {
      "guidance_id": "g1",
      "content": "Please prioritize the failing test first",
      "status": "pending",
      "timestamp": 1761700000.123
    }
  ]
}

PATCH body:

{"content": "Please prioritize the failing test and summarize the fix"}

PATCH returns {"updated": true} in data; DELETE returns {"deleted": true} in data.

Client handling notes

Keep a local guidance chip keyed by guidance_id after POST succeeds.
Remove the chip when the stream emits a normal role=user message whose metadata.guidance_id matches.
Treat a 4xx on PATCH / DELETE as “already consumed or missing” unless the error is clearly validation-related.
content must be non-empty; empty content is rejected.

Once a guidance message is consumed, it is a normal persisted user message and can no longer be edited through the pending-guidance endpoints.

`tool_progress` events in the stream

In addition to regular message events, the NDJSON stream from any of the three endpoints above may also contain tool_progress events that carry incremental output from a tool while it is still running:

{"type":"tool_progress","tool_call_id":"call_abc","text":"...","stream":"stdout","closed":false,"ts":1761700000.123}

UI-only. These events never enter session history / MessageManager / the LLM context.
Clients aggregate by tool_call_id into the corresponding tool card; closed: true marks the end of the live stream.
Downstream consumers that don’t care can simply ignore lines with type=tool_progress; the existing protocol is fully preserved.
To disable: set SAGE_TOOL_PROGRESS_ENABLED=false on the server.

See Architecture · §12 Tool live-progress channel.

Interrupt

POST /api/sessions/{session_id}/interrupt stops a running session at the engine level. That is different from the automatic stop inside web-stream / rerun-stream when the same session is replaced.

Back to HTTP API Reference