Streaming Chat
The /chat endpoint streams responses as NDJSON (newline-delimited JSON). Each line is a self-contained JSON object carrying one chunk of the response: either a piece of the assistant's answer or a flow-control event.
Savanto's chat pipeline is multi-agent: a triage agent classifies intent, domain curators (product, post, custom) run in parallel, and a response composer weaves their output into a single cohesive multi-block answer. The streaming protocol mirrors that architecture: instead of "one bubble per agent", you get one ordered stream of typed blocks that the client renders inline (text, product cards, post cards, custom cards) while the model is still writing.
Why Streaming?
| Regular Chat | Streaming Chat |
|---|---|
| Wait for the complete response | See text and cards appear in real-time |
| 3–10 second perceived delay | Sub-second time-to-first-token |
| Single payload | Block-by-block delivery |
Basic Request
curl -N -X POST https://api.savanto.ai/chat \
-H "Authorization: Bearer if_sk_xxx" \
-H "Content-Type: application/json" \
-d '{"message": "Tell me about your products", "threadId": "session-123", "stream": true}'
The response is a stream of JSON objects, one per line:
{"type":"progress","data":{"message":"Analyzing your question...","stage":"understand"}}
{"type":"block_start","data":{"blockId":"b1","type":"text","domain":null}}
{"type":"block_delta","data":{"blockId":"b1","content":"Here are some "}}
{"type":"block_delta","data":{"blockId":"b1","content":"options I think you'll like."}}
{"type":"block_end","data":{"blockId":"b1"}}
{"type":"block_data","data":{"itemId":"prod-42","payload":{"id":"prod-42","name":"Wireless Headphones","price":199.99,"image":"..."}}}
{"type":"block_start","data":{"blockId":"b2","type":"products","domain":"product","ids":["prod-42"]}}
{"type":"block_end","data":{"blockId":"b2"}}
{"type":"item_delta","data":{"itemId":"prod-42","field":"reason","delta":"Great noise cancellation."}}
{"type":"prompts","data":[{"id":"p1","label":"Show me cheaper options"}]}
{"type":"complete","data":{"finished":true}}
Chunk Types
The protocol has two layers: content chunks carry the actual answer, and flow-control chunks carry metadata, progress, and lifecycle events.
Content (block protocol)
| Type | Data | Description |
|---|---|---|
| block_start | { blockId, type, domain?, ids?, renderHint? } | Begins a block. type is text, products, posts, or custom. |
| block_delta | { blockId, content } | Appends prose to an open text block (composer streaming). |
| block_data | { itemId, payload } | Full structured payload for an item (product, post, custom card). Keyed by id. Sent as soon as identifying fields parse from the curator's stream. |
| item_delta | { itemId, field, delta } | Appends a delta to a named field on a previously-emitted item (e.g. summary for posts, reason for products, any field listed in composerStreamingFields for custom domains). The widget mutates payloadCache[itemId][field] and re-renders the card. |
| block_end | { blockId } | Closes the current block. |
| block_discard | { reason? } | Drop all partial block content for this turn. Emitted when the composer fails mid-stream so fallback prose under a fresh f-prefixed block id namespace doesn't concatenate with partial composer text. Reset the message's text content, card arrays, and any per-turn payload caches when you see this. |
A typical multi-domain answer interleaves text and structured blocks in the order the composer chose:
text block "Here are jackets that match your size..."
products block (3 items)
text block "And these articles cover sizing tips:"
posts block (2 items)
block_data arrives before the corresponding block_start for structured blocks, so the widget can resolve ids from a payload cache.
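The ordering rules above can be sketched as a small pure reducer. This is only an illustration, not the widget's actual implementation: the `TurnState` shape and the `applyChunk` name are assumptions made for this example, while the chunk field names follow the content table.

```typescript
// Hypothetical per-turn state; not part of the Savanto SDK.
type Block =
  | { id: string; kind: "text"; text: string }
  | { id: string; kind: "products" | "posts" | "custom"; ids: string[] };

interface TurnState {
  blocks: Block[];
  payloadCache: Record<string, Record<string, unknown>>;
}

// Content chunk shapes, transcribed from the table above.
type ContentChunk =
  | { type: "block_start"; data: { blockId: string; type: "text" | "products" | "posts" | "custom"; ids?: string[] } }
  | { type: "block_delta"; data: { blockId: string; content: string } }
  | { type: "block_data"; data: { itemId: string; payload: Record<string, unknown> } }
  | { type: "item_delta"; data: { itemId: string; field: string; delta: string } }
  | { type: "block_end"; data: { blockId: string } }
  | { type: "block_discard"; data: { reason?: string } };

function applyChunk(state: TurnState, chunk: ContentChunk): TurnState {
  switch (chunk.type) {
    case "block_data": {
      // Cache the payload first; the block that references it arrives later.
      const { itemId, payload } = chunk.data;
      return { ...state, payloadCache: { ...state.payloadCache, [itemId]: payload } };
    }
    case "block_start": {
      const { blockId, type, ids } = chunk.data;
      const block: Block =
        type === "text"
          ? { id: blockId, kind: "text", text: "" }
          : { id: blockId, kind: type, ids: ids ?? [] };
      return { ...state, blocks: [...state.blocks, block] };
    }
    case "block_delta": {
      const { blockId, content } = chunk.data;
      return {
        ...state,
        blocks: state.blocks.map((b) =>
          b.id === blockId && b.kind === "text" ? { ...b, text: b.text + content } : b,
        ),
      };
    }
    case "item_delta": {
      const { itemId, field, delta } = chunk.data;
      const cached = state.payloadCache[itemId];
      if (!cached) return state; // delta for an item that was never cached
      const current = typeof cached[field] === "string" ? (cached[field] as string) : "";
      return {
        ...state,
        payloadCache: { ...state.payloadCache, [itemId]: { ...cached, [field]: current + delta } },
      };
    }
    case "block_discard":
      // Composer failed mid-stream: drop partial blocks and cached payloads.
      return { blocks: [], payloadCache: {} };
    case "block_end":
    default:
      return state;
  }
}

// Replaying the content chunks from the sample stream in "Basic Request".
const sampleChunks: ContentChunk[] = [
  { type: "block_start", data: { blockId: "b1", type: "text" } },
  { type: "block_delta", data: { blockId: "b1", content: "Here are some " } },
  { type: "block_delta", data: { blockId: "b1", content: "options I think you'll like." } },
  { type: "block_end", data: { blockId: "b1" } },
  { type: "block_data", data: { itemId: "prod-42", payload: { id: "prod-42", name: "Wireless Headphones" } } },
  { type: "block_start", data: { blockId: "b2", type: "products", ids: ["prod-42"] } },
  { type: "block_end", data: { blockId: "b2" } },
  { type: "item_delta", data: { itemId: "prod-42", field: "reason", delta: "Great noise cancellation." } },
];
const finalState = sampleChunks.reduce<TurnState>(applyChunk, { blocks: [], payloadCache: {} });
console.log(JSON.stringify(finalState, null, 2));
```

Keeping the reducer pure makes it straightforward to reuse inside a React state setter or to unit-test against recorded streams.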
Flow control
| Type | Data | Description |
|---|---|---|
| progress | { message, stage } | Lifecycle stage update (typed enum). See Progress stages below. |
| prompts | ChatPrompt[] | Suggested follow-up prompts. |
| metadata | { lastAgent, turnCount, finished } | Per-turn metadata; emitted near end of stream. |
| complete | { finished } | Stream finished. |
| error | { error } | Error message. The stream ends after an error chunk. |
| domain_offer | Offer object | Live agent / contact-form offer (action buttons). |
| domain_offer_success | Result object | The offer was accepted. |
| agent_status | Presence info | Live agent typing / online status. |
| handoff_ended | {} | Live agent session ended. |
| analytics | { ... } | Optional analytics breadcrumbs. |
| usage | { ... } | Optional per-call usage / cost data. |
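Since the stream ends after either a `complete` or an `error` chunk, a client may want to classify each chunk as terminal or not before doing anything else. The sketch below shows one way to do that; `Chunk`, `Outcome`, and `streamOutcome` are hypothetical names for this example, and the error text in the sample is made up.

```typescript
// Minimal chunk envelope: every line has a type and a data field.
interface Chunk {
  type: string;
  data: unknown;
}

type Outcome =
  | { status: "open" }
  | { status: "done" }
  | { status: "failed"; message: string };

// Classifies a chunk as terminal (done/failed) or non-terminal (open).
function streamOutcome(chunk: Chunk): Outcome {
  if (chunk.type === "complete") return { status: "done" };
  if (chunk.type === "error") {
    const data = chunk.data as { error?: unknown } | null;
    return { status: "failed", message: String(data?.error ?? "Unknown error") };
  }
  return { status: "open" };
}

// Sample lines; the error message is invented for illustration.
const flowLines = [
  '{"type":"prompts","data":[{"id":"p1","label":"Show me cheaper options"}]}',
  '{"type":"error","data":{"error":"Upstream timeout"}}',
];
const outcomes = flowLines.map((line) => streamOutcome(JSON.parse(line) as Chunk));
console.log(outcomes);
```

Unknown chunk types fall through to `open`, so optional chunks such as `analytics` or `usage` are safely ignored by clients that do not handle them.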
Progress stages
progress.stage is a typed enum so clients can render stage-specific affordances (e.g. dim or replace prior progress messages):
| Stage | When |
|---|---|
| understand | Triage / classification phase. |
| search | Knowledge / product retrieval is running. |
| lookup | Generic data lookup. |
| compose | Domain curators returned; composer is weaving the answer. |
| connect | Routing to a live agent. |
| tool:<name> | A custom-domain tool is running (e.g. tool:get_orders). The <name> is the tool's function name. Per-tool messages can be customized via CustomDomainConfig.toolProgressMessages (object keyed by tool name → message); unmapped tools fall back to an auto-derived Calling <human readable name>... message. |
Two of the messages are workspace-customizable for branding consistency: triageProgressMessage (default: "Understanding your question...") and composerProgressMessage (default: "Putting it all together..."). Both live in WorkspaceSettings.
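Because `progress.stage` mixes fixed values with the open-ended `tool:<name>` form, a client might normalize it before choosing an affordance. The following is a minimal sketch; `parseStage` and the `ProgressStage` type are assumptions for this example, and treating unrecognized values as `unknown` is this sketch's choice rather than documented behavior.

```typescript
const BUILTIN_STAGES = ["understand", "search", "lookup", "compose", "connect"] as const;
type BuiltinStage = (typeof BUILTIN_STAGES)[number];

type ProgressStage =
  | { kind: "builtin"; stage: BuiltinStage }
  | { kind: "tool"; toolName: string }
  | { kind: "unknown"; raw: string };

const TOOL_PREFIX = "tool:";

// Splits a raw stage string into a built-in stage, a tool stage, or unknown.
function parseStage(raw: string): ProgressStage {
  if (raw.startsWith(TOOL_PREFIX) && raw.length > TOOL_PREFIX.length) {
    return { kind: "tool", toolName: raw.slice(TOOL_PREFIX.length) };
  }
  const builtin = BUILTIN_STAGES.find((s) => s === raw);
  return builtin ? { kind: "builtin", stage: builtin } : { kind: "unknown", raw };
}

console.log(parseStage("compose"));
console.log(parseStage("tool:get_orders"));
```

The displayed text should still come from `progress.message`, since that is where workspace-customized and per-tool messages arrive; the parsed stage only decides how to present it.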
JavaScript Implementation
Reading the Stream
import { createClient, chat } from '@savantoai/ai-sdk';
const client = createClient({
baseUrl: 'https://api.savanto.ai',
auth: process.env.SAVANTO_SECRET_KEY!,
});
const { response } = await chat({
client,
body: { message: 'Tell me about your products', threadId: 'session-123', stream: true },
throwOnError: true,
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() ?? '';
for (const line of lines) {
if (!line.trim()) continue;
const chunk = JSON.parse(line);
switch (chunk.type) {
case 'block_delta':
process.stdout.write(chunk.data.content);
break;
case 'block_data':
console.log('\nItem:', chunk.data.itemId, chunk.data.payload);
break;
case 'complete':
console.log('\nDone');
break;
}
}
}
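The loop above only parses lines that end in a newline, so a final chunk without a trailing newline would stay in `buffer` and be dropped. Whether the API always terminates the last line is not stated here, so the sketch below flushes the remainder defensively. `NdjsonSplitter` is a hypothetical helper for this example, not part of the SDK.

```typescript
// Accumulates decoded text and yields one parsed object per complete line.
class NdjsonSplitter {
  private buffer = "";

  // Feed decoded text; returns the objects for every complete line seen so far.
  push(text: string): unknown[] {
    this.buffer += text;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // keep the partial last line for later
    return lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l) as unknown);
  }

  // Call once the stream reports done; parses any unterminated final line.
  flush(): unknown[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? [] : [JSON.parse(rest) as unknown];
  }
}

// A chunk boundary that falls mid-object, and a last line with no newline.
const splitter = new NdjsonSplitter();
const firstBatch = splitter.push('{"type":"block_delta","data":{"blockId":"b1","con');
const secondBatch = splitter.push('tent":"Hi"}}\n{"type":"complete","data":{"finished":true}}');
const tailBatch = splitter.flush();
console.log(firstBatch.length, secondBatch.length, tailBatch.length);
```

In the reader loop, `push` would receive `decoder.decode(value, { stream: true })` on each read, and `flush` would run after `done` is true, preceded by a final `decoder.decode()` call to drain any buffered bytes.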
React Hook
The block protocol maps cleanly onto React state — keep a payload cache (itemId → payload) plus an ordered list of blocks, and apply deltas in place.
import { useState, useCallback } from 'react';
type Block =
| { id: string; kind: 'text'; text: string }
| { id: string; kind: 'products' | 'posts' | 'custom'; ids: string[] };
interface AssistantTurn {
blocks: Block[];
payloadCache: Record<string, Record<string, unknown>>;
isStreaming: boolean;
}
export function useStreamingChat(apiKey: string, threadId: string) {
const [turn, setTurn] = useState<AssistantTurn>({ blocks: [], payloadCache: {}, isStreaming: false });
const sendMessage = useCallback(async (text: string) => {
setTurn({ blocks: [], payloadCache: {}, isStreaming: true });
const res = await fetch('https://api.savanto.ai/chat', {
method: 'POST',
headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ message: text, threadId, stream: true }),
});
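// Fail fast on HTTP errors instead of trying to parse an error body as NDJSON.
if (!res.ok || !res.body) {
setTurn((prev) => ({ ...prev, isStreaming: false }));
throw new Error(`Chat request failed with status ${res.status}`);
}
const reader = res.body.getReader();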
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() ?? '';
for (const line of lines) {
if (!line.trim()) continue;
const { type, data } = JSON.parse(line);
setTurn((prev) => {
switch (type) {
case 'block_start':
if (data.type === 'text') {
return { ...prev, blocks: [...prev.blocks, { id: data.blockId, kind: 'text', text: '' }] };
}
return {
...prev,
blocks: [...prev.blocks, { id: data.blockId, kind: data.type, ids: data.ids ?? [] }],
};
case 'block_delta':
return {
...prev,
blocks: prev.blocks.map((b) =>
b.id === data.blockId && b.kind === 'text' ? { ...b, text: b.text + data.content } : b,
),
};
case 'block_data':
return {
...prev,
payloadCache: { ...prev.payloadCache, [data.itemId]: data.payload },
};
case 'item_delta': {
const cached = prev.payloadCache[data.itemId];
if (!cached) return prev;
const current = (cached[data.field] as string | undefined) ?? '';
return {
...prev,
payloadCache: {
...prev.payloadCache,
[data.itemId]: { ...cached, [data.field]: current + data.delta },
},
};
}
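case 'block_discard':
// Composer failed mid-stream: drop partial blocks and cached payloads before fallback content arrives.
return { ...prev, blocks: [], payloadCache: {} };
case 'complete':
case 'error':
// The stream ends after either chunk, so stop showing the cursor.
return { ...prev, isStreaming: false };
default:
return prev;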
}
});
}
}
}, [apiKey, threadId]);
return { turn, sendMessage };
}
Rendering
function AssistantTurnView({ turn }: { turn: AssistantTurn }) {
return (
<div className="message">
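{turn.blocks.map((block, index) => {
if (block.kind === 'text') {
// Show the cursor only on the last block, where new text is being appended.
const isLast = index === turn.blocks.length - 1;
return (
<p key={block.id}>
{block.text}
{turn.isStreaming && isLast && <span className="cursor">▋</span>}
</p>
);
}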
return (
<div key={block.id} className={`block-${block.kind}`}>
{block.ids.map((id) => (
<Card key={id} payload={turn.payloadCache[id]} />
))}
</div>
);
})}
</div>
);
}
Server-Side Proxy
Proxy streaming requests through your backend so the secret key stays server-side:
Next.js Route Handler
// app/api/chat/route.ts
export async function POST(request: Request) {
const { message, threadId } = await request.json();
const upstream = await fetch('https://api.savanto.ai/chat', {
method: 'POST',
headers: {
Authorization: `Bearer ${process.env.SAVANTO_SECRET_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({ message, threadId, stream: true }),
});
return new Response(upstream.body, {
// Forward the upstream status so the client can detect failed requests.
status: upstream.status,
headers: {
'Content-Type': 'application/x-ndjson',
'Cache-Control': 'no-cache',
},
});
}
Custom domain blocks
Workspace-defined custom domains (orders, account, inventory, real-estate, etc.) emit custom-typed blocks. The block_start carries a renderHint so your client knows how to display the data:
| Render hint | Default UX |
|---|---|
| card | Standard card per item (orders, listings, accounts). |
| inline | Inline list under prose (lightweight rows). |
| prose-only | No visual block; the payload is consumed by the composer text only. |
| gallery | Image-forward grid (real estate, recipes, products). |
Custom payload is workspace-defined JSON. Fields listed in CustomDomainConfig.composerStreamingFields will animate via item_delta as the curator generates them.
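A client could map the render hint to a layout with a small lookup like the one below. This is a sketch under assumptions: `layoutFor` and the `Layout` names are invented for this example, and falling back to cards when the hint is missing or unrecognized is this sketch's choice, since no default is documented here.

```typescript
type Layout = "cards" | "rows" | "grid" | "none";

// Maps a block_start renderHint to a client-side layout name.
function layoutFor(renderHint: string | undefined): Layout {
  switch (renderHint) {
    case "inline":
      return "rows"; // lightweight rows under the prose
    case "gallery":
      return "grid"; // image-forward grid
    case "prose-only":
      return "none"; // nothing to draw; the composer text carries the data
    case "card":
    default:
      return "cards"; // assumed fallback for missing or unknown hints
  }
}

console.log(layoutFor("gallery"));
console.log(layoutFor(undefined));
```

Returning `none` for `prose-only` lets the renderer skip the block entirely while still keeping its payloads in the cache.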
Styling the Streaming Cursor
.cursor {
display: inline-block;
animation: blink 1s infinite;
margin-left: 2px;
}
@keyframes blink {
0%, 50% { opacity: 1; }
51%, 100% { opacity: 0; }
}
Best Practices
- Show a typing indicator while waiting for the first chunk.
- Render progress.message during the initial stages; sub-second feedback is critical.
- Cache payloads by item id. block_data arrives before its block_start; cache lookups make rendering trivial.
- Apply item_delta in place; never re-fetch a payload.
- Implement a stop button for long responses.
- Auto-scroll as content appears.
- Handle reconnection for network interruptions.
- Cache thread context to resume after errors.
- Rate-limit sends on the client to prevent abuse.
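The stop button from the list above can be built on the standard `AbortController`. The sketch below assumes the request goes through the `/api/chat` proxy route shown earlier; `createStopHandle` and `streamWithStop` are hypothetical names. Aborting closes the connection on the client side only; whether the server also stops generating is not covered here.

```typescript
// Pairs an AbortSignal with a stop() callback suitable for a button handler.
function createStopHandle(): { signal: AbortSignal; stop: () => void } {
  const controller = new AbortController();
  return { signal: controller.signal, stop: () => controller.abort() };
}

// Streams NDJSON lines until the server finishes or the user stops the request.
async function streamWithStop(
  message: string,
  threadId: string,
  signal: AbortSignal,
  onLine: (line: string) => void,
): Promise<"finished" | "stopped"> {
  try {
    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message, threadId }),
      signal, // aborting rejects the fetch and any pending read
    });
    if (!res.ok || !res.body) throw new Error(`Request failed with status ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? "";
      lines.filter((l) => l.trim() !== "").forEach(onLine);
    }
    return "finished";
  } catch (err) {
    if (signal.aborted) return "stopped"; // user pressed stop; not a real failure
    throw err;
  }
}

const stopHandle = createStopHandle();
// In a UI: stopButton.onclick = stopHandle.stop;
// then: await streamWithStop("Tell me about your products", "session-123", stopHandle.signal, console.log);
```

Create a fresh handle for each message, because a signal that has been aborted once stays aborted.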
Next Steps
- AI Chat: Chat customization and thread management
- Webhooks: Per-turn server-side notifications
- API Reference: Full /chat endpoint schema