Streaming Chat

The /chat endpoint streams responses as NDJSON (newline-delimited JSON). Each line is a self-contained JSON object, called a chunk, that carries either part of the assistant's answer or a flow-control event.

Savanto's chat pipeline is multi-agent: a triage agent classifies intent, domain curators (product, post, custom) run in parallel, and a response composer weaves their output into a single cohesive multi-block answer. The streaming protocol mirrors that architecture. Instead of "one bubble per agent", you get one ordered stream of typed blocks (text, product cards, post cards, custom cards) that the client renders inline while the model is still writing.

Why Streaming?

| Regular Chat | Streaming Chat |
| --- | --- |
| Wait for the complete response | See text and cards appear in real time |
| 3–10 second perceived delay | Sub-second time-to-first-token |
| Single payload | Block-by-block delivery |

Basic Request

curl -N -X POST https://api.savanto.ai/chat \
  -H "Authorization: Bearer if_sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me about your products", "threadId": "session-123", "stream": true}'

The response is a stream of JSON objects, one per line:

{"type":"progress","data":{"message":"Analyzing your question...","stage":"understand"}}
{"type":"block_start","data":{"blockId":"b1","type":"text","domain":null}}
{"type":"block_delta","data":{"blockId":"b1","content":"Here are some "}}
{"type":"block_delta","data":{"blockId":"b1","content":"options I think you'll like."}}
{"type":"block_end","data":{"blockId":"b1"}}
{"type":"block_data","data":{"itemId":"prod-42","payload":{"id":"prod-42","name":"Wireless Headphones","price":199.99,"image":"..."}}}
{"type":"block_start","data":{"blockId":"b2","type":"products","domain":"product","ids":["prod-42"]}}
{"type":"block_end","data":{"blockId":"b2"}}
{"type":"item_delta","data":{"itemId":"prod-42","field":"reason","delta":"Great noise cancellation."}}
{"type":"prompts","data":[{"id":"p1","label":"Show me cheaper options"}]}
{"type":"complete","data":{"finished":true}}

Chunk Types

The protocol has two layers: content chunks carry the actual answer, and flow-control chunks carry metadata, progress, and lifecycle events.

Content (block protocol)

| Type | Data | Description |
| --- | --- | --- |
| block_start | { blockId, type, domain?, ids?, renderHint? } | Begins a block. type is text, products, posts, or custom. |
| block_delta | { blockId, content } | Appends prose to an open text block (composer streaming). |
| block_data | { itemId, payload } | Full structured payload for an item (product, post, custom card). Keyed by id. Sent as soon as identifying fields parse from the curator's stream. |
| item_delta | { itemId, field, delta } | Appends a delta to a named field on a previously-emitted item (e.g. summary for posts, reason for products, any field listed in composerStreamingFields for custom domains). The widget mutates payloadCache[itemId][field] and re-renders the card. |
| block_end | { blockId } | Closes the current block. |
| block_discard | { reason? } | Drop all partial block content for this turn. Emitted when the composer fails mid-stream so fallback prose under a fresh f-prefixed block id namespace doesn't concatenate with partial composer text. Reset the message's text content, card arrays, and any per-turn payload caches when you see this. |

A typical multi-domain answer interleaves text and structured blocks in the order the composer chose:

text block "Here are jackets that match your size..."
products block (3 items)
text block "And these articles cover sizing tips:"
posts block (2 items)

block_data arrives before the corresponding block_start for structured blocks, so the widget can resolve ids from a payload cache.
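The ordering rules above can be sketched as a small state updater. This is a minimal illustration, not the widget's actual implementation: the chunk field names follow the table above, while TurnState, RenderedBlock, and applyContentChunk are hypothetical names introduced for this example.

```typescript
// Content chunk shapes, following the field names in the table above.
type ContentChunk =
  | { type: 'block_start'; data: { blockId: string; type: 'text' | 'products' | 'posts' | 'custom'; domain?: string | null; ids?: string[]; renderHint?: string } }
  | { type: 'block_delta'; data: { blockId: string; content: string } }
  | { type: 'block_data'; data: { itemId: string; payload: Record<string, unknown> } }
  | { type: 'item_delta'; data: { itemId: string; field: string; delta: string } }
  | { type: 'block_end'; data: { blockId: string } }
  | { type: 'block_discard'; data: { reason?: string } };

// Hypothetical client-side state for one assistant turn.
type RenderedBlock =
  | { id: string; kind: 'text'; text: string }
  | { id: string; kind: 'products' | 'posts' | 'custom'; ids: string[] };

interface TurnState {
  blocks: RenderedBlock[];
  payloadCache: Record<string, Record<string, unknown>>;
}

function applyContentChunk(state: TurnState, chunk: ContentChunk): void {
  switch (chunk.type) {
    case 'block_start': {
      const { blockId, type, ids } = chunk.data;
      if (type === 'text') {
        state.blocks.push({ id: blockId, kind: 'text', text: '' });
      } else {
        // Structured blocks only list ids; payloads were cached by earlier block_data chunks.
        state.blocks.push({ id: blockId, kind: type, ids: ids ?? [] });
      }
      break;
    }
    case 'block_delta': {
      const { blockId, content } = chunk.data;
      const block = state.blocks.find((b) => b.id === blockId);
      if (block && block.kind === 'text') block.text += content;
      break;
    }
    case 'block_data': {
      const { itemId, payload } = chunk.data;
      // Store a shallow copy so later item_delta updates do not touch the parsed chunk.
      state.payloadCache[itemId] = { ...payload };
      break;
    }
    case 'item_delta': {
      const { itemId, field, delta } = chunk.data;
      const payload = state.payloadCache[itemId];
      if (!payload) break; // Assumption: ignore deltas for items that were never cached.
      const previous = payload[field];
      payload[field] = (typeof previous === 'string' ? previous : '') + delta;
      break;
    }
    case 'block_end':
      break; // Nothing to update in this sketch.
    case 'block_discard':
      // Drop partial text, card lists, and cached payloads for this turn.
      state.blocks = [];
      state.payloadCache = {};
      break;
  }
}
```

Rendering then becomes a lookup: for each structured block, map its ids through payloadCache. Mutating the state in place keeps the sketch short; a React client would return new objects instead.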

Flow control

| Type | Data | Description |
| --- | --- | --- |
| progress | { message, stage } | Lifecycle stage update (typed enum). See Progress stages below. |
| prompts | ChatPrompt[] | Suggested follow-up prompts. |
| metadata | { lastAgent, turnCount, finished } | Per-turn metadata; emitted near end of stream. |
| complete | { finished } | Stream finished. |
| error | { error } | Error message. The stream ends after an error chunk. |
| domain_offer | Offer object | Live agent / contact-form offer (action buttons). |
| domain_offer_success | Result object | The offer was accepted. |
| agent_status | Presence info | Live agent typing / online status. |
| handoff_ended | {} | Live agent session ended. |
| analytics | { ... } | Optional analytics breadcrumbs. |
| usage | { ... } | Optional per-call usage / cost data. |
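One way a client might route chunks between the two layers is to classify each chunk by its type string. The sketch below uses the type names from the two tables; layerOf and endsStream are hypothetical helpers, and treating unrecognized types as 'unknown' (and skipping them) is an assumption about forward compatibility rather than documented behavior.

```typescript
// Type names copied from the content and flow-control tables.
const CONTENT_TYPES = [
  'block_start', 'block_delta', 'block_data', 'item_delta', 'block_end', 'block_discard',
] as const;

const FLOW_TYPES = [
  'progress', 'prompts', 'metadata', 'complete', 'error', 'domain_offer',
  'domain_offer_success', 'agent_status', 'handoff_ended', 'analytics', 'usage',
] as const;

type Layer = 'content' | 'flow' | 'unknown';

// Decide which layer a chunk belongs to from its type string.
function layerOf(type: string): Layer {
  if ((CONTENT_TYPES as readonly string[]).indexOf(type) !== -1) return 'content';
  if ((FLOW_TYPES as readonly string[]).indexOf(type) !== -1) return 'flow';
  return 'unknown'; // Assumption: skip types this client does not recognize.
}

// Per the table, the stream is over after a complete or an error chunk.
function endsStream(type: string): boolean {
  return type === 'complete' || type === 'error';
}
```

A reader loop can call endsStream after handling each chunk to decide when to stop showing the streaming cursor.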

Progress stages

progress.stage is a typed enum so clients can render stage-specific affordances (e.g. dim or replace prior progress messages):

| Stage | When |
| --- | --- |
| understand | Triage / classification phase. |
| search | Knowledge / product retrieval is running. |
| lookup | Generic data lookup. |
| compose | Domain curators returned; composer is weaving the answer. |
| connect | Routing to a live agent. |
| tool:<name> | A custom-domain tool is running (e.g. tool:get_orders). The <name> is the tool's function name. Per-tool messages can be customized via CustomDomainConfig.toolProgressMessages (object keyed by tool name → message); unmapped tools fall back to an auto-derived Calling <human readable name>... message. |

Two of the messages are workspace-customizable for branding consistency: triageProgressMessage (default: "Understanding your question...") and composerProgressMessage (default: "Putting it all together..."). Both live in WorkspaceSettings.
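Because tool:<name> stages carry a variable suffix, a client that wants stage-specific UI has to split the stage string before switching on it. The following is a minimal sketch; ParsedStage and parseStage are hypothetical names, and the 'unknown' fallback is an assumption for stages this page does not list.

```typescript
const BUILTIN_STAGES = ['understand', 'search', 'lookup', 'compose', 'connect'] as const;
type BuiltinStage = (typeof BUILTIN_STAGES)[number];

type ParsedStage =
  | { kind: 'builtin'; stage: BuiltinStage }
  | { kind: 'tool'; name: string }
  | { kind: 'unknown'; raw: string };

const TOOL_PREFIX = 'tool:';

function parseStage(raw: string): ParsedStage {
  // tool:<name> stages: everything after the prefix is the tool's function name.
  if (raw.slice(0, TOOL_PREFIX.length) === TOOL_PREFIX && raw.length > TOOL_PREFIX.length) {
    return { kind: 'tool', name: raw.slice(TOOL_PREFIX.length) };
  }
  const index = (BUILTIN_STAGES as readonly string[]).indexOf(raw);
  if (index !== -1) {
    return { kind: 'builtin', stage: BUILTIN_STAGES[index] };
  }
  // Assumption: keep the raw value so the UI can still show progress.message.
  return { kind: 'unknown', raw };
}
```

Whatever the parsed stage, progress.message is still the text to display; the parsed form only selects the affordance, such as an icon or whether to replace the previous progress line.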

JavaScript Implementation

Reading the Stream

import { createClient, chat } from '@savantoai/ai-sdk';

const client = createClient({
  baseUrl: 'https://api.savanto.ai',
  auth: process.env.SAVANTO_SECRET_KEY!,
});

const { response } = await chat({
  client,
  body: { message: 'Tell me about your products', threadId: 'session-123', stream: true },
  throwOnError: true,
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

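// Parse one NDJSON line and react to the chunk types this example cares about.
function handleLine(line: string) {
  if (!line.trim()) return;
  const chunk = JSON.parse(line);

  switch (chunk.type) {
    case 'block_delta':
      process.stdout.write(chunk.data.content);
      break;
    case 'block_data':
      console.log('\nItem:', chunk.data.itemId, chunk.data.payload);
      break;
    case 'error':
      // The stream ends after an error chunk.
      console.error('\nError:', chunk.data.error);
      break;
    case 'complete':
      console.log('\nDone');
      break;
  }
}

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network read can end mid-line, so keep the trailing partial line in the buffer.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? '';
  lines.forEach(handleLine);
}

// Flush the decoder and handle a final line that arrived without a trailing newline.
handleLine(buffer + decoder.decode());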
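The buffering step in the loop above is the part that is easiest to get wrong, because a network read can stop in the middle of a JSON line. As a sketch, that logic can be isolated in a small class that is independent of fetch and easy to unit test. NdjsonBuffer is a hypothetical helper, not part of @savantoai/ai-sdk.

```typescript
// Accumulates decoded text and returns one parsed object per complete line.
class NdjsonBuffer {
  private pending = '';

  // Feed decoded text; returns the chunks whose lines are now complete.
  push(text: string): unknown[] {
    this.pending += text;
    const lines = this.pending.split('\n');
    // The last element is either '' (text ended on a newline) or a partial line.
    this.pending = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as unknown);
  }

  // Call once the stream is done to parse a final line that had no trailing newline.
  flush(): unknown[] {
    const rest = this.pending.trim();
    this.pending = '';
    return rest === '' ? [] : [JSON.parse(rest) as unknown];
  }
}
```

In the read loop, pass decoder.decode(value, { stream: true }) to push, and call flush after the reader reports done.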

React Hook

The block protocol maps cleanly onto React state — keep a payload cache (itemId → payload) plus an ordered list of blocks, and apply deltas in place.

import { useState, useCallback } from 'react';

type Block =
  | { id: string; kind: 'text'; text: string }
  | { id: string; kind: 'products' | 'posts' | 'custom'; ids: string[] };

interface AssistantTurn {
  blocks: Block[];
  payloadCache: Record<string, Record<string, unknown>>;
  isStreaming: boolean;
}

export function useStreamingChat(apiKey: string, threadId: string) {
  const [turn, setTurn] = useState<AssistantTurn>({ blocks: [], payloadCache: {}, isStreaming: false });

  const sendMessage = useCallback(async (text: string) => {
    setTurn({ blocks: [], payloadCache: {}, isStreaming: true });

    const res = await fetch('https://api.savanto.ai/chat', {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: text, threadId, stream: true }),
    });

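    // Bail out on HTTP errors instead of trying to parse the error body as NDJSON.
    if (!res.ok || !res.body) {
      setTurn((prev) => ({ ...prev, isStreaming: false }));
      throw new Error(`Chat request failed with status ${res.status}`);
    }

    const reader = res.body.getReader();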
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.trim()) continue;
        const { type, data } = JSON.parse(line);

        setTurn((prev) => {
          switch (type) {
            case 'block_start':
              if (data.type === 'text') {
                return { ...prev, blocks: [...prev.blocks, { id: data.blockId, kind: 'text', text: '' }] };
              }
              return {
                ...prev,
                blocks: [...prev.blocks, { id: data.blockId, kind: data.type, ids: data.ids ?? [] }],
              };

            case 'block_delta':
              return {
                ...prev,
                blocks: prev.blocks.map((b) =>
                  b.id === data.blockId && b.kind === 'text' ? { ...b, text: b.text + data.content } : b,
                ),
              };

            case 'block_data':
              return {
                ...prev,
                payloadCache: { ...prev.payloadCache, [data.itemId]: data.payload },
              };

            case 'item_delta': {
              const cached = prev.payloadCache[data.itemId];
              if (!cached) return prev;
              const current = (cached[data.field] as string | undefined) ?? '';
              return {
                ...prev,
                payloadCache: {
                  ...prev.payloadCache,
                  [data.itemId]: { ...cached, [data.field]: current + data.delta },
                },
              };
            }

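            case 'block_discard':
              // The composer failed mid-stream: drop partial blocks and cached payloads.
              return { ...prev, blocks: [], payloadCache: {} };

            case 'complete':
            case 'error':
              // The stream ends after either chunk.
              return { ...prev, isStreaming: false };

            default:
              return prev;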
          }
        });
      }
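    }

    // The stream can close without a complete chunk (for example, on a dropped connection).
    setTurn((prev) => (prev.isStreaming ? { ...prev, isStreaming: false } : prev));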
  }, [apiKey, threadId]);

  return { turn, sendMessage };
}

Rendering

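// Card is your own component; it receives the cached payload for one item.
function AssistantTurnView({ turn }: { turn: AssistantTurn }) {
  return (
    <div className="message">
      {turn.blocks.map((block, index) => {
        if (block.kind === 'text') {
          // Show the cursor only on the last block while the stream is open.
          const isLast = index === turn.blocks.length - 1;
          return (
            <p key={block.id}>
              {block.text}
              {turn.isStreaming && isLast && <span className="cursor">▋</span>}
            </p>
          );
        }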
        return (
          <div key={block.id} className={`block-${block.kind}`}>
            {block.ids.map((id) => (
              <Card key={id} payload={turn.payloadCache[id]} />
            ))}
          </div>
        );
      })}
    </div>
  );
}

Server-Side Proxy

Proxy streaming requests through your backend so the secret key stays server-side:

Next.js Route Handler

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { message, threadId } = await request.json();

  const upstream = await fetch('https://api.savanto.ai/chat', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.SAVANTO_SECRET_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ message, threadId, stream: true }),
  });

  // Pass the upstream status through so the client can detect failures.
  return new Response(upstream.body, {
    status: upstream.status,
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-cache',
    },
  });
}
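The handler above forwards whatever the browser sends. A proxy would typically validate the body first, which might look like the sketch below. The rules here are assumptions for illustration, not Savanto requirements: parseChatProxyRequest, ChatProxyRequest, and the MAX_MESSAGE_LENGTH value are all hypothetical.

```typescript
interface ChatProxyRequest {
  message: string;
  threadId: string;
}

// Hypothetical limit; pick a value that suits your product.
const MAX_MESSAGE_LENGTH = 4000;

// Returns a clean request object, or null when the body should be rejected.
function parseChatProxyRequest(body: unknown): ChatProxyRequest | null {
  if (typeof body !== 'object' || body === null) return null;
  const { message, threadId } = body as Record<string, unknown>;

  if (typeof message !== 'string' || message.trim() === '') return null;
  if (message.length > MAX_MESSAGE_LENGTH) return null;
  if (typeof threadId !== 'string' || threadId === '') return null;

  // Only forward the two known fields, so clients cannot inject extra options.
  return { message, threadId };
}
```

In the route handler, a null result would map to a 400 response before any upstream call is made.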

Custom domain blocks

Workspace-defined custom domains (orders, account, inventory, real-estate, etc.) emit custom-typed blocks. The block_start carries a renderHint so your client knows how to display the data:

| Render hint | Default UX |
| --- | --- |
| card | Standard card per item (orders, listings, accounts). |
| inline | Inline list under prose (lightweight rows). |
| prose-only | No visual block; the payload is consumed by the composer text only. |
| gallery | Image-forward grid (real estate, recipes, products). |

Custom payload is workspace-defined JSON. Fields listed in CustomDomainConfig.composerStreamingFields will animate via item_delta as the curator generates them.
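A client can turn the renderHint into a layout decision with a single switch. In the sketch below the hint values come from the table above, while the Layout names and the layoutFor helper are hypothetical. Falling back to a card layout when the hint is missing or unrecognized is an assumption.

```typescript
// Hypothetical layout identifiers for a client's own components.
type Layout = 'card-stack' | 'row-list' | 'image-grid' | 'hidden';

function layoutFor(renderHint: string | undefined): Layout {
  switch (renderHint) {
    case 'inline':
      return 'row-list'; // Lightweight rows under the prose.
    case 'gallery':
      return 'image-grid'; // Image-forward grid.
    case 'prose-only':
      return 'hidden'; // No visual block; the text already covers the data.
    case 'card':
    default:
      // Assumption: treat a missing or unknown hint as the standard card.
      return 'card-stack';
  }
}
```

A renderer would call layoutFor(data.renderHint) when it handles block_start for a custom block and skip rendering entirely when the result is 'hidden'.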

Styling the Streaming Cursor

.cursor {
  display: inline-block;
  animation: blink 1s infinite;
  margin-left: 2px;
}

@keyframes blink {
  0%, 50% { opacity: 1; }
  51%, 100% { opacity: 0; }
}

Best Practices

  1. Show a typing indicator while waiting for the first chunk.
  2. Render progress.message during the initial stages — sub-second feedback is critical.
  3. Cache payloads by item id. block_data arrives before its block_start; cache lookups make rendering trivial.
  4. Apply item_delta in place — never re-fetch a payload.
  5. Implement a stop button for long responses.
  6. Auto-scroll as content appears.
  7. Handle reconnection for network interruptions.
  8. Cache thread context to resume after errors.
  9. Rate-limit sends on the client to prevent abuse.
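For the client-side rate limit in item 9, one simple approach is a minimum interval between sends. This is a sketch under that assumption; createSendGate is a hypothetical helper, and the injectable clock exists only to make the behavior testable.

```typescript
// Returns a function that answers "may I send now?" and records allowed sends.
function createSendGate(minIntervalMs: number, now: () => number = Date.now): () => boolean {
  let lastSent = -Infinity;

  return () => {
    const current = now();
    if (current - lastSent < minIntervalMs) {
      return false; // Too soon after the previous send.
    }
    lastSent = current;
    return true;
  };
}
```

A send handler would check the gate before calling sendMessage and ignore the click, or show a hint, when it returns false. Client-side limits only improve UX; they do not replace server-side enforcement.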

Next Steps

  • AI Chat — Chat customization and thread management
  • Webhooks — Per-turn server-side notifications
  • API Reference — Full /chat endpoint schema