Streaming Chat

The /chat endpoint streams responses as NDJSON (newline-delimited JSON). Each line is a self-contained JSON object (a chunk) with a `type` field that identifies it and a `data` field that carries its payload.

Savanto's chat pipeline is multi-agent: a triage agent classifies intent, domain curators (product, post, custom) run in parallel, and a response composer weaves their output into a single cohesive, multi-block answer. The streaming protocol mirrors that architecture: instead of "one bubble per agent", you get one ordered stream of typed blocks (text, product cards, post cards, custom cards) that the client renders inline while the model is still writing.

Why Streaming?

Regular Chat	Streaming Chat
Wait for the complete response	See text and cards appear in real-time
3–10 second perceived delay	Sub-second time-to-first-token
Single payload	Block-by-block delivery

Basic Request

curl -N -X POST https://api.savanto.ai/chat \
  -H "Authorization: Bearer if_sk_xxx" \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me about your products", "threadId": "session-123", "stream": true}'

The response is a stream of JSON objects, one per line:

{"type":"progress","data":{"message":"Analyzing your question...","stage":"understand"}}
{"type":"block_start","data":{"blockId":"b1","type":"text","domain":null}}
{"type":"block_delta","data":{"blockId":"b1","content":"Here are some "}}
{"type":"block_delta","data":{"blockId":"b1","content":"options I think you'll like."}}
{"type":"block_end","data":{"blockId":"b1"}}
{"type":"block_data","data":{"itemId":"prod-42","payload":{"id":"prod-42","name":"Wireless Headphones","price":199.99,"image":"..."}}}
{"type":"block_start","data":{"blockId":"b2","type":"products","domain":"product","ids":["prod-42"]}}
{"type":"block_end","data":{"blockId":"b2"}}
{"type":"item_delta","data":{"itemId":"prod-42","field":"reason","delta":"Great noise cancellation."}}
{"type":"prompts","data":[{"id":"p1","label":"Show me cheaper options"}]}
{"type":"complete","data":{"finished":true}}
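Because chunks are delimited only by newlines, a client has to buffer bytes until it sees a complete line: a network read can end in the middle of a JSON object, or even in the middle of a multi-byte character. A minimal sketch of that buffering as a reusable async generator, assuming a runtime with the WHATWG `ReadableStream` and `TextDecoder` globals (modern browsers, Node 18+); `readNdjson` is a hypothetical helper, not part of `@savantoai/ai-sdk`:

```typescript
// Hypothetical helper: yields one parsed object per non-empty NDJSON line.
export async function* readNdjson<T = unknown>(
  stream: ReadableStream<Uint8Array>,
): AsyncGenerator<T> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps a split multi-byte character for the next read.
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim()) yield JSON.parse(line) as T;
      }
    }
    // Flush the decoder and emit a final line that had no trailing newline.
    buffer += decoder.decode();
    if (buffer.trim()) yield JSON.parse(buffer) as T;
  } finally {
    reader.releaseLock();
  }
}
```

With `fetch`, you would iterate `readNdjson(res.body!)` with `for await` and switch on each chunk's `type`.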

Chunk Types

The protocol has two layers: content chunks carry the actual answer, and flow-control chunks carry metadata, progress, and lifecycle events.

Content (block protocol)

Type	Data	Description
`block_start`	`{ blockId, type, domain?, ids?, renderHint? }`	Begins a block. `type` is `text`, `products`, `posts`, or `custom`.
`block_delta`	`{ blockId, content }`	Appends prose to an open `text` block as the composer streams it. A text block that follows earlier prose in the same message starts with a `\n\n` paragraph-separator delta: clients that keep text per block should expect a leading blank line on every block after the first, while clients that accumulate a single running string get correct paragraph breaks automatically.
`block_data`	`{ itemId, payload }`	Full structured payload for an item (product, post, or custom card), keyed by id. Sent as soon as the item's identifying fields have been parsed from the curator's stream.
`item_delta`	`{ itemId, field, delta }`	Appends a delta to a named field on a previously-emitted item (e.g. `summary` for posts, `reason` for products, any field listed in `composerStreamingFields` for custom domains). The widget mutates `payloadCache[itemId][field]` and re-renders the card.
`block_end`	`{ blockId }`	Closes the block identified by `blockId`.
`block_discard`	`{ reason? }`	Drops all partial block content for this turn. Emitted when the composer fails mid-stream, so that fallback prose, streamed under fresh block ids with an `f` prefix, is not concatenated onto the partial composer text. When you receive it, reset the message's text content, card arrays, and any per-turn payload caches.
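Taken together, the content chunks form a small state machine over an ordered list of blocks plus a payload cache. A framework-agnostic sketch of it as a pure reducer; the type and function names are illustrative, not exports of `@savantoai/ai-sdk`, and it assumes fields updated by `item_delta` hold strings:

```typescript
// Content-chunk shapes as listed in the table (illustrative type names).
type ContentChunk =
  | { type: 'block_start'; data: { blockId: string; type: 'text' | 'products' | 'posts' | 'custom'; domain?: string | null; ids?: string[]; renderHint?: string } }
  | { type: 'block_delta'; data: { blockId: string; content: string } }
  | { type: 'block_data'; data: { itemId: string; payload: Record<string, unknown> } }
  | { type: 'item_delta'; data: { itemId: string; field: string; delta: string } }
  | { type: 'block_end'; data: { blockId: string } }
  | { type: 'block_discard'; data: { reason?: string } };

type Block =
  | { id: string; kind: 'text'; text: string; open: boolean }
  | { id: string; kind: 'products' | 'posts' | 'custom'; ids: string[]; open: boolean };

interface TurnState {
  blocks: Block[];
  payloadCache: Record<string, Record<string, unknown>>;
}

// Returns a new state for each content chunk; never mutates the old one.
function applyContentChunk(state: TurnState, chunk: ContentChunk): TurnState {
  switch (chunk.type) {
    case 'block_start': {
      const { blockId, type, ids } = chunk.data;
      const block: Block =
        type === 'text'
          ? { id: blockId, kind: 'text', text: '', open: true }
          : { id: blockId, kind: type, ids: ids ?? [], open: true };
      return { ...state, blocks: [...state.blocks, block] };
    }
    case 'block_delta':
      return {
        ...state,
        blocks: state.blocks.map((b) =>
          b.id === chunk.data.blockId && b.kind === 'text' ? { ...b, text: b.text + chunk.data.content } : b,
        ),
      };
    case 'block_data':
      // Cached before the block_start that references this item.
      return { ...state, payloadCache: { ...state.payloadCache, [chunk.data.itemId]: chunk.data.payload } };
    case 'item_delta': {
      const { itemId, field, delta } = chunk.data;
      const cached = state.payloadCache[itemId];
      if (!cached) return state; // unknown item: ignore the delta
      const current = typeof cached[field] === 'string' ? (cached[field] as string) : '';
      return { ...state, payloadCache: { ...state.payloadCache, [itemId]: { ...cached, [field]: current + delta } } };
    }
    case 'block_end':
      return { ...state, blocks: state.blocks.map((b) => (b.id === chunk.data.blockId ? { ...b, open: false } : b)) };
    case 'block_discard':
      // Composer failed mid-stream: drop all partial content for this turn.
      return { blocks: [], payloadCache: {} };
  }
}
```

Applying the content chunks from the sample stream under Basic Request in order leaves block `b1` with the text "Here are some options I think you'll like." and `payloadCache['prod-42'].reason` set to "Great noise cancellation."; a following `block_discard` empties both.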

A typical multi-domain answer interleaves text and structured blocks in the order the composer chose:

text block "Here are jackets that match your size..."
products block (3 items)
text block "And these articles cover sizing tips:"
posts block (2 items)

block_data arrives before the corresponding block_start for structured blocks, so the widget can resolve ids from a payload cache.
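With the cache filled first, rendering a structured block reduces to looking up its ids. A defensive sketch (the `resolveItems` name is hypothetical) that reports any id without a cached payload instead of rendering an empty card, rather than assuming the ordering always holds:

```typescript
// Hypothetical helper: resolve a structured block's ids against the payload cache.
function resolveItems(
  ids: string[],
  payloadCache: Record<string, Record<string, unknown>>,
): { items: Record<string, unknown>[]; missing: string[] } {
  const items: Record<string, unknown>[] = [];
  const missing: string[] = [];
  for (const id of ids) {
    const payload = payloadCache[id];
    if (payload) items.push(payload); // keeps the order the composer chose
    else missing.push(id); // e.g. render a placeholder for these
  }
  return { items, missing };
}
```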

Flow control

Type	Data	Description
`progress`	`{ message, stage }`	Lifecycle stage update (typed enum). See Progress stages below.
`prompts`	`ChatPrompt[]`	Suggested follow-up prompts.
`metadata`	`{ lastAgent, turnCount, finished }`	Per-turn metadata; emitted near end of stream.
`complete`	`{ finished }`	Stream finished.
`error`	`{ error }`	Error message. The stream ends after an error chunk.
`domain_offer`	Offer object	Live agent / contact-form offer (action buttons).
`domain_offer_success`	Result object	The offer was accepted.
`agent_status`	Presence info	Live agent typing / online status.
`handoff_ended`	`{}`	Live agent session ended.
`analytics`	`{ ... }`	Optional analytics breadcrumbs.
`usage`	`{ ... }`	Optional per-call usage / cost data.
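Most flow-control handling boils down to a small status machine: show the latest progress message, and stop streaming on `complete` or `error`. A sketch assuming the payload shapes in the table; `FlowChunk`, `StreamStatus`, and `nextStatus` are illustrative names, the `prompts` item shape is taken from the sample stream, and payloads the table leaves open are typed `unknown`:

```typescript
// Flow-control chunk shapes (illustrative; open-ended payloads are unknown).
type FlowChunk =
  | { type: 'progress'; data: { message: string; stage: string } }
  | { type: 'prompts'; data: { id: string; label: string }[] }
  | { type: 'metadata'; data: { lastAgent: string; turnCount: number; finished: boolean } }
  | { type: 'complete'; data: { finished: boolean } }
  | { type: 'error'; data: { error: string } }
  | { type: 'domain_offer' | 'domain_offer_success' | 'agent_status' | 'handoff_ended' | 'analytics' | 'usage'; data: unknown };

type StreamStatus =
  | { state: 'streaming'; progress?: string }
  | { state: 'done' }
  | { state: 'error'; message: string };

function nextStatus(status: StreamStatus, chunk: FlowChunk): StreamStatus {
  if (status.state !== 'streaming') return status; // terminal states stay put
  switch (chunk.type) {
    case 'progress':
      return { state: 'streaming', progress: chunk.data.message };
    case 'error':
      return { state: 'error', message: chunk.data.error }; // stream ends after this
    case 'complete':
      return { state: 'done' };
    default:
      return status;
  }
}
```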

Progress stages

progress.stage is a typed enum so clients can render stage-specific affordances (e.g. dim or replace prior progress messages):

Stage	When
`understand`	Triage / classification phase.
`search`	Knowledge / product retrieval is running.
`lookup`	Generic data lookup.
`compose`	Domain curators returned; composer is weaving the answer.
`connect`	Routing to a live agent.
`tool:<name>`	A custom-domain tool is running (e.g. `tool:get_orders`). The `<name>` is the tool's function name. Per-tool messages can be customized via `CustomDomainConfig.toolProgressMessages` (object keyed by tool name → message); unmapped tools fall back to an auto-derived `Calling <human readable name>...` message.

Two progress messages can be customized per workspace for brand consistency, both in WorkspaceSettings: triageProgressMessage, shown during the `understand` stage (default: "Understanding your question..."), and composerProgressMessage, shown during the `compose` stage (default: "Putting it all together...").
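Because `tool:<name>` stages carry a variable suffix, it can help to parse `progress.stage` before switching on it. A sketch with hypothetical names; treating unrecognized stage strings as `unknown` rather than failing is an assumption about how a client might stay tolerant of stages it does not know:

```typescript
type KnownStage = 'understand' | 'search' | 'lookup' | 'compose' | 'connect';
type ParsedStage =
  | { kind: KnownStage }
  | { kind: 'tool'; toolName: string }
  | { kind: 'unknown'; raw: string };

const KNOWN_STAGES: readonly string[] = ['understand', 'search', 'lookup', 'compose', 'connect'];

function parseStage(stage: string): ParsedStage {
  // `tool:get_orders` -> { kind: 'tool', toolName: 'get_orders' }
  if (stage.startsWith('tool:')) return { kind: 'tool', toolName: stage.slice('tool:'.length) };
  if (KNOWN_STAGES.includes(stage)) return { kind: stage as KnownStage };
  return { kind: 'unknown', raw: stage }; // tolerate stages this client doesn't know
}
```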

JavaScript Implementation

Reading the Stream

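import { createClient, chat } from '@savantoai/ai-sdk';

const client = createClient({
  baseUrl: 'https://api.savanto.ai',
  auth: process.env.SAVANTO_SECRET_KEY!,
});

const { response } = await chat({
  client,
  body: { message: 'Tell me about your products', threadId: 'session-123', stream: true },
  throwOnError: true,
});

// Handle one parsed NDJSON chunk.
function handleChunk(chunk: { type: string; data: any }) {
  switch (chunk.type) {
    case 'block_delta':
      // Prose for the open text block; print it as it arrives.
      process.stdout.write(chunk.data.content);
      break;
    case 'block_data':
      console.log('\nItem:', chunk.data.itemId, chunk.data.payload);
      break;
    case 'error':
      // The stream ends after an error chunk.
      console.error('\nError:', chunk.data.error);
      break;
    case 'complete':
      console.log('\nDone');
      break;
  }
}

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  // The last element may be a partial line; keep it for the next read.
  buffer = lines.pop() ?? '';

  for (const line of lines) {
    if (line.trim()) handleChunk(JSON.parse(line));
  }
}

// Flush the decoder and handle a final line that had no trailing newline.
buffer += decoder.decode();
if (buffer.trim()) handleChunk(JSON.parse(buffer));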

React Hook

The block protocol maps cleanly onto React state — keep a payload cache (itemId → payload) plus an ordered list of blocks, and apply deltas in place.

import { useState, useCallback } from 'react';

type Block =
  | { id: string; kind: 'text'; text: string }
  | { id: string; kind: 'products' | 'posts' | 'custom'; ids: string[] };

interface AssistantTurn {
  blocks: Block[];
  payloadCache: Record<string, Record<string, unknown>>;
  isStreaming: boolean;
  error?: string;
}

const EMPTY_TURN: AssistantTurn = { blocks: [], payloadCache: {}, isStreaming: false };

// Apply one chunk to the turn. Unhandled chunk types leave the turn unchanged.
function reduceChunk(prev: AssistantTurn, type: string, data: any): AssistantTurn {
  switch (type) {
    case 'block_start':
      if (data.type === 'text') {
        return { ...prev, blocks: [...prev.blocks, { id: data.blockId, kind: 'text', text: '' }] };
      }
      return {
        ...prev,
        blocks: [...prev.blocks, { id: data.blockId, kind: data.type, ids: data.ids ?? [] }],
      };

    case 'block_delta':
      return {
        ...prev,
        blocks: prev.blocks.map((b) =>
          b.id === data.blockId && b.kind === 'text' ? { ...b, text: b.text + data.content } : b,
        ),
      };

    case 'block_data':
      return {
        ...prev,
        payloadCache: { ...prev.payloadCache, [data.itemId]: data.payload },
      };

    case 'item_delta': {
      const cached = prev.payloadCache[data.itemId];
      if (!cached) return prev;
      const current = (cached[data.field] as string | undefined) ?? '';
      return {
        ...prev,
        payloadCache: {
          ...prev.payloadCache,
          [data.itemId]: { ...cached, [data.field]: current + data.delta },
        },
      };
    }

    case 'block_discard':
      // The composer failed mid-stream: drop this turn's partial text and cards.
      return { ...prev, blocks: [], payloadCache: {} };

    case 'error':
      return { ...prev, isStreaming: false, error: data.error };

    case 'complete':
      return { ...prev, isStreaming: false };

    default:
      return prev;
  }
}

// Note: sending a secret key from the browser exposes it. In production, point
// this fetch at your own proxy (see Server-Side Proxy) instead of api.savanto.ai.
export function useStreamingChat(apiKey: string, threadId: string) {
  const [turn, setTurn] = useState<AssistantTurn>(EMPTY_TURN);

  const sendMessage = useCallback(async (text: string) => {
    setTurn({ ...EMPTY_TURN, isStreaming: true });

    const res = await fetch('https://api.savanto.ai/chat', {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: text, threadId, stream: true }),
    });
    if (!res.ok || !res.body) {
      setTurn((prev) => ({ ...prev, isStreaming: false, error: `HTTP ${res.status}` }));
      return;
    }

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    const handleLine = (line: string) => {
      if (!line.trim()) return;
      const { type, data } = JSON.parse(line);
      setTurn((prev) => reduceChunk(prev, type, data));
    };

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;

      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // keep a partial last line for the next read
      lines.forEach(handleLine);
    }

    // Handle a final line without a trailing newline, then clear the streaming
    // flag even if the stream closed without a complete or error chunk.
    handleLine(buffer + decoder.decode());
    setTurn((prev) => ({ ...prev, isStreaming: false }));
  }, [apiKey, threadId]);

  return { turn, sendMessage };
}

Rendering

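// `Card` is your own component for rendering one product, post, or custom payload.
function AssistantTurnView({ turn }: { turn: AssistantTurn }) {
  const lastIndex = turn.blocks.length - 1;
  return (
    <div className="message">
      {turn.blocks.map((block, index) => {
        if (block.kind === 'text') {
          return (
            <p key={block.id}>
              {block.text}
              {/* Show the cursor only on the last block, and only while streaming. */}
              {turn.isStreaming && index === lastIndex && <span className="cursor">▋</span>}
            </p>
          );
        }
        return (
          <div key={block.id} className={`block-${block.kind}`}>
            {block.ids.map((id) =>
              // Skip ids whose payload is not in the cache.
              turn.payloadCache[id] ? <Card key={id} payload={turn.payloadCache[id]} /> : null,
            )}
          </div>
        );
      })}
    </div>
  );
}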

Server-Side Proxy

Proxy streaming requests through your backend so the secret key stays server-side:

Next.js Route Handler

// app/api/chat/route.ts
export async function POST(request: Request) {
  const { message, threadId } = await request.json();

  const upstream = await fetch('https://api.savanto.ai/chat', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.SAVANTO_SECRET_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ message, threadId, stream: true }),
  });

  // Pass upstream failures through instead of returning an empty 200 stream.
  if (!upstream.ok || !upstream.body) {
    return new Response(await upstream.text(), { status: upstream.ok ? 502 : upstream.status });
  }

  return new Response(upstream.body, {
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-cache',
    },
  });
}

Custom domain blocks

Workspace-defined custom domains (orders, account, inventory, real-estate, etc.) emit custom-typed blocks. The block_start carries a renderHint so your client knows how to display the data:

Render hint	Default UX
`card`	Standard card per item (orders, listings, accounts).
`inline`	Inline list under prose (lightweight rows).
`prose-only`	No visual block — payload is consumed by the composer text only.
`gallery`	Image-forward grid (real estate, recipes, products).

Custom payload is workspace-defined JSON. Fields listed in CustomDomainConfig.composerStreamingFields will animate via item_delta as the curator generates them.
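A client typically turns `renderHint` into a layout choice with a single switch. A sketch with hypothetical layout names; `renderHint` is optional on `block_start`, and falling back to cards when it is missing or unrecognized is an assumption, not documented behavior:

```typescript
type RenderHint = 'card' | 'inline' | 'prose-only' | 'gallery';
type Layout = 'cards' | 'rows' | 'grid' | 'none';

// Map a custom block's renderHint to a client layout (hypothetical names).
function layoutFor(renderHint: string | undefined): Layout {
  switch (renderHint as RenderHint | undefined) {
    case 'card':
      return 'cards';
    case 'inline':
      return 'rows';
    case 'gallery':
      return 'grid';
    case 'prose-only':
      return 'none'; // the payload only feeds the composer's text
    default:
      return 'cards'; // assumption: missing or unknown hint falls back to cards
  }
}
```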

Styling the Streaming Cursor

.cursor {
  display: inline-block;
  animation: blink 1s infinite;
  margin-left: 2px;
}

@keyframes blink {
  0%, 50% { opacity: 1; }
  51%, 100% { opacity: 0; }
}

Best Practices

Show a typing indicator while waiting for the first chunk.
Render progress.message during the initial stages — sub-second feedback is critical.
Cache payloads by item id. block_data arrives before its block_start; cache lookups make rendering trivial.
Apply item_delta in place — never re-fetch a payload.
Implement a stop button for long responses.
Auto-scroll as content appears.
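Handle network interruptions: if the stream closes without a complete or error chunk, treat the turn as incomplete and offer a retry.
Keep thread context (at least the threadId) on the client so you can resume the same thread after an error.
Throttle sends on the client (for example, disable sending while a response is streaming) to avoid duplicate requests. Client-side limits don't stop abuse, so enforce real rate limits on your server or proxy.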
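One way to implement the stop button and also refuse overlapping sends is to keep a single `AbortController` per request. The sketch rests on assumptions: `createChatStream` and its return shape are hypothetical, `url` would usually be your own proxy route (such as the Next.js handler above), `fetchImpl` is injectable only so the loop can be exercised without a network, and lines are handed to `onLine` unparsed.

```typescript
// Hypothetical helper: one in-flight chat request at a time, with stop().
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

export function createChatStream(url: string, fetchImpl: FetchLike = fetch) {
  let controller: AbortController | null = null;

  async function send(body: unknown, onLine: (line: string) => void): Promise<'done' | 'stopped'> {
    // Refuse overlapping sends so a double click can't start a second request.
    if (controller) throw new Error('A response is already streaming');
    const ctrl = new AbortController();
    controller = ctrl;
    try {
      const res = await fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        signal: ctrl.signal,
      });
      if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
      const reader = res.body.getReader();
      // Cancelling the reader ends the loop even if fetchImpl ignores the signal.
      const cancel = () => { reader.cancel().catch(() => {}); };
      if (ctrl.signal.aborted) cancel();
      else ctrl.signal.addEventListener('abort', cancel, { once: true });

      const decoder = new TextDecoder();
      let buffer = '';
      while (!ctrl.signal.aborted) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? '';
        for (const line of lines) {
          if (ctrl.signal.aborted) break; // deliver nothing after stop()
          if (line.trim()) onLine(line);
        }
      }
      buffer += decoder.decode();
      if (!ctrl.signal.aborted && buffer.trim()) onLine(buffer);
      return ctrl.signal.aborted ? 'stopped' : 'done';
    } catch (err) {
      // A real fetch (or its pending read) rejects with an AbortError after stop().
      if (ctrl.signal.aborted) return 'stopped';
      throw err;
    } finally {
      controller = null;
    }
  }

  return {
    send,
    stop: () => controller?.abort(),
    isStreaming: () => controller !== null,
  };
}
```

A stop button would call `stop()`; `send` then resolves to `'stopped'` instead of throwing, so the UI can keep whatever text already arrived.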

Next Steps

AI Chat — Chat customization and thread management
Webhooks — Per-turn server-side notifications
API Reference — Full /chat endpoint schema