Every client conversation we have in 2026 eventually circles back to the same question: "Can we just put a chatbot on the site?" The honest answer is yes: you can ship one in a weekend if it only needs to answer FAQs. The useful answer is more interesting. A chatbot that actually moves business metrics, one that books meetings, qualifies leads, and deflects support tickets, takes real thought. Here is how we build them on Next.js today, and where the sharp edges still are.
Choosing the model layer
We default to the Vercel AI SDK on the client and server. It abstracts the streaming protocol, gives you useChat out of the box, and lets you swap providers without rewriting your API route. That last point matters more than people think. A year ago we wrote chatbot code that hardcoded one provider's SDK. We have since rewritten that code twice. Treat the model as a swappable dependency.
For the model itself, we pick based on the job:
- Anthropic Claude for anything that handles ambiguous customer input, sensitive content, or long context. Strong instruction following and a measured tone that fits brand voice work.
- OpenAI GPT when you need tight tool calling, structured JSON output, or you are deep in their ecosystem already (Assistants, image generation, voice).
- Open-weights models via a hosted inference provider when you need cost predictability at scale, or you have a regulated workload that benefits from a stricter data path.
Pick one as your primary, set up a fallback for the other, and route based on task type if you have the volume to justify it.
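To make that concrete, here is a minimal sketch of a primary-plus-fallback router using the AI SDK provider packages. The model IDs and task names are placeholders, not recommendations; swap in whatever you actually run.

// lib/model-router.ts -- illustrative sketch, model IDs are placeholders
import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";

export type TaskType = "general" | "structured";

// Route by task type; each entry carries an explicit fallback for provider outages.
export function pickModel(task: TaskType) {
  if (task === "structured") {
    // Tight tool calling and structured JSON output
    return { primary: openai("gpt-4o"), fallback: anthropic("claude-sonnet-4-7") };
  }
  // Ambiguous customer input, long context, brand-voice work
  return { primary: anthropic("claude-sonnet-4-7"), fallback: openai("gpt-4o") };
}

In the route, call streamText with the primary and retry once with the fallback if the call throws before streaming starts.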
A real streaming route
Here is a stripped-down streaming chat route using the AI SDK with Next.js App Router. This is the shape we ship.
// app/api/chat/route.ts
import { streamText, convertToCoreMessages } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";
import { checkRateLimit } from "@/lib/rate-limit";

export const runtime = "edge";
export const maxDuration = 30;

export async function POST(req: Request) {
  const { messages, sessionId } = await req.json();

  const ok = await checkRateLimit(sessionId, { rpm: 12, rph: 80 });
  if (!ok) {
    return new Response("Rate limit exceeded", { status: 429 });
  }

  const result = streamText({
    model: anthropic("claude-sonnet-4-7"),
    system:
      "You are a helpful assistant for an agency website. Answer concisely. " +
      "If asked about pricing or scope, suggest contacting the team.",
    messages: convertToCoreMessages(messages),
    maxSteps: 4,
    tools: {
      bookConsult: {
        description: "Start a consultation booking flow",
        parameters: z.object({}),
        // No execute handler: the call streams to the client, which renders the booking UI.
      },
    },
  });

  return result.toDataStreamResponse();
}
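The checkRateLimit helper above is ours, not part of the SDK. Here is a minimal in-memory sketch of the shape it takes, assuming a single long-lived process; on Edge or serverless, where instances do not share memory, back the same interface with Redis or another shared store.

// lib/rate-limit.ts -- minimal sketch; production would use a shared store
type Limits = { rpm: number; rph: number };

const hits = new Map<string, number[]>(); // key -> request timestamps in ms

export async function checkRateLimit(key: string, limits: Limits): Promise<boolean> {
  const now = Date.now();
  const lastHour = (hits.get(key) ?? []).filter((t) => now - t < 60 * 60 * 1000);
  const lastMinute = lastHour.filter((t) => now - t < 60 * 1000);

  if (lastMinute.length >= limits.rpm || lastHour.length >= limits.rph) return false;

  lastHour.push(now);
  hits.set(key, lastHour);
  return true;
}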
On the client, useChat from @ai-sdk/react handles the streaming protocol for you. Less than fifty lines of UI and you have a working chat surface. The hard part is everything else.
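For reference, a minimal sketch of that surface, assuming the hook's default message and input helpers; the sessionId wiring matches the route above and styling is left out.

// app/components/chat.tsx -- minimal sketch of the client surface
"use client";

import { useChat } from "@ai-sdk/react";

export function Chat({ sessionId }: { sessionId: string }) {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat",
    body: { sessionId }, // forwarded to the route for rate limiting
  });

  return (
    <div>
      {messages.map((m) => (
        <p key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask us anything" />
      </form>
    </div>
  );
}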
Tool calling is where the value lives
A chatbot that only answers questions is a fancier FAQ page. A chatbot that can do things is a product. Tool calling lets the model invoke typed functions you define, then continue the conversation with the result. Common shapes we ship:
- Lookup tools that hit your own search index or CMS to ground responses in real content. This kills most hallucination on factual questions; a sketch of this shape follows the list.
- Action tools like booking a meeting, creating a support ticket, or sending a quote request to your CRM.
- Handoff tools that escalate to a human and stash the transcript so the human starts in context, not from zero.
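As a sketch of the first shape, a lookup tool with an execute handler might look like this; searchSite and its result fields are placeholders for whatever search you already run.

// Illustrative lookup tool; searchSite is a hypothetical helper over your own index
import { tool } from "ai";
import { z } from "zod";
import { searchSite } from "@/lib/search";

export const searchContent = tool({
  description: "Search the site's published content to ground an answer",
  parameters: z.object({
    query: z.string().describe("What the user is asking about"),
  }),
  execute: async ({ query }) => {
    const results = await searchSite(query, { limit: 3 });
    // Return only what the model needs: titles, URLs, and short excerpts.
    return results.map((r) => ({ title: r.title, url: r.url, excerpt: r.excerpt }));
  },
});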
The trap is over-tooling. Every tool you add is a new attack surface and a new failure mode. Start with two or three high value tools and instrument them before you add more.
Cost control and rate limits
LLM costs do not look like SaaS costs. They scale with usage, context length, and how chatty your system prompt is. Three rules we follow on every build:
- Cap tokens per request. Set a hard ceiling on output length and trim history aggressively. Most chats do not need the last twenty turns.
- Per-session and per-IP rate limits. Twelve requests per minute is plenty for a real human; anyone hitting that ceiling is either testing or abusing the endpoint.
- Server-side caching for common questions. If the same five questions account for half your volume, cache the response with a short TTL and serve from the edge.
We also keep a daily spend cap baked into the deploy. If something goes wrong, we want a billing surprise to be small.
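Pulled together, those guardrails look something like this inside the route handler. The numbers are illustrative, and dailySpendOk is a stand-in for whatever spend tracking you run, not a library API.

// Inside the POST handler, before calling the model -- illustrative numbers
const HISTORY_LIMIT = 8;        // keep only the last few turns
const MAX_OUTPUT_TOKENS = 512;  // hard ceiling on response length

if (!(await dailySpendOk())) {
  // Hypothetical helper that compares today's token spend to a daily budget
  return new Response("Chat temporarily unavailable", { status: 503 });
}

const trimmed = messages.slice(-HISTORY_LIMIT);

const result = streamText({
  model: anthropic("claude-sonnet-4-7"),
  messages: convertToCoreMessages(trimmed),
  maxTokens: MAX_OUTPUT_TOKENS,
  // ...system prompt and tools as above
});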
Hosting: Edge vs Node
Vercel Edge runtime is the right default for streaming chat. Low cold starts, the streaming protocol works cleanly, and you can colocate sessions in a region close to the user. Use Node runtime when you need a library that does not work on Edge (heavy crypto, certain database drivers, file system access). For self-hosted Next.js, stick with Node and put a reverse proxy in front that supports chunked transfer.
If your traffic is bursty and unpredictable, serverless wins. If it is steady and high volume, a container on a small VPS will be cheaper. Run the numbers before you commit.
The risks nobody wants to talk about
- Hallucination. Even with grounding, models invent details under pressure. Never let the bot quote prices or commit to delivery dates without a human in the loop.
- Prompt injection. If your bot reads user-supplied content, treats it as instructions, and then takes action, you have a security problem. Treat all user input as untrusted, and never put secrets in the system prompt thinking it is private.
- PII leakage. Logs are the most common leak. Scrub before you store, and decide upfront whether transcripts are training data for future fine-tunes or ephemeral data you throw away.
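A minimal sketch of the scrub-before-store step, assuming plain regex redaction for emails and phone numbers; a real deployment adds names, addresses, and account identifiers to the list.

// lib/scrub.ts -- minimal sketch; the pattern list is intentionally incomplete
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

export function scrubPII(text: string): string {
  return text.replace(EMAIL, "[email]").replace(PHONE, "[phone]");
}

// Apply to every message before it touches logs or storage.
export function scrubTranscript(messages: { role: string; content: string }[]) {
  return messages.map((m) => ({ ...m, content: scrubPII(m.content) }));
}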
What this actually costs to build
A useful internal chatbot, scoped to one knowledge base with two or three tools and a clean handoff, ships in two to three weeks. A customer-facing chatbot that represents your brand, handles edge cases, integrates with your CRM, and survives a real launch is a six- to ten-week project. Anyone quoting you less is dropping in a template or has not shipped one in production.
If you want help scoping or building one, talk to us.