Customer Support

Build an internal support knowledge agent

An agent that answers your support team's questions from your own help docs, macros, and past tickets, with a citation on every answer so reps can verify before they reply. It speeds up handle time and onboarding without putting an unverified bot in front of customers.

7 min read · 2026-06-17 · Human in the loop · Medium-sensitivity data

Ease: 3/5
Impact: 4/5
Risk: 2/5

Tools you'll use

Claude Code, Codex, Claude Cowork

An internal support knowledge agent is an AI assistant that answers your support team's questions from your own material — help center articles, saved replies, policy docs, and resolved tickets — with a citation on every answer so a rep can verify the source in one click before they reply to a customer. It sits behind your team, not in front of your customers. A rep asks a plain-English question — "What's our refund window for annual plans bought through a reseller?" — and gets one grounded answer instead of digging through scattered docs.

This matters because support knowledge is scattered and goes stale. The canonical answer lives in a Confluence page, an old Slack thread, a macro, and the head of a senior rep who is on holiday. The cost is real: McKinsey's "social economy" report found interaction workers spend nearly 20% of the workweek just looking for internal information or tracking down colleagues. Knowledge upkeep lags too — a 2024 Gartner survey of customer service leaders found 61% had backlogs editing knowledge articles, and over a third had no formal process for revising outdated material. The result is wasted minutes per ticket, slow onboarding, and different reps giving different answers to the same question.

Starting internal rather than customer-facing is about risk. A wrong answer to a rep gets caught by the rep; a wrong answer to a customer ships. You get most of the value — faster, more consistent support, shorter onboarding — while a human stays in the loop on every reply. Once the agent has earned trust on internal traffic and you have the accuracy numbers to prove it, you have a measured, defensible path to wider use.

Moriva's take

This clears all three gates. Gate 1 (real work): your team searches for answers on every ticket, every day, so the agent attaches to a workflow that runs constantly. Gate 2 (owned): one of your operators can build and run this with Claude Code or Codex against your own docs — it's your pipeline, your prompts, your data, and you can fix and extend it without us. Gate 3 (measured): citation coverage, handle time, and rep thumbs-up/down give you clean numbers from week one. Start internal with a human in the loop; it's a clear GO.

How do you build an internal support knowledge agent?

1. Pull your real knowledge into one place
Point Claude Code or Codex at your actual sources: exported help center articles, saved replies and macros, policy PDFs, and a sample of resolved tickets. Ask it to write a small ingestion script that pulls each source, strips boilerplate, and tags every chunk with its origin (URL, doc title, last-updated date). Owning this script matters — when a doc changes, you re-run it; you are not waiting on a vendor's re-index.
2. Chunk and index for hybrid retrieval
Have the tool split documents into passages (a common starting point is roughly 500-token chunks with a small overlap so context isn't cut mid-sentence) and build both a keyword index and a semantic (embedding) index. Hybrid search (keyword plus semantic, with the top results re-ranked) often beats either method alone on support questions, because reps mix exact product terms with vague phrasing. Ask for the two indexes and a simple re-rank step that combines their scores.
3. Write a grounding prompt that forces citations
Instruct the agent to answer only from the retrieved passages, quote or cite the source on every claim, and say "I don't have a confident answer" rather than guess when the passages don't cover the question. This single rule (answer from sources or escalate, never invent) is what separates a useful internal tool from a liability. Have Claude Code or Codex bake it into the system prompt and print the source link beside each answer.
4. Wrap it where reps already work
Ask the tool to expose the agent as a simple internal interface — a Slack slash command, a small internal web page, or a sidebar in your helpdesk. The goal is zero context-switching: a rep asks from the ticket they're working and gets a cited answer in place. Keep it read-only at first; the agent suggests, the rep sends.
5. Build an evaluation set before you trust it
Have Claude Cowork or Claude Code assemble 50-100 real questions with known correct answers, weighted toward high-stakes topics (refunds, cancellations, security, pricing). Run the agent against them and score two things: is the answer faithful to the cited source, and does it correctly say "I don't know" when the answer isn't in the docs. This is your accuracy baseline and the gate you re-run after every change.
6. Find and fill the content gaps
The agent will expose where your knowledge base is thin — questions it can't answer because no document covers them. Have Claude Cowork cluster the unanswered and thumbs-down questions into a ranked list of missing or stale articles, so a non-coder on the team can write the fixes. The agent gets better as the docs do, and you now have a feedback loop that improves both.
7. Roll out gradually and watch the signals
Start with a few volunteer reps, then widen. Track the leading warning signs from production: answers with low source relevance but high confidence, missing citations on policy claims, and tickets that get reopened after a rep used an agent answer. Keep a human in the loop on every customer reply until the numbers earn more trust.
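
The re-rank step from step 2 can be sketched as a weighted sum of normalized scores. This is one simple way to combine the two indexes, not a prescribed method: the function names, the `alpha` weight, the chunk IDs, and the example scores are all hypothetical, and the sketch assumes each index returns a mapping of chunk ID to raw score.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so keyword and embedding scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rerank(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    alpha: float = 0.5,  # assumed weight: 1.0 = keyword only, 0.0 = semantic only
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Combine both indexes' scores and return the top_k chunks, best first."""
    kw = min_max(keyword_scores)
    sem = min_max(semantic_scores)
    combined = {
        chunk_id: alpha * kw.get(chunk_id, 0.0) + (1 - alpha) * sem.get(chunk_id, 0.0)
        for chunk_id in set(kw) | set(sem)
    }
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical raw scores from each index for one question.
keyword_hits = {"refund-policy#2": 7.1, "reseller-terms#1": 4.0, "billing-faq#5": 1.2}
semantic_hits = {"reseller-terms#1": 0.83, "refund-policy#2": 0.80, "sso-setup#3": 0.41}
ranked = hybrid_rerank(keyword_hits, semantic_hits)
```

A chunk that appears in only one index still competes, with a zero contribution from the other. Tuning `alpha` against the evaluation set from step 5 is a reasonable way to pick the weight.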

What could go wrong (and how to handle it)

The agent invents an answer (hallucinates) when the docs don't cover the question.

Require source citations on every claim and an explicit "I don't have a confident answer" path. Score faithfulness on your evaluation set and block changes that raise the unsupported-answer rate. Keep a rep reviewing before anything reaches a customer.

Stale or contradictory source documents produce confidently wrong answers.

Tag every chunk with a last-updated date and prefer newer sources. Re-run ingestion on a schedule. Use the gap report to retire or merge conflicting articles so there is one canonical answer per topic.
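
A minimal sketch of "prefer newer sources" and a staleness report follows. It assumes each chunk carries a `topic` label in addition to the last-updated date; the `topic` field, the one-year cutoff, and the sample records are hypothetical and not part of the guide's tagging scheme.

```python
from datetime import date
from typing import Dict, List


def newest_per_topic(chunks: List[dict]) -> Dict[str, dict]:
    """Keep only the most recently updated chunk for each topic."""
    newest: Dict[str, dict] = {}
    for chunk in chunks:
        current = newest.get(chunk["topic"])
        if current is None or chunk["last_updated"] > current["last_updated"]:
            newest[chunk["topic"]] = chunk
    return newest


def stale_chunks(chunks: List[dict], today: date, max_age_days: int = 365) -> List[dict]:
    """Return chunks older than max_age_days (an assumed cutoff) for review."""
    return [c for c in chunks if (today - c["last_updated"]).days > max_age_days]


# Hypothetical chunk metadata.
chunks = [
    {"title": "Refunds (2023)", "topic": "refunds", "last_updated": date(2023, 3, 1)},
    {"title": "Refunds (current)", "topic": "refunds", "last_updated": date(2025, 11, 20)},
    {"title": "SSO setup", "topic": "sso", "last_updated": date(2024, 1, 15)},
]
canonical = newest_per_topic(chunks)
needs_review = stale_chunks(chunks, today=date(2026, 1, 1))
```

Picking the newest chunk is only a tie-breaker at answer time. The older conflicting article still belongs on the list of things to retire or merge.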

Sensitive data (customer PII, internal policy) leaks into the index or logs.

Scope ingestion to approved sources, exclude raw PII from ticket samples, and keep the index and any prompts/responses on infrastructure you control. Review what gets logged. This is your pipeline, so you set the data boundary.
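
As a rough illustration of excluding raw PII from ticket samples, the sketch below masks email addresses and phone-like numbers before text reaches the index. The two patterns are assumptions and deliberately simple: they will miss names, addresses, and account numbers, so treat this as a first pass rather than a complete redaction step.

```python
import re

# Simplified patterns (assumptions): real tickets need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Customer jane.doe@example.com called from +1 (415) 555-0100 about order 48213."
cleaned = redact(sample)
```

Short digit runs such as the order number are left alone because the phone pattern needs at least nine characters. Spot-check the output on real samples before trusting it.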

Reps over-trust the agent and stop verifying.

Keep the agent read-only and suggestion-only; the rep always sends the reply. Make the citation link prominent so checking the source is one click. Spot-audit a sample of agent-assisted replies each week.

Accuracy drifts quietly as docs and products change.

Re-run the evaluation set on a schedule and after every ingestion. Monitor reopen rate and thumbs-down trends. Treat a drop in faithfulness as a release blocker, not a nice-to-have.
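
Treating faithfulness as a release blocker can be expressed as a small gate that compares a candidate run with the stored baseline. This is a sketch under assumed names: `EvalSummary`, `release_allowed`, the `tolerance` parameter, and the counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalSummary:
    faithful: int  # answers judged to be supported by their cited source
    total: int     # answers scored in this run

    @property
    def faithfulness(self) -> float:
        return self.faithful / self.total if self.total else 0.0


def release_allowed(baseline: EvalSummary, candidate: EvalSummary, tolerance: float = 0.0) -> bool:
    """Allow a change only if faithfulness has not dropped below the baseline."""
    return candidate.faithfulness >= baseline.faithfulness - tolerance


# Hypothetical counts from two runs of the same 80-question set.
baseline = EvalSummary(faithful=74, total=80)
regressed = EvalSummary(faithful=71, total=80)
improved = EvalSummary(faithful=75, total=80)
```

Here `release_allowed(baseline, regressed)` is false and `release_allowed(baseline, improved)` is true. A small non-zero `tolerance` can absorb noise if the grading itself varies between runs.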

Scope creep into a customer-facing bot before it's earned trust.

Stay internal until you have weeks of accuracy data and a low reopen rate. A customer-facing version is a separate, higher-risk decision with its own guardrails and gradual rollout, not a flip of a switch.

Prompts to get started

Build the ingestion and index

Here are exports of our help center articles, saved replies, and three policy PDFs in this folder. Write a script that loads each file, splits it into ~500-token chunks with small overlap, and tags every chunk with its source title, URL, and last-updated date. Then build a keyword index and an embedding index over the chunks, plus a re-rank step that combines both scores. Keep everything runnable by me from the command line.
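
For orientation, here is a sketch of the kind of chunking function that prompt might produce. It is not the tool's actual output. It approximates tokens with whitespace-separated words (an assumption; a real tokenizer counts differently), and the function name, field names, and URL are hypothetical.

```python
from typing import Dict, List


def chunk_document(
    text: str,
    title: str,
    url: str,
    last_updated: str,
    chunk_size: int = 500,  # approximate "tokens", counted here as words
    overlap: int = 50,      # words shared between neighbouring chunks
) -> List[Dict[str, object]]:
    """Split text into overlapping chunks, each tagged with its origin."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[Dict[str, object]] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(window),
            "title": title,
            "url": url,
            "last_updated": last_updated,
            "chunk_index": len(chunks),
        })
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the document
    return chunks


# Synthetic 1,200-word document so the chunk boundaries are easy to inspect.
doc = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_document(doc, "Refund policy", "https://help.example.com/refunds", "2026-05-01")
```

Because every chunk carries its title, URL, and last-updated date, the answer step can print a citation without a second lookup.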

Write the grounded answer behavior

Create the system prompt and answer function for our internal support agent. It must answer only from the retrieved passages, put a clickable citation next to every factual claim, and respond with 'I don't have a confident answer — escalate' when the passages don't cover the question. Show me the top three sources it used for each answer.
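
One way the grounded-answer behavior could be wired up is sketched below. It only builds the prompt and decides when to abstain before any model call, and it does not call a model itself. The prompt wording, the `min_score` threshold, the passage fields, and the placeholder passage text are assumptions for illustration.

```python
from typing import List, Optional

ABSTAIN_MESSAGE = "I don't have a confident answer"

SYSTEM_PROMPT = (
    "You answer questions for an internal support team.\n"
    "Use only the numbered passages provided. Cite the passage number, like [1], "
    "after every factual claim.\n"
    f"If the passages do not cover the question, reply exactly: {ABSTAIN_MESSAGE}"
)


def build_user_message(question: str, passages: List[dict], min_score: float = 0.3) -> Optional[str]:
    """Format the top three relevant passages, or return None to signal abstention."""
    usable = sorted(
        (p for p in passages if p["score"] >= min_score),
        key=lambda p: p["score"],
        reverse=True,
    )[:3]
    if not usable:
        return None  # nothing relevant was retrieved: skip the model and escalate
    blocks = [
        f"[{i}] {p['title']} ({p['url']})\n{p['text']}"
        for i, p in enumerate(usable, start=1)
    ]
    return "Passages:\n" + "\n\n".join(blocks) + f"\n\nQuestion: {question}"


# Hypothetical retrieval results; the passage text is placeholder content.
passages = [
    {"title": "Refund policy", "url": "https://help.example.com/refunds",
     "text": "Placeholder refund policy text.", "score": 0.82},
    {"title": "SSO setup", "url": "https://help.example.com/sso",
     "text": "Placeholder SSO text.", "score": 0.12},
]
message = build_user_message("What's our refund window for annual plans?", passages)
```

Returning `None` when no passage clears the threshold gives the abstain path a second line of defense: the agent can show the abstain message without asking the model at all.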

Stand up the evaluation set

From this CSV of 80 real rep questions and their known-correct answers, build an evaluation harness. For each question, score whether the agent's answer is faithful to the source it cited and whether it correctly abstains when the answer isn't in our docs. Output a summary table and flag every wrong or unsupported answer so I can review them.
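
The scoring half of such a harness might look like the sketch below. It assumes each row has already been graded, by a reviewer or a grader model, with three hypothetical boolean fields: `answerable` (the docs contain the answer), `abstained` (the agent declined), and `faithful` (the answer matches its cited source). The sample rows are made up.

```python
from typing import Dict, List


def summarize(results: List[dict]) -> Dict[str, object]:
    """Compute faithfulness and correct-abstention rates, and list answers to review."""
    answered = [r for r in results if not r["abstained"]]
    unanswerable = [r for r in results if not r["answerable"]]
    faithful = sum(1 for r in answered if r["faithful"])
    correct_abstentions = sum(1 for r in unanswerable if r["abstained"])
    # Flag answers that are unsupported, or that were given when the docs have no answer.
    flagged = [r["question"] for r in answered if not r["faithful"] or not r["answerable"]]
    return {
        "faithfulness": round(faithful / len(answered), 2) if answered else None,
        "correct_abstention": (
            round(correct_abstentions / len(unanswerable), 2) if unanswerable else None
        ),
        "flagged": flagged,
    }


# Hypothetical graded rows.
results = [
    {"question": "q1", "answerable": True, "abstained": False, "faithful": True},
    {"question": "q2", "answerable": True, "abstained": False, "faithful": False},
    {"question": "q3", "answerable": False, "abstained": True, "faithful": False},
    {"question": "q4", "answerable": False, "abstained": False, "faithful": False},
]
summary = summarize(results)
```

The sketch does not flag an unnecessary abstention on an answerable question. That costs the rep time rather than accuracy, and it can be tracked as a separate rate if it becomes common.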

Cluster the content gaps

Here is a month of questions the agent couldn't answer or that reps marked thumbs-down. Group them into themes, rank the themes by how often they came up, and for each one tell me whether we're missing an article or have a stale/conflicting one. Give me a prioritized list a non-engineer can use to fix the knowledge base.
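
Once the questions have been grouped into themes, ranking them is simple counting. The sketch below assumes each question already carries a theme label (assigned by the tool or by a person); the labels and sample questions are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rank_themes(labeled_questions: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Return (theme, count) pairs, most frequent theme first."""
    counts = Counter(theme for _question, theme in labeled_questions)
    return counts.most_common()


# Hypothetical unanswered or thumbs-down questions with assigned themes.
labeled = [
    ("Can resellers issue refunds?", "reseller refunds"),
    ("Refund window via reseller?", "reseller refunds"),
    ("Who approves reseller refunds?", "reseller refunds"),
    ("How do I rotate an API key?", "api keys"),
    ("Where is the API key page?", "api keys"),
    ("Is SSO on the starter plan?", "sso plans"),
]
ranking = rank_themes(labeled)
```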

FAQ

Isn't this just a chatbot? How is it different?

A generic chatbot answers from its general training data, so it can sound confident about policies it has never seen. This agent answers only from your own documents and shows the source for every claim, so a rep can verify in one click. It also sits behind your team, not in front of customers: a human still sends every reply. The point is grounded, checkable answers, not autonomous conversation.

What if it gives a wrong answer?

Two safeguards. First, every answer is grounded in a cited source the rep checks before sending, so mistakes get caught internally instead of shipping to a customer. Second, you run an evaluation set that scores how often answers are faithful to their sources and how often the agent correctly says 'I don't know.' You don't widen the rollout until those numbers hold up.

Do we need to hire engineers or keep paying a consultant?

No. One operator can stand up the first version with Claude Code or Codex pointed at your real docs, and a non-coder can run the content-gap work with Claude Cowork. It's your pipeline, your prompts, and your data — you re-run ingestion when docs change and adjust the prompt yourself. We help you start; you own and extend it.

How long until we see value, and how do we measure it?

A focused team can have a working internal version in about a week. Measure handle time before and after, rep thumbs-up/down on answers, citation coverage on high-stakes topics, and the reopen rate on tickets where a rep used an agent answer. Those numbers tell you the time saved and whether accuracy is holding.
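
The before-and-after comparison is simple arithmetic. The sketch below uses made-up numbers purely for illustration, and the helper names are hypothetical. It uses the median handle time so that one unusually long ticket does not skew the result.

```python
from statistics import median


def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means faster)."""
    return (after - before) / before * 100


def reopen_rate(reopened: int, assisted: int) -> float:
    """Share of agent-assisted tickets that were reopened, as a percentage."""
    return 100 * reopened / assisted if assisted else 0.0


# Made-up handle times in minutes, before and after the rollout.
before_minutes = [12.0, 9.5, 15.0, 11.0, 10.5]
after_minutes = [9.0, 8.5, 11.0, 10.0, 7.5]
handle_time_change = pct_change(median(before_minutes), median(after_minutes))
assisted_reopen_rate = reopen_rate(reopened=3, assisted=120)
```

With these sample values the median drops from 11.0 to 9.0 minutes, a change of about -18%, and the reopen rate is 2.5%.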

Is our data safe?

You control where it lives. The index, prompts, and logs run on infrastructure you choose, ingestion is scoped to sources you approve, and you keep raw customer PII out of the ticket samples. Because you built the pipeline, the data boundary is yours to set and audit — not a black box you have to trust.

Sources

Interaction workers spend nearly 20% of the workweek looking for internal information or tracking down colleagues who can help with specific tasks. — McKinsey Global Institute, The social economy
61% of customer service leaders reported backlogs in editing knowledge articles, and over one-third admitted to lacking formal processes for revising outdated materials (survey of 187 customer service and support leaders, July-August 2024). — Gartner, 2024
