Everyone wants to "use AI" in their business. Almost nobody wants to do the one thing that makes AI useful: train it on what the business actually does, sells, promises, and refuses to do. That's the gap between a chatbot that embarrasses you in front of customers and an AI agent that replies better than your junior support hire.
If you're trying to figure out how to train AI on your business, the short answer is this: you build a knowledge base, you wire it into a retrieval system, you calibrate it for two weeks, and then you keep it fresh. That's it. That's the whole game. This playbook walks you through every step — with specifics, not hand-waving.
By the end, you'll know what to upload, what to never upload, how to write answers the AI can actually use, how to teach it your voice, and how to test it before it ever touches a real customer. Let's go.
What You'll Learn
- Why "just use AI" doesn't work without training
- What a knowledge base actually is (RAG explained simply)
- The 7 types of content to upload
- Audit your existing docs: what you already have
- How to write canonical answers for top-10 questions
- Formats that work (PDF, Markdown, URLs, .txt, past emails)
- Formats and content to never upload
- Tone training: feeding the AI your voice
- Testing and tuning: the 2-week calibration process
- Keeping the KB fresh: monthly review cadence
- What to do next
1. Why "Just Use AI" Doesn't Work Without Training
Ask any owner of a small or medium-sized enterprise (SME) how they're using AI. Nine out of ten will say some version of "I paste stuff into ChatGPT." That works for drafting a LinkedIn post. It doesn't work for replying to customers, because a generic AI knows nothing about your refund policy, your shipping windows, your product SKUs, your warranty terms, or the tone your brand uses.
A generic AI with no training on your business has exactly two failure modes, and they're both bad:
- It makes things up. Asked about your return window, it guesses "30 days" because that's the industry average. Your actual window is 14 days. Now you're honoring a refund you shouldn't, or arguing with a customer who's holding your AI's reply against you.
- It gives useless non-answers. Asked anything specific, it defers: "Please check with the company for details." That's not automation. That's an AI-shaped auto-reply.
Training solves both. Once the AI has access to your actual documents — policies, product specs, past email threads, tone guidelines — it stops guessing. It retrieves. It cites. It answers like someone who's worked at your company for six months, because in a very real sense, it has.
This is why every serious AI email agent, including Leadilla, is built around a knowledge base. The model is just the engine. Your business data is the fuel.
2. What a Knowledge Base Actually Is (RAG Explained Simply)
When people say "knowledge base" in the AI context, they usually mean something specific: a collection of documents indexed for semantic search, wired into a pattern called RAG — Retrieval-Augmented Generation.
Here's RAG without the PhD language:
- A customer emails you a question.
- The AI turns that question into a numerical fingerprint called an embedding.
- It searches your knowledge base for the chunks of text whose fingerprints are closest to the question.
- It pulls those chunks into its working memory.
- It writes an answer using only that retrieved context — plus its base reasoning skills — rather than whatever it vaguely remembers from training on the public internet.
That "only that retrieved context" part is the whole magic. When the AI is constrained to your policies, it has far less room to invent policies. When it's constrained to your product catalog, it has far less room to invent products. This is the main reason a well-built knowledge base takes an AI from "party trick" to "genuine support hire."
You don't need to understand the math. You just need to understand the principle: the AI is only as smart as the documents it can retrieve. Give it good documents and it sounds like a trained employee. Give it nothing and it sounds like a stranger with a confident voice.
If your AI hallucinates, it's almost never a "bad AI" problem. It's a "missing knowledge" problem. 90% of AI customer support failures trace back to a policy or product detail that wasn't in the knowledge base. Fix the source, fix the symptom.
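To make the five RAG steps concrete, here is a minimal sketch in Python. It is not how any particular product implements retrieval: the "embedding" is a toy word-count vector standing in for a real embedding model, and `KB_CHUNKS`, `retrieve`, and `build_prompt` are hypothetical names invented for illustration. Only the refund wording comes from this article; the shipping and warranty chunks are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'fingerprint': word counts. Real systems use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose fingerprints are closest to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text handed to the model: instruction, retrieved context, question."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say so and escalate.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge-base chunks; only the refund wording comes from this article.
KB_CHUNKS = [
    "Refund policy: full refunds within 14 days of purchase if the item is unused.",
    "Shipping policy: orders ship within 2 business days.",
    "Warranty: hardware defects are covered for 12 months.",
]

question = "What is your refund window?"
top_chunks = retrieve(question, KB_CHUNKS, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Notice that the model never sees the shipping or warranty text for this question. If the refund chunk were missing from `KB_CHUNKS`, the closest match would be something irrelevant, which is the "missing knowledge" failure described above.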
3. The 7 Types of Content to Upload
Not all content is equally useful to an AI. Upload the wrong things and you dilute retrieval. Upload the right things and the AI gets sharper with every document. Here are the seven categories that matter — in priority order.
1. FAQ and help center articles
This is the single highest-signal content you own. Every FAQ entry is already a Q → A pair, which is exactly what RAG wants. If you have a public FAQ, import it first. If you don't, write one — even a rough draft of your top 20 questions beats nothing.
2. Policies (refund, shipping, warranty, privacy, terms)
Policies are the second-highest signal because they're the documents customers ask about most often. "Can I return this?" "Do you ship to Germany?" "What's your warranty on this?" Upload every policy document, even the ones that feel obvious.
3. Product documentation
Specs, compatibility charts, feature lists, user guides, setup instructions. If a customer could ask about it, it needs to be in the knowledge base. Structured data tables work especially well — the AI can cite exact numbers rather than paraphrasing.
4. Past email threads (resolved)
This is the underrated one. Export 50 to 200 of your best past customer email threads — especially the ones where a complaint ended well. These teach the AI how your team actually solves problems, not just what the policy is in theory.
5. Tone guides and brand voice rules
Explicit written rules about how you speak: "We say 'hi' not 'hello.' We never say 'delighted.' We always end with a specific next step." A one-page tone guide punches way above its weight.
6. Escalation rules
Written rules about when the AI should NOT answer and should hand off to a human: keywords, customer types, topics, dollar thresholds. We'll cover this in depth in the calibration section.
7. Brand voice samples
Five to ten emails or pages of marketing copy that you consider "us at our best." Not policy. Not information. Just voice. These become reference material the AI imitates when writing fresh sentences.
The good news: on Leadilla, your knowledge base is unlimited on every plan — Starter, Growth, and Scale. Upload all seven categories without worrying about document caps. Most tools gate KB size behind higher tiers; we don't, because the whole point is to let you train the AI thoroughly from day one.
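Escalation rules (category 6) are easiest to keep honest when they're written down as data instead of living in someone's head. The sketch below is one possible shape, assuming four rule types from the list above: keywords, topics, customer types, and a dollar threshold. Every value is a placeholder, `EscalationRules` and `escalation_reasons` are hypothetical names, and "chargeback" is the only trigger taken from this article.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationRules:
    # Placeholder values for illustration; replace with your own rules.
    keywords: set = field(default_factory=lambda: {"chargeback", "lawyer"})
    topics: set = field(default_factory=lambda: {"legal", "data deletion"})
    customer_types: set = field(default_factory=lambda: {"enterprise"})
    dollar_threshold: float = 500.0


def escalation_reasons(
    rules: EscalationRules,
    text: str,
    topic: str,
    customer_type: str,
    order_value: float,
) -> list[str]:
    """Return every reason this email should go to a human (empty list = AI may answer)."""
    reasons = []
    lowered = text.lower()
    hits = sorted(k for k in rules.keywords if k in lowered)
    if hits:
        reasons.append("keyword: " + ", ".join(hits))
    if topic in rules.topics:
        reasons.append(f"topic: {topic}")
    if customer_type in rules.customer_types:
        reasons.append(f"customer type: {customer_type}")
    if order_value >= rules.dollar_threshold:
        reasons.append(f"order value {order_value:.2f} at or above threshold")
    return reasons


rules = EscalationRules()
print(escalation_reasons(rules, "I will file a chargeback.", "refund", "retail", 40.0))
print(escalation_reasons(rules, "Where is my order?", "order status", "retail", 40.0))
```

Returning the reasons, not just a yes/no, makes it easier to see later which rule fires most often.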
4. Audit Your Existing Docs: What You Already Have
Before you write a single new document, audit what already exists. Most SMEs have 70–80% of the content they need scattered across Google Drive, Notion, a dusty help center, and their sent folder. The problem isn't absence — it's organization.
Spend 45 minutes doing this:
- Open a blank spreadsheet with columns: Document, Location, Type (FAQ / policy / product / tone / email / other), Last Updated, Status (ready / needs editing / obsolete / missing).
- Walk your Google Drive or file system and list every support-relevant document. Don't edit yet. Just catalog.
- Include your website pages — the FAQ URL, the policies URLs, the product pages. These can be crawled directly.
- Open your sent folder and tag the 20 most recent customer replies you personally wrote. These are tone gold.
- List the gaps. What question types come up weekly that have no written answer? These are your highest-ROI new documents to write.
Almost every SME we've audited finishes this exercise with 30–60 documents already ready to go, and a short list of 5–10 gaps. That's a weekend of work, max, to have a production-grade knowledge base.
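If you'd rather start the audit from a script than a blank spreadsheet, here is a rough sketch that writes the same five columns to CSV and lists the gaps. The rows, dates, and function names are hypothetical examples, not a required schema.

```python
import csv
import io

COLUMNS = ["Document", "Location", "Type", "Last Updated", "Status"]
VALID_STATUS = {"ready", "needs editing", "obsolete", "missing"}

# Hypothetical catalog entries, for illustration only.
AUDIT_ROWS = [
    {"Document": "Refund policy", "Location": "Website: /refunds",
     "Type": "policy", "Last Updated": "2024-03-01", "Status": "ready"},
    {"Document": "Setup guide", "Location": "Google Drive: /Support",
     "Type": "product", "Last Updated": "2022-11-15", "Status": "needs editing"},
    {"Document": "Bulk order FAQ", "Location": "",
     "Type": "FAQ", "Last Updated": "", "Status": "missing"},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize the audit to CSV text, rejecting any status outside the agreed list."""
    for row in rows:
        if row["Status"] not in VALID_STATUS:
            raise ValueError(f"unknown status: {row['Status']!r}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def gaps(rows: list[dict]) -> list[str]:
    """Documents that customers ask about but that do not exist yet."""
    return [row["Document"] for row in rows if row["Status"] == "missing"]


audit_csv = to_csv(AUDIT_ROWS)
print(audit_csv)
print("Gaps to write:", gaps(AUDIT_ROWS))
```

The resulting CSV opens in any spreadsheet tool, so the rest of the audit can continue there.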
5. How to Write Canonical Answers for Your Top-10 Questions
For the ten questions you get most often, don't rely on the AI to synthesize an answer from scattered sources. Write a single, canonical, gold-standard response. This is your voice, your policy, your exact phrasing — the version you wish every customer email looked like.
The format matters. Here's the template that works:
Question: [One-line version of the question, phrased naturally.]
Short answer: [One sentence. The direct answer.]
Full answer: [2–4 sentences with context, caveats, and next steps.]
When this applies: [Any conditions, exclusions, or customer-type nuances.]
When to escalate: [Edge cases where the AI should hand off to a human.]
Example, for a refund request:
Question: Can I get a refund?
Short answer: Yes, within 14 days of purchase if the item is unused and in original packaging.
Full answer: We offer full refunds within 14 days of the original purchase date. The item must be unused and in its original packaging. Shipping is refundable only if the item was damaged or defective. Refunds process in 5–7 business days back to the original payment method.
When this applies: Standard retail purchases. Does not apply to customized items, final-sale items, or orders older than 14 days.
When to escalate: Refund requests outside the 14-day window, disputes over condition, or any customer mentioning a chargeback.
Write ten of these. That's it. Ten canonical answers cover roughly 60–70% of SME support volume. Everything else, the AI assembles from your broader knowledge base.
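One way to keep all ten answers in the same shape is to store them as structured records and render them to Markdown before upload. This is a sketch under that assumption: `CanonicalAnswer` and its method names are invented for illustration, and the heading-per-question layout is a choice, not a requirement of any tool. The content is the refund example above.

```python
from dataclasses import dataclass, fields


@dataclass
class CanonicalAnswer:
    question: str
    short_answer: str
    full_answer: str
    applies_when: str
    escalate_when: str

    def missing_fields(self) -> list[str]:
        """Names of any template fields left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render one answer as a Markdown section with the template's five labels."""
        return "\n".join([
            f"## {self.question}",
            "",
            f"**Short answer:** {self.short_answer}",
            "",
            f"**Full answer:** {self.full_answer}",
            "",
            f"**When this applies:** {self.applies_when}",
            "",
            f"**When to escalate:** {self.escalate_when}",
        ])


refund = CanonicalAnswer(
    question="Can I get a refund?",
    short_answer="Yes, within 14 days of purchase if the item is unused "
                 "and in original packaging.",
    full_answer="We offer full refunds within 14 days of the original purchase date. "
                "The item must be unused and in its original packaging.",
    applies_when="Standard retail purchases. Does not apply to customized items, "
                 "final-sale items, or orders older than 14 days.",
    escalate_when="Refund requests outside the 14-day window, disputes over condition, "
                  "or any customer mentioning a chargeback.",
)

assert refund.missing_fields() == []
print(refund.to_markdown())
```

The `missing_fields` check is there because the "When to escalate" line is the one people most often leave blank.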
6. Formats That Work (PDF, Markdown, URLs, .txt, Past Emails)
File format affects retrieval quality more than people expect. Here's what actually works, ranked by signal quality:
| Format | Best For | Signal Quality |
|---|---|---|
| Markdown (.md) | FAQs, canonical answers, structured policy | Excellent (highest) |
| Plain text (.txt) | Tone samples, notes, simple references | Excellent |
| CSV (Q&A pairs) | Structured FAQ imports | Excellent |
| Live URLs | Public FAQ, help center, policy pages | Very good (auto-updates) |
| Word (.docx) | Existing internal policy documents | Good |
| PDF (text-based) | Product guides, warranty documents | Good |
| Exported emails (.eml, .mbox) | Past threads for tone training | Good |
| Scanned/image PDFs | Don't use until OCR'd | Poor |
Two practical rules:
- Structure beats length. A 300-word Markdown FAQ with clear headers retrieves better than a 3,000-word PDF with no structure. The retrieval system splits long documents into chunks before indexing them, and well-structured chunks match queries more precisely.
- URLs are your friend when content changes. If your shipping policy page updates quarterly, import the URL rather than a PDF snapshot. Good systems re-crawl on a schedule so your KB stays current.
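To see why structure helps, here is a simplified sketch of header-based chunking: each Markdown heading starts a new chunk, so every chunk is one question with its answer. Real retrieval systems use their own chunking strategies (often with size limits and overlap), so treat `chunk_by_headers` as a hypothetical illustration of the idea, not a description of any specific tool.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_by_headers(markdown: str) -> list[dict]:
    """Split a Markdown document into one chunk per heading, skipping empty sections."""
    chunks = []
    title, body = "(intro)", []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append({"title": title, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()
    return chunks


FAQ_MD = """# Refund FAQ

## Can I get a refund?
Yes, within 14 days of purchase if the item is unused and in original packaging.

## How long do refunds take?
Refunds process in 5 to 7 business days back to the original payment method.
"""

for chunk in chunk_by_headers(FAQ_MD):
    print(chunk["title"], "->", chunk["text"])
```

A 3,000-word PDF with no headings would come out of this function as a single giant chunk, which is exactly the case where retrieval gets fuzzy.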
Ready to build your knowledge base?
Leadilla gives you an unlimited knowledge base on every plan — Starter, Growth, and Scale. Upload FAQs, PDFs, URLs, and past emails. Watch the AI handle your support inbox in under 30 minutes.
Open Free Account
7. Formats and Content to Never Upload
Now the other side. Some content actively degrades your AI's performance or creates real risk if it leaks into a customer reply. Hard rules:
Internal-only content
- Slack and Teams chat history. Too much noise, too many jokes, too many half-finished thoughts. The AI will pull the wrong phrasing into customer replies.
- Internal meeting notes. Same problem, plus candid comments about customers you definitely don't want quoted back.
- Employee onboarding docs. Usually phrased for insiders with internal jargon. Confuses retrieval.
Confidential and regulated content
- Contracts with NDAs. Even if the AI never quotes them, storing them in a KB may violate the NDA itself.
- Customer PII dumps. Lists of emails, phone numbers, addresses. Massive risk, zero upside for retrieval.
- Payment data, card numbers, passwords. Never. Not even redacted. Keep these out of any AI system.
- Medical records, legal advice, financial account details. Regulatory landmines.
Low-signal noise
- Draft documents nobody finalized. Outdated policy drafts confuse the AI about what's actually in force.
- Marketing brainstorm docs. Great for humans, terrible for retrieval — full of ideas that aren't policy.
- Duplicate versions of the same document. Pick one canonical version. Archive the rest.
The test is simple: would you be comfortable if a sentence from this document ended up quoted verbatim in a customer email? If the answer is no, it doesn't belong in the knowledge base.
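A quick automated screen can catch the most obvious offenders before a human applies that test. The sketch below flags email addresses, card-like numbers (checked with the Luhn algorithm), and lines that look like stored passwords. It is a rough first pass, not a compliance tool: the patterns and the `screen_document` name are assumptions made for illustration, and simple regexes will miss plenty of sensitive data.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
PASSWORD_RE = re.compile(r"\bpassword\s*[:=]", re.IGNORECASE)


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to separate card-like numbers from other long digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def screen_document(text: str) -> list[str]:
    """Return a list of reasons this document should NOT go into the knowledge base."""
    findings = []
    if EMAIL_RE.search(text):
        findings.append("email address")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_ok(digits):
            findings.append("possible card number")
            break
    if PASSWORD_RE.search(text):
        findings.append("password")
    return findings


# 4111 1111 1111 1111 is a well-known test card number, not a real account.
print(screen_document("Card on file: 4111 1111 1111 1111"))
print(screen_document("Refunds process in 5 to 7 business days."))
```

An empty result only means the screen found nothing. The "would I be comfortable seeing this quoted?" test still applies.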
8. Tone Training: Feeding the AI Your Voice
Correct information written in the wrong tone is still a bad reply. A stiff, over-formal response to a casual DTC brand's customer feels as off as a "hey friend!" response from a law firm. Tone training is how you fix this.
Three layers work together:
Layer 1: Tone rules (explicit)
Write a one-page document with concrete dos and don'ts. Not "be friendly" — that's useless. Instead:
- "Open with 'Hi [first name]' — never 'Dear' and never 'Hey.'"
- "Never use the words: delighted, utilize, kindly, endeavor."
- "Prefer contractions: 'we're,' 'you'll,' 'it's.'"
- "End with a specific next step, not 'let us know if you have any other questions.'"
- "If the customer is frustrated, acknowledge it in one sentence before giving the answer. Don't stack apologies."
Layer 2: Voice samples (implicit)
Upload 50–100 past emails you actually sent. The AI extracts your rhythm, sentence length, favorite connective phrases, and sign-offs. This does more than any rule file for matching your specific voice.
Layer 3: Edit feedback (ongoing)
During the calibration phase (next section), every time you edit an AI draft before sending, the system should capture that edit as a signal. Over 50–100 edits, tone drifts toward the real you.
A quick check: after two weeks of training, pick five AI drafts and five human drafts from your team, shuffle them, and show them to someone outside your company. If they can't reliably tell which is which, your tone training is working. That's the bar.
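The explicit rules from Layer 1 are concrete enough to check mechanically during calibration. Here is a small sketch of a tone "linter" that encodes three of the example rules above: the greeting, the banned words, and the generic closer. The function name and messages are hypothetical, and a check like this only complements the blind test; it cannot judge rhythm or warmth.

```python
import re

BANNED_WORDS = {"delighted", "utilize", "kindly", "endeavor"}
GENERIC_CLOSER = "let us know if you have any other questions"


def lint_tone(draft: str) -> list[str]:
    """Return the tone-rule violations found in one draft reply."""
    issues = []
    lines = [line.strip() for line in draft.strip().splitlines() if line.strip()]
    first_line = lines[0] if lines else ""
    if not re.match(r"^Hi \w+", first_line):
        issues.append("greeting should open with 'Hi <first name>'")
    words = set(re.findall(r"[a-z']+", draft.lower()))
    for word in sorted(BANNED_WORDS & words):
        issues.append(f"banned word: {word}")
    if GENERIC_CLOSER in draft.lower():
        issues.append("generic closer: end with a specific next step")
    return issues


stiff = (
    "Dear Sam,\n"
    "We are delighted to help. Kindly let us know if you have any other questions."
)
on_brand = (
    "Hi Sam,\n"
    "Your refund is on its way. You'll see it within 5 to 7 business days."
)

print(lint_tone(stiff))
print(lint_tone(on_brand))
```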
9. Testing and Tuning: The 2-Week Calibration Process
This is the phase nobody wants to do and everyone needs to do. A two-week calibration turns a raw knowledge base into a production system. Skip it and you'll be embarrassed by an AI reply within a month.
Week 1: Draft-only mode
The AI drafts a reply for every incoming email, but nothing auto-sends. You (or your team) review every draft before it goes out. During the review, keep a running list:
- Knowledge gaps — questions the AI couldn't answer or got wrong. Add the missing info to your KB.
- Tone misses — phrases that aren't how you'd write. Edit them and note the pattern.
- Policy misfires — cases where the AI applied a rule too broadly or too narrowly. Refine the canonical answer.
Expect to make 20–40 knowledge and tone adjustments in week one. That's normal. That's the work.
Week 2: Supervised auto-send on green-light categories
Turn on auto-send for the three or four email types the AI got right 95%+ of the time in week one. These are typically: order status, shipping questions, hours/location, simple FAQ. Everything else still drafts for human review.
Continue logging misses. By end of week two, most SMEs are auto-sending 50–65% of volume confidently.
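Picking green-light categories is simple arithmetic on your week-one review log. A minimal sketch, assuming you tallied correct drafts and total drafts per category: the 95% threshold comes from the text above, while `min_samples` (ignoring categories with too few emails to judge) and the tallies themselves are assumptions added for illustration.

```python
def green_light_categories(
    results: dict[str, tuple[int, int]],
    threshold: float = 0.95,
    min_samples: int = 20,
) -> list[str]:
    """Categories safe for auto-send: enough reviewed drafts and accuracy at or above threshold.

    `results` maps category name -> (correct drafts, total drafts reviewed).
    """
    approved = []
    for category, (correct, total) in results.items():
        if total >= min_samples and correct >= threshold * total:
            approved.append(category)
    return sorted(approved)


# Hypothetical week-one tallies.
week_one = {
    "order status": (39, 40),      # 97.5% correct
    "shipping": (29, 30),          # about 96.7% correct
    "refund disputes": (12, 20),   # 60% correct, stays in draft mode
    "hours/location": (10, 10),    # perfect, but too few samples to trust yet
}

print(green_light_categories(week_one))
```

Everything not returned by the function keeps drafting for human review.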
The calibration test
Before you call calibration done, run this test: pick 20 past emails from your archive, run them through the AI in draft mode, and grade each reply as Correct / Acceptable / Wrong. If 17+ are Correct or Acceptable, you're ready for steady-state operation. If fewer, extend calibration another week and keep logging gaps.
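The pass rule for that test (17 or more of 20 graded Correct or Acceptable) can be written down so nobody rounds in their own favor. A short sketch, with hypothetical function and field names:

```python
from collections import Counter

VALID_GRADES = {"Correct", "Acceptable", "Wrong"}


def calibration_report(grades: list[str], sample_size: int = 20, required: int = 17) -> dict:
    """Summarize a graded calibration run and decide whether it passes."""
    if len(grades) != sample_size:
        raise ValueError(f"expected {sample_size} grades, got {len(grades)}")
    unknown = set(grades) - VALID_GRADES
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    counts = Counter(grades)
    passing = counts["Correct"] + counts["Acceptable"]
    return {"passing": passing, "wrong": counts["Wrong"], "ready": passing >= required}


# Hypothetical grading of 20 archived emails.
grades = ["Correct"] * 14 + ["Acceptable"] * 3 + ["Wrong"] * 3
print(calibration_report(grades))
```

With these example grades, 17 of 20 pass, which just clears the bar. One more Wrong and calibration extends another week.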
The teams that get 75%+ auto-resolution rates aren't using better AI. They're using the same AI with a better-calibrated knowledge base. The difference is 10–15 hours of focused work in weeks one and two — not a smarter model. See our full email automation guide for a deeper walkthrough of auto-resolution metrics.
10. Keeping the KB Fresh: Monthly Review Cadence
A knowledge base isn't a one-time project. Your business changes. Policies update. New products launch. Prices shift. If your KB doesn't update, your AI starts confidently giving out last quarter's answers.
The sustainable cadence is 30 minutes a month. Here's the checklist:
- Pull the escalation report. What questions did the AI escalate most often this month? Those are your newest knowledge gaps. Write or update the relevant documents.
- Pull the edit report. What AI drafts did humans edit most heavily before sending? Often a tone or policy drift signal.
- Review policy changes. Did refund, shipping, or pricing policies change this month? Update the source documents. Re-index.
- Audit new product launches. Any new SKUs, plans, or features? Add specs, FAQs, and compatibility notes.
- Retire stale content. Old promo terms, expired policies, discontinued products. Archive rather than leave them in the index misleading the AI.
- Spot-check 10 recent AI replies. Random sample. Grade them. Catch drift before customers do.
Teams that do this monthly stay at 70%+ auto-resolution indefinitely. Teams that skip it for a quarter see quality slip and often blame the AI when the real issue is that reality moved and the KB didn't.
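Two items on that checklist, the escalation report and the random spot-check, are easy to script if your tool can export a log. The sketch below assumes a hypothetical export format (a list of records with a `topic` field, and a list of reply IDs). Neither the format nor the function names come from any specific product.

```python
import random
from collections import Counter


def top_escalation_topics(escalations: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Most frequently escalated topics this month: your newest knowledge gaps."""
    return Counter(e["topic"] for e in escalations).most_common(n)


def spot_check(reply_ids, k: int = 10, seed=None) -> list:
    """Pick k replies at random (without repeats) for manual grading."""
    rng = random.Random(seed)
    pool = list(reply_ids)
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical month of escalations.
escalation_log = (
    [{"topic": "bulk discount"}] * 3
    + [{"topic": "international shipping"}] * 2
    + [{"topic": "gift cards"}]
)

print(top_escalation_topics(escalation_log, n=2))
print(spot_check(range(1, 201), k=10, seed=7))
```

Passing a `seed` makes the sample reproducible, which helps if two people want to grade the same ten replies.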
11. What to Do Next
You've now got the whole playbook. The gap between reading this and having a trained AI in production is smaller than you think — probably a focused weekend plus two weeks of light daily review. Here's the concrete sequence:
- Today (30 min): Open a spreadsheet. Audit every support document you already have. List the gaps.
- This week (2–4 hours): Write canonical answers for your top 10 questions using the template in section 5. Export 50–100 past customer email threads for tone training. Draft a one-page tone guide.
- This weekend (1 hour): Open a free Leadilla account. Connect your support mailbox. Upload everything from the audit — FAQs, policies, product docs, past emails, tone guide, escalation rules, brand voice samples. Unlimited knowledge base on every plan, so upload without rationing.
- Week 1: Run in draft-only mode. Review every AI reply. Log knowledge gaps, tone misses, policy misfires. Update the KB daily.
- Week 2: Flip auto-send on for green-light categories. Keep monitoring. Expect 50–65% auto-resolution by Friday.
- Monthly thereafter: 30-minute review. Refresh docs. Stay sharp.
That's it. That's how to train an AI on your business — not vaguely, not eventually, but concretely, in the next 17 days. The teams that win with AI aren't the ones with the most budget or the fanciest tools. They're the ones who actually sit down and train it.
For more on the adjacent disciplines, see our guides on automating customer support email end-to-end and comparing generic chat tools to dedicated AI email agents. And when you're ready to see the tool side of this, the features page walks through exactly how Leadilla's knowledge base, retrieval, and escalation engine fit together. The pricing page confirms what we said earlier: unlimited knowledge base on every tier, no gating, no per-document fees.
Train your AI this weekend.
Unlimited knowledge base on every plan. Free to start. No credit card. Connect your mailbox, upload your docs, and see the AI replying like a trained hire in under 30 minutes.
Open Free Account