# Knowledge Base Architect
## The Core Problem This Solves
Coaches and consultants have years of accumulated expertise — frameworks, client cases, articles, coaching notes, session transcripts, research — living in scattered formats across Notion, Google Drive, email, and their own heads. They want to be able to have conversations with their own knowledge.
The reframe from the Jan 22 session: your website answers the questions people already know they have. Your knowledge base answers the questions they’ll have next. This is the shift from being a content publisher to being an answer provider.
This skill guides you through the architecture decision, content preparation, and deployment — whether you’re building for yourself, for clients, or as a product.
## Step 1: Intake Interview
Before recommending an architecture, answer these 8 questions. If the practitioner is present, ask them directly. If working from a brief, extract answers where available and flag gaps.
- Content volume and types — How much content do you have? What formats? (PDFs, Word docs, Notion pages, transcripts, audio, video, web articles, emails)
- Query patterns — What specific questions should this knowledge base be able to answer? Give 3–5 examples of queries you’d want to run.
- Privacy requirements — Does the content include confidential client information, proprietary IP you don’t want on third-party servers, or personal data?
- Intended users — Who queries this? Just you? Your team? Paying clients? Public users?
- Technical capacity — On a scale of 1–10, how comfortable are you with installing software, managing a server, or running terminal commands?
- Budget — Monthly budget for hosting and services? (<$50, <$200, or open-ended)
- Timeline — Do you need something working in a day, a week, or a month?
- Purpose — Is this for personal productivity, replacing a FAQ for clients, or building a product you intend to sell or offer as a service?
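If you are capturing the interview programmatically, the eight answers fit naturally into a small record, with a place to flag gaps when working from a brief. A minimal sketch; all field names here are illustrative, not a required schema:

```python
from dataclasses import dataclass, field

@dataclass
class IntakeAnswers:
    """One record per intake interview (field names are illustrative)."""
    content_volume: str         # e.g. "~300 docs: PDFs, Notion exports, transcripts"
    example_queries: list[str]  # the 3-5 queries the KB must answer
    is_confidential: bool
    intended_users: str         # "self", "team", "clients", "public"
    technical_capacity: int     # self-rated 1-10
    monthly_budget: str
    timeline: str
    purpose: str
    gaps: list[str] = field(default_factory=list)  # unanswered questions to flag

    def flag_gap(self, question: str) -> None:
        """Record a question the brief did not answer."""
        self.gaps.append(question)
```

The `gaps` list makes "extract answers where available and flag gaps" an explicit artifact rather than a mental note.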
## Step 2: Architecture Decision Tree
Use this tree to reach a recommendation. The order matters — privacy and technical capacity are the deciding factors; cost and timeline are tie-breakers.
```
Is the content private/confidential?
├── YES → Do you have technical capacity (5+/10)?
│   ├── YES → Local Qdrant + Open Web UI (or LibreChat)
│   └── NO  → Commission via Upwork/Fiverr brief (see Step 6)
│             OR Perplexity Spaces (if content can be semi-public)
└── NO → Is this client-facing or a product?
    ├── YES → Pinecone + ChatGPT Action (or Qdrant + LibreChat)
    └── NO → What's the primary use case?
        ├── Research synthesis → Perplexity Spaces or NotebookLM
        ├── Quick demos or audio synthesis → NotebookLM
        └── Full privacy + low cost → Open Web UI + local models
```
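The tree can also be sketched as a small helper function. The thresholds and return strings mirror the branches above; the `use_case` labels (`"research"`, `"demo"`) are illustrative shorthand, not part of the original tree:

```python
def recommend_architecture(confidential: bool,
                           technical_capacity: int,
                           client_facing: bool,
                           use_case: str = "") -> str:
    """Walk the decision tree: privacy first, then capacity, then use case."""
    if confidential:
        if technical_capacity >= 5:
            return "Local Qdrant + Open Web UI"
        return "Commission via Upwork/Fiverr (or Perplexity Spaces if semi-public)"
    if client_facing:
        return "Pinecone + ChatGPT Action"
    if use_case == "research":
        return "Perplexity Spaces or NotebookLM"
    if use_case == "demo":
        return "NotebookLM"
    # Remaining branch: full privacy at minimal cost
    return "Open Web UI + local models"
```

Encoding the tree this way also makes the tie-breaker rule explicit: cost and timeline never appear as branch conditions, only as inputs to the final choice between near-equivalent options.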
## Step 3: Architecture Options and Trade-offs
### Option A: Local Qdrant + Open Web UI
Best for: Practitioners with private/confidential content who want full control and are willing to invest setup time.
| Factor | Detail |
|---|---|
| Privacy | Maximum — data never leaves your machine or VPS |
| Technical complexity | 6/10 — requires Docker, Coolify or Hostinger VPS setup |
| Cost | ~$5–20/month VPS hosting; inference costs separate |
| Setup time | 4–8 hours for first deployment |
| Query quality | High, especially for long-form and nuanced content |
| Scalability | Good — Qdrant handles millions of vectors |
Recommended stack: Qdrant (vector DB) + Open Web UI (chat interface) + Ollama or Claude API (inference) + Coolify (self-hosting orchestration).
When to avoid: If you need clients to access this immediately and have no DevOps support.
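As a rough sketch of what ingestion looks like on this stack: the records you upsert into Qdrant pair an id and a vector with a metadata payload. The function below only builds those records in plain Python; `embed` stands in for whatever embedding call you actually use (Ollama, Claude API, etc.), and the payload field names are illustrative:

```python
def to_qdrant_points(chunks: list[dict], embed) -> list[dict]:
    """Shape prepared chunks into id/vector/payload records for upserting.

    `embed` is a placeholder for your embedding function; each chunk dict
    is assumed to carry its text plus the tags defined in Step 5.
    """
    return [
        {
            "id": i,
            "vector": embed(chunk["text"]),
            "payload": {
                "source": chunk["source"],
                "type": chunk["type"],   # transcript / article / framework / note
                "date": chunk["date"],
                "topics": chunk.get("topics", []),
            },
        }
        for i, chunk in enumerate(chunks)
    ]
```

Keeping the payload rich at ingestion time is what makes filtered retrieval ("only frameworks", "only content after 2024") possible later without re-indexing.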
### Option B: Pinecone + ChatGPT Action
Best for: Practitioners who want to deploy a client-facing GPT product without managing infrastructure.
| Factor | Detail |
|---|---|
| Privacy | Moderate — content lives on Pinecone’s cloud servers |
| Technical complexity | 4/10 — primarily configuration, minimal coding |
| Cost | Pinecone free tier to ~$70/month depending on volume; OpenAI API costs scale with usage |
| Setup time | 2–4 hours |
| Query quality | High for chunked documents; lower for nuanced reasoning |
| Scalability | Excellent for public-facing use |
Recommended stack: Pinecone (vector DB) + OpenAI Embeddings + ChatGPT Custom GPT with Action.
When to avoid: Confidential client content; if you prefer not to depend on OpenAI.
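For orientation: a ChatGPT Custom GPT Action is configured with an OpenAPI schema describing your retrieval endpoint. A minimal sketch of that schema might look like the following; the server URL, path, and field names are placeholders, not a working endpoint:

```yaml
openapi: 3.1.0
info:
  title: Knowledge Base Query   # illustrative title
  version: "1.0"
servers:
  - url: https://your-api.example.com   # placeholder URL
paths:
  /query:
    post:
      operationId: queryKnowledgeBase
      summary: Retrieve relevant chunks from the vector index
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                question: { type: string }
                top_k: { type: integer, default: 5 }
      responses:
        "200":
          description: Matching chunks with source metadata
```

The endpoint behind this schema is the small piece of glue code that embeds the question, queries Pinecone, and returns the matching chunks for the GPT to cite.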
### Option C: Perplexity Spaces
Best for: Research synthesis and public or semi-public content; practitioners who want zero setup friction.
| Factor | Detail |
|---|---|
| Privacy | Low — content is visible to Perplexity |
| Technical complexity | 1/10 — browser-based, no installation |
| Cost | Free to $20/month (Pro) |
| Setup time | Under 1 hour |
| Query quality | Strong for synthesis across sources; weaker for retrieval from specific private documents |
| Scalability | Limited by Space size and Perplexity’s rate limits |
When to avoid: Any content with confidentiality requirements; if you need to answer questions from your own unpublished frameworks.
### Option D: NotebookLM
Best for: Audio synthesis (podcast-style summaries), quick demos with clients, and exploratory use before committing to infrastructure.
| Factor | Detail |
|---|---|
| Privacy | Low to moderate — content goes to Google |
| Technical complexity | 1/10 |
| Cost | Free (as of early 2026) |
| Setup time | Under 30 minutes |
| Query quality | Strong for document-grounded Q&A; excellent audio output |
| Scalability | Not suitable as a primary KB; limited sources per notebook |
When to avoid: As a primary knowledge base; for confidential content; for anything requiring persistent memory across sessions.
### Option E: Open Web UI + Local Models
Best for: Full privacy and lowest ongoing cost; practitioners with technical capacity and consumer-grade hardware.
| Factor | Detail |
|---|---|
| Privacy | Maximum — runs entirely on your local machine |
| Technical complexity | 7/10 — requires Ollama setup, model downloads, document ingestion pipeline |
| Cost | $0 ongoing (hardware you already own) |
| Setup time | 2–6 hours depending on hardware |
| Query quality | Depends heavily on model choice; weaker than API models for complex reasoning |
| Scalability | Limited by local hardware |
When to avoid: If query quality matters more than cost; if you need to access the KB from multiple devices or share with clients.
## Step 4: Minimum Viable Content List
The practitioner almost always has more content than they need. The problem is prioritization. Identify the 10–20 assets that should be ingested first by asking:
- Which 5 documents contain your most distinctive frameworks or methodologies?
- Which content do clients most often ask you to repeat, explain, or send?
- Which content took you the longest to produce and is least likely to be found elsewhere?
- Which transcripts or notes capture your best thinking at its clearest?
- What would you most want to be able to ask a question about at 11pm on a Tuesday?
Prioritize: frameworks and methodology documents, your best articles and long-form content, heavily annotated session notes, any content that answers your most frequently asked questions.
Deprioritize for the first version: early drafts, content that duplicates other assets, client deliverables that are highly specific and won’t generalize.
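One way to make the shortlist concrete is to score each asset by how many of the questions above it satisfies, then take the top 10–20. A minimal sketch, where the boolean flags are assumptions you set while reviewing each asset:

```python
def shortlist(assets: list[dict], n: int = 20) -> list[str]:
    """Rank assets by how many prioritization criteria they hit."""
    def score(asset: dict) -> int:
        # Each flag corresponds to one of the Step 4 questions.
        return sum([
            asset.get("distinctive_framework", False),
            asset.get("frequently_requested", False),
            asset.get("hard_to_replace", False),
            asset.get("best_thinking", False),
        ])
    ranked = sorted(assets, key=score, reverse=True)
    return [asset["title"] for asset in ranked[:n]]
```

Even a crude score like this forces the useful conversation: which assets hit multiple criteria, and which are only in the pile because they exist.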
## Step 5: Chunking and Tagging Strategy
Good retrieval depends on how content is prepared before ingestion. Adapt this based on the chosen architecture.
Chunk size:
- Coaching transcripts and long-form articles: 500–800 tokens per chunk with 50-token overlap
- Short-form content (LinkedIn posts, brief notes): 200–300 tokens, minimal overlap
- Structured documents (frameworks, step-by-step guides): chunk by section, not by token count — keep a full step or sub-section together
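A sliding-window chunker along these lines covers the first two cases. This sketch splits on whitespace words as a stand-in for tokens; a real pipeline should count model tokens with a proper tokenizer:

```python
def chunk_text(text: str, chunk_tokens: int = 650, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly `chunk_tokens` words.

    Defaults match the transcript/long-form guidance above (500-800 tokens,
    50-token overlap). Whitespace words approximate tokens here.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_tokens - overlap  # advance by chunk size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_tokens]))
        if start + chunk_tokens >= len(words):
            break  # last window already reached the end of the text
    return chunks
```

For short-form content, call it with `chunk_tokens=250, overlap=0`; structured documents should bypass it entirely and be split on section boundaries instead.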
Tag taxonomy (recommended minimum):
- `date` — when was this content created or published?
- `type` — transcript, article, framework, note, client deliverable
- `topic` — 2–4 topic tags from a controlled vocabulary you define in advance
- `framework` — if this content introduces or applies a named framework, tag it
- `audience` — self, client-facing, public
Metadata fields to always include:
- Source title or identifier
- Author (important if ingesting content from others)
- Date
- Type
## Step 6: Deployment Brief
Use this template to produce a brief that can be handed to a Fiverr or Upwork freelancer if you don’t want to implement it yourself.
Knowledge Base Deployment Brief
Objective: Build a personal AI knowledge base that allows [me / my clients] to query [X years / X documents] of [type of content].
Recommended stack: [Architecture from Step 2]
- Vector database: [e.g., Qdrant self-hosted on Coolify / Pinecone cloud]
- Inference: [e.g., Claude API / OpenAI API / local Ollama]
- Interface: [e.g., Open Web UI / ChatGPT Custom GPT / LibreChat]
- Hosting: [e.g., Hostinger VPS / local machine / cloud]
Content to ingest: [X documents, Y total approximate size, formats listed]
Chunk strategy: [tokens per chunk, overlap, any section-based rules]
Tag taxonomy: [list tags]
Estimated setup time: [from architecture table above]
Testing protocol:
- After ingestion, run these 5 test queries: [list your actual queries from Step 1]
- Verify that results reference the correct source documents
- Check that at least 3 queries return answers that include source citations
- Confirm the interface is accessible at [URL / local address]
Deliverables expected:
- Running deployment accessible at [URL or local address]
- Brief documentation of how to add new content
- Instructions for querying from [device / browser]
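The testing protocol in the brief can be automated against whatever query interface the freelancer delivers. The sketch below assumes the deployment exposes a callable returning a dict with `answer` and `sources` keys; that response shape is an assumption, so adapt it to the actual API:

```python
def run_test_protocol(query_fn, test_queries: list[str], min_cited: int = 3) -> dict:
    """Run the acceptance checks from the deployment brief.

    `query_fn` is whatever callable wraps the deployed KB; each response
    is assumed to be a dict with 'answer' and 'sources' keys.
    """
    results = [query_fn(q) for q in test_queries]
    cited = sum(1 for r in results if r.get("sources"))
    return {
        "queries_run": len(results),
        "with_citations": cited,
        "passed": cited >= min_cited,  # brief requires >= 3 cited answers
    }
```

Verifying that the cited sources are the *correct* documents still needs a human eye; this only automates the presence checks.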
## Step 7: Maintenance Protocol
A knowledge base degrades if it is not maintained. Establish these habits from the start:
Ingestion cadence:
- Session transcripts: add within 48 hours of session
- New articles and posts: add on publish date
- New frameworks: add when finalized, not when drafted
Quarterly review:
- Flag content older than 18 months for review — is it still accurate?
- Check for content that has been superseded by newer frameworks
- Review your most-queried topics — are there gaps that should be filled?
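The 18-month staleness check is easy to script over your content inventory. A minimal sketch, approximating months as 30 days and assuming each item carries a `date` field:

```python
from datetime import date

def flag_stale(items: list[dict], today: date, months: int = 18) -> list[str]:
    """Return titles of content older than `months` (30-day approximation).

    Each item is assumed to be a dict with 'title' and a datetime.date
    under 'date'; both field names are illustrative.
    """
    cutoff_days = months * 30
    return [
        item["title"] for item in items
        if (today - item["date"]).days > cutoff_days
    ]
```

Running this at the start of each quarterly review turns "flag old content" from a memory exercise into a checklist.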
Version control:
- When you update a framework, mark the old version as deprecated rather than deleting it — your KB should be able to answer questions about how your thinking has evolved
- Keep a simple changelog (a plain text file works) that notes what was added and when
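Both version-control habits can be captured in a couple of small helpers. The metadata field names (`deprecated`, `superseded_by`) are illustrative conventions, not a standard:

```python
from datetime import date

def deprecate(metadata: dict, superseded_by: str, on: date) -> dict:
    """Mark an old framework version deprecated instead of deleting it,
    so the KB can still answer how your thinking has evolved."""
    return {
        **metadata,
        "deprecated": True,
        "superseded_by": superseded_by,
        "deprecated_on": on.isoformat(),
    }

def changelog_line(action: str, item: str, on: date) -> str:
    """Format one entry for a plain-text changelog file."""
    return f"{on.isoformat()}  {action}: {item}"
```

Appending `changelog_line(...)` output to a text file at every ingestion keeps the changelog habit cheap enough to actually sustain.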
## Architecture Summary Reference
| Architecture | Privacy | Technical | Cost/mo | Setup Time | Best For |
|---|---|---|---|---|---|
| Local Qdrant + Open Web UI | Maximum | 6/10 | $5–20 | 4–8 hrs | Private IP, product-builders |
| Pinecone + ChatGPT Action | Moderate | 4/10 | $0–70 | 2–4 hrs | Client-facing GPT products |
| Perplexity Spaces | Low | 1/10 | $0–20 | <1 hr | Research synthesis, public content |
| NotebookLM | Low–Moderate | 1/10 | $0 | <30 min | Audio demos, quick exploration |
| Open Web UI + Local Models | Maximum | 7/10 | $0 | 2–6 hrs | Full privacy, zero ongoing cost |
| Commission (Upwork/Fiverr) | Depends on stack | 0/10 | Variable | 1–2 weeks | No technical capacity |