“I have three layers of memory. One is the local wall, then it pushes up to Notion, and the other one goes to Pinecone — so I can do a RAG-type retrieval there. Basically three skills, and everything’s taken care of.” — Kasimir, 2026-05-21
Session context: 2026-05-21_Mastermind — Kasimir described his three-layer memory setup and Lou expanded it into a general pattern for AI-first knowledge management. The session also surfaced the failure mode the stack is designed to prevent: knowledge bases that grow until they become too slow to index in-line and too dense to navigate manually.
Core Idea
A single store cannot serve all of the access patterns an AI-augmented workflow needs. Recent context wants to be grep-fast and zero-latency. Medium-term knowledge wants to be structured and traversable as a graph. Long-term archive wants to be semantically searchable without forcing you to remember exact keywords. The Hot-Cache-Wiki-Semantic stack assigns each access pattern its own layer:
Layer 1 — Hot Cache (top-of-mind, recent files). A small, flat folder of plain text files representing what is currently in motion: today’s projects, this week’s drafts, the topics Claude knows you are thinking about. The retrieval mechanism is grep and glob — agentic, fast, no indexing overhead. The size cap is the point: when the cache exceeds what a single agent pass can scan, files graduate out.
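Here is a minimal Python sketch of the hot-cache contract. It assumes the cache is one flat folder of .md and .txt files. `grep_hot_cache` globs the folder and regex-matches every line, with no index to maintain. `graduation_candidates` lists the oldest files that would have to leave for the cache to fit back under a size cap. The function names, the accepted suffixes, and the 200 KB cap are illustrative assumptions, not details of Kasimir's or Lou's setup.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical cap standing in for "what a single agent pass can scan".
HOT_CACHE_MAX_BYTES = 200_000
# Assumed file types for a plain-text hot cache.
CACHE_SUFFIXES = {".md", ".txt"}


def _cache_files(root: Path) -> list[Path]:
    """Return the plain-text files sitting directly in the flat hot folder."""
    return sorted(p for p in root.glob("*") if p.is_file() and p.suffix in CACHE_SUFFIXES)


def grep_hot_cache(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Case-insensitive grep over every cache file: (file name, line number, line)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in _cache_files(root):
        for line_no, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            if regex.search(line):
                hits.append((path.name, line_no, line))
    return hits


def graduation_candidates(root: Path, max_bytes: int = HOT_CACHE_MAX_BYTES) -> list[str]:
    """Oldest-first (by mtime) list of files to move out until the cache fits under max_bytes."""
    files = _cache_files(root)
    total = sum(p.stat().st_size for p in files)
    candidates = []
    for path in sorted(files, key=lambda p: p.stat().st_mtime):
        if total <= max_bytes:
            break
        candidates.append(path.name)
        total -= path.stat().st_size
    return candidates
```

Oldest-by-modification-time is only a rough proxy for "no longer in motion". The explicit graduation rule in the Practical Application steps is the more deliberate version.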
Layer 2 — Wiki (medium-term, structured). A cross-linked knowledge base (Obsidian, the Karpathy-style LKB this vault uses, or similar). Files have frontmatter, bidirectional links, and category tags. Retrieval is keyword + link-traversal. This is the layer where the graph matters — the value is the connections, not the individual files. Wikis start to slow down somewhere between 500 and 2,000 entries (Lou is hitting friction around 800–900 in this vault); past that point, in-line link-graph updates become noticeable, and you either restructure or push the indexing layer off the documents themselves.
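The wiki's contract is keyword search plus link traversal. This small sketch assumes Obsidian-style [[Note Name]] or [[Note Name|alias]] links, with notes held in memory as a name-to-text dict. Links are parsed into an undirected graph, so every link doubles as a backlink. Keyword hits seed the search, and a breadth-first walk then pulls in linked notes up to a hop limit. The function names and the default hop limit are hypothetical.

```python
from __future__ import annotations

import re
from collections import deque

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures "Target".
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def build_link_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Undirected graph: a link from A to B is also a backlink from B to A."""
    graph: dict[str, set[str]] = {name: set() for name in notes}
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            target = target.strip()
            if target in graph and target != name:
                graph[name].add(target)
                graph[target].add(name)
    return graph


def traverse(graph: dict[str, set[str]], start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first walk from start, returning the notes reached within max_hops."""
    seen = {start}
    reached = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in sorted(graph.get(node, set())):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, depth + 1))
    return reached


def retrieve(notes: dict[str, str], keyword: str, max_hops: int = 1) -> list[str]:
    """Keyword hits first, then their link neighbourhood, without duplicates."""
    graph = build_link_graph(notes)
    kw = keyword.lower()
    results = sorted(n for n, text in notes.items() if kw in n.lower() or kw in text.lower())
    for seed in list(results):
        for name in traverse(graph, seed, max_hops):
            if name not in results:
                results.append(name)
    return results
```

The graph is rebuilt on every call here. That in-line rebuilding is exactly the cost that becomes noticeable as the entry count climbs into the hundreds and thousands.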
Layer 3 — Semantic Archive (long-term, embeddings). A vector store (Pinecone, pgvector, local FAISS) holding everything you have ever produced. Retrieval is “find me what is like this” — semantic, not lexical. The archive is where graduated wiki entries go to live forever, and where conversations from a year ago can still be surfaced by an idea you describe today.
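The archive's contract, "find me what is like this", comes down to nearest-neighbour search over embedding vectors. The sketch below is an in-memory stand-in for Pinecone, pgvector or FAISS. It assumes an `embed` function supplied by the caller; in practice that would be a call to an embedding model. It brute-forces cosine similarity, where a real vector store would use an approximate index. `SemanticArchive` and its methods are illustrative names, not any vendor's API.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SemanticArchive:
    """Brute-force in-memory vector store; embed is supplied by the caller."""
    embed: Callable[[str], Sequence[float]]
    items: dict[str, tuple[str, list[float]]] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        # Entries are embedded once on the way in and never deleted.
        self.items[doc_id] = (text, list(self.embed(text)))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        """Return the doc_ids whose embeddings are most similar to the query's."""
        q = self.embed(query)
        scored = sorted(
            ((cosine(q, vec), doc_id) for doc_id, (_, vec) in self.items.items()),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return [doc_id for _, doc_id in scored[:top_k]]
```

You can try it without a model by using a toy `embed` that counts overlap with a few hand-picked concept word sets. Even that is enough to see "how do I recall old notes" land on a note about memory rather than one about cooking, although the two share no exact phrase.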
The non-obvious move is that the layers are not just storage tiers — they are different retrieval contracts with the AI agent that uses them. The hot cache says “grep me.” The wiki says “traverse me.” The semantic archive says “embed your query and look me up.” A well-built stack tells the agent which contract applies when, so it does not waste tokens trying to fuzzy-search the hot cache or grep an embedding index.
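One way to make the retrieval contracts explicit to an agent is a small routing table. Each layer is bound to exactly one contract, and the hot cache is always consulted first. Any call that pairs a layer with the wrong contract is rejected. The word-count heuristic for picking the second stop (wiki or archive) and all names below are assumptions made for illustration.

```python
from __future__ import annotations

from enum import Enum


class Contract(Enum):
    GREP = "grep me"
    TRAVERSE = "traverse me"
    EMBED = "embed your query and look me up"


# Each layer honours exactly one retrieval contract.
LAYER_CONTRACTS = {
    "hot": Contract.GREP,
    "wiki": Contract.TRAVERSE,
    "archive": Contract.EMBED,
}

# Hypothetical threshold: queries this long read as descriptions, not keywords.
DESCRIPTIVE_MIN_WORDS = 5


def plan(query: str) -> list[tuple[str, Contract]]:
    """Order the layers for a query; the hot cache is always checked first."""
    if len(query.split()) >= DESCRIPTIVE_MIN_WORDS:
        order = ["hot", "archive", "wiki"]  # a described idea: semantic search next
    else:
        order = ["hot", "wiki", "archive"]  # a short keyword: wiki traversal next
    return [(layer, LAYER_CONTRACTS[layer]) for layer in order]


def check_call(layer: str, contract: Contract) -> None:
    """Reject mismatches such as fuzzy-searching the hot cache or grepping embeddings."""
    expected = LAYER_CONTRACTS[layer]
    if contract is not expected:
        raise ValueError(f"{layer} layer only supports {expected.name}, not {contract.name}")
```

Keeping the table as data, rather than burying it in prompt prose, lets the same mapping be stated in an agent's instructions and enforced in whatever tool wrapper sits in front of each store.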
Why This Matters for Knowledge Entrepreneurs
Most people start with one store — usually a wiki or note app — and stuff everything into it. That works until it does not. Around 1,000 entries, the wiki’s in-line indexing slows to the point where every save is a several-second affair and search starts missing the entries that would have been most useful. The instinct is to “find a better tool.” The diagnosis is wrong: the issue is not the tool, it is that you are asking one tool to serve three different access patterns.
Splitting the layers is a one-time architectural decision that turns the knowledge base from a liability back into an asset. The hot cache becomes the place Claude actually reaches first — it is small enough to scan in full, fresh enough to be relevant, and structured loosely enough that you can drop files in without ceremony. The wiki stops being the only place; it becomes the promotion target for hot-cache items that have earned permanence. The semantic archive stops being a research project; it becomes the long-tail safety net for everything else.
The compounding effect is in the promotion paths. A note in the hot cache survives a week because it stayed relevant. It graduates to the wiki because it earned a link from somewhere else. It eventually gets demoted into the semantic archive when it stops being actively referenced — but it is still there, retrievable by similarity. Each layer protects the layer above it from drowning, which is what makes the stack scale past the point where any single store would have broken.
Practical Application
Set up the stack incrementally — do not migrate everything at once:
- Create the hot cache. Add a /hot/ folder (or equivalent) at the top of your knowledge directory. Configure your AI agent’s CLAUDE.md or system prompt: “Anything I am actively working on goes here. Check here first.”
- Define the graduation rule for hot → wiki. Lou’s heuristic: anything you have referenced more than twice from outside the hot cache, or that has lived in the hot cache for more than ~2 weeks. Make the rule explicit so the agent can apply it without asking.
- Define the demotion rule for wiki → archive. Anything that has not been linked or queried in 90 days. The archive does not delete; it just stops counting against the wiki’s link-graph performance.
- Pick a semantic layer that integrates with the wiki. Pinecone, pgvector, or the new Pinecone product Lou flagged on the call (which runs wiki-style retrieval over a semantic backend) are all viable. The constraint is that your agent must be able to call it without leaving the current conversation.
- Audit quarterly. Walk the layers in order. Hot cache: prune anything stale. Wiki: confirm the link graph is still navigable. Archive: spot-check that semantic search still returns the right shape of answer.
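The graduation and demotion rules above are concrete enough to encode directly. This minimal sketch assumes each note carries a few hypothetical bookkeeping fields: its current layer, the date it entered that layer, a count of references from outside the hot cache, and the last date it was linked or queried. The thresholds mirror Lou's heuristic and the 90-day rule. One pass moves a note at most one layer, and the archive is terminal, so nothing is ever deleted.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

HOT_REF_THRESHOLD = 2                 # "referenced more than twice from outside the hot cache"
HOT_MAX_AGE = timedelta(days=14)      # "lived in the hot cache for more than ~2 weeks"
WIKI_IDLE_LIMIT = timedelta(days=90)  # "not been linked or queried in 90 days"


@dataclass
class Note:
    name: str
    layer: str           # "hot", "wiki" or "archive"
    entered_layer: date  # hypothetical field: when the note arrived in its current layer
    outside_refs: int    # hypothetical field: references from outside the hot cache
    last_touched: date   # hypothetical field: last time the note was linked or queried


def next_layer(note: Note, today: date) -> str:
    """Apply one step of the promotion/demotion rules."""
    if note.layer == "hot":
        aged_out = today - note.entered_layer > HOT_MAX_AGE
        if note.outside_refs > HOT_REF_THRESHOLD or aged_out:
            return "wiki"
    elif note.layer == "wiki":
        if today - note.last_touched > WIKI_IDLE_LIMIT:
            return "archive"
    return note.layer  # the archive is terminal: demotion never deletes


def apply_rules(notes: list[Note], today: date) -> list[tuple[str, str, str]]:
    """Move each note at most one layer and report (name, from, to) for every move."""
    moves = []
    for note in notes:
        target = next_layer(note, today)
        if target != note.layer:
            moves.append((note.name, note.layer, target))
            note.layer = target
            note.entered_layer = today
    return moves
```

The returned list of moves doubles as the quarterly audit trail. Reading it shows which notes are churning through the hot cache and which wiki entries have gone quiet.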
Coaching question: “Which layer is my current knowledge base actually serving — and which two layers am I trying to fake with the same tool?”
Related Insights
- Insight - Persistent AI Memory via MCP - Building a Cross-Session Intelligence Layer — A specific implementation of the cross-session layer; this insight describes the architecture that such an implementation fits inside.
- Insight - Distributed Agent Memory — Scope Memory to the Function, Not the Platform — Complementary principle: even within a layer, each agent should scope its memory tightly.
- Insight - Externalized Memory Escapes the Reconstruction Bias of Human Recall — Why externalising matters at all; the stack is the how.
- Insight - RAG Is Raw Material, Not Answers — Design for the Right Retrieval Architecture — The semantic-archive layer is RAG; this insight names the role RAG plays inside a broader stack.
- Insight - Ambient Intelligence — Build a Skill in Every Folder to Make Your Entire Knowledge Base Alive — The hot cache and the wiki both become “alive” when each folder has its own skill — the stack is what those skills retrieve against.
Evolution Across Sessions
Builds on Insight - Persistent AI Memory via MCP - Building a Cross-Session Intelligence Layer (which established that cross-session memory needs its own service) and Insight - Externalized Memory Escapes the Reconstruction Bias of Human Recall (which established why externalisation matters in the first place). The new development is the three-tier separation by retrieval contract, not just by storage duration. Prior insights treated memory as “in the conversation vs out of the conversation.” This one makes the structural point: out-of-conversation memory has at least three distinct sub-layers, each with a different retrieval mechanism, and trying to merge them into one store is the failure mode that breaks growing knowledge bases.
Source
- 2026-05-21_Mastermind (Kasimir — naming his three-layer setup; Lou — extending it into a general pattern and naming the failure mode when the wiki layer grows too large)