Most people use LLMs like a search engine with better phrasing:
give it documents
retrieve relevant chunks
get an answer
reset
Nothing accumulates. Every query starts from zero.
A more interesting pattern: treat the LLM as something that writes knowledge, not just retrieves it.
Rough sketch:
keep a folder of raw sources (articles, papers, notes)
have the LLM read them and produce markdown files
organize those into a small wiki
as new sources come in, the wiki updates
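A minimal sketch of the ingestion step, with a placeholder `summarize` function standing in for the LLM call (the real version would hit a model API; everything here is a hypothetical illustration, not a reference implementation):

```python
from pathlib import Path

def summarize(text: str) -> str:
    # Placeholder for the LLM call that compiles a raw source
    # into a structured markdown note (hypothetical).
    title = text.splitlines()[0] if text else "untitled"
    return f"# {title}\n\n{text[:200]}\n"

def ingest(sources_dir: Path, wiki_dir: Path) -> list[Path]:
    """Compile every raw source into a markdown page in the wiki."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    pages = []
    for src in sorted(sources_dir.glob("*.txt")):
        page = wiki_dir / f"{src.stem}.md"
        page.write_text(summarize(src.read_text()))
        pages.append(page)
    return pages
```

The key property is that the expensive reading happens once, at ingestion, so every later query runs against the compiled pages rather than the raw sources.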
Now you're not querying raw data. You're querying something already partially digested.
The difference is subtle but important.
Most systems are read-only — the model looks at your data and answers. This pattern is write-forward — the model compiles your data into structured knowledge that persists.
A few things fall out of this:
Queries become artifacts. If you ask something non-trivial ("compare these five ideas"), the answer is often worth keeping. So instead of returning text into a chat box, the model writes a new page. Over time, your questions literally expand the knowledge base.
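One way to sketch "queries become artifacts": instead of returning the answer to a chat box, persist it as a new page keyed by the question. The slugging scheme here is an arbitrary choice for illustration:

```python
from pathlib import Path

def answer_as_page(question: str, answer: str, wiki_dir: Path) -> Path:
    """Persist a non-trivial answer as a wiki page instead of
    discarding it after the chat turn."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    # Derive a filename slug from the question text
    slug = "-".join("".join(
        c if c.isalnum() else " " for c in question.lower()).split())
    page = wiki_dir / f"{slug}.md"
    page.write_text(f"# {question}\n\n{answer}\n")
    return page
```

Because the page lands in the same wiki the model reads from, the next query can build on this answer instead of re-deriving it.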
The system improves without touching weights. There's an assumption that "making the model better" means fine-tuning. But a lot of leverage is elsewhere: better organization, better summaries, better cross-references. This is cheaper and often more effective than retraining.
You eventually need a harness. Once the wiki grows, the bottleneck isn't the model's capability. It's coordination: how does it find the right pages? How does it stay consistent? How does it recover from a bad update? At that point you're not writing prompts anymore. You're building a system around the model.
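The "recover from a bad update" part of the harness can be as simple as validate-then-swap. A sketch, assuming the harness supplies some `validate` check (hypothetical; real validation might be another LLM call or a schema check):

```python
from pathlib import Path

def safe_update(page: Path, new_text: str, validate) -> bool:
    """Apply an update atomically, and only if it passes validation,
    so a bad model output never clobbers a good page."""
    if not validate(new_text):
        return False  # reject; the existing page survives untouched
    tmp = page.with_suffix(".tmp")
    tmp.write_text(new_text)  # stage the new content
    tmp.replace(page)         # atomic swap into place
    return True
```

The point is not this particular mechanism but that concerns like this live entirely in the harness, outside the model.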
Three things happened recently that confirm this is the right frame.
1. Karpathy's tweet
On April 3, 2026, Karpathy posted something titled "LLM Knowledge Bases" describing how he now uses LLMs to build personal knowledge wikis instead of generating code. The tweet went massively viral — 1.2 million views in days.
His current knowledge base on a recent research topic: roughly 100 articles, 400,000 words. Longer than most PhD dissertations. Built without typing a single word.
What made people stop wasn't the idea of a wiki. It was the framing.
RAG produces answers, but it doesn't build lasting knowledge. Karpathy's approach processes documents at the time of ingestion, not at query time. The result is a permanent, structured product — you store and retrieve with a high degree of control.
But the more interesting move was the follow-up. Instead of sharing code or an app, he shared an idea file — a GitHub gist. The reasoning: in the era of LLM agents, there's less point sharing specific implementations. You share the idea. Each person's agent builds a version customized for their specific needs.
That's a meaningful statement about where AI development is going. The product is increasingly the concept, not the code.
2. The Claude Code leak
In late March 2026, Anthropic accidentally shipped the full source code for Claude Code — roughly 512,000 lines of TypeScript — inside a routine npm update. Within hours it was mirrored across GitHub and studied by thousands of developers.
Most coverage focused on the security incident. That's the wrong frame.
What the leak actually revealed: 40 permission-gated tools covering file operations, shell execution, web fetching, and code navigation. A 46,000-line query engine handling API calls, token caching, context management, and retry logic. A three-layer memory architecture designed explicitly to fight "context entropy" — the phenomenon where agents gradually lose the thread as context windows fill up.
Claude itself is available to anyone with an API key. The model wasn't the revelation. The scaffolding around it was what people were actually studying.
The AI engineering discipline has shifted:
2023–2024 → prompt engineering (how to ask the model)
2025 → context engineering (what information to feed the model)
2026 → harness engineering (how the entire system runs around the model)
The leak confirmed what many suspected but couldn't prove: the harness layer is where the real product lives. Not the model. The harness.
3. What LangChain wrote about continual learning
LangChain recently published a piece that named this cleanly.
Most discussions of continual learning focus on one thing: updating model weights. But for AI agents, learning can happen at three distinct layers:
model → weights (expensive, slow to change)
harness → loops, tools (medium cost, flexible)
context → documents (cheap, immediate)
The core idea at the harness layer: the agent runs over a batch of tasks, you evaluate the results, store the logs in a filesystem, then run a coding agent over those traces to suggest changes to the harness code. The system improves its own scaffolding, not its weights.
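That loop could look roughly like this, with `run_task`, `evaluate`, and `suggest_changes` as hypothetical stand-ins for the agent, the evaluator, and the coding agent (a sketch of the pattern, not LangChain's actual code):

```python
import json
from pathlib import Path

def improve_harness(tasks, run_task, evaluate, suggest_changes, log_dir: Path):
    """One iteration: run tasks, score them, persist traces to disk,
    then ask a coding agent for harness changes based on those traces."""
    log_dir.mkdir(parents=True, exist_ok=True)
    traces = []
    for i, task in enumerate(tasks):
        output = run_task(task)
        score = evaluate(task, output)
        trace = {"task": task, "output": output, "score": score}
        (log_dir / f"trace_{i}.json").write_text(json.dumps(trace))
        traces.append(trace)
    # The coding agent reads the traces and proposes edits to the harness
    return suggest_changes(traces)
```

Everything the loop touches is ordinary code and files, which is exactly why this layer is cheap to iterate on.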
At the context layer, learning can happen after the fact in an offline job, or in the hot path as the agent is running. The wiki pattern is exactly this: context learning, running continuously, folding outputs back in.
A useful way to think about where leverage actually is:
Most people optimize the model. The model is the thing you have the least control over and the least ability to improve cheaply.
The harness and the context layer are where you actually build. They're cheaper to iterate, faster to fix, and compound in ways that weight updates don't.
Combine all three — the wiki pattern, the harness insight, the continual learning frame — and you get something self-accumulating:
raw data → stored once
LLM → compiles it into structured knowledge
queries → produce new structured outputs
outputs → fold back in
harness → improves from its own traces
The system doesn't reset. It compounds. And the compounding happens mostly outside the model.
This is closer to a compiler than a search engine. Raw sources go in. Structured, queryable, self-improving knowledge comes out.
Not perfect at scale yet. But the direction is clear.
LLMs are less interesting as question-answerers and more interesting as knowledge compilers — systems that accumulate, not just retrieve. The model is the least interesting part of that sentence.
We build AI systems that last, tailored to your use case rather than one-size-fits-all. Book a call to implement your AI agent.