Summary
Andrej Karpathy shares a practical workflow for using LLMs to build and maintain personal knowledge bases—a paradigm shift from treating language models as one-off question answerers to treating them as knowledge infrastructure. Rather than chatting with an LLM and discarding the conversation, Karpathy describes a system where raw source documents (articles, papers, datasets, images) are collected into a directory and progressively compiled by an LLM into a structured markdown wiki. The LLM maintains this wiki through automated summarization, backlink creation, concept categorization, and article generation across topics.
The workflow uses Obsidian as the frontend viewing/editing interface, with Karpathy rarely touching the wiki manually—the LLM does all the writing and maintenance. Once the wiki reaches sufficient scale (his example: ~100 articles and 400,000 words), complex Q&A queries can be run against it using LLM agents. Critically, outputs from queries are "filed back" into the wiki, creating a compounding effect where the knowledge base grows automatically through use. Karpathy also describes running "health checks"—LLM passes that identify inconsistencies, impute missing data via web search, and suggest new research directions.
The approach deliberately avoids traditional RAG (Retrieval-Augmented Generation) systems with embeddings and vector databases, betting instead that modern LLMs with large context windows can maintain their own indices and summaries without needing auxiliary infrastructure. Karpathy acknowledges the approach works well at modest scales but suggests the methodology points toward a major product opportunity—today it requires a "hacky collection of scripts" that only someone like Karpathy could assemble. He hints at the next frontier: synthetic data generation and fine-tuning so the LLM eventually internalizes the knowledge in its weights rather than constantly retrieving it from context. The post resonates with broader themes Karpathy has been exploring about the shift from "vibe coding" (2025) to "agentic engineering" (2026), where humans orchestrate AI agents rather than writing code directly.
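The compile-and-file workflow described above can be sketched in a few dozen lines. This is a minimal illustration, not Karpathy's actual tooling: the `wiki/` and `sources/` layout, the prompts, and the injected `run_llm` callable (any `prompt -> text` function wrapping an LLM API or CLI) are all assumptions made for the sketch.

```python
from pathlib import Path

WIKI = Path("wiki")        # compiled markdown articles (hypothetical layout)
SOURCES = Path("sources")  # raw clipped articles, papers, notes

def compile_pass(run_llm):
    """One compile pass: turn each unprocessed source into a wiki article.

    `run_llm` is any callable `prompt -> text`; it is injected so the
    sketch stays self-contained and testable.
    """
    WIKI.mkdir(exist_ok=True)
    # Give the model the current page index so it can create [[backlinks]].
    index = "\n".join(p.stem for p in WIKI.glob("*.md"))
    for src in sorted(SOURCES.glob("*.md")):
        article = WIKI / src.name
        if article.exists():
            continue  # already compiled on an earlier pass
        prompt = (
            "Existing wiki pages (link to them with [[wikilinks]]):\n"
            f"{index}\n\n"
            "Rewrite the following source as a structured wiki article "
            "with a summary, concept backlinks, and a category tag.\n\n"
            + src.read_text()
        )
        article.write_text(run_llm(prompt))

def ask(question, run_llm):
    """Answer a question against the wiki, then file the answer back in."""
    context = "\n\n".join(p.read_text() for p in WIKI.glob("*.md"))
    answer = run_llm(f"Using this wiki:\n{context}\n\nQuestion: {question}")
    # The filing loop: query outputs become wiki pages themselves,
    # so the knowledge base compounds through use.
    (WIKI / f"qa-{abs(hash(question))}.md").write_text(
        f"# Q: {question}\n\n{answer}\n"
    )
    return answer
```

The key design choice mirrored here is that the LLM, not auxiliary infrastructure, maintains the index: each compile pass feeds the current page list back into the prompt instead of consulting a vector store.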
Key Takeaways
LLMs can act as 'knowledge compilers' that automatically transform raw source documents into structured, interconnected markdown wikis without requiring manual curation—the opposite of traditional personal knowledge management systems.
The wiki structure itself (backlinks, concept pages, categorized articles) becomes the primary knowledge artifact, not the source documents, enabling regeneration and iterative improvement when queries change focus.
A 'filing loop' where query outputs are automatically saved back to the wiki creates compounding value: every question asked strengthens the knowledge base for future questions, eliminating the cold-start problem of traditional chat interfaces.
This workflow deliberately bypasses fancy RAG pipelines with vector databases and embeddings, instead relying on modern LLMs' ability to maintain indices and read relevant materials within large context windows—this simplification works well at modest scales (~400K words).
Automated 'health checks' by LLM agents continuously improve data integrity by finding inconsistencies, imputing missing information via web search, and surfacing candidates for new articles the researcher should investigate.
The path forward includes synthetic data generation and fine-tuning so the LLM internalizes a researcher's personal domain knowledge in its weights, creating a private intelligence model rather than requiring context-window retrieval on every query.
There is a significant product gap: Karpathy's current setup is a collection of custom CLI tools, scripts, Obsidian plugins, and prompt engineering that only technical users can replicate—a polished product could democratize this workflow.
This represents a fundamental shift from LLMs-as-answer-machines to LLMs-as-knowledge-infrastructure, requiring a different mental model of what language models are for rather than simply a productivity hack.
The workflow requires tools like Obsidian Web Clipper (to convert web articles to markdown) and local image storage so the LLM can process both text and images holistically, rather than operating on incomplete information.
At larger scales (millions of documents), this approach would struggle with context window limits, but Karpathy argues most focused research domains don't need millions of documents—just the right ~100 sources organized intelligently.
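A deterministic slice of the "health checks" mentioned above needs no LLM at all: scanning the wiki for dangling wikilinks (links to pages that don't exist, i.e. candidates for new articles) and orphan pages (articles nothing links to). A minimal sketch, assuming Obsidian-style `[[wikilink]]` syntax and a flat directory of `.md` files:

```python
import re
from pathlib import Path

# Matches the target in [[Page]], [[Page|alias]], and [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def health_check(wiki_dir):
    """Report dangling links (no target page) and orphan pages (never linked)."""
    pages = {p.stem: p.read_text() for p in Path(wiki_dir).glob("*.md")}
    links = {
        (src, target.strip())
        for src, text in pages.items()
        for target in WIKILINK.findall(text)
    }
    linked_to = {target for _, target in links}
    dangling = sorted({t for t in linked_to if t not in pages})
    orphans = sorted(name for name in pages if name not in linked_to)
    return dangling, orphans
```

In the workflow described in the post, the dangling-link list is exactly the kind of output an LLM pass could consume to impute missing pages via web search or to surface new research directions.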
About
Author: Andrej Karpathy
Publication: X (Twitter)
Published: 2026-04-02
Sentiment / Tone
Pragmatic and exploratory with optimistic undertones. Karpathy adopts a matter-of-fact tone when describing what he's built ("Something I'm finding very useful recently"), but the post carries genuine enthusiasm about the potential. He's candid about limitations (the approach works at modest scales, it's currently a hacky collection of scripts), but frames these as problems waiting to be solved rather than fundamental flaws. The closing line—"I think there is room here for an incredible new product instead of a hacky collection of scripts"—signals both his satisfaction with what he's discovered and a pragmatic acknowledgment that the mainstream won't adopt this until someone polishes it. He avoids hype; instead, he documents his working system with enough detail for others to replicate it, positioning the post as a "here's what I'm doing" share rather than a grand manifesto. This tone is characteristic of Karpathy's broader commentary in 2026 about AI tooling and workflows—observational, evidence-based, and focused on the gap between capability and product-market fit.
Related Links
Glen Rhodes: Karpathy's LLM Knowledge Base Workflow. Detailed analysis expanding on Karpathy's post, explicitly framing this as a personal computing paradigm shift and critiquing the 'cold start' problem of traditional chat interfaces. Rhodes contextualizes Karpathy's filing loop as the novel contribution that allows knowledge bases to compound through use.
NextBigFuture: Karpathy on Code Agents and AutoResearch. Positions the LLM knowledge base work within Karpathy's broader 2026 vision of 'agentic engineering' and autonomous research loops. Shows how knowledge base compilation fits into his larger thinking about human-agent collaboration and self-improving AI systems.
VentureBeat: Karpathy's Knowledge Base Architecture. Enterprise-focused coverage emphasizing the distinction from RAG and the eventual path to fine-tuning models on personal knowledge, creating custom private intelligence. Highlights the product opportunity gap Karpathy identified.
Ole Lehmann on X: Karpathy 'casually described the future of AI'. Influential tech voice breaking down Karpathy's post for a broader audience and amplifying its reach. Lehmann's framing helped establish this as a significant insight rather than a niche workflow, driving further discussion.
AutoResearch: Karpathy's Autonomous Research Agent. Related project showing agents autonomously running research loops (design experiments, edit code, collect data, optimize). Knowledge bases and research agents are complementary: the knowledge base is where research results accumulate; AutoResearch is how new experiments are generated.
Research Notes
Andrej Karpathy is a uniquely credible voice on this topic. He's a founding member of OpenAI (2015-2017), served as Director of AI at Tesla where he led the computer vision team for Autopilot, and is currently founder of Eureka Labs, focused on modernizing education in the age of AI. He was the architect of CS231n, Stanford's foundational deep learning course. His background spans both cutting-edge AI research and hands-on systems engineering, giving him authority on both the "what's theoretically possible" and "what's practically useful" dimensions.
The post resonates across multiple communities. Tech influencer Ole Lehmann described it as Karpathy having "casually described the future of ai," while communities on Reddit, Hacker News, and specialized blogs like Dair.ai quickly elaborated on the approach with diagrams and tutorials. Glen Rhodes wrote a detailed analysis arguing this represents a bigger shift than most people realize—not just a productivity hack, but a fundamental change in how knowledge workers interact with AI. VentureBeat and other publications covered it within 24-48 hours, signaling cross-market relevance.
Reactions suggest both enthusiasm and identified limitations. Žiga Drev's response notes a critical gap: Karpathy's wiki is local, unverifiable, and siloed to a single agent—he proposes adding distributed verification and multi-agent knowledge sharing via blockchain-like structures. This points to the next evolution: taking Karpathy's personal workflow and scaling it to collaborative, verifiable knowledge systems.
The timing is significant. This post arrives in early April 2026, roughly 14 months after Karpathy popularized the term "vibe coding" (Feb 2025) to describe programming in the age of capable AI assistants. He's now articulating a parallel shift in knowledge work—from treating LLMs as utilities you query to treating them as infrastructure you collaborate with. Combined with his work on AutoResearch (agents that autonomously close research loops) and MicroGPT (a 243-line GPT implementation), the post fits into a broader narrative Karpathy is building: the role of human technical work is shifting from writing code/organizing knowledge to orchestrating, supervising, and directing autonomous agents.
One caveat: the approach works well within Karpathy's constraints (he's using modern LLMs with 200K+ token context windows, he has technical skill to wire together CLI tools, he's willing to maintain a local markdown setup). For most users, this requires either a polished commercial product or significant technical lift to replicate.
The knowledge base isn't the only emerging pattern—similar ideas appear in concurrent work on AI-powered research agents (AutoResearch), knowledge graph systems, and context-management frameworks, suggesting this is a broader inflection point rather than an isolated insight.
Topics
LLM knowledge management systems
Personal knowledge bases and PKM
Obsidian markdown workflows
RAG alternatives and knowledge compilation
Agentic AI engineering
AI-powered research automation