MemPalace: The Highest-Scoring AI Memory System Ever Benchmarked — And It's Free

https://x.com/bensig/status/2041236952998171118?s=12
Product launch announcement (X/Twitter thread) + technical deep-dive documentation · Researched April 7, 2026

Summary

Ben Sigman announced MemPalace, an open-source AI memory system co-created with a developer using the GitHub pseudonym "Milla Jovovich" (also known as "Aya The Keeper"), that achieves unprecedented benchmark scores while remaining completely free and locally hosted. The system addresses a critical problem in AI workflows: conversations with Claude, ChatGPT, and other LLMs contain crucial decisions, architectural debates, and debugging insights, but all of this context disappears when the session ends. A solo developer interacting with AI for six months generates roughly 19.5 million tokens of accumulated knowledge—impossible to reload into any context window, yet too valuable to lose.

MemPalace's innovation lies not in retrieval mechanisms but in information architecture. Instead of letting AI decide what's "worth remembering" (which loses context), it stores everything locally and makes it searchable through a hierarchical palace structure inspired by ancient Greek memory techniques. The system organizes conversations into "wings" (people and projects), "halls" (memory types like decisions, preferences, and advice), "rooms" (specific topics like "auth-migration"), and "closets" containing compressed summaries that point to original verbatim content. This structural approach alone delivers a 34% retrieval improvement over flat-search alternatives.
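The announcement describes this hierarchy but not its implementation, so the following is a minimal sketch of how a wing → hall → room store with scoped search might look. All class, field, and example names here are hypothetical illustrations, not MemPalace's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Room:
    # A room's "closet": compressed summaries paired with paths to the
    # verbatim originals they point back to.
    summaries: list = field(default_factory=list)  # (summary, path_to_original)

@dataclass
class Palace:
    # wing -> hall -> room name -> Room
    wings: dict = field(default_factory=dict)

    def store(self, wing, hall, room, summary, original_path):
        node = (self.wings.setdefault(wing, {})
                          .setdefault(hall, {})
                          .setdefault(room, Room()))
        node.summaries.append((summary, original_path))

    def search(self, query, wing=None, hall=None):
        # Hierarchical scoping: narrow to a wing/hall before matching,
        # instead of scanning one flat store of all memories.
        hits = []
        for w, halls in self.wings.items():
            if wing and w != wing:
                continue
            for h, rooms in halls.items():
                if hall and h != hall:
                    continue
                for r, node in rooms.items():
                    for summary, path in node.summaries:
                        if query.lower() in summary.lower():
                            hits.append((w, h, r, summary, path))
        return hits

palace = Palace()
palace.store("project-api", "decisions", "auth-migration",
             "Chose JWT over sessions for stateless auth", "logs/2026-03-01.md")
palace.store("project-api", "preferences", "code-style",
             "Prefer explicit over implicit imports", "logs/2026-02-10.md")

print(palace.search("JWT", wing="project-api", hall="decisions"))
```

The real system pairs this structure with vector search (ChromaDB) rather than substring matching; the point of the sketch is only that the wing/hall scoping prunes the candidate set before any similarity ranking runs.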

The benchmarks are striking: 100% perfect recall on LongMemEval (500/500 questions, every category at 100%), 92.9% on ConvoMem (more than 2x Mem0's reported ~45%), and 100% on LoCoMo including temporal reasoning tasks. The raw score without any API calls is 96.6%—the highest published LongMemEval result requiring no cloud infrastructure. Crucially, this operates entirely offline with one dependency (ChromaDB), no subscriptions, and no API keys.

The system also introduces AAAK, a lossless compression dialect that shrinks context by 30x while remaining readable to any LLM without special decoding. A developer's identity, team structure, and critical facts fit into ~120 tokens in AAAK format—a yearly cost of roughly $0.70 in API tokens versus $500+ for competitor approaches relying on LLM summarization. The entire solution is released as MIT-licensed open source, directly challenging the subscription-based memory infrastructure (Mem0, Zep, Supermemory) that currently dominates the AI memory landscape.
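The cost comparison above can be sanity-checked with back-of-envelope arithmetic. The thread does not publish its pricing or usage assumptions, so the per-token price and session count below are hypothetical stand-ins; only the ~120-token block size and 30x ratio come from the announcement:

```python
# Back-of-envelope cost model for an AAAK-style compressed context block.
# AAAK_TOKENS and COMPRESSION_RATIO come from the announcement; the price
# and load frequency are assumptions for illustration only.

AAAK_TOKENS = 120            # compressed identity/context block
COMPRESSION_RATIO = 30       # claimed lossless compression factor
PRICE_PER_MTOK = 3.00        # hypothetical input price, USD per million tokens
LOADS_PER_DAY = 5            # hypothetical sessions that re-load the block

uncompressed_tokens = AAAK_TOKENS * COMPRESSION_RATIO
yearly_cost = AAAK_TOKENS * LOADS_PER_DAY * 365 * PRICE_PER_MTOK / 1_000_000
yearly_cost_uncompressed = (uncompressed_tokens * LOADS_PER_DAY * 365
                            * PRICE_PER_MTOK / 1_000_000)

print(f"uncompressed equivalent:  {uncompressed_tokens} tokens")
print(f"compressed yearly cost:   ${yearly_cost:.2f}")
print(f"uncompressed yearly cost: ${yearly_cost_uncompressed:.2f}")
```

Under these assumed figures the compressed block costs well under a dollar per year, in the ballpark of the cited $0.70. Note the $500+ competitor figure in the thread reflects LLM summarization pipelines, not raw token reloading, so this sketch does not reproduce that number.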

About

Author: Ben Sigman (co-creator with Milla Jovovich)

Publication: X (Twitter)

Published: 2026 (recent)

Sentiment / Tone

Confident and triumphant, with technical precision grounded in reproducible benchmarks. Sigman positions MemPalace as a paradigm shift—not an incremental improvement but a fundamental rethink of memory architecture inspired by ancient cognition techniques. The tone is matter-of-fact about the superiority of the results ("perfect score," "beating every product") while emphasizing accessibility and openness. There's implicit criticism of the subscription-based memory industry (Mem0, Zep, Supermemory charging $19–250/month) framed through celebration of free/open alternatives. The rhetoric emphasizes "nothing like it exists" and "first perfect score ever recorded," reflecting confidence in technical novelty.

Research Notes

**Author Background:** Ben Sigman is a technology veteran with 20+ years in systems engineering. He co-founded Bitcoin Libre (cryptocurrency/open finance), founded WeSolveProblems (IT consulting/security, 2004–2021), and has been exploring AI-assisted development via an "automate yourself" approach using Claude and Cursor. He co-authored a book predicting middle-class job automation by the 2030s and is bullish on Bitcoin. He has significant credibility in both traditional tech infrastructure and newer AI/automation spaces. The "Milla Jovovich" GitHub profile identifies its owner as the architect of MemPalace, also known as "Aya The Keeper." The collaboration appears sustained (months of development), suggesting complementary expertise.

**Benchmark Context:** LongMemEval (published at ICLR 2025) is a rigorous academic benchmark testing information extraction, multi-session reasoning, and temporal inference across conversations of up to 1.5 million tokens. Achieving 100% (500/500) is genuinely unprecedented: Supermemory reported ~99%, Mastra ~94.87%, and MemPalace's raw 96.6% is already the highest published score requiring zero API calls. ConvoMem tests sophisticated multi-message evidence retrieval, where Mem0 achieves only 30–45% according to its authors; MemPalace's 92.9% suggests a fundamentally different architectural approach. LoCoMo tests multi-hop reasoning and temporal inference, areas where most memory systems struggle.

**Competitive Positioning:** Mem0 (venture-funded, 48K+ GitHub stars) and Supermemory position themselves as production-ready, but both require cloud APIs and charge $19–250/month. MemPalace's claim of "free, local, open source" with superior benchmarks directly undercuts their value proposition. However, Supermemory's ~99% score suggests its approach remains competitive, and MemPalace's 100% hybrid score requires running Haiku for reranking (~500 API calls per benchmark run), which is not presented as a production cost model.
**Technical Innovation:** The palace architecture is genuinely novel. Traditional RAG and memory systems treat memories as a flat vector database; MemPalace adds structural semantics (wing → hall → room) that function as hierarchical retrieval heuristics. The 34% improvement from structure alone (without reranking) suggests the architecture provides a powerful inductive bias. AAAK compression is a clever engineering choice: lossless text shorthand that requires no fine-tuning or specialized decoder, so it stays portable across any text-reading model.

**Limitations and Caveats:**
(1) The 100% score requires Haiku reranking, adding latency and costs that are not quantified for production use.
(2) The benchmarks test retrieval, not long-term reasoning or plan formation, known limitations of memory-augmented systems.
(3) GitHub shows recent commits but no evidence of real-world production use at scale.
(4) "Milla Jovovich" is a GitHub pseudonym, raising minor questions about team transparency (though the technical quality is strong).
(5) Local-only operation means no cloud backup or multi-device sync, a trade-off some users may reject.
(6) Contradiction detection relies on a knowledge graph that must be maintained; it is unclear how well this scales across months of noisy, conflicting real-world conversations.

**Why This Matters:** AI memory is emerging as a critical bottleneck in 2025–2026. Every major AI company (OpenAI, Anthropic, Google) is investing in memory, and the market for memory layers is growing rapidly. MemPalace's open-source, local, cost-free approach could democratize AI memory in a way subscription models cannot. If the benchmarks hold up under independent audit and in production, this represents a significant shift in the AI infrastructure landscape. The announcement has generated genuine interest in developer and AI communities, suggesting real impact potential.
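On the contradiction-detection caveat: the thread mentions knowledge-graph-based contradiction detection without implementation detail. Below is a minimal sketch of one plausible mechanism, treating each (subject, predicate) pair as single-valued; the class, method, and fact names are all hypothetical:

```python
# Minimal contradiction check over (subject, predicate, object) fact triples.
# This is one plausible mechanism, not MemPalace's actual implementation;
# it assumes every (subject, predicate) pair should hold a single value.

from collections import defaultdict

class FactGraph:
    def __init__(self):
        # (subject, predicate) -> set of objects asserted so far
        self.facts = defaultdict(set)

    def assert_fact(self, subject, predicate, obj):
        """Record a fact; return any previously stored conflicting objects."""
        key = (subject, predicate)
        conflicts = self.facts[key] - {obj}
        self.facts[key].add(obj)
        return sorted(conflicts)

g = FactGraph()
print(g.assert_fact("api-service", "database", "Postgres"))  # [] - first assertion
print(g.assert_fact("api-service", "database", "Postgres"))  # [] - consistent repeat
print(g.assert_fact("api-service", "database", "SQLite"))    # ['Postgres'] - conflict
```

The hard part this sketch skips is the one the note flags: deciding which predicates are genuinely single-valued, and whether a conflict means an error or a legitimate update over time.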

Topics

AI memory systems · LLM context management · Open-source AI tools · Memory architectures · Benchmarking AI systems · Local AI infrastructure