Summary
Vox provides a detailed technical analysis of three popular Claude Code skill packs—gstack, Superpowers, and Compound Engineering—that have emerged as dominant tools for AI-assisted software development. Rather than treating them as competing products, Vox argues they are three distinct layers addressing different responsibilities in the software development lifecycle, using Anthropic's November 2025 engineering blog post on effective harnesses for long-running agents as a conceptual framework.
Using a restaurant metaphor, Vox breaks down software development into four core responsibilities: planning (the head chef deciding the menu), execution (the kitchen team cooking), evaluation (an independent food taster checking quality), and cross-session state (closing notes passed to the morning shift). This framework reveals why blindly installing all three tools causes friction—they overlap in some areas but solve fundamentally different problems. Gstack excels at the planning and evaluation phases through skills like /plan-ceo-review and /plan-eng-review that pressure-test ideas from product and architecture angles before work begins. The tool also includes QA capabilities (/qa) that provide real browser testing rather than code-level checks. Superpowers brings structured workflow discipline by elevating developers from "chatting randomly with AI" to using AI within a clear brainstorm-plan-execute-review methodology, including subagent-driven development with separate spec and code-quality reviewers. However, Superpowers doesn't treat knowledge accumulation as a first-class feature, meaning every session's context stays isolated.
Compound Engineering (CE) from Every Inc fills this gap by focusing on research-driven planning and deep multi-reviewer evaluation, but most critically, it implements a knowledge compounding system through the /ce:compound command. This spawns five parallel subagents that extract problem type, document solutions, check for duplicate previous fixes, develop prevention strategies, and categorize learnings—automatically building a searchable recipe binder of organizational knowledge. This is fundamentally different from Anthropic's linear progress file model; CE creates exponential learning where future agents automatically discover and reuse solutions from past work. Vox recommends a combined workflow for experienced users: use gstack for high-level product and architecture gates, CE for research-driven planning and ensemble review, and the gstack /qa skill for real browser testing, with CE's /ce:compound capturing lessons learned. The core insight throughout is that the maker and checker must be separate—self-evaluation produces systematic overoptimism, and structured knowledge accumulation compounds productivity gains over time.
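The contrast between Anthropic's linear progress file and CE's searchable, categorized knowledge store can be sketched in a few lines of Python. This is a hypothetical illustration of the behavior described above, not CE's actual implementation: the Lesson fields, the duplicate check, and the tag search are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Lesson:
    """One compounded learning, mirroring what /ce:compound is described
    to extract: problem type, solution, and prevention strategy."""
    problem_type: str              # e.g. "n+1-query", "flaky-test"
    solution: str                  # documented fix
    prevention: str                # strategy to avoid recurrence
    tags: list = field(default_factory=list)

class LessonStore:
    """Searchable store: future sessions query by problem type or tag,
    instead of re-reading a linear progress log top to bottom."""
    def __init__(self):
        self.lessons = []

    def compound(self, lesson: Lesson) -> bool:
        # Skip duplicates of previously documented fixes.
        if any(l.problem_type == lesson.problem_type and
               l.solution == lesson.solution for l in self.lessons):
            return False
        self.lessons.append(lesson)
        return True

    def search(self, query: str) -> list:
        q = query.lower()
        return [l for l in self.lessons
                if q in l.problem_type.lower()
                or any(q in t.lower() for t in l.tags)]

store = LessonStore()
store.compound(Lesson("n+1-query", "eager-load relations",
                      "add query-count assertion in tests",
                      ["orm", "performance"]))
store.compound(Lesson("flaky-test", "await network idle before asserting",
                      "ban fixed sleeps in e2e tests", ["qa"]))
print([l.problem_type for l in store.search("orm")])  # -> ['n+1-query']
```

The design point is retrieval: a later agent asking "have we seen an ORM performance problem before?" gets a direct hit, whereas a linear progress file would have to be scanned in full.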
Key Takeaways
The three tools (gstack, Superpowers, Compound Engineering) are not competitors but three different layers of the software development stack with distinct centers of gravity: planning/evaluation, workflow discipline, and knowledge compounding respectively.
Gstack's core strength is in product-level and architecture-level decision gates (/plan-ceo-review, /plan-eng-review) that pressure-test ideas before development begins, plus real browser-based QA testing that catches bugs invisible to code-level checks alone.
Superpowers introduced structured methodology discipline, upgrading developers from ad-hoc AI usage to a clear brainstorm-plan-execute-review workflow with subagent-driven development, proving this approach works—the project has 121K GitHub stars and 294K installs.
Compound Engineering's distinctive feature is the /ce:compound command, which creates exponential knowledge accumulation through five parallel subagents that automatically extract, categorize, and make reusable every lesson learned—transforming linear progress notes into a searchable organizational recipe binder.
A critical software engineering principle demonstrated: builders who evaluate their own work are systematically overoptimistic; the maker and the checker must be separate roles to achieve rigorous quality, a gap many solo developers overlook.
Garry Tan (gstack creator) documented shipping 600K lines of production code in 60 days (10-20K LOC/day) while running YC full-time, averaging 10,000 LOC and 100 PRs per week over 50 days—demonstrating what structured cognitive modes enable.
Most projects fail not from poor execution but from unclear requirements; Vox recommends using a specific prompt to have AI interview developers until it has 95% confidence about what they actually want versus what they think they should want.
Anthropic's harness research (Nov 2025) emphasizes external state files (feature-list.json, claude-progress.txt) as the primary coordination mechanism for multi-session agents, even more than raw context window size, which informs tool design philosophy.
The combined workflow recommendation for experienced users: gstack for product/architecture gates, CE for research-driven planning and ensemble review, gstack /qa for browser testing, and CE /ce:compound for knowledge capture—each tool used for its specific cognitive role.
A persistent common failure mode: developers treat Claude Code as a single generic mode when the actual power comes from explicitly switching cognitive gears (founder thinking vs. engineering rigor vs. paranoid review) to match the current development phase.
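The external-state idea from the takeaway on feature-list.json and claude-progress.txt can be sketched minimally. The two filenames come from the post; the JSON schema and the session functions below are assumptions for illustration, not Anthropic's actual format.

```python
import json
from pathlib import Path

# Filenames as cited in the post; schemas are hypothetical.
FEATURES = Path("feature-list.json")
PROGRESS = Path("claude-progress.txt")

def start_session():
    """A fresh agent session rebuilds its working state from external
    files, rather than from the previous (lost) context window."""
    features = json.loads(FEATURES.read_text()) if FEATURES.exists() else []
    notes = PROGRESS.read_text() if PROGRESS.exists() else ""
    pending = [f for f in features if f.get("status") != "done"]
    return pending, notes

def end_session(features, note):
    """Persist state so the next session can resume mid-feature."""
    FEATURES.write_text(json.dumps(features, indent=2))
    with PROGRESS.open("a") as fh:   # append, keeping the linear history
        fh.write(note.rstrip() + "\n")

features = [
    {"name": "user auth", "status": "done"},
    {"name": "billing", "status": "in-progress"},
]
end_session(features, "Session 3: billing webhook handler stubbed; tests still red.")
pending, notes = start_session()
print([f["name"] for f in pending])  # -> ['billing']
```

Note that claude-progress.txt is strictly append-only here, which is exactly the "linear progress file" model the post contrasts with CE's searchable store.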
About
Author: Vox (@voxyz_ai)
Publication: X (Twitter)
Published: 2026-03-29
Sentiment / Tone
Educational and analytically balanced. Vox adopts a framework-driven, evidence-based tone that respects each tool's distinct value proposition without dismissing any as inferior. The author positions themselves as a sophisticated practitioner explaining how specialized cognitive modes compound in effect—neither promotional nor cynical, but pragmatically explaining where misconceptions arise (e.g., thinking three tools are "competitors" rather than layers). The restaurant metaphor conveys expertise and clarity of thought. There's implicit respect for the engineering rigor underpinning each tool, especially Anthropic's foundational research, and a constructive, problem-solving tone throughout—the goal is to help readers avoid process conflicts and tool overlap by understanding what each tool actually solves.
Sources
GStack GitHub Repository: Open-source implementation of gstack with full documentation; demonstrates the nine-skill architecture for planning, review, QA, and shipping; shows real adoption with 16K+ stars and 1.8K+ forks.
Superpowers GitHub Repository (Jesse Vincent): Reference implementation of Superpowers' brainstorm-plan-execute-review methodology; shows 121K+ installs and documents subagent-driven development patterns that influenced the broader ecosystem.
Compound Engineering Plugin Repository (Every Inc): Implementation of CE's knowledge compounding system; demonstrates the five-subagent architecture for extracting, categorizing, and making reusable solutions from each development cycle.
Compound Engineering: How Every Codes With Agents (Every Inc): Deep-dive explanation of compound engineering philosophy by its creators; explains how CE enables single developers to maintain five production software products through knowledge compounding and parallel agentic workflows.
Research Notes
Vox (@voxyz_ai) is a builder focused on visual AI, coding tools, and workflows, according to their X bio. They also maintain VoxYZ, a platform demonstrating how AI agents can run companies. This positions them as a hands-on practitioner evaluating these tools in production contexts. The post's credibility derives from grounding the analysis in Anthropic's published engineering guidance (the Nov 26, 2025 blog post on effective harnesses), which provides a neutral framework. Garry Tan's documented 10K LOC/week metrics are verifiable through public retrospectives and the gstack GitHub repository (16K+ stars, 1.8K+ forks). Superpowers' adoption (121K+ installs via the Claude plugin marketplace) confirms market presence. Compound Engineering's approach to knowledge compounding via /ce:compound appears novel: the distinction between linear progress files (Anthropic's model) and searchable, categorized solution databases (CE's model) represents a meaningful architectural difference. The restaurant metaphor is pedagogically effective for explaining abstract software engineering concepts. One limitation: the post doesn't deeply explore where the tools genuinely conflict in practice (e.g., overlapping /review capabilities between gstack and CE), though it briefly acknowledges the overlap. The post implicitly assumes readers already use Claude Code actively. No factual errors identified; all claims align with public repositories and plugin marketplaces as of March 2026. The framework Vox applies (derived from Anthropic's research) offers a reusable lens for evaluating future AI development tool ecosystems.
Topics
Claude Code skill packs
AI-assisted software development workflows
Agentic software engineering patterns
Knowledge management in long-running AI agents
Multi-session context management
Structured software development methodology