Summary
Vox provides a detailed technical analysis of three popular Claude Code skill packs—gstack, Superpowers, and Compound Engineering—that have emerged as dominant tools for AI-assisted software development. Rather than treating them as competing products, Vox argues they are three distinct layers addressing different responsibilities in the software development lifecycle, using Anthropic's November 2025 engineering blog post on effective harnesses for long-running agents as a conceptual framework.
Using a restaurant metaphor, Vox breaks down software development into four core responsibilities: planning (the head chef deciding the menu), execution (the kitchen team cooking), evaluation (an independent food taster checking quality), and cross-session state (closing notes passed to the morning shift). This framework reveals why blindly installing all three tools causes friction—they overlap in some areas but solve fundamentally different problems. Gstack excels at the planning and evaluation phases through skills like /plan-ceo-review and /plan-eng-review that pressure-test ideas from product and architecture angles before work begins. The tool also includes QA capabilities (/qa) that provide real browser testing rather than code-level checks. Superpowers brings structured workflow discipline by elevating developers from "chatting randomly with AI" to using AI within a clear brainstorm-plan-execute-review methodology, including subagent-driven development with separate spec and code-quality reviewers. However, Superpowers doesn't treat knowledge accumulation as a first-class feature, meaning every session's context stays isolated.
Compound Engineering (CE) from Every Inc fills this gap by focusing on research-driven planning and deep multi-reviewer evaluation, but most critically, it implements a knowledge compounding system through the /ce:compound command. This spawns five parallel subagents that extract problem type, document solutions, check for duplicate previous fixes, develop prevention strategies, and categorize learnings—automatically building a searchable recipe binder of organizational knowledge. This is fundamentally different from Anthropic's linear progress file model; CE creates exponential learning where future agents automatically discover and reuse solutions from past work. Vox recommends a combined workflow for experienced users: use gstack for high-level product and architecture gates, CE for research-driven planning and ensemble review, and the gstack /qa skill for real browser testing, with CE's /ce:compound capturing lessons learned. The core insight throughout is that the maker and checker must be separate—self-evaluation produces systematic overoptimism, and structured knowledge accumulation compounds productivity gains over time.
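The contrast between Anthropic's linear progress file and CE's searchable, categorized knowledge store can be sketched in a few lines of Python. This is a hypothetical illustration of the behavior described above, not CE's actual implementation: the Lesson fields, the duplicate check, and the tag search are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Lesson:
    """One compounded learning, mirroring what /ce:compound is described
    to extract: problem type, solution, and prevention strategy."""
    problem_type: str              # e.g. "n+1-query", "flaky-test"
    solution: str                  # documented fix
    prevention: str                # strategy to avoid recurrence
    tags: list = field(default_factory=list)

class LessonStore:
    """Searchable store: future sessions query by problem type or tag,
    instead of re-reading a linear progress log top to bottom."""
    def __init__(self):
        self.lessons = []

    def compound(self, lesson: Lesson) -> bool:
        # Skip duplicates of previously documented fixes.
        if any(l.problem_type == lesson.problem_type and
               l.solution == lesson.solution for l in self.lessons):
            return False
        self.lessons.append(lesson)
        return True

    def search(self, query: str) -> list:
        q = query.lower()
        return [l for l in self.lessons
                if q in l.problem_type.lower()
                or any(q in t.lower() for t in l.tags)]

store = LessonStore()
store.compound(Lesson("n+1-query", "eager-load relations",
                      "add query-count assertion in tests",
                      ["orm", "performance"]))
store.compound(Lesson("flaky-test", "await network idle before asserting",
                      "ban fixed sleeps in e2e tests", ["qa"]))
print([l.problem_type for l in store.search("orm")])  # -> ['n+1-query']
```

The design point is retrieval: a later agent asking "have we seen an ORM performance problem before?" gets a direct hit, whereas a linear progress file would have to be scanned in full.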
Key Takeaways
The three tools (gstack, Superpowers, Compound Engineering) are not competitors but three different layers of the software development stack with distinct centers of gravity: planning/evaluation, workflow discipline, and knowledge compounding respectively.
Gstack's core strength is in product-level and architecture-level decision gates (/plan-ceo-review, /plan-eng-review) that pressure-test ideas before development begins, plus real browser-based QA testing that catches bugs invisible to code-level checks alone.
Superpowers introduced structured methodology discipline, upgrading developers from ad-hoc AI usage to a clear brainstorm-plan-execute-review workflow with subagent-driven development, proving this approach works—the project has 121K GitHub stars and 294K installs.
Compound Engineering's distinctive feature is the /ce:compound command, which creates exponential knowledge accumulation through five parallel subagents that automatically extract, categorize, and make reusable every lesson learned—transforming linear progress notes into a searchable organizational recipe binder.
A critical software engineering principle demonstrated: builders who evaluate their own work are systematically overoptimistic; the maker and the checker must be separate roles to achieve rigorous quality, a gap many solo developers overlook.
Garry Tan (gstack creator) documented shipping 600K lines of production code in 60 days (10-20K LOC/day) while running YC full-time, averaging 10,000 LOC and 100 PRs per week over 50 days—demonstrating what structured cognitive modes enable.
Most projects fail not from poor execution but from unclear requirements; Vox recommends using a specific prompt to have AI interview developers until it has 95% confidence about what they actually want versus what they think they should want.
Anthropic's harness research (Nov 2025) emphasizes external state files (feature-list.json, claude-progress.txt) as the primary coordination mechanism for multi-session agents, even more than raw context window size, which informs tool design philosophy.
The combined workflow recommendation for experienced users: gstack for product/architecture gates, CE for research-driven planning and ensemble review, gstack /qa for browser testing, and CE /ce:compound for knowledge capture—each tool used for its specific cognitive role.
A persistent common failure mode: developers treat Claude Code as a single generic mode when the actual power comes from explicitly switching cognitive gears (founder thinking vs. engineering rigor vs. paranoid review) to match the current development phase.
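The external-state idea from the takeaway on feature-list.json and claude-progress.txt can be sketched minimally. The two filenames come from the post; the JSON schema and the session functions below are assumptions for illustration, not Anthropic's actual format.

```python
import json
from pathlib import Path

# Filenames as cited in the post; schemas are hypothetical.
FEATURES = Path("feature-list.json")
PROGRESS = Path("claude-progress.txt")

def start_session():
    """A fresh agent session rebuilds its working state from external
    files, rather than from the previous (lost) context window."""
    features = json.loads(FEATURES.read_text()) if FEATURES.exists() else []
    notes = PROGRESS.read_text() if PROGRESS.exists() else ""
    pending = [f for f in features if f.get("status") != "done"]
    return pending, notes

def end_session(features, note):
    """Persist state so the next session can resume mid-feature."""
    FEATURES.write_text(json.dumps(features, indent=2))
    with PROGRESS.open("a") as fh:   # append, keeping the linear history
        fh.write(note.rstrip() + "\n")

features = [
    {"name": "user auth", "status": "done"},
    {"name": "billing", "status": "in-progress"},
]
end_session(features, "Session 3: billing webhook handler stubbed; tests still red.")
pending, notes = start_session()
print([f["name"] for f in pending])  # -> ['billing']
```

Note that claude-progress.txt is strictly append-only here, which is exactly the "linear progress file" model the post contrasts with CE's searchable store.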
About
Author: Vox (@voxyz_ai)
Publication: X (Twitter)
Published: 2026-03-29
Sentiment / Tone
Educational and analytically balanced. Vox adopts a framework-driven, evidence-based tone that respects each tool's distinct value proposition without dismissing any as inferior. The author positions themselves as a sophisticated practitioner explaining how specialized cognitive modes compound in effect—neither promotional nor cynical, but pragmatically explaining where misconceptions arise (e.g., thinking three tools are "competitors" rather than layers). The restaurant metaphor conveys expertise and clarity of thought. There's implicit respect for the engineering rigor underpinning each tool, especially Anthropic's foundational research, and a constructive, problem-solving tone throughout—the goal is to help readers avoid process conflicts and tool overlap by understanding what each tool actually solves.
Sources
GStack GitHub Repository: Open-source implementation of gstack with full documentation; demonstrates the nine-skill architecture for planning, review, QA, and shipping; shows real adoption with 16K+ stars and 1.8K+ forks.
Superpowers GitHub Repository (Jesse Vincent): Reference implementation of Superpowers' brainstorm-plan-execute-review methodology; shows 121K+ installs and documents subagent-driven development patterns that influenced the broader ecosystem.
Compound Engineering Plugin Repository (Every Inc): Implementation of CE's knowledge compounding system; demonstrates the five-subagent architecture for extracting, categorizing, and making reusable solutions from each development cycle.
Compound Engineering: How Every Codes With Agents (Every Inc): Deep-dive explanation of compound engineering philosophy by its creators; explains how CE enables single developers to maintain five production software products through knowledge compounding and parallel agentic workflows.
Research Notes
Vox (@voxyz_ai) is a builder focused on visual AI, coding tools, and workflows, according to their X bio. They also maintain VoxYZ, a platform demonstrating how AI agents can run companies. This positions them as a hands-on practitioner evaluating these tools in production contexts. The post's credibility derives from grounding the analysis in Anthropic's published engineering guidance (the Nov 26, 2025 blog post on effective harnesses), which provides a neutral framework. Garry Tan's documented 10K LOC/week metrics are verifiable through public retrospectives and the gstack GitHub repository (16K+ stars, 1.8K+ forks). Superpowers' adoption (121K+ installs via the Claude plugin marketplace) confirms market presence. Compound Engineering's approach to knowledge compounding via /ce:compound appears novel: the distinction between linear progress files (Anthropic's model) and searchable, categorized solution databases (CE's model) represents a meaningful architectural difference. The restaurant metaphor is pedagogically effective for explaining abstract software engineering concepts. One limitation: the post doesn't deeply explore where the tools genuinely conflict in practice (e.g., overlapping /review capabilities between gstack and CE), though it briefly acknowledges the overlap. The post implicitly assumes readers already use Claude Code actively. No factual errors identified; all claims align with public repositories and plugin marketplaces as of March 2026. The framework Vox applies (derived from Anthropic's research) offers a reusable lens for evaluating future AI development tool ecosystems.
Topics
Claude Code skill packs
AI-assisted software development workflows
Agentic software engineering patterns
Knowledge management in long-running AI agents
Multi-session context management
Structured software development methodology