Scaling PostgreSQL to power 800 million ChatGPT users

https://openai.com/index/scaling-postgresql/
Technical engineering case study · Researched March 25, 2026

Summary

OpenAI's engineering blog reveals how the company scaled PostgreSQL to support 800 million ChatGPT users—a scale that conventional database wisdom suggests is impossible for a single-primary architecture. The article challenges the prevailing belief that massive scale requires either distributed databases or complex sharding strategies. Instead, OpenAI maintains one Azure PostgreSQL Flexible Server as the primary write instance, supported by nearly 50 geo-distributed read replicas, handling millions of queries per second with low double-digit millisecond p99 latency and five-nines (99.999%) availability.
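A single-primary, many-replica layout like this is typically paired with application-level read/write routing. The sketch below is illustrative only and not OpenAI's actual code; the class name and DSN strings are hypothetical, and a real deployment would hand these DSNs to a connection pool (e.g. PgBouncer) rather than return them directly:

```python
import random


class RoutedDatabase:
    """Route writes to the single primary and reads across replicas.

    Illustrative sketch: connection handling is reduced to returning
    the DSN string that a real pooled connection would be opened with.
    """

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        self.replica_dsns = list(replica_dsns)

    def dsn_for(self, sql):
        # Statements that mutate state must go to the primary;
        # everything else can be served by any geo-distributed replica.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        is_write = first_word in {"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE"}
        return self.primary_dsn if is_write else random.choice(self.replica_dsns)
```

Spreading reads this way is what lets the read-heavy 70-95% of traffic bypass the primary entirely.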

The team's approach centers on three core pillars: aggressively reducing write pressure on the primary, ruthless query optimization and resource management, and multi-layer operational resilience. On the write side, they migrate write-heavy workloads that can be horizontally partitioned to sharded systems (Azure Cosmos DB), optimize application-level writes through careful code review to eliminate redundant operations, and implement lazy writes to smooth traffic spikes. For queries, they break up expensive multi-table joins (one problematic query joined 12 tables), review all ORM-generated SQL, and enforce connection pooling with PgBouncer to reduce connection overhead from 50ms to 5ms. Operationally, they implement cache-locking to prevent "thundering herd" problems, multi-layer rate limiting, and workload isolation that separates high-priority user-facing traffic from low-priority analytics to prevent the noisy neighbor effect.
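The cache-locking idea above can be sketched as a "single-flight" cache: on a miss, only one caller recomputes the value while concurrent callers wait, so a cache expiry does not translate into a herd of identical queries hitting the database. This is a minimal sketch of the general technique, not the article's actual implementation:

```python
import threading


class SingleFlightCache:
    """Cache with per-key locking to prevent the thundering-herd effect:
    after a miss, exactly one caller runs the expensive loader while the
    rest block and then reuse its result."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._meta = threading.Lock()

    def _lock_for(self, key):
        # Lazily create one lock per cache key, guarded by a meta-lock.
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        if key in self._data:            # fast path: cache hit
            return self._data[key]
        with self._lock_for(key):        # only one loader per key at a time
            if key not in self._data:    # re-check after acquiring the lock
                self._data[key] = loader()
            return self._data[key]
```

The double-check after acquiring the lock is the essential step: waiters that queued behind the first loader find the value already present and skip the database entirely.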

This success came through systematic optimization rather than architectural magic. Over the past year, database load grew more than 10x while the team avoided sharding, which would have meant months or years of work across hundreds of application endpoints. The approach works only because ChatGPT's workload is heavily read-oriented (70-95% of traffic), and because PostgreSQL, while challenged under heavy writes by Multi-Version Concurrency Control (MVCC), remains highly efficient for read-heavy workloads when properly tuned. The article emphasizes that the decision not to shard was pragmatic: given the workload characteristics and the headroom current optimizations provide, sharding remains a future consideration rather than an immediate necessity. This case study implicitly challenges the industry tendency to over-engineer solutions prematurely, demonstrating that mastering fundamentals—query discipline, connection pooling, read offloading, and operational safeguards—can take proven systems much further than commonly assumed.

Key Takeaways

About

Author: Bohan Zhang, with contributions from Jon Lee, Sicheng Liu, Chaomin Yu, and Chenglong Hao

Publication: OpenAI Engineering Blog

Published: February 2026

Sentiment / Tone

Pragmatic, evidence-driven, and somewhat defensive about unconventional choices. The authors acknowledge that their approach contradicts conventional wisdom (which says sharding or distributed databases are necessary at their scale) but present it as a deliberate, workload-appropriate choice rather than a limitation. The tone is confident and slightly self-aware: they explain why they made this choice and what it cost (accepting write bottlenecks), positioning the result as a set of disciplined engineering trade-offs rather than effortless architectural success. There's an underlying "we're happy with PostgreSQL for what we do" confidence, backed by detailed technical accounting of the optimizations that made it work.

Related Links

Research Notes

Bohan Zhang, the lead author, brings significant credibility to this work. He co-founded OtterTune, an AI-powered database optimization startup, and conducted database research at Carnegie Mellon under Prof. Andy Pavlo. He has spoken at multiple PostgreSQL conferences (PostgresConf 2024, PGConf.Dev 2025, POSETTE 2025), indicating deep expertise in the PostgreSQL community and active engagement with the ecosystem.

The article generated substantial discussion across the tech community. On Hacker News and Reddit, reactions were predominantly positive, with many noting that the "PostgreSQL can't scale" myth has been thoroughly debunked. However, critics noted important caveats: this success is highly specific to read-heavy workloads, and the engineering discipline required is not typical. Some observers pointed out that the approach essentially pushes write scaling to other systems (Azure Cosmos DB), rather than truly solving it within PostgreSQL.

The piece is part of a broader trend of companies publishing detailed scaling case studies. Notion explicitly states they wish they had sharded earlier (480 logical shards), while Instagram scaled to 500M users with custom-sharded PostgreSQL. OpenAI's position of deliberately not sharding despite 800M users illustrates how different workload patterns lead to radically different architectural choices.

Key limitations to understand:

(1) This approach is optimized for read-heavy workloads where 70-95% of traffic is reads. OLTP systems with balanced read/write patterns would hit write bottlenecks sooner.

(2) The single primary remains a single point of failure for writes—if it fails, write operations fail entirely. Hot standby with failover mitigates the risk but doesn't eliminate it.

(3) As the number of replicas increases, replication lag management becomes challenging. OpenAI is experimenting with cascading replication (intermediate replicas relaying WAL) to scale beyond ~50 replicas, but this adds operational complexity.

(4) The "boring-but-brilliant engineering" required is substantial: this approach works because of rigorous optimization at every layer, careful monitoring, and operational discipline that many teams lack.

Microsoft's Azure PostgreSQL team was closely involved in this effort, suggesting that managed database services with tight vendor collaboration can push scale boundaries. The novelty here is not in exotic technology but in execution discipline—the tools (read replicas, connection pooling, rate limiting) are decades old, but applying them systematically at this scale is rare. The timing of this publication (February 2026, several weeks after ChatGPT's explosive growth) serves a secondary purpose: demonstrating that PostgreSQL and cloud managed services can handle unprecedented scale, countering narratives that exotic or expensive infrastructure is required for AI platform operations.
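Cascading replication itself is a stock PostgreSQL capability: a standby can stream WAL onward to downstream standbys, so the primary serves only a handful of relay standbys instead of all ~50 replicas. A hedged configuration sketch of the general mechanism (hostnames and slot names are placeholders, and a managed service like Azure PostgreSQL Flexible Server may expose this differently or not at all):

```ini
# On the intermediate (relay) standby -- postgresql.conf
max_wal_senders = 10        ; allow downstream standbys to connect here
hot_standby = on            ; keep serving reads while relaying WAL

# On each leaf standby -- postgresql.conf (PostgreSQL 12+)
primary_conninfo = 'host=relay-standby.internal port=5432 user=replicator'
primary_slot_name = 'leaf_replica_1'   ; optional replication slot created on the relay
```

The trade-off noted in limitation (3) shows up directly here: leaf replicas now lag by the relay's lag plus their own, and a relay failure takes its whole subtree of replicas stale at once.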

Topics

PostgreSQL scaling · Database architecture · Read replicas · Single-primary databases · Connection pooling · Query optimization · Workload isolation · MVCC and write amplification