Summary
OpenAI's engineering blog reveals how the company scaled PostgreSQL to support 800 million ChatGPT users—a scale that conventional database wisdom suggests is impossible for a single-primary architecture. The article challenges the prevailing belief that massive scale requires either distributed databases or complex sharding strategies. Instead, OpenAI maintains one Azure PostgreSQL Flexible Server as the primary write instance, supported by nearly 50 geo-distributed read replicas, handling millions of queries per second with low double-digit millisecond p99 latency and five-nines (99.999%) availability.
The team's approach centers on three core pillars: aggressively reducing write pressure on the primary, ruthless query optimization and resource management, and multi-layer operational resilience. On the write side, they migrate write-heavy workloads that can be horizontally partitioned onto sharded systems such as Azure Cosmos DB, optimize application-level writes through careful code review to eliminate redundant operations, and implement lazy writes to smooth traffic spikes. For queries, they break up expensive multi-table joins (one problematic query joined 12 tables), review all ORM-generated SQL, and enforce connection pooling with PgBouncer, cutting connection overhead from 50ms to 5ms. Operationally, they implement cache-locking to prevent "thundering herd" problems, multi-layer rate limiting, and workload isolation that separates high-priority user-facing traffic from low-priority analytics to avoid the noisy-neighbor effect.
This success came through systematic optimization rather than architectural magic. Over the past year, database load grew more than 10x while the team avoided sharding, which would have required months or years of work to modify hundreds of application endpoints. The approach only works because ChatGPT's workload is heavily read-oriented (70-95% of traffic), and because PostgreSQL's design, while challenging under heavy writes due to Multi-Version Concurrency Control (MVCC), remains highly efficient for read-heavy workloads when properly tuned. The article emphasizes that the decision not to shard was pragmatic: given the workload characteristics, and with current optimizations providing sufficient headroom, sharding remains a future consideration rather than an immediate necessity. This case study implicitly challenges the industry tendency to over-engineer prematurely, demonstrating that mastering fundamentals (query discipline, connection pooling, read offloading, and operational safeguards) can take proven systems much further than commonly assumed.
Key Takeaways
OpenAI runs ChatGPT for 800 million users on a single-primary PostgreSQL instance with ~50 read replicas—not a distributed database or sharded cluster. This single-writer architecture avoids the operational complexity of sharding but requires accepting write bottlenecks as a tradeoff.
PostgreSQL's write-heavy workload challenges are addressed by explicitly migrating shardable write-intensive workloads to Azure Cosmos DB, while keeping relational data requiring strong consistency on PostgreSQL. This hybrid approach allows selective scaling rather than wholesale re-architecture.
Connection pooling with PgBouncer reduced connection setup latency from 50 milliseconds to 5 milliseconds by reusing database connections across the application layer, directly improving throughput and preventing connection exhaustion under spike loads.
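The payoff of pooling is that the expensive connection handshake is paid once per pooled connection rather than once per request. A minimal sketch of the idea (not PgBouncer itself; the `FakeConn` and `Pool` classes are purely illustrative):

```python
import itertools
import queue

class FakeConn:
    """Stand-in for a real connection; construction models the ~50 ms handshake."""
    _ids = itertools.count()
    def __init__(self):
        self.id = next(FakeConn._ids)

class Pool:
    """Tiny fixed-size pool in the spirit of PgBouncer's pooling model."""
    def __init__(self, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(FakeConn())  # pay the setup cost once, up front
    def acquire(self):
        return self._free.get()         # a queue pop (the ~5 ms path), no new handshake
    def release(self, conn):
        self._free.put(conn)

pool = Pool(size=2)
seen = []
for _ in range(10):                     # ten "requests" share two physical connections
    conn = pool.acquire()
    seen.append(conn.id)
    pool.release(conn)
assert len(set(seen)) == 2              # only two connections were ever created
```

A production pooler additionally handles timeouts, health checks, and transaction-scoped checkout, which is what PgBouncer's transaction pool mode provides.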
OpenAI implements cache-locking and leasing mechanisms where, on a cache miss, only one request acquires the lock to fetch from the database, preventing "thundering herd" storms in which many simultaneous misses on the same key all hit the database at once.
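The double-checked locking pattern behind this can be sketched in a few lines. The helper names below are hypothetical, and a production version would use a per-key lock or lease with a TTL rather than one global lock:

```python
import threading

cache = {}
cache_lock = threading.Lock()
db_fetches = 0  # counts how many requests actually reached the "database"

def fetch_from_db(key):
    global db_fetches
    db_fetches += 1
    return f"value-for-{key}"

def get(key):
    # Fast path: serve from cache without touching the database.
    if key in cache:
        return cache[key]
    # Slow path: only the lock holder queries the database; waiters
    # re-check and find the freshly populated entry instead of piling on.
    with cache_lock:
        if key not in cache:
            cache[key] = fetch_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("user:42",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert db_fetches == 1  # 50 concurrent misses, exactly one database fetch
```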
Workload isolation splits traffic into high-priority (live user queries, payments) and low-priority (analytics, batch jobs) categories, routing them to separate database instances so that inefficient new features or analytics spikes cannot degrade core product performance.
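Routing by priority tier can be as simple as a lookup table in the data-access layer. The connection strings and workload names below are invented for illustration; the article establishes only the high/low split itself:

```python
# Hypothetical DSNs; the real topology is not disclosed.
ROUTES = {
    "high": "postgres://user-serving-cluster/chatgpt",
    "low":  "postgres://analytics-cluster/chatgpt",
}
HIGH_PRIORITY_WORKLOADS = {"chat", "payments"}

def route(workload: str) -> str:
    """Send user-facing traffic to one cluster, everything else to another."""
    tier = "high" if workload in HIGH_PRIORITY_WORKLOADS else "low"
    return ROUTES[tier]

assert route("payments").startswith("postgres://user-serving")
assert route("nightly_report").startswith("postgres://analytics")
```

Because the tiers hit physically separate instances, a runaway analytics query can saturate only the low-priority cluster.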
The team identified and eliminated expensive multi-table join patterns by moving some computation to the application layer—one problematic query joined 12 tables and regularly triggered incidents; breaking it into simpler queries reduced primary load.
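One way to break up a wide join is to run two simple indexed queries and stitch the rows together in application code. The tables and fields below are invented; the article names neither:

```python
def join_in_app(messages, users_by_id):
    """Attach author names in Python instead of via an extra SQL JOIN.

    In practice `messages` would come from a single-table SELECT and
    `users_by_id` from a second lookup on the collected user_id values.
    """
    return [{**m, "author": users_by_id[m["user_id"]]} for m in messages]

messages = [{"id": 1, "user_id": 7, "body": "hi"},
            {"id": 2, "user_id": 7, "body": "again"}]
users_by_id = {7: "ada"}
rows = join_in_app(messages, users_by_id)
assert rows[0]["author"] == "ada"
```

Two cheap index lookups are easier for the planner to execute predictably and can be cached independently; the cost is a little more application code.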
Rate limiting is implemented at multiple layers (application, connection pooler, proxy, query level) to prevent sudden traffic spikes from overwhelming the database and triggering cascading failures through exponential retry storms.
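Each layer can use something as simple as a token bucket. A minimal, clock-injected sketch (the rate and capacity numbers are arbitrary, not from the article):

```python
class TokenBucket:
    """Token-bucket limiter; one instance can guard each layer (app, pooler, proxy)."""
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = now
    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # caller should back off, not retry immediately

bucket = TokenBucket(rate=5, capacity=5)        # steady 5 req/s, burst of 5
granted = sum(bucket.allow(now=0.0) for _ in range(20))
assert granted == 5                             # a burst of 20 sheds 15 requests
assert bucket.allow(now=1.0)                    # tokens refill as time passes
```

Rejecting early at the outermost layer is what prevents the exponential retry storms the takeaway describes: shed requests never reach the pooler or the database.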
PostgreSQL's Multi-Version Concurrency Control (MVCC) creates write amplification because updating a single field requires rewriting the entire row, causing index bloat and vacuum overhead. OpenAI mitigates this by minimizing unnecessary writes and migrating write-heavy workloads elsewhere.
Schema evolution constraints are enforced rigorously: only lightweight changes permitted, 5-second timeout on all schema operations, concurrent index creation required, and new tables must use alternative systems. This prevents schema changes from blocking replica replay and causing replication lag spikes.
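In Postgres terms, the 5-second rule maps naturally onto the `lock_timeout`/`statement_timeout` settings and `CREATE INDEX CONCURRENTLY`. A hedged sketch of a migration wrapper built on that assumption; the table and index names are invented:

```python
# Guardrails applied before any schema change, per the 5-second rule.
GUARDRAILS = [
    "SET lock_timeout = '5s';",        # give up instead of queueing behind traffic
    "SET statement_timeout = '5s';",   # kill any schema statement that runs long
]
# CONCURRENTLY avoids the table-wide lock of a plain CREATE INDEX
# (note: it cannot run inside a transaction block).
MIGRATION = "CREATE INDEX CONCURRENTLY idx_messages_user ON messages (user_id);"

def apply_migration(execute, statement):
    """`execute` is any callable that sends one SQL statement (e.g. cursor.execute)."""
    for guard in GUARDRAILS:
        execute(guard)
    execute(statement)

sent = []
apply_migration(sent.append, MIGRATION)  # record statements instead of running them
assert sent == GUARDRAILS + [MIGRATION]
```

The timeouts matter because a blocked `ALTER TABLE` queues behind running queries while queueing everything else behind itself; bounding it at 5 seconds caps the blast radius.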
Over 12 months, database load grew 10x while OpenAI experienced only one SEV-0 PostgreSQL incident (during the ChatGPT ImageGen viral launch, when write traffic surged 10x over a week), demonstrating that a single-primary architecture can absorb enormous growth when carefully optimized.
About
Author: Bohan Zhang, with contributions from Jon Lee, Sicheng Liu, Chaomin Yu, and Chenglong Hao
Publication: OpenAI Engineering Blog
Published: February 2026
Sentiment / Tone
Pragmatic, evidence-driven, and somewhat defensive about unconventional choices. The authors acknowledge that their approach contradicts conventional wisdom (which says sharding or distributed databases are necessary at their scale) but present it as a deliberate, workload-appropriate choice rather than a limitation. The tone is confident and slightly self-aware: they explain why they made this choice and what it cost (accepting write bottlenecks), positioning the result as disciplined engineering trade-offs rather than architectural magic. There's an underlying "we're happy with PostgreSQL for what we do" confidence, backed by a detailed technical accounting of the optimizations that made it work.
Related Links
ByteByteGo deep dive on OpenAI's PostgreSQL scaling: Comprehensive technical breakdown of OpenAI's architecture, covering the three pillars of their strategy (write pressure reduction, query optimization, operational resilience) with concrete diagrams and examples.
EngrLog: OpenAI Serves 800M Users with One Postgres Database: Advanced technical guide by Tirtha Sarker on read replica strategies at scale, covering read-after-write consistency problems, lag management, four routing strategies, the WAL distribution bandwidth challenge, and production landmines specific to replica deployments.
InfoQ: OpenAI Scales Single Primary PostgreSQL Instance to Millions of QPS: Technology news coverage highlighting OpenAI's achievement and the collaborative work with the Azure PostgreSQL team. Summarizes the cascading replication strategy for future scaling and Microsoft's role in enabling the infrastructure.
Microsoft for Startups: OpenAI and PostgreSQL scaling with Azure: Azure's perspective on the collaboration, emphasizing managed database service benefits: ease of scaling, high availability, automated backups, and the elasticity needed to support rapidly growing AI workloads without managing low-level database operations.
Research Notes
Bohan Zhang, the lead author, brings significant credibility to this work. He co-founded OtterTune, an AI-powered database optimization startup, and conducted database research at Carnegie Mellon under Prof. Andy Pavlo. He has spoken at multiple PostgreSQL conferences (PostgresConf 2024, PGConf.Dev 2025, POSETTE 2025), indicating deep expertise in the PostgreSQL community and active engagement with the ecosystem.
The article generated substantial discussion across the tech community. On Hacker News and Reddit, reactions were predominantly positive, with many noting that the "PostgreSQL can't scale" myth has been thoroughly debunked. However, critics noted important caveats: this success is highly specific to read-heavy workloads, and the engineering discipline required is not typical. Some observers pointed out that the approach essentially pushes write scaling to other systems (Azure Cosmos DB), rather than truly solving it within PostgreSQL.
The piece is part of a broader trend of companies publishing detailed scaling case studies. Notion explicitly states they wish they had sharded earlier (480 logical shards), while Instagram scaled to 500M users with custom-sharded PostgreSQL. OpenAI's position of deliberately not sharding despite 800M users illustrates how different workload patterns lead to radically different architectural choices.
Key limitations to understand: (1) This approach is optimized for read-heavy workloads where 70-95% of traffic is reads. OLTP systems with balanced read/write patterns would hit write bottlenecks sooner. (2) The single-primary remains a single point of failure for writes—if it fails, write operations fail entirely. Hot standby with failover mitigates risk but doesn't eliminate it. (3) As the number of replicas increases, replication lag management becomes challenging. OpenAI is experimenting with cascading replication (intermediate replicas relaying WAL) to scale beyond ~50 replicas, but this adds operational complexity. (4) The "boring-but-brilliant engineering" required is substantial: this approach works because of rigorous optimization at every layer, careful monitoring, and operational discipline that many teams lack.
Microsoft's Azure PostgreSQL team was closely involved in this effort, suggesting that managed database services with tight vendor collaboration can push scale boundaries. The novelty here is not in exotic technology but in execution discipline—the tools (read replicas, connection pooling, rate limiting) are decades old, but applying them systematically at this scale is rare.
The timing of this publication (February 2026, several weeks after ChatGPT's explosive growth) serves a secondary purpose: demonstrating that PostgreSQL and cloud managed services can handle unprecedented scale, countering narratives that exotic or expensive infrastructure is required for AI platform operations.