The microservices tax: when distributed systems cost more than the problems they solve
The industry adopted microservices as the default architecture for modern systems, but the tax is coming due. When the company that helped popularise distributed services publishes a 90% cost reduction by returning to a monolith, the assumption deserves reexamination.

In March 2023, the Amazon Prime Video team published a technical case study with an unremarkable title and a remarkable conclusion[1]. Their audio/video monitoring service—built as a distributed microservices architecture using AWS Step Functions for orchestration and Amazon S3 for intermediate data storage between processing stages—had been redesigned as a monolith. The result was a 90% cost reduction. Not a minor optimisation. Not a marginal improvement from tuning configuration. A reduction to one-tenth of the original cost, achieved by abandoning the distributed architecture and putting the components back together.
The architecture that failed had followed every best practice. Step Functions orchestrated the workflow. S3 handled intermediate state. Each processing stage was an independent component that could scale separately. It was textbook microservices. It also hit a hard scaling limit at 5% of expected load. The distributed orchestration overhead—Step Functions charging per state transition, S3 charging per read and write for intermediate data, network latency between each processing stage—created costs that scaled faster than the actual workload. The architecture designed for scalability became the bottleneck that prevented it.
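To see how per-transition and per-object pricing compounds, consider a back-of-envelope model. The prices and workload figures below are illustrative assumptions, not Prime Video's actual numbers; the point is the shape of the cost curve, which scales with items processed rather than with useful work done.

```python
# Back-of-envelope model of per-request orchestration cost. All prices and
# workload numbers are illustrative assumptions, not Prime Video's figures;
# check current AWS pricing before relying on them.

STEP_TRANSITION_COST = 25.00 / 1_000_000   # assumed $ per state transition
S3_PUT_COST = 0.005 / 1_000                # assumed $ per PUT request
S3_GET_COST = 0.0004 / 1_000               # assumed $ per GET request

def orchestration_cost(frames: int, stages: int) -> float:
    """Cost of pushing `frames` items through `stages` distributed steps,
    writing and reading intermediate state in S3 between each step."""
    transitions = frames * stages
    puts = frames * (stages - 1)   # intermediate result written after each stage
    gets = frames * (stages - 1)   # ...and read back by the next stage
    return (transitions * STEP_TRANSITION_COST
            + puts * S3_PUT_COST
            + gets * S3_GET_COST)

# A hypothetical stream analysed at 1 frame/sec for a month, five stages:
frames_per_month = 60 * 60 * 24 * 30
print(f"${orchestration_cost(frames_per_month, 5):,.2f} per stream per month")
# The same work done in-process pays none of these per-item charges.
```

Every cost line in that model is overhead: none of it is the video analysis itself, and all of it disappears when the stages share a process.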
The Prime Video team was careful to frame this as a specific architectural decision for a specific use case. The industry heard something broader: the company whose cloud platform profits from distributed architectures, the company that helped popularise service-oriented design, had just published evidence that a monolith was the better choice. The case study accumulated millions of views, thousands of citations, and prompted a reckoning that is still under way.
The consolidation wave
Amazon's case study was not an anomaly. It was the most visible data point in a broader trend that the industry has been reluctant to name.
O'Reilly's platform data—drawn from millions of technology professionals—shows microservices interest declined 20% in 2023, with the report noting that 'many organisations are paying the price for moving to microservices because it was the thing to do, not because they needed the scale or flexibility'[2]. The decline continued: a further 24% drop in 2024[3]. A peer-reviewed multivocal literature review identified five consistent drivers pushing organisations from microservices back to monoliths: cost, complexity, scalability limitations, performance degradation, and organisational overhead[4]. The consolidation trend is not a fringe movement. It is a systematic correction.
The debugging overhead is well documented. A Perforce survey of Java developers found that 62% experienced performance issues in microservices architectures, with a third citing troubleshooting distributed service-to-service communication as their primary challenge[5]. The overhead comes from the fundamental nature of distributed systems. When a request fails in a monolith, the stack trace tells you exactly what happened and where. When a request fails across microservices, the failure might be in any of the services involved, in the network between them, in the serialisation of data, in the retry logic, in the circuit breaker configuration, or in the timeout settings. Each service produces its own logs. Correlating those logs requires distributed tracing infrastructure that itself introduces complexity and cost.
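A minimal sketch of the mechanism distributed tracing builds on: minting a correlation ID at the edge and threading it through every downstream log line. Service names and fields here are hypothetical, and real systems use OpenTelemetry or similar rather than hand-rolled logging.

```python
# Minimal sketch of correlation-ID propagation, the primitive that
# distributed tracing systems build on. Service names are hypothetical.
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("svc")

def log_event(service: str, correlation_id: str, message: str) -> None:
    # Every service must emit the same correlation_id so logs can be joined
    # later; forgetting it in one service breaks the whole trace.
    log.info(json.dumps({"service": service,
                         "correlation_id": correlation_id,
                         "message": message}))

def checkout(order: dict) -> None:
    cid = str(uuid.uuid4())  # minted at the edge, passed to every downstream hop
    log_event("api-gateway", cid, "checkout received")
    log_event("payments", cid, "charge authorised")   # would be a network hop
    log_event("inventory", cid, "stock reserved")     # would be a network hop

checkout({"order_id": 42})
```

In a monolith, the call stack does this joining for free; here it is infrastructure that someone must build, deploy, and keep consistent across every service.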
Segment's consolidation provides a concrete illustration. The company collapsed 140 destination microservices into a single service after three full-time engineers found themselves spending most of their time on operational firefighting rather than building features[6]. Developer productivity improved immediately—shared library improvements rose from 32 to 46 per year—and load spikes that previously triggered on-call pages were absorbed by the consolidated worker pool. Every metric that mattered improved, not by optimising the microservices architecture, but by eliminating it.
These numbers are consistent with what the distributed systems literature has predicted for decades, and with the patterns documented by researchers studying architectural decision-making. Dragoni and colleagues, in their comprehensive survey of microservice architectures, observed that the operational complexity of distributed systems is frequently underestimated during initial adoption decisions[7]. Teams evaluate the benefits—independent scaling, independent deployment, technology diversity—without fully accounting for the costs that emerge only after the architecture is in production and real traffic exposes the latency penalties, consistency challenges, and coordination overhead that theory describes abstractly.
The physics of distribution
The fundamental issue is not architectural philosophy. It is physics.
A function call within a single process takes nanoseconds. A network call between services takes milliseconds. That is a difference of approximately six orders of magnitude—a million-fold penalty. No amount of clever orchestration, optimised serialisation, or network tuning can close this gap. It is a consequence of how computers work. Data moving within a CPU's cache hierarchy operates at the speed of electrical signals across nanometre distances. Data moving across a network operates at the speed of packets traversing cables, switches, and protocol stacks.
When you decompose a monolith into microservices, every function call that crosses a service boundary pays this million-fold penalty. If your system makes ten internal calls to process a request, and you distribute those calls across services, you have added ten network round trips. Each round trip includes serialisation of the request data, DNS resolution or service discovery, TCP connection establishment or reuse from a connection pool, TLS negotiation if the communication is encrypted, transmission, deserialisation of the response, and error handling for all the ways a network call can fail that a function call cannot.
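The gap is easy to measure. The sketch below times an in-process function call against a socket round trip on loopback; since loopback skips real network distance, TLS, serialisation of non-trivial payloads, and service discovery, the true cross-host penalty is larger than whatever this prints.

```python
# Self-contained measurement of the gap the text describes. Loopback
# flatters the network side, so the real cross-host gap is wider.
import socket
import threading
import timeit

def plain_function(x: int) -> int:
    return x + 1

def start_echo_server() -> int:
    srv = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
    port = srv.getsockname()[1]
    def serve():
        while True:
            conn, _ = srv.accept()
            with conn:
                conn.sendall(conn.recv(64))  # echo the payload back
    threading.Thread(target=serve, daemon=True).start()
    return port

port = start_echo_server()

def network_call(x: int) -> int:
    # One connection plus one round trip per call: a pessimistic but honest
    # model of an unpooled service-to-service hop.
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(str(x).encode())
        return int(s.recv(64))

n = 1_000
fn = timeit.timeit(lambda: plain_function(1), number=n) / n
net = timeit.timeit(lambda: network_call(1), number=n) / n
print(f"function call: {fn * 1e9:,.0f} ns   network call: {net * 1e9:,.0f} ns")
```

Even on this flattering setup, the gap is typically three to four orders of magnitude; across real networks with real payloads, it reaches the million-fold penalty described above.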
The network introduces failure modes that do not exist in a monolith. Packets can be lost. Connections can time out. Services can be temporarily unreachable. DNS can return stale records. Load balancers can route to unhealthy instances. Each of these failure modes requires explicit handling—retry logic, circuit breakers, fallback behaviours, timeout configuration—that adds code, complexity, and potential bugs. In a monolith, a function either returns a result or throws an exception. In a distributed system, it might return a result, throw an exception, time out, return a partial result, return a stale cached result, or fail silently in ways that only manifest downstream.
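What that defensive handling looks like in practice: a minimal retry-with-backoff and circuit-breaker sketch, with illustrative thresholds. None of this code exists, or needs to exist, for an in-process function call.

```python
# Minimal sketch of the defensive code every network hop needs: bounded
# retries with exponential backoff, a timeout-aware failure path, and a
# crude circuit breaker. Thresholds are illustrative assumptions.
import time

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_after = reset_after
        self.opened_at: float | None = None

    def call(self, fn, *args, retries: int = 2, backoff: float = 0.1):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; breaker is open")
            self.opened_at = None  # half-open: let one probe call through
        for attempt in range(retries + 1):
            try:
                result = fn(*args)
                self.failures = 0  # success closes the breaker
                return result
            except (TimeoutError, ConnectionError):
                self.failures += 1
                if self.failures >= self.threshold:
                    self.opened_at = time.monotonic()
                    raise CircuitOpenError("breaker tripped")
                if attempt < retries:
                    time.sleep(backoff * 2 ** attempt)  # exponential backoff
        raise ConnectionError("all retries exhausted")
```

Every parameter here, from the retry count to the reset window, is a tuning decision that can itself cause incidents when it is wrong.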
Eric Brewer's CAP theorem formalises one dimension of this constraint: a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance[8]. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability when a partition occurs. Any microservices architecture must choose which guarantee to sacrifice, and most engineers discover the implications of that choice only when production traffic reveals the edge cases that the theorem predicts.
Werner Vogels, Amazon's CTO, described the practical consequence in his foundational paper on eventual consistency[9]. In distributed systems, data that was written by one service may not be immediately visible to another service. The lag between write and read—whether milliseconds or seconds—creates a window where different services have different views of reality. Building correct behaviour on top of this inconsistency requires careful design that accounts for temporal gaps that simply do not exist when all data lives in a single database.
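A toy simulation makes the window concrete. The fixed lag below stands in for real replication machinery; the behaviour to notice is the read that returns nothing for data that was already written.

```python
# Toy illustration of the read-after-write gap. The fixed lag and the
# in-memory store are stand-ins for real replication machinery.
import time

class LaggingReplica:
    def __init__(self, lag_seconds: float):
        self.lag = lag_seconds
        self.pending: list[tuple[float, str, str]] = []  # (visible_at, key, value)
        self.data: dict[str, str] = {}

    def replicate(self, key: str, value: str) -> None:
        # The write is acknowledged now but only becomes visible after the lag.
        self.pending.append((time.monotonic() + self.lag, key, value))

    def read(self, key: str) -> str | None:
        now = time.monotonic()
        still_pending = []
        for visible_at, k, v in self.pending:
            if visible_at <= now:
                self.data[k] = v      # replication has caught up
            else:
                still_pending.append((visible_at, k, v))
        self.pending = still_pending
        return self.data.get(key)

replica = LaggingReplica(lag_seconds=0.05)
replica.replicate("profile:42", "updated")
print(replica.read("profile:42"))  # None: the write is not yet visible
time.sleep(0.06)
print(replica.read("profile:42"))  # 'updated': the window has closed
```

Any service logic that reads its own writes, or another service's writes, must be designed to tolerate that first None.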
The operational tax
Beyond the latency and consistency penalties, microservices impose an operational burden that is consistently underestimated during the initial decomposition decision.
Each service requires its own deployment pipeline. Twenty services means twenty pipelines to maintain, monitor, and debug when they break. Each pipeline needs its own configuration, its own secrets management, its own environment variables, its own health checks. The infrastructure team that maintained one deployment process now maintains twenty. The complexity is not additive—it is multiplicative, because services depend on each other and deployments must be coordinated. A change to a shared data format requires synchronised deployments across every service that consumes that format, with careful attention to backwards compatibility and rollback procedures.
Each service requires its own monitoring. Dashboards multiply. Alert rules multiply. On-call rotations must cover more systems with more failure modes. Research from Grafana Labs' 2025 Observability Survey found that observability spend averages 17% of total compute infrastructure cost[10]. For microservices architectures, this percentage climbs because the system is inherently harder to observe. The number of interactions to monitor grows roughly with the square of the number of services, because you need to monitor not just each service but every pair of services that communicate.
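The arithmetic behind that claim, counting directed service-to-service paths:

```python
# The combinatorics behind the quadratic-growth claim: with n services,
# there are n*(n-1) directed service-to-service paths that can fail and
# may need monitoring.
for n in (5, 10, 20, 50):
    print(f"{n:>3} services -> {n * (n - 1):>5} potential interactions")
# 5 -> 20, 10 -> 90, 20 -> 380, 50 -> 2450
```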
Testing complexity escalates in ways that are difficult to anticipate. In a monolith, integration tests exercise real code paths within a single process. In a microservices architecture, integration tests require either running multiple services simultaneously—with all their dependencies, databases, and configuration—or using contract tests that verify interface compatibility without testing actual behaviour. Chris Richardson, in his comprehensive treatment of microservices patterns, describes the testing pyramid for distributed systems as fundamentally more expensive at every level than its monolithic counterpart[11]. Unit tests remain cheap, but the layers above them—integration tests, contract tests, end-to-end tests—each require infrastructure that did not exist in the monolithic architecture.
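For contrast, here is what a hand-rolled consumer contract check looks like: it verifies the provider's response shape without running the provider, which is exactly why it proves nothing about behaviour. The schema is hypothetical; real projects would use a tool such as Pact rather than bare assertions.

```python
# Minimal consumer-driven contract check. The fields are hypothetical;
# real projects use tooling such as Pact rather than hand-rolled asserts.
REQUIRED_FIELDS = {"order_id": int, "status": str, "total_cents": int}

def check_contract(response: dict) -> None:
    """Verify the provider's response shape without running the provider."""
    for field, expected_type in REQUIRED_FIELDS.items():
        assert field in response, f"missing field: {field}"
        assert isinstance(response[field], expected_type), (
            f"{field} should be {expected_type.__name__}")

# Passes against a recorded example, yet says nothing about whether the
# provider actually ships the order:
check_contract({"order_id": 42, "status": "shipped", "total_cents": 1999})
```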
Michael Nygard, author of "Release It!", describes the pattern of systems designed for conditions that never materialise[12]. Teams architect for the traffic they hope to receive, not the traffic they actually receive. They build for the scale they aspire to, not the scale they operate at. The operational tax of microservices is paid immediately and continuously, regardless of whether the scaling benefits ever materialise. The tax is not contingent on success. It is guaranteed.
Conway's Law in reverse
Melvin Conway observed in 1968 that organisations design systems that mirror their communication structures[13]. This observation, known as Conway's Law, is among the most consistently validated findings in software engineering. It is also among the most consistently misapplied.
Conway's Law is often cited as justification for microservices: small, autonomous teams should build small, autonomous services. The architecture follows the organisation. But many organisations applied Conway's Law in reverse. Instead of letting their organisational structure inform their architecture, they restructured their organisations to match a microservices architecture they had already decided to adopt. Teams were split to match service boundaries. Communication patterns were formalised around API contracts. Organisational complexity increased to serve architectural ideology rather than business need.
The result is teams that own individual services but lack the authority or context to deliver end-to-end features. A feature that touches three services requires coordination across three teams, three sprint backlogs, and three deployment schedules. The autonomy that microservices promised—teams independently building and deploying their services—becomes coordination overhead when features span service boundaries, as most non-trivial features do. The meetings multiply. The Slack channels multiply. The alignment ceremonies multiply. The overhead that microservices were supposed to eliminate is replaced by a different kind of overhead that is harder to see and harder to measure.
Forsgren, Humble, and Kim's research, published as "Accelerate," found that architecture alone does not predict delivery performance[14]. What predicts performance is the ability to make changes safely and quickly—a capability that depends on loose coupling, good testing, and deployment automation, none of which require microservices specifically. A well-structured monolith with clear module boundaries, comprehensive tests, and automated deployment can achieve the same delivery metrics as a microservices architecture, without the distributed systems tax.
Martin Fowler, drawing on the experience of Sam Newman and other practitioners, argued explicitly for a 'monolith first' approach[15]. Start with a monolith. Understand your domain boundaries through experience, not speculation. Identify the components that genuinely need independent scaling or deployment through measured evidence, not architectural intuition. Extract those—and only those—into services. The wisdom is in the sequence: understand first, then decompose. The industry often did the opposite: decompose first, then discover which boundaries were wrong, then pay the cost of redrawing them across distributed systems.
When microservices are right
This is not an argument against microservices. It is an argument against microservices as a default.
Microservices solve real problems at genuine scale. They are the right choice when components have genuinely different scaling requirements—a video transcoding pipeline that needs to scale to thousands of instances whilst the user profile service handles modest load. They are right when teams are large enough that independent deployment reduces coordination costs rather than creating them—typically above fifty to one hundred engineers, where the communication overhead of a monolithic codebase exceeds the coordination overhead of distributed services. They are right when fault isolation is critical—when a failure in one component must not cascade to others, and the operational investment in circuit breakers, bulkheads, and graceful degradation is justified by the business cost of correlated failures.
Google, Netflix, and Amazon operate microservices at scale because they have the engineering organisations, the custom tooling, and the operational maturity to absorb the distributed systems tax. Google built Borg, then Omega, then Kubernetes—investing years of dedicated engineering effort into container orchestration before opening it to the wider industry[16]. Netflix built an entire ecosystem of resilience libraries—Hystrix, Eureka, Zuul—to manage the failure modes that distributed systems introduce. Amazon invested over a decade in service-oriented architecture, learning through painful experience which boundaries worked and which did not, before the term 'microservices' was coined.
These organisations paid the microservices premium and received the benefits because their scale demanded it. But most software organisations are not Google, Netflix, or Amazon. Most operate systems that serve thousands or tens of thousands of users, not hundreds of millions. Most have engineering teams of ten to fifty people, not thousands. Most do not have dedicated platform teams to build and maintain the observability, deployment, and testing infrastructure that microservices demand. For these organisations, the microservices premium is all cost and no benefit.
Martin Fowler, who popularised many of the patterns used in microservices architectures, explicitly warns about what he calls the 'microservices premium'[17]. The first rule of distributed systems, he argues, is don't distribute unless you have to. The premium is real: microservices only earn their cost when the benefits of independent deployment and independent scaling are actively needed and cannot be achieved through simpler means. For most teams, the premium is paid but the benefits are never collected.
The modular monolith
The alternative is not a return to unstructured monoliths. It is the modular monolith—a single deployable unit with clear internal boundaries, defined interfaces between components, and the discipline of good architecture without the overhead of distribution.
A modular monolith provides most of the architectural benefits that teams seek when they adopt microservices. Clear module boundaries enforce separation of concerns. Defined interfaces between modules enable independent development within the codebase. Data access is channelled through module APIs rather than shared directly. The critical difference is that module boundaries are enforced at the code level rather than the network level. A violation of module boundaries in a monolith is a code review comment. A violation of service boundaries in a microservices architecture is a production incident.
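Enforcement can be automated. The sketch below is one way to turn a boundary violation into a failing build: it flags any import that reaches past another module's public api module into its internals. The package layout and the api convention are assumptions for illustration; tools such as import-linter do this properly.

```python
# Sketch of enforcing module boundaries at the code level: flag any import
# that reaches into another module's internals instead of its public api
# module. Package names and the 'api' convention are assumptions; tools
# such as import-linter do this properly.
import ast
from pathlib import Path

ALLOWED_CROSS_MODULE = "api"  # modules may only import each other's api.py

def boundary_violations(src_root: Path) -> list[str]:
    violations = []
    for py in src_root.rglob("*.py"):
        rel = py.relative_to(src_root)
        if len(rel.parts) < 2:
            continue  # top-level scripts are not inside a module
        owner = rel.parts[0]  # e.g. 'billing' for src/billing/service.py
        tree = ast.parse(py.read_text())
        for node in ast.walk(tree):
            if (isinstance(node, ast.ImportFrom)
                    and node.module and node.level == 0):
                parts = node.module.split(".")
                if (parts[0] != owner and len(parts) > 1
                        and parts[1] != ALLOWED_CROSS_MODULE):
                    violations.append(f"{rel}: imports {node.module}")
    return violations

# In CI: assert not boundary_violations(Path("src")) turns a crossed
# boundary into a failing build instead of a production incident.
```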
Lehman's Laws of Software Evolution describe the entropic tendency of all software systems[18]. Systems undergoing continuous change tend towards increasing complexity unless work is specifically invested to reduce it. This law applies equally to monoliths and microservices. But the entropy of a microservices architecture includes not just code complexity but also network topology complexity, deployment pipeline complexity, and data consistency complexity. The modular monolith constrains entropy to a single dimension—code—where it is most visible and most manageable.
The modular monolith also preserves the option to extract services later. When a genuine scaling requirement emerges—demonstrated by production metrics, not anticipated by architectural speculation—a well-bounded module can be extracted into a service with far less risk than decomposing an unstructured monolith. The extraction is informed by real usage patterns rather than speculative domain modelling. The service boundaries reflect actual scaling needs rather than theoretical ones.
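A sketch of what that extraction looks like when the module already speaks only through its API: the same function, wrapped in a thin transport layer. The billing function here is a hypothetical stand-in, and the stdlib server keeps the sketch dependency-free.

```python
# If a module already speaks only through its API, extraction is largely
# mechanical: wrap the same functions in a thin transport layer. The
# billing function is a hypothetical stand-in for a real module API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def get_invoice(invoice_id: int) -> dict:  # stand-in for billing.api
    return {"invoice_id": invoice_id, "total_cents": 1999}

class BillingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /invoices/42 -> the exact same call the monolith made in-process
        invoice_id = int(self.path.rsplit("/", 1)[-1])
        body = json.dumps(get_invoice(invoice_id)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), BillingHandler).serve_forever()
```

The module's callers swap an import for an HTTP client, and only the one module with a demonstrated scaling need pays the network tax.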
The cost of reversal
The organisations now consolidating their microservices are paying twice. They paid to decompose their monolith into services—the architects, the infrastructure, the migration effort, the months of parallel running, the inevitable production incidents during cutover. Now they are paying to recompose those services into a more consolidated architecture—a second migration with its own architects, infrastructure, effort, and incidents.
Both transitions are expensive, risky, and time-consuming. The Prime Video team could quantify their savings because they had clear before-and-after metrics. Many organisations undertaking consolidation lack this clarity. They know the microservices architecture is more expensive than it should be, but they cannot easily measure what a consolidated alternative would cost because they no longer have one to compare against. The decision to consolidate is often based on engineering intuition and accumulated frustration rather than rigorous cost-benefit analysis—which, ironically, mirrors how the decision to adopt microservices was made in the first place.
The software industry has a recurring pattern of adopting solutions before understanding the problems they solve. Object-oriented programming was applied everywhere, including where procedural code was simpler and clearer. Agile methodologies were adopted as rigid frameworks, contradicting their own founding principles. Microservices are the latest iteration: a genuinely useful approach at appropriate scale, applied universally as an article of faith.
Every architecture is a bet on the future. Microservices bet that your system will need independent scaling, independent deployment, and fault isolation at a level that justifies the distributed systems tax. For a significant portion of organisations that made this bet, the future did not arrive. The tax was real. The benefits were not.
The question every engineering team should ask before adopting microservices is not 'is this how Netflix does it?' but 'do we have Netflix's problems?' For most teams, the honest answer renders the architectural choice obvious.
Footnotes
1. Kolny, M. (2023). "Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%." Amazon Prime Video Tech Blog. https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
2. O'Reilly. (2024). "Technology Trends for 2024." O'Reilly Media. https://www.oreilly.com/radar/technology-trends-for-2024/
3. O'Reilly. (2025). "Technology Trends for 2025." O'Reilly Media. https://www.oreilly.com/radar/technology-trends-for-2025/
4. Fritzsch, J., Bogner, J., Zimmermann, A., & Wagner, S. (2024). "From Microservice to Monolith: A Multivocal Literature Review." Electronics, 13(8), 1452. https://www.mdpi.com/2079-9292/13/8/1452
5. Perforce/JRebel. (2020). "Java Developer Productivity Report." Perforce Software. https://www.prnewswire.com/news-releases/perforce-java-developer-survey-finds-over-62-of-developers-experiencing-performance-issues-in-microservices-300983080.html
6. Noonan, A. (2018). "Goodbye Microservices: From 100s of problem children to 1 superstar." Segment Engineering Blog. https://segment.com/blog/goodbye-microservices/
7. Dragoni, N., Giallorenzo, S., Lafuente, A. L., et al. (2017). "Microservices: yesterday, today, and tomorrow." Present and Ulterior Software Engineering, 195-216.
8. Brewer, E. (2012). "CAP twelve years later: How the 'rules' have changed." Computer, 45(2), 23-29.
9. Vogels, W. (2009). "Eventually Consistent." Communications of the ACM, 52(1), 40-44.
10. Grafana Labs. (2025). "Observability Survey 2025." Grafana Labs. https://grafana.com/observability-survey/2025/
11. Richardson, C. (2018). "Microservices Patterns: With examples in Java." Manning Publications.
12. Nygard, M. (2018). "Release It!: Design and Deploy Production-Ready Software." 2nd Edition. Pragmatic Bookshelf.
13. Conway, M. (1968). "How Do Committees Invent?" Datamation, 14(4), 28-31.
14. Forsgren, N., Humble, J., & Kim, G. (2018). "Accelerate: The Science of Lean Software and DevOps." IT Revolution.
15. Fowler, M. (2015). "MonolithFirst." Martin Fowler's Bliki. https://martinfowler.com/bliki/MonolithFirst.html
16. Burns, B., Grant, B., Oppenheimer, D., Brewer, E., & Wilkes, J. (2016). "Borg, Omega, and Kubernetes." ACM Queue, 14(1), 70-93.
17. Fowler, M. (2015). "MicroservicePremium." Martin Fowler's Bliki. https://martinfowler.com/bliki/MicroservicePremium.html
18. Lehman, M. M. (1980). "Programs, Life Cycles, and Laws of Software Evolution." Proceedings of the IEEE, 68(9), 1060-1076.
TL;DR
Amazon Prime Video's distributed architecture hit a scaling limit at 5% of expected load; consolidating to a monolith cut costs by 90%. O'Reilly's platform data shows microservices interest declined 20% in 2023 and a further 24% in 2024, as organisations discovered the operational tax exceeds the benefits. The core penalty—in-memory function calls taking nanoseconds versus network calls taking milliseconds—is physics, not engineering. Most systems adopted microservices as architectural ideology rather than responding to genuine scaling requirements.