A Monolith of Microservices
Microservices were over-hyped years ago, but the design required to make them possible is still useful in a monolith
Developers have been complaining about monoliths for decades. There's good reason for it. Most of us have worked on monoliths that have been built on for years. That's years of new features, pivoted features, performance tweaks, and abstraction layers. Despite every developer's good intentions of keeping code clean and organized, monoliths tend to end up being incredibly difficult to work with. Onboarding for new developers on the team goes from days to weeks or months. Test cases require more and more lines of setup code. Bugs take longer to investigate. Regression possibilities grow exponentially.
Microservices became trendy in the early 2010s. Everyone was hopeful that all their problems with monoliths would now go away. Netflix's architecture became the holy grail.
Unfortunately, there are no silver bullets in software development. Few people promoting microservices actually had experience putting them into production (that includes 2015 me), and that experience is required to understand the trade-offs being made when moving towards microservices. You may end up with smaller codebases that are each easier to maintain in isolation. However, the complexity of the system as a whole still has to be dealt with. How will those codebases interact? How will each of them be monitored? What happens if a subset has an outage? How do we debug an issue with one service that may be caused by bad responses from another? How do we manage all the logs?
Most importantly: how do we decide when we should build a new microservice? I've seen companies with 10+ microservices per developer on the team. That is a maintenance nightmare.
I also love working at early-stage startups. Microservices are usually untenable in that environment. There are too many micro-pivots in the search for product-market fit to make the overhead of microservices worthwhile. That doesn't mean I was happy to accept all the old problems with monoliths though. It did mean it was time to look harder at those problems and understand why monoliths have been historically problematic.
At the start of this process, I was reminded of an engineer telling me how their monolith was so hard to scale that they swung very far in the opposite direction: they made every database table its own microservice. That ruled out ever using a JOIN in SQL. My immediate thought was how this affected the performance of their application. This was a fairly large company with 100 million users and billions in revenue, though, and that reminded me that you often accept small performance hits in order to scale. It also made me realize that many developers default to using JOINs not because they are necessary at the moment, but because it feels right to care about performance. Performance is important, but optimizing for it has trade-offs, and to understand those trade-offs, the performance impact needs to be tested.
In the vast majority of systems, the performance difference between 1 query and 5 queries to a SQL database is not that great. Even 1 query vs 100 queries is often unnoticeable. Performance problems are more likely to occur because one query is trying to return too many rows, an index isn't being used, or thousands of queries are being made without anyone realizing it. The impact of not using JOINs is negligible in most use cases.
For that negligible impact, not having JOINs buys you flexibility when you need to scale. Without JOINs, you can move tables to completely separate database servers. You can also shard any table in any way you wish. With JOINs, you'd have to refactor any code that relies on those queries in order to make the separation.
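As a sketch of what that flexibility looks like in practice (in Go, using the standard database/sql package; the leases and tenants tables, columns, and function name here are hypothetical), fetching related rows with two queries instead of a JOIN keeps each table behind its own database handle:

```go
package storage

import "database/sql"

// Lease and Tenant are illustrative tables; the names and schema are
// made up for this sketch.
type Lease struct {
	ID       int64
	TenantID int64
	Rate     float64
}

type Tenant struct {
	ID   int64
	Name string
}

// GetLeaseWithTenant issues two queries instead of one JOIN. Because
// the queries are independent, the tenants table could later move to
// a completely separate database server; only the *sql.DB handle it
// is queried through would change, not this function's SQL.
func GetLeaseWithTenant(leaseDB, tenantDB *sql.DB, leaseID int64) (Lease, Tenant, error) {
	var l Lease
	if err := leaseDB.QueryRow(
		"SELECT id, tenant_id, rate FROM leases WHERE id = ?", leaseID,
	).Scan(&l.ID, &l.TenantID, &l.Rate); err != nil {
		return Lease{}, Tenant{}, err
	}

	var t Tenant
	if err := tenantDB.QueryRow(
		"SELECT id, name FROM tenants WHERE id = ?", l.TenantID,
	).Scan(&t.ID, &t.Name); err != nil {
		return Lease{}, Tenant{}, err
	}
	return l, t, nil
}
```

If the tenants table is later sharded or moved to its own server, callers of this function never notice.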
I already mentioned that having a huge number of microservices is untenable at most early-stage startups. However, we can still adopt some of the patterns within a monolith. Instead of making every database table a microservice, we can make it a separate, isolated module within our monolith. We're effectively treating the module as if it were a microservice. In fact, we could probably do a drop-in replacement of that module with a microservice in the future if that were appropriate.
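In code, that isolation is mostly a matter of discipline about boundaries. Here's a minimal sketch (the leases package and its Store interface are hypothetical names, not anything standard): the rest of the monolith depends only on a small interface, so the SQL-backed module could later be swapped for a client that calls an actual leases microservice without touching any callers.

```go
package leases

import "context"

// Lease mirrors a row in the leases table.
type Lease struct {
	ID       int64
	TenantID int64
	Rate     float64
}

// Store is the only surface the rest of the monolith sees. Today the
// implementation wraps SQL queries against the leases table; a future
// implementation could wrap an HTTP client for a dedicated leases
// service, and no caller would need to change.
type Store interface {
	Get(ctx context.Context, id int64) (Lease, error)
	Create(ctx context.Context, l Lease) (int64, error)
}
```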
That's easy enough to say, but where does all the business logic go? We can follow a similar pattern, but the rules of how they interact have to be clearly defined. For my applications, I have "manager" files that contain the business logic for user facing features. There are basic ones that just need to interact with the tables for those features. If one feature had something that relied on another feature, I would not just have one manager call another. I'd create another manager at a higher layer to contain those functions. This allows me to keep the simple functions separated from more complex functions, which makes tracking down modifications for new features much easier. It also has the added benefit of preventing circular dependencies.
An example using the leasing module I built:
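The layering looks roughly like this (a sketch with simplified, illustrative names rather than the real code; the actual module has more managers and far more logic):

```go
package managers

import "context"

// LeaseManager holds the business logic that only needs the lease
// tables. It never reaches into another feature's manager.
type LeaseManager struct{ /* holds the lease data-access module */ }

// LoanManager does the same for loans.
type LoanManager struct{ /* holds the loan data-access module */ }

func (m *LeaseManager) MonthlyPayment(ctx context.Context, leaseID int64) (float64, error) {
	// ... derived from the lease's payment tiers ...
	return 0, nil
}

func (m *LoanManager) MonthlyPayment(ctx context.Context, loanID int64) (float64, error) {
	// ... derived from the loan's amortization schedule ...
	return 0, nil
}

// BillingManager sits one layer up. When a feature needs both leases
// and loans, it calls down into each manager rather than having
// LeaseManager and LoanManager call each other. The simple managers
// stay simple, and circular dependencies are impossible.
type BillingManager struct {
	Leases *LeaseManager
	Loans  *LoanManager
}

func (b *BillingManager) TotalMonthlyObligations(ctx context.Context, leaseID, loanID int64) (float64, error) {
	lease, err := b.Leases.MonthlyPayment(ctx, leaseID)
	if err != nil {
		return 0, err
	}
	loan, err := b.Loans.MonthlyPayment(ctx, loanID)
	if err != nil {
		return 0, err
	}
	return lease + loan, nil
}
```

Lower-level managers only know about their own feature's tables; anything that crosses features lives one layer up.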
We end up with worse performance because of redundant and un-optimized function calls, but the performance hit is small and the codebase is significantly more modular and cohesive.
That solves some of the past issues with monoliths, but not all of them. A lot of the problem with monoliths is how many layers of abstraction they accumulate. Abstractions are a critical part of software development, but they are often taken too far.
Most software engineers are trained on the idea that if you have to write code twice, it should be made into something that can be reused. That's sensible in isolation. It becomes problematic when two pieces of code merely look and perform similarly *today*. Unfortunately, the nature of an application can make it very likely that they will diverge in the future, and the abstracted code will no longer be appropriate. It exists, though, and developers rarely remove abstraction layers. Instead they attempt to modify the abstraction to account for both cases. Two unrelated features have now been coupled together, and modifying one can create a regression in the other.
Let's look at an example in our internal systems at Eight One Partners. We have features to handle loans and leases. They looked very similar to me at first. Both have interest rates. Both have monthly payments. Both have amortization tables. The 2015 version of me would have created an abstraction called lending_item and made loans and leases sub-classes of it. The 2023 version of me copied and pasted all the code for loans and renamed it leases. That ended up being the right choice because of how different loans and leases really are. Loans can have custom payments and balloon payments. Leases have payment tiers, but are otherwise not customized. You can make an extra payment one month on a loan; you would not do that with a lease. Loans have different interest conventions which affect the balance of the loan. Leases have a right-of-use section in the amortization table that loans do not, and whether a lease is a financing lease or an operating lease changes how that section is calculated. I may have copy/pasted the code at first, but I deleted all the copied code for calculating the amortization table and rewrote it from scratch. Creating an abstract version of that calculation would have taken 3-5x as long. It would have also made bug fixing a lot harder, and I definitely had a lot of bugs in the pre-release versions of those calculations. Both features can now be modified without worrying about creating a regression in the other. My test cases are also a lot easier to write and read.
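For a sense of why a shared abstraction would have been painful, here's a heavily simplified sketch (the types and field names are made up and the calculations are elided): the inputs to the two amortization schedules barely overlap, so a shared lending_item would mostly be a pile of special cases.

```go
package lending

// Loan and Lease deliberately do not share a base type. The fields
// that matter to each are different enough that a common abstraction
// would consist mostly of "if lease" branches.
type Loan struct {
	Principal      float64
	AnnualRate     float64
	TermMonths     int
	BalloonPayment float64         // loans can have balloon payments
	ExtraPayments  map[int]float64 // month -> optional extra payment
}

type Lease struct {
	AnnualRate   float64
	PaymentTiers []struct {
		Months  int
		Payment float64
	}
	Operating bool // operating vs financing changes the right-of-use rows
}

// Row is one line of an amortization table.
type Row struct {
	Month     int
	Payment   float64
	Interest  float64
	Principal float64
}

// Each feature gets its own amortization code. Duplicating the loop
// is cheaper than maintaining one function that serves both.
func (l Loan) AmortizationRows() []Row  { /* loan-specific schedule */ return nil }
func (l Lease) AmortizationRows() []Row { /* lease-specific schedule, incl. right-of-use */ return nil }
```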
The abstraction layer I did add there was on the financing vs operating lease. That's worked ok so far, but I have a sneaking suspicion that I may have regrets coming soon.
Simplified code is a lot easier to work with. You may spend more time typing, but you'll spend less time thinking about how to work around your self-inflicted barriers. I had our lease module ready and functional in a week. It does 90% of what most lease applications do and performs far better (real-time amortization calculations vs. waiting a minute for an exported CSV/PDF). There are entire businesses built around this one feature, yet a second-year college student could build it in roughly the same time it took me, so long as they kept things simple.
The last source of problems with monoliths is the use of frameworks. I remember when Golang was released, a lot of folks asked what web frameworks were available. They missed the point that the language was meant to be functional without one. I forget what was listed in the whitepaper, but the release of Golang made me realize how much pain frameworks have caused me in the past. Getting a Java Spring upgrade on the roadmap is tough since it generates zero additional revenue. Performing a Java Spring upgrade is also a nightmare since you're playing whack-a-mole for an unknown amount of time. Laravel upgrades are often worse. Why would anyone change the order of function arguments in a dynamically typed language? React upgrades are also awful, mostly because of the insane dependency tree in the node ecosystem.
Golang may come with enough functionality out of the box to work without a web framework, but not every language does. In those situations, the ideal is to find a set of libraries. Symfony in PHP is a good example (when included as packages rather than used as a framework). Flask in Python is another great one. That style of development significantly improves the modularity of a codebase. Instead of building your system as pieces of a framework, you build it as completely separate, independent pieces (as if they were microservices). This makes upgrading individual packages significantly easier compared to the big bang of a framework upgrade.
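To make the Golang point concrete, the standard library alone covers routing, serving, and JSON, which is all a basic service needs. A minimal sketch, no framework involved:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// A small JSON endpoint built entirely from the standard library:
// the router, the HTTP server, and JSON encoding all ship with Go.
func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

Each additional capability (sessions, templating, database access) can then be pulled in as its own independent package and upgraded on its own schedule.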
Making these three changes to how I build monoliths has made dealing with a monolith much easier. There are still annoyances with some situations (e.g. testing third party API integrations), but I find software development much more enjoyable in my monoliths today than I did in the first decade of my career.