Breaking up the Monolith – Assessing the status quo
Breaking up the Monolith is a series of articles about transforming a concrete monolithic application into microservices. This post shows my personal decision-making approach on the matter. The architecture of the Ameto image processing service will serve as an example.
Identify pain points
The very first step when considering an architectural change to a piece of software is to identify its current or future problems. If there are none, why should we change anything? The evaluation should be done without a specific target pattern in mind. Otherwise, it is likely that only those problems are identified to which the desired architecture is a solution, and the process becomes a self-fulfilling prophecy.
In Ameto, I discovered the following issues that keep plaguing me.
Long-running continuous integration builds
Ameto uses a custom build system for most artefacts. It is based on Portage, the package manager used by Gentoo Linux, and makes it possible to create fairly minimal Docker images. The build system has some intriguing properties and deserves a blog post of its own. Suffice it to say that the build times are fairly long, even when the build caches are filled. I frequently have to wait 50 minutes for the build of the backend application to finish.
I noticed that downstream services depend on mutually exclusive parts of the backend service.
Services can be classified relative to one another with regard to the data flow. If service B depends on another service A, B is considered a downstream service of A. In turn, A is considered an upstream service of B. Downstream services can be thought of as consumers of an interface and upstream services as providers of an interface.
The management interface is only concerned with user and tenant information, whereas the client library is concerned with assets, jobs, pipelines, and operators. There is also a command line tool that is used to manage pipeline operators, new tenants, and users. This design is a bit "smelly". Considering that the tests for the /tenant endpoint make use of an external third-party service, it gets even smellier. The situation suggests that the user and tenant endpoints could be moved to a separate artefact. For example:
The management interface and the client library would consume from two distinct backend services and the CLI would consume from both. This way changes to the processing backend should no longer impact the management interface.
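The proposed split can be sketched as a small mapping exercise. This is only an illustration: the resource names are derived from the description above, and the service names `accounts` and `processing` are hypothetical, not the names used in Ameto.

```python
# Resources each consumer works with, as described in the text.
# The identifiers are illustrative, not Ameto's real endpoint names.
USAGE = {
    "management-interface": {"user", "tenant"},
    "client-library": {"asset", "job", "pipeline", "operator"},
    "cli": {"operator", "tenant", "user"},
}

# Hypothetical split of the backend into two services.
PROPOSED_SERVICES = {
    "accounts": {"user", "tenant"},
    "processing": {"asset", "job", "pipeline", "operator"},
}


def services_needed(consumer):
    """Return the proposed services a consumer would have to talk to."""
    needed = USAGE[consumer]
    return sorted(
        name
        for name, resources in PROPOSED_SERVICES.items()
        if needed & resources  # any overlap means a dependency
    )


for consumer in USAGE:
    print(consumer, "->", services_needed(consumer))
```

With this split, the management interface and the client library each depend on exactly one service, and only the CLI depends on both.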
Some functionality, such as authentication, reaches across all endpoints of the backend. These cross-cutting concerns will never go away, and they are among the hardest things to manage in microservice architectures.
How does the transition to a distributed architecture solve the identified problems?
The build times are expected to be lower or the same for the individual services compared to the monolithic application. Lower build times are great, because they allow shorter feedback cycles during development. They also decrease the mean time to recovery in case of a software-related failure.
Deployments are less risky when deploying smaller chunks of functionality. For example, a change in the jobs API is unlikely to affect the management interface which is only concerned with users and tenants.
Both short development cycles and low-risk deployments ultimately lead to more frequent deployments. This is a positive feedback loop, because deploying frequently exercises the deployment pipeline and lowers the risk of subsequent deployments.
As an added benefit, splitting up a monolithic application into smaller, more manageable parts allows the team to use different language ecosystems for different parts of the application. On the downside, this makes the evaluation of new technologies harder: you are no longer bound to the ecosystem you are working in, so there are many more options to consider. On the upside, it becomes easier than ever to "use the right tool" for a specific purpose. Microservices are also a chance for the development team to acquire skills in other language ecosystems.
Architectural decisions never come without creating new problems. The weaknesses and possible pitfalls of microservice architectures are discussed at length on the internet. Yet, I have a few "honorable mentions".
Building a distributed monolith
When moving from a monolithic to a distributed architecture, there is always a risk of "carving out the wrong piece". In the worst case you end up with a distributed monolith, which combines the worst aspects of both worlds.
A distributed monolith consists of several build artefacts. At the same time these artefacts are highly coupled so that they cannot be built or deployed independently.
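One rough way to spot this kind of coupling is to look at how often a single change has to touch more than one service. The sketch below is a heuristic of my own choosing, not an established metric, and the change history in it is made up for illustration.

```python
# Made-up change history: each entry lists the services that one
# change had to touch. The service names are hypothetical.
CHANGES = [
    {"accounts"},
    {"processing"},
    {"accounts", "processing"},
    {"processing"},
]


def cross_service_ratio(changes):
    """Fraction of changes that touched more than one service."""
    if not changes:
        return 0.0
    coupled = sum(1 for services in changes if len(services) > 1)
    return coupled / len(changes)


print(cross_service_ratio(CHANGES))  # 0.25
```

A ratio that stays high over time suggests that the services cannot really be changed or deployed independently.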
I am pretty sure I am carving out the right piece in this case, but one can never know.
Higher operational complexity
Distributed architectures also come with significantly higher operational complexity. It is not sufficient to simply deploy multiple artefacts. Every deployed artefact needs to be monitored independently. Every service instance also produces a log which makes it necessary to think about log management.
The current monolith uses a continuous delivery pipeline, and similar pipelines can be set up and maintained for new services with reasonable effort. Monitoring is already in place and can easily be extended to accommodate new services, so this should be a non-issue.
Harder to debug
Debugging is no longer as easy as spinning up the application on the development computer and stepping through the code with the debugger. Requests now run through multiple processes, often asynchronously. Therefore, the need arises for some means of tracing and log management in order to be able to debug a request properly.
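A common starting point for such tracing is a request ID that every service writes to its log and passes on to the next service. The following is a minimal sketch using only the Python standard library. The `X-Request-ID` header is a widespread convention rather than a standard, and the service names `api` and `worker` are hypothetical. Both "services" run in one process here purely to keep the example self-contained.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request that is currently being handled.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def make_logger(name):
    """Create a logger whose output includes the request ID."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(name)s [%(request_id)s] %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle(service_name, incoming_headers):
    """Log under the caller's request ID, or start a new one."""
    rid = incoming_headers.get("X-Request-ID") or uuid.uuid4().hex
    request_id.set(rid)
    logging.getLogger(service_name).info("handling request")
    # Headers to send along with any call to the next service.
    return {"X-Request-ID": rid}


make_logger("api")
make_logger("worker")

outgoing = handle("api", {})  # first service starts the trace
handle("worker", outgoing)    # second service reuses the same ID
```

Both log lines carry the same ID, so the log entries of one request can be found across services by searching for that ID.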
These issues have to be tackled along the way.
Transitioning to microservices is ideally based on scalability issues of the current architecture. For example, you cannot scale vertically any longer and you want to scale horizontally. However, there are other aspects that benefit from a distributed architecture, as I argued throughout the article.
Vertical scaling, also called scaling up, is the process of increasing the capacity of a single machine. Horizontal scaling, also called scaling out, is the process of increasing the number of machines. Both forms of scaling are performed to accommodate higher resource consumption of a software system. They can be performed independently of each other, which makes them orthogonal.
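The orthogonality can be illustrated with a deliberately simplified model in which total capacity is the number of machines multiplied by the capacity of each machine. The numbers are arbitrary, and real systems rarely scale this linearly.

```python
def capacity(machines, requests_per_machine):
    """Total capacity under an idealised, perfectly linear model."""
    return machines * requests_per_machine


baseline = capacity(machines=2, requests_per_machine=100)
scaled_up = capacity(machines=2, requests_per_machine=200)   # vertical
scaled_out = capacity(machines=4, requests_per_machine=100)  # horizontal
both = capacity(machines=4, requests_per_machine=200)

print(baseline, scaled_up, scaled_out, both)  # 200 400 400 800
```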
Make sure you don't miss my next post: Defining a migration strategy
Most of the time it is not worth addressing future problems, so be really sure that there will be problems. ↩︎
Whenever something is remote, it is a source of error. Tests might become flaky, which in turn leads to unreliable builds, each of which takes some 50 minutes as mentioned earlier. Moreover, an error in the external service leads to a build failure of the whole application. Combined with flaky tests, this could cause a test run that tests changes in the /job endpoint to fail because the external service is unavailable. ↩︎
As a matter of fact, Ameto previously consisted of a couple of microservices. This was too early. The individual components of the architecture were not well defined back then and the system had many moving parts. I often ran into the problem that a simple change impacted two or more services, so I felt that something was wrong here. I moved everything back together into a monolithic backend and gave it more structure. ↩︎