The concept of Agile changed the way we approach software development through early and continuous delivery of valuable software. To support release on demand, companies have started investing in continuous delivery (CI/CD) pipelines, and this has become a top priority for many organizations.
Now, what happens when your continuous delivery pipelines are flaky? Confidence in the pipelines drops, frustration grows, and manual intervention creeps into what should be a fully automated process. There might be several reasons for the flakiness: the monoliths we have built over the years, the hardware we use, the frameworks, the tests, and so on.
The stability of the pipelines depends on three main aspects:
- The stability of the infrastructure that the tests run on
- The stability of the applications/dependencies that our tests rely on
- The stability of the tests
Let us dive into these three pillars, which, when done right, build immunity against flakiness into our pipelines.
Stability of the Infrastructure
Infrastructure plays a vital role in the stability of test runs and CI/CD pipelines. Traditionally, we have a set of servers or virtual machines dedicated to running our tests. When we use the same servers or VMs over and over again, drift creeps into the environment as we install and uninstall software and apply patches. These changes are hard to track and eventually lead to inconsistent results or test failures. Also, when we run applications repeatedly on the same infrastructure, we start to encounter port conflicts caused by zombie processes that may still be hanging around.
So how do we build immunity into our infrastructure? Immutable infrastructure, provisioned through Infrastructure as Code (IaC), is the mantra: infrastructure that is never modified once deployed. IaC is a way of managing your infrastructure as code: you provision machines when you need them, configure them, use them, and destroy them when you are done. This way, you bring up the machine in the exact same state every time and use it in the exact same way, which provides consistency even when you repeat the process a thousand times. This eliminates drift and brings immunity from hardware failures and environment inconsistencies.
There are numerous tools on the market for provisioning infrastructure as code; the prominent ones are Packer, Ansible, and Terraform, which are vendor-agnostic, and CloudFormation and Azure Resource Manager, which are vendor-specific.
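As an illustration, a disposable test runner can be described in a short Terraform sketch like the one below. This is not a working configuration: the provider, AMI ID, instance type, and resource names are hypothetical placeholders, and a real setup would add networking, credentials, and provisioning steps.

```hcl
# Hypothetical Terraform sketch of an ephemeral test-runner VM.
# All identifiers below are placeholders, not real values.
provider "aws" {
  region = "us-east-1"
}

resource "aws_instance" "test_runner" {
  ami           = "ami-0123456789abcdef0" # placeholder image id
  instance_type = "t3.medium"

  tags = {
    Name = "ephemeral-test-runner"
  }
}
```

Because the machine is created from the same definition on every run (`terraform apply`) and torn down afterwards (`terraform destroy`), no drift can accumulate between runs.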
Dedicated third-party servers running forever
Many of the applications we build depend on several third-party applications. For example, your application might support exporting data to a third-party database or authenticating against a third-party system. The usual practice is to install these third-party applications on a machine and keep them running forever. Over a long run, these applications ultimately become slow or run out of memory, or out of table space if they are databases. The hardware they run on might also become slow and unresponsive, ultimately causing the tests and pipelines to fail. Another major issue with this approach is that when a test or machine gets stuck in the middle of a run, data clean-up does not happen. Stale data then remains in the system and can cause the next batch of runs to fail.
So how do we build immunity into your third-party servers so that they are always responsive and stable? Most vendors now support Docker as a platform for their applications. Containerize your dependencies where possible and deploy them using IaC, with the same tools listed in the section above. This lets you pull a fresh copy of the application (for example, a Tomcat server from Docker Hub or a local Artifactory) within seconds every time you run your tests, and the container is destroyed when the tests are done. Even if your test fails to clean up its data, you get a fresh copy of the application every time, which brings immunity from stale data and from the slowness of long-running third-party applications.
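As a sketch, an ephemeral third-party dependency can be declared in a `docker-compose.yml` so that every test run gets a fresh instance. The service name, database name, and credentials below are illustrative placeholders, not values from any real project.

```yaml
# Hypothetical compose file: a throwaway PostgreSQL instance for one test run.
# Service name, image tag, and credentials are placeholders.
services:
  test-db:
    image: postgres:16
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: app_test
    ports:
      - "5432:5432"
    tmpfs:
      - /var/lib/postgresql/data # data lives only in memory for this run
```

Running `docker compose up -d` before the tests and `docker compose down` afterwards guarantees a clean database every time, so leftover data from an aborted run can never leak into the next one.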
Stability of the Tests
Many companies that are more than a decade old built their test frameworks long ago and have been patching and maintaining them ever since. These monolithic frameworks cause flakiness that takes time to fix.
If your tests are flaky, having two sets of pipelines helps. Any test that fails intermittently should go to an unstable pipeline and should pass a certain number of consecutive times before moving back to the stable pipeline. This way, we keep the stable pipeline reliable while we fix the flakiness. Many companies have experimented with this approach of unstable versus stable pipelines, with a lot of success in bringing immunity to their stable pipelines.
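The promotion rule above can be sketched in a few lines of Python. The threshold of five consecutive passes and the class name are arbitrary choices for illustration, not values prescribed by any particular CI system.

```python
# Minimal sketch of a stable/unstable pipeline quarantine tracker.
# A flaky test is promoted back to the stable pipeline only after it
# passes PROMOTION_THRESHOLD consecutive times in the unstable pipeline.

PROMOTION_THRESHOLD = 5  # arbitrary example value

class QuarantineTracker:
    def __init__(self):
        self.consecutive_passes = {}  # test name -> current pass streak

    def record_result(self, test_name: str, passed: bool) -> None:
        """Update a quarantined test's streak after an unstable-pipeline run."""
        if passed:
            self.consecutive_passes[test_name] = self.consecutive_passes.get(test_name, 0) + 1
        else:
            self.consecutive_passes[test_name] = 0  # any failure resets the streak

    def ready_for_stable(self, test_name: str) -> bool:
        """A test is promoted once it has passed enough times in a row."""
        return self.consecutive_passes.get(test_name, 0) >= PROMOTION_THRESHOLD


tracker = QuarantineTracker()
for outcome in [True, True, False, True, True, True, True, True]:
    tracker.record_result("test_checkout_flow", outcome)

print(tracker.ready_for_stable("test_checkout_flow"))  # prints True
```

The key property is that a single failure resets the streak to zero, so only a test that has demonstrated sustained stability in the unstable pipeline is allowed back into the stable one.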
We have touched on the infrastructure we use, the third-party dependencies our tests rely on, and the tests we run. These are the three sides of the test-run stability triangle; losing balance on any of them introduces flakiness into our continuous delivery pipelines. Taking care of all three makes your pipelines immune to change and gives you an environment that stays consistent no matter how many times a day you run it, building confidence in the team and meeting the goal of release on demand.
Know our Super Writer:
Srinivas Kantipudi is an experienced Software Engineering & Quality Assurance enthusiast with diverse experience driving product quality in the application development space. He has held a wide variety of roles, ranging from Developer, Test Engineer, Head of QA CoE, and Engineering Head to Scrum Master and Product Owner. He is passionate about testing, has designed and implemented several tools and frameworks, is always ready to try out new things, and constantly looks for ways to improve the process.