The CI/CD Pipeline: Why Testing Is Required at Every Stage
Originally published by New Context.
In the modern tech landscape, it is crucial to be able to quickly build and deploy code that you know will work. This is achieved through a continuous integration and continuous delivery (CI/CD) pipeline that properly tests your application before deployment and monitors it after deployment. In this series of articles on CI/CD, we are going to talk about how to test throughout the CI/CD pipeline, cover some CI/CD best practices, and conclude by waxing philosophical about CI/CD and providing some words of encouragement.
Does your code build properly? Does it do what you expect it to do? Testing is how you can programmatically determine the answer to these questions and more. With those answers, you can better automate your processes. If your tests are asking the right questions and you’re getting the right answers, do you need to look at every single result and give your approval so the code can deploy?
In order to truly know that your code is working, you must implement various kinds of testing throughout your deployment pipeline. When testing is a key component of your continuous integration and continuous delivery (CI/CD) implementation, you will be able to deliver with speed and reliability. Once configured, your CI/CD pipeline will automatically build, test, and deploy your code when it is merged into a branch. However, even though your pipeline has the major stages of build, test, and deploy, testing happens throughout the pipeline, not only in the test stage. In this article, we will discuss what testing is available to you, what pipelines look like, and how to test throughout the pipeline.
What Kinds of Testing Are Available Throughout the CI/CD Pipeline?
The types of tests you use, regardless of workload type (e.g., web app, container, backend application), are going to be largely the same. There will be special considerations for each language, platform, and environment. Languages and platforms will use different tooling. Your environments will have different configurations, networks, and secrets. But the fundamental principles will be the same.
Let’s run through some core types of testing:
- Unit Testing: A unit test, as the name implies, tests an individual unit within your code: a function, module, or set of processes. By testing this way, you’ll be able to prove that each portion of the code is functioning correctly.
- Integration Testing: Once you’ve proven that each unit of your code is functioning properly, you can then do integration testing to make sure that the units work properly with one another.
- Performance Testing: This kind of testing is used to determine if your systems will be responsive and stable. There are a number of tests you can perform to determine if your system is performant. Let’s cover a few of them:
  - Load Testing: Can your system handle the expected load? What about higher loads?
  - Stress Testing: Now that your system can handle high loads, let’s push the limits and see what it can really handle.
  - Spike Testing: This type of testing determines if the system can handle large and quick spikes in load.
  - Configuration Testing: What happens to the system if we start messing around with its configuration? How does this affect the performance?
- Security Testing: Security testing looks for flaws in your project’s security by checking for risks and vulnerabilities.
- Chaos Testing: This is an extreme form of testing that is used in production environments. Chaos testing will kill nodes, disable ports, and make other heinous configuration changes. The purpose of this chaos is to test the resiliency of your system and to see if things can self-heal properly. All of this testing is done during business hours so your developers can fix things while they’re at work, not at three in the morning.
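To make the difference between unit and integration testing concrete, here is a minimal sketch in Python. The functions `parse_price`, `apply_discount`, and `discounted_price` are hypothetical and exist only for this illustration. The first two tests each exercise a single unit; the last one checks that the units work together.

```python
def parse_price(text: str) -> float:
    """Hypothetical unit: turn a string such as '$19.99' into a number."""
    return float(text.strip().lstrip("$"))


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def discounted_price(text: str, percent: float) -> float:
    """The two units combined into one piece of behavior."""
    return apply_discount(parse_price(text), percent)


# Unit tests: one function at a time.
def test_parse_price():
    assert parse_price(" $19.99 ") == 19.99


def test_apply_discount():
    assert apply_discount(200.0, 25) == 150.0


# Integration test: the units working with one another.
def test_discounted_price():
    assert discounted_price("$200.00", 25) == 150.0
```

The tests are plain functions containing `assert` statements, so they can be called directly or collected by a test runner such as pytest.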
Testing Throughout the CI/CD Pipeline
You might be asking yourself: What should I be testing for? How and when should I be doing it? Do I need to test every single line of code? We’ve run through some core testing methodologies, so now let’s run through a basic CI/CD pipeline and talk about when we should be applying those tests. First, let’s summarize the basic stages of a CI/CD pipeline and then get more specific:
- Build: Depending on your specific application and environment configuration, your code is compiled, dependencies are pulled in and the application is built into a deployable artifact. Docker images may be built at this time.
- Test: Automated tests are run against your code. These tests are written to ensure not only that your code is written correctly, but that it’s doing what you think it should be doing.
- Deploy: Your code is deployed to your various environments. This can be fully automated. More often, the code is automatically deployed to the lower environments for testing. Then, for production, there may be required approvals, a scheduled deployment, or, if all tests pass, an automated deployment.
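The three stages above can be pictured as an ordered list of steps where a failure stops everything after it. The following is only a sketch: `run_pipeline` and the three placeholder steps are hypothetical names, and a real pipeline would call a compiler, a test runner, and deployment tooling instead of returning `True`.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            print(f"{name}: FAILED, stopping pipeline")
            break
        print(f"{name}: ok")
        completed.append(name)
    return completed


# Placeholder steps (assumptions for illustration only).
def build_step() -> bool:
    return True


def test_step() -> bool:
    return True


def deploy_step() -> bool:
    return True


PIPELINE = [("build", build_step), ("test", test_step), ("deploy", deploy_step)]
```

Calling `run_pipeline(PIPELINE)` runs all three stages. If the test step returned `False`, the deploy step would never be reached, which is the property that lets a pipeline deploy without a person approving every result.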
Your CI/CD pipeline always begins with a version control system. When your code is pushed to the remote repository, it should be automatically built. This could mean building a Docker image, compiling code, or running the code through an interpreter. Even though this stage is going to look different for various types of applications, the core principle remains the same: the successful compilation or interpretation of your code is a test unto itself.
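For an interpreted language, the "build as a test" idea can be as small as checking that the source parses. This sketch uses Python's built-in `compile()` function; the helper name `builds_cleanly` is an assumption made for this example. For whole files and directories, the standard library's `py_compile` and `compileall` modules serve a similar purpose.

```python
def builds_cleanly(source: str, filename: str = "<pipeline>") -> bool:
    """Return True if the source parses as valid Python, False otherwise."""
    try:
        compile(source, filename, "exec")  # parses and compiles, does not run
    except SyntaxError as err:
        print(f"build failed: {err}")
        return False
    return True
```

A check like this only proves the code can be parsed. It says nothing about whether the code does what you expect, which is why the later tests are still needed.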
Beyond that, you need to validate that your code is doing what you think it should be doing. This is where unit testing and integration testing come into play. Spend some serious time here and make sure that the results you’re getting are what you expect. Do you need to test every single line of code? Absolutely not. In fact, treating 100% code coverage as the goal is fundamentally flawed because it works under the assumption that all of your tests are good. We are interested in testing for the right situations, which will probably not be every line of code. Additionally, if your tests are too simple or too general, your functions can change in significant ways and the tests will still pass (remember, 1/1=1, but so does 1*1). Every test needs to be meaningful and serve a purpose.
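The 1/1 versus 1*1 remark can be shown directly. In this sketch, `divide` is the intended function and `divide_broken` is a hypothetical regression where the operator was swapped. Both names, and the two test helpers, are invented for illustration.

```python
def divide(a: float, b: float) -> float:
    return a / b


def divide_broken(a: float, b: float) -> float:
    # A regression: the operator was changed by mistake.
    return a * b


def weak_test(fn) -> bool:
    # 1/1 == 1, but so does 1*1, so this cannot tell the two apart.
    return fn(1, 1) == 1


def meaningful_test(fn) -> bool:
    # Inputs chosen so that division and multiplication disagree.
    return fn(6, 3) == 2 and fn(1, 4) == 0.25
```

Both versions pass `weak_test`, and both tests execute every line of the function under test, so coverage alone would not reveal the problem. Only `meaningful_test` rejects `divide_broken`.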
Although your code is not yet deployed anywhere, there is some performance testing you can be doing now. Check for optimization and efficiency. Perform security testing to make sure that the base code isn’t vulnerable. Run through some configuration testing to make sure that your system is robust.
Do your tests need to vary based on the environment? Probably not, unless there are special or obvious caveats. But this is the exception, not the rule. Any differences between environments should be extracted into an environment configuration set and tested for.
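One way to read "extracted into an environment configuration set and tested for" is to keep each environment's settings in data and check that every environment defines the same keys. The sketch below assumes a simple dictionary layout; the key names, URLs, and the `config_problems` helper are all hypothetical.

```python
REQUIRED_KEYS = {"database_url", "log_level", "replicas"}

# Hypothetical per-environment settings; the values differ, the shape does not.
ENVIRONMENTS = {
    "dev": {"database_url": "postgres://dev.example/db", "log_level": "DEBUG", "replicas": 1},
    "staging": {"database_url": "postgres://stg.example/db", "log_level": "INFO", "replicas": 2},
    "production": {"database_url": "postgres://prd.example/db", "log_level": "WARNING", "replicas": 6},
}


def config_problems(environments: dict, required: set) -> list:
    """Return a human-readable list of missing or unexpected keys."""
    problems = []
    for env in sorted(environments):
        keys = set(environments[env])
        for key in sorted(required - keys):
            problems.append(f"{env}: missing {key}")
        for key in sorted(keys - required):
            problems.append(f"{env}: unexpected {key}")
    return problems
```

With a check like this in the pipeline, the same tests can run against every environment, and a forgotten setting shows up as a failed check rather than a failed deployment.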
Once your code is merged into your main branch, it is ready for deployment to your various environments so that it can be tested, vetted, and ultimately proven worthy of production deployment. In this section we will discuss when to tag, merge, and release code. We will also briefly cover different deployment techniques.
When should you tag and release your code? Like all IT questions, it depends. If you’re using trunk-based development, all of your code will be merged into the main branch. When you are ready for a release, tag the main branch and run that tag through your testing environments before deploying to production. But how often should these releases happen? Again, that depends. For internal services, you can usually release more often. For critical and public systems, a more judicious approach may be warranted, perhaps only releasing on certain days or after certain times. But if your deployment methodology allows it, you can easily roll back to a known safe version which will allow you to deploy more frequently and with more confidence.
There are a variety of deployment methodologies that you will need to consider. Let’s cover some of the more common ones:
- Recreate strategy: Tear down the old and stand up the new. This is an antiquated approach and is listed here for historical reasons and as a cautionary tale. Think about every time you’ve gone to a website that is “Under construction.” Do you ever return? This methodology is dated and is only useful for losing customers.
- Canary deployments: A canary deployment slowly rolls out your release to a small subset of your systems. This way, if there are issues, only a small percentage of your users and systems will be impacted. This methodology has the added benefit of zero downtime.
- Blue/green deployments: With this methodology, you have two production environments which should be as identical as possible. One is live and receives the production traffic. For the purpose of this example, let’s say that blue is live. When you deploy your new release, you deploy to the other environment (in this case, green). Once deployed, you make your router point all of the traffic to the new environment. A big advantage of this approach is that if there are issues, you can easily roll back to the previous environment. And there is zero downtime.
- A/B testing: This is more a form of testing than deployment, but it’s important enough to discuss here, especially having just discussed blue/green deployments. A/B testing is when your current production code is running on Production Environment A and you then deploy your new code to Production Environment B. Unlike blue/green deployments, though, both A and B are live and receive traffic. This is a useful testing methodology because with both systems live, you can track users and usage and gain valuable insight about your changes.
- Feature toggles: Also known as feature flags, this form of development provides a mechanism for developers to toggle features so that they can run, not run, or even be hidden during runtime. This methodology lets developers roll out features to specific subsets of users.
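Exposing a change to only a subset of users, as with feature toggles or a canary-style rollout, is often done by placing each user in a stable bucket. The following is a minimal sketch of that idea, not a description of any particular feature-flag product. The function name `in_rollout` and the flag name `new-checkout` are assumptions.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user inside or outside a percentage rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


def checkout_page(user_id: str, percent: int) -> str:
    # The toggle decides at runtime which code path this user sees.
    if in_rollout("new-checkout", user_id, percent):
        return "new checkout"
    return "old checkout"
```

Because the bucket comes from a hash rather than a random number, a given user gets the same answer on every request. Raising `percent` only adds users and never removes anyone already included, and setting it to 0 acts as an immediate rollback.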
The bulk of your performance testing will happen once your code has been deployed. Make sure that your application can handle the loads you expect. Ensure that it can handle spikes in traffic. If your application auto-scales, is that working? Does it scale back down when the traffic dies down? Really flog the application and see where it breaks. It’s always going to break at some point; you just need to make sure it breaks at a point you’re comfortable with.
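A very small load test can be sketched with the standard library alone. Here `handle_request` is a hypothetical stand-in for the deployed application; in practice the target would make a network call to your service, and a dedicated load-testing tool would usually replace this code. The `load_test` helper and its report fields are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(payload: int) -> int:
    """Stand-in for the system under test (hypothetical)."""
    time.sleep(0.001)
    return payload * 2


def load_test(target, total_requests: int, concurrency: int) -> dict:
    """Call target concurrently and summarize errors and latency."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(i):
        start = time.perf_counter()
        try:
            target(i)
            ok = True
        except Exception:
            ok = False
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": total_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "p95_seconds": latencies[int(0.95 * (len(latencies) - 1))],
        "max_seconds": latencies[-1],
    }
```

Running this with increasing `concurrency` values, or with a sudden jump in `total_requests`, gives a rough picture of load and spike behavior. The point at which `errors` or `p95_seconds` becomes unacceptable is the breaking point you need to be comfortable with.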
Some testing, like chaos testing, happens in production, but not everyone is going to find the same level of success from testing this way. It is worthwhile to spend some time discussing these different strategies and trying them out. Even if you end up scrapping some forms of testing, you will have learned and documented what works and what doesn’t work in your environment.
Ensure that Your CI/CD Pipeline Testing Is Dynamic
Testing throughout the pipeline not only allows you to properly test your code, but can also help you speed up your deployment process. Not all tests have to be run serially. Testing throughout the pipeline will help you parallelize.
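As a rough illustration of running independent tests side by side, the sketch below submits several checks to a thread pool and collects their results by name. The check functions are placeholders that only sleep and return `True`, and the helper name `run_in_parallel` is an assumption. Most CI systems offer their own way to declare parallel jobs, which would normally be used instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(checks: dict) -> dict:
    """Run independent checks concurrently and collect pass/fail by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(check) for name, check in checks.items()}
        return {name: future.result() for name, future in futures.items()}


# Placeholder checks (assumptions for illustration only).
def unit_tests() -> bool:
    time.sleep(0.05)
    return True


def security_scan() -> bool:
    time.sleep(0.05)
    return True


def config_tests() -> bool:
    time.sleep(0.05)
    return True


CHECKS = {"unit": unit_tests, "security": security_scan, "config": config_tests}
```

Calling `run_in_parallel(CHECKS)` takes roughly as long as the slowest check rather than the sum of all three. This only works for checks that do not depend on one another's results; tests that need a deployed environment still have to wait for the deploy stage.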