How can we set ourselves up to implement continuous performance testing? In this article, following the first part on the four biggest test automation misconceptions and how to avoid them, I've laid out a relatively simple pipeline to help frame the conversation about performance testing and DevOps.
What I'm talking about here is performance testing at the unit level.
There are some tools and libraries purpose-built for unit performance testing. The table here is something I gleaned from a research paper out of Charles University in Prague that lists the most popular performance unit test frameworks. Popular is a relative term here — they reviewed almost 100,000 Java-based projects on GitHub and found that only about 3% had some form of performance unit testing. Of those projects, the vast majority (about 72%) used JMH, which is part of OpenJDK.
| Project | Type |
| --- | --- |
| SPL | Performance unit testing research project |
| JMH | Standalone microbenchmarking framework |
| Caliper | Standalone microbenchmarking framework |
| JUnitPerf* | Extension of the JUnit testing framework |
| ContiPerf* | Extension of the JUnit testing framework |
| Japex* | Standalone microbenchmarking framework |

*Project is available, but no longer maintained.
Not every method in every class warrants performance testing — however, there is a benefit to testing performance at the unit level where appropriate. These are typically called “microbenchmarks.” These should be true unit tests where you’ve mocked the dependencies — you’re just running them with multiple threads and/or multiple iterations. Running them as part of the build process can be problematic because you don’t want other things going on in the environment to impact the results. However, frameworks like JMH can help you package your tests and run them on an isolated machine.
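To make this concrete, here's a minimal sketch of what a JMH microbenchmark looks like. The class and the work it measures are illustrative; in practice the benchmark method would call your unit under test, with its dependencies mocked just as in a functional unit test:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 3)       // let the JIT settle before measuring
@Measurement(iterations = 5)  // measured iterations per fork
@Fork(1)
public class OrderTotalBenchmark {

    private double[] lineItemPrices;

    @Setup
    public void setUp() {
        // In a real microbenchmark, this is where you'd wire up the unit
        // under test with mocked dependencies.
        lineItemPrices = new double[1000];
        for (int i = 0; i < lineItemPrices.length; i++) {
            lineItemPrices[i] = i * 0.5;
        }
    }

    @Benchmark
    @Threads(4) // exercise the code from multiple concurrent threads
    public double sumOrderTotal() {
        double total = 0;
        for (double price : lineItemPrices) {
            total += price;
        }
        return total;
    }
}
```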
There are some essential considerations for including performance unit testing in your pipeline. Probably two of the most significant are controlling the length of the tests and how results are presented.
First, detecting small performance fluctuations with high confidence would require the tests to run far too long. We therefore need to accept that these tests will detect only noticeable changes in performance, which should be fine for most applications.
Second, the absolute value of the results is less important than the trend from build to build. Reporting an individual set of results back to developers is much less impactful than showing them how their last commit impacted performance. While the unit performance testing frameworks have assertions, I wouldn’t have your performance unit tests break the build — at least not at first when you’re starting to implement this practice.
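One way to surface trends without breaking the build is to persist each build's benchmark scores (JMH can emit machine-readable results via `-rf csv` or `-rf json`) and compare them against the previous build's. A rough sketch, assuming each input file has been boiled down to simple `benchmark,score` lines and that scores are average times (higher means slower):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Compares per-benchmark scores between the previous and current build and
// logs regressions above a threshold rather than failing the build.
public class BenchmarkTrend {
    private static final double THRESHOLD = 0.10; // warn on >10% slowdown

    public static void main(String[] args) throws Exception {
        Map<String, Double> previous = load(Path.of(args[0]));
        Map<String, Double> current = load(Path.of(args[1]));

        current.forEach((name, score) -> {
            Double before = previous.get(name);
            if (before != null && score > before * (1 + THRESHOLD)) {
                System.out.printf("WARN %s slowed %.1f%% (%.2f -> %.2f)%n",
                        name, (score / before - 1) * 100, before, score);
            }
        });
    }

    private static Map<String, Double> load(Path file) throws Exception {
        Map<String, Double> scores = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split(",");
            scores.put(parts[0], Double.parseDouble(parts[1]));
        }
        return scores;
    }
}
```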
Here’s a short video on the topic:
Moving down the pipeline, we can learn about performance at the integration stage — after we’ve checked in code, unit tests have passed, and now our CI server is deploying a new build into our integration environment. Let’s take a look at what these tests look like.
Integration-stage tests focus on individual components, which should be your most critical components (especially if you're just starting). You can always scale once you've gained experience and worked out the kinks.
We are still early in our pipeline and want fast feedback. These are tests that include individual calls executed by a small number of concurrent threads for a small number of iterations. We don’t need to execute a real-life load at this point.
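To illustrate the scale we're talking about, a fast-feedback component check might look something like this. The endpoint, thread count, and iteration count are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A deliberately small load: a handful of threads and iterations against a
// single call -- enough to spot build-over-build trends, not to simulate
// production traffic.
public class ComponentSmokeLoad {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://integration-env/orders/123")) // placeholder
                .GET().build();

        int threads = 5, iterations = 20;
        List<Long> latencies = Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    long start = System.nanoTime();
                    try {
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                        latencies.add((System.nanoTime() - start) / 1_000_000);
                    } catch (Exception e) {
                        // a real test would count and report errors separately
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);

        latencies.sort(null);
        System.out.println("p50=" + latencies.get(latencies.size() / 2) + "ms, "
                + "p95=" + latencies.get((int) (latencies.size() * 0.95)) + "ms");
    }
}
```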
Depending on the maturity of your pipelines — for example, how well you control environment configuration and data — you may want to limit the complexity of your test setup by mocking dependencies of the component. You should also consider measuring system metrics by extending application performance monitoring (APM) tooling into lower environments.
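For example, with a stubbing tool such as WireMock, a downstream dependency can be replaced by a canned response with a fixed, repeatable latency, so the test isolates the component you actually care about. The port, URL, and payload here are illustrative:

```java
import com.github.tomakehurst.wiremock.WireMockServer;
import static com.github.tomakehurst.wiremock.client.WireMock.*;

// Stands in for a downstream service so the component under test can be
// exercised without its real dependencies (and their variable latency).
public class DependencyStub {
    public static WireMockServer start() {
        WireMockServer server = new WireMockServer(8089);
        server.start();
        server.stubFor(get(urlEqualTo("/inventory/42"))
                .willReturn(aResponse()
                        .withStatus(200)
                        .withHeader("Content-Type", "application/json")
                        .withBody("{\"sku\":42,\"available\":true}")
                        .withFixedDelay(50))); // simulate stable dependency latency
        return server;
    }
}
```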
Similar to performance unit tests, you want to execute these tests on every build that gets deployed here. Also similar to unit performance testing, the results are not necessarily deterministic, and you probably shouldn’t have the tests fail the build. Finally, it’s essential to understand that we’re not looking for absolute results here. As we discussed, we’re not putting a real-life load profile on the system. And this environment likely isn’t anything like production, so what we want to do is look for trends across builds.
Remember, we're not trying to answer questions about how the system will perform in production at this stage. Instead, we're trying to get feedback about the performance of a component so we can identify and address any detected issues as soon as possible.
However, for monolithic architectures, the idea of a component can sometimes get a bit fuzzy. Therefore, the ability to execute nice, neat tests focusing on a specific component can become considerably more difficult. But while you may not be able to accomplish the same level of compartmentalization, some things can still be done.
An excellent place to start is to map user actions to specific areas of interest in the application. This might be a particular process in the application or business-logic layer, or a stored procedure in the database. So rather than thinking about testing from the user perspective, think about areas of the monolith that have had performance issues or, based on your knowledge of the system, are good candidates for performance testing. Then identify the external inputs that exercise those areas. Your test script should include only the minimum steps necessary; avoid traditional end-to-end flows to reduce scripting complexity and maintenance.
For monoliths, the scope of performance testing you can do early in the delivery cycle may be limited. The tests are likely to be a bit longer and have more dependencies. Therefore, the time to maintain them will be longer. In addition, the ability to mock dependencies and data will likely be limited and require more setup time.
At the acceptance stage, it's common to have an “in-line” environment where additional component tests or small sub-systems of components can run, as well as an “out-of-line” environment where more traditional system-level performance tests are performed as necessary.
The in-line environment is likely where you'll be running your automated acceptance and regression tests. I recommend capturing end-user-level performance metrics (page timings, response times, and the like) as part of these tests; this capability can be added to most frameworks fairly easily. The data can be saved as part of your execution results and pulled into a time-series database for analysis if necessary.
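As an example of adding this to an existing framework, browser-side timings can be read through the W3C Navigation Timing API after each page load. A minimal Selenium sketch:

```java
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;

// Reads navigation timings from the browser after a page load so end-user
// performance metrics can be recorded alongside functional test results.
public class PageTimings {
    public static long pageLoadMillis(WebDriver driver) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        // loadEventEnd - navigationStart = full page load time in ms
        Number loadTime = (Number) js.executeScript(
                "var t = window.performance.timing;"
              + "return t.loadEventEnd - t.navigationStart;");
        return loadTime.longValue();
    }
}
```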
Similar to integration tests, you should execute your component-level tests on every deployment. The “out-of-line” system-level performance tests you will execute less frequently, depending on where you are in your release cycle and other events that might warrant this type of test. The key is that if you're evaluating performance throughout the delivery cycle, you can be more selective with the larger, more effort-intensive (and expensive) tests.
It should be noted that all the recommended practices still apply to system-level performance tests; there's no magic. You need to ensure your load profile (number of users and what they are doing, appropriate throughput, etc.) is appropriate, understand the test's data requirements, and prepare accordingly. You should monitor user-, system-, and infrastructure-level metrics. While I've seen organizations execute this level of testing more frequently as part of automated deployment, it's rare: environment provisioning and/or configuration and data setup must be part of the automated process, and the hardest part is ensuring the tests are ready for execution and haven't been impacted by application changes.
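On the throughput point: most load tools will hold a target rate for you, but the underlying mechanism is simple pacing, which is worth understanding when you tune a load profile. A minimal sketch:

```java
// Issues work at a fixed target rate (pacing), independent of how long each
// request takes, which is how a steady throughput profile is held.
public class PacedLoop {
    public static void run(Runnable action, double requestsPerSecond, int total)
            throws InterruptedException {
        long intervalNanos = (long) (1_000_000_000 / requestsPerSecond);
        long next = System.nanoTime();
        for (int i = 0; i < total; i++) {
            action.run();
            next += intervalNanos;
            long sleep = next - System.nanoTime();
            if (sleep > 0) {
                Thread.sleep(sleep / 1_000_000, (int) (sleep % 1_000_000));
            }
        }
    }
}
```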
Here’s a short video on the topic:
Before we talk about testing in production, I want to look at a conceptual overview of the process and representative technologies that can be used to put this all together. I’ve listed some tools — but they are just examples, not necessarily recommendations.
Overview of Continuous Performance Testing implementation
I don’t recommend that you design something similar to what I’ve just presented and start implementing. How continuous performance testing is ideally implemented (or more appropriately, will evolve) in one organization is likely not appropriate for another. Rather, I recommend you start small — running some component-level tests against a few of your more critical components is a good place to start. Get comfortable with the process and the tools it takes to execute tests as part of your CI process and get the information to the team in a consumable and actionable state. Work out the kinks and then scale to other components, other teams, and eventually to other stages of the delivery cycle.
Again, this shouldn't be “off the books” work: make implementing continuous performance testing part of your technical backlog so it can be prioritized along with the rest of your work. From experience, this type of work just doesn't get done if it's something the team collectively keeps in their heads waiting for some “free time.”
Part of getting comfortable with the process is not overwhelming your team (another reason to start small). This is information that the team likely has not received before (at least not with any regularity). You need to prepare them so they are ready to consume and understand the data that is output from the testing. And more importantly, the team needs to be prepared to act on it. Much of the value of performance testing is in the analysis! You can’t assume that your teams possess the ability to understand the data you’re giving them and take action on it. This will be a learning process.
Telling a developer that the response time for a specific call to an endpoint is trending higher is likely not enough for them to address the issue. You will need to give them the right data to help them correlate response time metrics (which are symptoms) with underlying application and infrastructure metrics to begin identifying the root cause of the issue.
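One practical way to support that correlation is to tag every published data point with the build (or commit), so response-time trends line up with the application and infrastructure metrics captured for the same build. A sketch, assuming an InfluxDB-style time-series endpoint; the host, database, and tag names are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Publishes a response-time data point tagged with the build and endpoint so
// it can be correlated with APM metrics collected for the same build.
public class MetricPublisher {
    private static final String INFLUX_WRITE =
            "http://metrics-host:8086/write?db=perf"; // placeholder

    public static void publish(String endpoint, String build, long millis)
            throws Exception {
        // InfluxDB line protocol: measurement,tags field
        String line = String.format(
                "response_time,endpoint=%s,build=%s value=%d", endpoint, build, millis);
        HttpRequest request = HttpRequest.newBuilder(URI.create(INFLUX_WRITE))
                .POST(HttpRequest.BodyPublishers.ofString(line))
                .build();
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```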
Also, successful performance testing isn't just about implementing tools: you need to ensure you've taken care of the supporting factors. Remember, we want fast feedback and don't have unlimited time to execute tests, so you need to think critically about which tests are important to include in your pipeline. And if these tests will be executed as part of continuous integration, they likely need to run on demand, which requires a certain level of control over your test environments and data.
Finally, this won’t be a fire-and-forget solution. As you implement continuous performance testing, you should continuously get feedback from the team and improve the solution to add the appropriate value to the delivery cycle. As the ability to understand the performance data and take action on it evolves, the team can provide valuable feedback on what information they get and how they get it.
And, of course, it shouldn't be done in a vacuum: everything you're doing should be out in plain sight, so that the work can be made part of the technical backlog and continuous improvement efforts.
Here’s a short video on the topic:
Most of us are monitoring our applications in production (or at least we should be). However, testing in production, while once taboo, is fairly common today and an important part of a continuous performance strategy.
From a testing perspective, the obvious benefit is confirming that your application works as expected in production. Testing in production also allows us to test the entire system, not just what's behind the firewall. And with the wide availability of cloud-based load tools, we can evaluate the user experience from various locations and network conditions.
We can also combine testing with monitoring to give us the specific data we want by executing synthetic transactions in production. Similarly, we should create a continuous feedback loop between upstream testing and production monitoring to improve both test design and monitoring configuration.
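A synthetic transaction can be as simple as a scheduled job that executes one critical call against production, records latency and status, and marks its traffic so it can be filtered out of real-user analytics. A minimal sketch; the URL and the marker header are hypothetical and would need to be agreed with whoever consumes production metrics:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Runs one critical transaction against production on a schedule, recording
// latency and status. The custom header marks the traffic as synthetic so it
// can be excluded from real-user metrics and business reports.
public class SyntheticCheck {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://www.example.com/checkout/health")) // placeholder
                .header("X-Synthetic-Check", "true") // hypothetical marker header
                .GET().build();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            long start = System.nanoTime();
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("status=%d latency=%dms%n",
                        response.statusCode(), millis);
            } catch (Exception e) {
                System.out.println("synthetic check failed: " + e.getMessage());
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```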
Of course, testing in production isn't as easy as pointing your tests at the prod environment. There are some things we need to do to maximize the benefit and minimize the risk: for example, marking synthetic traffic so it can be filtered from real-user data, isolating test data from real customer data, and being able to stop a test quickly if it starts to affect real users.
Continuous performance testing will provide a continuum of performance-related data as you build your product. Most importantly, it allows you to identify performance issues as early as possible, when you still have time to address them.