Performance testing at every development stage: What I learned

How can we set ourselves up to implement Continuous Performance Testing? In this article, which follows the first part on the four biggest test automation misconceptions and how to avoid them, I’ve laid out a relatively short pipeline to help with the conversation about performance testing and DevOps.

The traditional approach of running performance tests infrequently means that, by the time a problem shows up, many changes have accumulated and there are many possible culprits to investigate. This article goes through each development stage where you can implement performance tests for your software product:

Commit Stage. There may be value in learning about performance at the commit stage depending on the context of the application we're testing.
Integration Stage. Here we are focused on individual components or smaller groups of components, using tests that can execute quickly and have few dependencies (or have their dependencies mocked).
Acceptance Stage. This is where we will likely execute something that looks more like a traditional, system-level performance test.
Production. While once taboo, performance testing in production provides valuable insights.

Performance at the Commit Stage

What I'm talking about here is performance testing at the unit level.

There are some tools and libraries purpose-built for unit performance testing. The table here is something I gleaned from a research paper out of Charles University in Prague that lists the most popular performance unit test frameworks. Popular is a relative term here — they reviewed almost 100,000 Java-based projects on GitHub and found that only about 3% had some form of performance unit testing. Of those projects, the vast majority (about 72%) used JMH, which is part of OpenJDK.

Project	Type
SPL	Performance unit testing research project
JMH	Standalone microbenchmarking framework
Caliper	Standalone microbenchmarking framework
JUnitPerf*	Extension of JUnit testing framework
ContiPerf*	Extension of JUnit testing framework
Japex*	Standalone microbenchmarking framework

*project is available, but no longer maintained

Not every method in every class warrants performance testing — however, there is a benefit to testing performance at the unit level where appropriate. These are typically called “microbenchmarks.” These should be true unit tests where you’ve mocked the dependencies — you’re just running them with multiple threads and/or multiple iterations. Running them as part of the build process can be problematic because you don’t want other things going on in the environment to impact the results. However, frameworks like JMH can help you package your tests and run them on an isolated machine.
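JMH is Java-specific, but the shape of a microbenchmark is the same in any language. Here is a minimal sketch in Python using only the standard library. The PriceService class, its rate_client dependency, and the iteration counts are all made up for illustration; the point is that the dependency is mocked and the unit is simply called many times.

```python
import statistics
import timeit
from unittest import mock


class PriceService:
    """Hypothetical unit under test; rate_client is an external dependency."""

    def __init__(self, rate_client):
        self.rate_client = rate_client

    def total(self, amounts, currency):
        rate = self.rate_client.get_rate(currency)
        return sum(amount * rate for amount in amounts)


def benchmark(func, iterations=1000, rounds=5):
    """Return the average per-call time in seconds for each round."""
    totals = timeit.repeat(func, number=iterations, repeat=rounds)
    return [total / iterations for total in totals]


# Mock the dependency so only the unit's own code is measured.
rate_client = mock.Mock()
rate_client.get_rate.return_value = 1.25
service = PriceService(rate_client)
amounts = list(range(100))

timings = benchmark(lambda: service.total(amounts, "EUR"))
print(f"best={min(timings):.2e}s median={statistics.median(timings):.2e}s")
```

Reporting the best and median of several rounds, rather than a single run, is one simple way to reduce the influence of whatever else is happening on the machine.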

There are some essential considerations for including performance unit testing in your pipeline. Probably two of the most significant are controlling the length of the tests and how results are presented.

First, detecting small performance fluctuations with a high level of confidence would require more data, and therefore longer test runs, than a build pipeline can afford. We need to accept that the tests will detect only noticeable changes in performance, which should be fine for most applications.

Second, the absolute value of the results is less important than the trend from build to build. Reporting an individual set of results back to developers is much less impactful than showing them how their last commit impacted performance. While the unit performance testing frameworks have assertions, I wouldn’t have your performance unit tests break the build — at least not at first when you’re starting to implement this practice.
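To make the trend idea concrete, here is a small sketch that compares the current build's timing with the median of recent builds and reports the change instead of failing the build. The function name, the 10% tolerance, and the sample numbers are assumptions for illustration, not values from any particular framework.

```python
import statistics


def compare_to_baseline(history, current, tolerance=0.10):
    """Compare the current build's timing with the median of recent builds.

    history: timings (for example, median ms) from previous builds.
    current: timing from the build under test.
    tolerance: relative change treated as noise (an assumed 10%).
    """
    if not history:
        return {"status": "no-baseline", "change": None, "baseline": None}
    baseline = statistics.median(history)
    change = (current - baseline) / baseline
    if change > tolerance:
        status = "regressed"
    elif change < -tolerance:
        status = "improved"
    else:
        status = "stable"
    return {"status": status, "change": change, "baseline": baseline}


# Illustrative numbers only.
previous_builds = [102.0, 98.0, 101.0, 99.0, 100.0]
result = compare_to_baseline(previous_builds, current=118.0)

# Report the trend to the team; do not break the build on it.
print(f"{result['status']}: {result['change']:+.1%} vs baseline {result['baseline']:.1f} ms")
```

Using the median of several previous builds as the baseline keeps one unusually slow or fast build from skewing the comparison.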

Performance at the Integration Stage

Moving down the pipeline, we can learn about performance at the integration stage — after we’ve checked in code, unit tests have passed, and now our CI server is deploying a new build into our integration environment. Let’s take a look at what these tests look like.

Integration stage tests focus on individual components, which should be your most critical components (especially if you’re just starting). You can always scale once you’ve gained experience and worked out the kinks.

We are still early in our pipeline and want fast feedback. These are tests that include individual calls executed by a small number of concurrent threads for a small number of iterations. We don’t need to execute a real-life load at this point.
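A test like this can be very small. The sketch below uses a thread pool to make a fixed number of calls with a handful of concurrent threads and then summarizes the latencies. The call_component function is a stand-in for one request to your component, and the thread and iteration counts are arbitrary assumptions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(func):
    """Run func once and return how long it took in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def run_component_test(func, threads=4, iterations=25):
    """Make threads * iterations calls, at most `threads` at a time.

    Returns the latencies in seconds, sorted ascending.
    """
    total_calls = threads * iterations
    with ThreadPoolExecutor(max_workers=threads) as pool:
        latencies = list(pool.map(timed_call, [func] * total_calls))
    return sorted(latencies)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def call_component():
    # Stand-in for a single request to the component under test (hypothetical).
    time.sleep(0.002)


latencies = run_component_test(call_component, threads=4, iterations=25)
print(f"calls={len(latencies)} "
      f"p50={percentile(latencies, 50) * 1000:.1f}ms "
      f"p95={percentile(latencies, 95) * 1000:.1f}ms")
```

In a real pipeline you would more likely drive this with your load tool of choice; the sketch only shows how little load is needed at this stage.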

Depending on the maturity of your pipelines — for example, how well you control environment configuration and data — you may want to limit the complexity of your test setup by mocking dependencies of the component. You should also consider measuring system metrics by extending application performance monitoring (APM) tooling into lower environments.

Similar to performance unit tests, you want to execute these tests on every build that gets deployed here. Also similar to unit performance testing, the results are not necessarily deterministic, and you probably shouldn’t have the tests fail the build. Finally, it’s essential to understand that we’re not looking for absolute results here. As we discussed, we’re not putting a real-life load profile on the system. And this environment likely isn’t anything like production, so what we want to do is look for trends across builds.

Remember, we’re not trying to answer questions about how the system will perform in production at this stage. Instead, we’re trying to get feedback about the performance of a component so we can identify and address any detected issues as soon as possible.

Testing Monoliths vs. Microservices

Executing performance testing at the unit and component level is much easier said than done for many applications. For microservice-based architectures, it’s more intuitive (certainly at the component level) to design tests. Each component is likely relatively focused in its scope, and identifying its dependencies to understand mocking or service virtualization requirements is fairly straightforward. Microservices also lend themselves nicely to short, focused tests that can be run efficiently upstream in the delivery cycle.

However, for monolithic architectures, the idea of a component can sometimes get a bit fuzzy. Therefore, the ability to execute nice, neat tests focusing on a specific component can become considerably more difficult. But while you may not be able to accomplish the same level of compartmentalization, some things can still be done.

An excellent place to start is to map user actions to specific areas of interest in the application. This might be a particular process in the application or business logic layer or stored procedure in the database. So rather than thinking about testing from the user perspective, think about areas of the monolith that have performance issues or, based on your knowledge of the system, are good candidates for performance testing. Then identify external inputs that exercise those areas. Your test script should only include the minimum steps necessary — i.e., avoid the traditional end-to-end flows to reduce scripting complexity and maintenance.

For monoliths, the scope of performance testing you can do early in the delivery cycle may be limited. The tests are likely to be a bit longer and have more dependencies. Therefore, the time to maintain them will be longer. In addition, the ability to mock dependencies and data will likely be limited and require more setup time.

Performance at the Acceptance Stage

At the acceptance stage, it’s common to have an “in-line” environment where additional component tests, or tests of small sub-systems of components, can run. It’s also common to have an “out of line” environment where more traditional system-level performance tests are performed as necessary.

For the in-line environment, this is likely where you’ll be running your automated acceptance and regression tests. I recommend capturing end-user-level performance metrics as part of these tests. This capability can be added to most frameworks fairly easily. Typically we’ll capture the following:

Response times
User/browser level performance metrics

The data can be saved as part of your execution results and sucked into a time series database for analysis if necessary.
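As a rough sketch of what saving the data might look like, the snippet below bundles one test's metrics into a flat, time-stamped record and appends it to a file as a JSON line. The field names, the test and build identifiers, and the JSON-lines file are all assumptions; the right format depends on which time series database you feed.

```python
import json
import time


def build_record(test_name, build_id, response_ms, browser_metrics, timestamp=None):
    """Bundle one test's performance data into a flat, time-stamped record."""
    record = {
        "timestamp": timestamp if timestamp is not None else time.time(),
        "test": test_name,
        "build": build_id,
        "response_ms": response_ms,
    }
    # Prefix browser-level metrics so they stay distinguishable once flattened.
    record.update({f"browser_{key}": value for key, value in browser_metrics.items()})
    return record


def append_record(path, record):
    """Append the record as one JSON line; a separate job can ship the file onward."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


# Illustrative values only.
record = build_record(
    test_name="checkout_flow",
    build_id="build-1042",
    response_ms=412.5,
    browser_metrics={"dom_content_loaded_ms": 830, "load_event_ms": 1240},
    timestamp=1700000000,
)
print(json.dumps(record, sort_keys=True))
```

Tagging every record with the build identifier is what later makes it possible to trend results across builds.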

Similar to integration tests, you should execute your component-level tests on every deployment. For the “out of line” system-level performance tests, you will execute less frequently depending on where you are in your release cycle and other events that might warrant this type of test. The key is that if you’re evaluating performance throughout the delivery cycle, you can be more selective with the larger, more effort-intensive (and expensive) tests.

It should be noted that all the recommended practices still apply for the system-level performance tests; there’s no magic. You need to ensure your load profile (number of users and what they are doing, target throughput, etc.) is appropriate, understand the test’s data requirements, and prepare accordingly. You should monitor user-level, system-level, and infrastructure-level metrics. While I’ve seen organizations execute this level of testing more frequently as part of automated deployment, it’s rare. Environment provisioning and/or configuration and data setup must be part of the automated process. The hardest part is ensuring the tests are ready for execution and haven’t been impacted by application changes.
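One common way to sanity-check a load profile is Little's Law, which relates the number of concurrent users to throughput, response time, and think time. The sketch below applies it; the function names and the example numbers (50 requests per second, 0.5 s response time, 4.5 s think time) are purely illustrative assumptions.

```python
def required_virtual_users(throughput_per_sec, response_time_s, think_time_s):
    """Little's Law: concurrent users = throughput * (response time + think time)."""
    return throughput_per_sec * (response_time_s + think_time_s)


def pacing_interval_s(virtual_users, throughput_per_sec):
    """Seconds between iteration starts per user needed to hit the target throughput."""
    return virtual_users / throughput_per_sec


# Illustrative target, not taken from a real system.
users = required_virtual_users(throughput_per_sec=50, response_time_s=0.5, think_time_s=4.5)
pacing = pacing_interval_s(users, throughput_per_sec=50)
print(f"virtual users: {users:.0f}, pacing per user: {pacing:.1f}s")
```

If the number of users you planned to simulate is far from what this estimate gives for your target throughput, that is a hint the profile deserves a second look.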

Putting It All Together

Before we talk about testing in production, I want to look at a conceptual overview of the process and representative technologies that can be used to put this all together. I’ve listed some tools — but they are just examples, not necessarily recommendations.

Overview of Continuous Performance Testing implementation

System that is being tested (including the environment it exists in).
Performance / load test tooling to generate load and capture user level metrics.
Application monitoring / observability tooling to capture application and system level metrics.
Results collection for trending results across builds.
Dashboard and alerting mechanism for broadcasting results.
Including performance testing as part of the continuous integration process.

You might be thinking that this seems like a lot of work and wondering how you can carve out time for it with everything else in your backlog. And you’re right: this isn’t a trivial amount of work. However, you don’t need to build it overnight or even in one fell swoop. An excellent place to start is executing a few simple component tests automatically as part of your build deployment. Once you work the kinks out of that, you can start to expand your solution.

And speaking of the backlog: this work should be part of your technical backlog, so it becomes part of your efforts towards continuous improvement. If this work is “off the books,” it will likely never be done. It’s best to manage it the same way you do your continuous improvement efforts.

Implementation Approach

I don’t recommend that you design something similar to what I’ve just presented and start implementing. How continuous performance testing is ideally implemented (or more appropriately, will evolve) in one organization is likely not appropriate for another. Rather, I recommend you start small — running some component-level tests against a few of your more critical components is a good place to start. Get comfortable with the process and the tools it takes to execute tests as part of your CI process and get the information to the team in a consumable and actionable state. Work out the kinks and then scale to other components, other teams, and eventually to other stages of the delivery cycle.

Again, this shouldn’t be “off the books” work. Make implementing continuous performance testing part of your technical backlog so it can be prioritized along with the rest of your work. From experience, this type of work just doesn’t get done if it’s something the team collectively keeps in their heads while waiting for some “free time.”

Part of getting comfortable with the process is not overwhelming your team (another reason to start small). This is information that the team likely has not received before (at least not with any regularity). You need to prepare them so they are ready to consume and understand the data that is output from the testing. And more importantly, the team needs to be prepared to act on it. Much of the value of performance testing is in the analysis! You can’t assume that your teams possess the ability to understand the data you’re giving them and take action on it. This will be a learning process.

Telling a developer that the response time for a specific call to an endpoint is trending higher is likely not enough for them to address the issue. You will need to give them the right data to help them correlate response time metrics (which are symptoms) with underlying application and infrastructure metrics to begin identifying the root cause of the issue.
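As a simple illustration of that correlation step, the sketch below ranks a few underlying metrics by how strongly they move with response time across builds. The metric names and sample values are invented, and a high correlation is only a pointer for where to look first, not proof of the root cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot explain variation in the other
    return cov / math.sqrt(var_x * var_y)


def rank_suspects(response_ms, metrics):
    """Order metric names by how strongly they move with response time."""
    scores = {name: pearson(response_ms, series) for name, series in metrics.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Illustrative per-build samples, not real measurements.
response_ms = [120, 125, 150, 180, 240]
metrics = {
    "db_query_ms": [40, 44, 68, 95, 150],
    "cpu_percent": [35, 38, 33, 36, 34],
    "heap_used_mb": [512, 512, 512, 512, 512],
}
for name, score in rank_suspects(response_ms, metrics):
    print(f"{name}: r={score:+.2f}")
```

In practice your APM or observability tooling may already offer this kind of view; the sketch only shows the idea of putting the symptom next to candidate causes.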

Also, successful performance testing activities aren’t just implementing tools – you need to ensure you’ve taken care of supporting factors. Remember, we want fast feedback and don’t have unlimited time to execute tests – you need to think critically about which tests are important to include in your pipeline. And if these tests will be executed as part of the continuous integration, they likely need to be run on demand. This will require you have a certain level of control from a test environment and data perspective.

Finally, this won’t be a fire-and-forget solution. As you implement continuous performance testing, you should continuously get feedback from the team and improve the solution to add the appropriate value to the delivery cycle. As the ability to understand the performance data and take action on it evolves, the team can provide valuable feedback on what information they get and how they get it.

And, of course, it shouldn’t be done in a vacuum. Everything you’re doing should be out in plain sight, so that you can make the work part of the technical backlog and continuous improvement efforts.

Performance in Production

Most of us are monitoring our applications in production (or at least we should be). Testing in production, while once taboo, is also fairly common today and an important part of a continuous performance strategy.

From a testing perspective, the obvious benefit is the confirmation that your application works as expected in production. Testing in production also allows us to test the entire system, not just what’s behind the firewall. With the wide availability of cloud-based load tools, we can evaluate the user experience from various locations and under various network conditions.

We can also combine testing with monitoring to give us the specific data we want by executing synthetic transactions in production. Similarly, we should create a continuous feedback loop between upstream testing and production monitoring to improve both test design and monitoring configuration.

Of course, testing in production isn’t as easy as pointing your tests to the prod environment. There are some things we need to do to maximize the benefit and minimize the risk:

One mistake I see, especially from the performance testing side, is creating the strategy in a vacuum. You should solicit input from across the organization, so you can put together a comprehensive plan that adds value.
If your app is hosted in 3rd party environments, you need to coordinate with them – if you don’t, you risk making them unhappy or being blocked because they think it’s a denial of service attack.
Of course, if your app is already live, you have customers. If possible, test in windows that minimize the impact. If that’s not possible, you can cordon off part of the environment to handle customer traffic while the test is run on the remaining environment.
The system should know it’s being tested. After all, you’re in production and could screw up the business depending on what you’re doing. Whether using custom headers, seeding test data in production, or some other means, you need the system to understand that it’s dealing with synthetic transactions. This is another reason to have broad IT input – if the system is architected from the beginning, or adapted over time, to be tested in production, it becomes a lot easier.
Finally, it’s a good idea to eliminate or mock requests to 3rd party services so you don’t unintentionally impact their other customers as part of your testing efforts.
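The custom-header approach mentioned above can be sketched as follows. The header name X-Synthetic-Test, the handle_order function, and the fulfilment callback are hypothetical; the idea is simply that the load tool tags its requests and the application skips real-world side effects for tagged traffic.

```python
SYNTHETIC_HEADER = "X-Synthetic-Test"  # hypothetical header name


def synthetic_headers(run_id):
    """Headers the load tool attaches to every request it generates."""
    return {SYNTHETIC_HEADER: run_id, "User-Agent": "perf-test-client"}


def is_synthetic(headers):
    """Case-insensitive check, since HTTP header names are not case-sensitive."""
    wanted = SYNTHETIC_HEADER.lower()
    return any(name.lower() == wanted for name in headers)


def handle_order(headers, order, send_to_fulfilment):
    """Hypothetical handler: accept the order, but skip side effects for test traffic."""
    synthetic = is_synthetic(headers)
    if not synthetic:
        send_to_fulfilment(order)
    return {"accepted": True, "synthetic": synthetic}


# Usage: a synthetic order is accepted but never reaches fulfilment.
shipped = []
response = handle_order(synthetic_headers("run-001"), {"sku": "ABC", "qty": 1}, shipped.append)
print(response, "orders shipped:", len(shipped))
```

Keeping the check in one small function makes it easier to apply the same rule consistently wherever side effects occur, such as payments, emails, or analytics.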

Key Takeaways

Continuous performance testing will provide a continuum of performance related data as you build your product. Most importantly, it allows you to identify performance issues as early as possible – when you still have time to address them.

Traditional, system-level performance testing does not fit well within agile / iterative delivery processes
Continuous performance testing is not a one-size-fits-all solution – it should be part of the way YOUR organization delivers
Start small and get feedback from the delivery teams
Continuously improve!