Soak testing new services on the JVM – the why and the how

As a software engineer responsible for building new backend (micro) services, it’s not sufficient to test only that they meet functional requirements. There are additional technical tests, focusing on the quality of the software, that must also be completed to ensure that your new service/application is production-ready. Whilst such tests require additional effort and can push out the date for getting your new service into production, they’re essential if you want to be confident of how your service will operate in production, and avoid any surprises that could negatively impact your business. The main types of tests I’m referring to are load testing (itself a type of performance testing) and soak testing. I’ve covered the subject of load testing (specifically measuring throughput under load) in a previous blog post. In this blog post I cover soak testing: the aims of this type of test, and how to soak test any service that runs on the Java Virtual Machine (JVM), regardless of the language it’s written in, including services that run in a container (e.g. Docker).

1) The Aim and Scope of Soak Testing

The aim of a soak test is to gain confidence that your software is capable of running continuously for an indefinite period, when fully exercised, end-to-end, under typical production load, without failing, or without there being any degradation in its performance.

The last thing you want is to deploy a new service into production and find it eventually crashes, or slows down after a while, requiring it to be monitored (babysat) or restarted regularly. If you don’t soak test you’re gambling on the reliability of your service in production, and if you lose this bet it may reduce the availability and throughput of your service, which could ultimately impact your business.

Load testing focuses on establishing how an application’s performance (e.g. that of a service’s web APIs), in terms of throughput and response times, varies under increasing load up to a required max. In contrast, a soak test focuses on establishing the long-term reliability of your application under typical load.

Soak testing tests your fully deployed application, end-to-end. It should exercise both your application code and its interaction with the embedded third-party software (libraries) on which your application is built, in a production-like runtime environment. For example, if you’re building a Java application, you’d focus on testing the Java process running on the chosen version of your JRE (e.g. OpenJDK JRE for Java version X) and everything that runs in it, including the 3rd party application framework (e.g. Spring core, Spring MVC, Spring Boot), the embedded web server/container (e.g. Tomcat or Jetty), your application service code, and the many other packaged 3rd party Java libraries on which your app depends (e.g. Apache HTTP client, Jackson, Guava, etc).

A soak test should also encompass the service’s full production runtime. For containerised apps this extends to the Docker container that is launched from the app’s released Docker image, running on your chosen version of Docker host/engine. (The focus here is on soak testing the specifics of your Docker container/image running in Docker, rather than the Docker software itself.)

Testing the first release of a new service is particularly important, as the reliability of the service is as yet unproven. This is especially true if you’ve built the service using third party software (including versions) that you’ve never previously tested in conjunction with any other service.

In the past, for enterprise apps, it was common to hear the general advice that you should test to ensure that your app is capable of running continuously, and meeting its performance requirements, for a minimum of a working week. The assumption was that if a problem impacted the reliability or performance of your service after it had been running for longer, there would always be some point during the week or at the weekend when demand was low enough that you could get away with bouncing one of your nodes. This situation would definitely not be ideal, and it also assumes your service isn’t running at full capacity 24 x 7. Personally, I’m fairly pragmatic about the time and effort I spend soak testing a new service, especially given today’s pressure to deliver new features faster, but I do still typically plan to run a soak test of a new service for a minimum of 7 days.

2) Architecture of a Soak Test for Services Running on a JVM

The architecture of a soak test is similar to that of a load test, even though, as described above, the two types of test have different aims. The diagram below provides an overview of the architecture / design of a soak test for a service that runs on the Java Virtual Machine (JVM), in terms of the required nodes, component software / tools and protocols.

2.1) Service Node

A soak test should be targeted at a single instance of your back-end service, fully deployed in a production-like (if not production-scale) environment. This is shown in the box on the right hand side of the diagram labelled “Service node/instance”. In this example, the service is deployed as a Docker image running in a Docker container. The service has a number of external web APIs which are exercised by the test. The diagram also shows the JVM’s JMX server exposing a number of management APIs; more on this below.

2.2) Test Client Node

Soak testing a back-end service via its external web APIs entails running tests that continually make requests to one or more of the service’s APIs, simulating an API client. As shown in the box in the top left of the diagram labelled “soak test client node”, your tests should be deployed to and run from a separate node to that of the service under test, so that the test client can be scaled and loaded independently. (This node is commonly referred to as an ‘injector’ node.)

You’ll want to automate your soak tests for repeatability and to support attaining the required load on each service API. To make your life easier, use a tool designed for the purpose. There are many proven, popular load testing tools that are also well suited to soak testing. (Personally, I use JMeter as I’m familiar and productive with it, having used it for a number of years.)

2.3) Resource Monitoring & Metrics Node

A third node is required to support monitoring and collecting data/metrics on the resource (CPU, memory, I/O) usage of your service on the service node, and how this varies over the duration of the soak test. This is the node labelled “Resource usage & collection node(s)” in the bottom left of the diagram.

Monitoring your application’s resource usage and how it varies over time is a major focus of a soak test, in order to detect common causes of instability and performance degradation such as memory leaks. One of the most common examples of memory leaks in Java apps occurs in the JVM’s Metaspace memory space (which replaced PermGen) when classes and classloaders cannot be unloaded or destroyed due to references to resources not having been closed. (Traditionally this was often seen when re-deploying WAR files multiple times to a JVM running a centralised app server. This happens less now that deploying services standalone in their own JVM has become more popular.)

When soak testing a service running on a JVM that is itself running in a Docker container, there are at least 3 main sources from which resource usage data can be collected and monitored:

Node and process resource usage metrics – Querying and monitoring the usage of resources for the whole node and the application (java) process. This data can be collected using operating system (e.g. Linux) command line tools on the server node.

Container resource usage metrics – The Docker CLI provides a command that allows the resource usage of each container to be monitored. This data also needs to be collected from a shell on the server node.

JVM resource usage metrics – The JVM provides APIs that allow a Java process’ resource usage to be monitored, including the current and allocated sizes of the Java memory spaces (Metaspace and Heap) and CPU usage, as well as other relevant metrics such as total classes loaded and unloaded, and total threads started. These APIs are exposed as JMX MBeans by the JVM’s JMX server. The JVM can be configured (at launch time, using JVM system properties) to allow these APIs to be accessed remotely using a JMX client running on the resource collection node. VisualVM or Oracle’s Mission Control are commonly used as JMX clients. They support graphing of JVM resource usage and also generating and saving snapshots which can be analysed later.
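As an illustration of the launch-time configuration mentioned above, here is a minimal sketch that assembles the standard JMX remote system properties into a list of JVM arguments. The function name, host and port are hypothetical. Pinning the RMI port to the same value as the JMX port is an assumption that tends to be convenient with Docker, since only one port then needs publishing. Switching off authentication and SSL is only reasonable on an isolated test network.

```python
def jmx_remote_flags(host, port, secure=True):
    """Build JVM arguments that enable remote JMX access (illustrative helper).

    host and port are placeholders for the address the JMX client will use.
    """
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        # Assumption: reuse the same port for RMI so one published port suffices.
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",
        f"-Djava.rmi.server.hostname={host}",
    ]
    if not secure:
        # Only for isolated test environments: no password file, no TLS.
        flags += [
            "-Dcom.sun.management.jmxremote.authenticate=false",
            "-Dcom.sun.management.jmxremote.ssl=false",
        ]
    return flags


if __name__ == "__main__":
    print(" ".join(jmx_remote_flags("service-node.example", 9010, secure=False)))
```

With `secure=True` the JVM keeps its defaults, which expect a password file and SSL configuration to be supplied separately.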

Further details of how these measurements are taken are provided in the following section.

3) Outline Soak Testing Process

This section outlines a suggested set of steps for executing a cycle of a soak test that is architected as described above, including details of what measurements to take and how these are obtained (the tools and commands used).

3.1) Pre-test Setup & Measurements

Before launching the soak tests on the client/injector node, a number of preliminary steps should be followed to set up and measure resource usage.

1) Connect JMX client – JVM resource usage (primarily the heap and metaspace memory spaces) is one of the sets of resource usage data collected throughout the test. This data is collected from the JVM’s JMX server using a JMX client. Before starting the test, ensure you can connect your JMX client to the service’s remote JVM.

2) Record Available Resources Prior to Launching app – Before starting the app / service, take a one-off set of measurements of the available resources (primarily total and available RAM) and how they’re being used by existing processes (those on the docker host). Take the same measurements as recorded during the test as described in the “Ongoing Monitoring & Measurements” section below.

3) Record Available Resources when Application at rest – After starting the app, but before starting the test, repeat recording of the available resources as per the previous step. This will provide a baseline of the (minimum) resources required by the app before it is placed under load by the soak test.

3.2) Ongoing Monitoring & Measurements

Given the soak test of the service is automated, once it is running the testing process is limited to checking that the test is continuing to run without error (API calls made by the test plan continue to return success responses) and collecting resource usage measurements.

I suggest taking measurements at least twice a day, at variable times. The steps are described in more detail below.

1) Confirm that the test is still running without error – Log into the node on which the tests were launched and check they’re still running, e.g. by tailing the test’s log file.

If you implement your tests using JMeter, include a ‘Generate Summary Results’ component. This will regularly log the total and the increase/delta in the number of requests and % of errors (as well as other summary stats such as the total duration of the test, and min, max and average response times) at a configurable interval (JMeter’s summariser.interval property; 30 seconds in the example below), e.g.

2018/06/18 17:49:30 INFO  - jmeter.reporters.Summariser: summary +      7 in 30s = 0.2/s Avg: 547 Min: 4 Max:  1112 Err: 0 (0.00%) Active: 3 Started: 3 Finished: 0
2018/06/18 17:49:30 INFO  - jmeter.reporters.Summariser: summary =  74511 in 357795s = 0.2/s Avg: 491 Min:     3 Max: 10011 Err:     8 (0.01%)
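If you’d rather not eyeball the log, the running totals can be pulled out of the cumulative (“summary =”) line with a small script. The following is a sketch that assumes the log format shown above; the function and pattern names are my own, and other JMeter versions may format the elapsed time differently.

```python
import re

# Matches the cumulative ("summary =") line in the format shown in the log above.
SUMMARY_TOTAL = re.compile(
    r"summary =\s*(?P<samples>\d+)\s+in\s+(?P<elapsed>\S+)\s+=.*?"
    r"Err:\s*(?P<errors>\d+)\s+\((?P<pct>[\d.]+)%\)"
)


def parse_cumulative_summary(line):
    """Return (samples, errors, error_pct) for a 'summary =' line, else None."""
    match = SUMMARY_TOTAL.search(line)
    if match is None:
        return None  # e.g. a delta ("summary +") line or an unrelated log line
    return int(match["samples"]), int(match["errors"]), float(match["pct"])
```

Applying this to the last “summary =” line in the log each time you check in gives you the totals to record alongside the resource measurements.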

2) Execute Commands on Service node to report current resource usage – Log into the node on which the service is running and report on the current resource usage at both the operating system and (Docker) container level, using the commands below. This monitoring can be automated using a shell script.

2.1) Node’s “available” memory – Use the Linux command “free -a -l -k” to report on the “available” RAM on the node, e.g.

             total       used       free     shared    buffers     cached  available
Mem:       1017092     871844     145248         56     121096     408460     568416
Low:       1017092     871844     145248
High:            0          0          0
-/+ buffers/cache:     342288     674804
Swap:            0          0          0

This along with the total RAM can be used to accurately track how much RAM has been used and is remaining (‘available’) for the application / service. (For more info on how to interpret the output of free command see

2.2) Top RAM usage per process – Use the Linux command “top” and order the output by resident (non-swap) memory usage (by entering shift-O, followed by ‘n’). This can be used to monitor RAM usage at a per-process level, with the service’s java process likely consuming the most memory (followed by the Docker host processes).

2.3) Docker container resource usage – Docker supports reporting stats on the resource usage of each container, including CPU, RAM and I/O, using the command “sudo docker stats --no-stream”, e.g.

CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
838c9ab5be36        0.11%               230.9MiB / 993.3MiB   23.25%              3.57GB / 9.99GB     4.1kB / 0B          0

These stats (especially RAM usage) complement the resource usage reports provided by the aforementioned Linux commands. As well as verifying or providing a comparison of resource usage, they also highlight how much additional memory the container is using, over and above the service’s java process. (If your service container only launches a single java process, the figures should be fairly similar).
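To compare the container figure with the numbers from free and top, it helps to convert the “MEM USAGE / LIMIT” column into plain bytes. Here is a minimal sketch, assuming the binary units (KiB, MiB, GiB) shown in the output above; the helper names are illustrative.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def to_bytes(size):
    """Convert a size such as '230.9MiB' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([KMG]iB|B)", size.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def memory_usage(mem_column):
    """Split a 'MEM USAGE / LIMIT' cell into (used_bytes, limit_bytes, used_pct)."""
    used, limit = (to_bytes(part) for part in mem_column.split("/"))
    return used, limit, 100.0 * used / limit
```

For the sample row above, the computed percentage agrees with the “MEM %” column that Docker reports.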

3) Snapshot JVM resource usage – Using your JMX client, record a snapshot of the resource usage as reported by the app/service’s JVM’s JMX server. VisualVM supports saving snapshot data, which can subsequently be loaded for later inspection, including graphs / visualisations.

4) Analysing the Results of a Soak Test

This section contains some guidance on how to analyse the results of a soak test following the completion of each test run / cycle. Generally speaking it entails analysing the data obtained from each of the suggested ongoing measurements  described in the previous section.

4.1) Application Error Rates

When soak testing, the performance (response times and throughput) of service APIs is not the primary concern. These are typically considered during load testing. The data of greater interest is the % error rate of the total API requests. Is the error rate non-zero? If so, what are the causes of the errors? Is the rate of errors increasing the longer the test runs? This might indicate a stability issue. On the other hand, errors can also be caused by problems with the reliability of upstream remote services on which the service under test depends, which are outside of your control (but something else you might need to cater for before going to production).
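A cumulative error % can hide a rising trend, because early error-free hours dilute later failures. One way to check is to compare the error rate within each interval between your twice-daily readings. The sketch below assumes you have recorded the running totals of samples and errors at each check; the function name is hypothetical.

```python
def interval_error_rates(snapshots):
    """Error % within each interval between consecutive cumulative readings.

    snapshots: list of (total_samples, total_errors) tuples, oldest first.
    """
    rates = []
    for (samples_before, errors_before), (samples_now, errors_now) in zip(
        snapshots, snapshots[1:]
    ):
        samples = samples_now - samples_before
        errors = errors_now - errors_before
        rates.append(100.0 * errors / samples if samples else 0.0)
    return rates
```

If the per-interval rates climb steadily from one reading to the next, that is a stronger hint of a stability problem than a small but flat overall error rate.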

If you’re using JMeter to automate your soak test, it supports analysing the sampled data (raw test results) using a number of reports, the most useful of which are the ‘Aggregate Report’ and the ‘Summary Report’. The Aggregate Report includes the % error rate, e.g.

Label    # Samples  Average (ms)  Median (ms)  90% Line (ms)  95% Line (ms)  99% Line (ms)  Min (ms)  Max (ms)  Error %  Throughput (per min)  KB/sec
My API       27436           938          948           1058           1100           1178       439     10012     0.03                   3.3    0.22

4.2) Node and Container Level Resource Usage Metrics

Node RAM Usage

Over the course of the test (according to the Linux free command) how did the available physical memory (RAM) on the node vary? Variations (increases and decreases) are to be expected as the app/service dynamically consumes and releases memory, and depending on what work the service/app was doing at the precise time the measurement was taken.

Compare the range of observed memory usage to the node’s available RAM after the service was started but before it was put under load (before the test was started). This will give you an idea of the peak memory usage of the app for the given load applied during the soak test.

Per Process RAM Usage

Over the course of the test (according to the Linux top command) how much memory did the java process supporting the service’s JVM consume? Did it stabilise?

Docker Container CPU & RAM Usage

Over the course of the test (according to the docker stats command) what was the lower and upper observed memory and CPU usage for the service’s  Docker container?

4.3) JVM Resource Usage Metrics

JVM Heap

A JVM’s max heap size is fixed on startup, and will therefore remain consistent throughout a soak test. Assuming you didn’t set an explicit (min or) max heap size for the JVM, what was the default max heap size chosen by the JVM based on detected ergonomics (machine class and RAM)?

Over the course of the test, the actual heap usage is expected to increase and decrease as objects are allocated and garbage-collected. However, how did the allocated size of the heap change? Whilst it will have increased since the application was started, did it stabilise? And what was this size as a % of the max heap size?

JVM Metaspace

From Java 8 onwards, the JVM uses a memory area known as ‘metaspace’ to store class metadata. Metaspace is allocated from native memory and (unlike PermGen, which it replaced) is dynamically resized as needed. By default the max size of metaspace is unlimited. Some common causes of memory leaks, such as the failure of a classloader to remove references to unused classes, are identifiable by an ever increasing metaspace size.

Over the course of the test, how did the allocated size of metaspace grow? It is expected to increase as the application’s classes are loaded. But did it plateau, or continue to increase to the end of the test?

Summing the allocated size of both the JVM heap and metaspace gives a good indication of the trend in the memory usage of the application and is also a good indication of its stability (if it plateaus that suggests there isn’t a memory leak).
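Judging a plateau by eye from a graph is usually enough, but the check can also be made mechanical. The following is one possible sketch: it fits a straight line to the second half of the recorded sizes (heap plus metaspace, in any consistent unit) and treats the series as having plateaued if the slope is small relative to the mean. The function names and the 1% tolerance are my own assumptions, not an established rule, and it needs at least four readings.

```python
def slope_per_sample(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def has_plateaued(values, tolerance_pct=1.0):
    """True if the trend over the second half of the samples is nearly flat.

    tolerance_pct is an arbitrary threshold: the allowed change per sample,
    expressed as a percentage of the mean of those samples.
    """
    tail = values[len(values) // 2:]
    mean = sum(tail) / len(tail)
    return abs(slope_per_sample(tail)) <= mean * tolerance_pct / 100.0
```

Only the second half of the readings is considered because growth during start-up and warm-up is expected and shouldn’t count against the service.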

5) Conclusions from Soak Testing

Ultimately you should be looking to draw some conclusions from the analysis of your soak test results. Some areas to consider are suggested below.

1) Evidence of memory leaks – Did the test identify any evidence of memory leaks in the application over the course of continued execution? Did the JVM’s allocated metaspace and heap memory spaces stabilise and plateau? You need to ensure you run the test long enough to be confident of this.

2) High or increasing CPU usage – Did the test identify any evidence of unexpectedly high or slowly increasing CPU usage by the java process? Increasing CPU usage can be a sign that the application is being starved of other resources. (CPU usage over ~40% may be cause for concern, although that depends on how much load the application was placed under by the soak test.)

3) Assess Production Readiness – Ultimately, has this soak test given you confidence that your new service and its APIs are production-ready, from the perspective of being stable enough to run continuously for an indefinite duration? Was the service able to run continuously for the whole duration of the test, processing requests, without an increasing error rate or degradation in its performance?

4) Minimum Application Resource Usage Needs – While a soak test will not necessarily place the service under maximum load, it will serve to establish its minimum resource requirements, primarily RAM, if you haven’t already determined this from earlier load testing.

When the service’s JVM is started with default JVM memory space settings on the current spec’d node, you should now know the minimum RAM consumed by the java process.

The max allocated JVM metaspace (for storing class metadata) observed during the soak test will be the same, regardless of load. (It will likely only increase as the app is extended in the future). The allocated JVM heap (for storage of objects and application data) however will increase as the service is placed under more load.

6) Possible Actions from Soak Testing

Obviously if your tests have identified a memory leak, or unexpectedly high CPU usage in your service then you’ll need to assess how serious it is. To investigate these problems you need to use a JVM profiling tool, of which there are many available. VisualVM or Oracle Mission Control provide some profiling features, including for example CPU sampling to help narrow down the source of high CPU usage. There are also some more modern profilers available which place less load on your application, and can even be used in production without impacting performance.

Beyond resolving resource leaks or contentions, some other follow-up actions or recommendations which might come out of soak testing include the following.

1) Review node resource capacity – It’s typically best to perform load testing prior to soak testing, but if you didn’t then this might be the first opportunity you’ve had to review the resource capacity of the node on which your app/service is running. Given the load applied during the soak test and the application’s observed peak CPU and memory usage, are the node’s CPU and physical RAM likely to be sufficient? This is something best re-checked during load testing, when the application is placed under the max required load.

2) Setting container resource limits – If you are running multiple containers on a node (e.g. using Docker Compose), having established the amount of RAM consumed by the service’s container, it might now be worth setting a limit on the amount of RAM it can consume, with a view to isolating containers and improving the stability of the node. Docker supports setting both hard and soft limits on a container’s resource usage.

3) Review reliability of remote service dependencies – In addition to your own service, a long running soak test can also serve to assess the reliability of the remote services on which the app depends, e.g. web services, data-stores, etc. Unreliability of these remote services will directly impact the stability and reliability of your own service/app. If the test identifies a higher than expected % error rate in requests made to one or more of the remote services, you may need to plan additional work to improve the resiliency of your service and its APIs, such as ensuring you have appropriate (network connection and read/socket) timeouts set, and using patterns such as Retry (for handling transient errors) and Circuit Breaker (to handle longer running faults in upstream services).
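To make the Retry pattern concrete, here is a minimal, language-neutral sketch of retrying with exponential backoff, written in Python for brevity. The function name, the defaults and the choice of which exceptions count as transient are all assumptions. In a JVM service you would more likely reach for an existing resilience library than hand-roll this.

```python
import time


def call_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep,
                    retry_on=(ConnectionError, TimeoutError)):
    """Call operation(); on a transient error wait and retry, doubling the delay.

    retry_on lists the exception types treated as transient (an assumption);
    anything else propagates immediately.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Retries only help with short-lived faults. When an upstream service is down for longer, retrying simply adds load, which is where a Circuit Breaker takes over by failing fast until the dependency recovers.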

7) Summing Up

It is essential to soak test a new service and its APIs before relying on it in production.  A soak test establishes whether a new service is capable of running continuously for an indefinite period, when fully exercised, end-to-end, under typical production load, without failing, or without there being any degradation in its performance.

In addition to explaining the aim and focus of soak testing, this blog post has covered –

  • How to architect a soak test of a service running on a JVM, in terms of the nodes, component software and protocols.
  • A process for soak testing a service running in a JVM, including details of what measurements to take and the tools and commands that can be used to obtain them.
  • Guidance on how to analyse the results of a soak test following the completion of each test run / cycle.
  • Some of the conclusions you should be looking to draw from the analysis of your soak test results, and some follow-up actions or recommendations that commonly come out of soak testing.

As always, I hope you found this article useful.
