Throughput, latency, concurrency

The difference between making a good and a bad decisions often comes down to the quality of the pre-defined metrics. If the metric is poor so will be the decision.
When comparing performance between different technologies such as Google Big Query (based on a distributed file system – Colossus to be precise) and MPP technologies such as Redshift, people tend to list concurrency as a limitation of Redshift.
Unfortunately, concurrency is one of those metrics that really suck. In this blog post I will show you why.

Max concurrency

First things first. What do we actually mean when we talk about concurrency? Concurrency is the number of SQL queries that can be run in parallel on a system. When we max out on resources (CPU, memory, and other storage) on a database or some other processing frameworks such as Hadoop, Spark etc. etc. we have reached maximum concurrency.

Resource manager

What happens when we max out on resources depends on the resource manager. In a database, the resource manager allocates CPU, memory, and in some systems also disk to requests that users run against the database, e.g. SQL queries.
If the resource manager has been configured to process queries on a first come first serve basis (FIFO) then queries are serialized. In other words, they have to wait until the next slot becomes available. It’s similar to when you are boarding a plane or checking out at the supermarket. If all tills are serving a customer you have to wait in the queue.
Note: not all databases provide a FIFO resource manager.
If the resource manager is set up to use a fair scheduler, then all of the various queries that have been launched get allocated a fair share of resources. Sounds great. What’s not to be liked about fair?
Well, in this particular case it will lead to an overall degradation of system performance. To serve all queries at the same time, the CPU cores need to switch back and forth between queries. This generates a huge overhead. It also leads to eviction of data from the CPU level caches and in some cases DRAM and that is when things get really slow as we have to endure multiple passes over the same data.
While we have achieved higher concurrency (more queries are served at the same time), the latency of our individual queries increases (in other words they run slower) and overall system throughput has decreased (in other words we can satisfy less queries per time unit). In the worst case we have gone from a state where some users are unhappy because of long wait times to a state where everyone is unhappy.

Throughput, latency, concurrency

So how can we increase concurrency on a database? It’s quite simple really. We just need to decrease the latency of our queries (make them run faster). How do we do that?  There are various ways.
Write better queries, e.g. by using window functions and looking at explain plans
Model your data for performance, e.g. by de-normalizing data, using columnar storage, storage indexes, using good distribution and sort keys, understanding usage patterns etc.
Throw money at the problem by upgrading from an SMP to an MPP database or by adding nodes to your existing MPP database (Redshift, Teradata etc.). The latency of most queries is cut in half by doubling the number of nodes in a cluster.

How are latency and concurrency related?

This blog post here has a great example. Let me go through it here:

  • 1800 similar SQL queries are running per hour on the database.
  • On average that gives us a new query every two seconds.
  • Each query takes 60 seconds to run on average.
  • On average we are running 30 concurrent queries.
  • The database we are using supports 15 queries to run concurrently.
  • We need to queue 15 queries.

Let’s see what happens to our level of concurrency once we tune our SQL queries.

  • We still have 1800 similar SQL queries running per hour on the database.
  • Now our queries execute in 10 seconds.
  • On average we now just have five concurrent queries.
  • No queries get serialised and we even have ten more available slots.


Concurrency is a derived metric from latency. You increase concurrency by decreasing latency.
For Redshift that means writing better queries, adding more nodes, replacing your nodes with dense compute nodes, and using workload management to create queues to separate interactive from batch queries.
While systems that are built on distributed file systems such as Hadoop and Google Big Query can achieve higher levels of concurrency as they don’t evenly split the data across the nodes in the cluster, they tend to show higher latency for a certain class of queries, e.g. joins of very large tables.
When comparing performance always look at latency and throughput to compare performance.