“”You must demonstrate a support for 1000 concurrent SQLs in the database”
During my years at Oracle and Greenplum, I’ve heard similar statements (anywhere between 100 and 5000) numerous times during a data warehouse POC planning. In every case, what followed was more or less the same. I worked toward understanding what do they mean by that – what are the real requirements – and then tried to adjust the POC metrics to reflect the real customer goals.
Looking back, it seems to be two recurring points of confusion. The first one is regarding which type of concurrency are we talking about,. The second one is regarding how the expected workload translates eventually into database concurrency. In this post, I’ll elaborate on the first point and a follow up will discuss the other point.
The crucial thing to understand is, that at most customers most of the times, when different people talk about concurrency, each do likely mean a different thing. So, what could they mean? here are some options:
- Potential Users – What is the total number of users of the application? For example, there are 200 analysts with access to the BI system, or 50,000 external users who have access to the website.
- Concurrent Users – How many users are connected to the application during an average (or a peak) hour? For example, during a peak hour, 120 analysts are connecting to the system or 5,000 external users are accessing the website.
- Concurrent Active Users – How many users are actively waiting for the database at any given time? For example, out of the 120 analysts, at any given moment maybe 20 are waiting, the rest are analyzing a report output (in the application or externally using excel or word etc), answering phone calls and emails or making coffee.
As I’ll show in the next post, this is a tricky thing to estimate and is actually the wrong question.
- Concurrent Database Sessions – how many open connections from the app servers to the database? This is usually a technical artifact, derived from the connection pooling setup of the application servers. It plays some role in helping to optimize resource usage (re-use connections).
- Concurrent Database SQLs – how many active SQLs does the database sees and handles? There are some complications here. For example, maybe when a user opens a dashboard, the application opens ten database connections and submits ten concurrent SQLs.
Anyway, again, this is tricky to estimated and actually another wrong question to ask.
From my experience, in most cases the initial concurrency requirements are based on Concurrent Users, which for some reason was directly translated along the way to a Concurrent Database SQLs requirement…
Why does it matter? In many cases, there is at least an order of magnitude difference between the initial, quoted concurrency requirement and the actual real requirement. So, during a POC, the vendor and the customer waste a lot of time and effort building and testing a big scenario that will not be relevant in the real world. So, all the decision process is biased based on irrelevant result.
Also, that may lead the customer to buy a bigger system than is actually needed, wasting good money and sometimes having to cut down on other critical aspects of the project (like professional services) due to budget constraints. That’s a shame too.
Conclusion for now – when planning a POC, triple check the customer IT and business side what do they mean by concurrency, and what will likely be the projected workload. If the new system replaces an existing system, help them collect some real metrics from the current system to validate their requirements. Which metrics matters? That’s for the next post.