[quartznet:4127] Quartz.net 2.5.1 Threadpool Scaling

[quartznet:4127] Quartz.net 2.5.1 Threadpool Scaling

Philip Shaffer
Quartz.net 2.5.1
I am currently working with Quartz.net and have set up a simulation representing a sample workload of scheduled jobs.
The scenario is 500 jobs, each scheduled to run once, with request recovery enabled, durable storage, and the misfire handling instruction set to "now with existing count".
The job store is AdoJobStore running on SQL Server 2016.
Job runtime is 5 to 60 seconds (randomly selected), with a 1 in 25 chance of throwing an exception immediately (randomly selected).
Test runtime is 5 minutes from Start() of the scheduler to Shutdown(true), plus time to allow running jobs to complete.

Starting with a thread pool size of 5 worker threads, I was averaging 75 to 90 jobs launched and executed within the timeframe.
Moving up to 10 working threads, throughput jumped to around 120.
Moving up to 20 working threads, throughput only jumped up to around 150.
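
To make the scaling drop-off concrete, here is a small sketch that divides the reported job counts by the thread count. It uses only the figures quoted above; the helper name jobs_per_thread is made up for illustration, and single reported figures are treated as a range with equal ends.

```python
# Reported figures from the post: worker threads -> jobs executed in the 5-minute run.
# Ranges are kept as (low, high); single figures repeat the same value.
observed = {5: (75, 90), 10: (120, 120), 20: (150, 150)}


def jobs_per_thread(threads, jobs):
    """Return the (low, high) number of jobs handled per worker thread."""
    low, high = jobs
    return low / threads, high / threads


for threads, jobs in sorted(observed.items()):
    low, high = jobs_per_thread(threads, jobs)
    print(f"{threads:>2} threads: {low:.1f} to {high:.1f} jobs per thread")
```

Per-thread throughput falls as the pool grows, which is consistent with the worker threads sitting idle for a larger share of the run.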

What appears to be happening is that the scheduling thread only runs once an existing job has completed, and there is only one scheduling thread running.  The net effect is that the thread pool worker threads appear to spend more time idle as the thread pool count is raised.

And before you jump all over me about the duration of the jobs I'm executing on the thread pool, I know the best practices page specifies that they are supposed to be kept short.  But what I'm modeling is the specified architecture we are building, not necessarily the ideal.

I have found other entries on this forum where members talked of running "up to 1000 jobs per day on a single server", and of running thousands of jobs spread across 20 servers.  Frankly I find that more than a little disappointing and am hoping others who haven't posted are running far more jobs on a single server than I've heard about.

What recommendations would the group have for increasing my per-server throughput, with the caveat that my Job execution sample profile not be changed?
Attached is the LINQPad file containing my simulation, for those who are interested.  Also attached is the output from a sample run (with 20 threads).

Thank you all ahead of time for considering my question.
-Phil

--
You received this message because you are subscribed to the Google Groups "Quartz.NET" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/group/quartznet.
For more options, visit https://groups.google.com/d/optout.

Attachment: Quartz Template.linq (11K)
Attachment: Quartz Template, Run Log 05, with 20 threads.html (691K)

[quartznet:4130] Re: Quartz.net 2.5.1 Threadpool Scaling

Philip Shaffer
The "quartz.jobStore.misfireThreshold" (default 60 seconds) seems to be what controls the assignment of scheduled jobs to unused threads.
In my previous post, I mentioned that many of the threads appeared to remain idle for a large chunk of their time.
With the scenario of 20 threads in the thread pool, and a 60 second misfire threshold, I had been seeing 130-150 jobs getting executed in 5 minutes.
Moving the threshold to 30 seconds increased the throughput to around 200 jobs executed.
Moving the threshold to 20 seconds only increased the throughput to around 220 jobs executed.
So it would appear that a point of diminishing returns occurs in the 20 to 30 second misfire threshold range.  It is unclear to me whether there are any undesirable side effects to shortening the misfire threshold.
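
For reference, a minimal sketch of how the two settings discussed in this thread might look in a Quartz.NET properties file (for example quartz.config). The values are simply those from the 20-thread, 30-second run described above, not tuning recommendations. Note that misfireThreshold is expressed in milliseconds, and the rest of the AdoJobStore setup (data source, delegate type, and so on) is omitted here.

```properties
# Number of worker threads available to execute jobs concurrently
quartz.threadPool.threadCount = 20

# How late a trigger may fire before it is treated as misfired, in milliseconds
# (the default is 60000, i.e. 60 seconds)
quartz.jobStore.misfireThreshold = 30000
```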

Still hoping to hear suggestions for how to improve job throughput.

(In reply to Philip Shaffer's original post of Tuesday, June 20, 2017 at 12:51:48 AM UTC-4.)

[quartznet:4158] Re: Quartz.net 2.5.1 Threadpool Scaling

Marko Lahma
In reply to this post by Philip Shaffer

Just closing the loop here: 2.6.1 fixed misfire-handling-related issues and introduced a new configuration property, MisfireHandlerFrequency, to control the retrieval interval in the job store:

https://github.com/quartznet/quartznet/milestone/25?closed=1


-Marko
