Monday, March 2, 2009

TaskManager – The Range Rover of the .Net 4 Parallel Extensions

Copyright 2008-2009, Paul Jackson, all rights reserved

A few years ago a friend of mine bought himself a Range Rover – not the pansy-Discovery from Land Rover, but the all-out, pick a fight with a rhinoceros and win, Range Rover.  When he showed it off to me, he also showed me the Range Rover Driver’s Manual.  Now, there’s not anything special about most Driver’s Manuals for cars, but this one had an interesting section – a section that should have been titled:

“Things You Probably Shouldn’t Do, But If You Must, Here’s How”

Instructions on how to drive your shiny, new Range Rover through a boulder-strewn gully, across 60-degree slopes and stuff like that.  You know, the kind of things that wind up on YouTube with viewers shaking their heads and muttering, “stupid people doing stupid things”.

And the really bad part is that knowing those things are possible and having the instructions right there makes it very tempting to search out a boulder-strewn gully.

Anyway, TaskManager in the .Net 4 Parallel Extensions should have a documentation section like that: the things you probably shouldn’t do, but here’s how.  After the perils of parallel programming in general, this is probably the functionality that’s going to get more people into more trouble than any other, simply from the temptation of knowing it’s there.

TaskManager is the class that creates and manages the threads used by the Parallel Extension classes.  I believe I saw in a blog post somewhere that it’s going to be renamed to TaskScheduler before the .Net 4 release.  Its interface is extremely simple:

    public class TaskManager : IDisposable
    {
        public TaskManager();
        public TaskManager(TaskManagerPolicy policy);

        public static TaskManager Current { get; }
        public static TaskManager Default { get; }
        public TaskManagerPolicy Policy { get; }

        public void Dispose();
        protected virtual void Dispose(bool disposing);
    }

The only things of real interest are the three properties:  Current, Default and Policy.

Current gets the TaskManager that created the current thread, and Default gets the default TaskManager for the application.  Policy gets the TaskManagerPolicy that the TaskManager was created with.  So, basically, you can create a TaskManager and read some values from it.  What can we actually do with one?

A TaskManager gets passed into methods like Parallel.For to control the number and priority of threads used to process the parallel tasks (Parallel.For is examined in detail here):

    public static ParallelLoopResult For<TLocal>(
        int fromInclusive,
        int toExclusive,
        int step,
        Func2<TLocal> threadLocalInit,
        Action2<int, ParallelState<TLocal>> body,
        Action<TLocal> threadLocalFinally,
        TaskManager manager,
        TaskCreationOptions options);
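The signature above is from the CTP.  For comparison only, here is a minimal sketch against the API that shipped in the released .Net 4, on the assumption that ParallelOptions.TaskScheduler plays the role of the manager parameter and that the three delegates correspond to threadLocalInit, body and threadLocalFinally.  SchedulerOptionsDemo and ParallelSum are hypothetical names made up for this illustration, not part of either API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerOptionsDemo
{
    // Sums 0..count-1 in parallel, running the loop on the given scheduler.
    // (Hypothetical helper; the released API is assumed, not the CTP one.)
    public static long ParallelSum(int count, TaskScheduler scheduler)
    {
        var options = new ParallelOptions { TaskScheduler = scheduler };
        long total = 0;

        Parallel.For(0, count, options,
            () => 0L,                                  // thread-local init: empty subtotal
            (i, loopState, subtotal) => subtotal + i,  // body: returns the updated subtotal
            subtotal => { Interlocked.Add(ref total, subtotal); }); // finally: merge subtotals

        return total;
    }

    public static void Main()
    {
        // 0 + 1 + ... + 99 = 4950
        Console.WriteLine(ParallelSum(100, TaskScheduler.Default));
    }
}
```

Each worker keeps its own subtotal and only touches the shared total once, in the final delegate, which is the same reason the CTP overload has separate init and finally callbacks.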

The TaskManagerPolicy that’s passed to the TaskManager constructor defines the options available:

    public class TaskManagerPolicy
    {
        public TaskManagerPolicy();
        public TaskManagerPolicy(int maxStackSize);
        public TaskManagerPolicy(int minProcessors, int idealProcessors);
        public TaskManagerPolicy(int minProcessors, int idealProcessors, int idealThreadsPerProcessor);
        public TaskManagerPolicy(int minProcessors, int idealProcessors, ThreadPriority threadPriority);
        public TaskManagerPolicy(int minProcessors, int idealProcessors, int idealThreadsPerProcessor, int maxStackSize, ThreadPriority threadPriority);

        public int IdealProcessors { get; }
        public int IdealThreadsPerProcessor { get; }
        public int MaxStackSize { get; }
        public int MinProcessors { get; }
        public ThreadPriority ThreadPriority { get; }
    }

IdealThreadsPerProcessor is probably the one that’s going to tempt people the most, so let’s see how much trouble we can get into with that one.

Using the same code from my previous post demonstrating Parallel.For and the deep-dive, I’ve added code to create a new TaskManager using the default values and pass it in to the Parallel.For loop: 

    using System;
    using System.Diagnostics;
    // The Parallel Extensions CTP assembly must also be referenced for
    // TaskManager, TaskManagerPolicy and TaskCreationOptions.

    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();

            // Build a TaskManager that copies every value from the default policy.
            var mgr = new TaskManager(
                new TaskManagerPolicy(
                    TaskManager.Default.Policy.MinProcessors,
                    TaskManager.Default.Policy.IdealProcessors,
                    TaskManager.Default.Policy.IdealThreadsPerProcessor,
                    TaskManager.Default.Policy.MaxStackSize,
                    TaskManager.Default.Policy.ThreadPriority
                    )
                );

            // Run 100 iterations of doWork on threads owned by mgr.
            System.Threading.Parallel.For(0, 100, 1,
                () => { return 1; },
                (i, loopstate) =>
                {
                    doWork(i);
                },
                (threadstate) => { },
                mgr,
                TaskCreationOptions.None
            );

            watch.Stop();
            Console.WriteLine(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
            Console.ReadLine();
        }

        // CPU-bound busy work that reports how long each iteration took.
        private static void doWork(int instance)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            double result = Math.Acos(new Random().NextDouble()) * Math.Atan2(new Random().NextDouble(), new Random().NextDouble());
            for (int i = 0; i < 20000; i++)
            {
                result += (Math.Cos(new Random().NextDouble()) * Math.Acos(new Random().NextDouble()));
            }
            watch.Stop();
            Console.WriteLine(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
        }
    }


As with the original example, this takes between four and five seconds to complete; but working on the theory that more is better, what happens if we increase the number of threads per core?  Since we’re on a four-core system, let’s try 25 threads per core – that will create 100 threads, one for each iteration of the loop:

    var mgr = new TaskManager(
        new TaskManagerPolicy(
            TaskManager.Default.Policy.MinProcessors,
            TaskManager.Default.Policy.IdealProcessors,
            25, //TaskManager.Default.Policy.IdealThreadsPerProcessor,
            TaskManager.Default.Policy.MaxStackSize,
            TaskManager.Default.Policy.ThreadPriority
            )
        );


This had a slightly negative effect on the overall performance.  Why?  Well, because even if we have 100 threads running, they still have to share time on four cores, so there’s going to be contention as the system tries to fairly allocate time on each core to twenty-five threads.
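If you want to try the same experiment without the CTP, here is a rough sketch against the released .Net 4 API.  It assumes ParallelOptions.MaxDegreeOfParallelism is the closest available stand-in for IdealThreadsPerProcessor; it is only an upper bound on concurrency, not a request for a specific number of threads, so the match is not exact.  OversubscriptionDemo, DoWork and TimeLoop are hypothetical names, and the workload is a simplified version of doWork.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class OversubscriptionDemo
{
    // CPU-bound busy work, loosely modelled on doWork (hypothetical helper).
    public static double DoWork(int seed)
    {
        var random = new Random(seed);
        double result = 0;
        for (int i = 0; i < 20000; i++)
        {
            result += Math.Cos(random.NextDouble()) * Math.Acos(random.NextDouble());
        }
        return result;
    }

    // Runs 100 iterations with at most maxParallelism of them in flight
    // at once and returns the elapsed time in milliseconds.
    public static long TimeLoop(int maxParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.For(0, 100, options, i => { DoWork(i); });
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        int cores = Environment.ProcessorCount;
        Console.WriteLine("Up to 1 worker per core:   {0} ms", TimeLoop(cores));
        Console.WriteLine("Up to 25 workers per core: {0} ms", TimeLoop(cores * 25));
    }
}
```

The numbers will vary by machine and by run, so treat a single pair of timings as a hint rather than a measurement; the point is only that raising the limit past the core count gives the work no extra cores to run on.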

The next thing someone might want to change is the ThreadPriority.  For this test, I started the Parallel Extensions Ray Tracer example in the background.  This application uses some significant processing resources, so all cores are close to 100% utilization while it’s running and there’s a significant performance degradation in the test application:

[Two images not reproduced here.]

By changing the ThreadPriority, we can affect the performance either positively or negatively:

    var mgr = new TaskManager(
        new TaskManagerPolicy(
            TaskManager.Default.Policy.MinProcessors,
            TaskManager.Default.Policy.IdealProcessors,
            TaskManager.Default.Policy.IdealThreadsPerProcessor,
            TaskManager.Default.Policy.MaxStackSize,
            ThreadPriority.Highest // TaskManager.Default.Policy.ThreadPriority
            )
        );

[Image: ThreadPriority.Highest]

[Image: ThreadPriority.Lowest]

But while setting the Highest thread priority brought some performance back to our application, it had a very negative impact on the RayTracer running in the background, dropping the framerate from 5.2 to 2.8 frames-per-second:

[Image not reproduced here.]

Doing this without the user’s permission is rather bad form, since it steals CPU cycles from other applications that the user may be actively working in or relying on.
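For anyone who wants to see the priority knob in isolation, here is a minimal sketch that uses a plain System.Threading.Thread instead of a TaskManager, since Thread.Priority is the underlying setting either way.  PriorityDemo and StartWorker are hypothetical names invented for this example, and choosing BelowNormal for background work is my assumption about what counts as polite, not something the Parallel Extensions prescribe.

```csharp
using System;
using System.Threading;

public static class PriorityDemo
{
    // Starts 'work' on a dedicated thread at the requested priority
    // and returns the thread so the caller can Join it.
    public static Thread StartWorker(ThreadPriority priority, Action work)
    {
        var thread = new Thread(() => work())
        {
            Priority = priority,   // applied before the thread starts
            IsBackground = true    // don't keep the process alive on exit
        };
        thread.Start();
        return thread;
    }

    public static void Main()
    {
        // Background number crunching yields to whatever the user is doing.
        Thread worker = StartWorker(ThreadPriority.BelowNormal, () =>
        {
            double result = 0;
            for (int i = 1; i <= 1000000; i++)
            {
                result += Math.Sqrt(i);
            }
            Console.WriteLine("Finished at priority {0}", Thread.CurrentThread.Priority);
        });

        worker.Join();
    }
}
```

Lowering the priority costs your own application some throughput when the machine is busy, which is the trade the Ray Tracer numbers above show from the other side.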


TaskManager (or TaskScheduler, if it’s renamed to that) exists, but its power shouldn’t be used arbitrarily, because the effects can be undesirable.  There are times when you might have perfectly legitimate needs to modify these defaults, but if you’re going to take your application on a ride across that 60-degree slope, make sure you know what you’re doing and test thoroughly so you don’t wind up on YouTube with people laughing at you.


1 comment:

Bart Czernicki said...

Nice article. This essentially shows you why it's called "Parallel extensions", as this gives you the power of true parallelism: threads on cores, core affinity, thread priorities etc. In multi-threading these topics are largely ignored/abstracted. I think this and PLINQ are going to be great additions for scalability/performance on large systems.