Tuesday, February 9, 2010

Parallel Programming in .Net 4.0 and VS2010: Part I – The Task Parallel Library (Parallel.For(), Parallel.ForEach() and Parallel.Invoke())

Copyright 2008-2009, Paul Jackson, all rights reserved

With the Visual Studio 2010 RC available this week, it’s time to update last year’s parallel posts to cover changes in the API. This is much the same information as the original article, revised for the release candidate.

In this article, we’re going to look at the Task Parallel Library and what it makes available for parallelizing for and foreach loops.

Parallel.For()

Starting with a simple, single-threaded application that does some work a few times:

for (int i = 0; i < 100; i++)
{
    var watch = Stopwatch.StartNew();
    doWork();
    watch.Stop();
    Console.WriteLine("{0} took {1} milliseconds", i, watch.ElapsedMilliseconds);
}
 
Console.ReadLine();

And where the doWork() method is simply a really bad way of determining the prime numbers under 30000:

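// Deliberately wasteful: tests every number from 3 to 29999 for primality
// and throws the answer away, just to keep the CPU busy.
private static void doWork()
{
    for (int i = 3; i < 30000; i++)
    {
        isPrime(i);
    }
}

// Trial division by every candidate divisor from 2 up to i/2.
private static bool isPrime(int i)
{
    for (int j = 2; j <= (i / 2); j++)
    {
        if ((i % j) == 0)
            return false;
    }

    return true;
}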

The purpose of the doWork() method, of course, is just to give the CPU something to do, and it does: each iteration consistently takes about 147 milliseconds on a system with a 2.27 GHz quad-core Q9100 processor.

image

Although there’s some activity on all four cores as this runs, the total CPU usage stays between 20% and 25%, never making full use of the processor’s four cores:

image

Although the CPU and operating system will cooperate to send work from multiple applications to the cores in such a way as to make as much use of them as possible, a single-threaded application by itself will never get much above one core’s worth of the CPU’s total power. 

To access that power for our applications, we have to create multiple threads – and .Net 4 makes it very easy to do so. 

New in .Net 4 is the System.Threading.Tasks namespace which contains the Parallel class.  This new class has some static methods which allow us to change the code slightly to parallelize* it:

//for (int i = 0; i < 100; i++)
Parallel.For(0, 100, (i) =>
    {
        var watch = Stopwatch.StartNew();
        doWork();
        watch.Stop();
        Console.WriteLine("{0} took {1} milliseconds", i, watch.ElapsedMilliseconds);
    }
);

Parallel.For has multiple overloads, but at its most basic (above) it takes a from integer (inclusive), a to integer (exclusive) and a delegate (in this case a lambda expression).  The index always advances by 1; there is no overload that takes a step.
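To make the bounds concrete, here is a small sketch (the class and method names are mine, not part of the library). It records which indexes the loop body actually sees, and shows one way to fake a step by looping over a count and computing the real value inside the body.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper class for illustration only.
public static class ParallelForBounds
{
    // Marks every index the loop body is called with.
    public static bool[] VisitedIndices(int fromInclusive, int toExclusive)
    {
        var visited = new bool[toExclusive];
        Parallel.For(fromInclusive, toExclusive, i =>
        {
            // Each iteration writes only to its own slot, so no locking is needed.
            visited[i] = true;
        });
        return visited;
    }

    // Parallel.For always counts up by one, so a "step" is emulated by
    // iterating over the number of steps and computing the value in the body.
    public static int[] EveryNth(int start, int step, int count)
    {
        var values = new int[count];
        Parallel.For(0, count, n =>
        {
            values[n] = start + n * step;
        });
        return values;
    }

    public static void Main()
    {
        // Indexes 2, 3 and 4 are visited; 5 is excluded.
        Console.WriteLine(string.Join(", ", VisitedIndices(2, 5)));

        // Prints 0, 5, 10, 15
        Console.WriteLine(string.Join(", ", EveryNth(0, 5, 4)));
    }
}
```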

The effect of this small change is dramatic.  The CPU utilization will now spike to 100% when the application is run:

image

And the total runtime for the 100 iterations drops from 14.7 seconds to 4.3 seconds, roughly 3.4 times faster, because we’re now making full use of the processor’s four cores.
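If you want to measure the total runtime on your own machine, a rough sketch like the following would do it. CountPrimes and TimeIt are stand-ins I made up for this example (CountPrimes mimics doWork() but returns the count), and your numbers will depend on your hardware and whatever else is running.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical timing harness for illustration only.
public static class LoopTiming
{
    // Stand-in for the article's doWork(): counts the primes below limit
    // using the same slow trial division.
    public static int CountPrimes(int limit)
    {
        int count = 0;
        for (int i = 2; i < limit; i++)
        {
            bool prime = true;
            for (int j = 2; j <= i / 2; j++)
            {
                if (i % j == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs the action once and returns the wall-clock time in milliseconds.
    public static long TimeIt(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 100;
        const int limit = 30000;

        long sequentialMs = TimeIt(() =>
        {
            for (int i = 0; i < iterations; i++) CountPrimes(limit);
        });

        // Parallel.For does not return until every iteration has finished,
        // so timing the call times the whole batch.
        long parallelMs = TimeIt(() =>
        {
            Parallel.For(0, iterations, i => CountPrimes(limit));
        });

        Console.WriteLine("for loop:     {0} ms", sequentialMs);
        Console.WriteLine("Parallel.For: {0} ms", parallelMs);
    }
}
```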

A look at the Visual Studio debugger’s Threads window (Debug | Windows | Threads) shows what made the difference.  When executing the first version of the code, with the traditional for-loop, only a single thread is executing at any one time:

image

But the change to Parallel.For results in multiple threads doing the work:

image

In this case, five threads are simultaneously executing iterations of the work (the main application thread and four worker threads).  The exact number of threads used for parallelization is determined by .Net based on the number of cores available (there are some advanced options for controlling this, but careful consideration should be given before doing so).
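One of those advanced options is ParallelOptions.MaxDegreeOfParallelism, which caps how many iterations run at once. The sketch below is only an illustration: the class, the method, and the Thread.Sleep standing in for real work are all my own assumptions. It measures the peak number of iterations in flight under a given cap.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class for illustration only.
public static class LimitedParallelism
{
    // Runs a parallel loop capped at maxConcurrent simultaneous iterations
    // and returns the highest number observed running at the same time.
    public static int PeakConcurrency(int iterations, int maxConcurrent)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrent };
        int running = 0;
        int peak = 0;
        object gate = new object();

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref running);
            lock (gate)
            {
                if (now > peak) peak = now;
            }

            Thread.Sleep(10); // simulated work

            Interlocked.Decrement(ref running);
        });

        return peak;
    }

    public static void Main()
    {
        // With a cap of 2, the reported peak should never exceed 2.
        Console.WriteLine("Peak concurrency: {0}", PeakConcurrency(20, 2));
    }
}
```

Left alone, .Net usually makes a sensible choice, so a cap like this is mostly useful when you deliberately want to leave cores free for something else.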

A comparison of the output of both versions does show a couple of important differences that you should consider before parallelizing your code:

[image: for-loop output]

[image: Parallel.For output]

The first consideration is performance.  Although the total runtime improved from about 14.7 seconds to 4.3, the time spent in each iteration has increased and become less predictable.  In the single-threaded for loop, each iteration takes a consistent 147 milliseconds, but in the parallel implementation it varies from 147 to over 300 milliseconds.  This is due to overhead associated with the parallelization and the potential for threads to be swapped out by the operating system to provide CPU cycles to other applications.

Another consideration is the ordering of the work.  As you can see from the screenshots above, the single-threaded example performs the work sequentially: each pass through the for loop does its work and completes before the next starts.  But when the loop is parallelized, the order of completion can’t be guaranteed.  Because the work is spread across several threads (five in this example), each of those threads could complete its work and pick up the next iteration at any time.  So parallelizing isn’t an option (or is a more complex option) when the order of execution matters to your application.
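When only the order of the results matters, and not the order in which the work happens, one common workaround is to have each iteration write to its own slot in an array and read the array back sequentially afterwards. This is a sketch under that assumption, with made-up names; it doesn’t help if the iterations themselves depend on each other.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo class for illustration only.
public static class OrderedResults
{
    // Computes i * i for each index in parallel. Iterations may finish in
    // any order, but each result lands in the slot that matches its index.
    public static int[] Squares(int count)
    {
        var results = new int[count];
        Parallel.For(0, count, i =>
        {
            results[i] = i * i;
        });
        return results; // Parallel.For has completed every iteration by now
    }

    public static void Main()
    {
        // Reading the array back in a plain loop restores the original order.
        int[] squares = Squares(10);
        for (int i = 0; i < squares.Length; i++)
        {
            Console.WriteLine("{0} squared is {1}", i, squares[i]);
        }
    }
}
```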

Parallel.ForEach

The Parallel.ForEach() method follows the same pattern:

List<int> list = new List<int>();
// populate list
Parallel.ForEach(list, (item) =>
    {
        // parallel work here
    }
);
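Here is a slightly fuller Parallel.ForEach sketch. The names are my own, and Interlocked.Add is just one way to combine results safely. The thing to watch is that the body runs on several threads at once, so anything it shares (here, the running total) has to be updated in a thread-safe way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class for illustration only.
public static class ForEachSum
{
    // Adds up the squares of the items. A plain "total += ..." could lose
    // updates when two threads write at once, so Interlocked.Add is used.
    public static long SumOfSquares(IEnumerable<int> items)
    {
        long total = 0;
        Parallel.ForEach(items, item =>
        {
            Interlocked.Add(ref total, (long)item * item);
        });
        return total;
    }

    public static void Main()
    {
        List<int> list = Enumerable.Range(1, 100).ToList();

        // 1*1 + 2*2 + ... + 100*100 = 338350
        Console.WriteLine(SumOfSquares(list));
    }
}
```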

Parallel.Invoke

Parallel.For() and Parallel.ForEach() are useful if you have collections you need to act on or a known number of times you have to execute the same code, but what about when you have several different things (methods) you need to do?  For that, we have Parallel.Invoke().

Parallel.Invoke() simply executes, in parallel, a list of methods passed to it:

static void Main(string[] args)
{
    Parallel.Invoke(A, B, C, D, E);
    Console.ReadLine();
}
public static void A(){}
public static void B(){}
public static void C(){}
public static void D(){}
public static void E(){}

Methods A(), B(), C(), D() and E() will be assigned to worker threads as they become available and execute in parallel.  As with the other methods, there are no guarantees about the order of execution.
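Two things are worth knowing about Parallel.Invoke() beyond the example above: it doesn’t return until every action has finished, and exceptions thrown inside the actions come back wrapped in an AggregateException. The sketch below uses lambdas instead of named methods; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical demo class for illustration only.
public static class InvokeDemo
{
    // Runs three actions and returns how many had completed by the time
    // Parallel.Invoke returned.
    public static int RunAll()
    {
        var finished = new ConcurrentBag<string>(); // safe to add to from several threads
        Parallel.Invoke(
            () => finished.Add("A"),
            () => finished.Add("B"),
            () => finished.Add("C"));
        return finished.Count;
    }

    // Runs actions that fail and returns how many exceptions were reported.
    public static int CountFailures()
    {
        try
        {
            Parallel.Invoke(
                () => { throw new InvalidOperationException("first"); },
                () => { },
                () => { throw new ArgumentException("second"); });
            return 0;
        }
        catch (AggregateException ex)
        {
            // The individual exceptions are available in InnerExceptions.
            return ex.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Completed actions: {0}", RunAll());
        Console.WriteLine("Reported failures: {0}", CountFailures());
    }
}
```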

Conclusion

The Parallel Extensions in .Net 4.0 are going to give developers a much easier and more consistent way to parallelize their applications.  The library greatly insulates us from the complexity and inner plumbing of the parallel world, but we still have a responsibility to use it properly.  Hard questions and issues will remain around the architecture, design and implementation of parallelization in our applications, but Microsoft is aware of this and is busy providing us with more resources than just the tools in this library.  The Parallel Computing Developer Center has a wealth of whitepapers and presentations on these topics.

 

Source code for the Parallel.For example

 

* I’d really like to track down and speak strongly to the person who coined the term “parallelize” because every time I talk about this feature I invariably drop a syllable and tell my audience how easy .Net 4 makes it for you to paralyze your code**.

** On the other hand, if you don’t keep the pitfalls of multi-threaded programming in mind, your code will wind up paralyzed, so it’s still an accurate statement.