Thursday, June 3, 2010

Parallel Programming in .Net 4.0 and VS2010: Part II – WinForms, Tasks and Service Level Agreements

Copyright 2008-2009, Paul Jackson, all rights reserved

Note: This article was originally published with code samples from the Visual Studio 2010 Beta.  This version has been updated for the release version.  This article in particular has changed because of changes to the Task cancellation mechanism.

The Console application in my previous post was actually the prototype for a more robust WinForms implementation – I have a customer who likes prime numbers.  We’ll call him Bob.


Sure, Bob’s a little weird, but he typically pays his invoices Net-10, so I like to keep him happy.

I first deployed the application to Bob without using the Task Parallel Library, so it was single-threaded:

private void goButton_Click(object sender, EventArgs e)
  {
      Stopwatch watch = Stopwatch.StartNew();
      for (int i = 0; i < 100; i++)
      {
          doWork(i);
      }
      watch.Stop();
      listBox1.Items.Add(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
  }
  
  private void doWork(int instance)
  {
      Stopwatch watch = Stopwatch.StartNew();
      for (int i = 3; i < 30000; i++)
      {
          isPrime(i);   // result discarded – the call is only here to keep the CPU busy
      }
      watch.Stop();
      listBox1.Items.Add(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
  }
  
  
  private static bool isPrime(int i)
  {
      for (int j = 2; j <= (i / 2); j++) 
      { 
          if ((i % j) == 0)             
              return false; 
      } 
      return true; 
  }

Bob was happy with the results, but not with the performance:

“Sixteen seconds is too long!  I can’t wait that long!  Time is money in my business!  It needs to be instantaneous!”

I tried to explain to Bob that “instantaneous” is not a Service-Level Agreement, but he was adamant: “Faster!”

And, no, I have no idea what Bob’s business is or why he needs this application.  He pays on time – I don’t ask a lot of questions.

So my first change is to make the for-loop parallel:

Parallel.For(0, 100, i =>
 {
     doWork(i);
 });

Unfortunately, changing an application from single- to multi-threaded can have unintended consequences.  When writing a single-threaded WinForms application you don’t have to worry about things like WinForms Controls only being accessible from the UI thread:

[Screenshot: the cross-thread InvalidOperationException thrown when doWork() adds to the ListBox from a worker thread]

The doWork() method accesses the ListBox directly to add items to it, but since doWork() is now being run on a different thread, it violates a fundamental Windows requirement that UI controls only be accessed from the thread they were created on.

This problem isn’t new with .Net 4.0 – it’s always been there, and it remains even in WPF.  UI controls can only be accessed from the thread that created them, which should be the main application thread.  So there are some hoops to jump through to get back onto the UI thread to update the control.  A number of articles describe patterns for dealing with this – the one I typically use (now) is:

private void updateList(string item)
{
    if (listBox1.InvokeRequired)
        listBox1.BeginInvoke((Action)delegate() { listBox1.Items.Add(item); });
    else
        listBox1.Items.Add(item);
}

Then I change the doWork() method to add items to the list through the updateList() method instead of directly:

private void doWork(int instance)
 {
     Stopwatch watch = Stopwatch.StartNew();
     for (int i = 3; i < 30000; i++)
     {
         isPrime(i);   // result discarded – the loop is just CPU work
     }
     watch.Stop();
     updateList(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
 }

And, for consistency, I do the same in goButton_Click, where the total elapsed time is recorded.
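For reference, here’s a sketch of what the click handler looks like at this point – the summary line now goes through updateList() as well:

private void goButton_Click(object sender, EventArgs e)
{
    Stopwatch watch = Stopwatch.StartNew();
    Parallel.For(0, 100, i =>
    {
        doWork(i);
    });
    watch.Stop();
    // The total also goes through updateList() rather than touching the ListBox directly
    updateList(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
}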

Running the application now results in the same performance improvement seen in the console application:

[Screenshot: the WinForms application’s output with the Parallel.For version]

I take this new version to Bob, pretty confident that dropping the processing time from almost 16 seconds to 4 will make him happy.  The only problem is that I’m not sure how to bill him for it – after all, using the Parallel Extensions I was able to get this speed improvement with only a few minutes of work – .Net 4.0 might improve my productivity, but it could have a negative effect on my Accounts Receivable.

Unfortunately, Bob’s not as impressed as I thought he’d be:

“4 seconds?  I still have to wait 4 seconds?  Faster! Faster! Faster!  I need instantaneous results!  I need to start working with the list as soon as I click the Go button!”

Bob’s a little high-strung.

But something he said sparks an idea.  Bob needs to work with the list immediately, but not with the entire list.  His process is to do something with each item in the list (no, I still don’t know what he does with them), so he doesn’t need everything, he just needs enough to start working.  While he’s working on the early results, later results can be completed and returned.

The way we’d do that today is to start our own background thread using something like ThreadPool.QueueUserWorkItem().  I can move the current code from the button click event to another method, then use QueueUserWorkItem to run that code on a background thread:

private void goButton_Click(object sender, EventArgs e)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(freeUi));
}
 
private void freeUi(object sync)
{
    Stopwatch watch = Stopwatch.StartNew();
    //for (int i = 0; i < 100; i++)
    Parallel.For(0, 100, i =>
    {
        doWork(i);
    });
    watch.Stop();
    updateList(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
}

This works great.  The list starts populating immediately and Bob will be able to get started working on the first items in the list while the remaining work finishes.  But QueueUserWorkItem is a little passé – it’s very … .Net 2.0 – what does 4.0 offer us to replace it with?

Enter the System.Threading.Tasks namespace and the Task class.  Using the StartNew() method on Task.Factory, we can create our background thread using the new, .Net 4.0 Task model, rather than via the old ThreadPool.  New is better.  Mostly.  Except New Coke – New Coke was bad.  Very, very bad.

private void goButton_Click(object sender, EventArgs e)
{
    Task.Factory.StartNew(() => { freeUi(null); });
}

Bob is thrilled with this new version.

“I’m thrilled!  This is great!  It’s instantaneous! By the way … how do I make it stop?”

So Bob wants a way to stop the population of the list once he’s started it.  We can give him that with the CancellationTokenSource and CancellationToken.  First we need a token to use for cancellation:

private CancellationTokenSource tokenSource;
private CancellationToken token;

public Form1()
{
    InitializeComponent();

    tokenSource = new CancellationTokenSource();
    token = tokenSource.Token;
}

Now this token must be passed to the Task:

private void goButton_Click(object sender, EventArgs e)
{
    var task = Task.Factory.StartNew(() => { freeUi(null); }, token);
}

And, via a ParallelOptions object, to the Parallel.For loop:

var options = new ParallelOptions();
options.CancellationToken = token;

Parallel.For(0, 100, options, i =>
{
    doWork(i);
});

And, finally, having given Bob a Cancel button, we would call Cancel() on the CancellationTokenSource when he clicks it:

private void cancelButton_Click(object sender, EventArgs e)
{
    tokenSource.Cancel();
}

Now that we have a token in all of our Tasks and threads that will tell us when Bob clicks Cancel, we need to gracefully shut down all of those background threads and whatever is spawning them.

For the Parallel.For loop, we need to get the ParallelOptions down into the doWork method, because we want to be able to interrupt the prime number loop itself:

Parallel.For(0, 100, options, (i) =>
{
    doWork(i, options);
});

Once there, we can use the CancellationToken property to break out of the process when it’s cancelled:

private void doWork(int instance, ParallelOptions options)
 {
     Stopwatch watch = Stopwatch.StartNew();
     for (int i = 3; i < 30000; i++)
     {
         isPrime(i);   // result still discarded – this is just the CPU work
         options.CancellationToken.ThrowIfCancellationRequested();
     }
     watch.Stop();
     updateList(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
 }

ThrowIfCancellationRequested() does exactly what its name says – it throws an OperationCanceledException once cancellation has been requested – so we need to catch that exception and use it to clean up after cancelled iterations and the loop itself:

try
 {
     Parallel.For(0, 100, options, (i) =>
      {
          try
          {
              doWork(i, options);
          }
          catch (OperationCanceledException ex)
          {
              // clean up iteration
          }
      });
 }
 catch (OperationCanceledException ex)
 {
     // clean up loop
 }

Bob’s happy with this latest version.  It satisfies both reasons to parallelize or multi-thread an application: Performance and User Experience.  Bob’s experience is improved because he’s able to continue work immediately after clicking on the Go button and doesn’t have to wait at all, and the entire process’ execution time has improved from over fifteen seconds to under four (on a quadcore PC).  Even I’m happy, because I can now send Bob an invoice and get paid.

 

Download Project Source Code

Tuesday, February 9, 2010

Parallel Programming in .Net 4.0 and VS2010: Part I – The Task Parallel Library (Parallel.For(), Parallel.ForEach() and Parallel.Invoke())

Copyright 2008-2009, Paul Jackson, all rights reserved

With the availability this week of the Visual Studio 2010 RC, it’s time to update the parallel posts from last year to cover changes in the API – so this is much the same information as presented in the original article, but it’s updated for the Visual Studio 2010 release candidate.

In this article, we’re going to look at the Task Parallel Library and what it makes available for parallelizing for and foreach loops, as well as arbitrary sets of methods.

Parallel.For()

Starting with a simple, single-threaded application that does some work a few times:

for (int i = 0; i < 100; i++)
{
    var watch = Stopwatch.StartNew();
    doWork();
    watch.Stop();
    Console.WriteLine("{0} took {1} milliseconds", i, watch.ElapsedMilliseconds);
}
 
Console.ReadLine();

And where the doWork() method is simply a really bad way of determining the prime numbers under 30000:

private static void doWork()
 {
     for (int i = 3; i < 30000; i++)
     {
         isPrime(i);   // result discarded – the call just burns CPU time
     }
 }
 
 private static bool isPrime(int i)
 {
     for (int j = 2; j <= (i/2); j++)
     {
         if ((i % j) == 0)
             return false;
     }
 
     return true;
 }

The purpose of the doWork() method, of course, is just to give the CPU something to do – which it does, consistently taking about 147 milliseconds for each iteration on a system with a 2.27 GHz quadcore Q9100 processor.

[Screenshot: console output – each iteration takes roughly 147 milliseconds]

Although there’s some activity on all four cores as this runs, the total CPU usage stays between 20% and 25%, never making full use of the processor’s four cores:

[Screenshot: Task Manager – total CPU usage stays between 20% and 25%]

Although the CPU and operating system will cooperate to spread work from multiple applications across the cores as efficiently as possible, a single-threaded application by itself will never get much above one core’s worth of the CPU’s total power.

To access that power for our applications, we have to create multiple threads – and .Net 4 makes it very easy to do so. 

New in .Net 4 is the System.Threading.Tasks namespace which contains the Parallel class.  This new class has some static methods which allow us to change the code slightly to parallelize* it:

//for (int i = 0; i < 100; i++)
Parallel.For(0, 100, (i) =>
    {
        var watch = Stopwatch.StartNew();
        doWork();
        watch.Stop();
        Console.WriteLine("{0} took {1} milliseconds", i, watch.ElapsedMilliseconds);
    }
);

Parallel.For has multiple overloads, but at its most basic (above) it takes a from integer (inclusive), a to integer (exclusive) and a delegate (in this case a lambda expression) to run for each value.  The step is always 1 – there’s no overload that takes a step value.
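As an example of one of the richer overloads (this snippet is mine, not from the original sample), the body can also receive a ParallelLoopState, which lets an iteration stop the loop early – the cutoff at 50 is purely for illustration:

Parallel.For(0, 100, (i, loopState) =>
    {
        if (i >= 50)
        {
            loopState.Break();   // iterations beyond this index won't be started; ones already running finish
            return;
        }

        doWork();
    }
);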

The effect of that small change – just swapping the for loop for Parallel.For – is dramatic.  The CPU utilization will now spike to 100% when the application is run:

[Screenshot: Task Manager – CPU utilization at 100% across all four cores]

And the total runtime for the 100 iterations drops from 14.7 seconds to 4.3 seconds – a huge increase in performance because we’re now making full use of the processor’s four cores.

A look at the Visual Studio debugger’s Threads window (Debug | Windows | Threads) shows what made the difference.  When executing the first version of the code, with the traditional for-loop, only a single thread is executing the work at any one time:

[Screenshot: Threads window – a single thread executing the loop]

But the change to Parallel.For results in multiple threads doing the work:

[Screenshot: Threads window – multiple threads executing iterations of the work]

In this case, five threads are simultaneously executing iterations of the work (the main application thread and four worker threads).  The exact number of threads used for parallelization is determined by .Net based on the number of cores available (there are some advanced options for controlling this, but careful consideration should be given before doing so).
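If you do need to cap the number of concurrent iterations – say, to leave a core free for other work – ParallelOptions.MaxDegreeOfParallelism is the usual knob; the value 2 below is purely an example, not a recommendation:

var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };   // example value

Parallel.For(0, 100, options, (i) =>
    {
        doWork();
    }
);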

A comparison of the output of both versions does show a couple of important differences that you should consider before parallelizing your code:

[Screenshot: console output from the for-loop version]

[Screenshot: console output from the Parallel.For version]

The first consideration is performance.  Although the total runtime improved from 14.7 seconds to 4.3, the time spent in each iteration has increased and become less predictable.  In the single-threaded for-loop each iteration takes a consistent 147 milliseconds, but in the parallel implementation it varies from 147 to over 300 milliseconds.  This is due to the overhead associated with parallelization and the potential for threads to be swapped out by the operating system to provide CPU cycles to other applications.

Another consideration is the ordering of the work.  As you can see from the screenshots above, the single-threaded example performs the work sequentially: each pass through the for loop does its work and completes before the next one starts.  But when this is parallelized, the order of completion can’t be guaranteed.  As the work is off-loaded to seven worker threads (in this example), each of those threads could complete its work and pick up the next iteration at any time.  So parallelizing isn’t an option (or is a more complex option) when the order of execution matters to your application.
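One workaround worth mentioning (mine, not part of the original sample): if it’s only the ordering of the results that matters, not the order of execution, you can let the iterations run in any order and index the output by the loop variable:

long[] results = new long[100];

Parallel.For(0, 100, (i) =>
    {
        var watch = Stopwatch.StartNew();
        doWork();
        watch.Stop();
        results[i] = watch.ElapsedMilliseconds;   // each iteration writes only its own slot, so no locking is needed
    }
);

// results[0]..results[99] are in iteration order, regardless of which thread ran which iteration
for (int i = 0; i < results.Length; i++)
    Console.WriteLine("{0} took {1} milliseconds", i, results[i]);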

Parallel.ForEach

The Parallel.ForEach() method follows the same pattern:

List<int> list = new List<int>();
// populate list
Parallel.ForEach(list, (item) =>
    {
        // parallel work here
    }
);
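To make that skeleton concrete, here’s a minimal sketch using the isPrime() helper from earlier – the sample data (and the System.Linq using it implies) are just for illustration:

List<int> list = Enumerable.Range(3, 29997).ToList();   // sample data: 3 through 29999

Parallel.ForEach(list, (item) =>
    {
        if (isPrime(item))
        {
            // do something with the prime – the body runs on whichever worker thread picks it up
        }
    }
);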

Parallel.Invoke

Parallel.For() and Parallel.ForEach() are useful if you have collections you need to act on or a known number of times you have to execute the same code, but what about when you have several different things (methods) you need to do?  For that, we have Parallel.Invoke().

Parallel.Invoke() simply executes, in parallel, a list of methods passed to it:

static void Main(string[] args)
{
    Parallel.Invoke(A, B, C, D, E);
    Console.ReadLine();
}
public static void A(){}
public static void B(){}
public static void C(){}
public static void D(){}
public static void E(){}

Methods A(), B(), C(), D() and E() will be assigned to worker threads as they become available and execute in parallel.  As with the other methods, there are no guarantees about the order of execution.
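Since Parallel.Invoke() takes a params Action[] array, you can also pass lambdas directly rather than named methods – a quick sketch (the Console and Thread calls are only there to show which thread ran what, and assume a using for System.Threading):

Parallel.Invoke(
    () => Console.WriteLine("A ran on thread {0}", Thread.CurrentThread.ManagedThreadId),
    () => Console.WriteLine("B ran on thread {0}", Thread.CurrentThread.ManagedThreadId),
    () => doWork()
);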

Conclusion

The Parallel Extensions in .Net 4.0 are going to give developers a much easier and more consistent way to parallelize their applications.  The library greatly insulates us from the complexity and inner plumbing of the parallel world, but we still have a responsibility to use it properly.  Hard questions and issues will remain around the architecture, design and implementation of parallelization in our applications, but Microsoft is aware of this and is busy providing us with more resources than just the tools in this library.  The Parallel Computing Developer Center has a wealth of whitepapers and presentations on these topics.

 

Source code for the Parallel.For example

 

* I’d really like to track down and speak strongly to the person who coined the term “parallelize”, because every time I talk about this feature I invariably drop a syllable and tell my audience how easy .Net 4 makes it for you to paralyze your code**.

** On the other hand, if you don’t keep the pitfalls of multi-threaded programming in mind, your code will wind up paralyzed, so it’s still an accurate statement.