Saturday, January 31, 2009

Multi-threading in the Visual Studio 2010 CTP

Copyright 2008-2009, Paul Jackson, all rights reserved

It’s very cool that the Parallel Extensions are going to be built into .Net 4.0, but it does present a problem for those of us who want to work with the new classes now. 

The last independent release of the CTP was in June and that release works with Visual Studio 2008, but the latest we have access to is the September CTP built into .Net 4.0 and the Visual Studio 2010 CTP, so we can’t use that with VS2008. 

But the VS2010 CTP is distributed as a Virtual PC image and Virtual PC only supports a single core.  If we want to work with the new parallel classes and really see what they can do, we need multiple cores.

So we’re stuck with a dilemma – live with the old bits from June or use the new bits with the limit of a single core.

Unless you have access to Windows Server 2008 and Hyper-V.  With that, you can move the VS2010 VPC to Hyper-V and give it up to four cores, making your parallel work in .Net 4.0 much more relevant.
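Once the image is running under Hyper-V, it's worth confirming that the guest really sees more than one core before drawing any conclusions from timings. Here's a minimal sketch of that check; the CoreCheck class and VisibleCores() method are names I made up for illustration, while Environment.ProcessorCount is the standard framework property.

```csharp
using System;

public static class CoreCheck
{
    // Returns the number of logical processors visible to this process.
    public static int VisibleCores()
    {
        return Environment.ProcessorCount;
    }

    public static void Main()
    {
        Console.WriteLine("Logical processors visible: {0}", VisibleCores());
    }
}
```

If this reports 1 inside the virtual machine, the parallel samples will still run, but you shouldn't expect any speed-up from them.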

If you have the right level of MSDN subscription, you might have access to Server 2008, or you can download a trial version good for 60 days (extendable to 240).

It’s a hoop or two to jump through, but it does let you work with the latest version in a better environment.

Friday, January 30, 2009

Parallel Programming in .Net 4.0 and VS2010: Part II – WinForms, Tasks and Service Level Agreements

Update 6/3/2010 – This original article was written using the Visual Studio 2010 CTP.  I’ve since updated the information in a new article which takes into account changes in the API as of the VS2010 release.

The Console application in my previous post was actually the prototype for a more robust WinForms implementation – I have a customer who likes Math functions.  We’ll call him Bob.

image

Sure, Bob’s a little weird, but he typically pays his invoices Net-10, so I like to keep him happy.

I first deployed the application to Bob without using the Parallel Extensions, so it was single-threaded:

        private void goButton_Click(object sender, EventArgs e)
        {
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < 100; i++)
            {
                doWork(i);
            }
            watch.Stop();
            listBox1.Items.Add(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
        }
        private void doWork(int instance)
        {
            Stopwatch watch = Stopwatch.StartNew();
            double result = Math.Acos(new Random().NextDouble()) * Math.Atan2(new Random().NextDouble(), new Random().NextDouble());
            for (int i = 0; i < 20000; i++)
            {
                result += (Math.Cos(new Random().NextDouble()) * Math.Acos(new Random().NextDouble()));
            }
            
            watch.Stop();
            listBox1.Items.Add(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
        }

Bob was happy with the results, but not with the performance:

“Twelve seconds is too long!  I can’t wait that long!  Time is money in my business!  It needs to be instantaneous!”

I tried to explain to Bob that “instantaneous” is not a Service-Level Agreement, but he was adamant: “Faster!”

And, no, I have no idea what Bob’s business is or why he needs this application.  He pays on time – I don’t ask a lot of questions.

So my first change is to make the for-loop parallel:

            //for (int i = 0; i < 100; i++)
            Parallel.For(0, 100, i =>
                {
                    doWork(i);
                }
            );

Unfortunately, changing an application from single- to multi-threaded can have unintended consequences.  When writing a single-threaded WinForms application you don’t have to worry about things like WinForms Controls only being accessible from the UI thread:

image

The doWork() method accesses the ListBox directly to add items to it, but since doWork() is now being run on a different thread, it violates a fundamental Windows requirement that UI controls only be accessed from the thread they were created on.

This problem isn’t new with .Net 4.0; it’s always been there and remains even in WPF.  UI Controls can only be accessed from the thread that created them, and that should be the main application thread.  So there are some hoops we have to jump through to get back to the UI thread and update the control.  There are a number of articles available that describe patterns for dealing with this – the one I typically use is:

        #region UI Threading Pattern
        private delegate void addToListDelegate(string item);
        private void addToList(string item)
        {
            if (listBox1.InvokeRequired)
            {
                listBox1.BeginInvoke(new addToListDelegate(innerAddToList), new object[] { item });
            }
            else
            {
                innerAddToList(item);
            }
        }
        private void innerAddToList(string item)
        {
            listBox1.Items.Add(item);
        }
        #endregion

Then I change the doWork() method to add items to the list through the addToList() method instead of directly:

        private void doWork(int instance)
        {
            Stopwatch watch = Stopwatch.StartNew();
            double result = Math.Acos(new Random().NextDouble()) * Math.Atan2(new Random().NextDouble(), new Random().NextDouble());
            for (int i = 0; i < 20000; i++)
            {
                result += (Math.Cos(new Random().NextDouble()) * Math.Acos(new Random().NextDouble()));
            }
            
            watch.Stop();
            addToList(String.Format("{0} took {1} milliseconds", instance, watch.ElapsedMilliseconds));
        }

And, for consistency, I do the same in goButton_Click, where the total elapsed time is recorded.

Running the application now results in the same performance improvement seen in the console application:

image

I take this new version to Bob, pretty confident that dropping the processing time from 12.4 seconds to 3.8 will make him happy.  The only problem is that I’m not sure how to bill him for it – after all, using the Parallel Extensions I was able to get this speed improvement with only a few minutes of work.  .Net 4.0 might improve my productivity, but it could have a negative effect on my Accounts Receivable.

Unfortunately, Bob’s not as impressed as I thought he’d be:

“3.8 seconds?  I still have to wait 3.8 seconds?  Faster! Faster! Faster!  I need instantaneous results!  I need to start working with the list as soon as I click the Go button!”

Bob’s a little high-strung.

But something he said sparks an idea.  Bob needs to work with the list immediately, but not with the entire list.  His process is to do something with each item in the list (no, I still don’t know what he does with them), so he doesn’t need everything, he just needs enough to start working.  While he’s working on the early results, later results can be completed and returned.

The way we’d do that today is to start our own background thread using something like ThreadPool.QueueUserWorkItem().  I can move the current code from the button click event to another method, then use QueueUserWorkItem to run that code on a background thread:

        private void goButton_Click(object sender, EventArgs e)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(freeUi));
        }
        private void freeUi(object state)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            Parallel.For(0, 100, (i) =>
                {
                    doWork(i);
                }
            );
            watch.Stop();
            addToList(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
        }

This works great.  The list starts populating immediately and Bob will be able to get started working on the first items in the list while the remaining work finishes.  But QueueUserWorkItem is a little passé – it’s very … .Net 2.0 – what does 4.0 offer us to replace it with?

Enter the System.Threading.Tasks namespace and the Task class.  Using the StartNew() method on Task, we can create our background thread using the new, .Net 4.0 Task model, rather than via the old ThreadPool.  New is better.

Note: StartNew() was introduced in the September CTP of the Parallel Extensions, which is only available as part of the VS2010 CTP.  Using the June CTP in VS2008, you will have to Task.Create() a task and then Start() it.

        private void goButton_Click(object sender, EventArgs e)
        {
            //ThreadPool.QueueUserWorkItem(new WaitCallback(freeUi));
            System.Threading.Tasks.Task.StartNew(delegate { freeUi(null); });
        }
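As the update note at the top of this post says, the API changed after the CTP. Here's a rough sketch of what the same idea looks like if I assume the released .Net 4.0 API, where the call is Task.Factory.StartNew(). It's a console stand-in rather than the WinForms code above; the BackgroundWork class and its FreeUi() method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundWork
{
    // Console stand-in for freeUi(): runs 100 work items in parallel
    // and returns how many were processed.
    public static int FreeUi()
    {
        int processed = 0;
        Parallel.For(0, 100, i =>
        {
            Interlocked.Increment(ref processed); // thread-safe counter update
        });
        return processed;
    }

    public static void Main()
    {
        // Assumed released-API counterpart of the CTP's Task.StartNew(...).
        Task<int> task = Task.Factory.StartNew(() => FreeUi());

        Console.WriteLine("Caller keeps running while the task works...");

        // Reading Result blocks until the task has finished.
        Console.WriteLine("Processed {0} items", task.Result);
    }
}
```

In a real form you wouldn't block on Result from the UI thread; the point here is only that StartNew() hands back a Task reference immediately.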

Bob is thrilled with this new version.

“I’m thrilled!  This is great!  It’s instantaneous! By the way … how do I make it stop?”

So Bob wants a way to stop the population of the list once he’s started it.  I take a quick look in the System.Threading.Tasks namespace, and, sure enough, StartNew() returns a reference to the new Task and the Task has a Cancel() method, so I can give Bob a Cancel button.

        private Task _task;
        private void goButton_Click(object sender, EventArgs e)
        {
            _task = Task.StartNew(delegate { freeUi(null); });
        }
        
        private void cancelButton_Click(object sender, EventArgs e)
        {
            if (_task != null)
                _task.Cancel();
        }

There’s one more step that I need to take, and that’s to also cancel the Parallel.For loop in the freeUi() method.  Each time through that loop is a separate Task (that has the Task running freeUi() as its parent), so as part of that loop I need to check to see if the parent task has been canceled.  I also want to cancel any remaining iterations of that loop, so I’ll use a different delegate that brings in a ParallelState object – among other things, this class has a Stop() method that will stop execution of the loop.

Note: As of this writing, the cancellation method for a Task and its children, like everything about this prerelease, is subject to change prior to the release of .Net 4.0.  This is a scenario that will likely be improved before final release.

        private void freeUi(object state)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            Task _currentTask = Task.Current;
            Parallel.For(0, 100, (i, loopState) =>
                {
                    if (_currentTask.IsCancellationRequested)
                    {
                        loopState.Stop();
                        return;
                    }
                    doWork(i);
                }
            );
            if (_currentTask.IsCancellationRequested)
            {
                addToList("Process cancelled");
                return;
            }
            watch.Stop();
            addToList(String.Format("Entire process took {0} milliseconds", watch.ElapsedMilliseconds));
        }
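The note above warned that cancellation was likely to change, so for comparison here's a minimal sketch of the same idea, assuming the released .Net 4.0 API where cancellation goes through a CancellationTokenSource and a ParallelOptions object instead of Task.Cancel(). The CancellableLoop class and Run() method are names I invented for the example, and the Thread.Sleep() call is just a stand-in for doWork().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableLoop
{
    // Runs 100 iterations of 'work' in parallel.
    // Returns true if the loop completed, false if it was cancelled.
    public static bool Run(CancellationToken token, Action<int> work)
    {
        ParallelOptions options = new ParallelOptions();
        options.CancellationToken = token;
        try
        {
            // Parallel.For observes the token and throws
            // OperationCanceledException once cancellation is requested.
            Parallel.For(0, 100, options, i => work(i));
            return true;
        }
        catch (OperationCanceledException)
        {
            return false;
        }
    }

    public static void Main()
    {
        CancellationTokenSource cts = new CancellationTokenSource();

        // Stand-in for the Cancel button: request cancellation from item 10.
        bool completed = Run(cts.Token, i =>
        {
            if (i == 10) cts.Cancel();
            Thread.Sleep(5); // pretend to do some work
        });

        Console.WriteLine(completed ? "Entire process finished" : "Process cancelled");
    }
}
```

In the WinForms version, the Cancel button's click handler would call Cancel() on a CancellationTokenSource held in a field, much like the _task field above.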

Bob’s happy with this latest version.  It satisfies both reasons to parallelize or multi-thread an application: Performance and User Experience.  Bob’s experience is improved because he’s able to continue working immediately after clicking on the Go button and doesn’t have to wait at all, and the entire process’ execution time has improved from over twelve seconds to under four (on a quad-core PC).  Even I’m happy, because I can now send Bob an invoice and get paid.

Sunday, January 25, 2009

Parallel Programming in .Net 4.0 and VS2010: Part I – The Task Parallel Library (Parallel.For(), Parallel.ForEach() and Parallel.Invoke())

Update 2/9/2010 – This original article was written using the Visual Studio 2010 CTP.  I’ve since updated the information in a new article which takes into account changes in the API as of the release of the VS2010 RC.

The Parallel Extensions, which include the Task Parallel Library, Parallel LINQ and Coordination Data Structures, have been available as a CTP for some time now and run well in Visual Studio 2008, but will be officially released as part of .Net 4.0.

To use the library now, in VS2008, you must download and install the Parallel Extensions library.  Once installed, add a reference to the library’s assembly to your project:

image

Yes, that’s System.Threading there with a 1.x version number.  The library install registers the assembly with Visual Studio and the classes are in the System.Threading namespace.  It appears they’ll be built in for .Net 4.0, but for now we have an extension assembly with the same name as a core .Net namespace.  So when you type “System.Threading.” and don’t get any of the extension classes you were expecting in Intellisense, remember you’ll have to add this reference.

Parallel.For()

In this article, we’re going to look at the Task Parallel Library and what it makes available for parallelizing for and foreach loops.

Starting with a simple, single-threaded application that does some work a few times:

            for (int i = 0; i < 100; i++)
            {
                doWork(i);
            }

And where doWork() simply does a lot of math on random numbers:

        private static void doWork(int instance)
        {
            double result = 
                Math.Acos(new Random().NextDouble()) * 
                Math.Atan2(new Random().NextDouble(), new Random().NextDouble());
            
            for (int i = 0; i < 20000; i++)
            {
                result += (
                    Math.Cos(new Random().NextDouble()) * 
                    Math.Acos(new Random().NextDouble()));
            }
        }

Running this code on a system with four cores results in some activity on all four cores and a runtime of a little over 12 seconds:

image

But by changing one line of code and switching from a traditional, sequential for loop to the Parallel.For loop provided by the library, we can have a significant impact on the performance of the application:

            //for (int i = 0; i < 100; i++)
            System.Threading.Parallel.For(0, 100, delegate(int i)
                {
                    doWork(i);
                }
            );

The change is impressive – all four cores spike and the process finishes in 3.7 seconds instead of 12.

image

A look at what’s happening behind the scenes with the Visual Studio Threads Debug Window gives some insight into where the improvement comes from:

image

The Parallel Library has started seven threads to do the work.  Take note of the number of threads because it changes based on the number of cores available in the system: (cores * 2) - 1.  Well, actually it’s (cores * 2), but one of them is doing other work.  You, as a developer, won’t have to determine how many threads it’s appropriate to start on a given system; the library does it for you automatically (there is a method to gain more control of the threads, though).  The downside of this, of course, is that you’ll now have multiple threads to try to debug if there’s a problem.  Visual Studio 2010 includes some new debugging tools that will make this easier, which I’ll be examining in a future post.
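On that parenthetical about gaining more control: here's a small sketch of one way to cap how many iterations run at once. It assumes the released .Net 4.0 API, where ParallelOptions.MaxDegreeOfParallelism does this job; the CTP discussed here may expose it differently. ThrottledLoop and PeakConcurrency() are hypothetical names, and Thread.Sleep() stands in for doWork().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Runs 'iterations' work items with at most 'maxWorkers' running at once
    // and returns the highest number observed running at the same time.
    public static int PeakConcurrency(int iterations, int maxWorkers)
    {
        int running = 0;
        int peak = 0;
        object gate = new object();

        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = maxWorkers;

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref running);
            lock (gate)
            {
                if (now > peak) peak = now; // record the high-water mark
            }
            Thread.Sleep(10); // stand-in for doWork(i)
            Interlocked.Decrement(ref running);
        });

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency with a cap of 2: {0}", PeakConcurrency(20, 2));
    }
}
```

Most of the time the default is the right choice; a cap like this is mainly useful when the work competes with something else on the machine.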

There is some overhead associated with threading, so the individual tasks may take a little longer – in this case, they went from about 120 milliseconds in the single-threaded example to a wide range, some as high as 400 milliseconds, in the parallel one:

[Two screenshots: Single-threaded and Parallel]

This overhead should be part of your decision about when and what to parallelize in your application.  What’s more important for your application: that it finish faster or use less total CPU time?  This is something only you can answer with regard to your application and goals.

Another consideration is ordering of the work.  As you can see from the screenshots above, the single-threaded example performs the work sequentially.  Each time through the for loop does its work and completes before the next starts.  But when this is parallelized, the order of completion can’t be guaranteed.  As the work is off-loaded to seven worker threads (in this example), each of those threads could complete its work and get the next task at any time.  So parallelizing isn’t an option (or is a more complex option) when the order of execution matters to your application.
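When it's only the order of the results that matters, not the order of execution, one of the simpler options is to have each iteration write to its own slot in an array. The following is a minimal sketch of that idea, not something from the sample application; OrderedResults and Squares() are made-up names, and squaring the index stands in for real work.

```csharp
using System;
using System.Threading.Tasks;

public static class OrderedResults
{
    // Computes i * i for every index in parallel. Each iteration writes only
    // to its own slot, so no locking is needed and the array ends up in index
    // order no matter which iteration finishes first.
    public static int[] Squares(int count)
    {
        int[] results = new int[count];
        Parallel.For(0, count, i =>
        {
            results[i] = i * i;
        });
        return results;
    }

    public static void Main()
    {
        int[] squares = Squares(10);
        Console.WriteLine(String.Join(", ", squares));
    }
}
```

This doesn't help when one iteration depends on the result of a previous one; that's the case where parallelizing gets more complex.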

In parallel (pun intended) with this series on how to parallelize the code, I’ll also be writing a series of articles on the process of parallelizing – how to decide when, where and what the candidates for parallelizing are in an application.

Parallel.ForEach()

The Parallel.ForEach() method follows the same pattern:

            List<int> list = new List<int>();
            // populate list
            Parallel.ForEach(list, item =>
                {
                    // parallel work here
                }
            );

In this example I used a slightly different syntax for the body of the method, “=>” instead of “delegate”, but the end result is the same. 
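To make the "same end result" point concrete, here's a short sketch that runs the same Parallel.ForEach() body both ways. The ForEachSyntax class and its two methods are hypothetical names for this illustration. Because the body runs on several threads at once, the running total is updated with Interlocked.Add() rather than a plain +=.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachSyntax
{
    // Sums the list using the anonymous-method ("delegate") syntax.
    public static int SumWithDelegate(List<int> list)
    {
        int total = 0;
        Parallel.ForEach(list, delegate(int item)
        {
            Interlocked.Add(ref total, item); // thread-safe accumulation
        });
        return total;
    }

    // Sums the list using the lambda ("=>") syntax.
    public static int SumWithLambda(List<int> list)
    {
        int total = 0;
        Parallel.ForEach(list, item =>
        {
            Interlocked.Add(ref total, item); // thread-safe accumulation
        });
        return total;
    }

    public static void Main()
    {
        List<int> list = new List<int>();
        for (int i = 1; i <= 100; i++)
        {
            list.Add(i);
        }
        Console.WriteLine("delegate: {0}, lambda: {1}",
            SumWithDelegate(list), SumWithLambda(list));
    }
}
```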

Parallel.Invoke()

Parallel.For() and Parallel.ForEach() are useful if you have collections you need to act on or a known number of times you have to execute the same code, but what about when you have several different things (methods) you need to do?  For that, we have Parallel.Invoke().

Parallel.Invoke() simply executes, in parallel, a list of methods passed to it:

        static void Main(string[] args)
        {
            Parallel.Invoke(A, B, C, D, E);
            Console.ReadLine();
        }
        public static void A(){}
        public static void B(){}
        public static void C(){}
        public static void D(){}
        public static void E(){}

Methods A(), B(), C(), D() and E() will be assigned to worker threads as they become available and execute in parallel.  As with the other methods, there are no guarantees about the order of execution.
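Since the methods above are empty, here's a slightly fuller sketch that gives each one something to do, so you can see that Parallel.Invoke() doesn't return until all of them have finished. InvokeDemo, Step() and RunAll() are names I've invented for the example, and Thread.Sleep() is a placeholder for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InvokeDemo
{
    private static int _completed;

    // Placeholder for one of the independent methods (A, B, C, ...).
    private static void Step(string name)
    {
        Thread.Sleep(20); // stand-in for real work
        Interlocked.Increment(ref _completed);
        Console.WriteLine("{0} finished on thread {1}",
            name, Thread.CurrentThread.ManagedThreadId);
    }

    // Runs five independent actions and returns how many had completed
    // by the time Parallel.Invoke() returned.
    public static int RunAll()
    {
        _completed = 0;
        Parallel.Invoke(
            () => Step("A"),
            () => Step("B"),
            () => Step("C"),
            () => Step("D"),
            () => Step("E"));
        return _completed;
    }

    public static void Main()
    {
        Console.WriteLine("{0} of 5 methods completed", RunAll());
    }
}
```

The order of the "finished" lines can differ from run to run, which is the lack of ordering guarantee mentioned above.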

Conclusion

The Parallel Extensions in .Net 4.0 are going to give developers a much easier and more consistent way to parallelize their applications.  The library greatly insulates us from the complexity and inner plumbing of the parallel world, but we still have a responsibility to use it properly.  Hard questions and issues will remain around the architecture, design and implementation of parallelization in our applications, but Microsoft is aware of this and is busy providing us with more resources than just the tools in this library.  The Parallel Computing Developer Center has a wealth of whitepapers and presentations on these topics.

Sunday, January 18, 2009

Upcoming Parallel Programming Posts

Due to a critical project at work, it’s been a number of months since I’ve had time to blog, but in that time there’ve been some exciting announcements about Parallel Programming and Visual Studio 2010 / .Net 4.0.

Over the next few posts I’ll explore some of these, including:

Part I – The Task Parallel Library

Part II – Parallel LINQ

Part III – Debugging Multi-threaded Applications in Visual Studio 2010

I’m most excited about the third item, because the new parallel debugging tools in VS2010 appear to be the solution to a serious problem in multi-threaded programming: being able to debug multiple threads simultaneously. 

In addition, I’ll be developing a session on these topics to present at area Code Camps.  I’ll be posting the sample code and PowerPoint here when it’s complete.