Task Scheduling Improvements
I took some time to see how I could improve the sample implementation of the task scheduling that I posted yesterday.
You can check out the code here. What I wanted to check is how well the solution scales to many tasks and to IO bound tasks.
In a stress test that I made, the library held its own while scheduling 9,864,100 tasks; at one point we had ~3,500,000 concurrently queued tasks. Memory usage hovered in the ~500MB range.
The next stage was to see what happens when we have tasks that take a long time to execute, depend on IO, etc.
I wrote this piece of code:
public class GatherAllFiles : AbstractTask
{
    private readonly string root;

    public GatherAllFiles(string root)
    {
        this.root = root;
    }

    protected override IEnumerable<Condition> Execute()
    {
        List<string> results = new List<string>();
        results.Add(root);

        List<Future> futures = new List<Future>();
        foreach (string directory in Directory.GetDirectories(root))
        {
            Future spawn = Spawn(new GatherAllFiles(directory));
            futures.Add(spawn);
        }

        string[] files = Directory.GetFiles(root);

        yield return Done(futures);

        foreach (Future future in futures)
        {
            results.AddRange(future.GetValue<IEnumerable<string>>());
        }
        results.AddRange(files);
        SetResult(results);
    }
}
I then ran it on my OSS directory. This directory contains 133,108 directories and 206,298 files (ask me how I know... ).
The library just ate it up without even noticing. Very nice, even if I say so myself :-)
Comments
Using Lambda Expressions, I would simplify:
class AbstractTask:
public WaitForFuture Done(IEnumerable<Future> items)
{
    return new WaitForFuture(() => items.All(item => item.HasValue));
}
Task scheduling is a topic that has been researched a lot. In general, round-robin with a priority system is the approach that gets you the best results on average.
Your test case is a bit weird. The disk is a resource that can serve only one task at a time. This means that even if you create a fancy scheduling system, it doesn't matter: everything is done sequentially anyway. What's worse, the slowest thing in your system is the head in the hard disk: every time it steps, you lose a big chunk of performance.
The way to get optimal performance from a disk with a set of tasks is to execute all of them in sequential order, without any of them interfering with one another. The thing is, though, that modern disks are able to 'accept' more than one command at a time and switch between them, during atomic block reads for example. This causes a lot of stepping, which results in a degradation of performance.
So to illustrate/test task scheduling, I'd pick a topic that isn't limited by a sequential resource like a disk ;)
Frans,
Actually, the FS is not limited by the HD; there is quite a bit of caching and read-ahead going on.
In addition to that, we aren't executing everything at the same time. This is the beauty of this approach: we are only executing N tasks at a time, where N == count(CPU).
On a decent server, I would expect multiple CPUs and a RAID configuration, so a file system (FS) example is not totally out of the question.
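To make the "N == count(CPU)" idea concrete, here is a minimal sketch of a scheduler that never runs more than Environment.ProcessorCount tasks at once. The BoundedScheduler name and its API are my own illustration for this discussion, not the actual types from the library:

using System;
using System.Collections.Generic;
using System.Threading;

public class BoundedScheduler
{
    private readonly Queue<Action> pending = new Queue<Action>();
    private readonly object sync = new object();
    private readonly int maxConcurrency = Environment.ProcessorCount;
    private int running; // number of tasks currently executing

    public void Schedule(Action task)
    {
        lock (sync)
        {
            pending.Enqueue(task);
        }
        Pump();
    }

    // Start queued tasks, but only while we are under the CPU-count limit.
    private void Pump()
    {
        Action next = null;
        lock (sync)
        {
            if (running < maxConcurrency && pending.Count > 0)
            {
                next = pending.Dequeue();
                running++;
            }
        }
        if (next == null)
            return;

        ThreadPool.QueueUserWorkItem(delegate
        {
            try
            {
                next();
            }
            finally
            {
                lock (sync) { running--; }
                Pump(); // a slot freed up; start the next queued task, if any
            }
        });
    }
}

With this shape, queuing millions of tasks is cheap (they are just entries in a queue); only count(CPU) of them are ever actually running.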
How would you handle the scheduling part? For example, run one task every minute and another task every day at 05:00.
Thanks
Marco
Marco,
That is something that I usually wouldn't do using this approach.
Nevertheless:
public class PrintTimeTask : AbstractTask
{
    protected override IEnumerable<Condition> Execute()
    {
        while (true)
        {
            Console.WriteLine(DateTime.Now);
            DateTime nextSchedule = DateTime.Now.AddMinutes(1);
            // yield until the scheduled time arrives
            yield return delegate { return DateTime.Now >= nextSchedule; };
        }
    }
}
It works of course, but I can imagine it would get really tiresome having to write code in this stilted form all of the time.
It's not clear to me why you don't just compose all of the futures together more directly using lambdas to encapsulate dependent computations instead of all of this tedious "yield return" nonsense.
Jeff,
Can you show me the code of doing this with Lambda?
Probably something like
class AbstractTask
{
    public Condition Where(Condition condition)
    {
        return condition;
    }
}
And then
while (true)
{
    Console.WriteLine(DateTime.Now);
    yield return Where(() => DateTime.Now >= nextSchedule);
}
Francois,
I am using .NET 2.0 to write the code.
This approach is the same thing, just with lambda syntax.
The idea with lambdas is that, instead of making up a scheduling loop with yield return (which only works within the context of a single method), we make the tasks chain themselves around an asynchronous callback.
So basically, we have Task objects floating around. To start running some Task, we create a new Task object containing a delegate for the first block of code we want to run. We let the Task bubble up to the scheduler.
When the delegate eventually runs, it can begin a new Task which gets chained onto the previous one.
This idea works particularly well if we think of Tasks as spawning asynchronous operations. That is, we might run a little code in a delegate that starts the asynchronous operation. When the asynchronous operation finishes, we finally mark the Task finished and set its result (or an exception).
When the Task is marked done, it sends out a notification that enables other chained Tasks to start up using whatever result was previously computed.
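A minimal sketch of that chaining idea; the ChainedTask name and API here are my own illustration, not a real library. Each task collects continuation delegates, and setting its result fires whatever was chained onto it:

using System;
using System.Collections.Generic;

public class ChainedTask<T>
{
    private readonly List<Action<T>> continuations = new List<Action<T>>();
    private bool done;
    private T result;

    // Chain a follow-up computation onto this task.
    public ChainedTask<TNext> Then<TNext>(Func<T, TNext> next)
    {
        var chained = new ChainedTask<TNext>();
        Action<T> continuation = value => chained.SetResult(next(value));
        if (done)
            continuation(result);            // already finished: run immediately
        else
            continuations.Add(continuation); // otherwise run when SetResult fires
        return chained;
    }

    // Called when the (possibly asynchronous) operation finishes.
    public void SetResult(T value)
    {
        result = value;
        done = true;
        foreach (Action<T> continuation in continuations)
            continuation(value);
    }
}

Usage reads top to bottom even though everything runs via callbacks:

var first = new ChainedTask<int>();
first.Then(x => x * 2)
     .Then(x => { Console.WriteLine(x); return x; });
first.SetResult(21); // prints 42

Error propagation (an exception instead of a result) would hang off the same notification mechanism.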
I did something very much like this in ActionScript 3 because I had to manage a large number of asynchronous operations all sharing the only thread available. So everything ends up having to post back to the event loop. Basically the event loop is functioning as a kind of scheduler.
Here's the code for the ActionScript 3 stuff.
https://svn.castleproject.org/svn/castlecontrib/flexbridge/trunk/src/Castle.FlexBridge.Flash/src/castle/flexbridge/common/AsyncTask.as
Unfortunately most of the cool code using AsyncTask is in other proprietary projects that have yet to be OSS'd. However, typical usage looks like:
Now in .NET we have a lot more power available, so we will probably want to use a somewhat different abstraction. Moreover, with the C# 3.0 syntax, we can cook up a more fluent interface for creating and chaining these asynchronous tasks together.
And we can rearrange the control flow a bit to put some kind of TaskScheduler in charge of the scheduling, so that each new Task to run gets posted back to the scheduler instead of running immediately, if desired.
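One way that "post back to the scheduler" shape could look; again a hypothetical sketch, not the actual FlexBridge or .NET API. Continuations are enqueued rather than run inline, and a simple loop drains them, which is exactly the role the event loop plays in the single-threaded ActionScript case:

using System;
using System.Collections.Generic;

public class PostingScheduler
{
    private readonly Queue<Action> queue = new Queue<Action>();

    // Continuations call Post instead of executing immediately.
    public void Post(Action work)
    {
        queue.Enqueue(work);
    }

    // A simple event loop: drain tasks until none remain.
    // Tasks posted while running are picked up in the same pass.
    public void Run()
    {
        while (queue.Count > 0)
        {
            Action work = queue.Dequeue();
            work();
        }
    }
}

Because nothing runs re-entrantly, each task sees a consistent world when it starts, and the scheduler gets a chance to apply priorities or fairness between any two tasks.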
Lots of possibilities in this design space. So don't stop with "yield return" loops...
Jeff,
what you are talking about is a message passing architecture.
I have tried that with Retlang, and it works very nicely. For some things.
For others, this approach is far preferable; it is much easier to write code this way than to explicitly handle the state transfer issues.
No. It's not strictly message passing... it's quite different from passing data around using message ports.
What we're really doing is chaining Futures to create composite asynchronous operations.
I don't see the difference, can you elaborate?