Task Scheduling Improvements

time to read 3 min | 420 words

I took some time to see how I could improve the sample task scheduling implementation that I posted yesterday.

You can check out the code here. What I wanted to test was whether the solution could scale to a large number of tasks and to IO bound tasks.

In a stress test that I ran, the library held its own while scheduling 9,864,100 tasks; at one point we had roughly 3,500,000 tasks queued concurrently. Memory usage hovered in the ~500MB range.
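
To give an idea of what such a stress test looks like, here is a rough sketch. The Scheduler type and its Schedule and WaitForIdle methods are placeholder names for illustration only, not necessarily the library's actual entry points:

// Rough sketch of a stress-test harness. Scheduler, Schedule and
// WaitForIdle are placeholder names for illustration; they are not
// necessarily the library's actual entry points.
public class NoOpTask : AbstractTask
{
    protected override IEnumerable<Condition> Execute()
    {
        // Completes immediately; it exists only to exercise the scheduler.
        yield break;
    }
}

public static class StressTest
{
    public static void Run(int count)
    {
        var scheduler = new Scheduler();
        for (int i = 0; i < count; i++)
        {
            // Queue a large number of cheap tasks to measure scheduling
            // overhead, queue depth and memory usage under load.
            scheduler.Schedule(new NoOpTask());
        }
        scheduler.WaitForIdle();
    }
}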

The next stage was to see what happens when we have tasks that take a long time to execute, depend on IO, and so on.

I wrote this piece of code:

using System.Collections.Generic;
using System.IO;

public class GatherAllFiles : AbstractTask
{
    private readonly string root;

    public GatherAllFiles(string root)
    {
        this.root = root;
    }

    protected override IEnumerable<Condition> Execute()
    {
        List<string> results = new List<string>();
        results.Add(root);

        // Spawn a child task for each subdirectory so they can run concurrently.
        List<Future> futures = new List<Future>();
        foreach (string directory in Directory.GetDirectories(root))
        {
            Future spawn = Spawn(new GatherAllFiles(directory));
            futures.Add(spawn);
        }
        string[] files = Directory.GetFiles(root);

        // Yield control until all the spawned child tasks have completed.
        yield return Done(futures);

        // Aggregate the children's results with this directory's own files.
        foreach (Future future in futures)
        {
            results.AddRange(future.GetValue<IEnumerable<string>>());
        }
        results.AddRange(files);
        SetResult(results);
    }
}
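
Kicking it off would look something along these lines. Again, Scheduler and Schedule are placeholder names here, and GetValue is assumed to block until the result is ready:

// Hypothetical usage; Scheduler and Schedule are placeholder names,
// and the path is just an example.
var scheduler = new Scheduler();
Future rootTask = scheduler.Schedule(new GatherAllFiles(@"C:\OSS"));

// Assuming GetValue blocks until the task (and everything it spawned)
// has completed, this returns the flattened list of directories and files.
IEnumerable<string> allPaths = rootTask.GetValue<IEnumerable<string>>();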

I then ran it on my OSS directory. This directory contains 133,108 directories and 206,298 files (ask me how I know...).

The library just ate it up without even noticing. Very nice, even if I say so myself :-)