I wasn't sure if here or SO was the right place to ask this, but here goes anyway.

I want to improve a system that is currently running. It has services and many standalone apps, but none of them are properly coordinated. I want to build a mini app to demonstrate the improvements we would gain if we built these things into a new "framework", if you will.

The idea is to create a dashboard application that manages and reports on a service running in the background. The service will run, let's say, every 3 minutes and call a class (e.g. FileProcessor) that in turn calls each type of processor's Run method. Each Run method launches a thread for a Runnable class. I then want to use this runnable class to hold the properties of the file it is or was processing: the cause of the error if one occurred, an option to rerun the method, and its current status (i.e. how far along in processing it is).

These "runnables" will sometimes interact directly with a file in a directory, and at other times represent a record in the database requesting that a file be processed.

Now having some background, I'd like to ask the following:

Imagine the direct interaction with the files in a directory (e.g. changing an extension) is done immediately in a "batch mode": let's say five files were found in C:\Program Files\My App\Files\*.xyz, and this process (directly after a directory search) renames them all to C:\Program Files\My App\Files\*.zyx one at a time, let's say with the following:

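// di is a DirectoryInfo for the directory being scanned
foreach (FileInfo fi in di.GetFiles("*.xyz", SearchOption.TopDirectoryOnly))
{
    // Path.ChangeExtension swaps the extension and keeps the rest of the path
    fi.MoveTo(Path.ChangeExtension(fi.FullName, ".zyx"));
}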

In contrast, each record retrieved from the database will be given to a separate thread to process. If, let's say, 10 files request processing, 10 threads of this Runnable type will be launched, each holding its own status information and so on. These threads will be added to a list that keeps track of the processes, so that their information can be passed to the application.
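To make the idea concrete, here is a minimal sketch of what such a status-holding job might look like. The names FileJob and JobStatus are hypothetical, and the work itself is passed in as a delegate standing in for the real processing. Status reads from the dashboard thread may need extra synchronization in a real system.

```csharp
using System;

public enum JobStatus { Pending, Running, Succeeded, Failed }

// Hypothetical job object: remembers its file, its status, and the last error.
public sealed class FileJob
{
    private readonly Action<string> _work; // stand-in for the real processing

    public string FileName { get; }
    public JobStatus Status { get; private set; } = JobStatus.Pending;
    public Exception Error { get; private set; }

    public FileJob(string fileName, Action<string> work)
    {
        FileName = fileName;
        _work = work;
    }

    // Calling Run again after a failure acts as the "rerun" option.
    public void Run()
    {
        Status = JobStatus.Running;
        Error = null;
        try
        {
            _work(FileName);
            Status = JobStatus.Succeeded;
        }
        catch (Exception ex)
        {
            Error = ex; // keep the cause so the dashboard can display it
            Status = JobStatus.Failed;
        }
    }
}
```

A list of these jobs could then be handed to whatever runs them (threads, a pool, or tasks), while the dashboard reads FileName, Status and Error from the same list.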

Renaming files is simple and quick, so I believe it belongs in a single "method call". The files that request processing, on the other hand, can be anything between 20 and 700 lines, where values are drawn from each line and inserted into a MySQL database. For this reason I want to allow all files to "start at the same time", so that, let's say, one or two 700 line files do not block or delay the other twenty 15 line files, which could all have been done by the time a single 700 line file is processed.

Now basically I'd like to know if someone with more knowledge and experience than me can say if this is a good idea or way of approaching the solution. Perhaps I'm missing something or my design could be off.

P.S. This service and application will run on a server that is quite capable.

Thank you in advance for any direction and/or advice.

1 Answer


Threads do have significant costs - VERY roughly - imagine 100K bytes per thread (they each need a stack for one thing), and they each place a slight burden on operating system components (e.g. the scheduler) which have to manage them all.

Threads DO present a very simple model for managing async tasks. I'm a big fan of that approach.

But if you are going to use a lot of threads, please consider using thread pools as a way to re-use the underlying thread objects (you can still have lots of runnables queued - they are just not all running at once).
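As a rough sketch of that idea (one of several ways to do it), Parallel.ForEach runs work items on pool threads, and ParallelOptions.MaxDegreeOfParallelism caps how many run at once. PooledProcessing and ProcessFile are hypothetical names, and the sleep only simulates parsing a file and inserting its rows.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PooledProcessing
{
    // Hypothetical stand-in for parsing one file and inserting its rows.
    public static int ProcessFile(int lineCount)
    {
        Thread.Sleep(lineCount / 10); // simulate work proportional to file size
        return lineCount;
    }

    // Processes every file on pool threads, at most maxWorkers at a time,
    // and returns the total number of lines handled.
    public static int ProcessAll(IEnumerable<int> lineCounts, int maxWorkers)
    {
        int totalLines = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.ForEach(lineCounts, options, lines =>
        {
            // Interlocked keeps the shared counter safe across threads.
            Interlocked.Add(ref totalLines, ProcessFile(lines));
        });

        return totalLines;
    }
}
```

With a cap of, say, 4 workers, a single 700 line file occupies one worker while the short files flow through the others, which is the behaviour the question is after without launching one thread per file.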

And - since you are using C#, async tasks (https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/) are a more efficient strategy to consider.
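A minimal sketch of the async variant, assuming the per-file work can be awaited (e.g. via an async database driver). AsyncProcessing and ProcessFileAsync are hypothetical names, Task.Delay stands in for the real I/O, and the SemaphoreSlim is just one way to limit how many files are in flight.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncProcessing
{
    // Hypothetical stand-in for awaiting the database inserts of one file.
    public static async Task<int> ProcessFileAsync(int lineCount)
    {
        await Task.Delay(lineCount / 10); // no thread is blocked while waiting
        return lineCount;
    }

    // Starts all files, but lets at most maxConcurrent of them run at once.
    public static async Task<int[]> ProcessAllAsync(
        IReadOnlyList<int> lineCounts, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            IEnumerable<Task<int>> tasks = lineCounts.Select(async lines =>
            {
                await gate.WaitAsync();
                try { return await ProcessFileAsync(lines); }
                finally { gate.Release(); }
            });

            // Results come back in the same order as the input list.
            return await Task.WhenAll(tasks);
        }
    }
}
```

Because waiting on I/O does not hold a thread here, many files can be pending at once without paying the per-thread cost described above.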

Often though - simplicity of implementation matters more than efficiency (up to a point). What you described with a thread-pool (to throttle actual thread count) may work fine.

Robert Harvey
Lewis Pringle