As you know, I changed workplaces a couple of times during my career. Long story short, one interesting thing I noticed is that different companies use different models for multi-threaded programs (mostly in large embedded systems).
Let's say we have a multi-threaded program. The program should handle heterogeneous tasks. On a very basic level, our program probably receives some data, processes it and produces an output. The question of how to manage threads usually arises in performance-critical applications. Otherwise, it is quite simple: we squeeze as many tasks as possible into a single thread.
Actual data processing may take several steps. We may want to split it into several tasks, each handled by a thread of its own.
Thread per task model
When using this model, we allocate a single thread per task. Since we want to pass work from one thread to another, we need some synchronization mechanism.
I suppose we'll need some queues to pass things between threads. Also, we may need mutexes or semaphores to synchronize access to the queues and other shared objects.
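As a minimal sketch of what I mean (the names and structure are mine, not from any particular codebase), a blocking queue built on a pthread mutex and condition variable could look like this:

```c
#include <pthread.h>
#include <stdlib.h>

/* A minimal blocking queue for passing task objects between threads. */
struct node { void *task; struct node *next; };

struct task_queue {
    struct node *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
};

void queue_init(struct task_queue *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side: one thread finishes its step and hands the task on. */
void queue_push(struct task_queue *q, void *task) {
    struct node *n = malloc(sizeof *n);
    n->task = task;
    n->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: the next thread in the pipeline blocks until work arrives. */
void *queue_pop(struct task_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->head)
        pthread_cond_wait(&q->not_empty, &q->lock);
    struct node *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    void *task = n->task;
    free(n);
    return task;
}
```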
One obvious weakness of this approach is performance. When moving task objects from one thread to another, we often move them from CPU to CPU. Modern CPUs rely heavily on caching, so keeping objects local to one CPU brings a significant performance boost. By moving a task from one CPU to another, we hurt caching and cause performance degradation.
Another performance-related problem is that while a task object moves from one thread to another, it undergoes at least one context switch. Depending on the overall number of tasks, the time it takes to process a single task can grow considerably. Yet another cause of context switches is the mutual exclusion mechanism that has to protect the data structures used to pass objects between threads.
One last significant performance issue I would like to mention lies at the core of this approach. Each thread is bound by the processing power of a single core. But what if one of the tasks requires more processing power? The computer as a whole may have enough, especially given the recent multi-core trend. But since a single thread runs on only one core at a time, each thread processing a request is a potential bottleneck.
The advantage of this approach is its simplicity. You don't have to develop significant infrastructure to make it work. All you need is pthreads and a couple of classes that do the job; every tool you will use is available out of the box. And because the model is so common, it is easier to find skilled and experienced engineers to implement and work with it.
Worker threads
In this model, each task is split into multiple jobs, and each job is handled by a single thread. There is a number of worker threads, each taking care of at most one job at any particular moment. To improve performance, the system tries to handle jobs that belong to the same task on the same core. This means that there may be several threads per core.
Unlike the thread per task model, this model is much more difficult to implement. It imposes some constraints on the tasks:
- The engineer designing such a system should be able to split tasks into multiple jobs. Alternatively, tasks can be very short.
- Jobs should be more or less equal in size in terms of required processing resources.
In addition, this approach requires smart task management. Such a system should implement a scheduler that spreads tasks between cores. A serious problem arises when the number of running tasks becomes larger than the number of cores in the system. When that happens, the scheduler has to run two or more tasks on a single core. To do this wisely, it should be aware of how much processing power each task requires. It should track the number of tasks running on each core and assign new tasks to the least busy core, as in the sketch below.
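Here is a rough sketch of the "least busy core" bookkeeping such a scheduler might do. The load accounting is my own simplification, and pthread_setaffinity_np() is a Linux-specific GNU extension, not part of POSIX:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NCORES 4  /* assume a fixed core count for the sketch */

static int core_load[NCORES];  /* jobs currently assigned to each core */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pick the core with the fewest running jobs and account for the new one. */
static int pick_least_busy_core(void) {
    pthread_mutex_lock(&load_lock);
    int best = 0;
    for (int i = 1; i < NCORES; i++)
        if (core_load[i] < core_load[best])
            best = i;
    core_load[best]++;
    pthread_mutex_unlock(&load_lock);
    return best;
}

/* Pin a worker thread to the chosen core (Linux-specific call). */
static void bind_worker_to_core(pthread_t worker, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(worker, sizeof(set), &set);
}
```

A real scheduler would also decrement the counter when a job completes and weigh jobs by their expected processing cost, but the principle is the same.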
Needless to say, this approach is far more complex than the naive thread per task model. It requires more skilled engineers. Moreover, these engineers have to spend their time developing something that is not directly related to the matter of the project: infrastructure.
On the other hand, this approach allows engineers to squeeze every bit of performance out of the machine and take the hardware to its limit.
Other approaches
There are many other approaches, of course. Most of them, however, are attempts to merge the two models I described. For instance, one company started with the thread per task model for their embedded system and quickly ran into performance problems. So they tried to solve them by splitting the busiest tasks into multiple threads, basically switching one particular task from thread per task to the worker threads model.
Recently I ran into another model that is somewhat unique. It implements so-called user-mode threads. The entire program runs in a single operating-system thread. As in the worker threads model, tasks are split into smaller jobs. However, instead of letting the operating system schedule job execution, this model implements a user-mode scheduler.
The actual scheduling is done with setjmp() and longjmp() calls. One obvious constraint of this model is that if a thread goes to sleep, in operating-system terms, it puts the entire system to sleep. As a result, the system should have very tight control over what threads do and in particular how they sleep. That is, threads are not allowed to do regular I/O; instead, I/O should be done in a centralized way.
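To give a flavor of the mechanism, here is a heavily simplified sketch of a task yielding control back to the scheduler with longjmp(). A real implementation also gives each user-mode thread its own stack and saved context so it can be resumed later, which takes platform-specific tricks this sketch deliberately omits:

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf scheduler_ctx;   /* the scheduler's saved context */

/* A task gives up the CPU by jumping back into the scheduler. */
static void yield(void) {
    longjmp(scheduler_ctx, 1);
}

static void task_a(void) {
    puts("task A: did some work, yielding");
    yield();
    /* Resuming past this point requires a per-task jmp_buf and a
     * private stack, which this sketch does not set up. */
}

int main(void) {
    if (setjmp(scheduler_ctx) == 0)
        task_a();               /* run the task until it yields */
    puts("scheduler: back in control");
    return 0;
}
```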
For example, a thread that wishes to read from a file descriptor should register this file descriptor with the scheduler. The scheduler will select() or poll() on all registered file descriptors. Depending on which file descriptor requires attention, the scheduler will schedule the thread that takes care of it.
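A sketch of that central dispatch loop, assuming each registered descriptor carries a callback that resumes the owning user-mode thread (the registration table and all names are mine):

```c
#include <sys/select.h>
#include <stddef.h>

#define MAX_FDS 64

/* One entry per registered descriptor: when it becomes readable, the
 * scheduler "wakes" the owning user-mode thread via its callback. */
struct registration {
    int fd;
    void (*resume)(void *thread);  /* hypothetical resume hook */
    void *thread;
};

static struct registration regs[MAX_FDS];
static size_t nregs;

void scheduler_register(int fd, void (*resume)(void *), void *thread) {
    regs[nregs].fd = fd;
    regs[nregs].resume = resume;
    regs[nregs].thread = thread;
    nregs++;
}

/* The heart of the scheduler: block in select() on everything at once,
 * then schedule whichever threads' descriptors became ready. */
void scheduler_loop(void) {
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        int maxfd = -1;
        for (size_t i = 0; i < nregs; i++) {
            FD_SET(regs[i].fd, &readable);
            if (regs[i].fd > maxfd) maxfd = regs[i].fd;
        }
        if (select(maxfd + 1, &readable, NULL, NULL, NULL) <= 0)
            continue;   /* interrupted or error; just retry */
        for (size_t i = 0; i < nregs; i++)
            if (FD_ISSET(regs[i].fd, &readable))
                regs[i].resume(regs[i].thread);
    }
}
```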
In this model you can run only a single instance of the scheduler on a single processor, so the model is not entirely optimized for multi-processor machines. If you still want to run multiple schedulers on one computer, you can either run them in separate processes or in multiple threads of the same process. In both cases, communication between threads that belong to different schedulers is cumbersome. For instance, if you run multiple schedulers in multiple threads, you cannot use mutexes to synchronize between threads that belong to different schedulers, because blocking on such a mutex would put an entire scheduler to sleep. Instead you should use some sort of I/O based IPC.
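For example, two schedulers in the same process could notify each other through a pipe: the read end is just another descriptor registered with the receiving scheduler's select() loop, and the sender wakes it by writing a byte. This is my own illustration of "I/O based IPC", not a description of any specific system:

```c
#include <unistd.h>
#include <fcntl.h>

/* Cross-scheduler notification channel: scheduler B registers pipefd[0]
 * with its select() loop; scheduler A wakes it by writing one byte. */
int pipefd[2];

void channel_init(void) {
    pipe(pipefd);
    /* Keep the writer from ever sleeping in the kernel: if the pipe is
     * full, a wake-up is already pending anyway. */
    fcntl(pipefd[1], F_SETFL, O_NONBLOCK);
}

void notify_other_scheduler(void) {
    char wake = 1;
    write(pipefd[1], &wake, 1);
}
```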
Despite the number of disadvantages this approach has, there are a couple of good things about it. Synchronizing between threads that belong to the same instance of the scheduler is much easier, because at most one thread runs at any particular moment. So if, for instance, you have some simple data structure that is used by multiple threads, there is no need to protect it from simultaneous access.