Multithreading can drastically speed up your application, but no speedup is free: handling parallel threads requires careful programming, and without the right precautions you can run into race conditions, deadlocks, and even crashes.
What makes Multithreading hard?
Unless you tell your program otherwise, all your code runs on the “main thread”. From the entry point of your application, it runs through and executes all your functions one after the other. This limits performance, since you can only do so much if you have to process everything one at a time. Most modern processors have six or more cores with 12 or more threads, so there is performance left on the table if you do not use them.
However, it is not as simple as just “turning on multithreading.” Only specific things (like loops) can be properly multithreaded, and there are many considerations to take into account when doing so.
The first and most important issue is race conditions. These often occur during write operations, when one thread modifies a resource that is shared by multiple threads. The output of the program then depends on which thread finishes or modifies something first, which can lead to random and unexpected behavior.
These can be very, very simple. For example, you may need to keep a running count across the iterations of a loop. The most obvious way to do this is to create a variable and increment it, but this is not thread-safe.
This race condition arises because it is not just “adding one to the variable” in an abstract sense; the CPU loads the value of number into a register, adds one to that value, and then stores the result as the new value of the variable. It does not know that, in the meantime, another thread tried to do exactly the same thing and loaded a soon-to-be-stale value of number. The two threads conflict, and at the end of a loop that runs 100 times, number may not equal 100.
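The lost-update problem described above can be sketched as follows. This is a minimal illustration, not a listing from the original article: the class name RaceDemo and the method UnsafeCount are hypothetical, and a larger iteration count than 100 is used only because it makes lost updates more likely to show up.

```csharp
using System;
using System.Threading.Tasks;

public static class RaceDemo
{
    // Increments a shared counter from many threads WITHOUT any synchronization.
    public static int UnsafeCount(int iterations)
    {
        int number = 0;
        Parallel.For(0, iterations, _ =>
        {
            // Not atomic: load, add one, store. Two threads can load the same
            // old value, and one of the two increments is then lost.
            number++;
        });
        return number;
    }

    public static void Main()
    {
        // The result can be lower than 100000 and can differ between runs.
        Console.WriteLine(UnsafeCount(100_000));
    }
}
```

The result can never exceed the iteration count, but nothing guarantees that it reaches it.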
.NET provides a feature to handle this: the lock keyword. This does not prevent changes outright, but it helps manage concurrency by letting only one thread at a time acquire the lock. If another thread tries to enter a lock statement while a different thread holds the lock, it blocks until the lock is released.
You can only lock on reference types, so a common pattern is to create a lock object beforehand and lock on that as a stand-in for the value type.
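A minimal sketch of that pattern, assuming the same running counter as before. The names LockDemo, NumberLock, and CountWithLock are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class LockDemo
{
    // A dedicated reference-type object to lock on; an int cannot be locked directly.
    private static readonly object NumberLock = new object();

    public static int CountWithLock(int iterations)
    {
        int number = 0;
        Parallel.For(0, iterations, _ =>
        {
            // Only one thread at a time can be inside this block.
            lock (NumberLock)
            {
                number++;
            }
        });
        return number;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithLock(100)); // prints 100
    }
}
```

Note that the whole loop body sits inside the lock here, so the iterations effectively run one at a time.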
However, you may notice that there is now another problem: lock contention. Locking the entire loop body, as in the counter example, is a worst-case scenario that behaves almost exactly like a regular for loop (actually a little slower, since the extra threads and locks add overhead). Each thread tries to acquire the lock, but only one at a time can hold it, so only one thread at a time can actually run the code inside the lock. When that is the entire loop body, the lock statement removes all the benefit of threading and only slows things down.
Generally, you want to lock only as needed, when you need to write. However, you also want to keep concurrency in mind when choosing what to lock, as reads are not always thread-safe either. If another thread is writing to an object, reading it from a different thread may give an incorrect value, or cause a condition to evaluate incorrectly.
Fortunately, there are a few tricks for doing this properly, so you can balance the speed of multithreading with the locks needed to avoid race conditions.
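One way to read "lock only as needed" is to do the expensive part of each iteration outside the lock and hold the lock only for the shared write. The sketch below assumes a hypothetical Square function as a stand-in for real per-item work; NarrowLockDemo and ResultsLock are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class NarrowLockDemo
{
    private static readonly object ResultsLock = new object();

    // Hypothetical stand-in for expensive work that touches no shared state.
    private static long Square(int n) => (long)n * n;

    public static List<long> SquareAll(int count)
    {
        var results = new List<long>();
        Parallel.For(0, count, i =>
        {
            long value = Square(i);   // runs in parallel, no lock required

            lock (ResultsLock)        // held only for the shared write
            {
                results.Add(value);
            }
        });
        return results;               // order of elements is not guaranteed
    }

    public static void Main()
    {
        Console.WriteLine(SquareAll(1000).Count); // prints 1000
    }
}
```

The shorter the locked region is relative to the unlocked work, the less time threads spend waiting on each other.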
Use Interlocked for Atomic Operations
For basic operations, using the lock statement can be overkill. While it is very useful for locking before complex modifications, it is too much overhead for something as simple as adding to or replacing a value.
Interlocked is a class that wraps certain memory operations such as addition, exchange, and comparison. The underlying methods are implemented at the CPU level, are guaranteed to be atomic, and are much faster than the standard lock statement. You want to use them whenever possible, but they do not completely replace locking.
In the counter example above, replacing the lock with a call to Interlocked.Add() speeds up the operation a lot. Although this simple example is still not faster than a plain single-threaded loop, Interlocked is useful as part of a larger operation and is still a speedup over locking.
There are also Increment and Decrement for the ++ and -- operations, which save you a solid two keystrokes. They essentially wrap Add(ref count, 1) (or -1) under the hood, so there is no particular speedup from using them.
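A minimal sketch of the same counter using Interlocked instead of lock. The class and method names are hypothetical; Interlocked.Add, Interlocked.Increment, and Interlocked.Decrement are the real .NET methods.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedDemo
{
    // Counts with an atomic add instead of a lock statement.
    public static int CountWithAdd(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, _ =>
        {
            Interlocked.Add(ref count, 1);
        });
        return count;
    }

    // Increment twice and decrement once per iteration: net effect is +1.
    public static int CountWithIncrementDecrement(int iterations)
    {
        int count = 0;
        Parallel.For(0, iterations, _ =>
        {
            Interlocked.Increment(ref count);
            Interlocked.Increment(ref count);
            Interlocked.Decrement(ref count);
        });
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithAdd(100));                // prints 100
        Console.WriteLine(CountWithIncrementDecrement(100)); // prints 100
    }
}
```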
You can also use Exchange, a generic method that sets a variable to the value passed to it and returns the old value. Be careful with this, though: if you set the variable to a value that you computed from its original value, the operation is not thread-safe, since the old value could have been modified before Interlocked.Exchange runs.
CompareExchange compares a variable against an expected value and replaces the variable only if the two are equal.
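The two methods can be sketched as follows. TryClaim uses Exchange to set a flag and learn its previous value in one step. UpdateMax shows a common retry loop built on CompareExchange, which is one way to apply an update that depends on the old value safely. The names CasDemo, TryClaim, UpdateMax, and MaxOf are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CasDemo
{
    // Sets flag to 1 and reports whether this caller was the first to do so.
    // Exchange returns the value that was stored before the swap.
    public static bool TryClaim(ref int flag)
    {
        return Interlocked.Exchange(ref flag, 1) == 0;
    }

    // Atomically raises target to candidate if candidate is larger.
    public static void UpdateMax(ref int target, int candidate)
    {
        int seen;
        do
        {
            seen = Volatile.Read(ref target);
            if (candidate <= seen) return; // nothing to update
            // Store candidate only if target still equals the value we read;
            // otherwise another thread changed it, so loop and try again.
        }
        while (Interlocked.CompareExchange(ref target, candidate, seen) != seen);
    }

    public static int MaxOf(int[] values)
    {
        int max = int.MinValue;
        Parallel.ForEach(values, v => UpdateMax(ref max, v));
        return max;
    }

    public static void Main()
    {
        Console.WriteLine(MaxOf(new[] { 3, 9, 2, 7 })); // prints 9
    }
}
```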
Use thread-safe collections
The standard collections in System.Collections.Generic can be used with multithreading, but they are not entirely thread-safe. Microsoft provides thread-safe implementations of certain collections in System.Collections.Concurrent. These include ConcurrentBag, an unordered generic collection, and ConcurrentDictionary, a thread-safe dictionary. There are also concurrent queues and stacks, and OrderablePartitioner, which can split orderable data sources such as lists into separate partitions for each thread.
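As a rough illustration of two of these collections, the sketch below counts words with a ConcurrentDictionary and collects word lengths in a ConcurrentBag from a parallel loop, with no explicit locks. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentDemo
{
    // Counts occurrences of each word from multiple threads.
    public static ConcurrentDictionary<string, int> CountWords(string[] words)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            // Adds the key with value 1, or atomically updates the existing value.
            counts.AddOrUpdate(word, 1, (_, old) => old + 1);
        });
        return counts;
    }

    // Collects word lengths into an unordered, thread-safe bag.
    public static ConcurrentBag<int> CollectLengths(string[] words)
    {
        var lengths = new ConcurrentBag<int>();
        Parallel.ForEach(words, word =>
        {
            lengths.Add(word.Length);
        });
        return lengths;
    }

    public static void Main()
    {
        var words = new[] { "a", "b", "a", "c", "a" };
        Console.WriteLine(CountWords(words)["a"]);     // prints 3
        Console.WriteLine(CollectLengths(words).Count); // prints 5
    }
}
```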
Look at parallelizing loops
Often, the easiest place to multithread is in large, expensive loops. If you can run multiple iterations in parallel, you can get a huge speedup in overall running time.
The best way to handle this is with System.Threading.Tasks.Parallel. This class provides replacements for for and foreach loops that run the loop bodies on separate threads. It is simple to use, though it requires slightly different syntax.
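A minimal sketch of the syntax, assuming a hypothetical DoSomething method as the per-item work. Here DoSomething touches shared state only through an atomic Interlocked call, so it is safe to run from several threads at once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoopDemo
{
    private static int _processed;

    // Hypothetical per-item work; the only shared state is updated atomically.
    private static void DoSomething(string item)
    {
        Interlocked.Increment(ref _processed);
    }

    public static int Run(IEnumerable<string> items)
    {
        _processed = 0;

        // Sequential equivalent:
        //   foreach (var item in items) { DoSomething(item); }
        Parallel.ForEach(items, item =>
        {
            DoSomething(item);
        });

        return _processed;
    }

    public static void Main()
    {
        Console.WriteLine(Run(new[] { "x", "y", "z" })); // prints 3
    }
}
```

The loop body becomes a lambda, and Parallel.ForEach returns only after every iteration has finished.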
Of course, the catch is that the work done in the loop body, say a DoSomething() call, must be thread-safe and must not interfere with shared variables. It is not always as easy as just replacing the loop with a parallel loop, and in many cases you will need to lock shared objects to make changes.
To alleviate some of the problems with lock contention, Parallel.ForEach provides additional features for managing state. Basically, not every iteration runs on its own thread: if you have 1000 elements, it will not create 1000 threads; it creates roughly as many threads as your CPU can handle and runs multiple iterations per thread. This means that if you are computing a total, you do not need to lock on every iteration. You can simply pass around a subtotal variable and, at the very end, lock the shared object and make your changes once. This drastically reduces overhead on very large lists.
Let’s take a look at an example: code that takes a large list of objects, serializes each one separately to JSON, and ends up with a single List of strings containing all the objects. JSON serialization is a relatively slow process, so spreading the elements across multiple threads is a big speedup.
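A minimal sketch of such a loop follows. It assumes System.Text.Json as the serializer and a hypothetical Item type; the names SerializeDemo, SerializeAll, finalList, and finalLock are illustrative as well. The overload used is the Parallel.ForEach variant that takes a local-init function, a loop body, and a local-finally action.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

public static class SerializeDemo
{
    // Hypothetical item type, used only for this illustration.
    public sealed class Item
    {
        public int Id { get; set; }
        public string Name { get; set; } = "";
    }

    public static List<string> SerializeAll(IEnumerable<Item> items)
    {
        var finalList = new List<string>();
        var finalLock = new object();

        Parallel.ForEach(
            items,
            // Creates the per-thread subtotal: an empty list.
            () => new List<string>(),
            // Loop body: serialize one item and add it to this thread's subtotal.
            (item, loopState, subtotal) =>
            {
                subtotal.Add(JsonSerializer.Serialize(item));
                return subtotal; // becomes the subtotal for the next iteration
            },
            // Runs when a thread has finished its iterations:
            // merge its subtotal into the shared list under a lock.
            subtotal =>
            {
                lock (finalLock)
                {
                    finalList.AddRange(subtotal);
                }
            });

        return finalList; // order of elements is not guaranteed
    }

    public static void Main()
    {
        var items = new List<Item>();
        for (int i = 0; i < 1000; i++) items.Add(new Item { Id = i, Name = "item" + i });
        Console.WriteLine(SerializeAll(items).Count); // prints 1000
    }
}
```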
Parallel.ForEach takes several arguments in this form, and there is a lot to unpack:
- The first argument takes an IEnumerable, which defines the data to loop over. This is a ForEach loop, but the same concept works for basic For loops.
- The first action initializes the local subtotal variable. This variable is shared across each iteration of the loop, but only within the same thread. Other threads have their own subtotals. Here we initialize it to an empty list. If you were computing a numeric sum, you could initialize it to 0 instead.
- The second action is the main loop body. Its first argument is the current element (or the index in a For loop), the second is a ParallelLoopState object that you can use to call .Break(), and the last is the subtotal variable.
- In this loop body, you can operate on the element and modify the subtotal. The value you return replaces the subtotal for the next iteration. In this case, we serialize the element to a string and then add the string to the subtotal, which is a list.
- Finally, the last action takes the subtotal result after a thread has finished its iterations, so you can lock and modify a shared resource based on the final subtotal. This action runs once per thread, at the end, but it still runs on a separate thread, so you must lock or use Interlocked methods to modify shared resources. Here we call AddRange() to add the subtotal list to the final list.
One last note: if you use the Unity game engine, you want to be careful with multithreading. You cannot call any Unity APIs from a thread other than the main thread, or the game will crash. It is possible to use multithreading sparingly by doing API operations on the main thread and switching back and forth whenever you need to parallelize something.
For the most part, this applies to operations that interact with the scene or the physics engine. Vector3 math is not affected, and you can use it from a separate thread without any problems. You are also free to modify the fields and properties of your own objects, provided they do not call any Unity operations under the hood.