---
title: C++ Multithreading
---

Computers have to handle many different things at once, for example the screen, keyboard, drives, database, and internet.

These are best represented as communicating concurrent processes with channels, as in Go routines. Even algorithms that are not really handling many things at once, but are doing a single thing, such as everyone’s sample program, the sieve of Eratosthenes, are cleanly represented as communicating concurrent processes with channels.

[asynch await]:../client_server.html#the-equivalent-of-raii-in-event-oriented-code

They can also be represented, though not quite so cleanly, by [asynch await], which makes for much lighter weight code that interfaces more cleanly with C++.

Concurrency is not the same thing as parallelism.

A node.js program is typically thousands of communicating concurrent processes with absolutely no parallelism, in the sense that node.js is single threaded; but a node.js program typically has an enormous number of code continuations, each of which is in effect the state of a concurrent communicating process. Lightweight threads, as in Go, are threads that, on hitting a pause, get their stack state stashed into an event handler and executed by event oriented code, so one can always accomplish the same effect more efficiently by writing directly in event oriented code.

And it is frequently the case that when you cleverly implement many concurrent processes with more than one thread of execution, so that some of your many concurrent processes are executed in parallel, your program runs slower, rather than faster.

C++ multithreading is written around a way of coding that in practice does not seem all that useful: parallel bitbashing. The idea is that you are doing one thing, but dividing that one thing up between several threads to get more bits bashed per second, the archetypical example being a for loop performed in parallel, with all the threads joining after the loop is complete.

The normal case, however, is that you want to manage a thousand things at once, for example a thousand connections to the server. You are not worried about how many millions of floating point operations per second you get, but you are worried about processes sitting around doing nothing while waiting for network or disk operations to complete.

For this you need concurrent communicating processes, as in Go, or event orientation, as in node.js or nginx, not necessarily the parallelism that C++ threads are designed around.

The need to deal with many peers and a potentially enormous number of clients suggests multiprocessing in the style of Go and node.js, rather than what C++ multiprocessing is designed around: a very large number of processes that are concurrent but not all that parallel, rather than a small number of processes that are concurrent and also substantially parallel. Representing each process by a thread runs into trouble at around sixty four threads.

It is probably efficient to represent interactions between peers as threads, but client/peer interactions are going to need either events or Go lightweight threads, and client/client interactions are going to need events.

Existing operating systems run far more than sixty four threads, but this only works because those threads are grouped into processes, and most of those processes are inactive. If you have more than sixty four concurrently active threads in an active process, with the intent that half a dozen or so of them will actually be executing in parallel, as for example a browser with a thread for each tab and sixty four tabs, that active process is likely to be not very active.

Thus scaling Apache, whether as threads on Windows or as processes under Linux, is apt to die.

# Need the solutions implemented by Tokio, Actix, Node.js and Go

Not the solutions supplied by the C++ libraries, because we are worrying about servers, not massive bit bashing.

Go routines and channels can cleanly express both the kind of problem that node.js addresses and the kind of problem that C++ threads address, typically dividing a task into a dozen subtasks and then waiting for them all to complete before taking the next step, which is hard to express as node.js continuations. Goroutines are a more flexible and general solution that makes it easier to express a wider range of algorithms concisely and transparently, but I am not seeing any mass rush from node.js to Go. Most of the time, it is easy enough to write code continuations inside an event handler.

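The divide-into-subtasks-and-wait pattern that C++ threads are built around can be sketched as a fork-join: split the work into chunks, run each chunk on its own `std::thread`, and `join()` them all before combining. Summing integers and the name `parallel_sum` are illustrative assumptions; each thread writes only its own slot of `partial`, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Fork-join: cut the data into `parts` contiguous chunks, sum each chunk on
// its own thread, then join every thread before combining the partial sums.
long long parallel_sum(const std::vector<int>& data, std::size_t parts) {
    assert(parts > 0);
    std::vector<long long> partial(parts, 0);  // one slot per thread, never shared
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + parts - 1) / parts;
    const int* base = data.data();

    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t first = std::min(data.size(), p * chunk);
        const std::size_t last = std::min(data.size(), first + chunk);
        workers.emplace_back([base, &partial, p, first, last] {
            partial[p] = std::accumulate(base + first, base + last, 0LL);
        });
    }
    for (auto& t : workers) t.join();  // the "wait for them all" step
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `parallel_sum(v, 12)` splits `v` into a dozen chunks with one thread each, and the caller blocks until the slowest chunk has finished.
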
The general concurrent task that Google’s massively distributed database is intended to express is that you have a thousand tasks, each of which generates a thousand outputs, which get sorted, and each of the enormous number of items that sort into the same equivalence group gets aggregated in a commutative and associative operation, which can therefore be handled by any number of processes in any order, and possibly the entire sorted sequence gets aggregated in an associative operation, which can therefore be split across any number of processes, provided the partial results are combined in sort order.

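A single-process sketch of the shape of that computation: a map step emitting key/value pairs, a sort that groups equal keys, and a reduce step folding each group with an operation that is both commutative and associative. Word counting and the names `map_words`, `reduce_group` and `map_reduce` are illustrative assumptions; a real system would run each step across many machines.

```cpp
#include <cassert>
#include <map>
#include <numeric>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map: one input record emits (key, value) pairs; here each word emits (word, 1).
std::vector<std::pair<std::string, int>> map_words(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream in(line);
    for (std::string word; in >> word;) out.emplace_back(word, 1);
    return out;
}

// Reduce: fold one group. `+` is commutative and associative, so a large group
// could be split across workers and the partial sums combined in any order.
int reduce_group(const std::vector<int>& values) {
    assert(!values.empty());  // a group exists only if some record emitted its key
    return std::accumulate(values.begin(), values.end(), 0);
}

std::map<std::string, int> map_reduce(const std::vector<std::string>& lines) {
    // Shuffle/sort: std::map keeps keys sorted and gathers each key's values.
    std::map<std::string, std::vector<int>> groups;
    for (const auto& line : lines)
        for (const auto& [key, value] : map_words(line)) groups[key].push_back(value);

    std::map<std::string, int> result;
    for (const auto& [key, values] : groups) result[key] = reduce_group(values);
    return result;
}
```
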
The magic in the Google massively parallel database is that one can define a massively parallel operation on a large number of items in a database simultaneously, much as one defines a join in SQL, and one can define another massively parallel operation as commutative and/or associative operations on the sorted output of such a massively parallel operation. But we are not much interested in this capability, though something resembling it is going to be needed when we have to shard.

# doing node.js in C++

Dumb idea. We already have the node.js solution in a Rust library.

Actix and Tokio are the (somewhat Cish) solutions.

## Use Go

Throw up hands in despair, and provide an interface linking Go to secure Zooko ids, similar to the existing interface linking it to QUIC and SSL.

This solution has the substantial advantage that it would then be relatively easy to drop in the existing social networking software written in Go, such as Gitea.

We probably don’t want Go to start managing C++ spawned threads, but the Go documentation seems to claim that when a Go heavyweight thread gets stuck at a C mutex while executing C code, Go just spawns another heavyweight thread to deal with the lightweight threads when they start piling up.

When a C++ thread wants to despatch an event to Go, it calls a Go routine that uses a select with a default case, so that the Go routine will never attempt to pause the C++ spawned thread on the assumption that it is a Go spawned thread. But it would likely be safer to call goroutines on a thread that was originally spawned by Go.

## doing it in C the C way

Processes are represented as threads. Channels have a mutex. A thread grabs total exclusive ownership of a channel whenever it takes something out or puts something in. If the channel is empty or full, the thread then waits on a condition variable associated with the mutex, and when the other thread grabs the mutex and makes the channel ready, it notices that the other process or processes are waiting on the condition, that the condition is now fulfilled, and sends a `notify_one`.

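A minimal C++ sketch of that channel, assuming one mutex per channel and two `std::condition_variable`s, one for "not full" and one for "not empty". The class name `BoundedChannel` and the `demo_channel` function are illustrative, not an existing library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Bounded FIFO channel guarded by one mutex. Writers wait on not_full_,
// readers wait on not_empty_; whoever changes the state notifies the other side.
template <typename T>
class BoundedChannel {
public:
    explicit BoundedChannel(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void send(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(value));
        lock.unlock();            // release first, so the woken reader
        not_empty_.notify_one();  // does not immediately block on the mutex
    }

    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<T> items_;
    const std::size_t capacity_;
};

// Usage: one producer thread, the consumer on the calling thread, capacity 2.
long long demo_channel(int count) {
    BoundedChannel<int> channel(2);
    std::thread producer([&channel, count] {
        for (int i = 1; i <= count; ++i) channel.send(i);
    });
    long long total = 0;
    for (int i = 0; i < count; ++i) total += channel.receive();
    producer.join();
    return total;
}
```

Notifying after unlocking is a choice, not a requirement: notifying while still holding the mutex is also correct, but the woken thread may then immediately block on the mutex.
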
Or, when the channel is neither empty nor full, we use an atomic spin lock, and when sleeping might become necessary, we go to full mutex resolution.

This implies a whole pile of data global to all threads, which will have to be changed atomically.

This can be done by giving each thread two buffers for this global data subject to atomic operations, and a single pointer or index that points to the currently ruling global data set. (The mutex is also, of course, global, but the flag saying whether to use atomics or the mutex is located in a data structure managed by atomics.)

When a thread wants to atomically update a large object (which should be sixty four byte aligned), it constructs a copy of the current object, and atomically updates the pointer to point to the copy, provided the pointer was not changed while it was constructing the copy. The object is immutable while being pointed at.

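The copy, modify, and compare-and-swap-the-pointer update might be sketched as follows. This version leans on the `std::atomic_load` and `std::atomic_compare_exchange_weak` overloads for `std::shared_ptr`, which also take care of freeing an old object once no reader still holds it. Those overloads are deprecated in C++20 in favour of `std::atomic<std::shared_ptr<T>>`, and standard libraries typically implement them with an internal lock, so treat this as an illustration of the pattern rather than a lock-free implementation (the sixty four byte alignment is also left out). `Settings`, `add_peer`, and `demo_add_peers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Shared state that is never modified while it is published.
struct Settings {
    int version = 0;
    std::vector<std::string> peers;
};

// The single published pointer.
std::shared_ptr<const Settings> g_settings = std::make_shared<Settings>();

void add_peer(const std::string& peer) {
    std::shared_ptr<const Settings> expected = std::atomic_load(&g_settings);
    for (;;) {
        auto copy = std::make_shared<Settings>(*expected);  // private copy
        copy->version += 1;
        copy->peers.push_back(peer);
        std::shared_ptr<const Settings> desired = std::move(copy);
        // Publish only if nobody published since we loaded `expected`.
        // On failure `expected` now holds the current pointer: rebuild and retry.
        if (std::atomic_compare_exchange_weak(&g_settings, &expected, std::move(desired)))
            return;
    }
}

// Usage: several writers race; every update must survive.
std::size_t demo_add_peers(int threads, int per_thread) {
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) add_peer("peer");
        });
    for (auto& th : ts) th.join();
    auto snapshot = std::atomic_load(&g_settings);
    assert(snapshot->version == static_cast<int>(snapshot->peers.size()));
    return snapshot->peers.size();
}
```
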
Or we could have two such objects, with the thread spinning if one is in use and the other already grabbed, or momentarily sleeping if an atomic count indicates other threads are spinning on a switch awaiting completion.

The read thread, having read, stores its read pointer atomically with `memory_order_release`, ORed with the flag saying whether it is going to full mutex resolution. It then reads, with `memory_order_acquire`, the write pointer that the write thread atomically wrote with `memory_order_release`. If all is well, it keeps on reading; if it is blocked, or the write thread has gone to mutex resolution, it sets its mutex resolution flag and proceeds to mutex resolution. When it is coming out of mutex resolution, about to release the mutex, it clears its mutex resolution flag. The mutex is near the flags in memory, all part of one object that contains a mutex and atomic variables.

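So the mutex flag is atomically set when the mutex has not yet been acquired but the thread is unconditionally going to acquire it, and non-atomically cleared when the mutex still belongs to the thread but the thread is unconditionally going to release it. (In C++ terms the clear still has to be a store to an atomic variable, if only a plain `store` rather than a compare-exchange, because other threads read the flag without holding the mutex, and a non-atomic write racing with those reads is undefined behaviour.)
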
If many read threads are reading from one channel, then each thread has to load the read pointer with `memory_order_acquire`, and then, instead of storing it back with `memory_order_release`, has to do an `atomic_compare_exchange_weak_explicit`, and if the pointer changed while it was reading, abort its reads and start over.

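A sketch of the many-readers case under a simplifying assumption: the buffer is filled before any reader starts, so only the read index is contended. Each reader reads its slot speculatively, then tries to move the shared read index past it with `compare_exchange_weak`; if another reader got there first, the value is discarded and the reader retries with the index the failed exchange handed back. `drain_with_cas` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Several readers drain one pre-filled buffer (assumption: no concurrent writer).
long long drain_with_cas(const std::vector<int>& buffer, int readers) {
    assert(readers > 0);
    std::atomic<std::size_t> read_index{0};
    std::atomic<long long> total{0};
    std::vector<std::thread> threads;
    for (int r = 0; r < readers; ++r) {
        threads.emplace_back([&] {
            long long local = 0;
            std::size_t seen = read_index.load(std::memory_order_acquire);
            while (seen < buffer.size()) {
                const int value = buffer[seen];  // speculative read of slot `seen`
                if (read_index.compare_exchange_weak(seen, seen + 1,
                                                     std::memory_order_acq_rel,
                                                     std::memory_order_acquire)) {
                    local += value;              // commit: slot `seen` was ours
                    seen = read_index.load(std::memory_order_acquire);
                }
                // On failure `seen` already holds the current index: discard and retry.
            }
            total.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return total.load();
}
```
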
Similarly, if many write threads are writing to one channel, each write thread will first have to spin lock acquire the privilege of being the sole write thread writing, or spin lock acquire a range to write to. Thus in the most general case we have a spin locked atomic write state that specifies an area that has been written to, an area that is being written to, and an area that is available to be acquired for writing; a spin locked atomic read state; and a mutex that holds both the write state and the read state. In the case of a vector buffer with multiple writers, the atomic states are three wrapping atomic pointers that go through the buffer in the same direction.

We would like to use direct memory addresses, rather than vector or deque addresses, which might require us to write our own vector or deque. See the [thread safe deque](https://codereview.stackexchange.com/questions/238347/a-simple-thread-safe-deque-in-c "A simple thread-safe Deque in C++"), which however relies entirely on locks and mutexes, and whose extension to atomic locks is not obvious.

Suppose you are doing atomic operations, but some operations might be expensive and lengthy. You really only want to spin lock on amending data that is small and close together in memory, so that on your second spin the lock has likely been released.

Well, if you might need to sleep a thread, you need a regular mutex, but how are you going to interface spin locks and regular mutexes?

You could cleverly do it with notifies, but I suspect that is costly compared to just using a plain old vanilla mutex. Instead, you have some data protected by atomic locks and some data protected by regular old mutexes, and any time the data protected by the regular old mutex might change, you atomically flag a change coming up. Every thread then grabs the mutex in order to amend or even look at the data, until, on coming out of the mutex with the data, it sees that the flag saying the mutex protected data might change is now clear.

After one has flagged the change coming up and grabbed the mutex, what happens if another thread is cheerfully amending the data in a fast operation, having started before you grabbed the mutex? The other thread has to be able to back out of that and then try again, this try likely to be with mutex resolution. But what if the other thread wants to write into a great big vector, and reallocations of the vector are mutex protected? And we want atomic operations so that not everyone has to grab the mutex every time.

Well, any time you want to do something to the vector, it fits or it does not. And if it does not fit, then it is mutex time. You want all threads to switch to mutex resolution before any thread actually goes to work reallocating the vector. So you are going to have to use the costly notify pattern: “I am out of space, so I am going to sleep until I can use the mutex to amend the vector. Wake me up when the last thread using atomics has stopped using atomics that directly reference memory, and has switched to reading the mutex protected data, so that I can change the mutex protected data.”

The std::vector documentation says that vector access is just as efficient as array access, but I am a little puzzled by this claim, as a vector can be moved, and the documentation specifically asks that the element type have a no-throw move operation for optimization, and having no copy is standard where the element type contains things that might have ownership. (Which leads to complications when one has containers of containers, since C++ is apt to helpfully generate a broken copy implementation.)

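Which would suggest that vector access is through indirection, and indirection combined with threading creates problems. It is: a `std::vector` holds a pointer to a heap buffer, and any reallocation (for example a `push_back` beyond the current capacity) moves the elements to a new buffer and invalidates every pointer, reference, and iterator into the old one, which is exactly the hazard for threads holding direct memory addresses.
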
## lightweight threads in C

A lightweight thread is just a thread where, whenever the lightweight thread needs to be paused by its heavyweight thread, the heavyweight thread stores the current stack state in the heap and moves on to deal with other lightweight threads that need to be taken care of. This collection of preserved lightweight thread stack states amounts to a pile of event handlers that are awaiting events and, having received events, are then waiting for a heavyweight thread to process them.

Thus one winds up with what I suspect is the Tokio solution: a stack that is a tree, rather than a stack.

Hence the equivalence between node.js and nginx event oriented programming, and Go concurrent programming.

# costs

Windows 10 does not refuse to create more than sixty four threads (sixty four is the most handles a single `WaitForMultipleObjects` call can wait on, and the most logical processors in one processor group), but if you create many more threads than that, performance is apt to bite, with arbitrary and artificial thread blocking. Hence goroutines, which implement unofficial threads inside the official threads.

Thread creation and destruction is fast, five to twenty microseconds, so thread pools do not buy you much, except that your memory is already going to be cached. Another source says forty microseconds on Windows, and fifty kilobytes per thread. So a gigabyte of RAM could have twenty thousand threads hanging around. Except that the Windows thread scheduler dies on its ass.

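Those figures are easy to check on your own machine. The following rough sketch times creating and joining an empty `std::thread` in a loop; the function name is hypothetical, it measures creation plus join rather than creation alone, and the result varies widely with operating system, hardware, and load, so no particular number is implied.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Average wall-clock cost, in microseconds, of creating an empty thread and
// joining it, over `iterations` runs.
double average_thread_create_join_us(int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::thread t([] {});  // thread body does nothing
        t.join();
    }
    const std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```
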
There is a reasonable discussion of thread costs [here](https://news.ycombinator.com/item?id=22456642).

The general message is that lots of languages have done it better, often immensely better, Go among them.

Checking the C++ threading libraries, they all single-mindedly focus on the particular goal of parallelizing computationally intensive work, which is not in fact terribly useful for anything you are interested in doing.

# Atomics

```C++
typedef enum memory_order {
    memory_order_relaxed,  // relaxed
    memory_order_consume,  // consume
    /* No one, least of all compiler writers, understands what
       "consume" does.
       It has consequences which are difficult to understand or predict,
       and which are apt to be inconsistent between architectures,
       libraries, and compilers. */
    memory_order_acquire,  // acquire
    memory_order_release,  // release
    memory_order_acq_rel,  // acquire/release
    memory_order_seq_cst   // sequentially consistent
    /* "sequentially consistent" interacts with the more commonly
       used acquire and release in ways difficult to understand or
       predict, and in ways that compiler and library writers
       disagree on. */
} memory_order;
```

I don’t think I understand how to use atomics correctly.

`atomic_compare_exchange_weak_explicit` inside a while loop is a spin lock, and spin locks are complicated, apt to be inefficient, potentially catastrophic, and avoiding catastrophe is subtle and complex.

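A minimal spin lock built the way the paragraph describes, a compare-exchange in a loop, here with `std::atomic<bool>::compare_exchange_weak`. It polls with a plain load between attempts (test-and-test-and-set) and yields the CPU while polling, but it never puts the waiter to sleep on the lock, so if the holder is descheduled every waiter keeps polling. `SpinLock` and `demo_spin_counter` are hypothetical names; this is a sketch, not a recommendation over `std::mutex`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        for (;;) {
            bool expected = false;
            // Acquire on success: writes made by the previous holder become visible.
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed))
                return;
            // Poll with a load so waiters do not keep writing the cache line.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();
        }
    }
    // Release: publishes everything written inside the critical section.
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Usage: several threads increment a plain int under the spin lock.
int demo_spin_counter(int threads, int increments) {
    assert(threads > 0);
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t)
        ts.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& th : ts) th.join();
    return counter;
}
```
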
To cleanly express a concurrent algorithm you need a thousand communicating processes, as goroutines or node.js continuations, nearly all of which are sitting around waiting for another thing to send them a message or be ready to receive their message, while atomics give you a fixed small number of threads all barreling full speed ahead. Whereupon you find yourself using spin locks.

Rather than moving data between threads, you need to move threads between data, between one continuation and the next.

Well, if you have a process that interacts with SQLite, each thread has to have its own database connection, in which case it needs to be a pool of threads. Maybe you have a pool of database threads that do work received from a bunch of asynch tasks through a single fixed sized FIFO queue, and send the results back through another FIFO queue, with threads waking up when the queue gets more stuff in it and going to sleep when the queue empties, the last thread signalling “wake me up when there is something to do”, and pushback happening when the buffer is full.

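A sketch of that database pool, assuming a hypothetical `Connection` type that stands in for a real per-thread database handle (it is not the SQLite API). Each worker owns its connection, sleeps on a condition variable while the job queue is empty, and posts answers to a result list. For brevity the job queue is unbounded, so the pushback-when-full behaviour is not shown; it would need a second condition variable for submitters to wait on.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical per-thread connection: never shared between threads.
struct Connection {
    int id;
    int query(int x) const { return x * x; }  // pretend query
};

class DbPool {
public:
    explicit DbPool(int workers) {
        assert(workers > 0);
        for (int i = 0; i < workers; ++i)
            threads_.emplace_back([this, i] { run(Connection{i}); });
    }
    ~DbPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_ready_.notify_all();  // wake every idle worker so it can drain and exit
        for (auto& t : threads_) t.join();
    }
    void submit(int job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(job);
        }
        work_ready_.notify_one();
    }
    // Blocks until at least `count` results have arrived, then returns a copy.
    std::vector<int> wait_for_results(std::size_t count) {
        std::unique_lock<std::mutex> lock(mutex_);
        result_ready_.wait(lock, [&] { return results_.size() >= count; });
        return results_;
    }

private:
    void run(Connection conn) {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            // "Wake me up when there is something to do."
            work_ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stopping and fully drained
            const int job = jobs_.front();
            jobs_.pop();
            lock.unlock();
            const int answer = conn.query(job);  // slow work done without the lock
            lock.lock();
            results_.push_back(answer);
            lock.unlock();
            result_ready_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable work_ready_, result_ready_;
    std::queue<int> jobs_;
    std::vector<int> results_;
    bool stopping_ = false;
    std::vector<std::thread> threads_;
};
```
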
Go demonstrates that you can cleanly express algorithms as concurrent communicating processes using fixed size channels. An unbuffered channel is just a coprocess, with a single thread of execution switching between the two coprocesses, without any need for locks or atomics, but with a need for stack fixups. But Node.js seems to get by fine with code continuations instead of Go’s stack fixups.

A buffered channel is just a fixed size block of memory with alignment, size, and atomic wrapping read and write pointers.

Why do they need to be atomic?

So that the read thread can acquire the write pointer to see how much data is available, and release the read pointer so that the write thread can acquire the read pointer to see how much space is available; conversely, the write thread acquires the read pointer and releases the write pointer. And when the write thread updates the write pointer, it updates it *after* writing the data, and does a release on the write pointer atomic, so that when the read thread does an acquire on the write pointer, all the data that the write pointer says was written will actually be there in the memory that the read thread is looking at.

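A minimal single-producer single-consumer ring buffer following that description: the writer publishes its write position with a release store after writing the data, and the reader acquires it before reading; the reader publishes its read position the same way so the writer knows which slots are free. The class name, positions that simply count up (wrap-around after `SIZE_MAX` operations is ignored), and the 64-byte separation of the two counters are assumptions of this sketch. With exactly one writer and one reader, no compare-exchange is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : slots_(capacity) { assert(capacity > 0); }

    bool try_push(const T& value) {  // writer thread only
        const std::size_t w = write_pos_.load(std::memory_order_relaxed);  // our own
        const std::size_t r = read_pos_.load(std::memory_order_acquire);   // theirs
        if (w - r == slots_.size()) return false;           // full
        slots_[w % slots_.size()] = value;                   // write the data first...
        write_pos_.store(w + 1, std::memory_order_release);  // ...then publish it
        return true;
    }

    bool try_pop(T& out) {  // reader thread only
        const std::size_t r = read_pos_.load(std::memory_order_relaxed);
        const std::size_t w = write_pos_.load(std::memory_order_acquire);
        if (r == w) return false;                            // empty
        out = slots_[r % slots_.size()];
        read_pos_.store(r + 1, std::memory_order_release);   // hand the slot back
        return true;
    }

private:
    std::vector<T> slots_;
    alignas(64) std::atomic<std::size_t> write_pos_{0};  // separate cache lines
    alignas(64) std::atomic<std::size_t> read_pos_{0};
};

// Usage: the writer yields while full, the reader yields while empty.
long long demo_spsc(int count) {
    SpscRing<int> ring(8);
    std::thread writer([&] {
        for (int i = 1; i <= count; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    long long total = 0;
    for (int i = 0; i < count; ++i) {
        int v;
        while (!ring.try_pop(v)) std::this_thread::yield();
        total += v;
    }
    writer.join();
    return total;
}
```
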
Multiple routines can send data into a single channel, and, with select, a single routine can receive data from any of several channels.

But with Go style programming you are apt to have far more routines than actual hardware threads servicing them, so you are still going to need to sleep your threads, making atomic channels an optimization of limited value.

Your input buffer is empty. If you have one thread handling the one process for that input stream, you are going to have to sleep it. But this is costly. Better to have continuations that get executed when data is available in the channel, which means your channels are all piping to one thread that then calls the appropriate code continuation. So how is one thread going to do a select on a thousand channels?

Well, we have a channel full of channels that need to be serviced. And when that channel empties, mutex.

Trouble is, I have not figured out how to have a thread wait on multiple channels. The C++ condition variable `wait` does not implement a select. Well, it does, but you need a condition predicate that looks over all the possible wake conditions, and it looks like all those wake conditions have to be protected by a single mutex, on which there is likely to be a lot of contention.

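Here is roughly what the single-mutex version looks like: several queues share one `std::mutex` and one `std::condition_variable`, and the waiter's predicate scans every queue, so a send on any of them can wake it. `SelectableQueues` and `demo_select` are hypothetical names. The scan always prefers the lowest-numbered queue, so a busy queue can starve the others, and every sender contends on the same mutex, which is the contention problem described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

class SelectableQueues {
public:
    explicit SelectableQueues(std::size_t n) : queues_(n) { assert(n > 0); }

    void send(std::size_t channel, int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queues_.at(channel).push_back(value);
        }
        ready_.notify_one();
    }

    // Blocks until some queue is non-empty; returns (channel index, value).
    std::pair<std::size_t, int> select() {
        std::unique_lock<std::mutex> lock(mutex_);
        std::size_t hit = 0;
        // The predicate runs with the mutex held and checks every wake condition.
        ready_.wait(lock, [&] {
            for (std::size_t i = 0; i < queues_.size(); ++i)
                if (!queues_[i].empty()) { hit = i; return true; }
            return false;
        });
        const int value = queues_[hit].front();
        queues_[hit].pop_front();
        return {hit, value};
    }

private:
    std::mutex mutex_;                    // one lock for every queue
    std::condition_variable ready_;
    std::vector<std::deque<int>> queues_;
};

// Usage: three sender threads, one selecting receiver on the calling thread.
int demo_select(int per_channel) {
    SelectableQueues q(3);
    std::vector<std::thread> senders;
    for (std::size_t c = 0; c < 3; ++c)
        senders.emplace_back([&q, c, per_channel] {
            for (int i = 0; i < per_channel; ++i) q.send(c, 1);
        });
    int received = 0;
    for (int i = 0; i < 3 * per_channel; ++i) received += q.select().second;
    for (auto& t : senders) t.join();
    return received;
}
```
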
It seems that every thread grabs the lock, modifies the data protected by the lock, performs waits on potentially many condition variables all using the same lock, condition variables that look at conditions protected by that lock, and then releases the lock immediately after firing the notify.

But if we try to avoid unnecessarily grabbing the mutex, it could happen that one thread sees the other thread awake just as it is going to sleep (the classic lost wake-up), so I fear I have missed a spin lock somewhere in this story.

If we want to avoid unnecessary resort to the mutex, we have to spin lock on a state machine that governs entry into mutex resolution. Each thread makes its decision based on the current state of the channel and the state machine, and does an `atomic_compare_exchange_weak_explicit` to amend the state of the state machine. If the state machine has not changed, the decision goes through. If the state machine was changed, presumably by the other thread, the thread re-evaluates its decision and tries again.

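The decide-then-commit loop might look like this. As an assumption for illustration, the channel's item count and the reader's "going to mutex resolution" flag are packed into one 32-bit word so that a single `compare_exchange_weak` covers both; the mutex and condition variable that a sleeping reader would actually wait on are left out. All names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Low 31 bits: buffered item count. Top bit: reader has gone to mutex resolution.
constexpr std::uint32_t kReaderSleeping = 1u << 31;

enum class ReaderDecision { TookItem, MustSleep };

// Reader: decide from a snapshot, then try to commit the decision. If the
// writer changed the state meanwhile, the CAS fails, `seen` holds the new
// state, and the decision is re-evaluated from scratch.
ReaderDecision reader_step(std::atomic<std::uint32_t>& state) {
    std::uint32_t seen = state.load(std::memory_order_acquire);
    for (;;) {
        const std::uint32_t items = seen & ~kReaderSleeping;
        const ReaderDecision decision =
            items > 0 ? ReaderDecision::TookItem : ReaderDecision::MustSleep;
        const std::uint32_t next = decision == ReaderDecision::TookItem
                                       ? seen - 1                 // consume one item
                                       : seen | kReaderSleeping;  // announce sleep
        if (state.compare_exchange_weak(seen, next, std::memory_order_acq_rel,
                                        std::memory_order_acquire))
            return decision;
    }
}

// Writer: add one item and clear the flag. Returns true if the reader had
// announced it was going to sleep, in which case the writer must wake it.
bool writer_step(std::atomic<std::uint32_t>& state) {
    std::uint32_t seen = state.load(std::memory_order_acquire);
    for (;;) {
        assert((seen & ~kReaderSleeping) + 1 < kReaderSleeping);  // count must not reach the flag bit
        const std::uint32_t next = (seen + 1) & ~kReaderSleeping;
        if (state.compare_exchange_weak(seen, next, std::memory_order_acq_rel,
                                        std::memory_order_acquire))
            return (seen & kReaderSleeping) != 0;  // `seen` is the pre-update state
    }
}
```
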
Condition variables are designed to support the case where one thread, or a potentially vast pool of threads, is waiting for work, but are not really designed to address the case where one thread is waiting for work from a potentially vast pool of threads, and I rather think I will have to handcraft a handler for this case from atomics and, ugh, dangerous spin loops implemented in atomics.

A zero capacity Go channel sort of corresponds to a C++ binary semaphore (`std::binary_semaphore`). A small finite Go channel sort of corresponds to a small finite C++ counting semaphore (`std::counting_semaphore`). Maybe the solution is semaphores rather than atomic variables. But I am just not seeing a match.

I notice that notifications seem to be built out of a critical section, with far too much grabbing and releasing of a mutex. Under the hood, this is likely a too-clever and complicated use of threads piling up on the same critical section. So maybe we need some atomic spin state machine system that drops spinning threads to wait on a semaphore. Each thread on a channel drops the most recent state of the channel after reading, and the most recent state after writing, onto an atomic variable.

But the most general case is many to many, with many processes doing a select on many channels. We want a thread to sleep if all the channels on which it is doing a select are blocked on the operation it wants to do, and we want processes waiting on a channel to keep being woken up, one at a time, as long as the channel has stuff that processes are waiting on.

# C++ Multithreading

`std::async` is designed to support the case where threads spawn more threads if there is more work to do, the pool of threads is not too large, and threads terminate when they are out of work, or do the work sequentially if doing it in parallel seems unlikely to yield benefits. By default, C++ manages that decision for you.

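A sketch of the divide-and-conquer use that `std::async` is aimed at. With the default launch policy (`std::launch::async | std::launch::deferred`), each call may start a new thread, or be deferred and run on the calling thread when `get()` is called; passing `std::launch::async` explicitly forces a new thread. The 10000-element cutoff and the name `async_sum` are arbitrary assumptions.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Sum [first, last). Large ranges are split: the left half is handed to
// std::async (new thread or deferred, implementation's choice), the right
// half is summed on the current thread, and get() waits for the left half.
long long async_sum(const int* first, const int* last) {
    assert(first <= last);
    const auto length = last - first;
    if (length < 10000) return std::accumulate(first, last, 0LL);  // small: stay sequential
    const int* middle = first + length / 2;
    std::future<long long> left = std::async(async_sum, first, middle);
    const long long right = async_sum(middle, last);
    return left.get() + right;
}
```
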
Maybe the solution is to use threads where we need stack state, and continuations serviced by a single thread where we expect to handle one and only one reply. Node.js gets by fine on one thread and one database connection.

```C++
#include <thread>
#include <utility>

#if defined(__STDCPP_THREADS__)
static_assert(__STDCPP_THREADS__ == 1, "Needs threads");
#endif

// As thread resources have to be managed, they need to be wrapped in RAII.
class ThreadRAII {
    std::thread m_thread;  // owned, not referenced, so the wrapper controls its lifetime
public:
    // A thread object is movable but not copyable, so the std::thread is
    // constructed inside the invocation of the ThreadRAII constructor and
    // moved in.
    explicit ThreadRAII(std::thread&& threadObj) : m_thread(std::move(threadObj)) {}
    ~ThreadRAII() {
        // Join rather than detach: a detached thread still running when
        // main() returns is killed mid-work.
        if (m_thread.joinable()) {
            m_thread.join();
        }
    }
    ThreadRAII(const ThreadRAII&) = delete;
    ThreadRAII& operator=(const ThreadRAII&) = delete;
};
```

Examples of thread construction:

```C++
void foo(const char*) {  // string literals are const char*, not char*
    // …
}

class foo_functor {
public:
    void operator()(const char*) {
        // …
    }
};

int main() {
    ThreadRAII thread_one(std::thread(foo, "one"));
    ThreadRAII thread_two(
        std::thread(
            (foo_functor()),
            "two"
        )
    );
    const char three[]{"three"};
    ThreadRAII thread_lambda(
        std::thread(
            [three]() {  // the array is captured by copy
                // …
            }
        )
    );
}  // the ThreadRAII destructors join all three threads here
```

C++ has a bunch of threading facilities that are designed for the case where a normal procedural program forks a bunch of tasks to do stuff in parallel, and then, when they are all done, merges the results with join or promise and future, and then the main program does its thing.

This is not so useful when the main program is event oriented, rather than procedural.

If the main program is event oriented, then each thread has to stick around for the duration, and has to have its own event queue, which C++ does not directly provide.

In this case threads communicate by posting events, and primitives that do thread synchronization (promise, future, join) are not terribly useful.

A thread grabs its event queue, using the mutex, pops out the next event, releases the mutex, and does its thing.

If the event queue is empty, then, without releasing the mutex, the thread processing events waits on a [condition variable](https://thispointer.com//c11-multithreading-part-7-condition-variables-explained/) (the wait releases the mutex). When another thread grabs the event queue mutex and stuffs something into the event queue, it fires the [condition variable](https://thispointer.com//c11-multithreading-part-7-condition-variables-explained/), which wakes the thread that will process the event queue, and that thread reacquires the mutex before its wait returns.

Mutexes need to be locked through RAII lock objects, one of which, `std::unique_lock`, we will pass to the condition variable when we wait on it.

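A minimal sketch of such an event-oriented thread, assuming events are plain `std::function<void()>` callbacks and that posting an empty function asks the loop to stop once earlier events have run. `std::lock_guard` and `std::unique_lock` are the RAII lock objects here; the `unique_lock` is what `wait()` takes, because `wait()` has to release the mutex while asleep and reacquire it before returning. `EventLoopThread` is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// A long-lived thread with its own event queue. Other threads post events;
// the loop thread pops them one at a time and runs them outside the lock.
class EventLoopThread {
public:
    EventLoopThread() : worker_([this] { run(); }) {}

    ~EventLoopThread() {
        post(nullptr);  // empty event: stop after the events already queued
        worker_.join();
    }

    void post(std::function<void()> event) {
        {
            std::lock_guard<std::mutex> lock(mutex_);  // unlocks at end of scope
            events_.push_back(std::move(event));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            // Releases the mutex while asleep, reacquires it before returning.
            wake_.wait(lock, [this] { return !events_.empty(); });
            std::function<void()> event = std::move(events_.front());
            events_.pop_front();
            lock.unlock();        // run the event without holding the queue
            if (!event) return;   // stop sentinel
            event();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> events_;
    std::thread worker_;  // declared last: the members above exist before run() starts
};
```

For example, posting a hundred increments of a counter and then letting the `EventLoopThread` go out of scope runs all hundred on the loop thread before the destructor's `join()` returns.
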