On the other hand, also, not quite so cleanly, represented by [asynch await] which makes for much lighter weight code, more cleanly interfaceable with C++.
Concurrency is not the same thing as parallelism.
A node.js program is typically thousands of communicating concurrent
processes, with absolutely no parallelism, in the sense that node.js is single
threaded, but a node.js program typically has an enormous number of code
continuations, each of which is in effect the state of a concurrent
communicating process. Lightweight threads as in Go are threads that on
hitting a pause get their stack state stashed into an event handler and
executed by event oriented code, so one can always accomplish the same
effect more efficiently by writing directly in event oriented code.
And it is frequently the case that when you cleverly implement many
concurrent processes with more than one thread of execution, so that some
of your many concurrent processes are executed in parallel, your program
runs slower, rather than faster.
C++ multithreading is written around a way of coding that in practice does
not seem all that useful – parallel bitbashing. The idea is that you are
doing one thing, but dividing that one thing up between several threads to get
more bits bashed per second, the archetypical example being a for loop
performed in parallel, and then all the threads join after the loop is
complete.
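
A minimal sketch of that archetype, using only the standard library. `parallel_sum` is a hypothetical helper invented for illustration, not something the C++ libraries supply: it divides one summation between several threads, and every thread is joined before the partial results are combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: one job (a sum) split between nthreads threads.
// Assumes nthreads >= 1.
long long parallel_sum(const std::vector<int>& data, unsigned nthreads) {
    std::vector<long long> partial(nthreads, 0);  // one slot per thread, no sharing
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread bashes its own contiguous slice of the loop range.
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();  // all threads join after the loop
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Nothing here communicates while running: the threads only meet at the join, which is exactly why this style says little about managing a thousand connections.
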
The normal case however is that you want to manage a thousand things at
once, for example a thousand connections to the server. You are not
worried about how many millions of floating point operations per second,
but you are worried about processes sitting around doing nothing while
waiting for network or disk operations to complete.
For this, you need concurrent communicating processes, as in Go, or event
orientation, as in node.js or nginx, not necessarily parallelism,
which C++ threads are designed around.
The need to deal with many peers and a potentially enormous number of
clients suggests multiprocessing in the style of Go and node.js, rather than
what C++ multiprocessing is designed around. It suggests a very large
number of processes that are concurrent, but not all that parallel, rather
than a small number of processes that are concurrent and also substantially
parallel. Representing a process by a thread runs into troubles at around
sixty four threads.
It is probably efficient to represent interactions between peers as threads,
but client/peer are going to need either events or Go lightweight threads,
and client/client interactions are going to need events.
Existing operating systems run far more than sixty four threads, but this
only works because the threads are grouped into processes, and most of those
processes are inactive. If you have more than sixty four concurrently active threads in an
active process, with the intent that half a dozen or so of those active
concurrent threads will be actually executing in parallel, as for example a
browser with a thread for each tab, and sixty four tabs, that active process
is likely to be not very active.
Thus scaling Apache, whether as threads on Windows or processes under
Linux, is apt to die.
# Need the solutions implemented by Tokio, Actix, Node.js and Go
Not the solutions supplied by the C++ libraries, because we are worrying
about servers, not massive bit bashing.
Goroutines and channels can cleanly express both the kind of problems
that node.js addresses, and also address the kind of problem that C++
threads address, typically that you divide a task into a dozen subtasks, and
then wait for them all to complete before you take the next step, which is
hard to express as node.js continuations. Goroutines are a more flexible
and general solution, that make it easier to express a wider range of
algorithms concisely and transparently, but I am not seeing any mass rush
from node.js to Go. Most of the time, it is easy enough to write in code
continuations inside an event handler.
The general concurrent task that Google’s massively distributed database
is intended to express is that you have a thousand tasks each of which
generates a thousand outputs, which get sorted, and each of the enormous
number of items that sort into the same equivalence group gets aggregated
in a commutative operation, which can therefore be handled by any
number of processes in any order, and possibly the entire sort sequence
gets aggregated in an associative operation, which can therefore be
handled by any number of processes in any order.
The magic in the Google massively parallel database is that one can define
a massively parallel operation on a large number of items in a database
simultaneously, much as one defines a join in SQL, and one can define
another massively parallel operation as commutative and/or associative
operations on the sorted output of such a massively parallel operation. But
we are not much interested in this capability. Though something
resembling that is going to be needed when we have to shard.
# doing node.js in C++
Dumb idea. We already have the node.js solution in a Rust library.
Throw up hands in despair, and provide an interface linking Go to secure
Zooko ids, similar to the existing interface linking it to Quic and SSL.
This solution has the substantial advantage that it would then be relatively
easy to drop in the existing social networking software written in Go, such
as Gitea.
We probably don’t want Go to start managing C++ spawned threads, but
the Go documentation seems to claim that when a Go heavyweight thread
gets stuck at a C mutex while executing C code, Go just spawns another to
deal with the lightweight threads when the lightweight threads start piling
up.
When a C++ thread wants to despatch an event to Go, it calls a Go routine
with a select and a default, so that the Go routine will never attempt to
pause the C++ spawned thread on the assumption that it is a Go spawned
thread. But it would likely be safer to call Goroutines on a thread that was
originally spawned by Go.
## doing it in C the C way
Processes represented as threads. Channels have a mutex. A thread grabs
total exclusive ownership of a channel whenever it takes something out or
puts something in. If a channel is empty or full, it then waits on a
condition on the mutex, and when the other thread grabs the mutex and
makes the channel ready, it notices that another process or processes are
waiting on the condition, that the condition is now fulfilled, and sends a
`notify_one`.
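
A minimal sketch of such a channel, assuming a bounded buffer and standard C++ primitives only. The names `Channel`, `put`, `take` and `channel_demo` are made up for illustration. One simplification against the description above: it calls `notify_one` unconditionally rather than first checking whether anyone is waiting.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical bounded channel: one mutex, two conditions on that mutex.
template <typename T>
class Channel {
    std::mutex m_;
    std::condition_variable not_empty_, not_full_;
    std::deque<T> q_;
    std::size_t cap_;
public:
    explicit Channel(std::size_t cap) : cap_(cap) {}

    void put(T v) {
        std::unique_lock<std::mutex> lk(m_);  // exclusive ownership of the channel
        not_full_.wait(lk, [&] { return q_.size() < cap_; });  // sleeps while full
        q_.push_back(std::move(v));
        lk.unlock();
        not_empty_.notify_one();  // the channel is now ready for a taker
    }

    T take() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [&] { return !q_.empty(); });  // sleeps while empty
        T v = std::move(q_.front());
        q_.pop_front();
        lk.unlock();
        not_full_.notify_one();  // the channel now has room for a putter
        return v;
    }
};

// Hypothetical demo: one producer thread, the caller as consumer, capacity four.
long channel_demo() {
    Channel<int> ch(4);
    std::thread producer([&] { for (int i = 1; i <= 100; ++i) ch.put(i); });
    long sum = 0;
    for (int i = 0; i < 100; ++i) sum += ch.take();
    producer.join();
    return sum;
}
```
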
Or, when the channel is neither empty nor full, we have an atomic spin lock,
and when sleeping might become necessary, then we go to full mutex resolution.
Which implies a whole pile of data global to all threads, which will have
to be atomically changed.
This can be done by giving each thread two buffers for this global data
subject to atomic operations, and a single pointer or index that points to the
currently ruling global data set. (The mutex is also of course global, but
the flag saying whether to use atomics or mutex is located in a data
structure managed by atomics.)
When a thread wants to atomically update a large object (which should be
sixty four byte aligned) it constructs a copy of the current object, and
atomically updates the pointer to the copy, if the pointer was not changed
while it was constructing. The object is immutable while being pointed at.
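
The copy-then-swap update can be sketched as follows. `State`, `add_one` and `cas_demo` are hypothetical names, and the reclamation scheme is an assumption made to keep the sketch safe: replaced objects are parked on a per-thread list and freed only after every thread has finished, standing in for whatever real reclamation (hazard pointers, epochs) would be used.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical large object, sixty four byte aligned, immutable once published.
struct alignas(64) State {
    long counter = 0;
    long payload[6] = {};
};

void add_one(std::atomic<const State*>& current,
             std::vector<const State*>& retired) {
    const State* seen = current.load(std::memory_order_acquire);
    for (;;) {
        State* copy = new State(*seen);  // private copy, invisible to other threads
        copy->counter += 1;
        // Publish only if the pointer did not change while we were constructing.
        if (current.compare_exchange_weak(seen, copy,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
            retired.push_back(seen);  // other threads may still be reading it
            return;
        }
        delete copy;  // lost the race; `seen` now holds the newer pointer, so retry
    }
}

// Hypothetical demo: four threads each apply a thousand updates.
long cas_demo() {
    std::atomic<const State*> current{new State{}};
    std::vector<std::vector<const State*>> retired(4);
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&, t] {
            for (int i = 0; i < 1000; ++i) add_one(current, retired[t]);
        });
    for (auto& th : threads) th.join();
    const State* last = current.load(std::memory_order_acquire);
    const long result = last->counter;
    for (auto& list : retired)
        for (const State* p : list) delete p;  // safe now: nobody is reading
    delete last;
    return result;
}
```

Because no address is reused while any thread is running, a successful compare-exchange really does mean the pointer was not changed in the meantime.
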
Or we could have two such objects, with the thread spinning if one is in
use and the other already grabbed, or momentarily sleeping if an atomic
count indicates other threads are spinning on a switch awaiting
completion.
The read thread, having read, stores its read pointer atomically with
`memory_order_release`, ored with the flag saying if it is going to full
mutex resolution. It then reads the write pointer with
`memory_order_acquire`, that the write thread atomically wrote with
`memory_order_release`, and if all is well, keeps on reading, and if it is
blocked, or the write thread has gone to mutex resolution, sets its mutex
resolution flag and proceeds to mutex resolution. When it is coming out of
mutex resolution, about to release the mutex, it clears its mutex resolution
flag. The mutex is near the flags by memory location, all part of one object
that contains a mutex and atomic variables.
So the mutex flag is atomically set when the mutex has not yet been
acquired, but the thread is unconditionally going to acquire it, and non
atomically cleared when the mutex still belongs to the thread, but the thread is
unconditionally going to release it.
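
A minimal sketch of the atomic half of the single reader, single writer scheme described above. It deliberately leaves out the mutex resolution flag and the mutex itself: where the text would fall back to the mutex, `try_push` and `try_pop` simply return `false`. `SpscRing` and `spsc_demo` are hypothetical names, and the power of two capacity is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "capacity assumed to be a power of two");
    T buf_[N];
    std::atomic<std::size_t> read_{0};   // stored only by the read thread
    std::atomic<std::size_t> write_{0};  // stored only by the write thread
public:
    bool try_push(const T& v) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        if (w - r == N) return false;  // full: this is where one would block
        buf_[w % N] = v;
        write_.store(w + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    bool try_pop(T& out) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        if (r == w) return false;      // empty: this is where one would block
        out = buf_[r % N];
        read_.store(r + 1, std::memory_order_release);   // hand the slot back
        return true;
    }
};

// Hypothetical demo: one write thread, the caller as read thread.
long long spsc_demo() {
    SpscRing<int, 1024> ring;
    std::thread writer([&] {
        for (int i = 0; i < 10000; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    long long sum = 0;
    for (int n = 0; n < 10000;) {
        int v;
        if (ring.try_pop(v)) { sum += v; ++n; }
        else std::this_thread::yield();
    }
    writer.join();
    return sum;  // sum of 0 through 9999
}
```
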
If many read threads are reading from one channel, then each thread has to
`memory_order_acquire` the read pointer, and then, instead of
`memory_order_release`ing it, has to do an
`atomic_compare_exchange_weak_explicit`, and if the pointer changed while it was
reading, abort its reads and start over.
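
A sketch of that read-then-claim loop, under a simplifying assumption that is not in the scheme above: the buffer is completely written before any reader starts, so a speculative read of a slot can never race with a writer. `drain` and `multi_reader_demo` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// One read thread's loop: read the slot first, then try to claim it.
long long drain(const std::vector<int>& items,
                std::atomic<std::size_t>& read_index) {
    long long local = 0;
    std::size_t r = std::atomic_load_explicit(&read_index,
                                              std::memory_order_acquire);
    while (r < items.size()) {
        const int value = items[r];  // speculative read, not yet ours
        if (std::atomic_compare_exchange_weak_explicit(
                &read_index, &r, r + 1,
                std::memory_order_acq_rel, std::memory_order_acquire)) {
            local += value;  // the index had not changed: the read counts
            ++r;             // a guess at the next index; a stale guess just fails
        }
        // On failure r was reloaded with the current index, so the read is
        // abandoned and the loop starts over from there.
    }
    return local;
}

// Hypothetical demo: four read threads share one pre-filled buffer.
long long multi_reader_demo() {
    std::vector<int> items(1000);
    std::iota(items.begin(), items.end(), 1);
    std::atomic<std::size_t> read_index{0};
    std::atomic<long long> total{0};
    std::vector<std::thread> readers;
    for (int t = 0; t < 4; ++t)
        readers.emplace_back([&] { total.fetch_add(drain(items, read_index)); });
    for (auto& th : readers) th.join();
    return total.load();
}
```

Every item is counted exactly once, because only the thread whose compare-exchange moved the index from `r` to `r + 1` keeps the value it read at `r`.
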
Similarly if many write threads are writing to one channel, each write thread
will first have to spin lock acquire the privilege of being the sole write thread
writing, or spin lock acquire a range to write to. Thus in the most general
case, we have a spin locked atomic write state that specifies an area that
has been written to, an area that is being written to, and an area that is
available to be acquired for writing, a spin locked atomic read state, and a
mutex that holds both the write state and the read state. In the case of a
vector buffer with multiple writers, the atomic states are three wrapping
atomic pointers that go through the buffer in the same direction.
We would like to use direct memory addresses, rather than vector or deque
addresses, which might require us to write our own vector or deque. See
the [thread safe deque](https://codereview.stackexchange.com/questions/238347/a-simple-thread-safe-deque-in-c "A simple thread-safe Deque in C++"), which however relies entirely on locks and mutexes,
and whose extension to atomic locks is not obvious.
Suppose you are doing atomic operations, but some operations might be
expensive and lengthy. You really only want to spin lock on amending data
that is small and all close together in memory, so on your second spin,
C++ has a bunch of threading facilities that are designed for the case that
a normal procedural program forks a bunch of tasks to do stuff in parallel,
and then when they are all done, merges the results with join or promise
and future, and then the main program does its thing.
This is not so useful when the main program is event oriented, rather
than procedural.
If the main program is event oriented, then each thread has to stick around
for the duration, and has to have its own event queue, which C++ does not
directly provide.
In this case threads communicate by posting events, and primitives that do
thread synchronization (promise, future, join) are not terribly useful.
A thread grabs its event queue, using the mutex, pops out the next event,
releases the mutex, and does its thing.
If the event queue is empty, then, without releasing it, the thread
processing events waits on a [condition variable](https://thispointer.com//c11-multithreading-part-7-condition-variables-explained/) (the wait itself releases the
mutex). When another thread grabs the event queue mutex and stuffs
something into the event queue, it fires the [condition variable](https://thispointer.com//c11-multithreading-part-7-condition-variables-explained/), which
wakes up the thread that will process the event queue and restores the
mutex to it.
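
A minimal sketch of a thread that sticks around for the duration with its own event queue, as just described. `EventThread`, `post` and `event_demo` are hypothetical names, and representing an event as a `std::function<void()>` is an assumption made for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class EventThread {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> events_;
    bool done_ = false;
    std::thread worker_;  // declared last, so the queue exists before it starts

    void run() {
        for (;;) {
            std::function<void()> ev;
            {
                std::unique_lock<std::mutex> lk(m_);  // grab the event queue
                // If the queue is empty, wait; the wait releases the mutex and
                // reacquires it before returning.
                cv_.wait(lk, [this] { return done_ || !events_.empty(); });
                if (events_.empty()) return;  // shutting down, nothing left
                ev = std::move(events_.front());
                events_.pop();
            }  // mutex released here
            ev();  // do its thing without holding the mutex
        }
    }
public:
    EventThread() : worker_([this] { run(); }) {}
    ~EventThread() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();  // pending events are drained before the thread exits
    }
    // Called from any other thread to post an event.
    void post(std::function<void()> ev) {
        { std::lock_guard<std::mutex> lk(m_); events_.push(std::move(ev)); }
        cv_.notify_one();  // fire the condition variable
    }
};

// Hypothetical demo: post ten events that add 1 through 10 to a total.
int event_demo() {
    int total = 0;  // touched only by the event thread until it has been joined
    {
        EventThread t;
        for (int i = 1; i <= 10; ++i) t.post([&total, i] { total += i; });
    }  // destructor joins the event thread
    return total;
}
```

The threads communicate only by posting events; no promise, future, or join is involved until shutdown.
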
Mutexes need to construct RAII objects, one of which we will use in