Added design of concurrency for peer to peer sockets
/docs/design/peer_socket.md
When we initialize a connection, we establish a state machine at both ends, both the application factor of the state machine, and the socket factor of the state machine.

When I say we are using state machines, this is just the message handling, event oriented architecture familiar in programming graphical user interfaces.

Such a program consists of a pile of derived classes whose base classes have all the machinery for handling messages. Each instance of one of these classes is a state machine, which contains member functions that are event handlers.

So when I say "state machine", I mean a class for handling events like the many window classes in wxWidgets.

One big difference will be that we will be posting a lot of events that we expect to trigger events back to us. And we will want to know which posting the returning event came from. So we will want to create some data that is associated with the fired event, and when a resulting event is fired back to us, we can get the correct associated data, because we might fire a lot of events, and they might come back out of order. GUI code has this capability, but it is rarely used.
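A minimal sketch of that correlation mechanism, with all names invented (`correlator` and `pending_request` are not from any existing library): the sender stashes per-posting context under a fresh id, embeds the id in the fired event, and recovers the context when the answering event comes back, in whatever order replies arrive.

```c++
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical context we want back when the answering event arrives.
struct pending_request {
    std::string what_we_asked_for;  // whatever is needed to interpret the reply
};

class correlator {
    std::uint64_t next_id_ = 0;
    std::unordered_map<std::uint64_t, pending_request> pending_;
public:
    // Called when firing an event: returns the id to embed in the event.
    std::uint64_t fire(pending_request ctx) {
        auto id = ++next_id_;
        pending_.emplace(id, std::move(ctx));
        return id;
    }
    // Called when an event comes back, possibly out of order.
    // Returns true and fills ctx if the id is known.
    bool on_return(std::uint64_t id, pending_request& ctx) {
        auto it = pending_.find(id);
        if (it == pending_.end()) return false;  // unknown or duplicate reply
        ctx = std::move(it->second);
        pending_.erase(it);
        return true;
    }
};
```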
## Implementing concurrent state machines in C++

Most of this is me re-inventing Asio, which is part of the immense collection of packages of MSYS2. Obviously I would be better off integrating Asio than rebuilding it from the ground up, but I need to figure out what needs to be done, so that I can find the equivalent Asio functionality.

Or maybe Asio is a bad idea. Boost Asio was horribly broken. I am seeing lots of cool hot projects using Tokio, and not seeing any cool hot projects using Asio. If the Bittorrent DHT library did its own implementation of concurrent communicating processes, maybe Asio is broken at the core.

And for flow control, I am going to have to integrate Quic, though I will have to fork it to change its security model from certificate authorities to Zooko names. You can in theory easily plug any kind of socket into Asio, but I see a suspicious lack of people plugging Quic into it, because Quic contains a huge amount of functionality that Asio knows nothing of. But if I am forking it, I can probably ignore or discard most of that functionality.
GUI code is normally single threaded, because it is too much of a bitch to lock an instance of a message handling class when one of its member functions is handling a message (the re-entrancy problem).

However the message plumbing has to know which class is going to handle the message (unless the message is being handled by a stateless state machine, which it often is), so there is no reason the message handling machinery could not atomically lock the class before calling its member function, and free it on return from its member function.
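A sketch of what that could look like, all names invented: the machinery takes a per-instance lock around each delivery, so handler code stays effectively single threaded without the handler itself doing anything about it.

```c++
#include <mutex>

struct message { /* opaque here */ };

// Sketch only: a handler base whose delivery path takes the instance
// lock around each message, so on_message never re-enters.
class handler {
    std::mutex busy_;  // one message at a time per instance
public:
    virtual ~handler() = default;
    void deliver(const message& m) {
        std::lock_guard<std::mutex> hold(busy_);  // atomically lock the class
        on_message(m);                            // handler runs single threaded
    }                                             // lock freed on return
protected:
    virtual void on_message(const message& m) = 0;
};
```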
State machines (message handling classes, as for example in a gui) are allocated in the heap and owned by the message handling machinery. The base constructor of the object plugs it into the message handling machinery. (Well, wxWidgets uses the base constructor with virtual classes to plug it in, but likely the curiously recurring template pattern would be saner, as in ATL and WTL.)
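Roughly what the curiously recurring template pattern version might look like; `registry` and `machinery()` are stand-ins for whatever owns and routes to live state machines, not real wxWidgets or ATL names.

```c++
#include <unordered_set>

// Minimal illustrative registry; the real machinery would own the
// objects and route messages to them.
struct registry {
    std::unordered_set<void*> live;
    void attach(void* p) { live.insert(p); }
    void detach(void* p) { live.erase(p); }
};
inline registry& machinery() { static registry r; return r; }

// CRTP base: the derived type is known at compile time, so no virtual
// plumbing is needed to plug the instance into the machinery.
// (We only store the derived pointer here; dereferencing it before the
// derived constructor has run would be undefined behaviour.)
template <typename Derived>
class state_machine {
protected:
    state_machine() { machinery().attach(static_cast<Derived*>(this)); }
    ~state_machine() { machinery().detach(static_cast<Derived*>(this)); }
};

// usage:
// class connection_handler : public state_machine<connection_handler> { ... };
```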
This means they have to be destroyed by sending a message to the message handling machinery, which eventually results in the destructor being called. The destructor does not have to worry about cleanup in all the base classes, because the message handling machinery is responsible for all that stuff.
Our event despatcher does not call a function pointer, because our event handlers are member functions. We call an object of type `std::function`. We could also use a pointer to member function, which is more efficient.
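Both despatch styles in miniature; the message and handler types are invented for illustration.

```c++
#include <functional>

struct message { int payload; };

struct widget {
    void on_message(const message& m) { (void)m; /* handle it */ }
};

int main() {
    widget w;
    message m{42};

    // Despatch via std::function: flexible, the despatcher need not
    // know the handler's class, at the cost of type erasure.
    std::function<void(const message&)> f =
        [&w](const message& msg) { w.on_message(msg); };
    f(m);

    // Despatch via pointer to member: cheaper, but the despatcher
    // must know the class type.
    void (widget::*pm)(const message&) = &widget::on_message;
    (w.*pm)(m);
}
```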
All this complicated machinery is needed because we assume our interaction is stateful. But suppose it is not. The request‑reply pattern, where the request contains all information that determines the reply, is very common, probably the great majority of cases. This corresponds to an incoming message where the in‑regards‑to field and in‑reply‑to field are empty, because the incoming message initiates the conversation, and its type and content suffice to determine the reply. Or the incoming message causes the recipient to reply and also set up a state machine, or a great big pile of state machines (instances of a message handling class), which will handle the lengthy subsequent conversation, which when it eventually ends results in those objects being destroyed, while the connection continues to exist.
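The message header implied by this discussion might be no more than the following; the field layout is my guess, not a defined wire format.

```c++
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Guessed layout: just enough header for the despatch rules described here.
struct message_header {
    std::uint32_t type;                         // selects handler for fresh messages
    std::optional<std::uint64_t> in_regards_to; // conversation id, empty if none
    std::optional<std::uint64_t> in_reply_to;   // id of the message being answered
};

struct message {
    message_header header;
    std::vector<std::byte> body;  // opaque to the socket layer
};
```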
In the case of an incoming message of that type, it is despatched to a fully re-entrant static function on the basis of its type. The message handling machinery calls a function pointer, not a class member. We don't use, should not use, and cannot use, all the message handling infrastructure that keeps track of state.
## receive a message with no in‑regards‑to field, no in‑reply‑to field

This is directed to a re-entrant function, not a functor, because it is re‑entrant and stateless. It is directed according to message type.
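A sketch of that type-based despatch to plain re-entrant functions, names invented:

```c++
#include <cstdint>
#include <unordered_map>

struct message { std::uint32_t type; /* opaque payload ... */ };

// Fresh messages (no in-regards-to, no in-reply-to) go to plain
// re-entrant functions, selected purely by message type.
using handler_fn = void (*)(const message&);

std::unordered_map<std::uint32_t, handler_fn>& type_table() {
    static std::unordered_map<std::uint32_t, handler_fn> t;
    return t;
}

void dispatch_fresh(const message& m) {
    auto it = type_table().find(m.type);
    if (it != type_table().end()) it->second(m);  // stateless, no locking needed
}
```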
### A message initiating a conversation

It creates a state machine (instance of a message handling class), sends the start event to the state machine, and the state machine does whatever it does. The state machine records what message caused it to be created, and for its first message, uses it in the in‑reply‑to field, and for subsequent messages, in its in‑regards‑to field.
### A request-reply message.

The handler sends a reply with the in-reply-to field set. The recipient is expected to have a hash-map associating this field with information as to what to do with the message.
#### A request-reply message where counterparty matters.

Suppose we want to read information about this entity from the database, and then write that information. Counterparty information is likely to need to be durable. Then we do the read‑modify‑write as a single SQL statement, and let the database serialize it.
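For example, bumping a per-counterparty counter as one statement rather than a read, a modify in C++, and a write back. `db_exec`, the table, and the column names are all invented stand-ins for whatever database API and schema are actually used.

```c++
#include <iostream>
#include <string>

// Stand-in for whatever database API is in use (sqlite3_exec, a MySQL
// wrapper, ...); here it just prints the statement that would run.
void db_exec(const std::string& sql) { std::cout << sql << '\n'; }

void record_message_from(const std::string& counterparty_pubkey) {
    // One statement: the database reads, modifies, and writes atomically,
    // so our code holds no lock across the round trip.
    db_exec(
        "UPDATE counterparty SET messages_seen = messages_seen + 1 "
        "WHERE pubkey = '" + counterparty_pubkey + "';");
    // A real implementation would use bound parameters rather than
    // string concatenation.
}

int main() { record_message_from("example-key"); }
```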
## receive a message with no in‑regards‑to field, but with an in‑reply‑to field

The dispatch layer looks up a hash-map table of functors, keyed by the message id in the in‑reply‑to field and the id of the sender, and despatches the message to that functor to do whatever it does. When this is the last message expected in reply, the functor removes itself from the hash-map and frees itself. If a message arrives with no entry in the table, it is silently dropped.
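A sketch of that table, all names invented; erasing the entry before calling it models the functor removing itself when the last expected reply arrives (a handler expecting more replies could re-register itself).

```c++
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

struct message { std::uint64_t in_reply_to; std::uint64_t sender; /* ... */ };

// Key: (id of the message we sent, id of the counterparty we sent it to).
using reply_key = std::pair<std::uint64_t, std::uint64_t>;

struct reply_key_hash {
    std::size_t operator()(const reply_key& k) const {
        return std::hash<std::uint64_t>{}(k.first)
             ^ (k.second * 0x9e3779b97f4a7c15ull);
    }
};

class reply_dispatcher {
    std::unordered_map<reply_key, std::function<void(const message&)>,
                       reply_key_hash> table_;
public:
    void expect(std::uint64_t msg_id, std::uint64_t counterparty,
                std::function<void(const message&)> handler) {
        table_[{msg_id, counterparty}] = std::move(handler);
    }
    void dispatch(const message& m) {
        auto it = table_.find({m.in_reply_to, m.sender});
        if (it == table_.end()) return;  // no entry: silently drop
        auto handler = std::move(it->second);
        table_.erase(it);                // last expected reply: remove entry
        handler(m);                      // then run it
    }
};
```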
## receive a message with an in‑regards‑to field, with or without an in‑reply‑to field.

Just as before, the dispatch layer looks up the hash-map of state machines (instances of message handling classes) and dispatches the message to the stateful message handler, which figures out what to do with it according to its internal state. What to do with an in‑reply‑to field, if there is one, is something the stateful message handler will have to decide. It might have its own hashmap for the in‑reply‑to field, but this would result in state management and state transition of huge complexity. The expected usage is that it has a small number of static fields in its state that reference a limited number of recently sent messages, and if the incoming message is not one of them, it treats it as an error. Typically the state machine is only capable of handling the response to its most recent message, and merely wants to be sure that this *is* a response to its most recent message.

But it could have shot off half a dozen messages with the in‑regards‑to field set, and want to handle the response to each one differently. Though likely such a scenario would be easier to handle by creating half a dozen state machines, each handling its own conversation separately. On the other hand, if it is only going to be a fixed and finite set of conversations, it can put all ongoing state in a fixed and finite set of fields, each of which tracks the most recently sent message for which a reply is expected, as sketched below.
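The common case, recognizing only the reply to the most recently sent message, might be as small as this (names invented):

```c++
#include <cstdint>

// Sketch of the common case: the state machine only has to recognize
// the reply to the one message it most recently sent.
class transfer_state_machine {
    std::uint64_t awaiting_reply_to_ = 0;  // id of our most recent message
public:
    void sent(std::uint64_t msg_id) { awaiting_reply_to_ = msg_id; }
    bool accept_reply(std::uint64_t in_reply_to) {
        if (in_reply_to != awaiting_reply_to_) return false;  // treat as error
        awaiting_reply_to_ = 0;  // reply consumed
        return true;
    }
};
```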
## A complex conversation.

We want to avoid complex conversations, and stick to the request‑reply pattern as much as possible. But this is apt to result in the server putting a pile of opaque data (a cookie) in its reply, which it expects to have sent back with every request. And these cookies can get rather large.
Bob decides to initiate a complex conversation with Carol.

He creates an instance of a state machine (instance of a message handling class) and sends a message with no in‑regards‑to field and no in‑reply‑to field, but when he sends that initial message, his state machine gets put in, and owned by, the dispatch table, keyed by the message id.

Carol, on receiving the message, also creates a state machine, associated with that same message id, albeit the counterparty is Bob rather than Carol, which state machine then sends a reply to that message with the in‑reply‑to field set, and which Bob's dispatch layer therefore dispatches to the appropriate state machine (message handler).

And then it is back and forth between the two stateful message handlers, both associated with the same message id, until they shut themselves down.
## factoring layers.

A layer is code containing state machines that receive messages on one machine, and then send messages on to other code on *the same machine*. The sockets layer is the code that receives messages from the application layer, and then sends them on the wire, and the code that receives messages from the wire, and sends messages to the application layer.

But a single state machine at the application level could be handling several connections, and a single connection could have several state machines running independently, and the socket code should not need to care.

We have a socket layer that receives messages containing an opaque block of bytes, and then sends a message to the application layer message despatch machinery, for whom the block is not opaque, but rather identifies a message type meaningful for the despatch layer, but meaningless for the socket layer.
The state machine terminates when its job is done, freeing up any allocated memory, but the connection endures for the life of the program, and most of the data about a connection endures in an SQL database between reboots.

The connection is a long lived state machine running in the sockets layer, which sends and receives what are to it opaque blocks of bytes to and from the dispatch layer, and the dispatch layer interprets these blocks of bytes as having information (message type, in‑regards‑to field and in‑reply‑to field) that enables it to despatch the message to a particular method of a particular instance of a message handling class in C++, corresponding to a particular channel in Go. And these message handling classes are apt to be short lived, being destroyed when their task is complete.
Because we can have many state machines on a connection, most of our state machines can have very little state, typically an infinite receive loop, an infinite send receive loop, or an infinite receive send loop, which have no state at all and are stateless. We factorize the state machine into many state machines to keep each one manageable.
## Comparison with concurrent interacting processes in Go

These concurrent communicating processes are going to be sending messages to each other on the same machine. We need to model Go's goroutines.
The equivalent of a Go channel is not a connection. Rather, one sends a message to the other to request it create a state machine, which will correspond to the in-regards-to message, and the equivalent of a Go channel is a message type, the in-regards-to message id, and the connection id. Which we pack into a single class so that we can use it the way Go uses channels.
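Packed into a single class, the channel equivalent might look like this; the layout and hash are my guesses, not a settled design.

```c++
#include <cstddef>
#include <cstdint>
#include <functional>

// The triple that plays the role of a Go channel: which connection the
// conversation lives on, which conversation, and which kind of message
// within it.
struct channel {
    std::uint64_t connection_id;
    std::uint64_t in_regards_to;  // id of the conversation-opening message
    std::uint32_t message_type;

    bool operator==(const channel&) const = default;  // C++20 defaulted comparison
};

// Hash so a channel can key the despatch table.
struct channel_hash {
    std::size_t operator()(const channel& c) const {
        std::size_t h = std::hash<std::uint64_t>{}(c.connection_id);
        h ^= std::hash<std::uint64_t>{}(c.in_regards_to) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<std::uint32_t>{}(c.message_type) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};
```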
correct factor of that state machine, which we have factored into as many very small, and preferably stateless, state machines as possible.
We factor the potentially ginormous state machine into many small state machines (message handling classes), in the same style as Go factors a potentially ginormous Goroutine into many small goroutines.
The socket code being a state machine composed of many small state machines, which communicates with the application layer

the opaque block of bytes, and the recipient deserializes, and sends it to a routine that acts on an object of that deserialized class.
Or perhaps the callback routine deserializes the object into a particular class, and then calls a routine for that class, but another state machine on the application layer would call the class specific routine directly. The equivalent of a Go channel between one state machine on the application layer and another in the same application layer is directly calling the class specific routine that the callback routine would call.

Since the sockets layer does not know the internals of the message struct, the message has to be serialized and deserialized into the corresponding class by the dispatch layer, and thence to the application layer.
Go code tends to consist of many concurrent processes continually being spawned by a master concurrent process, and themselves spawning more concurrent processes. For most state machines, we do not need recursion, so it is reasonable for their state to be a fixed allocation inside the state of their master concurrent process. In the unlikely event we do need recursion, we usually only have one instance running at one time, so we can allocate an `std::stack` in the master concurrent process.

And what if we do want to spawn many in parallel? Well, they will usually be stateless. What if they are not stateless? Well, that would require an `std::vector` of states. And if we need many running in parallel with recursion, an `std::vector` with each element containing an `std::stack`. And to avoid costly reallocations, we create the `std::vector` and the `std::vector`s underlying the `std::stack`s with realistic initial allocations that are only infrequently exceeded, as in the sketch below.
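A sketch of those allocations, assuming a hypothetical `child_state`; `std::stack` is used over `std::vector` (rather than its default `std::deque`) precisely so the underlying vector can be preallocated, and the reserve sizes are placeholders.

```c++
#include <cstddef>
#include <stack>
#include <vector>

struct child_state { /* per-instance state of a spawned machine */ };
using child_stack = std::stack<child_state, std::vector<child_state>>;

// Build a stack whose underlying vector is preallocated; the moved-in
// vector's buffer (and thus its capacity) is adopted by the stack.
child_stack make_child_stack(std::size_t cap) {
    std::vector<child_state> buf;
    buf.reserve(cap);  // realistic initial allocation
    return child_stack(std::move(buf));
}

// Master machine owning its children's state inline.
struct master_state {
    std::vector<child_state> parallel_children;   // many in parallel, no recursion
    std::vector<child_stack> recursive_children;  // parallel and recursive

    master_state() {
        parallel_children.reserve(64);  // rarely exceeded
        recursive_children.reserve(8);
        for (std::size_t i = 0; i < 8; ++i)
            recursive_children.push_back(make_child_stack(32));
    }
};
```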
# flow control and reliability

to handle bulk data transfer is to fork it to use Zooko based keys.

[Tailscale]:https://tailscale.com/blog/how-nat-traversal-works
"How to communicate peer-to-peer through NAT firewalls"{target="_blank"}
[Tailscale] has solved a problem very similar to the one I am trying to solve, albeit their solutions rely on a central human authority, which authority they ask for money, and they recommend:
> If you’re reaching for TCP because you want a
> stream‑oriented connection when the NAT traversal is done,
matter what, if we stall in the socket layer rather than the application layer, which makes life a whole lot easier for the application programmer, we are going to need something like Tokio.
Or we could open up Quic, which we have to do anyway to get it to use our keys rather than enemy controlled keys, and plug it into our C++ message passing layer.
On the application side, we have to lock each state machine when it is active. It can only handle one message at a time. So the despatch layer has to queue up messages and stash them somewhere,
Most of this machinery seems like a re-implementation of Tokio-rust, which is a huge project. I don't wanna learn Tokio-rust, but equally I don't want to re-invent the wheel.
Or perhaps we could just integrate QUIC's internal message passing infrastructure with our message passing infrastructure. It probably already supports a message passing interface.
Instead of synchronously writing data, you send a message to it to write some data, and when it is done, it calls a callback.
# Minimal system

Prototype. Limit global bandwidth at the application
paid for in services or crypto currency.

filecoin provides this, but is useless for frequent small incremental backups.
## Bittorrent DHT library

This is a general purpose library, not all that married to bittorrent.

It is available as an MSYS2 library, MSYS2 being a fork of the semi-abandoned MinGW library, with the result that the name of the very dead project Mingw-w64 is all over it.

Its pacman name is mingw-w64-dht, but it has repos all over the place under its own name.

It is async, driven by being called on a timer, and called when data arrives. It contains a simple example program that enables you to publish any data you like.
## libp2p

[p2p]:https://github.com/elenaf9/p2p
libraries, but I hear it cursed as a complex mess, and no one wants to get into it. They find the far from easy `cmake` easier. And `cmake` runs on all systems, while autotools only runs on linux.

MSYS2, which runs on Windows, supports autotools. So, maybe it does run on Windows.
[autotools documentation]:https://thoughtbot.com/blog/the-magic-behind-configure-make-make-install
{target="_blank"}

Despite the complaints about autotools, there is [autotools documentation] on the web that does not make it sound too bad.
I believe `cmake` has a straightforward pipeline into `*.deb` files, but if it has, the autotools pipeline is far more common and widely used.
## The standard windows installer

NSIS can create msi files for windows, and is open source.

[NSIS Open Source repository]

NSIS is also available as an MSYS package.

People who know what they are doing seem to use this open source install system, and they write nice installs with it.
You can create a pool of threads processing connection handlers (and waiting for finalizing database connection), by running `io_service::run()` from multiple threads. See Boost.Asio docs.
## Asio

I tried Boost Asio, and concluded it was broken, trying to do stuff that cannot be done, and to hide stuff that cannot be hidden, in abstractions that leak horribly.

But Asio by itself (comes with MSYS2) might work.
## Asynch Database access

MySQL 5.7 supports [X Plugin / X Protocol](https://dev.mysql.com/doc/refman/5.7/en/document-store-setting-up.html), which allows asynchronous query execution and NoSQL. But X DevAPI was created to support node.js and stuff. The basic idea is that you send text messages to mysql on a certain port, and asynchronously get text messages back, in google protobuffs, in php, JavaScript, or sql. No one has bothered to create a C++ wrapper for this, it being primarily designed for php or node.js.
A class can be explicitly defined to take aggregate initialization

        }
    }

but that does not make it of aggregate type. Aggregate type has *no* constructors except default and deleted constructors.
# functional programming

A lambda is a nameless value of a nameless class that is a functor, which is to say, has `operator()` defined.

But, of course, you can get the class with `decltype` and assign that nameless value to an `auto` variable, or stash it on the heap with `new`, or in preallocated memory with placement `new`.

But if you are doing all that, might as well explicitly define a named functor class.

To construct a lambda in the heap:

    auto p = new auto([a, b, c]() {});
going to have to introduce a compile time name, easier to do it as an old fashioned function, method, or functor, as a method of a class that is very possibly pod.
If we are sticking a lambda around to be called later, might copy it by value into a templated class, or might put it on the heap.

    auto bar = []() { return 5; };