Added design of concurrency for peer to peer sockets

/docs/design/peer_socket.md
2023-12-24 03:39:39 +00:00 · 2023-12-24 03:39:39 +00:00 · 38f6246990
commit 38f6246990
parent 6dfee3e91f
3 changed files with 301 additions and 47 deletions
--- a/docs/design/peer_socket.md
+++ b/docs/design/peer_socket.md
@ -72,12 +72,241 @@ When we initialize a connection, we establish a state machine
 at both ends, both the application factor of the state machine,
 and the socket factor of the state machine.

+When I say we are using state machines, this is just the
+message handling event oriented architecture familiar in
+programming graphical user interfaces.
+
+Such a program consists of a pile of derived classes whose
+base classes have all the machinery for handling messages.
+Each instance of one of these classes is a state machine,
+which contains member functions that are event handlers.
+
+So when I say "state machine", I mean a class for handling
+events like the many window classes in wxWidgets.
+
+One big difference will be that we will be posting a lot of events
+that we expect to trigger events back to us.  And we will want
+to know which posting the returning event came from.  So we will
+want to create some data that is associated with the fired event,
+and when a resulting event is fired back to us, we can get the
+correct associated data, because we might fire a lot of events,
+and they might come back out of order.  Gui code has this capability,
+but it is rarely used.
+
+## Implementing concurrent state machines in C++
+
+Most of this is me re-inventing Asio, which is part of the
+immense collection of packages of MSYS2.  Obviously I would be
+better off integrating Asio than rebuilding it from the ground up.
+But I need to figure out what needs to be done, so that I can
+find the equivalent Asio functionality.
+
+Or maybe Asio is a bad idea.  Boost Asio was horribly broken.
+I am seeing lots of cool hot projects using Tokio, not seeing any cool
+hot projects using Asio.
+If the Bittorrent DHT library did its own
+implementation of concurrent communicating processes,
+maybe Asio is broken at the core.
+
+And for flow control, I am going to have to integrate Quic,
+though I will have to fork it to change its security model
+from certificate authorities to Zooko names.  You can in theory
+easily plug any kind of socket into Asio,
+but I see a suspicious lack of people plugging Quic into it,
+because Quic contains a huge amount of functionality that Asio
+knows nothing of.  But if I am forking it, I can probably ignore
+or discard most of that functionality.
+
+Gui code is normally single threaded, because it is too much of
+a bitch to lock an instance of a message handling class when one of
+its member functions is handling a message (the re-entrancy problem).
+
+However the message plumbing has to know which class is going
+to handle the message (unless the message is being handled by a
+stateless state machine, which it often is) so there is no reason
+the message handling machinery could not atomically lock the class
+before calling its member function, and free it on return from
+its member function.
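
A minimal sketch of that idea, assuming each state machine carries its own mutex and the plumbing takes it around the handler call. The names `StateMachine`, `Dispatcher`, `deliver`, `Counter` and `Ping` are hypothetical, invented for illustration only.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical base class: every state machine owns one mutex.
class StateMachine {
public:
    virtual ~StateMachine() = default;
private:
    friend class Dispatcher;
    std::mutex busy_;  // held while one of this object's handlers runs
};

// Hypothetical message plumbing: lock the target, call the member
// function, release the lock on return (even if the handler throws).
class Dispatcher {
public:
    template <class Machine, class Message>
    static void deliver(Machine& target,
                        void (Machine::*handler)(const Message&),
                        const Message& msg) {
        StateMachine& base = target;  // Machine must derive from StateMachine
        std::lock_guard<std::mutex> lock(base.busy_);
        (target.*handler)(msg);
    }
};

struct Ping { int sequence; };

// Example state machine with one event handler.
class Counter : public StateMachine {
public:
    void on_ping(const Ping& p) { total_ += p.sequence; }
    int total() const { return total_; }
private:
    int total_ = 0;
};
```

A call looks like `Dispatcher::deliver(counter, &Counter::on_ping, Ping{3});`. A handler that delivers a message to its own object through this path would deadlock on the non-recursive mutex, so in this sketch such messages would have to be queued rather than delivered directly.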
+
+State machines (message handling classes, as for example
+in a gui) are allocated in the heap and owned by the message
+handling machinery.  The base constructor of the object plugs it
+into the message handling machinery.  (Well, wxWidgets uses the
+base constructor with virtual classes to plug it in, but likely the
+curiously recurring template pattern would be saner
+as in ATL and WTL.)
+
+This means they have to be destroyed by sending a message to the message
+handling machinery, which eventually results in
+the destructor being called.  The destructor does not have to worry
+about cleanup in all the base classes, because the message handling
+machinery is responsible for all that stuff.
+
+Our event despatcher does not call a function pointer,
+because our event handlers are member functions.
+We call an object of type `std::function`. We could also use pointer to member,
+which is more efficient.
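
To make the two options concrete, here is a small sketch of both. `Event`, `Conversation`, `ErasedHandler`, `bind_handler` and `MemberHandler` are hypothetical names, not anything from an existing library.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct Event { std::string text; };

// A message handling class with one event handler.
class Conversation {
public:
    void on_event(const Event& e) { log_ += e.text; }
    const std::string& log() const { return log_; }
private:
    std::string log_;
};

// Option 1: a type-erased std::function.  Any callable fits,
// at the cost of an indirect call and a possible allocation
// when the captured state is large.
using ErasedHandler = std::function<void(const Event&)>;

ErasedHandler bind_handler(Conversation& target) {
    return [&target](const Event& e) { target.on_event(e); };
}

// Option 2: object pointer plus pointer to member.  No type
// erasure, but the despatcher must know the handler's class.
struct MemberHandler {
    Conversation* target;
    void (Conversation::*method)(const Event&);
    void operator()(const Event& e) const { (target->*method)(e); }
};
```

Both are called the same way, `handler(event)`, so the despatcher code that invokes them need not care which one it holds.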
+
+All this complicated machinery is needed because we assume
+our interaction is stateful.  But suppose it is not.  The request‑reply
+pattern, where the request contains all information that
+determines the reply is very common, probably the great majority
+of cases.  This corresponds to an incoming message where the
+in‑regards‑to field and in‑reply‑to field is empty,
+because the incoming message initiates the conversation,
+and its type and content suffices to determine the reply. Or the incoming message
+causes the recipient to reply and also set up a state machine,
+or a great big pile of state machines (instances of a message handling class),
+which will handle the lengthy subsequent conversation,
+which when it eventually ends results in those objects being destroyed,
+while the connection continues to exist.
+
+In the case of an incoming message of that type, it is despatched to
+a fully re-entrant static function on the basis of its type.
+The message handling machinery calls a function pointer,
+not a class member.
+We don't use, should not use, and cannot use, all the
+message handling infrastructure that keeps track of state.
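
A sketch of what despatch by type to stateless functions might look like, assuming a numeric message type. The `Message` layout, the type codes and the handler names are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical wire message: a type code and a body.
struct Message {
    std::uint32_t type;
    std::string body;
};

// Plain function pointer: no object, no captured state.
using StatelessHandler = std::string (*)(const Message&);

// Re-entrant handlers: the reply depends only on the request.
std::string handle_echo(const Message& m) { return m.body; }
std::string handle_length(const Message& m) {
    return std::to_string(m.body.size());
}

constexpr std::uint32_t kEcho = 1;    // assumed type codes
constexpr std::uint32_t kLength = 2;

const std::unordered_map<std::uint32_t, StatelessHandler>& handler_table() {
    static const std::unordered_map<std::uint32_t, StatelessHandler> table{
        {kEcho, &handle_echo},
        {kLength, &handle_length},
    };
    return table;
}

// Returns true and fills `reply` when a handler exists for the type.
bool dispatch_stateless(const Message& m, std::string& reply) {
    auto it = handler_table().find(m.type);
    if (it == handler_table().end()) return false;
    reply = it->second(m);
    return true;
}
```

Because the table is immutable after construction and the handlers touch no shared state, any number of threads could call `dispatch_stateless` at once without locking.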
+
+
+## receive a message with no in‑regards‑to field, no in‑reply‑to field
+
+This is directed to a re-entrant function, not a functor,
+because it is re‑entrant and stateless.
+It is directed according to message type.
+
+### A message initiating a conversation
+
+It creates a state machine (instance of a message handling class),
+sends the start event to the state machine, and the state machine
+does whatever it does.  The state machine records what message
+caused it to be created, and for its first message,
+uses it in the in‑reply‑to field, and for subsequent messages,
+for its in‑regards‑to field.
+
+### A request-reply message.
+
+The handler sends back a message with the in-reply-to field set.
+The recipient of that reply is expected to have a hash-map associating this field
+with information as to what to do with the message.
+
+#### A request-reply message where counterparty matters.
+
+Suppose we want to read information about this entity from
+the database, and then write that information.  Counterparty
+information is likely to need to be durable. Then we do
+the read-modify-write as a single sql statement,
+and let the database serialize it.
+
+## receive a message with no in‑regards‑to field, but with an in‑reply‑to field
+
+The dispatch layer looks up a hash-map table of functors,
+by the id of the field and id of the sender, and despatches the message to
+that functor to do whatever it does.
+When this is the last message expected in‑reply‑to, the functor
+frees itself and removes itself from the hash-map.  If a message
+arrives with no entry in the table, it is silently dropped.
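
The lookup described above could be sketched roughly as follows. `ReplyTable`, `Reply`, `PeerId`, `MessageId` and the convention that a handler returns `true` when it expects no more replies are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using PeerId = std::uint64_t;
using MessageId = std::uint64_t;
using Key = std::pair<PeerId, MessageId>;  // (sender id, in-reply-to id)

struct KeyHash {
    std::size_t operator()(const Key& k) const {
        return std::hash<PeerId>{}(k.first) * 1000003u ^
               std::hash<MessageId>{}(k.second);
    }
};

struct Reply {
    PeerId sender;
    MessageId in_reply_to;
    std::string body;
};

// Assumed convention: the functor returns true when this was the
// last reply it expected, so the table can drop it.
using ReplyHandler = std::function<bool(const Reply&)>;

class ReplyTable {
public:
    void expect(PeerId peer, MessageId id, ReplyHandler h) {
        pending_.insert_or_assign(Key{peer, id}, std::move(h));
    }
    // Returns false when no entry matched and the reply was dropped.
    bool dispatch(const Reply& r) {
        const Key key{r.sender, r.in_reply_to};
        auto it = pending_.find(key);
        if (it == pending_.end()) return false;  // silently dropped
        const bool done = it->second(r);
        if (done) pending_.erase(key);  // functor is freed with its entry
        return true;
    }
    std::size_t size() const { return pending_.size(); }
private:
    std::unordered_map<Key, ReplyHandler, KeyHash> pending_;
};
```

Keying on the sender as well as the message id means a stranger cannot hijack a pending conversation merely by guessing a message id.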
+
+## receive a message with an in‑regards‑to field, with or without an in‑reply‑to field.
+
+Just as before, the dispatch layer looks up the hash-map of state machines
+(instances of message handling classes) and dispatches
+the message to the stateful message handler, which figures out what
+to do with it according to its internal state.  What to do with an
+in‑reply‑to field, if there is one, is something the stateful
+message handler will have to decide.  It might have its own hashmap for
+the in‑reply‑to field, but this would result in state management and state
+transition of huge complexity.  The expected usage is it has a small
+number of static fields in its state that reference a limited number
+of recently sent messages, and if the incoming message is not one
+of them, it treats it as an error.  Typically the state machine is 
+only capable of handling the
+response to its most recent message, and merely wants to be sure
+that this *is* a response to its most recent message.  But it could
+have shot off half a dozen messages with the in‑regards‑to field set,
+and want to handle the response to each one differently.
+Though likely such a scenario would be easier to handle by creating
+half a dozen state machines, each handling its own conversation 
+separately. On the other hand, if it is only going to be a fixed
+and finite set of conversations, it can put all ongoing state in
+a fixed and finite set of fields, each of which tracks the most
+recently sent message for which a reply is expected.
+
+
+## A complex conversation.
+
+We want to avoid complex conversations, and stick to the
+request‑reply pattern as much as possible.  But this is apt to result
+in the server putting a pile of opaque data (a cookie) in its reply,
+which it expects to have sent back with every request.
+And these cookies can get rather large.
+
+Bob decides to initiate a complex conversation with Carol.
+
+He creates an instance of a state machine (instance of a message
+handling class) and sends a message with no in‑regards‑to field
+and no in‑reply‑to field.  When he sends that initial message,
+his state machine gets put in, and owned by,
+the dispatch table according to the message id.
+
+Carol, on receiving the message, also creates a state machine,
+associated with that same message id, albeit the counterparty is
+Bob, rather than Carol.  Her state machine then sends a reply to
+that message with the in‑reply‑to field set, which
+Bob's dispatch layer therefore dispatches to the appropriate state machine
+(message handler).
+
+And then it is back and forth between the two stateful message handlers
+both associated with the same message id until they shut themselves down.
+ 
+## factoring layers.
+
+A layer is code containing state machines that receive messages
+on one machine, and then send messages on to other code on
+*the same machine*.  The sockets layer is the code that receives
+messages from the application layer, and then sends them on the wire,
+and the code that receives messages from the wire,
+and sends messages to the application layer.
+
 But a single state machine at the application level could be
 handling several connections, and a single connection could have
 several state machines running independently, and the
 socket code should not need to care.

-Further, these concurrent communicating processes are going to
+We have a socket layer that receives messages containing an
+opaque block of bytes, and then sends a message to
+the application layer message despatch machinery, for whom the
+block is not opaque, but rather identifies a message type
+meaningful for the despatch layer, but meaningless for the socket layer.
+
+The state machine terminates when its job is done,
+freeing up any allocated memory,
+but the connection endures for the life of the program,
+and most of the data about a connection endures in
+an sql database between reboots.
+
+The connection is a long lived state machine running in
+the sockets layer, which sends and receives what are to it opaque
+blocks of bytes to and from the dispatch layer, and the dispatch
+layer interprets these blocks of bytes as having information
+(message type, in‑regards‑to field and in‑reply‑to field)
+that enables it to despatch the message to a particular method
+of a particular instance of a message handling class in C++,
+corresponding to a particular channel in Go.
+And these message handling classes are apt to be short lived,
+being destroyed when their task is complete.
+
+Because we can have many state machines on a connection,
+most of our state machines can have very little state,
+typically an infinite receive loop, an infinite send receive loop,
+or an infinite receive send loop, which have no state at all.
+We factorize the state machine into many state machines
+to keep each one manageable.
+
+
+## Comparison with concurrent interacting processes in Go
+
+These concurrent communicating processes are going to
 be sending messages to each other on the same machine.
 We need to model Go's goroutines.

@ -93,7 +322,7 @@ in different machines.

 The equivalent of Go channel is not a connection.  Rather,
 one sends a message to the other to request it create a state machine,
-which will be the in-regards-to message, and the equivalent of a
+which will correspond to the in-regards-to message, and the equivalent of a
 Go channel is a message type, the in-regards-to message id,
 and the connection id.  Which we pack into a single class so that we
 can use it the way Go uses channels.
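
One way such a packed class might look, as a sketch only. The field widths and the names `ChannelKey`, `ChannelKeyHash` and `ChannelMap` are assumptions, not part of the existing design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for a Go channel: the triple that
// identifies one conversation of one type on one connection.
struct ChannelKey {
    std::uint32_t message_type;
    std::uint64_t in_regards_to;  // id of the message that started it
    std::uint64_t connection_id;

    bool operator==(const ChannelKey& o) const {
        return message_type == o.message_type &&
               in_regards_to == o.in_regards_to &&
               connection_id == o.connection_id;
    }
};

// Simple hash combiner so the key can index an unordered_map.
struct ChannelKeyHash {
    std::size_t operator()(const ChannelKey& k) const {
        std::size_t h = std::hash<std::uint32_t>{}(k.message_type);
        h = h * 1000003u ^ std::hash<std::uint64_t>{}(k.in_regards_to);
        h = h * 1000003u ^ std::hash<std::uint64_t>{}(k.connection_id);
        return h;
    }
};

// The despatch layer can then map each "channel" to its handler.
template <class Handler>
using ChannelMap = std::unordered_map<ChannelKey, Handler, ChannelKeyHash>;
```

Two keys that differ only in connection id are distinct channels, so the same conversation id can safely be in use on several connections at once.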
@ -108,8 +337,9 @@ correct factor of that state machine, which we have factored into
 as many very small, and preferably stateless, state machines as possible.

 We factor the potentially ginormous state machine into
-many small state machines, in the same style as Go factors a potentially
-ginormous Goroutine into many small goroutines.
+many small state machines (message handling classes),
+in the same style as Go factors a potentially ginormous Goroutine into
+many small goroutines.

 The socket code being a state machine composed of many
 small state machines, which communicates with the application layer
@ -128,47 +358,14 @@ the opaque block of bytes, and the recipient deserializes,
 and sends it to a routine that acts on an object of that
 deserialized class.

-Since the sockets layer does not know the internals of the message struct, the message has 
-to be serialized and deserialized into the corresponding class by the application layer.
-
-Or perhaps the callback routine deserializes the object into a particular class, and then calls
-a routine for that class, but another state machine on the application layer would call the 
-class specific routine directly. The equivalent of Go channel between one state machine on the
-application layer and another in the same application layer is directly calling what
-the class specific routine that the callback routine would call.
-
-The state machine terminates when its job is done,
-freeing up any allocated memory,
-but the connection endures for the life of the program,
-and most of the data about a connection endures in
-an sql database between reboots.
-
-Because we can have many state machines on a connection,
-most of our state machines can have very little state,
-typically an infinite receive loop, an infinite send receive loop,
-or an infinite receive send loop, which have no state at all,
-are stateless.  We factorize the state machine into many state machines
-to keep each one manageable.
+Since the sockets layer does not know the internals
+of the message struct, the message has to be serialized and deserialized
+into the corresponding class by the dispatch layer,
+and thence to the application layer.

 Go code tends to consist of many concurrent processes
 continually being spawned by a master concurrent process,
 and themselves spawning more concurrent processes.
-For most state machines, we do not need recursion,
-so it is reasonable for their state to be a fixed allocation
-inside the state of their master concurrent process.
-In the unlikely event we do need recursion
- we usually only have one instance running at one time,
-so we can allocate an `std::stack` in the master concurrent process.
-
-And what if we do want to spawn many in parallel?
-Well, they will usually be stateless.
-What if they are not not stateless?
-Well that would require an `std::vector` of states.
-And if we need many running in parallel with recursion,
-an `std::vector` with each element containing an `std::stack`.
-And to avoid costly reallocations, we create the `std::vector`
-and the `std::vector`s underlying the `std::stack`s with
-realistic initial allocations that are only infrequently exceeded.

 # flow control and reliability

@ -182,7 +379,9 @@ to handle bulk data transfer is to fork it to use Zooko based keys.
 [Tailscale]:https://tailscale.com/blog/how-nat-traversal-works
 "How to communicate peer-to-peer through NAT firewalls"{target="_blank"}

-[Tailscale] has solved a problem very similar to the one I am trying to solve, albeit their solutions rely on a central human authority, and they recommend:
+[Tailscale] has solved a problem very similar to the one I am trying to solve,
+albeit their solutions rely on a central human authority,
+for which authority they ask money, and they recommend:

 > If you’re reaching for TCP because you want a
 > stream‑oriented connection when the NAT traversal is done,
@ -198,6 +397,10 @@ matter what, if we stall in the socket layer rather than the
 application layer, which makes life a whole lot easier for the
 application programmer, going to need something like Tokio.

+Or we could open up Quic, which we have to do anyway
+to get it to use our keys rather than enemy controlled keys,
+and plug it into our C++ message passing layer.
+
 On the application side, we have to lock each state machine
 when it is active.  It can only handle one message at a time.
 So the despatch layer has to queue up messages and stash them somewhere,
@ -230,6 +433,13 @@ Most of this machinery seems like a re-implementation of Tokio-rust,
 which is a huge project.  I don't wanna learn Tokio-rust, but equally
 I don't want to re-invent the wheel.

+Or perhaps we could just integrate QUIC's internal message
+passing infrastructure into our message passing infrastructure.
+It probably already supports a message passing interface.
+
+Instead of synchronously writing data, you send a message to it
+to write some data, and when it is done, it calls a callback.
+
 # Minimal system

 Prototype.  Limit global bandwidth at the application
--- a/docs/libraries.md
+++ b/docs/libraries.md
@ -151,6 +151,20 @@ paid for in services or crypto currency.
 filecoin provides this, but is useless for frequent small incremental
 backups.

+## Bittorrent DHT library
+
+This is a general purpose library, not all that married to bittorrent.
+
+It is available as an MSYS2 library, MSYS2 being a fork of
+the semi abandoned mingw library, with the result that the name of the
+very dead project Mingw-w64 is all over it.
+
+Its pacman name is mingw-w64-dht, but it has repos all over the place under its own name.
+
+It is async, driven by being called on a timer, and called when
+data arrives.  It contains a simple example program, that enables you to publish any data you like.
+
+
 ## libp2p

 [p2p]:https://github.com/elenaf9/p2p
@ -814,7 +828,17 @@ libraries, but I hear it cursed as a complex mess, and no one wants to
 get into it.  They find the far from easy `cmake` easier.  And `cmake`
 runs on all systems, while autotools only runs on linux.

-I believe `cmake` has a straightforward pipeline into `*.deb` files, but if it has, the autotools pipleline is far more common and widely used.
+MSYS2, which runs on Windows, supports autotools.  So maybe autotools
+does run on Windows.
+
+[autotools documentation]:https://thoughtbot.com/blog/the-magic-behind-configure-make-make-install
+{target="_blank"}
+
+Despite the complaints about autotools, there is [autotools documentation]
+on the web that does not make it sound too bad.
+
+I believe `cmake` has a straightforward pipeline into `*.deb` files,
+but if it has, the autotools pipeline is far more common and widely used.

 ## The standard windows installer

@ -847,6 +871,8 @@ NSIS can create msi files for windows, and is open source.

 [NSIS Open Source repository]

+NSIS is also available as an MSYS2 package.
+
 People who know what they are doing seem to use this open
 source install system, and they  write nice installs with it.

@ -1596,6 +1622,12 @@ You can create a pool of threads processing connection handlers (and waiting
 for finalizing database connection), by running `io_service::run()` from
 multiple threads. See Boost.Asio docs.

+## Asio
+I tried Boost Asio, and concluded it was broken, trying to do stuff that cannot be done,
+and hide stuff that cannot be hidden in abstractions that leak horribly.
+
+But Asio by itself (comes with MSYS2) might work.
+
 ## Asynch Database access

 MySQL 5.7 supports [X Plugin / X Protocol, which allows asynchronous query execution and NoSQL But X devapi was created to support node.js and stuff. The basic idea is that you send text messages to mysql on a certain port, and asynchronously get text messages back, in google protobuffs, in php, JavaScript, or sql. No one has bothered to create a C++ wrapper for this, it being primarily designed for php or node.js](https://dev.mysql.com/doc/refman/5.7/en/document-store-setting-up.html)
--- a/docs/libraries/cpp_automatic_memory_management.md
+++ b/docs/libraries/cpp_automatic_memory_management.md
@ -437,11 +437,23 @@ A class can be explicitly defined to take aggregate initialization
 		}
 	}

-but that does not make it of aggregate type. Aggregate type has *no*
-constructors except default and deleted constructors
+but that does not make it of aggregate type.
+Aggregate type has *no* constructors
+except default and deleted constructors

 # functional programming

+A lambda is a nameless value of a nameless class that is a
+functor, which is to say, has `operator()` defined.
+
+But of course, you can get the class with `decltype`
+and assign that nameless value to an `auto` variable,
+or stash it on the heap with `new`,
+or in preallocated memory with placement `new`.
+
+But if you are doing all that, might as well explicitly define a
+named functor class.
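
A short sketch of all three placements in one function. `lambda_storage_demo` and `Adder` are hypothetical names, and the heap copy uses `std::make_unique` here rather than a raw `new`, which is a substitution for the sake of automatic cleanup.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Stores copies of one lambda in three places and calls two of them.
inline int lambda_storage_demo() {
    int a = 2, b = 3;

    // The nameless value, held in an `auto` variable.
    auto add = [a, b](int x) { return a + b + x; };

    // `decltype` recovers the nameless class so it can be named.
    using Adder = decltype(add);

    // A copy on the heap, owned by a unique_ptr.
    auto heap = std::make_unique<Adder>(add);

    // A copy in preallocated, suitably aligned memory.
    alignas(Adder) unsigned char buffer[sizeof(Adder)];
    Adder* placed = new (buffer) Adder(add);

    int result = (*heap)(1) + (*placed)(10);  // (2+3+1) + (2+3+10)

    placed->~Adder();  // placement new requires an explicit destructor call
    return result;
}
```

The explicit destructor call and the alignment bookkeeping are the sort of ceremony that makes a named functor class the easier choice once a lambda has to outlive its scope.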
+
 To construct a lambda in the heap:

 	auto p = new  auto([a,b,c](){})
@ -475,8 +487,8 @@ going to have to introduce a compile time name, easier to do it as an
 old fashioned function, method, or functor, as a method of a class that
 is very possibly pod.

-If we are sticking a lambda around to be called later, might copy it by
-value into a templated class, or might put it on the heap.
+If we are sticking a lambda around to be called later, might copy
+it by value into a templated class, or might put it on the heap.

 	auto bar = []() {return 5;};