---
title: Server Data Representation
sidebar: true
...
# related
[Replacing TCP, SSL, DNS, CAs, and TLS](TCP.html){target="_blank"}
# clients and hosts, masters and servers
A server does the same things for a master as a host does for a client.
The difference is how identity is seen by third parties. The server's identity
is granted by the master, and if the master switches servers, third parties
scarcely notice. It is the same identity. The client's identity is granted by the
host, and if the client switches hosts, the client gets a new identity, as for
example a new email address.
If we use [Pake and Opaque](libraries.html#opaque-password-protocol) for client login, then all other functionality of
the server is unchanged, regardless of whether the server is a host or a
server. It is just that in the client case, changing servers is going to change
your public key.
Experience with bitcoin is that a division of responsibilities, as between Wasabi wallet and Bitcoin core, is the way to go - that the peer to peer networking functions belong in another process, possibly running on
another machine, possibly running on the cloud.
You want a peer on the blockchain to be well connected with a well
known network address. You want a wallet that contains substantial value
to be locked away and seldom on the internet. These are contradictory
desires, and contradictory functions. Ideally one would be in a basement
and generally turned off, the other in the cloud and always on.
Plus, I have come to the conclusion that C and C++ just suck for
networking apps. Probably a good idea to go Rust for the server or host.
The wallet is event oriented, but only has a small number of concurrent
tasks. A host or server is event oriented, but has a potentially very large
number of concurrent tasks. Rust has no good gui system, there is no
wxWidgets framework for Rust. C++ has no good massive concurrency
system, there is no Tokio for C++.
Where do we put the gui for controlling the server? In the master, of
course.
Where do we put the networking stuff? In the server.
# the select problem
To despatch an `io` event, the standard is `select()`. Which standard sucks
when you have a lot of sockets to manage.
The recommended method for servers with massive numbers of clients is overlapped IO, of which Wikipedia says:
> Utilizing overlapped I/O requires passing an `OVERLAPPED` structure to API functions that normally block, including ReadFile(), WriteFile(), and Winsock's WSASend() and WSARecv(). The requested operation is initiated by a function call which returns immediately, and is completed by the OS in the background. The caller may optionally specify a Win32 event handle to be raised when the operation completes. Alternatively, a program may receive notification of an event via an I/O completion port, *which is the preferred method of receiving notification when used in symmetric multiprocessing environments or when handling I/O on a large number of files or sockets*. The third and the last method to get the I/O completion notification with overlapped IO is to use ReadFileEx() and WriteFileEx(), which allow the User APC routine to be provided, which will be fired on the same thread on completion (User APC is the thing very similar to UNIX signal, with the main difference being that the signals are using signal numbers from the historically predefined enumeration, while the User APC can be any function declared as "void f(void* context)"). The so-called overlapped API presents some differences depending on the Windows version used.[1]
>
> Asynchronous I/O is particularly useful for sockets and pipes.
>
> Unix and Linux implement the POSIX asynchronous I/O API (AIO)
Which kind of hints that there might be a clean mapping between Windows `OVERLAPPED` and Linux `AIO*`.
Because generating and reading the `select()` bit arrays takes time
proportional to the largest fd that you provided for `select()`, `select()`
scales terribly when the number of sockets is high.
Different operating systems have provided different replacement functions
for select. These include `WSAPoll()`, `epoll()`, `kqueue()`, and `evports()`. All of these give better performance than `select()`, and all give O(1) performance
for adding a socket, removing a socket, and for noticing that a socket is
ready for IO. (Well, `epoll()` does when used in edge triggered (`EPOLLET`)
mode. It has a `poll()` compatibility mode which fails to perform when you
have a large number of file descriptors.)
Windows has `WSAPoll()`, which can be a blocking call, but if it blocks
indefinitely, the OS will send an alert callback to the paused thread
(asynchronous procedure call, APC) when something happens. The
callback cannot do another blocking call without crashing, but it can do a
nonblocking poll, followed by a nonblocking read or write as appropriate.
This is analogous to the Linux `epoll()`, except that `epoll()` becomes ungodly
slow, rather than crashing. The practical effect is that "wait forever"
becomes "wait until something happens that the APC did not handle, or
that the APC deliberately provoked".
Using the APC in Windows gets you behavior somewhat similar in effect
to using `epoll()` with `EPOLLET` in Linux. Not using the APC gets you
behavior somewhat similar in effect to Linux `poll()` compatibility mode.
Unfortunately, none of the efficient interfaces is a ubiquitous standard. Windows has `WSAPoll()`, Linux has `epoll()`, the BSDs (including Darwin) have `kqueue()`, … and none of these operating systems has any of the others. So if you want to write a portable high-performance asynchronous application, you'll need an abstraction that wraps all of these interfaces, and provides whichever one of them is the most efficient.
The Libevent api wraps various unix like operating system efficient replacements, but unfortunately missing from its list is the windows efficient replacement.
The way to make them all look alike is to make them look like event
handlers that have a pool of threads that fish stuff out of a lock free
priority queue of events, create more threads capable of handling this kind
of event if there is a lot of stuff in the queue and more threads are needed,
and release all threads but one that sleeps on the queue if the queue is
empty and stays empty.
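A minimal sketch of that shape, under loud assumptions: the names `Event`, `EventPool`, and `run_events` are hypothetical, the pool has a fixed number of workers rather than growing and shrinking with load, and a mutex protected `std::priority_queue` stands in for the lock free queue. It only shows workers sleeping on the queue and fishing events out of it in priority order.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical event: larger priority is handled first.
struct Event {
    int priority;
    std::function<void()> handler;
    bool operator<(const Event& other) const { return priority < other.priority; }
};

class EventPool {
public:
    explicit EventPool(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    ~EventPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closing_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();   // workers drain the queue before exiting
    }
    void post(int priority, std::function<void()> handler) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(Event{priority, std::move(handler)});
        }
        ready_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            // Sleep on the queue until there is work or the pool is closing.
            ready_.wait(lock, [this] { return closing_ || !queue_.empty(); });
            if (queue_.empty()) return;      // closing and nothing left to do
            Event ev = queue_.top();
            queue_.pop();
            lock.unlock();
            ev.handler();                    // run the handler outside the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable ready_;
    std::priority_queue<Event> queue_;
    bool closing_ = false;
    std::vector<std::thread> threads_;
};

// Posts n events to a pool of four workers and returns how many handlers ran.
int run_events(int n) {
    std::atomic<int> handled{0};
    {
        EventPool pool(4);
        for (int i = 0; i < n; ++i)
            pool.post(i % 3, [&handled] { ++handled; });
    }   // the destructor returns only after the queue has been drained
    return handled.load();
}
```

The platform specific socket code would sit behind `post()`, so the handlers never see whether the event came from overlapped IO or from `epoll()`.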
Trouble is that windows and linux are just different. Except both support
select, but everyone agrees that select really sucks, and sucks worse the
more connections.
A windows gui program with a moderate number of connections should use windows asynchronous sockets, which are designed to deliver events on the main windows gui event loop, designed to give you the benefits of a separate networking thread without the need for a separate networking thread. Linux does not have asynchronous sockets. Windows servers should use overlapped io, because they are going to need ten thousand sockets, and they do not have a window.
Linux people recommended a small number of threads, reflecting real hardware threads, and one edge triggered `epoll()` per thread, which sounds vastly simpler than what windows does.
I pray that wxWidgets takes care of mapping windows asynchronous sockets to their near equivalent functionality on Linux.
But writing a server/host for Linux is fundamentally different to
writing one for windows. Maybe we can isolate the differences by having
pure windows sockets, startup and shutdown code, pure Linux sockets,
startup and shutdown code, having the sockets code stuff data to and from
lockless priority queues (which revert to locking when a thread needs to
sleep or start up). Or maybe we can use wxWidgets. Perhaps worrying
about this stuff is premature optimization. But the samples directory has
no service examples, which suggests that writing services in wxWidgets is
a bad idea. And it is an impossible idea if we are going to write in Rust.
Tokio, however, is a Rust framework for writing services, which runs on
both Windows and Linux. Likely Tokio hides the differences, in a way
optimal for servers, as wxWidgets hides them in a way optimal for guis.
# the equivalent of RAII in event oriented code
Futures, promises, and cooperative multi tasking.
That is, async await. Implemented in a Rust library.
This is how a server can have ten thousand tasks dealing with ten thousand clients.
Implemented in C++20 as co_return, co_await, and co_yield, co_yield
being the C++ equivalent of Rust's poll. But C++20 has no standard
coroutine libraries, and various people's half baked ideas for a coroutine
library don't seem to be in actual use solving real problems just yet, while
actual people are using the Rust library to solve real world problems.
I have read reviews by people attempting to use C++20 co-routines, and
the verdict is that they are useless and unusable,
and that we should use fibres instead. Fibres?
Boost fibres provide multiple stacks on a single thread of execution. But
the consensus is that [fibres just massively suck](https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989).
But suppose we don't use the stack. We just put everything into a struct and disallow recursion (except by creating a new struct). Then we have the functionality of fibres and coroutines, with code continuations.
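A minimal sketch of such a code continuation, assuming a made up task: read one length byte, then that many payload bytes. `ReadMessage` and `feed` are hypothetical names. Everything that would otherwise live on a stack lives in the struct, so the task can be parked between any two input events and resumed by whatever event loop owns it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical task state: no stack frame survives between events.
struct ReadMessage {
    enum class State { WantLength, WantPayload, Done };
    State state = State::WantLength;
    std::size_t remaining = 0;
    std::string payload;

    // Resume with one input byte; returns true once the message is complete.
    bool resume(unsigned char byte) {
        switch (state) {
        case State::WantLength:
            remaining = byte;
            state = remaining ? State::WantPayload : State::Done;
            break;
        case State::WantPayload:
            payload.push_back(static_cast<char>(byte));
            if (--remaining == 0) state = State::Done;
            break;
        case State::Done:
            break;                 // extra input is ignored
        }
        return state == State::Done;
    }
};

// Feeds bytes to the task one event at a time, as an event loop would.
std::string feed(const std::string& wire) {
    ReadMessage task;
    for (unsigned char byte : wire)
        if (task.resume(byte)) break;
    return task.payload;
}
```

Ten thousand such structs cost ten thousand small allocations, not ten thousand stacks. The price is that the control flow is written out by hand as a state machine.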
Word is that co_return, co_await, and co_yield do stuff that is
complicated, difficult to understand, and frequently surprising and not
what you want, but with std::future, you can reasonably straightforwardly
do massive concurrency, provided you have your own machinery for
scheduling tasks. Maybe we do massive concurrency with neither fibres,
nor coroutines -- code continuations or close equivalent.
> we get not coroutines with C++20; we get a coroutines framework.
> This difference means, if you want to use coroutines in C++20,
> you are on your own. You have to create your coroutines based on
> the C++20 coroutines framework.
C++20 coroutines seem to be designed for the case of two tasks each of
which sees the other as a subroutine, while the case that actually matters
in practice is a thousand tasks holding a relationship with a thousand clients.
(Javascript's async). It is far from obvious how one might do what
Javascript does using C++20 coroutines, while it is absolutely obvious
how to do it with Goroutines.
## Massive concurrency in Rust
Well supported, works, widely used.
The way Rust does things is that the input that you are waiting for is itself a
future, and that is what drives the cooperative multi tasking engine.
When the event happens, the future gets flagged as fulfilled, so the next
time the polling loop is called, co_yield never gets called. And the polling
loop in your await should never get called, unless the event arrives on the
event queue. The Tokio tutorial explains the implementation in full detail.
From the point of view of procedural code, await is a loop that endlessly
checks a condition, calls yield if the condition is not fulfilled, and exits the
loop when the condition is fulfilled. But you would rather it does not
return from yield/poll until the condition is likely to have changed. And
you would rather the outermost future pauses the thread if nothing has
changed, if the event queue is empty.
The right way to implement this is have the stack as a tree. Not sure if
Tokio does that. C++20 definitely does not but then it does not do
anything. It is a pile of matchsticks and glue, and they tell you to build
your own boat.
[Tokio tutorial]:https://tokio.rs/tokio/tutorial/async
The [Tokio tutorial] discusses this and tells us how they dealt with it.
> Ideally, we want mini-tokio to only poll futures when the future is
> able to make progress. This happens when a resource that the task
> is blocked on becomes ready to perform the requested operation. If
> the task wants to read data from a TCP socket, then we only want
> to poll the task when the TCP socket has received data. In our case,
> the task is blocked on the given Instant being reached. Ideally,
> mini-tokio would only poll the task once that instant in time has
> passed.
> To achieve this, when a resource is polled, and the resource is not
> ready, the resource will send a notification once it transitions into a
> ready state.
The mini Tokio tutorial shows you how to implement your own efficient
futures in Rust, and, because at the bottom you are always awaiting an
efficient future, all your futures will be efficient. You have, however,
all the tools to implement an inefficient future, and if you do, there will be a
lot of spinning. So if everyone is inefficiently waiting on a future that is
inefficiently waiting on a future that is waiting on a network event or
timeout, and the network events and timeout futures are implemented
efficiently, you are done.
If you cheerfully implement an inefficient future, which however calls an
efficient future, it stops spinning.
> When a future returns Poll::Pending, it must ensure that the wake
> is signalled at some point. Forgetting to do this results in the task
> hanging indefinitely
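
The poll and wake contract can be sketched outside Rust. What follows is not Tokio's implementation: it is a single threaded toy in C++, with hypothetical names (`Mailbox`, `Task`, `Executor`, `demo_polls`), in which a `Mailbox` stands in for a socket or timer. A task that returns `Pending` first leaves its waker with the resource, and the executor polls only tasks that were spawned or woken, so nothing spins.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

enum class Poll { Pending, Ready };
using Waker = std::function<void()>;

// A resource that becomes ready once: stands in for a socket or a timer.
struct Mailbox {
    bool ready = false;
    int value = 0;
    Waker waker;                       // whoever is waiting on this mailbox
    void deliver(int v) {
        value = v;
        ready = true;
        if (waker) waker();            // the notification that Pending promises
    }
};

// A task that waits for one mailbox and counts how often it is polled.
struct Task {
    Mailbox* inbox;
    int polls = 0;
    int result = 0;
    explicit Task(Mailbox* m) : inbox(m) {}
    Poll poll(const Waker& wake_me) {
        ++polls;
        if (!inbox->ready) {
            inbox->waker = wake_me;    // returning Pending without this would hang
            return Poll::Pending;
        }
        result = inbox->value;
        return Poll::Ready;
    }
};

// Polls only tasks that were spawned or woken; idle tasks are never touched.
struct Executor {
    std::vector<Task> tasks;
    std::deque<std::size_t> runnable;
    void spawn(Mailbox* inbox) {
        tasks.push_back(Task(inbox));
        runnable.push_back(tasks.size() - 1);
    }
    void run_until_idle() {
        while (!runnable.empty()) {
            std::size_t id = runnable.front();
            runnable.pop_front();
            tasks[id].poll([this, id] { runnable.push_back(id); });
        }
    }
};

// Completes two tasks and returns the total number of polls that took.
int demo_polls() {
    Mailbox a, b;
    Executor ex;
    ex.spawn(&a);
    ex.spawn(&b);
    ex.run_until_idle();   // each task polled once, both Pending
    b.deliver(7);          // wakes the second task only
    a.deliver(5);          // wakes the first task
    ex.run_until_idle();   // each task polled once more, both Ready
    assert(ex.tasks[0].result == 5 && ex.tasks[1].result == 7);
    return ex.tasks[0].polls + ex.tasks[1].polls;
}
```

Each task is polled exactly twice, once when spawned and once when woken. Deleting the line that stores the waker reproduces the hang the tutorial warns about.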
Multithreading, as implemented in C++, Rust and Julia, does not scale to
huge numbers of concurrent processes the way Go does.
notglowing, a big fan of Rust, tells me,
> No, but like in most other languages, you can solve that with
> asynchronous code for I/O bound operations. Which is the kind
> of situation where you'd consider Go anyways.
> With Tokio, I can spawn an obscene number of Tasks doing I/O
> work asynchronously, and only use a few threads.
> It works really well, and I have been writing async code using
> Tokio for a project I am working on.
> Async/await semantics are the next best thing after Goroutines.
> Frameworks like Actix-web leverage Tokio to get web
> server performance superior to any Go framework I know of.
> Go's concurrency model might be superior, but Rust's lightweight
> runtime, lack of GC pauses that can cause inconsistent
> performance, and overall low level control give it the edge it needs
> to beat Go in practical scenarios.
I looked up Tokio and Actix, looks like exactly what the doctor ordered.
So, if you need async, you need Rust. C++ is build your own boat out of
matchsticks and glue.
The async await syntax and semantics are, in effect, multithreading on cheap
threads that only have cooperative yielding.
So you have four real threads, and ten thousand tasks, the effective
equivalent of ten thousand cheap "threads".
I conjecture that the underlying implementation is that the async await
keywords turn your stack into a tree, and each time a branch is created and
destroyed, it costs a small memory allocation/deallocation.
With real threads, each thread has its own full stack, and stacks can be
costly, while with async await, each task is just a small branch of the tree.
Instead of having one top of stack, you have ten thousand leaves with one
root at start of thread, while having ten thousand full stacks would bring
your program to a grinding halt.
It works like an event oriented program, except the message pumps do not
have to wait for events to complete. Tasks that are waiting around, such as
the message pump itself, can get started on the next thing, while the
messages it dispatched are waiting around.
As recursing piles more stuff on the stack, asynching branches the stack,
while masses of threads give you masses of stacks, which can quickly
bring your computer to a grinding halt.
Resource acquisition, disposition, and release depend on network and timer
events.
RAII guarantees that the resource is available to any function that may
access the object (resource availability is a class invariant,
eliminating redundant runtime tests). It also guarantees that all
resources are released when the lifetime of their controlling object
ends, in reverse order of acquisition.
In a situation where a multitude of things can go wrong, but seldom do,
you would, without RAII, wind up with an exponentially large number of
seldom tested code paths for backing out of the situation. RAII means that
all the possibilities are automagically taken care of in a consistent way,
and you don't have to think about all the possible combinations and
permutations.
RAII plus exceptions shuts down a potentially exponential explosion of
code paths.
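A small sketch of that claim, with hypothetical names (`Resource`, `attempt`) and a log in place of real sockets and secrets. Whichever step throws, exactly the resources already acquired are released, in reverse order, and no back-out path is written by hand.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource that records its acquisition and release in a log.
class Resource {
public:
    Resource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) { log_.push_back("acquire " + name_); }
    ~Resource() { log_.push_back("release " + name_); }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Acquires three resources; fail_at selects the step that throws (0 = none).
std::vector<std::string> attempt(int fail_at) {
    std::vector<std::string> log;
    try {
        Resource socket("socket", log);
        if (fail_at == 1) throw std::runtime_error("no route");
        Resource secret("secret", log);
        if (fail_at == 2) throw std::runtime_error("bad handshake");
        Resource buffer("buffer", log);
    } catch (const std::runtime_error&) {
        // By the time we get here, unwinding has already released everything.
        log.push_back("failed");
    }
    return log;
}
```

With three steps there are three failure points and one success path, and all four are handled by the same three destructors.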
Our analog of the situation that RAII deals with is that we dispatch
messages A and B, and create the response handler for B in the response
handler for A. But A might fail, and B might get a response before A.
With async await, we await A, then await B, and if B has already arrived,
our await for B just goes right ahead, never calling yield, but removing
itself from the wake notifications.
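The same ordering can be sketched with `std::future`, which blocks a real thread rather than yielding, so this shows the control flow only and is no substitute for async await. The function `exchange` is hypothetical, and the replies are simulated by fulfilling the promises directly where a network layer would do it.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical exchange: the network layer fulfils one promise per reply.
std::string exchange(bool a_fails) {
    std::promise<std::string> reply_a, reply_b;
    std::future<std::string> a = reply_a.get_future();
    std::future<std::string> b = reply_b.get_future();

    // Simulated network: the reply to B arrives before the reply to A.
    reply_b.set_value("B ok");
    if (a_fails)
        reply_a.set_exception(std::make_exception_ptr(std::runtime_error("A failed")));
    else
        reply_a.set_value("A ok");

    try {
        std::string first = a.get();    // "await A": rethrows here if A failed
        std::string second = b.get();   // "await B": already fulfilled, returns at once
        return first + ", " + second;
    } catch (const std::runtime_error& e) {
        return e.what();                // b goes out of scope; nothing to unregister by hand
    }
}
```

The handler for B never has to be created inside the handler for A, and a failure of A abandons B through ordinary scope exit.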
Go already has technology and idiom for message handling. Maybe the
solution for this problem is not to re-invent Go technology in C++ using
[perfect forwarding] and lambda functions, but to provide a message
interface to Go in C.
But there is less language mismatch between Rust and C++ than between
Go and C++.
And maybe C++20 has arrived in time, have not checked the availability
of co_await, co_return, and co_yield.
[perfect forwarding]:https://cpptruths.blogspot.com/2012/06/perfect-forwarding-of-parameter-groups.html
On the other hand, Go's substitute for RAII is the defer statement, which
presupposes that a resource is going to be released at the same stack level
as it was acquired, whereas when I use RAII I seldom know, and it is often
impossible to predict, at what stack level a resource should be released,
because the resource is owned by a return value, typically created by a
constructor.
On checking out Go's implementation of message handling, it is all things
for which C++ provides the primitives, and Go has assembled the
primitives into very clean and easy to use facilities. Which facilities are
not hard to write in C++.
The clever solution used by Go is typed, potentially bounded [channels],
with a channel being able to transmit channels, and the select statement.
You can also do all the hairy shared memory things you do in C++, with
less control and elegance. But you should not.
[channels]:https://golang.org/doc/effective_go#concurrency
"concurrency"
What makes Go multithreading easy is channels and select.
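To make "not hard to write in C++" concrete, here is a minimal sketch of a typed, bounded, blocking channel built from standard primitives. It is an assumption laden toy, not Go's implementation: `Channel` and `sum_through_channel` are hypothetical names, there is no `close`, and the harder half, `select` over several channels, is not attempted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical minimal channel: typed, bounded, blocking on both ends.
template <typename T>
class Channel {
public:
    explicit Channel(std::size_t capacity) : capacity_(capacity) {}
    void send(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        // A full channel makes the sender wait: this is the back pressure.
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(value));
        not_empty_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return value;
    }
private:
    std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::deque<T> queue_;
};

// A producer thread sends 1..n through a channel of capacity 2;
// the caller receives them and returns their sum.
int sum_through_channel(int n) {
    Channel<int> numbers(2);
    std::thread producer([&numbers, n] {
        for (int i = 1; i <= n; ++i) numbers.send(i);
    });
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += numbers.receive();
    producer.join();
    return sum;
}
```

A channel that transmits channels would carry `std::shared_ptr<Channel<T>>` values. What this sketch cannot give is the cheapness of a goroutine: each blocked sender or receiver here is a whole thread.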
This is an implementation of Communicating Sequential Processes, which is
that input, output, and concurrency are useful and effective primitives, that
can elegantly and cleanly express algorithms, even if they are running on a
computer that physically can only execute a single thread, that
concurrency as expressed by channels is not merely a safe way of
multithreading, but a clean way of expressing intent and procedure to the
computer.
Goroutines are less than a thread, because they are using some multiplexed
thread's stack. They live in an environment where a stable and small pool
of threads is despatched to function calls, and when a goroutine is stopped,
because it is attempting to communicate, its stack state, which is
usually quite small, is stashed somewhere without destroying and creating an
entire thread. They need to be lightweight, because they are used to express
algorithms, with parallelism not necessarily being an intended side effect.
The relationship between goroutines and node.js continuations is that a
continuation is a small packet of state that will receive an event, and a
paused goroutine is a small packet of state that will receive an event. Both
approaches seem comparably successful in expressing concurrent algorithms,
though node.js is single threaded.
Node.js uses async/await, and by and large, more idiots are successfully
using node.js than successfully using Go, though Go solutions are far more
lightweight than node.js solutions, and in theory Rust solutions should be
lighter still.
So maybe I do need to invent a C++ idiom for this problem. Well, a Rust
library already has the necessary idiom. Use Tokio in Rust. Score for
powerful macro language and sum types. Language is more expandable.
# Unit Test
It is hard to unit test a client server system, therefore most people
unit test using mocks: fake classes that do not really interact with the
external world replace the real classes in that part of the
unit test that deals with external interaction with clients and servers.
Your unit test runs against a dummy client and a dummy server, thus the
unit test code necessarily differs from the real code.
But this will not detect bugs in the real class being mocked, which therefore
has to be relatively simple and unchanging, not that it is necessarily all
that practical to keep it simple and unchanging: consider the messy
irregularity of TCP.
Any message is an event, and it is a message between an entity identified
by one elliptic point, and an entity identified by another elliptic point.
We intend that a process is identified by many elliptic points, so that it has
a stable but separate identity on every server. Which implies that it can
send messages to itself, and these will look like messages from outside.
The network address will be an opaque object. Which is not going to help
us unit test code that has to access real network addresses, though our
program undergoing unit test can perform client operations on itself,
assuming in process loopback is handled correctly. Or maybe we just have
to assume a test network, and our unit test program makes real accesses
over the real internet.
But our basic architecture is that we have an opaque object representing a
communication node, it has a method that creates a connection, and you
can send a message on a connection, and receive a reply event on that
connection.
Sending a message on a connection creates the local object that will handle
the reply, and this local object's lifetime is managed by hash code tables --
or else this local object is stored in the database, written to disk in the
event that sends the message, and read from disk in the event that handles
the reply to the message.
Object representing server 🢥 Object representing connection to server 🢥
object representing request-response.
We send messages between entities identified by their elliptic points, we
get events on the receiving entity when these messages arrive, generate
replies, and get an event on the sending entity when the reply is received.
And one of the things in these messages will be these entities and
information about these entities.
So we create our universal class, which may be mocked, whereby a client
takes an opaque data structure representing a server, and makes a request,
thereby creating a channel, on which channel it can create additional
requests. It can then receive a reply on this channel, and make further
requests, or replies to replies, sequentially on this channel.
We then layer this class on top of this class, as for example setting up a
shared secret, timing out channels and establishing new ones, so we have
as much real code as possible, implementing request object classes in
terms of request object classes, so that we can mock any one layer in the
hierarchy.
At the top layer, we don't know we are re-using channels, and don't know we
are re-using secrets. We don't even keep track of the transient secret
scalar and transient shared secret point, because that might be discarded and
reconstructed. All this stuff lives in an opaque object representing the
current state of our communication with the server, which is, at the topmost
level, identified by a database record, and/or an object instantiated from a
database record and/or a handle to that object and/or a hash code to that
handle.
Since we are using an opaque object of an opaque type, we can freely mix
fake objects with real ones. Unit test will result in fake communications
over fake channels with fake external clients and servers.
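A minimal sketch of that arrangement, with every name hypothetical (`Node`, `Connection`, `FakeNode`, `FakeConnection`, `ask`) and none taken from the wallet code. The code under test sees only the abstract node and connection, so a fake that replies in process can stand in for a real server.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// The local object that will handle the reply event.
using ReplyHandler = std::function<void(const std::string& reply)>;

struct Connection {
    virtual ~Connection() = default;
    // Sends a request; on_reply is invoked when the reply event arrives.
    virtual void send(const std::string& request, ReplyHandler on_reply) = 0;
};

// Opaque object representing a communication node (a server).
struct Node {
    virtual ~Node() = default;
    virtual std::unique_ptr<Connection> connect() = 0;
};

// Fake used in unit tests: replies immediately, in process, with no network.
struct FakeConnection : Connection {
    void send(const std::string& request, ReplyHandler on_reply) override {
        on_reply("echo:" + request);
    }
};
struct FakeNode : Node {
    std::unique_ptr<Connection> connect() override {
        return std::unique_ptr<Connection>(new FakeConnection);
    }
};

// Code under test: it cannot tell a fake node from a real one.
std::string ask(Node& server, const std::string& question) {
    std::string answer;
    std::unique_ptr<Connection> conn = server.connect();
    conn->send(question, [&answer](const std::string& reply) { answer = reply; });
    return answer;
}
```

A real connection would invoke the handler later, from the event loop, so `ask` as written only works against the fake. Layers that set up shared secrets or time out channels would be further `Connection` implementations wrapping another `Connection`, which is what lets any one layer be mocked.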
# Factorizing the problem
Why am I reinventing OMEMO, XMPP, and OTR?
These projects are quite large, and have a vast pile of existing code.
On the other hand OTR seems an unreasonably complicated way of
adding on what you get for free with perfect forward secrecy.
Authentication without signing is just the natural default for perfect
forward secrecy, and signing has to be added on top. You get OTR (Off the
Record) for free just by leaving stuff out. XMPP is a presence protocol, which is
just name service, which is integral to any social networking system. Its
pile of existing code supports Jitsi's wonderful video conferencing system,
which would be intolerably painful to reinvent.
And OMEMO just does not do the job. It guarantees you have a private
room with the people you think you have a private room with, but how did
you come to decide you wanted a private room with those people and not
others? It leaves the hard part of the problem out of scope.
The problem is factoring a heap of problems that lack obvious boundaries
between one problem and the next. You need to find the smallest factors
that are factors of all these big problems, and find a solution to your problems
that is a suitable module of a solution to all these big problems.
But you don't want to factorize all the way down, otherwise when you
want a banana, you will get a banana tree, a monkey, and a jungle. You
want the largest factors that are common factors of more than one problem
that you have to solve.
And a factor that we identify is that we create a shared secret with a
lifetime of around twenty minutes or so, longer than the lifetime of the
TCP connections and longer than the query-response interactions, that
ensures:
* Encryption (eavesdroppers learn almost nothing)
* Authentication (the stable identity of the two wallets, no man in the
middle attack)
* Deniability (the wallets have proof of authentication, but no proof
that they can show to someone else)
* Perfect forward secrecy (if the wallet secrets get exposed, their
communications remain secret)
Another factor we identify is binding a group of remote object method
calls together to a single one that must fail together, of which problem
a reliability layer on top of UDP is a special case. But we do not want
to implement our own UDP reliability layer, when [QUIC] has already been
developed and widely deployed. We notice that to handle this case, we
need not an event object referenced by an event handle and an event
hashcode, but rather an array of event objects referenced by an event
handle, an event hashcode, and the sequence number within that array.
## streams, secrets, messages, and authentication
To leak minimal metadata, we should encrypt the packets with
XChaCha20-SIV, or use a random nonce. (Random nonces
are conveniently the default case in libsodium's `crypto_box_easy`.) The port
should be random for any one server and any one client, to make it slightly
more difficult to sweep up all packets using our encryption. Any time we
distribute new IP information for a server, also distribute new open port information.
XChaCha20-SIV is deterministic encryption, and deterministic encryption
will leak information unless every message sent with a given key is
guaranteed to be unique - in effect, we have the nonce inside the
encryption instead of outside. Each packet must contain a forever
incrementing packet number, which gets repeated but with a different send
time, and perhaps an incremented resend count, on reliable messaging
resends. This gets potentially complicated, hard to maintain, and easy to break.
Neither protocol includes authentication. The crypto_box wraps the
authentication with the encryption. You need to add the authenticator after
encryption and before decryption, as crypto_box does. The principle of
cryptographic doom is that if you don't, someone will find some clever
way of using the error messages your higher level protocol generates to
turn it into a decryption/encryption oracle.
Irritatingly, `crypto_box_easy_*` defaults to XSalsa, and I prefer XChaCha.
However `crypto_box_curve25519xchacha20poly1305.*easy.*` in
`crypto_box_curve25519xchacha20poly1305.h` wraps it all together. You
just have to call those instead of `crypto_box_easy.*` Which is likely to be
a whole lot easier and safer than wrapping XChaCha20-SIV.
For each `crypto_box` function, there is a corresponding
`crypto_box_curve25519xchacha20poly1305` function, apart from
some special cases that you probably should not be using anyway.
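So, redefine each crypto_box function to use XChaCha20:
```C++
#include <sodium.h>
// One alias per function actually used; «whatever» stands for easy, open_easy, keypair, ...
// The namespace cannot itself be named crypto_box: libsodium already declares a function of that name.
namespace xcrypto_box{
const auto& «whatever» = crypto_box_curve25519xchacha20poly1305_«whatever»;
}
```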
Nonces are intended to be communicated in the clear, thus sequential
nonces inevitably leak metadata. Don't use sequential nonces. Put the
packet number and message number or numbers inside the authenticated encryption.
Each packet of a packetized message will contain the windowed
message id of the larger message of which it is part, the id of the thread or
thread pool that will ultimately consume it, the size of the larger message
of which it is part, the number of packets in the larger message of which it
is part, and its packet and byte position within that larger message. The
repetition is required to handle out of order messages and messages with
lost packets.
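The fields listed above can be sketched as a header struct. The text fixes only which fields exist; the field widths, the names (`PacketHeader`, `packetize`), and the rule that an empty message still occupies one packet are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout; this header travels inside the authenticated encryption.
struct PacketHeader {
    std::uint32_t message_id;      // windowed sequential id of the larger message
    std::uint32_t consumer_id;     // thread or thread pool that will consume it
    std::uint64_t message_bytes;   // size of the larger message
    std::uint32_t packet_count;    // number of packets in the larger message
    std::uint32_t packet_index;    // position of this packet, counted in packets
    std::uint64_t byte_offset;     // position of this packet, counted in bytes
};

// Builds the headers for a message split into packets of at most payload_bytes.
std::vector<PacketHeader> packetize(std::uint32_t message_id, std::uint32_t consumer_id,
                                    std::uint64_t message_bytes, std::uint64_t payload_bytes) {
    std::vector<PacketHeader> headers;
    if (payload_bytes == 0) return headers;   // no room for payload: nothing sensible to emit
    // Assumption: an empty message still needs one packet to announce itself.
    std::uint64_t count = message_bytes == 0
        ? 1
        : (message_bytes + payload_bytes - 1) / payload_bytes;
    for (std::uint64_t i = 0; i < count; ++i)
        headers.push_back(PacketHeader{message_id, consumer_id, message_bytes,
                                       static_cast<std::uint32_t>(count),
                                       static_cast<std::uint32_t>(i),
                                       i * payload_bytes});
    return headers;
}
```

Because every packet repeats the message size and packet count, a receiver that sees only packet 2 of 3 already knows how large a buffer to reserve and which packets to nack.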
Similarly for a single packet message, and for each message in a multi
message packet.
Message ids will be windowed sequential, and messages lost in entirety will be reliably resent because their packets will be reliably resent.
If we allocate each packet buffer from the heap, and free it when it is used,
this does not make much of a dent in performance until we are processing
well over a Gib/s.
So we can worry about efficient allocation after we have released software
and it is coming under heavy load.
Another more efficient way would be to have a pool of 16KiB blocks,
allocate one of them to a connection whenever that connection needs it,
allocate packet buffers sequentially in a 16KiB block, incrementing a
count, free up packet buffers in the block when a packet is dealt with,
decrementing the count. When the count returns to zero, the block goes back to the
free pool, which is accessed in lifo order. Every few seconds the pool is
checked, and if there are a number of buffers that have not been used in the
last few minutes, we free them. We organize things so that an inactive
connection has no packet buffers associated with it. But this is fine tuning
and premature optimization.
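For the record, a sketch of the counted block and the lifo free pool, should that day come. The names (`Block`, `BlockPool`, `kBlockBytes`) are hypothetical, the pool is single threaded, and the periodic trimming of long idle blocks is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t kBlockBytes = 16 * 1024;

// One 16 KiB block: packet buffers are carved off sequentially and counted.
struct Block {
    unsigned char bytes[kBlockBytes];
    std::size_t used = 0;   // bytes handed out so far
    std::size_t live = 0;   // packet buffers not yet released

    // Returns nullptr when the block cannot hold another `size` bytes.
    unsigned char* allocate(std::size_t size) {
        if (size > kBlockBytes - used) return nullptr;
        unsigned char* p = bytes + used;
        used += size;
        ++live;
        return p;
    }
    // Call once per packet dealt with; true means the count is back to zero.
    bool release() { return --live == 0; }
};

// Free blocks are reused in lifo order: the most recently returned block is taken first.
class BlockPool {
public:
    std::unique_ptr<Block> take() {
        if (free_.empty()) return std::unique_ptr<Block>(new Block);
        std::unique_ptr<Block> block = std::move(free_.back());
        free_.pop_back();
        return block;
    }
    void give_back(std::unique_ptr<Block> block) {
        block->used = 0;
        block->live = 0;
        free_.push_back(std::move(block));
    }
    std::size_t idle_blocks() const { return free_.size(); }
private:
    std::vector<std::unique_ptr<Block>> free_;
};
```

Lifo order keeps the recently used blocks in use, so the blocks at the bottom of the pool are the ones that go stale and can be freed by the periodic check.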
The recipient will nack the sender about any missing packets within a
multipacket message. The sender will not free up any memory containing
packets that have not been acked, and the receiver will not free up any
memory that has not been handled by the thread that ultimately receives
the data.
Experimenting with memory allocation and deallocation times, looks like
a sweet spot is to allocate in 16KiB blocks, with the initial fifo queue
being allocated with two 16KiB blocks as soon as activity starts, and the
entire fifo queue deallocated when it is empty. If we allocated, deallocated
when activity stops, and re-allocated every millisecond, it would not
matter much, and we will be doing it far less often than that, because we
will be keeping the buffer around for at least one round trip time. If every
active queue has on average sixty four KiB, and we have sixteen thousand
simultaneous active connections, it only costs a gigabyte. This rather
arbitrary guesstimated value seems good enough that it does not waste too
much memory, nor too much time. Memory for input output streams
seems cheap, might as well cheerfully spend plenty, perhaps a lot more
than necessary, so as to avoid hitting other limits.
We want connections, the shared secrets, identity data, and connection parameters, hanging around for a very long time of inactivity, because
they are something like logins. We don't want their empty data stream
buffers hanging around. Re-establishing a connection takes hundreds
of times longer than allocating and deallocating a buffer.
We also want, in a situation of resource starvation, to make the
connections that are the heaviest users wait. They should not send until
told space is available, and we just don't make it available, because their
buffer got emptied out, then thrown away, and they just have to wait their
turn till the server clears them to get a new one allocated when they send data.
If the server has too much work, a whole lot of connections get idled for
longer and longer periods, and while idled, their buffers are discarded.
When we have a real world application facing real world heavy load, then
we can fuss about fine tuning the parameters.
The packet stream that is being resolved (the packets, their time of arrival and
sending, whether they were acked or nacked, ack status, and all that) goes into a
first in first out random access queue, composed of fixed size blocks larger than
the packet size.
We hope that C++ implements large random access fifo queues with
mmap. If it does not, we will eventually have to write our own.
Each block starts with metadata that enables the stream of fixed sized
blocks to be interpreted as a stream of variable sized packets and the
metadata about those packets. The block size in bits, and the size of the
block and packet metadata, are negotiated, but initially only 4 KiB (32 Kibit)
blocks will be supported. The format of metadata that is referenced or defined
within packets is also negotiated, though initially the only format will be
format number one. Obviously each side is free to define its own format for
the metadata outside of packets, but it has to be the same size at both
ends. Each party can therefore demand any metadata size it wants, subject
to some limit, for metadata outside the packets.
The packets are aligned within the blocks so that the 512 bit blocks to be
encrypted or decrypted are aligned with the blocks of the queue, so the
blocks of the queue are always a multiple of 512 bits, 64 bytes, and block
size is given as a multiple of 64 bytes. This will result in an average of
thirty two bytes of space wasted positioning each packet to a boundary.
The pseudo random streams of encrypting information are applied with an
offset that depends on the absolute position in the queue, which is why
the queues have to have packets in identical position in both queues.
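A small sketch of why both ends must agree on packet positions: the keystream block used for a packet is a pure function of where the packet sits in the queue. The function names here are hypothetical, and the example takes the cipher block to be 512 bits, that is 64 bytes.

```cpp
#include <cstdint>

// Assumed cipher block size: 512 bits = 64 bytes.
constexpr std::uint64_t kCipherBlockBytes = 64;

// Round a queue position up to the next cipher block boundary.
constexpr std::uint64_t align_up(std::uint64_t pos,
                                 std::uint64_t alignment = kCipherBlockBytes) {
    return (pos + alignment - 1) / alignment * alignment;
}

// Index of the keystream block that covers an aligned queue position.
constexpr std::uint64_t keystream_block_index(std::uint64_t aligned_pos) {
    return aligned_pos / kCipherBlockBytes;
}

// Place a packet in the queue: returns its aligned start position and
// advances next_free past the end of the packet.
inline std::uint64_t place_packet(std::uint64_t& next_free, std::uint64_t packet_bytes) {
    const std::uint64_t start = align_up(next_free);
    next_free = start + packet_bytes;
    return start;
}
```

If sender and receiver run the same placement over the same sequence of packet sizes, they arrive at the same keystream offsets without ever sending the offsets on the wire.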
Each block header contains unwindowing values for any windowed values in
the packets and packet metadata, which unwindowing data is a mere 64 bits,
but, since block and packet metadata size gets negotiated on each
connection, this can be expanded without breaking backwards
compatibility. The format number for packet references to metadata
implies an unwindow size, but we initially assume that any connection
sends fewer than 2^64 512 bit packets, or rather that packets plus the metadata
required to describe those packets take up less than 2^73 bits,
corresponding to a thousand Mbps
The packet position in the queue is the same at both ends, and is
unwindowed in the block header.
The fundamental architecture of QUIC is that each packet has its own
nonce, which is an integer of potentially sixty two bits, expressed in
a form that is short for smaller integers, which is essentially my
design, so I expect that I can use a whole lot of QUIC code.
It negotiates the AES session once per connection, and thereafter, it
is sequential nonces all the way.
Make a new one time secret from a new one time public key every time
you start a stream (pair of one way streams). Keeping one time secrets
around for multiple streams, although it can in theory be done safely, gets
startlingly complicated really fast, with the result that nine times out of ten
it gets done unsafely.
Each two way stream is a pair of one way streams. Each encryption packet
within a udp packet will have in the clear its stream number and a window
into its stream position, the window size being log base two of the position
difference between all packets in play, plus two, rounded up to the nearest
multiple of seven. Its stream number is an index into shared secrets and stream
states associated with this IP and port number.
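The window size rule can be sketched as below. This assumes "log base two" means the ceiling of the logarithm, which the text does not say explicitly; `ceil_log2` and `window_bits` are hypothetical names.

```cpp
#include <cstdint>

// Smallest n such that 2^n >= span; returns 0 for a span of 0 or 1.
// (Assumption: the logarithm in the rule is rounded up.)
inline unsigned ceil_log2(std::uint64_t span) {
    unsigned bits = 0;
    while (bits < 64 && (std::uint64_t{1} << bits) < span) ++bits;
    return bits;
}

// Window size in bits: log base two of the position difference between all
// packets in play, plus two, rounded up to the nearest multiple of seven.
inline unsigned window_bits(std::uint64_t oldest_in_play, std::uint64_t newest_in_play) {
    const unsigned raw = ceil_log2(newest_in_play - oldest_in_play) + 2;
    return (raw + 6) / 7 * 7;
}
```

With about a hundred packets in play this gives a fourteen bit window. Multiples of seven fit variable length integer encodings that carry seven payload bits per byte.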
If initiating a connection in the clear (and thus unauthenticated) Alice
sends Bob (in a packet that is not considered part of a stream) a konce (key used once, single use
elliptic point $A_o$). She follows it, in the same packet and in a new
encrypted but unauthenticated stream, proving knowledge of the scalar
corresponding to the elliptic point by using the shared secret
$a_oB_d = b_dA_o$, where $B_d$ is Bob's durable public key and $b_d$ his
durable secret key. In the encrypted but unauthenticated stream, she sends
$A_d$, her durable public key (which may only be durable until the
application is shut down), initiating a stream encrypted with
$(a_o+a_d)B_d = b_d(A_o+A_d)$, or more precisely, symmetrically encrypted
with the 384 bit hash of that elliptic point and the one way stream number.
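To spell out why both parties reach the same point, write $G$ for the base point of the curve. $G$ is a symbol introduced here for illustration and not used above. Then $A_o = a_oG$, $A_d = a_dG$, and $B_d = b_dG$, and since scalar multiplication commutes:

$$
\begin{aligned}
a_oB_d &= a_o(b_dG) = b_d(a_oG) = b_dA_o \\
(a_o+a_d)B_d &= (a_o+a_d)\,b_dG = b_d(a_oG + a_dG) = b_d(A_o+A_d)
\end{aligned}
$$

Alice computes the left hand sides from her two scalars and Bob's public key. Bob computes the right hand sides from his one scalar and Alice's two public points.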
All this stuff happens during the handshake, and when we allocate a
receive buffer, we have a shared secret. The sender may only send up to
the size of the receive buffer, and has to wait for acks which will
announce more receive buffer.
There is no immediate reason to provide the capability to create a new
differently authenticated stream from within an authenticated stream, for
the use cases for that are better dealt with by sending authorizations for
them under the existing authentication, signed by the other party. Hence a one to one
mapping between port number and durable authenticating elliptic point,
with each authenticated stream within that port number deriving its shared
secret from a konce, covers all the use cases that occur to
me. We don't care about making the creation of a login relationship efficient.
When the OS gives you a packet, it gives you the handle you associated
with that network address and port number, and the protocol layer of the
application then has to expand that into the receive stream number and
packet position in the stream. After decrypting the streams within a packet,
it then maps stream id and message id to the application layer message
handler id. It passes the position of data within the message, but not the
position within the stream, because you don't want too many copies of the
shared secret floating around, and because the application does not care.
Message data may arrive out of sequence within a message, but the
protocol layer always sends the data in sequence to the application, and
usually the application only wants complete messages, and does not
register a partial message handler anyway.
Each application runs its own instance of the protocol layer, and each
application is, as far as it knows or cares, sending messages identified by
their receiver message handler and reply message handler to a party
identified by its zooko id. A message always receives a reply, even if the
reply is only “message acknowledged”, “message abandoned”, “message
not acknowledged”, “timeout”, “graceful shutdown of connection”, or
“ungraceful shutdown of connection”. The protocol layer maps these into
encrypted sequential streams and onto message numbers within the stream
when sending them out, and onto application ids, application message
handlers and receiving zooko ids when receiving them.
But, if a message always receives a reply, the sender may want to know
which message is being replied to. Which implies it always receives a
handle to the sent message when it gets the reply. Which implies that the
protocol layer has to provide unique reply ids for all messages in play
where a substantive reply is expected from the recipient. (“Message
received” does not need a reply id, because it is implicit in the reliable
transport layer, but special casing such messages to save a few bytes per
message adds substantially to complexity. Easier to have the recipient ack
all packets and all messages every round trip time, even though acking
messages is redundant, and identifying every message is redundant.)
This implies that the protocol layer gives every message a unique sixty
four bit windowed id, with the window size sufficient to cover all
messages in play, all messages that have neither been acked nor
abandoned.
Suppose we are transferring one dozen eight terabyte disks in tiny fifty
byte messages. And suppose that all these messages are in play, which
seems unlikely unless we are communicating with someone on Pluto. Well,
then we will run out of storage for tracking every message in play,
but suppose we did not. Then forty one bits would suffice, so a sixty four bit
message id suffices. And, since it is windowed, using the same windowing
as we are using for stream packet 384 bit ids, we can always increase it
without changing the protocol on the wire when we get around to sending
messages between galaxies.
A windowed value represents an indefinitely large unsigned integer, but
since we are usually interested in tracking the difference between two such
values, we define subtraction and comparison on windowed values to
give us ordinary signed integers, the largest precision integer that we can
conveniently represent on our machine. Which will always suffice, for by
the time we get around to enormous tasks, we will have enormous
machines.
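One way subtraction and comparison on windowed values could look is sketched below. It assumes the two true values differ by less than half the window and limits the window to 63 bits so that the result fits a signed 64 bit integer. `Windowed` and `unwindow` are hypothetical names.

```cpp
#include <cstdint>

// A windowed value: the low `bits` bits of an indefinitely large counter.
struct Windowed {
    std::uint64_t low;   // true value modulo 2^bits
    unsigned bits;       // window width, 1..63 in this sketch
};

// Signed difference a - b, assuming both use the same window width and the
// true values differ by less than 2^(bits-1).
inline std::int64_t operator-(Windowed a, Windowed b) {
    const std::uint64_t mask = (std::uint64_t{1} << a.bits) - 1;
    const std::uint64_t half = std::uint64_t{1} << (a.bits - 1);
    const std::uint64_t diff = (a.low - b.low) & mask;   // difference modulo 2^bits
    return diff < half
        ? static_cast<std::int64_t>(diff)
        : static_cast<std::int64_t>(diff) - static_cast<std::int64_t>(mask) - 1;
}

// Comparison is defined through the signed difference.
inline bool operator<(Windowed a, Windowed b) { return (a - b) < 0; }

// Recover the full value closest to a known full reference value.
inline std::uint64_t unwindow(std::uint64_t reference, Windowed w) {
    const Windowed ref{reference & ((std::uint64_t{1} << w.bits) - 1), w.bits};
    return reference + static_cast<std::uint64_t>(w - ref);
}
```

Because only differences are ever computed, widening the window later changes nothing for callers, who still see ordinary signed integers.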
Because each application runs its own protocol layer, it is simpler, though
not essential, for each application to have its own port number on its
network address and thus its own streams on that port number. All
protocol layers use a single operating system udp layer. All messages
coming from a single application in a single session are authenticated with
at least that session and application, or with an id durable between
sessions of the application, or with an id durable between the user using
different applications on the same machine, or with an id durable to the
user and used on different machines in different applications, though the
latter requires a fair bit of potentially hostile user interface.
If the application wants to use multiple identities during a session, it
initiates a new connection on a new port number in the clear. One session,
one port number, at most one identity. Multiple port numbers, however, do
not need nor routinely have, multiple identities for the same run of the
application.
[QUIC]: https://github.com/private-octopus/picoquic
If we implement a [QUIC] large object layer, (and we really should not do
this until we have working code out there that runs without it) it will
consist of reliable request responses on top of groups of unreliable request
responses, in which case the unreliable request responses will have a
group request object that maps from their UDP origin and port numbers,
and a sequence number within that group request object that maps to an
item in an array in the group request operator.
### speed
The fastest authenticated encryption algorithm is OCB - and on high end
hardware, AES256OCB.
AES256OCB, despite having a block cipher underneath, has properties
that make it possible to have the same API as xchacha20poly1305.
(Encrypts and authenticates arbitrary length, rather than block sized, messages.)
[OCB patents were abandoned in February 2021](https://www.metzdowd.com/pipermail/cryptography/2021-February/036762.html)
One of these days I will produce a fork of libsodium that supports `crypto_box_ristretto25519aes256ocb.*easy.*`, but that is hardly urgent.
Just make sure the protocol negotiation allows new ciphers to be dropped in.
# Getting something up and running
I need to get a minimal system up that operates a database, does
encryption, has a gui, does unit tests, and synchronizes data with other
systems.
So we will start with a self licking icecream:
We aim for a system that has a per user database identifying public keys
related to user controlled secrets, and a local machine database relating
public keys to IP numbers and port numbers. A port and IP address
identifies a process, and a process may know the underlying secrets of
many public keys.
The gui, the user interface, will allow you to enter a secret so that it is hot and online, optionally allow you to make a subordinate wallet, a ready wallet.
The system will be able to handle encryption, authentication, signatures,
and perfect forward secrecy.
The system will be able to merge and floodfill the data relating public
keys to IP addresses.
We will not at first implement capabilities equivalent to ready wallets,
subordinate wallets, and Domain Name Service. We will add that in once
we have flood fill working.
Floodfill will be implemented on top of a Merkle-patricia tree
implemented with, perhaps, grotesque inefficiency by having nodes in the
database where the address of each node consists of the bit length of the
address as the primary sort key, then the address, and then the record. The
content of the node so identified is its hash, its type, the addresses
of its two children, and the hashes of its two children. The hash of the
node is the hash of the hashes of its two children, ignoring its address.
(The hash of a leaf node takes account of the leaf node's address, but the
hashes of the tree nodes do not.)
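A rough sketch of that node layout, using an ordered map in place of the database table. Everything named here is hypothetical: addresses are assumed to fit in 64 bits, and `toy_hash_pair` is an FNV-1a stand-in for a real cryptographic hash, used only so the example is self contained. It is not secure.

```cpp
#include <cstdint>
#include <initializer_list>
#include <map>
#include <tuple>

// Address of a node: the first bit_length bits of a key, left aligned in bits.
struct NodeAddress {
    std::uint16_t bit_length;   // primary sort key
    std::uint64_t bits;         // secondary sort key (assumed 64 bit addresses)
};

inline bool operator<(const NodeAddress& a, const NodeAddress& b) {
    return std::tie(a.bit_length, a.bits) < std::tie(b.bit_length, b.bits);
}

using Hash = std::uint64_t;     // placeholder for a real hash value

enum class NodeType : std::uint8_t { Leaf, Branch };

// The record stored under each address.
struct NodeRecord {
    Hash hash;
    NodeType type;
    NodeAddress child[2];
    Hash child_hash[2];
};

using NodeTable = std::map<NodeAddress, NodeRecord>;

// Hash of a tree node: depends only on the two child hashes, not on the
// node's own address. FNV-1a over the bytes of both hashes; NOT secure.
inline Hash toy_hash_pair(Hash left, Hash right) {
    Hash h = 0xcbf29ce484222325ULL;
    for (Hash v : {left, right}) {
        for (int i = 0; i < 8; ++i) {
            h ^= (v >> (8 * i)) & 0xFF;
            h *= 0x100000001b3ULL;
        }
    }
    return h;
}
```

Sorting by bit length first groups each level of the tree together, so the root comes first and the leaves last.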
Initially we will get this working without network communication, merely
with copy paste communication.
An event always consists of a bitstream, starting with a schema identifier.
The schema identifier might be followed by a shared secret identifier,
which identifies the source and destination key, or followed by direct
identification of the source and destination key, plus stuff to set up a
shared secret.
# Terminology
Handle:
: A handle is a short opaque identifier that corresponds to a quite small
positive integer that points to an object in an `std::vector` containing a
sequence of identically sized objects. A handle is reused almost
immediately. When a handle is released, the storage it references goes
into a `std::priority_queue` for reuse, with handles at the start of the
vector being reused first. If the priority queue is empty, the
vector grows to provide space for another handle. The vector never
shrinks, though unused space at the end will eventually get paged out. A
handle is a member of a class with a static member that points to that
vector, and it has a member that provides a reference to an object in the
vector. It is faster than fully dynamic allocation and deallocation, but
still substantially slower than static or stack allocation. It provides
the advantages of a shared pointer, with far lower cost. Copying or
destroying handles has no consequences, they are just integers, but
releasing a handle still has the problem that there may be other copies of
it hanging around, referencing the same underlying storage. Handles are
nonowning; they inherit from unsigned integers, they are just unsigned
integers plus some additional methods, and the static members
`std::vector<T> table;` and `std::priority_queue<handle<T>> unused_handles;`
Hashcode:
: A rather larger identifier that references an `std::unordered_map`,
which maps the hashcode to underlying storage, usually through a handle,
though it might map to an `std::unique_ptr`. Hashcodes are sparse, unlike
handles, and are reused infrequently, or never, so if your only reference
to the underlying storage is through a hashcode, you will not get
unintended re-use of the underlying storage, and if you do reference after
release, you get an error: the hashcode will complain it no longer maps
to a handle. Hashcodes are owning, and the hashmap has the semantics of
`unique_ptr` or `shared_ptr`. When an event is fired, it supplies a
hashcode that will be associated with the result of that fire and
constructs the object that hashcode will reference. When the response
happens, the object referenced by the hashcode is found, and the command
corresponding to the event type executed. In a procedural program, the
stack is the root data structure driving RAII, but in an event oriented
program, the stack gets unwound between events, so for data structures
that persist between events, but do not persist for the life of the
program, we need some other data structure driving RAII, and that data
structure is the database and the hashtables, the hashtables being the
database for stuff that is ephemeral, so we don't want the overheads of
actually doing data to disk operations.
Hot Wallet:
: The wallet secret is in memory, or the secret from which it is derived and the chain of links by which it is derived are in memory.
Cold Wallet, paper wallet:
: The wallet secret is not in memory nor on non volatile storage in a computer connected to the internet. High value that is intended to be kept for a long time should be controlled by a cold wallet.
Online Wallet:
: Hot and online. Should usually be a subordinate wallet for your cold wallet. Your online subordinate wallet will commonly receive value for your cold wallet, and will only itself control funds of moderate value. An online wallet should only be online in one machine in one place at any one time, but many online wallets can speak on behalf of one master wallet, possibly a cold wallet, and receive value for that wallet.
Ready Wallet:
: Hot and online, and when you start up, you don't have to perform the difficult task of entering the secret because when it is not running, the secret is on disk. The wallet secret remains in non volatile storage when you switch off the computer, and therefore is potentially vulnerable to theft. It is automatically loaded into memory as the wallet, the identity, with which you communicate.
Subordinate Wallet:
: Can generate public keys to receive value on behalf of another wallet,
but cannot generate the corresponding secret keys, which only that other wallet,
perhaps currently offline, perhaps currently existing only in the form of a
cold wallet, can generate.
Usually has an authorization lasting three months to speak in that other
wallet's name, or until that other wallet issues a new authorization. A
wallet can receive value for any other wallet that has given it a secret and
authorization, but only spend value for itself.
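The Handle entry above can be illustrated with a small sketch. It departs from the description in one respect, for the sake of a self contained example: the table is an ordinary object rather than a static member of the handle class. `HandleTable` and its method names are hypothetical. Note that `std::priority_queue` is a max heap by default, so reusing the lowest handles first needs `std::greater`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

template <typename T>
class HandleTable {
public:
    using Handle = std::size_t;   // a handle is just a small unsigned integer

    // Reuse the lowest released handle if there is one, otherwise grow the vector.
    Handle acquire(T value) {
        if (!unused_.empty()) {
            const Handle h = unused_.top();
            unused_.pop();
            table_[h] = std::move(value);
            return h;
        }
        table_.push_back(std::move(value));
        return table_.size() - 1;
    }

    // Release a slot for reuse. Other copies of h still exist and will
    // silently refer to whatever is stored there next: handles are nonowning.
    void release(Handle h) {
        assert(h < table_.size());
        unused_.push(h);
    }

    T& operator[](Handle h) {
        assert(h < table_.size());
        return table_[h];
    }

    // The vector never shrinks, so this only ever grows.
    std::size_t slots() const { return table_.size(); }

private:
    std::vector<T> table_;
    // Min heap, so handles at the start of the vector are reused first.
    std::priority_queue<Handle, std::vector<Handle>, std::greater<Handle>> unused_;
};
```

Reusing low handles first keeps the live objects packed at the front of the vector, which is what lets the unused tail get paged out.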
# The problem
Getting a client and a server to communicate is apt to be surprisingly complicated. This is because the basic network architecture for passing data around does not correspond to actual usage.
TCP-IP assumes a small computer with little or no non volatile storage, and infinite streams, but actual usage is request-response, with the requests and responses going into non volatile storage.
When a bitcoin wallet is synchronizing with fourteen other bitcoin wallets, there are a whole lot of requests and replies floating around all at the same time. We need a model based on events and message objects, rather than continuous streams of data.
IP addresses and port numbers act as handles and hashcodes to get data from one process on one computer to another process on another computer, but within the process, in user address space, we need a representation that throws away the IP address, the port number, and the positional information and sequence within the TCP-IP streams, replacing it with information that models the process in ways that are more in line with actual usage.
# Message objects and events
Any time we fire an event, send a request, we create a local data structure identified by a handle and by the 256 bit hashcode of the request, the pair of entities communicating. The response to the event references either the hashcode, or the handle, or both. Because handles are local, transient, live only in ram, and are not POD, handles never form part of the hash describing the message object, even though the reply to a request will contain the handle.
We don't store a conversation as between me and the other guy. Rather, we
store a conversation as between Ann and Bob, with the parties in lexicographic
order. When Ann sees the records on her computer, she knows she is Ann, when
Bob sees the conversation on his computer, he knows he is Bob, and when Carol sees
the records, because they have been made public as part of a review, she
knows that Ann is reviewing Bob, but the records have the same form, and lead
to the same Merkle root, on everyone's computer.
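A minimal sketch of such a canonical key, assuming 32 byte public keys (the key size and the names `PublicKey`, `ConversationKey`, and `make_conversation_key` are illustrative, not taken from the design):

```cpp
#include <array>
#include <cstdint>

using PublicKey = std::array<std::uint8_t, 32>;   // assumed key size

// A conversation is keyed by both parties, smaller key first, so that every
// machine stores the record under the same key no matter whose machine it is.
struct ConversationKey {
    PublicKey first;    // lexicographically smaller public key
    PublicKey second;   // lexicographically larger public key
};

// std::array compares lexicographically, element by element.
inline ConversationKey make_conversation_key(const PublicKey& x, const PublicKey& y) {
    return (x < y) ? ConversationKey{x, y} : ConversationKey{y, x};
}

inline bool operator==(const ConversationKey& a, const ConversationKey& b) {
    return a.first == b.first && a.second == b.second;
}
```

Whichever party builds the key, and in whichever argument order, the result is the same.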
Associated with each pair of communicating entities is a durable secret
elliptic point, formed from the wallet secrets of the parties communicating,
and a transient and frequently changing secret elliptic point. These secrets
never leave ram, and are erased from ram as soon as they cease to be
needed. A hash formed from the durable secret elliptic point is associated
with each record, and that hash goes into non volatile storage, where it is
unlikely to remain very secret for very long, and is associated with the
public keys, in lexicographic order, of the wallets communicating. The
encryption secret formed from the transient point hides the public key
associated with the durable point from eavesdroppers, but the public key
that is used to generate the secret point goes into non volatile storage,
where it is unlikely to remain very secret for very long.
This ensures that the guy handing out information gets information about who is interested in his information. It is a privacy leak, but we observe that sites that hand out free information on the internet go to great lengths to get this information, and if the protocol does not provide it, will engage in hacks to get it, such as Google Analytics, which hacks lead to massive privacy violation, and the accumulation of intrusive spying data in vast centralized databases. Most internet sites use Google Analytics, which downloads an enormous pile of JavaScript on your browser, which systematically probes your system for one thousand and one privacy holes and weaknesses and reports back to Google Analytics, which then shares some of their spy data with the site that surreptitiously downloaded their enormous pile of hostile spy attack code onto your computer.
***[Block Google Analytics](./block_google_analytics.html)***
We can preserve some privacy on a client by the wallet initiating the connection deterministically generating a different derived wallet for each host that it wants to initiate connection with, but if we want push, if we want peers that can be contacted by other peers, we have to use the same wallet for all of them.
A peer, or logged in, connection uses one wallet for all peers. A client connection without login, uses an unchanging, deterministically generated, probabilistically unique, wallet for each server. If the client has ever logged in, the peer records the association between the deterministically generated wallet, and wallet used for peer or logged in connections, so that if the client has ever logged in, that widely used wallet remains logged in forever -albeit the client can throw away that wallet, which is derived from his master secret, and use a new wallet with a different derivation from his master secret.
The owner of a wallet has, in non volatile storage, the chain by which each wallet is derived from his master secret, and can regenerate all secrets from any link in that chain. His master secret may well be off line, on paper, while some of the secrets corresponding to links in that chain are in non volatile storage, and therefore not very secure. If he wants to store a large amount of value, or final control of valuable names, he has them controlled by the secret of a cold wallet.
When an encrypted message object enters user memory, it is associated with a handle to a shared transient volatile secret, and its decryption position in the decryption stream, and thus with a pair of communicating entities. How this association is made depends on the details of the network connection, on the messy complexities of IP and of TCP-IP position in the data stream, but once the association is made, we ignore that mess, and treat all encrypted message objects alike, regardless of how they arrived.
Within a single TCP-IP connection, we have a message that says “subsequent
encrypted message objects will be associated with this shared secret and thus
this pair of communicating entities, with the encryption stream starting at
the following multiple of 4096 bytes, and subsequent encryption stream
positions for subsequent records are assumed to start at the next block of a
power of two bytes where the block is large enough to contain the entire
record.”, but on receiving records following that message, we associate them
with the shared secret and the encryption stream position, and pay no further
attention to IP numbers and position within the stream. Once the association
has been made, we don't worry which TCP stream or UDP port number the record
came in on or its position within the stream. We identify the communicating
entities involved by their public keys, not their IP address. When we decrypt
the message, if it is a response to a request, it has the handle and/or the
hash of the request.
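The position rule quoted above admits more than one reading. The sketch below assumes this one: the stream starts at the next multiple of 4096 bytes, and each record starts at the next multiple of the smallest power of two that is at least the record's size. `StreamCursor` and the helper names are hypothetical.

```cpp
#include <cstdint>

// Smallest power of two that is >= n (for n up to 2^63).
inline std::uint64_t ceil_pow2(std::uint64_t n) {
    std::uint64_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Round pos up to a multiple of block, where block is a power of two.
inline std::uint64_t round_up(std::uint64_t pos, std::uint64_t block) {
    return (pos + block - 1) & ~(block - 1);
}

struct StreamCursor {
    std::uint64_t next;   // first encryption stream byte not yet assigned

    // A newly announced shared secret starts at the following multiple of 4096.
    static StreamCursor begin_at(std::uint64_t pos) {
        return StreamCursor{round_up(pos, 4096)};
    }

    // Assign an encryption stream position to a record of record_bytes bytes.
    std::uint64_t place(std::uint64_t record_bytes) {
        const std::uint64_t start = round_up(next, ceil_pow2(record_bytes));
        next = start + record_bytes;
        return start;
    }
};
```

Since both sides know each record's size, both can derive each record's stream position without it being transmitted.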
A large record object could take quite a long time downloading. So when the
first part arrives, we decrypt the first part, to find the event handler,
and call the progress event of the handler, which may do nothing, every time
data arrives. This may cause the timeout on the handler to be reset.
If we are sending a message object after a long delay, we construct a new shared secret, so the response to a request may come over a new TCP connection, different from the one on which it was sent, with a new shared secret, and a position in the decryption stream, unrelated to the shared secret, the position in the decryption stream, and the IP stream, under which a request was sent. Our message object identity is unrelated to the underlying internet protocol transport. Its destination is a wallet, and its ID on the process of the wallet is its hashcode.
# Handles
I have above suggested various ad hoc measures for preventing references to reused handles, but a more robust and generic solution is hash codes. You generate fresh hash codes cyclically, checking each fresh hash code to see if it is already in use, so that each communication referencing a new event handle or new shared secret also references a new hash code. The old hash code is de-allocated when the handle is re-used, so a new hashcode will reference the new entity pointed to by the handle, and the old hashcode will fail immediately and explicitly.
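A minimal sketch of cyclic hash code allocation, assuming thirty two bit codes and far fewer live codes than the code space holds. `HashcodeTable` and its method names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

class HashcodeTable {
public:
    using Hashcode = std::uint32_t;
    using Handle = std::size_t;

    // Bind a fresh hashcode to handle h. Codes are generated cyclically
    // (the counter wraps modulo 2^32), skipping any code still in use.
    Hashcode bind(Handle h) {
        while (live_.count(next_) != 0) ++next_;
        const Hashcode code = next_++;
        live_[code] = h;
        return code;
    }

    // De-allocate the code when its handle is released or reused.
    void unbind(Hashcode code) { live_.erase(code); }

    // Returns true and sets out if the code is live. A stale code returns
    // false, failing explicitly instead of reaching reused storage.
    bool lookup(Hashcode code, Handle& out) const {
        const auto it = live_.find(code);
        if (it == live_.end()) return false;
        out = it->second;
        return true;
    }

private:
    Hashcode next_ = 1;
    std::unordered_map<Hashcode, Handle> live_;
};
```

A reply carrying a stale hashcode is rejected at lookup, even if the handle it once mapped to has since been handed to a new event.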
Make all hashcodes thirty two bits. That will suffice, and if scaling bites, we are going to have to go to multiple host processes anyway. Our planned protocol already allows you to be redirected to an arbitrary host wallet speaking on behalf of a master wallet that may well be in cold storage. When we have enormous peers on the internet hosting hundreds of millions of clients, they are going to have to run tens of thousands of processes. Our hashcodes only have meaning within a single process and our wallet identifier address space is enormous. Further, a single process can have multiple wallets associated with it, and we could differentiate hashes by their target wallet.
Every message object has a destination wallet, which is an online wallet, which should only be online in one host process in one machine, and an encrypted destination event hashcode. The fully general form of a message object has a source public key, a hashcode indicating a shared secret plus a decryption offset, or is prefixed by data to generate a shared secret and decryption offset, and, if a response to a previous event, an event hashcode that has meaning on the destination wallet. However, on the wire, when the object is travelling by IP protocol, some of these values are redundant, because defaults will have already been created associated with the IP connection. On the disk and inside the host process, it is kept in the clear, so does not have the associated encryption data. At the user process level, and in the database, we are not talking to IP addresses, but to wallets. The connection between a wallet and an IP address is only dealt with when we are telling the operating system to put message objects on the wire, or they are being delivered to a user process by the operating system from the wire. On the wire, having found the destination IP and port of the target wallet, the public key of the target wallet is not in the clear, and may be implicit in the port (dry).
Any shared secret is associated with two hash codes, one being its value on the other machine, and two public keys. But under the dry principle, we don't keep redundant data around, so the redundant data is virtual or implicit.
# Very long lived events
If the event handler refers to a very long lived event (maybe we are waiting for a client to download waiting message objects from his host, email style, and expect to get his response through our host, email style) it stores its associated pod data in the database, deletes it from the database when the event is completed, and if the program restarts, the program reloads it from the database with the original hashcode, but probably a new handle. Obviously database access would be an intolerable overhead in the normal case, where the event is received or timed out quickly.
# Practical message size limits
Even a shitty internet connection over a single TCP-IP connection can usually manage 0.3 Mbps, about 0.04 MB/s, and we try to avoid message objects larger than one hundred KB. If we want to communicate a very large data structure, we use a lot of one hundred KB objects, and if we are communicating the blockchain, we are probably communicating with a peer who has at least a 10 Mbps connection, so use a lot of two MB message objects.
1 Mbps download, 0.3 Mbps upload: third world cell phone connection, third world roach hotel connection, erratically usable.\
2--4 Mbps: basic email, web surfing, video not recommended.\
4--6 Mbps: good web surfing experience, low quality video streaming (720p).\
6--10 Mbps: excellent web surfing, high quality video streaming (1080p).\
10--20 Mbps: high quality video streaming, high speed downloads, business-grade speed.
A transaction involving a single individual and a single recipient will at
a minimum have one signature (which identifies one UTXO, a rhocoin, making it
a TXO), hence $4*32$ bytes; two UTXOs, unused rhocoins, hence $2*40$ bytes; and
a hash referencing the underlying contract, hence 32 bytes: say 256 bytes,
2048 bits. Likely to fit in a single datagram, and you can download six
thousand of them per second on a 12 Mbps connection.
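Spelling out the arithmetic behind those figures:

$$
\begin{aligned}
4 \times 32 + 2 \times 40 + 32 &= 240 \text{ bytes} \le 256 \text{ bytes} = 2048 \text{ bits} \\
\frac{12 \times 10^{6} \text{ bits per second}}{2048 \text{ bits}} &\approx 5859 \text{ transactions per second}
\end{aligned}
$$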
On a third world cell phone connection, downloading a one hundred kilobyte object has a high risk of failure, and a busy TCP-IP connection has a short life expectancy.
For communication with client wallets, we aim that message objects received from a client should generally be smaller than 20KB, and records sent to a client wallet should generally be smaller than one hundred KB. For peer wallets and server wallets, generally smaller than 2MB. Note that bittorrent relies on 10KB message objects to communicate potentially enormous and complex data structures, and that the git protocol communicates short chunks of a few KB. Even when you are accessing a packed file over git, you access it in relatively small chunks, though when you access a git repository holding packed files over https protocol, you download the whole, potentially enormous, packed file as one potentially enormous object. But even with git over https, you have the alternative of packing it into a moderate number of moderately large packed files, so it looks as if there is a widespread allergy to very large message objects. Ten K is the sweet spot, big enough for context information overheads to be small, small enough for retries to be non disruptive, though with modern high bandwidth long fat pipes, big objects are less of a problem, and streamline communication overheads.
# How many shared secrets, how often constructed
The overhead to construct a shared secret is 256 bits and 0.125 milliseconds, so, on a ten megabit per second connection, if the CPU spent half its time establishing shared secrets, it could establish roughly one secret every three hundred microseconds, that is, roughly one secret every three thousand bits.
Since a minimal packet is already a couple of hundred bits, this does not give a huge amount of room for a DDoS attack. But it does give some room. We really should be seriously DDoS resistant, which implies that every single incoming packet needs to be quickly testable for validity, or cheap to respond to. A packet that requires the generation of a shared secret is not terribly expensive, but it is not cheap.
So, we probably want to impose a cost on a client for setting up a shared
secret. And since the server could have a lot of clients, we want the cost
per server to be small, which means the cost per client has to be mighty small in
the legitimate non DDoS scenario: it is only going to bite in the DDoS
scenario. Suppose the server might have a hundred thousand clients, each
with sixteen kilobytes of connection data, for a total of about two
gigabytes of ram in use managing client connections. Well then, setting up
shared secrets for all those clients is going to take twelve and a half
seconds, which is quite a bit. So we want a shared secret, once set up, to
last for at least ten to twenty minutes or so. We don't want clients
glibly setting up shared secrets at whim, particularly as this could be a
relatively high cost on the server for a relatively low cost on the
client, since the server has many clients, but the client does not have
many servers.
We want shared secrets to be long lived enough that the cost in memory is
roughly comparable to the cost in time to set them up. A gigabyte of
shared secrets is probably around ten million shared secrets, a hundred
times the hundred thousand client case, so would take about twenty minutes
to set up. Therefore, we don't need to worry about
throwing shared secrets away to save memory: it is far more important to
keep them around to save computation time. This implies a system where we
keep a pile of shared secrets, and the accompanying network addresses, in
memory: a hashtable that maps wallets existing in other processes to
handles to the shared secrets and network addresses held in this
process. So each process has the ability to speak to a lot of other
processes cached, and probably has some durable connections to a few other
processes. Which immediately makes us think about flood filling data
through the system without being vulnerable to spam.
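A minimal sketch of such a hashtable, assuming the key is the peer wallet's public key and that entries idle for longer than a timeout are dropped on lookup. The names `SharedSecretCache` and `PeerEntry`, and the twenty minute default, are hypothetical and chosen only to match the figures discussed above.

```python
import time
from dataclasses import dataclass


@dataclass
class PeerEntry:
    shared_secret: bytes
    network_address: tuple   # (host, port) of the peer process
    last_used: float


class SharedSecretCache:
    """Maps a peer wallet's public key to its cached shared secret and address."""

    def __init__(self, idle_timeout_seconds=20 * 60, clock=time.monotonic):
        self._entries = {}
        self._idle_timeout = idle_timeout_seconds
        self._clock = clock   # injectable so the timeout can be tested

    def put(self, peer_public_key, shared_secret, network_address):
        self._entries[peer_public_key] = PeerEntry(
            shared_secret, network_address, self._clock())

    def get(self, peer_public_key):
        """Return the cached entry, or None if absent or idle for too long."""
        entry = self._entries.get(peer_public_key)
        if entry is None:
            return None
        if self._clock() - entry.last_used > self._idle_timeout:
            del self._entries[peer_public_key]
            return None
        entry.last_used = self._clock()
        return entry
```

A miss means the caller has to pay for a fresh shared secret, so the timeout should err on the long side.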
Setting up tcp connections and tearing them down is also costly, but it looks as though, for some reason, existing code can only handle a small number of tcp connections, so they encourage you to continually tear them down and recreate them. Maybe we should shut down a tcp connection after eighteen seconds of nonuse. Check them at every multiple of 8 seconds past the epoch, refrain from reuse after twenty four seconds of nonuse, and shut them down altogether after thirty two seconds. (The reason for checking them at certain times since the epoch is that shutdown is apt to go more efficiently if initiated at both ends.)
Which means it would be intolerable to have a shared secret generation in
every UDP packet, or even very many UDP packets. So to prevent DDoS attack,
and just to have efficient communications, we have to have a deal where you
establish a connection, cheaply for the server, but potentially expensively
for the client, before you construct a shared secret.
A five hundred and twelve bit hash, however, takes 1.5 microseconds, which
is cheap. We can use hashes to resist DoS attacks, making the client
return to us the state cookie unchanged. If every packet is roughly the
size of a hash, then we can hash roughly three hundred megabits per
second, far more than a ten megabit connection can deliver, so it is not
that costly to hash everything.
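A minimal sketch of a stateless state cookie, assuming the server holds a secret key and binds the cookie to the client's address, port, and a client supplied nonce. The function names, the choice of keyed BLAKE2b, and the 16 byte truncation are illustrative assumptions, not a chosen design.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret; a real server would rotate it periodically.
SERVER_COOKIE_KEY = os.urandom(32)


def make_cookie(client_address, client_port, client_nonce, key=SERVER_COOKIE_KEY):
    """Return a short keyed hash that the client cannot predict or forge."""
    message = f"{client_address}:{client_port}".encode() + b"|" + client_nonce
    return hashlib.blake2b(message, key=key, digest_size=16).digest()


def cookie_is_valid(cookie, client_address, client_port, client_nonce,
                    key=SERVER_COOKIE_KEY):
    """Recompute the cookie from the packet itself; the server stores nothing."""
    expected = make_cookie(client_address, client_port, client_nonce, key)
    return hmac.compare_digest(cookie, expected)
```

The server only does the expensive shared secret computation for a packet whose cookie validates, which proves the client can receive packets at the address it claims.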
How big a hash code do we need to identify the shared secret? Suppose we generate one shared secret every millisecond. Then thirty two bit hashcodes are going to roll over in about fifty days. If we have a reasonable timeout on inactive shared secrets, reuse is never going to happen, and if it does happen, the connection fails. Well, connections are always failing for one reason or another, and a connection inappropriately failing is not likely to be very harmful, whereas a connection seemingly succeeding, while both sides make incorrect and different assumptions about it, could be very harmful.
# Message UDP protocol for messages that fit in a single packet
When I look at [the existing TCP state machine](https://www.ietf.org/rfc/rfc0793.txt), it is hideously
complicated. Why am I thinking of reinventing that? [Syn cookies](http://cr.yp.to/syncookies.html) turn out
to be less tricky than I thought: the server just sends a secret short hash of
the client data and the server response, which the client cannot predict, and
the client response to the server response has to be consistent with that
secret short hash.
Well, maybe it needs to be that complicated, but I feel it does not. If I find that it really does need to be that complicated, well, then I should not consider re-inventing the wheel.
Every packet has the source port and the destination port, and in tcp initiation, the client chooses its source port at random (bind with port zero) in order to avoid session hijacking attacks. The range of source ports is up to 65535.
Notice that this gives us $2^{64}$ possible channels, and then on top of that we have the 32 bit sequence number.
IP eats up twenty bytes, and then the source and destination ports eat four more bytes. I am guessing that NAT just looks at the port numbers and address of an outgoing packet, and then if a matching incoming packet comes in, just cheerfully lets it through. TCP and UDP ports look rather similar: every packet has a specific server destination port, and a random client port. Random ports are sometimes restricted to 0xC000-0xFFFF, and sometimes mighty random (starting at 0x0800 and working upwards seems popular). But 0xC000-0xFFFF viewed as a hashcode seems ample scope. Bind for port 0 returns a random port that is not in use, so use that as a hashcode.
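Binding to port zero can be sketched as follows with the standard socket API. The helper name is hypothetical, and which range the returned port falls in depends on the operating system.

```python
import socket


def open_udp_socket_on_random_port(host="127.0.0.1"):
    """Bind a UDP socket to port 0 and report the port the OS picked."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))          # port 0 asks the OS for an unused port
    port = sock.getsockname()[1]  # the port actually assigned
    return sock, port
```

The returned port is unique among ports currently in use on that host, which is what lets it serve as a small hashcode for the connection.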
The three phase handshake is:
1. Client: SYN my sequence number is X, my port number is random port A, and your port number is well known port B.
1. Server: ACK/SYN your sequence number is X, my Sequence number is Y, my Port number is well known B, and your port number is random port A.
1. Client: ACK your sequence number is Y, my sequence number is X+1, my port number is random port A, and your port number is well known port B.
Sequence number is something like your event hashcode or perhaps event
hashcode for grouped events, with the tcp header being the group.
Assume the process somehow has an accessible and somehow known open UDP port.
Client low level code somehow can get hold of the process port and IP
address associated with the target elliptic point, by some mechanism we are
not thinking about yet.
We don't want the server to be wide open to starting any number of new
shared secrets. Shared secrets are costly enough that we want them to last
as long as cookies. But at the same time, recommended practice is that
ports in use do not last long at all. We also might well want a redirect to
another wallet in the same process on the same server, or a nearby process
on a nearby server. But if so, let us first set up a shared secret that is
associated with the shared secret on this port number, and then we can talk
about shared secrets associated with other port numbers. Life is simpler if
there is a one to one mapping between access ports and durable public and private keys,
even if behind that port are many durable public and private keys.
# UDP protocol for potentially big objects
The tcp protocol can be thought of as the tcp header, which appears in every packet of the stream, being a hashcode event object, and the sequence number, which is distinct and sequential in every packet of the unidirectional stream, being a `std::deque` event object, which fifo queue is associated with the hashcode event object.
This suggests that we handle a group of events, where we want to have an event that fires when all the members of the group have successfully fired, or one of them has unrecoverably failed, with the group being handled as one event by a hashcode event object, and the members of the group with event objects associated with a fifo queue for the group.
When a member of the group is triggered, it is added to the queue. When it is fired, it is marked as fired, and if it is the oldest element of the queue, it is removed from the queue, and if the next element is also marked as fired, that also is removed from the queue, until the oldest element of the queue is one marked as triggered but not yet fired.
In the common case where we have a very large number of members, which are fired in the order, or approximately the order, that they are triggered, this is efficient. When the group event is marked as all elements triggered and all elements fired, and the fifo queue is empty, then that fires the group event.
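The queue handling just described can be sketched like this. The class name `StreamGroupEvent` and its methods are hypothetical, and member identifiers are assumed to be unique within the group.

```python
from collections import deque


class StreamGroupEvent:
    """Group event whose members sit in a fifo queue in trigger order."""

    def __init__(self):
        self._queue = deque()        # member ids, oldest at the left
        self._fired = set()          # fired members still waiting in the queue
        self._all_triggered = False
        self.completed = False       # set when the group event fires

    def trigger(self, member_id):
        self._queue.append(member_id)

    def close(self):
        """Declare that no more members will be triggered."""
        self._all_triggered = True
        self._check_complete()

    def fire(self, member_id):
        self._fired.add(member_id)
        # Remove fired members from the old end until one is still pending.
        while self._queue and self._queue[0] in self._fired:
            self._fired.discard(self._queue.popleft())
        self._check_complete()

    def pending(self):
        return list(self._queue)

    def _check_complete(self):
        if self._all_triggered and not self._queue:
            self.completed = True
```

When members fire roughly in trigger order, each `fire` call does a constant amount of work, which is the efficiency the text is relying on.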
Well, that would be the efficient way to handle things if we were implementing TCP, a potentially infinite stream, all over again, but we are not.
Rather, we are representing a big object as a stream of objects, and we know the size in advance, so might as well have an array that remains fixed size for the entire lifetime of the group event. The member event identifiers are indexes into this one big fixed size array.
The event identifier is run time detected as a group event identifier, so it expects its event identifier to be followed by an index into the array, much as the sequence number immediately follows the TCP header.
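A minimal sketch of the fixed size array approach, assuming the total size and fragment size are known before the first fragment arrives. The class name `BigObjectAssembly` and its fields are hypothetical.

```python
class BigObjectAssembly:
    """Reassembles one big object from fragments indexed into a fixed array."""

    def __init__(self, total_size, fragment_size):
        self.total_size = total_size
        self.fragment_size = fragment_size
        self.buffer = bytearray(total_size)
        count = -(-total_size // fragment_size)   # ceiling division
        self.received = [False] * count           # one slot per member event
        self.missing = count

    def on_fragment(self, index, payload):
        """Store one fragment; return True once the whole object has arrived."""
        start = index * self.fragment_size
        expected = min(self.fragment_size, self.total_size - start)
        if len(payload) != expected:
            raise ValueError("fragment has the wrong length")
        if not self.received[index]:              # duplicates are ignored
            self.buffer[start:start + expected] = payload
            self.received[index] = True
            self.missing -= 1
        return self.missing == 0
```

Fragments may arrive in any order and may be duplicated; the `received` array doubles as the list of what to ask for again.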
I would kind of like to have a [QUIC] protocol eventually, but that can
wait. If we have a UDP protocol, the communicating parties will negotiate a
UDP port that uniquely identifies the processes on both computers. Associated
with this UDP port will be the default public keys and the hash of the
shared secret derived from those public keys, and a default decryption shared
secret. The connection will have a keep alive heartbeat of small packets,
and a data flow of standard sized large packets, each the same size. Each
message will have a sequence number identifying the message, and each UDP
packet of the message will have the sender sequence number of its message,
its position within the message, and, redundantly, the power of two size of
the encrypted message object. Each message object, but not each packet
containing a fragment of the message object, contains the unencrypted hashtag
of the shared secret, the hashtag of the event object of the sender, which
may be null if it is the final message, and, if it is a reply, the hashtag of
event object of the message to which it is a reply, and the position within
the decryption stream as a multiple of the power of two size of the encrypted
message. This data gets converted back into standard message format when it
is taken off the UDP stream.
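The per packet fields named above could be laid out as in the following sketch. The field widths and their order are assumptions made purely for illustration; nothing here is a decided wire format.

```python
import struct

# Hypothetical layout, network byte order:
#   32 bit sender sequence number of the message,
#   32 bit position of this fragment within the message,
#   8 bit log2 of the size of the encrypted message object.
FRAGMENT_HEADER = struct.Struct("!IIB")


def pack_fragment(message_seq, position, log2_size, payload):
    """Prefix a fragment payload with its header."""
    return FRAGMENT_HEADER.pack(message_seq, position, log2_size) + payload


def unpack_fragment(packet):
    """Split a packet back into its header fields and payload."""
    message_seq, position, log2_size = FRAGMENT_HEADER.unpack_from(packet)
    return message_seq, position, log2_size, packet[FRAGMENT_HEADER.size:]
```

With this layout the header costs nine bytes per packet, and the redundant size field lets a receiver allocate the whole buffer from whichever fragment arrives first.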
Every data packet has a sequence number, and each one gets an ack, though only when the input queue is empty, so several data packets get a group ack. If an ack is not received, the sender sends a nack. If the receiver responds with a nack (huh, what packets?), the sender resends the packets. If the receiver persistently fails to respond, sending the message object failed, and the connection is shut down. If the receiver can respond to nacks, but not to data packets, maybe our data packet size is too big, so we halve it. If that does not work, sending the message object failed, and the connection is shut down.
[QUIC] streams will be created and shut down fairly often, each time with a new shared secret, and the reply to a message object may well arrive on a new stream distinct from the stream on which the message object was sent.
Message objects, other than nacks and acks, intended to manage the UDP stream
are treated like any other message object, passed up to the message layer,
except that their result gets sent back down to the code managing the UDP
stream. A UDP stream is initiated by a regular message object, with its own
data to initiate a shared secret, small enough to fit in a single UDP packet.
It is just that this message object says “prepare the way for bigger message
objects”. The UDP protocol for big message objects is built on top of a UDP
protocol for message objects small enough to fit in a single packet.