wallet/docs/generic_client_server_program.html

<!DOCTYPE html>
<html lang="en"><head>

		<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
		<style>
		body {
			max-width: 30em;
			margin-left: 2em;
			}
			p.center {
				text-align:center;
				}
		</style><title>generic client server program</title></head><body>


<h1>Generic Client Server Program</h1><p>


We need a MIDL like language which specifies messages and
generates skeleton client server programs with key handling
and full protocol negotiation.  The distributed public
keys are hashes of rules for recognizing valid full
asymmetric encryption public keys, and the generated
programs interface with a key distribution and key
management system.  The generated skeletons should provide
DDoS resistance, end to end encryption, support for once
and only once messaging, with or without in order
delivery, and support for at least once messaging, with
or without in order delivery.
</p><p>


A single endpoint is displayed to the user with a
single name.  For all communications and all protocols
with an endpoint that the particular user should conceive
of as a single site with a single name, under all
applications displaying information from that site to
that user, there should be a single encryption and
authentication key per session: many channels per
protocol, many protocols per session, one key for all.
This avoids the security holes that result from the
browser recreating keys many times in what the user
perceives to be a single session with a single entity.
In the browser, web pages nominally from the same origin
can modify web pages secured using a different certificate
and a different certificate authority.  To avoid the
endless security complications that ensue, all
connections in a single session should rest on a single
shared secret.  We should never recreate shared secrets
from public keys in the course of a session or a
transaction.  </p><p>

The keying mechanism should support secret handshakes,
for the core revolutionary problem is the long march
through the institutions.  </p><p>

The keying mechanism should also support the corporate
form – not sure if anything special needs to be done to
support signatures that represent shareholder votes.
</p><p>

The protocol negotiation mechanism should enable anyone
to add their own protocol without consulting anyone and
without disruption or significant inefficiency
ensuing, while guaranteeing that once a connection is
set up, both ends are talking the same protocol. 
</p><p>

The keying mechanism shall start with a hash type
identifier, followed by the hash.  The hash is a hash of
a rule prescribing a valid key in full binary. 
Permissible rules are</p><ul><li>

The hash is the hash of the full key itself.  </li><li>

The hash is the hash of a full key plus a name path.
This rule implies that a valid key consists of that key
signing another key, which signs another key, which signs
another key, which keys are named according to the name
path, and have expiry dates that are mutually consistent
(a key cannot sign a key that expires beyond its own
expiry) and have not expired yet.   Each key in the
sequence except the first is declared by the preceding
signature to have a name, creation date, and expiry date,
and that name agrees with the name path in the hash. 
The highest key in the chain has a name and creation
date specified by the hash, but no expiry date. Where
successive keys in the chain of signatures have the same
name, they correspond to a single link.  </li><li>

The full key itself with an expiry date and preceding
hash key.   We can then change short keys – which
implies many short keys can correspond to the same
identity.  </li></ul><p>
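The first two rules can be sketched roughly as follows. This is only an illustration under stated assumptions: the one byte hash type identifier, the choice of SHA-256, and the names <code>short_key</code>, <code>matches_full_key</code>, <code>Link</code> and <code>chain_metadata_ok</code> are all hypothetical, and signature verification is left out entirely, so only the names and dates of a chain are checked.
</p>
<pre>
```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical one-byte hash type identifiers; the text does not define any.
HASH_TYPES = {1: hashlib.sha256}


def short_key(hash_type: int, rule_bytes: bytes) -> bytes:
    """Hash type identifier followed by the hash of the rule in full binary."""
    digest = HASH_TYPES[hash_type](rule_bytes).digest()
    return bytes([hash_type]) + digest


def matches_full_key(short: bytes, full_key: bytes) -> bool:
    """Rule 1: the hash is the hash of the full key itself."""
    hash_type = short[0]
    if hash_type not in HASH_TYPES:
        return False
    return short[1:] == HASH_TYPES[hash_type](full_key).digest()


@dataclass
class Link:
    name: str
    created: int             # seconds since the epoch
    expires: Optional[int]   # None only for the highest key in the chain


def chain_metadata_ok(chain: List[Link], name_path: List[str], now: int) -> bool:
    """Rule 2, metadata only: names follow the path, expiries are consistent.

    Signature verification is deliberately left out of this sketch.
    """
    if not chain or chain[0].expires is not None:
        return False
    # Successive keys with the same name count as a single link in the path.
    names = []
    for link in chain:
        if not names or names[-1] != link.name:
            names.append(link.name)
    if names != name_path:
        return False
    limit = None  # expiry of the signing key; None means no expiry
    for link in chain[1:]:
        if link.expires is None or link.expires <= now:
            return False
        # A key cannot sign a key that expires beyond its own expiry.
        if limit is not None and link.expires > limit:
            return False
        limit = link.expires
    return True
```
</pre>
<p>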




Whenever one writes a client server program, and whenever
one writes multi threaded software on the shared-nothing
model, for example Python with threads connected by
queues, one tends to wind up violating the <a href="http://c2.com/cgi/wiki?DontRepeatYourself">Don’t
Repeat Yourself</a> principle, and the <a href="http://c2.com/cgi/wiki?OnceAndOnlyOnce">Once And
Only Once</a> principle.  In any very large and very long
lived project this eventually leads to total code
meltdown: it becomes impossible to make any further
substantial changes to the project that involve changing
the interface between client and server, even when it
becomes apparent that the architecture contains utterly
disastrous failings that are complete show stoppers.
</p><p>

The present browser crisis with https is an example of
such an insoluble show stopper, one example of a great
many.  </p><p>

Many of the requirements for a generic server program
are discussed in Daniel J. Bernstein’s <a href="http://qmail_security.pdf">retrospective on
Qmail</a>.  The Unix generic server program is inetd.
</p><p>

A server program has to recover from crashes, and give
performance that degrades gracefully in a resource
limited situation rather than crashing and burning. 
</p><p>

Inetd listens for network connections.  When a
connection is made, inetd runs another program to handle
the connection.  For example, inetd can run qmail-smtpd
to handle an incoming SMTP connection.  The qmail-smtpd
program doesn’t have to worry about networking,
multitasking, etc.; it receives SMTP commands from one
client on its standard input, and sends the responses to
its standard output.  </p><p>

If a particular instance of qmail-smtpd crashes or
hangs, it is not a problem.  Each instance does limited
processing, stuffs the result into a queue, activates the
program that processes the queue if it is not presently
running, and shuts down.  The queue is processed with
limited parallelism, and thus is somewhat resistant to
resource limited crash and burn.  </p><p>

I envisage something much more capable – that instead
of specifying an IO stream, one specifies an interface,
which gets compiled into message crackers, a skeleton
program with protocol negotiation, and unit tests. 
</p><p>
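As a rough illustration of what compiling an interface into message crackers might look like, the sketch below turns a small declarative specification into a message generator and a message cracker. The specification format, the field kinds <code>u32</code> and <code>bytes</code>, and the names <code>LOGIN_SPEC</code>, <code>make_codec</code> and <code>MalformedMessage</code> are assumptions made for this example, not part of any existing tool.
</p>
<pre>
```python
import struct

# Hypothetical interface specification: (field name, kind, maximum length).
# "u32" is a fixed-width unsigned integer; "bytes" is length-prefixed.
LOGIN_SPEC = [("user_id", "u32", 4), ("token", "bytes", 32)]


class MalformedMessage(Exception):
    pass


def make_codec(spec):
    """Compile a specification into a message generator and a message cracker."""

    def generate(**fields) -> bytes:
        out = b""
        for name, kind, max_len in spec:
            value = fields[name]
            if kind == "u32":
                out += struct.pack("!I", value)
            else:
                if len(value) > max_len:
                    raise ValueError(name + " exceeds its declared maximum")
                out += struct.pack("!H", len(value)) + value
        return out

    def crack(data: bytes) -> dict:
        fields, pos = {}, 0
        for name, kind, max_len in spec:
            if kind == "u32":
                if pos + 4 > len(data):
                    raise MalformedMessage("truncated " + name)
                (fields[name],) = struct.unpack_from("!I", data, pos)
                pos += 4
            else:
                if pos + 2 > len(data):
                    raise MalformedMessage("truncated length of " + name)
                (length,) = struct.unpack_from("!H", data, pos)
                pos += 2
                # Check the declared length before touching the payload.
                if length > max_len or pos + length > len(data):
                    raise MalformedMessage("bad length for " + name)
                fields[name] = data[pos:pos + length]
                pos += length
        if pos != len(data):
            raise MalformedMessage("trailing bytes")
        return fields

    return generate, crack
```
</pre>
<p>
Because both functions come from the same specification, the generator and the cracker cannot drift apart, and every length is checked against its declared maximum before any payload is read.
</p><p>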

In accordance with the <a href="http://c2.com/cgi/wiki?DontRepeatYourself">Don’t
Repeat Yourself</a> principle, the specification is human
readable, and records the history of changes and variants
in the specification.  The history of changes and
variants is compiled into protocol negotiation code for
both sides of the interface, and the specification itself
is compiled into message generators and message crackers
for both sides of the interface.  The generated code
is regenerated every major make, and is never checked
into source code control, nor edited by humans, though
the generated skeleton code may be copied by humans as a
starting point.  </p><p>
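One way the recorded history of versions could drive negotiation is sketched below. It assumes, purely for illustration, that each protocol version is named by a hash of its specification text, so that anyone can add a protocol without a central registry, and that one side picks the newest version both ends know. The names <code>protocol_id</code> and <code>negotiate</code> are hypothetical.
</p>
<pre>
```python
import hashlib


def protocol_id(spec_text: str) -> str:
    """Name a protocol version by a hash of its specification text.

    Hash-based naming is an assumption of this sketch, not something the
    text prescribes.
    """
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()[:16]


def negotiate(my_history, peer_offer):
    """Pick the newest version in my history that the peer also offers.

    my_history lists protocol ids oldest first, as recorded in the
    specification; peer_offer is the collection of ids the peer advertises.
    Returns None when there is no common version, in which case no
    connection should be set up.
    """
    offered = set(peer_offer)
    for pid in reversed(my_history):
        if pid in offered:
            return pid
    return None
```
</pre>
<p>
In such a scheme the side that calls <code>negotiate</code> would send the chosen id back to its peer, and the peer would refuse the connection unless that id is one it offered, so both ends end up talking the same protocol.
</p><p>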

We use a compiler compiler such as CoCo/R to generate a
compiler that compiles the interface specification to the
program.  An interface specification specifies several
interface versions.  In the absence of C code, it
generates a skeleton program.  In the presence of C code
for the previous version of the protocol, it adds a
skeleton for the new version of the protocol to the
existing program.  (I favor CoCo, because it can compile
itself, though it does not seem a very popular choice.)
</p><p>

This functionality is akin to MIDL.  MIDL was a
Microsoft compiler that compiled interface description
language into C++ code that implemented that interface
and interface negotiation – thereby ensuring that
everyone used the same methods to find out what
interfaces were available, and to advertise what
interfaces they made available.  </p><p>

MIDL/IDL/COM was designed for calls within a single
address space and a single thread, and worked great for
this problem, but their efforts to extend it to inter
thread, inter process, and across network calls varied
from bad to catastrophic.  </p><p>

Google’s protobuf is designed to work between threads.
It is a message specification protocol, but lacks the
key feature of MIDL, protocol negotiation: there is no
run time guarantee that data was serialized the way it
will be deserialized.  </p><p>

We can also use a meta programming system such as Boost,
which gives C++ Lisp like meta programming
capabilities.  Unfortunately Boost meta programming,
though no doubt an excellent language, runs on the
virtual machine provided by compilers to implement
templates, and the most trivial operations suck up large
amounts of stack space and symbol space, so despite the
coolness of the language, I expect it to die horribly in
real world applications.
</p><p>

We want message crackers, so as to protect against the
buffer overflow problem.  But what about the resource
limit problem?  </p><p>

Launching a new program for every connection is costly,
even on Unix, and much more costly on Windows.  I
envisage that the server program will use TBB
(Threading Building Blocks), creating a new thread for
each connection.  That is efficient, but it means that a
failure in one connection can inconvenience all others,
and that a bug can allow one thread to access information
from other threads.  For the latter problem, I think the
answer is just "no bugs", or at least no bugs that allow
access to random memory.  But there is one bug we are
bound to have: resource exhaustion.  </p><p>

How does the generic server program, the program
generated for a particular interface specification,
handle resource exhaustion? </p><p>

We need our program to be neutral to DDoS: it must not
allow anything that is cheap for an anonymous
attacker’s machine but expensive for the server
machine.  We also need our program to degrade gracefully
when legitimate usage exceeds its capability.  </p><p>

First, when establishing new connections, we have a
limited cache for connections in the process of being
established.  If that cache is exceeded, we send an
encrypted cookie to the client, and stop holding state
for connections in progress – see the discussion of
defenses against a distributed denial of service attack
on TCP: SYN flooding and SYN cookies.  </p><p>
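A minimal sketch of the stateless cookie idea follows. Note the swap: the text speaks of an encrypted cookie, while this example uses an authenticated one (a timestamp plus an HMAC), which is enough to show how the server can stop holding state for half open connections. The names <code>make_cookie</code> and <code>check_cookie</code> and the 60 second lifetime are assumptions of the example.
</p>
<pre>
```python
import hashlib
import hmac
import struct

COOKIE_LIFETIME = 60  # seconds; a hypothetical value


def make_cookie(server_secret: bytes, client_addr: bytes, now: int) -> bytes:
    """Return a timestamp plus MAC; the server stores nothing per client."""
    stamp = struct.pack("!Q", now)
    tag = hmac.new(server_secret, stamp + client_addr, hashlib.sha256).digest()
    return stamp + tag


def check_cookie(server_secret: bytes, client_addr: bytes,
                 cookie: bytes, now: int) -> bool:
    """Accept only a fresh cookie that this server issued to this address."""
    if len(cookie) != 8 + 32:
        return False
    stamp, tag = cookie[:8], cookie[8:]
    expected = hmac.new(server_secret, stamp + client_addr, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False
    (issued,) = struct.unpack("!Q", stamp)
    return 0 <= now - issued <= COOKIE_LIFETIME
```
</pre>
<p>
The client must echo the cookie before the server commits any memory to the connection, so an attacker who floods with forged source addresses costs the server only one MAC computation per packet.
</p><p>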

Our address space for connections is large and variable
precision.  Each incoming packet contains a clear index
to a shared secret, which is used to decrypt the first
block in the incoming packet, which has to be in the correct
format and in window for the connection stream, or the
packet gets discarded.  We now, after the decryption,
have the connection stream identifier, which may contain
an index to a larger set of shared secrets, a set of
connections and streams larger than the in memory list
for shared secrets for the initial block.  </p><p>

Having identified that the stream is legit, we then
check if the packet of the stream corresponds to a
presently running thread of a presently running
protocol interpreter.  If it is, we dispatch the packet
to that program.  If it is not, but the protocol
interpreter is running, we dispatch the packet, and the
protocol interpreter recovers the state from the database
and launches a new thread for that stream.  Similarly,
if the protocol interpreter is not running .  .  .
</p><p>
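The dispatch steps above might look roughly like this. It is a sketch with stand-ins: plain dictionaries play the part of the database and of running threads, the class <code>Interpreter</code> and function <code>dispatch</code> are hypothetical names, and launching an interpreter on demand when none is running is an assumption, since the text leaves that case open.
</p>
<pre>
```python
class Interpreter:
    """Stand-in for a running protocol interpreter (hypothetical)."""

    def __init__(self, database):
        self.database = database   # stream id -> saved state
        self.threads = {}          # stream id -> live in-memory state

    def deliver(self, stream_id, packet):
        if stream_id not in self.threads:
            # No live thread: recover the saved state and start one.
            self.threads[stream_id] = self.database.pop(stream_id, [])
        self.threads[stream_id].append(packet)


def dispatch(interpreters, databases, protocol, stream_id, packet):
    """Route a packet whose stream has already been authenticated."""
    interp = interpreters.get(protocol)
    if interp is None:
        # The text leaves this case open; launching an interpreter on
        # demand is an assumption of this sketch.
        interp = interpreters[protocol] = Interpreter(databases[protocol])
    interp.deliver(stream_id, packet)
    return interp
```
</pre>
<p>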

Each thread in the protocol interpreter times out after
a bit of inactivity, and saves its state to the
database.  Persistently busy threads time out
regardless, to discriminate against needy clients.  When
no threads remain, the protocol interpreter shuts down.
From time to time we launch new instances of the protocol
interpreter, (we being the master interpreter that
handles all protocols) and failure of the old instance to
shut down within a reasonable time is detected and
presented as an error.  </p><p>

The master interpreter monitors resource usage, and
gracefully declines to open new connections, and shuts
down hoggish old connections, when resources get
scarce.  The skeleton interpreter generated for each
protocol has a cache limit and a database limit for the
number of connections, and an idle and absolute time
limit on cache and database connections.  When the cache
limit is exceeded, the connection information goes to
the database; when the database limit is exceeded, the
connection is terminated.  </p><p>
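The two limits can be sketched as a two tier store. This is an illustration only: the class name <code>ConnectionStore</code> is hypothetical, an <code>OrderedDict</code> stands in for the database, time limits are omitted, and evicting the least recently used connection first is an assumption, since the text does not say which connection is chosen.
</p>
<pre>
```python
from collections import OrderedDict


class ConnectionStore:
    """Two-tier store: a bounded in-memory cache backed by a bounded database."""

    def __init__(self, cache_limit: int, database_limit: int):
        self.cache_limit = cache_limit
        self.database_limit = database_limit
        self.cache = OrderedDict()     # most recently used last
        self.database = OrderedDict()  # stand-in for a real database table
        self.terminated = []           # ids of connections that were dropped

    def touch(self, conn_id, state):
        """Record activity on a connection, spilling or terminating as needed."""
        self.database.pop(conn_id, None)
        self.cache[conn_id] = state
        self.cache.move_to_end(conn_id)
        # Cache limit exceeded: the oldest connection goes to the database.
        while len(self.cache) > self.cache_limit:
            old_id, old_state = self.cache.popitem(last=False)
            self.database[old_id] = old_state
        # Database limit exceeded: the oldest stored connection is terminated.
        while len(self.database) > self.database_limit:
            dead_id, _ = self.database.popitem(last=False)
            self.terminated.append(dead_id)
```
</pre>
<p>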

Internet facing programs will always encounter malicious
data.  The two huge problems in C and C++ internet
facing programs are buffer overflow and malloc failure.
It is possible to take care of buffer overflow just by
tightly vetting all string inputs.  All string inputs
have to have a defined format and defined maximum length,
and be vetted for conformity.  This is doable, problem
solved.  The protocol definition should specify
constraints on strings, with default constraints if none
are specified, and the code generated by the protocol
interpreter should contain such checks, guaranteeing that
all input strings have defined maximums, and defined
forbidden or permitted characters.  Malloc, however, is a
harder problem.  No one is going to write and test error
recovery from every malloc.  We therefore have to
redefine malloc in the library as malloc_nofail.  If
someone <em>is</em> going to write error recovery code,
he can explicitly call malloc_can_fail.  If
malloc_nofail fails, the program instance shuts down,
thereby relieving the resource shortage by degrading service
if the malloc failure is caused by server overload, or
frustrating the attack if the malloc failure is caused by
some clever attack.</p><p>
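The string vetting half of this can be sketched as follows, in Python rather than C for brevity. The class <code>StringRule</code>, the ASCII only restriction, and the default of 255 characters drawn from a small permitted set are assumptions chosen for the example; generated code would take these values from the protocol definition.
</p>
<pre>
```python
import re


class StringRule:
    """A constraint on one string field: maximum length and permitted characters."""

    def __init__(self, max_len: int, permitted: str):
        self.max_len = max_len
        # permitted is the body of a regular expression character class.
        self.pattern = re.compile("[" + permitted + "]*")

    def vet(self, raw: bytes) -> str:
        # Check the length before decoding or copying anything.
        if len(raw) > self.max_len:
            raise ValueError("string exceeds declared maximum length")
        try:
            text = raw.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError("string contains non-ASCII bytes") from None
        if self.pattern.fullmatch(text) is None:
            raise ValueError("string contains forbidden characters")
        return text


# Hypothetical default applied when the protocol definition gives no constraint.
DEFAULT_RULE = StringRule(max_len=255, permitted=r"A-Za-z0-9 _.\-")
```
</pre>
<p>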

We cannot ensure that nothing can ever go wrong, therefore
we must have a crash-and-restart process, that detects
process failure, and auto relaunches.  Unix encourages
this code pattern by providing inetd.  This is perhaps
overly costly, but continually spawning and killing off
new processes is inherently robust.  Therefore, every so
often, we must spawn a new instance, and every so often,
old instances must be destroyed.  </p><p>

The protocol interpreter should automatically generate
such a robust architecture – a working skeleton program
that is internet facing and architected in ways that make
programs derived from it unlikely to fail under attack.
The experience with web servers is that the efficient and
robust solution is multiple instances, each with multiple
threads.  For robustness, retire an instance after a
while.  Use one thread per client session, and put new
sessions with the same client in the same instance where
possible (what is a session is sometimes unclear).  After
a while, an old instance gets no new threads, and is
eventually forcibly shut down if it does not shut itself
down when it is out of client threads and has no prospect
of getting new ones.
</p>
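<p>
The retirement policy described above can be sketched as a small supervisor. This is only the bookkeeping: instances are plain records rather than real processes, time is passed in explicitly, and the names <code>Instance</code>, <code>Supervisor</code>, <code>retire_after</code> and <code>kill_after</code> are hypothetical.
</p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    started: float
    sessions: set = field(default_factory=set)
    alive: bool = True


class Supervisor:
    """Retire instances by age: stop giving them sessions, then shut them down."""

    def __init__(self, retire_after: float, kill_after: float):
        self.retire_after = retire_after  # age at which no new sessions are assigned
        self.kill_after = kill_after      # age at which shutdown is forced
        self.instances = []

    def assign(self, session_id, now: float) -> Instance:
        """Place a new session on a young instance, spawning one if needed."""
        for inst in self.instances:
            if inst.alive and now - inst.started < self.retire_after:
                inst.sessions.add(session_id)
                return inst
        inst = Instance(started=now)
        inst.sessions.add(session_id)
        self.instances.append(inst)
        return inst

    def sweep(self, now: float) -> None:
        """Shut down retired instances that are idle, and any that are overdue."""
        for inst in self.instances:
            age = now - inst.started
            idle_and_retired = age >= self.retire_after and not inst.sessions
            if inst.alive and (idle_and_retired or age >= self.kill_after):
                inst.alive = False
        self.instances = [i for i in self.instances if i.alive]
```
</pre>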

<p style="background-color : #ccffcc;  font-size:80%">These documents are
licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
Commons Attribution-Share Alike 3.0 License</a></p>
 </body></html>