---
title: Replacing TCP, SSL, DNS, CAs, and TLS
sidebar: true
...

# related

[Client Server Data Representation](client_server.html){target="_blank"}

# Existing work

[µTP]:https://github.com/bittorrent/libutp
"libutp - The uTorrent Transport Protocol library"
{target="_blank"}

[µTP], Micro Transport Protocol, has already been written, and it is just a
matter of copying it and embedding it where possible, and forking it if
unavoidable. DDOS resistance looks like it is going to need forking.

It implements LEDBAT, a protocol designed for applications that download
bulk data in the background, pushing the network close to its limits, while
still playing nice with TCP.

Implementing consensus over [µTP] is going to need [QUIC] style streams
that can slow down or fail without the whole connection slowing down or
failing, though it might be easier to implement consensus that just calls
µTP for some tasks.

I have not investigated what implementing short fixed length streams over
[µTP] would involve. Bittorrent already necessarily does something mighty
like that. Maybe it just sequentializes everything. Which kind of makes
sense: a single concurrent process managing each connection is easier to
program and comprehend, even if it cannot give optimal performance.

Obviously it must have a request response layer, documented only in
source code. The question then is how it maps that layer onto a µTP
connection. You are going to have to copy, not just µTP, but that layer,
which should be part of µTP, but probably is not. You will have to
factorize what they probably have not cleanly factorized.

Their request response layer is probably somewhat documented in
[BEP0055]. I suspect that what I need is not just µTP, but the largest
common factor of [BEP0055].

[BEP0055]:https://www.bittorrent.org/beps/bep_0055.html
"BEP0055"
{target="_blank"}

[`ut_holepunch` extension message]:http://bittorrent.org/beps/bep_0010.html
"BEP0010"
{target="_blank"}

[libtorrent source code]:https://github.com/arvidn/libtorrent/blob/c1ade2b75f8f7771509a19d427954c8c851c4931/src/bt_peer_connection.cpp#L1421
"bt_peer_connection.cpp"
{target="_blank"}

µTP does not itself implement hole punching, but interoperates smoothly
with libtorrent's [BEP0055] [`ut_holepunch` extension message], which is
only documented in [libtorrent source code].

A tokio-rust based µTP system is under development, but it was very far from
complete last time I looked. Rewriting µTP in Rust seems pointless: just
call it from a single tokio thread that gives effect to a hundred thousand
concurrent processes. There are several projects afoot to rewrite µTP in
Rust, all of them stalled in a grossly broken and incomplete state.

[QUIC has grander design objectives]:https://docs.google.com/document/d/1RNHkx_VvKWyWg6Lr8SZ-saqsQx7rFV-ev2jRFUoVD34/edit
{target="_blank"}

[QUIC has grander design objectives], and is a well thought out, well
designed, and well tested implementation of no end of very good and
much needed ideas and technologies, but it relies heavily on enemy
controlled cryptography.

There are some things I want to do that µTP really cannot do: consensus
between a small number of peers, by invitation, with each peer directly
connected to each of the others, the small set of peers being part of the
consensus known to all peers, and all peers always online and responding
appropriately, or else they get kicked out (Practical Byzantine Fault
*In*tolerant consensus). Still, it might be efficient to use a different
algorithm to construct consensus, and then use µTP to download the bulk data.

# Existing documentation

There is a great pile of RFCs on issues that arise with using UDP and ICMP
to communicate, which contain much useful information.

[RFC5405](https://datatracker.ietf.org/doc/html/rfc5405#section-3), [RFC6773](https://datatracker.ietf.org/doc/html/rfc6773), [datagram congestion control](https://datatracker.ietf.org/doc/html/rfc5596), [RFC5595](https://datatracker.ietf.org/doc/html/rfc5595), [UDP Usage Guidelines](https://datatracker.ietf.org/doc/html/rfc8085)

There is a formalized congestion control system, `ECN` (explicit congestion
notification). Most servers ignore ECN. On a small proportion of routes,
about 1%, ECN tagged packets are dropped.

Raw sockets provide greater control than UDP sockets, and allow you to
do ICMP-like things through ICMP.

I also have a discussion on NAT hole punching, [peering through nat](nat.html), that
summarizes various people's experience.

To get an initial estimate of the path MTU, connect a datagram socket to
the destination address using connect(2) and retrieve the MTU by calling
getsockopt(2) with the IP_MTU option. But this can only give you an
upper bound. To find the actual MTU, you have to set the don't-fragment
flag (which these days is generally set by default on UDP) and empirically
track the largest packet that makes it through on this connection. Which
is what TCP does.
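
A minimal Linux-only sketch of that initial estimate, assuming a reachable example address; error handling mostly omitted:

```cpp
// Sketch: query the kernel's current path MTU estimate for a UDP
// destination on Linux. This reflects the route's cached estimate,
// so it is an upper bound, not the true path MTU.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(443);
    inet_pton(AF_INET, "93.184.216.34", &dst.sin_addr); // example address
    if (connect(s, reinterpret_cast<sockaddr*>(&dst), sizeof dst) != 0)
        return 1;
    // Ask the kernel to set the don't-fragment bit and do path MTU discovery.
    int val = IP_PMTUDISC_DO;
    setsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof val);
    int mtu = 0;
    socklen_t len = sizeof mtu;
    if (getsockopt(s, IPPROTO_IP, IP_MTU, &mtu, &len) == 0)
        printf("current path MTU estimate: %d\n", mtu);
    close(s);
    return 0;
}
```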

MTU (packet size) and MSS (data size, $MTU-40$) are a
[messy problem](https://www.cisco.com/c/en/us/support/docs/ip/generic-routing-encapsulation-gre/25885-pmtud-ipfrag.html),
which can be side stepped by always sending packets
of size 576 containing 536 bytes of data.

## first baby steps

To try and puzzle this out, I need to build a client server that can listen on
an arbitrary port, and tell me about the messages it receives, and can send
messages to an arbitrary hostname:port or network address:port. When it
receives a packet that is formatted for it, it will display the
information in that packet and obey the command in that packet. That command
will typically be a command to send a reply that depicts what is in the
packet it received, which probably got transformed by passing through
multiple NATs, and/or a command to display what is in the packet, which is
typically a depiction of how the packet to which this packet is a reply got
transformed.

This test program sounds an awful lot like ICMP, which is best accessed
through raw sockets. Might be a good idea to give it the capability to send
ICMP, UDP, and fake TCP.

Raw sockets provide the lowest level access to the network available from
userspace. An immense pile of obscure and complicated stuff is in the kernel.

# What the API should look like

It should be a consensus API for consensus among a small number of
peers, rather than a message API, message response being the special case
of consensus between two peers, and broad consensus being constructed
out of a large number of small invitation based consensi.

A peer explicitly joins the small group when its request is acked by a
majority, and rejected by no one.

On the other hand, this involves re-inventing networking from scratch, as
compared to simply copying http/2, or some other reliable UDP system.

Total rewrites, however desirable and necessary, always fail.

So on reflection this is a blue sky proposal, likely to involve immense delay:

I need to think about the way things should be done, but I don't want to
get lost in the weeds. I have repeatedly wasted a great deal of time
re-inventing stuff from scratch, only to find that when I was finished, I had
something vastly inferior to what already existed, so I wound up tossing
my work, and using someone else's library with minimum adaptation.

Many a time I see something encrusted with ancient history, where backward
compatibility means they cannot fix old mistakes. I design something new
and fresh, and vastly superior, and discover that there were one hundred
and one issues that the old history encrusted thing had encountered and dealt
with that I had not foreseen; that not all of that mighty pile of code is crap
to work around past mistakes which must continue to be supported, but a
lot of it is issues I had not foreseen having to deal with, and had not
planned a path to dealing with.

When implementing stuff from scratch, all too often one discovers there
are no end of reasons for all the stuff one thought bad and unnecessary in
existing libraries.

But on with the vision. Though it will likely be vastly faster to just fix
someone else's library to have real security.

Although the api represents messages, rather than connections, it will
implicitly have a very large number of connections, in that a connection is
your current state with a counterparty: expected protocols (message types) and all that.

For an app to poll a very large number of connections over the network,
`select` does not cut the mustard. Network apis have been evolving, each in
its own idiosyncratic way, towards the app making O(1) additions and deletions to
the list of counterparties on the network whose messages it is listening to,
and getting notifications that are O(number of events) rather than
O(number of counterparties).

The way this should be done is a linked list of data structures containing
events, which the app can poll locklessly, or wait on (with a timer event
guaranteed to appear in the list eventually if it is waiting on it). If the app
fails to free anything from the list after an unreasonably long time,
suggesting that the app has shut down ungracefully or crashed, and there
are rather too many things on the list, the process that is putting things on
the list will start by pushing back on the parties sending messages to the
app, and end by shutting down their connections and discarding their data.
The network events live entirely in memory and are volatile. If they
represent long lived relationships, it is up to the app to commit the
information that they represent to disk.
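
A minimal sketch of one way such a locklessly pollable event list might look, assuming a single consumer (the app) and many producers (the network process); names are illustrative:

```cpp
// Sketch (illustrative, not the actual design): a lock-free intrusive
// list that many network threads push events onto, and which the app
// polls without taking a lock. Single consumer assumed.
#include <atomic>

struct Event {
    int type;          // e.g. message arrived, send failed, timer fired
    void* payload;     // event specific data
    Event* next = nullptr;
};

class EventList {
    std::atomic<Event*> head{nullptr};
public:
    // Called by network threads: O(1), lock-free.
    void push(Event* e) {
        e->next = head.load(std::memory_order_relaxed);
        while (!head.compare_exchange_weak(e->next, e,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {}
    }
    // Called by the app: grabs everything pushed so far in one atomic
    // exchange, returning a singly linked batch (newest first).
    Event* poll_all() {
        return head.exchange(nullptr, std::memory_order_acquire);
    }
};
```

A waiting app would additionally need some wakeup primitive (an eventfd or condition variable) so the guaranteed timer event can rouse it.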

Every message has a public key of sender, a public key of recipient, and
potentially an in-regards-to hash, a reply-to hash, and an in-reply-to hash.
Some or all of these hashes may be null. It seldom makes sense for all of
them to be null, and it seldom makes sense for all of them to be non null.
Usually reply-to is null, and it does not always make sense for it to be non
null.
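
As a sketch, assuming 256 bit keys and hashes; field names are illustrative:

```cpp
// Sketch: the fields every message carries, per the description above.
#include <array>
#include <optional>

using Key  = std::array<unsigned char, 32>;
using Hash = std::array<unsigned char, 32>;

struct MessageHeader {
    Key sender;                        // durable public key of sender
    Key recipient;                     // durable public key of recipient
    std::optional<Hash> in_regards_to; // conversation or topic, often null
    std::optional<Hash> reply_to;      // third party message, usually null
    std::optional<Hash> in_reply_to;   // message being answered, often null
};
```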

The reply-to field opens up a very large can of worms, in that its main use
is to reference a third party message that came from a third party server,
with its own type information and sender public key, and then how does the
sender know the recipient has or can obtain that message?

Every hash and every public key represents a potential endpoint, and thus
represents an additive type, or rather gives the system potential clues on
how to discover a mutually known additive type. (Reflect on the slow and
chaotic semi automated complexity of how the many protocols involved in
sending and receiving an email message are discovered, every time, for
every email message.)

Some of the time, the message type is only known from one of these
hashes – they imply the type information, without which the recipient
would not know how to parse the message, and the recipient has to be able
to recognize them before he can recognize anything else. And some of the
time, figuring out the message type from these hashes is non trivial or just
flat out fails. No general automatic one size fits all procedure can work on
every mysterious second party hash. This is a problem that has to be dealt
with ad hoc, use case by use case, protocol by protocol, message type by
message type.

Not all messages can be sent reliably, but the sender gets a notification
event – failed, succeeded, replied to, or unlikely to be known – and the
sender can immediately find out either the likely timing of such
notification, or that the likely timing of such notification is unknown; and
usually, the likely timing of such notification being unknown generates an
exception.

The api is potentially multilayered – the message may well get translated
to a multitude of similarly structured messages that set up the connection,
find out information about the recipient, all that stuff, and when those
messages go on the wire, they do not necessarily have any of this stuff –
commonly they just have the network address, the port address, and some numbers
that uniquely identify the context, which numbers are unique to the
connection but, unlike the hashes from which they are derived, not
globally unique: sequential identifiers, not hashes. But at the top level,
the network address, the port, and all that stuff is just not represented,
except implicitly in that the public key of the recipient may well get
looked up in a hash table that may well have the network address and the port.

On the wire, network address and port serve the function of in-regards-to,
and will wrap stuff that provides a finer grained function of in-regards-to
and in-reply-to -- as I said, multilayered, with the hashes being internally
mapped to data that serves equivalent functionality, network address
and port being the outermost layer on the wire.

On the wire, once a connection is established, the sender and recipient
public keys are implicit in the IP header, and the rest is opaque payload,
maximum payload being 1KiB. Inside the payload, the representation
depends on the message type, which was established when the connection
was established – the in-reply-to of the contained message is the unique
sequential nonce of the message being replied to, rather than the hash of
that message.

In the api, the application and api know the message type, because
otherwise the api just would not work. But on the rare occasions when the
message is represented globally, outside the api, *then* it needs a message type header.

# TCP is broken

TCP was designed in more trusting times, when the name system
consisted of a widely shared hosts file, and everyone trusted everyone.

Over the years people have piled warts on top of TCP and warts on top of
warts to fix one problem after another, and every fix results in additional round trips.

Thus “Cloudflare is checking your browser, you will be redirected shortly”.

Every additional round trip before a web page comes up results in a
significant loss of viewers. Hence HTTP/2. Which fails to fix the DDoS and
Cloudflare problem.

TCP is a major problem, which is slowing down the internet. DDoS
protection and the certificate mess are warts growing on top of warts.

Any business that resists corporate cancer is going to come under DDoS,
and if it employs a DDoS resistance service, that service is likely to place
pressure on the business to do political stuff that is counterproductive to
pursuing a profit. And even if it does not, the DDoS service slows down
people trying to view the business website.

If the TCP replacement fixes those warts, you get more views.

# Domain name system and SSL is broken

Any organization that has a certificate authority in its pocket can perform
a man in the middle attack on an SSL connection, though the CAA domain
name record somewhat mitigates this problem.

We also need to replace the TCP/SSL/CA/DNS system because
there is money in it. A great deal of money.

The trouble with an ICO (initial coin offering) is that the issuer has no
obligation to do anything other than take the money and run. We are
moving to an economy where much of the value is “goodwill”, “goodwill”
being names with reputations and relationships. The blockchain (or
blockdag, since blockdags theoretically have better scaling than
blockchains) could be used to render this value liquid in IPOs by having
both names and money on the blockchain.

With atomic transactions between blockchains, plus names on the blockchain
with money, a replacement for TCP/SSL/CAs/DNS could support sovereign
corporations on the blockchain, so that an ICO could be an IPO (Initial
Public Offering). If the blockchain is a name service as well as a money
service, it could give the investors ownership of the name. The owners of
examplecorp shares get to designate the board public key, and the board gets to
designate the public key of CEO@examplecorp from time to time, thus
rendering the value of a name potentially liquid.

Cryptocurrency exchanges are run by crooks, and are full of crooks each
trying to scam all the other crooks.

If you don’t know who the pigeon is, you are the pigeon.

A healthy cryptocurrency market needs to leave the cryptocurrency
exchanges behind, replacing them with atomic blockchain transactions
between separate blockchains. The exchanges are dangerously centralized, and
linked to a corruptly regulated finance and accounting system, which
corruption we saw with the Great Minority Mortgage Meltdown and the
mortgage backed security market from 2005 November to 2007, and saw
with MF Global. Jon Corzine did worse than embezzle client funds. He
embezzled client funds legally.

Demand for cryptocurrencies is driven in substantial part by the fact that
recent regulations have cheerfully set aside laws on fiduciary duty that are
millennia old. The exchanges cheerfully adhere to such regulations as they
find dangerously convenient, while taking advantage of cryptocurrency to
avoid those regulations that they find inconvenient.

The banks, the stock exchanges, and the big accounting firms are regulated
agencies whose regulators are in their pocket. The cryptocurrency exchanges
are semi regulated, taking advantage of regulations written for those who
have regulators in their pocket.

The cryptocurrency market needs to get rid of exchanges, starting with
cryptocurrency exchanges, and proceeding to get rid of stock exchanges.

An exchange exists to provide an escrow that faithfully observes
its fiduciary duty. And there have been a great many recent examples of such
entities getting up to no good, and in the case of the mortgage backed
security market, up to no good with enormous amounts of money.

A cryptocurrency with a name system could eat their lunch, greatly enriching
its founders in the process.

# Networking itself is broken

But that is too hard a problem to fix.

I had to sweat hard setting up Wireguard, because it pretends to be just
another `network adaptor` so that it can sweep away a pile of issues as out
of scope, and reading up posts and comments referencing these issues, I
suspect that almost no one understands these issues, or at least no one who
understands these issues is posting about them. They have a magic
incomprehensible incantation which works for them in their configuration,
and do not understand why it does not work for someone else in a subtly
different configuration.

## Internet protocol has too many layers of abstraction

I have to talk internet protocol to reach other systems over the internet, but
internet protocol is a messy pile of ad hoc bits of software built on top of
ad hoc bits of software, and the reason it is hard to understand the nuts and
bolts when you actually try to do anything useful is that you do not
understand, and indeed almost no one understands, what is actually going
on at the level of network adaptors and internet switches. When you send a
UDP packet, you are already at a high level of abstraction, and the
complexity that these abstractions are intended to hide leaks.

And because you do not understand the intentionally hidden complexity
that is leaking, it bites you.

### Adaptors and switches

A private network consists of a bunch of `network adaptors` all connected to
one `ethernet switch`, and its configuration consists of configuring
the software on each particular computer with each particular `network adaptor`
to be consistent with the configuration of each of the others connected to
the same `ethernet switch` – unless you have a `DHCP server` attached to the
network, in which case each of the machines gets a random, and all too
often changing, configuration from that `DHCP server`, but at least it is
guaranteed to be consistent with the configuration of each of the other
`network adaptors` attached to that one `ethernet switch`. Why do DHCP
configurations not live forever? Why do they not acknowledge the machine's
human readable name? Why does the ethernet switch not have a human
readable name, and why does the DHCP server have a network address
related to that of the ethernet switch, but not a human readable name
related to that of the ethernet switch?

What happens when you have several different network adaptors in one computer?

Obviously an IP address range has to be associated with each network
adaptor, so that the computer can dispatch packets to the correct adaptor.
And when the network adaptor receives a packet, the computer has to
figure out what to do with it. And what it does with it is the result of a pile
of undocumented software executing a pile of undocumented scripts.

If you manually configure each particular machine connected to an
ethernet switch, the configuration consists of arcane magic formulae
interpreted by undocumented software that differs between one system and the next.

As rapidly becomes apparent when you have to deal with more than one
adaptor, connected to more than one switch.

Each physical or virtual network adaptor is driven by a device driver,
which is different for each physical device and operating system. From the
point of view of the software, the device driver api *is* the network adaptor
programmer interface, and it does not care about which device driver it is,
so all network adaptors must have the same programmer interface. And
what is that interface?

Networking is a wart built on top of warts built on top of warts. IP6 was
intended to clean up this mess, but kind of collapsed under rule by
committee, developing a multitude of arcane, overly complicated, and overly
clever cancers of its own, different from, and in part incompatible
with, the vast pile of cruft that has grown on top of IP4.

The committee wanted to throw away the low order sixty four bits of
address space to use to post information for the NSA to mop up, and then
other people said to themselves, "this seems like a useless way to abuse
the low order sixty four bits, so let us abuse it for something else. After all,
no one is using it, nor can they use it, because it is being abused". But
everyone whose internet facing host has been assigned a single address,
which means he has actually been assigned $2^{64}$ addresses because he has
sixty four bits of useless address space, needs to use it, since he probably
wants to connect a private in house network through his single internet
facing host, and would like to be free to give some of his in house hosts
globally routable addresses.

In which case he has a private network address space, which is a random
subnet of fd::/8, and a 64 bit subnet of the global address space, and what
he wants is that he can assign an in house computer a globally routable
address, whereupon anything it sends that has a destination that is not on
his private network address space, nor his subnet of the globally routable
address space, gets sent to the internet facing network interface.

Further, he would like every computer on his network to be automatically
assigned a globally routable address if it uses a name in the global system,
or a private fd:: address if it is using a name not in the global system, so
that the first time his computer tries to access the network with the domain
name he just assigned, it gets a unique network address which will never
change, and a reverse dns that can only be accessed through an address on
his private network. And if he assigns it a globally accessible name, he
would like the global dns servers and reverse dns servers to automatically
learn that address.

This is, at present, doable by DDI, which updates both your DHCP
server and your DNS server. Except that hardly anyone has an in house
DNS server that serves up his globally routable addresses. The I in DDI
stands for IP Address Management (IPAM). In practice, everyone relies on
named entities having extremely durable network addresses, which are a
pain and a disaster to dynamically update, or they use dynamic DNS, not IPAM.

What would be vastly more useful and usable is that your internet facing
peer routed globally routable packets to and from your private network,
and machines booting up on your private network automatically received
static addresses corresponding to their names.

Globally routable subnets can change, because of physical changes in the
global network, but this happens so rarely that a painful changeover is
acceptable. The IP6 fix for automatically accommodating this issue is a
cumbersome disaster, and everyone winds up embedding their globally
routable IP6 subnet address in a multitude of mystery magic incantations,
which, in the event of a change, have to be painstakingly hunted down and
changed one by one, so the IP6 automatic configuration system is just a
great big wart in a dinosaur's asshole. It throws away half the address
space, and seldom accomplishes anything useful.

# Distributed Denial of Service attack

At present, resistance to Distributed Denial of Service attacks rests on
dangerously powerful central authorities, in particular Cloudflare, whose
service, in addition to being dangerously centralized, is expensive and poor.

The TCP replacement needs an adjustable proof of work (pow) handshake
as the first part of the connection handshake, the proof of work request
being the first server packet in the four packet handshake.

First packet: client requests connection. Second packet: server requests
work, and supplies a durable and a short lived public key. Third packet:
client supplies work and offers a transient public key, making
communication possible, plus the message it is trying to send the server, or
the first part of that message.

The work demanded goes up as the server load increases, thus fixing the
horrors of DDoS protection.

## Key agreement

Key agreement needs to be part of the TCP replacement handshake, rather
than a layer on top, to reduce round tripping.

The name system needs to be integrated with the key system, so that you get
the key when you get the network address associated with the name, and
the key/name pairing needs to be blockchain secured, so you don’t have one
thousand certificate authorities each with the authority to mount a man in the middle attack.

## replacement handshake for publicly identified server

The TCP replacement handshake needs to be a four phase handshake (a
struct level sketch follows the list).

1. Client->Server: Give me a connection, here are my parameters, here is my
session key.

1. Server->Client: Here is a proof of work request, my parameters, and a keyed
hash of your and my parameters. Ask again with proof of work, the same
parameters, and the keyed hash.

    Server then throws away the request, allocating no memory.

1. Client->Server: OK, here I am again, with all that stuff you asked for.

    This includes a konce (key used once, a single use elliptic point), and
    assumes that the client reliably knows the server public key in
    advance. This protocol is inappropriate to signons that are restricted
    to identified entities, because we probably do not want everyone to
    know who is identified.

1. Server checks the poly1305 authentication to ensure that this is a
real client reply to a real and recent server reply. Then it checks the
proof of work.

    If the proof of work passes, the server allocates memory, generates and
    stores a session key, and stores connection parameters, the client and
    server session keys among them.

1. Server->Client: OK, here is my session key, authenticated but not
signed by my permanent key, and stuff; now you can start sending
actual data.
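
A minimal sketch of the four packets as C++ structs, under assumed key and tag sizes; every field name is illustrative, not a wire format:

```cpp
// Illustrative packet layouts for the four phase handshake sketched
// above. Sizes assume 32 byte keys and 16 byte Poly1305 tags.
#include <array>
#include <cstdint>

using Key = std::array<uint8_t, 32>;
using Tag = std::array<uint8_t, 16>;

struct ClientHello {            // packet 1, client -> server
    uint8_t  protocol_version;
    Key      client_session_key;
};

struct ServerChallenge {        // packet 2, server -> client; stateless
    uint64_t pow_sequence;      // identifies this challenge
    uint8_t  pow_difficulty;    // rises with server load
    Key      server_durable_key;
    Key      server_transient_key;
    Tag      keyed_hash;        // binds both sides' parameters
};

struct ClientResponse {         // packet 3, echoes challenge plus work
    uint64_t pow_sequence;
    uint64_t pow_nonce;         // makes the hash meet the difficulty target
    Tag      keyed_hash;        // copied from ServerChallenge
    Key      client_transient_key;
    // first fragment of application data may follow
};

struct ServerAccept {           // packet 4, connection established
    Key      server_session_key;
    Tag      auth;              // authenticated, not signed
};
```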

Thus we can integrate the TCP handshake, the encryption handshake, and the
innumerable DDoS protection handshakes (“Cloudflare is checking your browser,
oops, your browser did not pass, here is a captcha”) at the cost of one single
additional trip, half a round trip.

Instead of the person establishing the connection fuming while round trip
after round trip goes through, we get all that stuff at the cost of one
additional half round trip.

### pow implementation

Each sequential proof of work request contains a 64 bit sequential integer.
The integer starts at a random 63 bit value, to ensure that every possible
successful proof of work ever used is unique in the universe. The
sequential integer is treated as a windowed value into a 512 bit integer,
whose high order part is an unshared secret that remains unchanged for the
duration.

From that 512 bit value, the server generates a unique XChaCha20 512 bit
value, 256 bits of which are used to generate a Poly1305 authenticator for
the proof of work request. If it receives a completed proof of work request
containing the authentication, it knows it comes from an entity at that
network address that was able to receive the proof of work request.
Knowing it is talking to real network addresses, it can derank network
addresses that create excessive burdens, so that they cannot slow down
everyone else, only themselves.
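
A sketch of one plausible reading of that construction, using libsodium; the way the 512 bit value is split into XChaCha20 key and nonce material is my assumption, not a specified layout:

```cpp
// Sketch (one plausible reading, using libsodium): derive a one time
// Poly1305 key for each proof of work request from a 512 bit value whose
// high order part is an unshared server secret and whose low order 64
// bits are the request's sequence number.
#include <sodium.h>
#include <cstdint>
#include <cstring>

struct PowAuth {
    unsigned char secret[48];  // unshared, constant for the duration

    // mac is 16 bytes (crypto_onetimeauth_BYTES).
    void authenticator(uint64_t seq, const unsigned char* request,
                       size_t len, unsigned char* mac) const {
        // 512 bit value = 48 byte secret || 8 byte sequence number,
        // consumed here as an XChaCha20 key plus nonce.
        unsigned char nonce[crypto_stream_xchacha20_NONCEBYTES]; // 24 bytes
        std::memcpy(nonce, &seq, 8);             // low order: sequence
        std::memcpy(nonce + 8, secret + 32, 16); // rest: secret material
        unsigned char key256[32];
        crypto_stream_xchacha20(key256, sizeof key256, nonce, secret);
        // 256 bits of keystream key a Poly1305 authenticator over the
        // request; the server can recompute this statelessly on reply.
        crypto_onetimeauth(mac, request, len, key256);
    }
};
```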

When it receives the completed proof of work, it first checks the sequence
number to ensure it is a recently issued request for work, then checks if
there is already a channel allocated for that pow, using a table of doubly
linked lists of recently allocated channels, indexed by the low order part of
the pow sequence number. If it discovers it has already passed that proof of
work and allocated a channel, it moves that proof of work to the head of the list,
so that the next check will be instant, just in case it is about to receive a
million copies of that proof of work. Then it checks for revealed bits against
those generated by XChaCha20. Then it checks the work and the
Poly1305 authentication.

Checking if there is already a channel allocated overlaps and intersects
with the presence notification protocol. We want to have a very large number
of inactive presences without secrets or network addresses in the database,
a large number of long lived active presences in memory, with secrets that
are not paged to disk (`sodium_allocarray`), and a considerably smaller
number of considerably shorter lived channels with flow control and
buffering. A presence can only exchange short messages that fit in one
packet, and only one message can be active in any round trip time. You
open a presence, and the presence can then open a channel.

We probably want to do the checks in whatever order is empirically most
efficient for the type of DDoS attacks that we encounter in practice, the most
common probably being garbage random values that bear no particular
resemblance to valid connection attempts.

The next problem will be valid connections that then make excessive
demands. These get deranked by the next layer, and they will then have to
make a new connection, which will face increasing pow and discrimination
against their network address.

## replacement handshake for limited circulation server

In this case the server is the gateway for a group, possibly many groups,
whose unique id is not widely known. It is analogous to a closely kept email address.

The TCP replacement handshake needs to be a four phase handshake.

1. Client->Server: Give me a connection, here are my parameters,
here is a clue about what private group I want to connect to.

1. Server->Client: Here is a proof of work request, my parameters,
including a use once elliptic point, and a keyed hash of your and
my parameters. Ask again with proof of work, the same parameters,
and the keyed hash.

    Server then throws away the request, allocating no memory.

1. Client->Server: OK, here I am again, with all that stuff you asked for.

    At this point, client has given server a clue about which private
    group it wants to connect to, and server has given client a clue
    about which private group it expects membership of, and therefore
    what public key the client should attempt to communicate with.

1. Server checks the keyed hash to ensure that this is a real client
reply to a real and recent server reply. Then it checks the proof of
work.

    If the proof of work passes, the server allocates memory.

    Then it generates a transient secret from the konces (keys used
    once, single use elliptic points), and uses it to decrypt the client
    durable public key, verifying that the client does indeed know the
    transient scalar. If the client durable key is OK, sign on allowed, it
    constructs a shared secret from all four keys, the sum of two secrets
    multiplying the sum of two elliptic points, and we now have an
    encrypted stream associated with the port number and network addresses.
    (A sketch of this key agreement follows the list.)
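
A sketch of that final key agreement, the sum of two secrets multiplying the sum of two elliptic points; the choice of ristretto255 via libsodium is my assumption, the document does not name a curve:

```cpp
// Sketch: shared secret = (durable scalar + transient scalar) times
// (peer durable point + peer transient point). Both sides compute
// (a+x)(b+y)·G, so the secrets agree. Curve choice (ristretto255 via
// libsodium) is an assumption for illustration.
#include <sodium.h>

// Returns 0 on success, -1 if a point is invalid or the result is the identity.
int agree(unsigned char shared[32],
          const unsigned char my_durable_scalar[32],
          const unsigned char my_transient_scalar[32],
          const unsigned char peer_durable_point[32],
          const unsigned char peer_transient_point[32]) {
    unsigned char s[32], p[32];
    // Sum of my two secrets, mod the group order.
    crypto_core_ristretto255_scalar_add(s, my_durable_scalar,
                                        my_transient_scalar);
    // Sum of the peer's two public points.
    if (crypto_core_ristretto255_add(p, peer_durable_point,
                                     peer_transient_point) != 0)
        return -1;
    // (a+x) * (B+Y)
    return crypto_scalarmult_ristretto255(shared, s, p);
}
```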

# Summary of the replacement

Thus we can integrate the TCP handshake, the encryption handshake, and the
innumerable DDoS protection handshakes (“Cloudflare is checking your browser,
oops, your browser did not pass, here is a captcha”) at the cost of one single
additional trip, half a round trip.

Instead of the person establishing the connection fuming while round trip
after round trip goes through, we get all that stuff at the cost of one
additional half round trip.

# messages, not streams

TCP sockets are designed for synchronous procedural programming, on
machines with very limited memory, processing limitless streams. They are
now almost always used for message processing from event oriented
asynchronous code, with a messaging layer on top of the endless stream
layer. The replacement needs to have the application layer sending messages
and receiving messages in events. The application layer should not have
to deal with sockets and streams. Rather, it sends a message to a destination
identified by its durable public key, and gets a reply, where the reply
might be that the socket could not be opened, or that the socket was open but
the reply timed out, among other things. When sending a message, there is a
time to wait for response before giving up, and a time for the socket that
may be created to live idle.

# Proposed replacement

[QUIC] is the current TCP replacement, also known as HTTP/3.

[QUIC]: https://github.com/private-octopus/picoquic

We have no alternative but to interface to the vast HTTP/2 HTTP/3
ecosystem. The wallet is going to have to talk as a client to legacy
HTTP/3 server devices, and accept their CA certificates, preferably subject to
Zooko scrutiny, and legacy HTTP/3 client devices are going to have to talk
to our wallet (after their wallet has downloaded a Zooko based certificate
from the server wallet).

Talking HTTP/3 means being wide open to DDoS attack, so that you are
forced to use Cloudflare. When a device with our version of QUIC talks to
another device with our version of QUIC, it has to implement our DDoS
resistance, and Zooko in place of CAs. But when it talks to a legacy
HTTP/3 device, it has to lay itself wide open to DDoS attack and CA
interception.

Backwards compatibility with insecure systems always creates a massive
security hole. On the one hand, every build from scratch project dies. On
the gripping hand, every attempt to do fax over the internet failed, and was
eventually replaced by PDF attachments to email. Backwards compatibility
was simply too crippling, and backwards compatibility with QUIC is
going to cripple security.

Instead of putting the secure system transparently as an alternate protocol
within the insecure system, you non transparently put the insecure system
as a downgrade protocol within the secure system, which means our
version of QUIC simply is not going to talk to older versions of QUIC
unless you take some special measures to tell it to do so or enable it to do
so for that particular communication end point.

The least friction interface would be that every time a new SSL name is
encountered, we get a window saying "This authority claims that this is
this entity. Trust this authority for this entity?" And if there is a change of
authority, complain. Wrap backwards compatibility in Zooko vouched
certificates, pinned certificates, and the CAA record indicating who is the
right issuer for the SSL certificate.

We have to have downgrade capability, but it has to be an afterthought,
slipped in as a special path and special case, as user friendly as possible,
but no friendlier.

QUIC's one way streams are messages.

Its two way streams are backwards compatibility with TCP.

It solves the long fat pipe problem with flexible window size.

It puts multiple objects and messages in one stream, so that one message
does not have to wait for lost packets in another message to be resolved.

TCP flow control is constructed around pushback - that the sender should
not send data faster than the receiver is able and willing to handle it.
Normally there is one thread, or pool of threads, handling the data
received. To prevent DDoS, we should probably only have one unit of
pushback per pair of network addresses. If someone has a slow receiver
thread pool, and a fast receiver thread pool communicating with the same
machine, he needs to break the slow receiver communication into lots of
small requests and replies, hence one channel per pair of network
addresses.

QUIC implements everything you need to have one channel per pair of
network addresses, multiplexing many request-replies into a single stream,
many channels in one channel, but does not in fact implement one channel
per pair of network addresses in the sense of one unit of packet flow
control and one unit of DDoS monitoring per pair of network addresses.

Finer grained flow control should be implemented as request reply on
messages that may well be much larger than a packet, but much smaller than
memory.

In the request reply model, if the requests and replies are reasonably short,
pushback does not matter, and becomes a representation of flow control. It
is seldom sane to download enormous blocks of data as a single message,
and we probably just should not do it - restrict replies to what can
reasonably fit into memory, so that a very large message that the receiver
is processing one chunk at a time has to get acks of its submessages,
separate from the flow control system.

What the LEMP stack does with request headers is dynamically allocate
8KiB buffers, stuff headers into part or all of an 8KiB buffer, and if a
header is bigger than 8KiB, arbitrarily truncate it, which suggests that this
is a tactic to minimize the overheads of dynamically allocating many
moderate sized buffers of variable size. Experimenting, I find that
dynamic allocation tends to be the major cost in many programs, but if
you do it LEMP style, dynamic allocation is unlikely to be a significant cost.
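
A minimal sketch of that tactic: fixed size blocks on a free list, so allocation is a pointer swap. Sizes and names are illustrative:

```cpp
// Sketch: LEMP style fixed size buffer pool. Every allocation is one
// 8 KiB block popped off a free list, so the allocator never searches
// and never fragments; oversized data is truncated to fit.
#include <cstddef>
#include <cstring>
#include <new>

class BufferPool {
    static constexpr size_t kBlock = 8 * 1024;
    struct Node { Node* next; };
    Node* free_ = nullptr;
public:
    void* get() {
        if (free_) {                   // O(1): pop the free list
            Node* n = free_;
            free_ = n->next;
            return n;
        }
        return ::operator new(kBlock); // cold path: real allocation
    }
    void put(void* p) {                // O(1): push back on the free list
        Node* n = static_cast<Node*>(p);
        n->next = free_;
        free_ = n;
    }
    // Copy headers into a pooled block, truncating at 8 KiB as LEMP does.
    void* store(const void* data, size_t len) {
        void* buf = get();
        std::memcpy(buf, data, len < kBlock ? len : kBlock);
        return buf;
    }
};
```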

QUIC has a pile of feature bloat:

+ The push feature is married to html, and belongs in the webserver
and the browser, not in the protocol. Something sending a request
message should be aware it might have several messages in reply,
depending on the kind of the request, and simply have a message
handler that can deal with many messages.

+ We don’t really need the unique and sequential message id if finding and
interpreting the message id is part of how the response handler handles the
messages – best to hand that as far down into the endpoints as possible.

+ Its data format, header and frames, is married to html, which is
always sending repetitious and redundant information, treating
related fragments of html as absolutely distinct.
It implements HTTP header specific compression, HPACK.

It suffers from the SSL/TLS problem of a thousand CA authorities, NSA
friendly encryption, and, being funded in large part by Cloudflare, has no
substantial defense against DDoS.

It fails to support rendezvous routing.

But it has already struggled with and solved a thousand problems whose
solutions I have been confusedly struggling with. So the obvious solution
is to adopt QUIC, rip out the domain name system, add DDoS resistance,
rip out the NSA friendly encryption in favour of the standard and
recommended libsodium packet encryption (XChaCha20-Poly1305), and, for
immortality, rip out the 62 bit compressed integers in favour of unlimited
precision windowed integers (with a negotiated limit on precision that
will in practice always be 64 bits for the next several centuries).

XChaCha20 is not the fastest on a long stream, but it has key agility, can
encrypt arbitrary length values, including a single bit, and is as
fast as ChaCha20 without any limits on the nonce.
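
For concreteness, a minimal sketch of packet encryption with libsodium's XChaCha20-Poly1305 AEAD, the construction named above; the sequential nonce discussed later would replace the caller-supplied one here:

```cpp
// Sketch: encrypting one packet with libsodium's XChaCha20-Poly1305
// AEAD. In the proposed protocol the 24 byte nonce would be the packet's
// sequential nonce rather than a random value.
#include <sodium.h>
#include <cstddef>

int encrypt_packet(unsigned char* out, unsigned long long* out_len,
                   const unsigned char* payload, size_t len,
                   const unsigned char nonce[24],
                   const unsigned char key[32]) {
    // Ciphertext is payload length plus a 16 byte Poly1305 tag.
    return crypto_aead_xchacha20poly1305_ietf_encrypt(
        out, out_len,
        payload, len,
        nullptr, 0,   // no additional authenticated data in this sketch
        nullptr,      // nsec, unused
        nonce, key);
}
```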

QUIC’s messaging is excessively married to HTTP. We need a generic
messaging system where every message has a short number indicating the
destination handler, and you can generate a handler, a code continuation,
and get a number assigned to it on the fly, so that you can send a message,
and the reply goes to your code continuation.

We need to lift as much of the [QUIC] design as possible, and also make things
act much like TCP, so that existing NATs will not notice anything has
changed. Thus packets will continue to be sent to and from a widely known
port that is usually below 1024 on the server, from a random port on the
client in the range 49152--65535. A connection will continue to require a
three phase handshake which creates a socket, albeit our sockets will be very
different.

With a rendezvous, both peers will use the same port in the range
1024--49151.

The rendezvous handshake will look like the TCP handshake Syn Syn-Ack Ack,
but they will both send syn packets, both send syn-ack packets, and both
send ack packets. Their syn packets will be timed so that, if the timing
is done right, both are sent just before the other peer’s packet is
expected to be received.

Our sockets will always have a shared secret associated, which proves
identity and enables encrypted communication, but which cannot be used to
prove identity to a third party. The initial handshake will exchange
transient secret keys, which will generate a transient shared secret,
which is used to encrypt the exchange of durable secret keys, which
establish a shared secret based on both the durable and transient keys,
establishing forward secrecy, and failing to establish identity to third
parties.

Since setting up a shared secret is costly, this creates the opportunity for
syn flood attacks, therefore the syn-ack will always be a syn cookie,
structured rather like existing syn cookies: a cryptographic hash of the syn
based on an unshared secret known only to the server. Plus it will always
have a proof of work request, which may be zero, and it will have a list of
supported protocols if the protocol proposed in the initial syn is
unacceptable. The proof of work will be that the hash of the client ack
must have a certain number of zeros, and the ack
must contain the cryptographic cookie, and the data that the server checks
the cookie against.

TCP was designed around the case of the client sending an endless stream of
characters, typed with one finger, to a program on the server. We are
going to design around message response, with responses not necessarily
returning in order.

The client sends a message from a durable public key to a durable
public key. The creation and destruction of such connections is not
tightly linked to messaging. If a connection exists, it is used. If it does
not exist, it is created. It may be torn down after a while of being
unused, but the tear down is not tightly linked to message completion.

In TCP a count is kept of bytes sent and bytes received, with an ack
counting as one byte.

We need a count for each packet, since packets can arrive out of order,
repeated, or missing. The count values will be sequential nonces for the
encryption, and will start at one. As the count can potentially grow
quite large, the count value will be windowed, but, unlike TCP, the
windowed count represents a potentially much larger absolute count known
by both ends.

Negotiating a window size is hard, since you do not really know in advance
what window size will be needed. The thirty two bit window is adequate for
all normal uses, but fails in special and important uses.

We will specify the window size in each packet, with the high order bit of
each byte in the nonce indicating whether there is another seven bits in
the nonce window, so that we can dynamically adjust the window size. We
dynamically adjust the window size to big enough to exclude ambiguity.
Which, for the first 128 packets, and on a connection that is not very busy,
for all packets, will be seven windowed count bits and one window size bit.

The window needs to be large enough to exclude the ambiguity of delayed
and duplicated packets wandering in late, so it has to be several times
larger than the difference between the most recently acked value and the
value that will fill the reception window. Thirty two times larger
should be ample. At the start, there are no early packets capable of
wandering in late, so big enough to hold the full count always suffices.

If `a` represents a recent nonce, `n`
represents the nonce, `w` represents the windowed nonce, and
`M` represents the window mask, communicated in each packet in
unary, then:

`w = n & M`

`n = ((w - a) & M) + a`

We use a window large enough to give the same answer on both the most
recently acked nonce and the most recently sent nonce.
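
A sketch of that recovery in C++, with a quick check of the identity; `bits` here stands in for the window size that the unary encoding above would communicate:

```cpp
// Sketch: recover a full 64 bit nonce from its windowed form. M is the
// window mask (2^bits - 1); a is any recent full nonce within M of n.
#include <cassert>
#include <cstdint>

uint64_t window(uint64_t n, uint64_t M) { return n & M; }

uint64_t unwindow(uint64_t w, uint64_t a, uint64_t M) {
    return ((w - a) & M) + a;   // exact whenever 0 <= n - a <= M
}

int main() {
    uint64_t M = (uint64_t(1) << 7) - 1;    // 7 windowed count bits
    uint64_t a = 1000;                      // most recently acked nonce
    for (uint64_t n = a; n <= a + M; ++n)   // every nonce the window covers
        assert(unwindow(window(n, M), a, M) == n);
    return 0;
}
```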

The nonce will serve the dual purpose of enabling the decryption of each
packet, and flow control. Each packet has a sequential nonce, and we make sure
all packets are acked. Nonces on packets coming from the client refer to a
different shared secret than nonces on packets coming from the server.

## API

To send a message, you will construct a response handler if you are
expecting a response, and then call the api with a network address, a
public key of the recipient, an identifying secret key and public key of
the sender, a timeout for attempting to connect, and flags permitting
direct connection, rendezvous connection, retransmit, and store and
forward. If a response is expected for the message, give the expected
lifetime for the response handler, a nonce for the response handler, and a
class identifier for the nonce (the nonce only has to be unique within
the class). You will probably use a different nonce population for
messages that have to be handled promptly, messages that have to be
handled within a session, and non volatile nonces that survive between
sessions. Nonce populations can be windowed per class identifier, with a
window large enough to accommodate the timeout, and a different class
identifier for volatile and non volatile nonces. The nonce is used once
within a window and within a class, but can be re-used in another class
and another window.
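
A sketch of what such a call might look like in C++; every name and type here is illustrative, not a settled interface:

```cpp
// Illustrative only: a possible shape for the send call described above.
#include <chrono>
#include <cstdint>
#include <functional>
#include <span>

struct NetAddress { /* ip, port */ };
struct PublicKey  { uint8_t bytes[32]; };
struct KeyPair    { PublicKey pub; uint8_t secret[32]; };

enum SendFlags : uint32_t {
    kDirect          = 1 << 0,
    kRendezvous      = 1 << 1,
    kRetransmit      = 1 << 2,
    kStoreAndForward = 1 << 3,
};

struct Reply {                      // delivered to the response handler
    bool ok;                        // false: failed, timed out, refused
    std::span<const uint8_t> body;
};

void send_message(
    const NetAddress& where,        // hint; may be stale
    const PublicKey&  recipient,
    const KeyPair&    sender_identity,
    std::span<const uint8_t> payload,
    std::chrono::milliseconds connect_timeout,
    uint32_t flags,
    // Response handling: lifetime, per class nonce, and nonce class.
    std::chrono::milliseconds handler_lifetime,
    uint64_t handler_nonce,
    uint32_t nonce_class,
    std::function<void(Reply)> on_reply);
```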

The application code is event oriented, like GUI code. It is driven by a
message pump, with constructors creating event handlers, and the events
driving the event handlers through the message pump; an event handler, on
being fired, creates new event handlers and fires old event handlers.

When the application needs to perform a task that spans many events, it does
not call `yield` or `await`, but instead the event handler for each event
constructs or enables the next event handler. If it needs to push information
onto a stack between events, it has its own explicit stack for its own multi
event task, or creates a linked list of event handlers. Non volatile event
handlers must be trivial C++ classes, and therefore cannot contain an `std::stack`.

State that would be on the stack in synchronous code is in the event
handler in asynchronous code. This potentially gets messy if you are
processing an endless stream of structured data whose structure is
orthogonal to message boundaries. Since we allow arbitrary length
messages, don’t do that.
|
|
|
|
|
|
|
|
|
|
Notification of message failure may occur any time within the lifetime of
the response handler, but will mostly happen within the timeout for
attempting to connect.

The usual flow of control will be: create an event handler, assign a nonce
to it (fire it), and then it gets triggered when the event actually
happens, and is then usually destroyed. Events will usually create and
fire new events and trigger events that existed before they were created,
rather than changing their state.

Below the api, additional messages, using low numbered message response
classes, may be constructed for encryption and flow control. If an
encrypted connection exists, it will use that without constructing
additional messages. If it does not exist, it will construct it.

Constructing an encrypted connection provides perfect forward secrecy
between one connection and the next by generating new random session keys
each time.

## Reliability and flow control

TCP achieves reliable transmission with acks and nacks.

The original design simply acked that all bytes had been received up to a
certain byte (not exactly bytes, since syns and fins also consume sequence
numbers). If the transmitter has transmitted stuff, and not received an ack
for what it transmitted, it retransmits after a timeout. The receiver may
resend acks.

This mechanism worked fine on short thin pipes, but if you have a million
packets in flight, and packet three hundred thousand gets lost, you then
have to resend seven hundred thousand packets to replace one packet. So the
duplicate ack possibility was tortured to create a half assed version of
selective acknowledgment. If the receiver receives packets 100 and 101,
but not packet 99, it sends duplicate acks for packet 98. If the sender
receives three duplicate acks for packet 98, it retransmits packet 99. (Two
duplicate acks could be just the normal randomness.)

[QUIC], however, has a fix for this built in.

Obviously true selective acknowledgment is better. The receiver acks the
most recent received packet, and sends a list of missing packets prior to
this (acks a windowed value for the most recent packet, and the difference
between packet nonces for missing packets). The sender resends the missing
packets, except for the most recent missing packets. If they are still
missing, they will be caught on the next ack.

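As a sketch, such an ack might be laid out like this; all field names
are illustrative:

```cpp
#include <cstdint>
#include <vector>

// Sketch of a selective ack: the newest packet received, windowed, plus
// the gaps before it expressed as nonce differences. Illustrative only.
struct selective_ack {
    std::uint32_t latest;                // windowed nonce of newest packet
    std::vector<std::uint32_t> missing;  // how far back from `latest` each
                                         // missing packet's nonce lies
    std::uint32_t receive_budget;        // how much more the receiver can
                                         // take before the next ack
};
```
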
In each ack, the receiver tells the sender how much more data it can
receive before it sends the next ack. This prevents the receiver from
being flooded, but a more common problem is the pipe being flooded.

To handle pipe flooding, the sender has a timer. If it sends stuff, and
does not get an ack, it backs off: it sets the timer to a slower rate, and
retransmits. The initial value of the timer is the smoothed round trip
time plus the larger of the clock granularity and four times the round
trip time variance: $SRTT + \max(G, 4 \times RTTVAR)$.

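For reference, the standard estimator behind that formula, per RFC 6298;
the constants are the RFC’s, the surrounding structure is illustrative:

```cpp
// Sketch of the RFC 6298 retransmission timeout estimator.
#include <algorithm>
#include <cmath>

struct rto_estimator {
    double srtt = 0;      // smoothed round trip time, seconds
    double rttvar = 0;    // round trip time variance, seconds
    double g = 0.001;     // clock granularity, seconds
    bool first = true;

    // Feed in each new round trip measurement r (seconds).
    void sample(double r) {
        if (first) {
            srtt = r;
            rttvar = r / 2;
            first = false;
        } else {
            // Update the variance with the old SRTT, then smooth SRTT.
            rttvar = 0.75 * rttvar + 0.25 * std::abs(srtt - r);
            srtt = 0.875 * srtt + 0.125 * r;
        }
    }

    // RTO = SRTT + max(G, 4 * RTTVAR), floored at one second per the RFC.
    double rto() const {
        return std::max(1.0, srtt + std::max(g, 4 * rttvar));
    }
};
```
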
TCP flow control focuses on getting a segment complete and acknowledged,
so it can move on to the next segments. It may have a great many packets
in flight, but does not have too many segments in flight. The backoff
algorithm is linked with the push segments algorithm. You only push the
segment the receiver has asked for in his previous acknowledgment. So you
typically have the segment you are finalizing, the segment that is in
flight, and the segment that the receiver asked for.

The algorithm is that the sender gets an ack that acknowledges what the
receiver has received, and tells the sender how much more the receiver can
receive. Whereupon the sender resends anything missing, and resumes pushing
new stuff up to the limit that the receiver has specified, spread out
roughly evenly over the timer period. Which implies that the receiver
should ask wisely, as well as the sender send wisely.

Implementing our own flow control sounds like a lot of work. Need to lift
[QUIC]’s flow control, and drop our own encryption and attack resistance
into it, while letting it worry about flow control. I can hack into its
library, while I cannot hack into the TCP library.

I have been analysing how TCP works, with a view to what needs fixing. Time
to analyse how something works for which I have a library and example code.

Best (because smallest and least married to HTTP3) is [picoquic].

[picoquic]: https://github.com/private-octopus/picoquic

The TCP state machine assumes that the server opens a connection on receiving
a syn, sends an ack-syn to the client, whereupon the client acks the
connection. But if we are using syn cookies, we are using a different state
machine, where the connection is in fact only opened on receiving the server
syn-ack cookie in the client ack. So the server has to acknowledge the
connection, which would make it a four step handshake instead of a three step
handshake. To avoid this, we have a rule that the client only opens a
connection when it has data ready to send. It then gets a server cookie, and
sends the cookie-ack with some data, which data the server acks.

With the cookie ack, we get a round trip time and offset between server
steady time and client steady time. If we see unstable round trip times,
we suspect the pipe is overloaded, and back off our estimate of max
bandwidth. For flow control, we maintain an estimate of pipe length and
width. Sudden pipe widenings indicate an overflow condition, because pipes
may respond to overflow by massively discarding packets, or massively
backing up packets, or quite possibly both. We maintain a probability
estimate of the pipe behaviour.

## Outline protocol

A packet protocol that establishes an encrypted connection on top of
unreliable packets with minimal round trips without increasing fragility to
DoS.

For servers, public keys, globally human readable names, the key owning the
name, and the temporary key signed by the key owning the name, will usually
be public and widely known, but this also supports the case of
communication where this information is only known to the parties, and the
server does not want to make the connection between a network address and a
public key widely known.

To establish a connection, we need to set a bunch of values specific to
this particular channel, and also create a shared secret that
eavesdroppers and active attackers cannot discover.

The client is the party that initiates the communication, the server is
the party that responds.

I assume a mode that provides both authentication and encryption – if a
packet decrypts into a valid message, this shows it originated from an
entity possessing the shared secret. This does not provide signing – the
recipient cannot prove to a third party that he received it, rather than
making it up.

For the moment I ignore the hard question of server key distribution,
glibly invoking Zooko’s triangle without proposing an implementation of
the other two points and three sides of the triangle or a solution to the
problem of managing distributed reputations in Zooko’s triangle. (Be
warned that whenever people charge ahead without solving the key
distribution problem, the result is a disaster.)

1. Client 🠆 Server: Equivalent to the syn of the three phase TCP
   handshake.

> Client’s network address and port on which client will receive
> packets, protocol identifier, and client steady time at which the
> message was sent.

If the requested protocol is not OK, we go into protocol negotiation: the
server responds with a list of protocols and protocol versions that it will
accept, in the form of a list of lists of numbers.

Assuming it is OK, which it probably will be, the server allocates nothing,
prepares nothing, but sends the equivalent of a TCP ack-syn cookie,
containing, among other things, a cryptographic hash of the information
that was received and sent, based on a private secret known only to the
server. It sends a transient public key, which changes every few minutes
or so, plus a short windowed id for that transient public key, and a demand
for proof of work, which may be zero. The proof of work is that the
client’s ack, the equivalent of the third phase of the TCP handshake, has
to hash to a value ending in `n` zero bits, where `n` may be zero.

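A sketch of the check, with FNV-1a standing in for whatever
cryptographic hash the protocol actually uses:

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a real cryptographic hash; FNV-1a is not one.
std::uint64_t fnv1a(const std::uint8_t* p, std::size_t len) {
    std::uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 0x100000001b3ull;                // FNV prime
    }
    return h;
}

// The client's ack must hash to a value ending in n zero bits. n == 0
// accepts everything, so the demand costs nothing when load is low.
bool proof_of_work_ok(const std::uint8_t* ack, std::size_t len, unsigned n) {
    std::uint64_t mask = (n >= 64) ? ~0ull : ((1ull << n) - 1);
    return (fnv1a(ack, len) & mask) == 0;
}
```
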
This cryptographic hash based on an unshared secret will be sent to the
client, and then back to the server, unchanged. Its function is to avoid the
necessity for the server to allocate memory or perform asymmetric
cryptographic operations for a client that has not yet validated. Instead
the state information is sent back and forth.

1. Server 🠆 Client: Equivalent to the syn-ack of the three phase TCP handshake.

   Cryptographic hash based on unshared secret, server steady time,
   transient public key, server windowed identifier of server transient
   public key, proof of work demand, and any channel parameters.

The proof of work is trivial if the server is not under load, but is
increased as the server load approaches the maximum the server is
capable of, in order to throttle demand.

The client computes the transient handshake shared secret as its transient
private key times the server’s transient public key. It returns in the
clear a copy of the cryptographic hash that the server sent to it and the
data in the clear needed to validate the hash, performs the proof of work,
and sends its public key, which may be a per server durable public key,
always used when accessing this server on this identity, encrypted using
the transient key, and the public key it wants to talk to on the server.

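A sketch of that computation, assuming an X25519 style group in which a
shared secret is a private scalar times a public point; `scalar_mult`
stands in for a primitive such as libsodium’s `crypto_scalarmult`:

```cpp
#include <array>
#include <cstdint>

using key32 = std::array<std::uint8_t, 32>;

// Assumed primitive: private scalar times public point.
key32 scalar_mult(const key32& scalar, const key32& point);

// Client side: its transient private key times the server's transient
// public key.
key32 client_handshake_secret(const key32& client_transient_secret,
                              const key32& server_transient_public) {
    return scalar_mult(client_transient_secret, server_transient_public);
}

// Server side computes the same point from the other pair of keys.
key32 server_handshake_secret(const key32& server_transient_secret,
                              const key32& client_transient_public) {
    return scalar_mult(server_transient_secret, client_transient_public);
}
```
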
Subsequent information is not encrypted using the transient keys, but using
the sum of transient plus secret keys.

This implies that the client has to know the public key that the server is
using, which may be a key signed by the master public key that owns the
name, authorizing that new key, which key changes about as often as the
server IP changes, and is therefore distributed in the same channel as the
network address associated with global human names is distributed. If the
client gets it wrong, then the server ignores the information encrypted to
the wrong public key, and responds with the authentication of its new
public key, signed by the master public key of its globally unique name,
encrypted using the transient secret – this is usually public information,
but since by this point we have established a shared secret and allocated
memory, might as well send it securely, for sometimes it is going to be
private information.

1. Client 🠆 Server: Equivalent to the final ack of the three phase TCP
   handshake.

   Sends in the clear the server hash as received, any data needed to
   reconstruct the hash, and its transient public key. Then, encrypted to
   the transient keys, the hash of the identifier of the public key it
   wants to talk to, its durable public key, and client steady time at
   which this was sent, so that both sides have an estimate of the round
   trip time and the offset between server steady time and client steady
   time.

Server checks the proof of work, checks the cryptographic hash against the
data in the clear, *then* creates an entry in its hash table for this
connection, with the shared secret being the transient keys plus the public
keys.

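A sketch of the stateless cookie mechanics described above, assuming a
keyed hash primitive such as keyed Blake2b; the struct layout and names
are illustrative:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using hash256 = std::array<std::uint8_t, 32>;

// Assumed primitive: hash of `data` keyed with the server's private
// secret, which never leaves the server.
hash256 keyed_hash(const hash256& server_secret,
                   const std::uint8_t* data, std::size_t len);

struct syn_data {                    // the information received and sent
    std::uint8_t client_address[18];  // IP and port the client gave
    std::uint64_t protocol_id;
    std::uint64_t client_time;        // client steady time from the syn
    std::uint64_t server_time;        // server steady time in the syn-ack
    std::uint32_t transient_key_id;   // windowed id of transient public key
};

hash256 make_cookie(const hash256& secret, const syn_data& s) {
    return keyed_hash(secret,
                      reinterpret_cast<const std::uint8_t*>(&s), sizeof s);
}

// On the client's ack, the server recomputes the hash from the data
// echoed in the clear; only then does it allocate connection state.
bool cookie_ok(const hash256& secret, const syn_data& echoed,
               const hash256& cookie) {
    return make_cookie(secret, echoed) == cookie;
}
```
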
We have two protocols, one for the authenticated phase, and one for the
unauthenticated phase. The client has to know one of the unauthenticated
protocols offered by the server, or else protocol negotiation will fail in
the abnormal case that protocol negotiation is needed. Normally there will
only be one protocol for secured but unauthenticated communication during
setup, but we make provision by having two protocols, trivially different,
for the unauthenticated phase, and three protocols, trivially different,
for the authenticated phase.

You will notice that the server only allocates memory and performs
asymmetric encryption computation *after* the client has successfully
performed the proof of work and shown that it is indeed capable of
receiving data sent to the advertised network address.

In the normal case, the client requests one way authenticated encryption in
the syn, where the server authenticates but the client does not, and the
server may, and usually will, offer in the syn-ack only two way
authenticated encryption, where the client provides an identity unique to
that server and the user’s current default name, but which cannot be used
to identify the default name, nor the same user accessing a different
website. This allows the server to see that the same user is accessing
different resources, how many uniques the server has, and what each unique
is doing, but does not enable the servers to put their heads together and
see that the same user is doing things on one server, and also on another
server.

Now we have a shared secret, protocol negotiated, client logged in, in
one round trip plus the third one way trip carrying the actual data – the
same number of round trips as when setting up an unencrypted
unauthenticated TCP connection.

You will notice there is no explicit step checking that both have the
same shared secret – this is because we assume that each packet sent is
also authenticated by the shared secret, so if they do not have the same
secret, nothing will authenticate.

# Critiques of TCP/SSL

Does the job so badly that using a different method is just as plausible.
People fight to avoid TLS already, they’d rather send stuff in the clear if
they could. So just solve the problems they have.

In Web Services we frequently require message layer security in addition to
transport layer security because a Web Service transaction might involve more
than two endpoints and messages that are stored and forwarded etc. This is why
WS-\* is not TLS. (It is unfortunately horribly baroque but that was not my
doing).

Problem that occurred with TLS was that there was an assumption that the job
was to secure the reliable stream connection mechanics of TCP. False
assumption.

Pretty much nobody uses streams by design, they use datagrams. And they use
them in a particular fashion: request-response. Where we went wrong with TCP
was that this was the easiest way to handle the mechanics of getting the
response back to the agent that sent the request. Without TCP, one had to deal
with the raw incoming datagrams and allocate them to the different sending
agents.

A second problem was that the design was too intertwined with commercial PKI,
so certs were hung on the side as a millstone for server authentication and
discarded on the client side, leaving passwords to fill that gap. A mess,
which is an opportunity for redesign, frequently exploited by many designs
already.

SSL came at this and built a message (record) interface on top of TCP (because
that was convenient for defining a crypto layer), and then a (mainly) stream
interface on top of its message interface – because programmers were by now
familiar with streams, not records.

And so … here we are. Living in a city built on top of generations of
older cities. Dig down and see the accreted layers.

What *is* the “right” (easiest to use correctly, hardest to use
incorrectly, with good performance, across a large number of distinct
application APIs) underlying interface for a secure network link? The fact
that the first thing pretty much all APIs do is create a message structure
on top of TCP makes it clear that “pure stream” isn’t it. Record-oriented
designs derived from 80-column punch cards are unlikely to be the answer
either. What a “clean slate” interface would look like is an interesting
question, and perhaps it’s finally time to explore it.

# General and unorganized comments

µTP, Micro Transport Protocol, is a Bittorrent near drop-in replacement for
TCP that provides lower priority bulk downloads in the background. The
library is not well documented (header file plus examples), but as far as I
can see, provides a reasonably clean separation between Bittorrent and the
transport mechanism.

Google has a TCP/SSL replacement, [QUIC], which avoids round tripping and
renegotiation by integrating the security layer with the reliability layer,
and by supporting multiple asynchronous streams within a stream.

Layering a new peer-to-peer packet network over the Internet is simply
what the Internet is designed for. UDP is broken in a few ways, but not in
ways that can’t be fixed. It’s simply a matter of time before a new virtual
packet layer is deployed – probably one in which authentication and
encryption are inherent.

For authentication and encryption to be inherent, the new layer needs to
connect between public keys, so it needs to be based on Zooko’s triangle.
It also needs to penetrate firewalls, and to do protocol negotiation with
an unlimited number of possible protocols – avoiding that internet names
and numbers authority.

Ian Grigg: “Good protocols divide into two parts, the first of which says
to the second, trust this key completely!”.

This might well be the basis of a better problem factorization than the
layer factorization – divide the task by the way trust is embodied, rather
than on the basis of layered communication.

Trust is an application level issue, not a communication layer issue,
but neither do we want each application to roll its own trust cryptography
– which at present web servers are forced to do. (Insert my standard rant
against SSL/TLS).

Most web servers are vulnerable to attacks akin to session cookie
fixation attack, because each web page reinvents session cookie handling,
and even experts in cryptography are apt to get it wrong.

The correct procedure is to generate and issue a strongly unguessable
random https only cookie on successful login, representing the fact that
the possessor of this cookie has proven his association with a particular
database record, but very few people, including very few experts in
cryptography, actually do it this way. Association between a client
request and a database record needs to be part of the security system. It
should not be something each web page developer is expected to build on top
of the security system.

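A minimal sketch of that procedure, with `std::random_device` standing
in for a real CSPRNG and all names illustrative:

```cpp
#include <random>
#include <sstream>
#include <string>

// Mint a strongly unguessable token: four 32 bit draws, roughly 128
// bits, assuming random_device is cryptographically strong on this
// platform (it often is not; a real implementation wants a CSPRNG).
std::string mint_session_token() {
    std::random_device rd;
    std::ostringstream out;
    out << std::hex;
    for (int i = 0; i < 4; ++i) out << rd();
    return out.str();
}

// Secure: never sent in the clear. HttpOnly: page scripts cannot read
// it. The server stores token -> database record; possession of the
// token is the proof of association with that record.
std::string session_cookie_header(const std::string& token) {
    return "Set-Cookie: session=" + token + "; Secure; HttpOnly";
}
```
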
TCP constructs a reliable pipeline stream connection out of unreliable
packet connections.

There are a bunch of problems with TCP. No provision was made for
protocol negotiation, and so any upgrade has to be fully backwards
compatible. A number of fixes have been made; for example the long
fat pipe problem has been fixed by window size negotiation, which is semi
incompatible and leads to flaky behaviour with old style routers, but the
transaction problem remains intolerable. The transaction problem has
been reduced by protocol level workarounds, such as “Keep alive” for HTTP,
but these are not entirely satisfactory. The fix for syn flooding
works, but causes some minor unnecessary degradation of performance under
syn flood attacks, because the syn cookie is limited to 48 bits – it needs
to be 128 bits both to deal with the syn flood attack, and to prevent TCP
hijacking.

TCP is inefficient over wireless, because interference problems are
rather different to those provided for in the TCP model. This
problem is pretty much insoluble because of the lack of protocol
negotiation.

There are cases intermediate between TCP and UDP, which require
different balances of timeliness, reliability, streaming, and record
boundary distinction. DCCP and SCTP have been introduced to deal with
these intermediate cases, SCTP for when one has many independent
transactions running over a single connection, and DCCP for data where
time sensitivity matters more than reliability, such as voice over
IP. SCTP would have been better for HTML and HTTP than TCP is,
though it is a bit difficult to change now. Problems such as a
password-authenticated key agreement transaction with a banking site
require something that resembles encrypted SCTP, analogous to the way that
TLS is encrypted TCP, but nothing like that exists as yet. Standards exist
for encrypted DCCP, though I think the standards are unsatisfactory and
suspect that each vendor will implement his own incompatible version, each
of which will claim to conform to the standard.

But a new threat has arrived: TCP man in the middle forgery.

Connection providers, such as Comcast, frequently sell more bandwidth
than they can deliver. To curtail customer demands, they forge
connection shutdown packets (reset packets), to make it appear that the
nodes are misbehaving, when in fact it is the connection between nodes,
the connection that Comcast provides, that is misbehaving. Similarly, the
great firewall of China forges reset packets when Chinese connect to web
sites that contain information that the Chinese government does not
approve of. Not only does the Chinese government censor, but it is able to
use a mechanism that conceals the fact of censorship.

The solution to all these problems is to have protocol negotiation,
standard encryption, and flow control inside the encryption.

A problem with the OSI Layer model is that as one piles one layer on top
of another, one is apt to get redundant round trips.

According to [google research], 400 milliseconds of delay reduces usage by
0.76%, or roughly two percent per second of delay.

[google research]: http://googleresearch.blogspot.com/2009/06/speed-matters.html

Redundant round trips become an ever more serious problem as bandwidths
and processor speeds increase, but round trip times remain constant, and
indeed increase as we become increasingly global and increasingly rely on
space based communications.

Used to be that the biggest problem with encryption was the asymmetric
encryption calculations – the PKI model has lots and lots of redundant and
excessive asymmetric encryptions. It also has lots and lots of redundant
round trips. Now that we can use the NVIDIA GPU with CUDA as a very high
speed cheap massively parallel cryptographic coprocessor, excessive PKI
calculations should become less of a problem, but excess round trips are
an ever increasing problem.

Any significant authentication and encryption overhead will result in
people being too clever by half, and only using encryption and
authentication where it is needed, with the result that they invariably
screw up and fail to use it where it is needed – for example the login on
the http page. So we have to lower the cost of encrypted authenticated
communications, so that people can simply encrypt and authenticate
everything without needing to think about it.

To get stuff right, we have to ditch the OSI layer model, but simply
ditching it without replacement will result in problems. It exists for a
reason, and we have to replace it with something else.