lang | title |
---|---|
en | Peering through NAT |
A library to peer through NAT is a library to replace TCP, the domain name system, SSL, and email.
The NAT mapping timeout is officially 20 seconds, but I have no idea what this means in practice. I suspect each NAT discards port mappings according to its own idiosyncratic rules, but 20 seconds may be a widely respected minimum.
An experiment on hole punching found that most NATs had a far longer timeout, and concluded that the way to go was simply to repunch as needed. They never bothered with keepalives. They also found that much of the time both parties were behind the same NAT, sometimes because of NATs on top of NATs.
Another source says that "most NAT tables expire within 60 seconds, so NAT keepalive allows phone ports to remain open by sending a UDP packet every 25-50 seconds".
The no-brainer way is for each party to ping the other at a mutually agreed time every 15 seconds, which is a significant cost in bandwidth. But a server with 4 MiB/s of internet bandwidth can support keepalives for a couple of million clients. On the other hand, someone on cell phone data with thirty peers is going to make a significant dent in his bandwidth.
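A back-of-envelope check of those figures, as a sketch under loud assumptions: each keepalive is a bare 28-byte packet (20-byte IPv4 header plus 8-byte UDP header, no payload), which is a floor, since real keepalives carry acks; only one direction is counted; and the function names are hypothetical. Whether "a couple of million clients" holds depends on reading the server's bandwidth in bytes rather than bits.

```python
def clients_supported(bandwidth_bits_per_s: float,
                      packet_bytes: int = 28,
                      interval_s: float = 15.0) -> int:
    """How many clients' keepalives fit in a given one-way bandwidth.

    Assumes one keepalive per client per interval, each packet_bytes long on
    the wire: 28 bytes is a bare IPv4 header (20) plus a UDP header (8).
    """
    packets_per_s = bandwidth_bits_per_s / (packet_bytes * 8)
    return int(packets_per_s * interval_s)


def phone_bytes_per_day(peers: int, packet_bytes: int = 28,
                        interval_s: float = 15.0) -> int:
    """Bytes a phone sends per day to keep `peers` connections alive."""
    return int(peers * (86400 / interval_s) * packet_bytes)


MEBI = 2 ** 20  # 1,048,576

print(clients_supported(4 * MEBI))      # 4 Mib/s (bits): about 280,000 clients
print(clients_supported(4 * MEBI * 8))  # 4 MiB/s (bytes): about 2.2 million clients
print(phone_bytes_per_day(30))          # about 4.8 MB per day, sending only
```

On the phone the cost is less the instantaneous rate than the steady drain on a metered data allowance, around 145 MB a month at these assumptions.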
With client-to-client keepalives, a client will probably seldom have more than a dozen peers. Suppose each keepalive is sent 15 seconds after the counterparty's previous packet, or when an expected keepalive fails to arrive, and that each keepalive acks the packets received. If we are not receiving expected acks or expected keepalives, we send nack keepalives (hello-are-you-there packets) once per second until we give up.
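The keepalive rule above can be sketched as a small per-peer timer driven by the caller's clock. This is an illustration, not a specification: the class name, the 2-second jitter allowance and the give-up count of 10 probes are hypothetical choices, and only missing keepalives trigger probing here; missing acks on data would be handled by whatever sends the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

KEEPALIVE_AFTER = 15.0  # send a keepalive this long after the peer's last packet
PROBE_EVERY = 1.0       # nack keepalive ("hello, are you there?") rate once overdue
SLACK = 2.0             # hypothetical: jitter allowance before a keepalive counts as missing
MAX_PROBES = 10         # hypothetical: unanswered probes before giving up


@dataclass
class KeepaliveTimer:
    last_rx: float = 0.0             # arrival time of the peer's last packet
    sent_at: Optional[float] = None  # time of our keepalive since then, if any
    probes: int = 0                  # nack keepalives sent since then
    to_ack: List[int] = field(default_factory=list)  # received, not yet acked

    def on_receive(self, now: float, seq: int) -> None:
        """Any packet from the peer restarts the cycle and must be acked."""
        self.last_rx, self.sent_at, self.probes = now, None, 0
        self.to_ack.append(seq)

    def _take_acks(self) -> List[int]:
        acks, self.to_ack = self.to_ack, []
        return acks

    def poll(self, now: float) -> Optional[Tuple[str, List[int]]]:
        """What to send now: ('keepalive'|'nack', acks), ('give_up', []), or None."""
        if self.sent_at is None:
            # Our turn: a keepalive KEEPALIVE_AFTER seconds after the peer's last packet.
            if now >= self.last_rx + KEEPALIVE_AFTER:
                self.sent_at = now
                return ("keepalive", self._take_acks())
            return None
        # Their turn: their keepalive is due KEEPALIVE_AFTER seconds after ours.
        overdue_at = self.sent_at + KEEPALIVE_AFTER + SLACK
        if now < overdue_at:
            return None
        if self.probes >= MAX_PROBES:
            return ("give_up", [])
        if now >= overdue_at + self.probes * PROBE_EVERY:
            self.probes += 1
            return ("nack", self._take_acks())
        return None
```

Because each side answers the other's packet, a quiet link carries one keepalive every 15 seconds in total, alternating direction, rather than one every 15 seconds from each side.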
This algorithm should not be set in stone, but should rather be an option in the connection negotiation, so that we can adopt new algorithms as the NAT problem changes, as it continually does.
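One minimal way to make the keepalive algorithm a negotiated option, assuming each side offers a list of algorithm names during connection setup. The names and the rule that the initiator's preference order wins are hypothetical; any deterministic rule works, so long as both sides apply the same one.

```python
from typing import Optional, Sequence


def choose_keepalive(initiator: Sequence[str],
                     responder: Sequence[str]) -> Optional[str]:
    """First algorithm in the initiator's preference order that the responder supports.

    If both sides see both offered lists, both compute the same answer,
    so no extra round trip is needed to agree.
    """
    supported = set(responder)
    for name in initiator:
        if name in supported:
            return name
    return None  # no common algorithm: fall back to a default or refuse


print(choose_keepalive(["keepalive-v2", "keepalive-v1"], ["keepalive-v1"]))
```

Old clients keep working with new ones as long as some older algorithm name stays in both lists.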
If two parties are trying to set up a connection through a third-party broker, they both fire packets at each other (at each other's IP as seen by the broker) at the same broker time, minus half their respective round trip times to the broker. If they do not get a packet within the sum of their round trip times to the broker, they keep firing with slow exponential backoff until the connection is achieved, or until the backoff interval approaches the twenty second limit.
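The retry part of that procedure can be sketched as a schedule of resend times. The first wait is the sum of the two parties' round trip times to the broker; the backoff factor of 1.25 is a hypothetical reading of "slow", and the schedule stops once the next wait would reach the twenty second figure.

```python
from typing import List


def punch_schedule(rtt_a: float, rtt_b: float, factor: float = 1.25,
                   limit: float = 20.0) -> List[float]:
    """Resend times, in seconds after the first simultaneous shot.

    rtt_a, rtt_b: each party's round trip time to the broker.
    factor: growth of the wait per retry (hypothetical "slow" backoff).
    limit: stop once the next wait would reach this many seconds.
    """
    if rtt_a + rtt_b <= 0 or factor <= 1:
        raise ValueError("need a positive first wait and factor > 1")
    times: List[float] = []
    t, wait = 0.0, rtt_a + rtt_b
    while wait < limit:
        t += wait
        times.append(t)
        wait *= factor
    return times


print(punch_schedule(0.1, 0.1)[:4])
```

The caller stops at the first packet received from the other party; an empty tail of the schedule means the attempt has failed and a relay or a fresh broker round is needed.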
Their initial setup packets should be steganographed as TCP startup handshake packets.
We assume a global map of peers forming a mesh through which you can get connections, but not everyone has to participate in that mesh. A party can instead be a client of such a peer, and tell only selected counterparties which peer it is a client of.
The protocol by which a program opens a port forward is part of Universal Plug and Play (UPnP), which was invented by Microsoft but is now ISO/IEC 29341, and is implemented in most SOHO routers.
But it is generally turned off, either by default or manually. Needless to say, if relatively benign Bitcoin software can poke a hole in the firewall and set up a port forward, so can botnet malware.
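A sketch of the first step of UPnP port forwarding: finding the router by SSDP multicast, searching only for Internet Gateway Devices. The actual forward is a later SOAP `AddPortMapping` request to the control URL listed in the device description at the returned `LOCATION`, which is not shown. On a router with UPnP turned off, nothing answers and `discover()` returns an empty list. The function names are illustrative.

```python
import socket
from typing import List, Optional

SSDP_ADDR = ("239.255.255.250", 1900)
IGD = "urn:schemas-upnp-org:device:InternetGatewayDevice:1"


def msearch(search_target: str = IGD, mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH request for the given search target."""
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}\r\n"
        'MAN: "ssdp:discover"\r\n'
        f"MX: {mx}\r\n"
        f"ST: {search_target}\r\n"
        "\r\n"
    ).encode("ascii")


def parse_location(response: bytes) -> Optional[str]:
    """Return the LOCATION header (URL of the device description), if any."""
    for line in response.decode("ascii", "replace").split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


def discover(timeout: float = 3.0) -> List[str]:
    """Multicast an M-SEARCH and collect device description URLs."""
    found: List[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # applies to each recvfrom
        sock.sendto(msearch(), SSDP_ADDR)
        try:
            while True:
                data, _ = sock.recvfrom(65507)
                url = parse_location(data)
                if url and url not in found:
                    found.append(url)
        except socket.timeout:
            pass  # no more answers
    return found
```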
The standard for poking a transient hole in a NAT is STUN, which works only for UDP, but generally works: not always, but most of the time. This is a problem everyone has had to deal with, and there are standards for dealing with it, but not libraries. There should be a library for dealing with it, but then you have to deal with names and keys, and have a reliability and bandwidth management layer on top of UDP.
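To show how little of STUN a client needs for discovering its own public address, here is a sketch of a Binding request and the parsing of the IPv4 XOR-MAPPED-ADDRESS in the reply, following the message layout of RFC 5389. It handles no other attributes, no IPv6, no authentication and no retransmission; the STUN server address is whatever server you choose to trust, and none is assumed here.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request() -> Tuple[bytes, bytes]:
    """Return (packet, transaction_id) for a Binding request with no attributes."""
    txid = os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txid, txid


def mapped_address(response: bytes, txid: bytes) -> Optional[Tuple[str, int]]:
    """Extract our public (IPv4 address, port) from a Binding success response."""
    if len(response) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or response[8:20] != txid:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port is XORed with the top 16 bits of the cookie, address with all 32.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + (attr_len + 3) // 4 * 4  # attribute values are padded to 4 bytes
    return None


def query(server: Tuple[str, int], timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """One Binding request; raises socket.timeout if the server never answers."""
    packet, txid = binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, server)
        data, _ = sock.recvfrom(2048)
    return mapped_address(data, txid)
```

The address and port returned are what the broker would pass to the other party, and sending the request has the side effect of creating the NAT mapping in the first place.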
But if our messages are reasonably short and not terribly frequent, as client messages tend to be, buffering at the link level will take care of bandwidth management, and reliability reduces to message received or message not received. For short messages between peers, we can probably just use UDP and retry.
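"UDP and retry" for short messages might look like the sketch below. The wire format (a 4-byte message id followed by the payload, acked by echoing the id) is a hypothetical choice, as are the default attempt count and timeout.

```python
import socket
import time
from typing import Tuple


def send_with_retry(sock: socket.socket, peer: Tuple[str, int], msg_id: int,
                    payload: bytes, attempts: int = 5, timeout: float = 0.5) -> bool:
    """Send one short message over UDP, resending until it is acked.

    Hypothetical wire format: 4-byte big-endian message id, then the payload.
    The receiver acks by echoing the 4-byte id. Returns True for "message
    received" and False for "message not received, as far as we can tell".
    """
    tag = msg_id.to_bytes(4, "big")
    for _ in range(attempts):
        sock.sendto(tag + payload, peer)
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # no ack in time: resend
            sock.settimeout(remaining)
            try:
                data, addr = sock.recvfrom(2048)
            except socket.timeout:
                break
            if addr == peer and data == tag:
                return True
            # Anything else is unrelated traffic; keep waiting for our ack.
    return False
```

Since a lost ack causes a resend, the receiver should remember recent message ids and discard duplicates, acking them again.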
STUN and ISO/IEC 29341 are incomplete, and most libraries that supply implementations are far too complete: you just want a banana, and you get the entire jungle.
Ideally we would like a fake or alternative TCP session setup, after which you get a regular, standard TCP connection on a random port, assuming that the target machine has that service running, and that the default path for exporting that service results in a window listing the accessible services and how busy they are.
Real polish would be hooking domain name resolution so that names in the peer top level domain return a true IP, and then intercepting TCP session setup for that IP, so that if the peer is behind a NAT, session setup goes through the NAT penetration mechanism. One can always install one’s own OSI layer three or layer two, as a VPN does, or as the host for a virtual machine does. Intercept the name lookup, then tell layer three to do something special when a TCP session is attempted on the recently acquired IP address, assuming the normal case, where an attempt to set up a TCP session on an IP address follows very quickly after a name lookup.
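The bookkeeping behind that interception can be sketched independently of how the hooks are installed. Everything here is hypothetical: the `.peer` top level domain, the five-second window, the `resolve_peer` callback standing in for the identity system, and the assumption that some layer three hook calls `on_tcp_connect` on each outgoing TCP session attempt.

```python
import time
from typing import Callable, Dict, Optional, Tuple

PEER_TLD = ".peer"  # hypothetical top level domain for peers
WINDOW = 5.0        # hypothetical: seconds during which a lookup counts as recent


class LookupInterceptor:
    def __init__(self, resolve_peer: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.resolve_peer = resolve_peer  # name -> IP, via the identity system (assumed)
        self.clock = clock
        self.recent: Dict[str, Tuple[str, float]] = {}  # ip -> (name, lookup time)

    def lookup(self, name: str) -> Optional[str]:
        """Answer lookups in PEER_TLD; return None to defer to ordinary DNS."""
        if not name.lower().rstrip(".").endswith(PEER_TLD):
            return None
        ip = self.resolve_peer(name)
        self.recent[ip] = (name, self.clock())
        return ip

    def on_tcp_connect(self, ip: str) -> Optional[str]:
        """Called by the (assumed) layer three hook on an outgoing TCP session attempt.

        Returns the peer name if the session should go through NAT
        penetration, or None to let it proceed normally.
        """
        entry = self.recent.get(ip)
        if entry is None:
            return None
        name, looked_up_at = entry
        if self.clock() - looked_up_at > WINDOW:
            del self.recent[ip]  # too long after the lookup: treat as ordinary traffic
            return None
        return name
```

The window encodes the stated assumption that a session attempt follows the lookup almost immediately; a connection made long after the lookup simply takes the ordinary route.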
Note that the internet does not in fact use the OSI model, though everyone talks as if it did. Internet layers correspond only vaguely to OSI layers, being instead:
- Physical
- Data link
- Network
- Transport
- Application
And I have no idea how one would write or install one’s own network or transport layer, but something is installable, because I see no end of software that installs something, as every VPN does.
Assume an identity system that finds the entity you want to talk to.
If it is behind a firewall, you cannot notify it, cannot send an interrupt, cannot ring its phone.
Assume the identity system can notify it. Maybe it has a permanent connection to an entity in the identity system.
Your target agrees to take the call. The identity system informs both parties of each other’s IP address and of the port number on which each will take the call.
Both parties send introduction UDP packets to each other’s IP address and port number, thereby punching holes in their own firewalls for return packets. When they get a return packet, an introduction acknowledgement, the connection is assumed established.
It is that simple.
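The exchange just described can be simulated on one machine with two UDP sockets on the loopback interface. There is no NAT on loopback, so this shows only the message flow, not the hole punching itself; the packet contents are hypothetical, and the identity system is reduced to handing each party the other's endpoint. A real implementation would also retransmit introductions and authenticate them.

```python
import socket
from typing import Tuple

INTRO, INTRO_ACK = b"INTRO", b"INTRO-ACK"  # hypothetical packet contents


class Party:
    def __init__(self) -> None:
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("127.0.0.1", 0))  # loopback: no NAT here, only the message flow
        self.sock.settimeout(1.0)
        self.established = False

    @property
    def endpoint(self) -> Tuple[str, int]:
        return self.sock.getsockname()

    def introduce(self, peer: Tuple[str, int]) -> None:
        # Through a real NAT, this outgoing packet is what opens the hole.
        self.sock.sendto(INTRO, peer)

    def handle_one(self, peer: Tuple[str, int]) -> None:
        data, addr = self.sock.recvfrom(2048)
        if addr != peer:
            return  # only the party the identity system named is expected
        if data == INTRO:
            self.sock.sendto(INTRO_ACK, peer)  # acknowledge their introduction
        elif data == INTRO_ACK:
            self.established = True  # assumed established: a belief, not a fact


def run_demo() -> Tuple[Party, Party]:
    alice, bob = Party(), Party()
    # The identity system (not modelled) tells each the other's endpoint.
    a_end, b_end = alice.endpoint, bob.endpoint
    alice.introduce(b_end)
    bob.introduce(a_end)
    for _ in range(2):  # each first receives an INTRO, then an INTRO_ACK
        alice.handle_one(b_end)
        bob.handle_one(a_end)
    return alice, bob


if __name__ == "__main__":
    a, b = run_demo()
    print(a.established, b.established)
```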
Of course, networks are necessarily nondeterministic, so all beliefs about the state of the network need to be represented in a Bayesian manner: any assumption must be handled in such a way that the computer is capable of doubting it.
We have a finite, and slowly changing, probability that our packets get into the cloud, and a finite and slowly changing probability that our messages get from the cloud to our target. We have a finite probability that our target has opened its firewall, and a finite probability that our target can open its firewall, which becomes extremely high when we get an acknowledgement and then diminishes over time.
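A minimal sketch of the belief about the target's firewall, assuming (hypothetically) that after an acknowledgement the belief decays exponentially back toward a prior with a fixed half-life. The class name, the prior of 0.3, the post-acknowledgement value of 0.99 and the 30-second half-life are illustrative placeholders, not measurements.

```python
from typing import Optional


class HoleBelief:
    """Belief that the peer's NAT mapping toward us is still open (sketch).

    All numbers are hypothetical placeholders, not measured values.
    """

    def __init__(self, prior: float = 0.3, after_ack: float = 0.99,
                 half_life: float = 30.0) -> None:
        self.prior = prior          # belief with no recent evidence
        self.after_ack = after_ack  # belief right after an acknowledgement
        self.half_life = half_life  # seconds for the excess over the prior to halve
        self.last_ack: Optional[float] = None

    def on_ack(self, now: float) -> None:
        self.last_ack = now

    def p_open(self, now: float) -> float:
        if self.last_ack is None:
            return self.prior
        decay = 0.5 ** ((now - self.last_ack) / self.half_life)
        return self.prior + (self.after_ack - self.prior) * decay
```

A sender might, for example, repunch before sending once `p_open` falls below some threshold; choosing that threshold is a separate judgement.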
As I observe in Estimating Frequencies from Small Samples, any adequately flexible representation of the state of the network has to be complex, a fairly large body of data, more akin to a spam filter than to a Boolean.
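The smallest building block of such a representation might be a frequency estimate that behaves sensibly with few observations and forgets old ones as the network changes. One generic way to do that, sketched here, is a Beta prior with exponentially discounted success and failure counts. This is a standard technique swapped in for illustration, not necessarily the method of Estimating Frequencies from Small Samples, and the parameter values are arbitrary.

```python
class DecayingRate:
    """Estimate of a slowly changing success frequency, e.g. packet delivery.

    Beta(alpha, beta) prior plus exponentially forgotten counts, so old
    evidence fades as the network changes. Parameter values are illustrative.
    """

    def __init__(self, alpha: float = 1.0, beta: float = 1.0, keep: float = 0.95) -> None:
        self.alpha, self.beta = alpha, beta
        self.keep = keep  # fraction of past evidence retained per new observation
        self.successes = 0.0
        self.failures = 0.0

    def observe(self, delivered: bool) -> None:
        self.successes *= self.keep
        self.failures *= self.keep
        if delivered:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def _params(self):
        return self.alpha + self.successes, self.beta + self.failures

    def mean(self) -> float:
        """Posterior mean probability of success."""
        a, b = self._params()
        return a / (a + b)

    def variance(self) -> float:
        """Posterior variance: large when evidence is thin, so doubt is explicit."""
        a, b = self._params()
        return a * b / ((a + b) ** 2 * (a + b + 1))
```

Keeping the variance alongside the mean is what lets the program doubt its own estimate: a handful of recent acknowledgements gives a high mean but a wide spread.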