---
lang: en
title: Peering through NAT
...
A library to peer through NAT is a library to replace TCP, the domain
name system, SSL, and email.  This is covered at greater length in
[Replacing TCP](TCP.html).

# Implementation issues

There is a great pile of RFCs on issues that arise with using UDP and ICMP
to communicate. See also
[Peer-to-Peer Communication Across Network Address Translators](https://bford.info/pub/net/p2pnat/){target="_blank"}.

## timeout

The NAT mapping timeout is officially 20 seconds, but I have no idea
what this means in practice.  I suspect each NAT discards port mappings
according to its own idiosyncratic rules,  but 20 seconds may be a widely respected minimum.

The official maximum time that should be assumed is two minutes, but
this is far from widely implemented, so keep alives often run faster.

Minimum socially acceptable keepalive time is 15 seconds.  To avoid
synch loops, random jitter in keepalives is needed.  This is discussed at
length in [RFC5405](https://datatracker.ietf.org/doc/html/rfc5405).

An experiment on [hole punching] showed that most NATs had a way
longer timeout, and concluded that the way to go was to just repunch as
needed.  They never bothered with keep alive.  They also found that a lot of
the time, both parties were behind the same NAT, sometimes because of
NATs on top of NATs.

[hole punching]:https://tailscale.com/blog/how-nat-traversal-works
"How to communicate peer-to-peer through NAT firewalls"
{target="_blank"}

Another source says that "most NAT tables expire within 60 seconds, so
NAT keepalive allows phone ports to remain open by sending a UDP
packet every 25-50 seconds".

The no-brainer way is for each party to ping the other at a mutually agreed
time every 15 seconds, which is a significant cost in bandwidth.  But a
server with 4Mib/s of internet bandwidth can support keepalives for a couple
of million clients.  On the other hand, someone on cell phone data with thirty
peers is going to make a significant dent in his bandwidth.
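
A rough check of these figures, as a back-of-envelope sketch rather than a measurement. It assumes each keepalive is a header-only IPv4/UDP packet of 28 bytes (a real keepalive carrying acks would be larger) and counts one keepalive per client per 15 seconds. The function names are made up for illustration. Under these assumptions the couple-of-million figure corresponds to reading the link as 4 MiB/s (bytes); reading it as 4 Mib/s (bits) gives roughly 280 thousand clients.

```python
# Back-of-envelope keepalive costs. All sizes are assumptions, not measurements.
HEADER_BYTES = 28   # assumed: 20-byte IPv4 header + 8-byte UDP header, empty payload
INTERVAL_S = 15     # keepalive interval from the text


def clients_supported(link_bytes_per_s, packet_bytes=HEADER_BYTES,
                      interval_s=INTERVAL_S):
    """Clients a link can serve with one keepalive per client per interval."""
    return int(link_bytes_per_s * interval_s // packet_bytes)


def monthly_bytes(peers, packet_bytes=HEADER_BYTES,
                  interval_s=INTERVAL_S, days=30):
    """Bytes sent plus received per month by a client pinging `peers` peers."""
    pings_per_peer = days * 24 * 3600 // interval_s
    return int(2 * peers * pings_per_peer * packet_bytes)


MIB = 1024 * 1024
as_bytes = clients_supported(4 * MIB)       # link read as 4 MiB/s (bytes)
as_bits = clients_supported(4 * MIB / 8)    # link read as 4 Mib/s (bits)
phone = monthly_bytes(30)                   # thirty peers on cell data
print(as_bytes, as_bits, phone)
```

With thirty peers the phone moves about 290 MB a month on keepalives alone, before any payload, which is the "significant dent" on a metered plan.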

With client to client keepalives, a client will probably seldom have more
than a dozen peers.  Suppose each keepalive is sent 15 seconds after the
counterparty's previous packet, or when an expected keepalive is not received,
and each keepalive acks received packets. If we are not receiving expected acks
or expected keepalives, we send nack keepalives (hello-are-you-there
packets), one per second, until we give up.
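
One way to read this rule is as a small per-peer state machine. The sketch below is only that reading, not a settled design: `EXPECT_WITHIN_S` and `MAX_NACKS` are hypothetical values the text does not fix, the class and method names are invented, and the random jitter mentioned earlier is left out so that the behaviour is deterministic.

```python
from dataclasses import dataclass

KEEPALIVE_AFTER_S = 15.0   # from the text: 15 s after the counterparty's last packet
NACK_EVERY_S = 1.0         # from the text: one nack keepalive per second
EXPECT_WITHIN_S = 16.0     # hypothetical: peer's 15 s turn plus 1 s of grace
MAX_NACKS = 10             # hypothetical give-up threshold


@dataclass
class KeepaliveState:
    last_heard: float      # when we last received anything from the peer
    last_sent: float       # when we last sent a keepalive or nack
    nacks_sent: int = 0

    def on_receive(self, now: float) -> None:
        """Any packet from the peer resets the silence timer and nack count."""
        self.last_heard = now
        self.nacks_sent = 0

    def poll(self, now: float) -> str:
        """Return the action due now: 'wait', 'keepalive', 'nack' or 'give_up'."""
        if self.last_sent <= self.last_heard:
            # Our turn: nothing sent since the peer's last packet.
            if now - self.last_heard >= KEEPALIVE_AFTER_S:
                self.last_sent = now
                return "keepalive"
            return "wait"
        # Peer's turn: wait for its keepalive, then start nacking.
        patience = EXPECT_WITHIN_S if self.nacks_sent == 0 else NACK_EVERY_S
        if now - self.last_sent < patience:
            return "wait"
        if self.nacks_sent >= MAX_NACKS:
            return "give_up"
        self.last_sent = now
        self.nacks_sent += 1
        return "nack"
```

The caller is expected to call `poll` from its own timer loop and to send the corresponding packet itself. Because the thresholds are plain constants, they could equally be values agreed during connection negotiation.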

This algorithm should not be baked in stone, but rather should be an
option in the connection negotiation, so that we can do new algorithms as
the NAT problem changes, as it continually does.

If two parties are trying to set up a connection through a third party broker,
they both fire packets at each other (at each other's IP as seen by the
broker) at the same broker time minus half the broker round trip time.  If
they don't get a packet within the sum of the broker round trip times, they keep
firing with slow exponential backoff until connection is achieved, or until
exponential backoff approaches the twenty second limit.
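
The retry timing can be sketched as follows. The growth factor of 1.5 is an assumed stand-in for "slow" backoff, and the function names are invented for illustration. `fire_time` is expressed on the broker's clock; converting it to a local clock is not shown.

```python
def punch_intervals(rtt_a, rtt_b, growth=1.5, limit_s=20.0):
    """Waits between successive punch attempts.

    Starts at the sum of the two broker round trip times and grows by
    `growth` each attempt, stopping before a wait would reach `limit_s`.
    """
    if rtt_a + rtt_b <= 0 or growth <= 1.0:
        raise ValueError("need positive round trip times and growth > 1")
    intervals = []
    wait = rtt_a + rtt_b
    while wait < limit_s:
        intervals.append(wait)
        wait *= growth
    return intervals


def fire_time(agreed_broker_time, my_broker_rtt):
    """Send early by half our own broker round trip time (broker clock)."""
    return agreed_broker_time - my_broker_rtt / 2


schedule = punch_intervals(0.25, 0.25)
print(len(schedule), schedule[0], schedule[-1])
```

With a quarter-second round trip to the broker on each side, this gives ten attempts, the waits growing from half a second to about nineteen seconds before the parties give up.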

Their initial setup packets should be steganographed as TCP startup
handshake packets.

We assume a global map of peers that form a mesh whereby you can get
connections, but not everyone has to participate in that mesh.  They can be
clients of such a peer, and only inform selected counterparties as to whom
they are a client of.

The protocol for a program to open port forwarding is part of Universal Plug and Play, UPnP, which was invented by Microsoft but is now ISO/IEC 29341 and is implemented in most SOHO routers.

But it is generally turned off, either by default or manually. Needless to say, if relatively benign Bitcoin software can poke a hole in the
firewall and set up a port forward, so can botnet malware.

The standard for poking a transient hole in a NAT is STUN, which only works for UDP – but generally works – not always, but most of the time. This problem everyone has dealt with, and there are standards, but not libraries, for dealing with it. There should be a library for dealing with it – but then you have to deal with names and keys, and have a reliability and bandwidth management layer on top of UDP.

But if our messages are reasonably short and not terribly frequent, as client messages tend to be, link level buffering at the physical level will take care of bandwidth management, and reliability consists of message received, or message not received. For short messages between peers, we can probably go UDP and retry.

STUN and ISO/IEC 29341 are incomplete, and most libraries that supply implementations are far too complete – you just want a banana, and you get the entire jungle.
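
To give a sense of how small the banana is, here is a sketch of the only part of STUN that hole punching strictly needs: building a Binding Request and reading the XOR-MAPPED-ADDRESS out of the reply, using the message layout from RFC 5389. It handles IPv4 only and does no network I/O; sending the request to a STUN server over UDP, and retrying, is left to the caller. The function names are invented for illustration.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """A 20-byte STUN header with no attributes."""
    tid = os.urandom(12) if transaction_id is None else transaction_id
    if len(tid) != 12:
        raise ValueError("transaction id must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_binding_response(data: bytes, transaction_id: bytes) -> Tuple[str, int]:
    """Return our public (ip, port) as seen by the STUN server."""
    msg_type, length, cookie = struct.unpack("!HHI", data[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or data[8:20] != transaction_id):
        raise ValueError("not a matching binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len == 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

Everything else in the jungle (authentication, TURN relaying, ICE candidate negotiation) sits on top of this exchange.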

Ideally we would like a fake or alternative TCP session setup, using raw
sockets, after which you get a regular standard TCP connection on a random
port, assuming that the target machine has that service running, and the
default path for exporting that service results in a window with a list of
accessible services, and how busy they are. Real polish would be hooking
the domain name resolution so that looking up a name in the peer top
level domain creates a hole, using fake TCP packets sent through a raw
socket, then returns the IP of that hole.  One might have the hole go through
a WireGuard-like network interface, so that you can catch them coming and
going.

Note that the internet does not in fact use the OSI model though everyone talks as if it did. Internet layers correspond only vaguely to OSI layers, being instead:

1.  Physical
2.  Data link
3.  Network
4.  Transport
5.  Application

And I have no idea how one would write or install one’s own network or
transport layer, but something is installable, because I see no end of
software that installs something, as every VPN does, WireGuard being the simplest.

------------------------------------------------------------------------

Assume an identity system that finds the entity you want to
talk to.

If it is behind a firewall, you cannot notify it, cannot
send an interrupt, cannot ring its phone.

Assume the identity system can notify it. Maybe it has a
permanent connection to an entity in the identity system.

Your target agrees to take the call. The identity system
informs both parties of each other’s IP address and the port
number on which they will be taking the call.

Both parties send off introduction UDP packets to the
other’s IP address and port number – thereby punching holes
in their firewall for return packets. When they get
a return packet, an introduction acknowledgement, the
connection is assumed established.

It is that simple.
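
The exchange above can be walked through with a toy model. `SimNat` below is a made-up stand-in for an address-restricted NAT, one that admits inbound packets only from endpoints the inside host has already sent to; real NATs vary, and the endpoints are documentation addresses chosen for the example.

```python
class SimNat:
    """Toy address-restricted NAT: inbound traffic is admitted only from
    endpoints the inside host has already sent a packet to."""

    def __init__(self):
        self.allowed = set()

    def outbound(self, dst):
        self.allowed.add(dst)   # sending punches a hole for replies from dst

    def inbound(self, src):
        return src in self.allowed


def send(src_nat, src_addr, dst_nat, dst_addr):
    """Send one packet; return True if it gets through the destination NAT."""
    src_nat.outbound(dst_addr)
    return dst_nat.inbound(src_addr)


# Public endpoints as reported by the identity system (example values).
alice = ("198.51.100.1", 40000)
bob = ("203.0.113.2", 50000)
nat_a, nat_b = SimNat(), SimNat()

intro_a = send(nat_a, alice, nat_b, bob)   # dropped: Bob has not sent to Alice yet
intro_b = send(nat_b, bob, nat_a, alice)   # delivered: Alice already punched her side
ack_a = send(nat_a, alice, nat_b, bob)     # delivered: Bob's intro punched his side
print(intro_a, intro_b, ack_a)
```

The first introduction is lost, which is why each side has to keep sending until it sees a return packet rather than sending once and waiting.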

Of course networks are necessarily nondeterministic,
therefore all beliefs about the state of the network need to
be represented in a Bayesian manner, so any
assumption must be handled in such a manner that the
computer is capable of doubting it.

We have a finite, and slowly changing, probability that our
packets get into the cloud, and a finite and slowly changing
probability that our messages get from the cloud to our
target. We have a finite probability that our target
has opened its firewall, and a finite probability that our
target can open its firewall, which transitions to
extremely high probability when we get an
acknowledgement, and then diminishes again as time passes
without further acknowledgements.

As I observe in [Estimating Frequencies from Small Samples](./estimating_frequencies_from_small_samples.html), any adequately flexible representation of the state of
the network has to be complex, a fairly large body of data,
more akin to a spam filter than a Boolean.
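
As the smallest possible step away from a Boolean, one such belief could be held as a pair of decaying pseudo-counts. This is only an illustrative sketch, not the method of the linked document: the uniform prior, the 60 second half-life and the class name are all assumptions. A real representation would track many such quantities and how they depend on each other.

```python
from dataclasses import dataclass


@dataclass
class LinkBelief:
    """Decaying Beta-style estimate that packets to a peer get through."""
    successes: float = 1.0      # pseudo-counts; (1, 1) is an assumed uniform prior
    failures: float = 1.0
    half_life_s: float = 60.0   # hypothetical: how fast old evidence is forgotten
    updated_at: float = 0.0

    def _decay(self, now: float) -> None:
        keep = 0.5 ** ((now - self.updated_at) / self.half_life_s)
        # Shrink accumulated evidence back toward the prior pseudo-counts.
        self.successes = 1.0 + (self.successes - 1.0) * keep
        self.failures = 1.0 + (self.failures - 1.0) * keep
        self.updated_at = now

    def observe(self, now: float, acked: bool) -> None:
        """Record one packet that was, or was not, acknowledged."""
        self._decay(now)
        if acked:
            self.successes += 1.0
        else:
            self.failures += 1.0

    def probability(self, now: float) -> float:
        """Current estimate, after ageing the evidence up to `now`."""
        self._decay(now)
        return self.successes / (self.successes + self.failures)


belief = LinkBelief()
p_start = belief.probability(0.0)
for t in range(5):
    belief.observe(float(t), acked=True)
p_fresh = belief.probability(5.0)     # shortly after five acknowledgements
p_stale = belief.probability(600.0)   # ten minutes of silence later
```

After a run of acknowledgements the estimate climbs well above one half, and with silence it drifts back toward the prior, so the computer stays capable of doubting a connection it has not heard from lately.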