---
title: Review of Cryptographic libraries
...
# Noise Protocol Framework
The Noise Protocol Framework matters because it is used by Wireguard to do something related to what we intend to accomplish.
Noise is an already existent messaging protocol, implemented in
Wireguard as a UDP only protocol.
My fundamental objective is to secure the social net, particularly the
social net where the money is, the value of most corporations being
the network of customer relationships, employee relationships,
supplier relationships, and employee roles.
This requires that instead of packets being routed to network
addresses identified by certificate authority names and the domain
name system, they are routed to public keys that reflect a private
key derived from the master secret of a wallet.
## Wireguard Noise
Wireguard maps network addresses to public keys, and then to the
possessor of the secret key corresponding to that public key. We
need a system that maps names to public keys, and then packets to
the possessor of the secret key, so that you can connect to a service
on some port of some computer, which you locate by its public key.
Existing software looks up a name, finds a thirty two bit or one
hundred twenty eight bit value, and then connects. We need that name to
map through software that we control to a durable and attested
public key. For random strangers not listed in the conf file, that
public key is locally, arbitrarily and temporarily mapped into
Wireguard subnets. The mapping is actually a local and temporary
handle to the public key. Software that we control maps the handle
back to the public key, and the public key to the network address of
the actual owner of the secret key. Software that we do not control
thinks it is using network addresses, but is actually using local
handles to public keys. Our virtual network card maps those handles
to network addresses and sends the packets off, encapsulated in
Wireguard style packets identified by the public key of their
destination, to a host in the cloud identified by its actual network
address. That host then routes them by public key, either to a
particular local port on that host itself, or to another host, which
eventually routes them by public key to a particular port.
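The chain of mappings above can be sketched as a set of lookup tables. Everything here, the table names, the `10.99.0.0` handle subnet, the addresses and keys, is a hypothetical illustration of the idea, not the actual design:

```python
# Sketch of the mapping chain: name -> public key -> local handle ->
# public key -> network address. All names and values are made up.

# Our software maps a durable name to a durable, attested public key.
name_to_pubkey = {"bob.example": "PUBKEY_BOB"}

# Random strangers are NATed into a local Wireguard-style subnet;
# the "address" handed to legacy software is really a handle to a key.
handle_to_pubkey = {}
pubkey_to_handle = {}

def local_handle(pubkey):
    """Arbitrarily and temporarily map a public key into a local subnet."""
    if pubkey not in pubkey_to_handle:
        handle = f"10.99.0.{len(pubkey_to_handle) + 1}"
        pubkey_to_handle[pubkey] = handle
        handle_to_pubkey[handle] = pubkey
    return pubkey_to_handle[pubkey]

# Software we control maps the handle back to the key, and the key to
# the current network address of the holder of the secret key.
pubkey_to_address = {"PUBKEY_BOB": ("203.0.113.7", 51820)}

def route(handle):
    pubkey = handle_to_pubkey[handle]          # handle -> public key
    return pubkey, pubkey_to_address[pubkey]   # public key -> address

handle = local_handle(name_to_pubkey["bob.example"])
assert route(handle) == ("PUBKEY_BOB", ("203.0.113.7", 51820))
```

Legacy software only ever sees the handle, so the real network address can change without it noticing or caring.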
For random strangers on the internet, we have to in effect NAT
them into our Wireguard subnets, and we don't want them able to
connect to arbitrary ports, so we in effect give them NAT type port forwarding.
It will frequently be convenient to have only one port forwarded
address per public key, in which case our Wireguard fork needs to
accept several public keys, one for each service.
The legacy software process running on the client initiates a
connection to a name and a port, from a random client port. The
legacy server process receives it on the whitelisted port, ignoring
the port requested, if only one incoming port is whitelisted for
this key, or on the requested port if more than one port
is whitelisted. It replies to the original client port, which was
encapsulated, with the port being replied to encapsulated in the
message secured and identified by public key, and the receiving
networking software on the client has temporarily whitelisted that
client port for messages coming from that server key. Such
"temporary" white listing should last for a very long time, since we
might have quiet but very long lived connections. We do not want
random people on the internet messaging us, but we do want peers
that we have messaged to be able to message, at random times, the
service that messaged them.
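The long lived reply whitelisting described above might look something like this sketch; the function names and the lifetime constant are illustrative assumptions, not the actual design:

```python
import time

# Once we message a server key from some client port, messages from
# that key to that port are accepted -- "temporarily", but for a very
# long time, since connections may be quiet yet very long lived.

WHITELIST_LIFETIME = 365 * 24 * 3600  # hypothetical: one year

whitelist = {}  # (server_pubkey, client_port) -> expiry time

def note_outgoing(server_pubkey, client_port):
    """Whitelist replies from this key to this client port."""
    whitelist[(server_pubkey, client_port)] = time.time() + WHITELIST_LIFETIME

def accept_incoming(server_pubkey, client_port):
    """Accept only traffic from keys we have messaged, to the port we used."""
    expiry = whitelist.get((server_pubkey, client_port))
    return expiry is not None and time.time() < expiry

note_outgoing("PUBKEY_SRV", 54321)
assert accept_incoming("PUBKEY_SRV", 54321)       # reply is allowed
assert not accept_incoming("PUBKEY_EVIL", 54321)  # random stranger is not
```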
One confusing problem is that stable ports are used to identify a
particular service, and random ports a particular connection, and we
have to disentangle this relationship and distinguish connection
identifiers from service identifiers. We would like public keys to
identify services rather than hosts, but sometimes they will not.
Whitelist and history helps us disentangle them when connecting to
legacy software, and, within the protocol, they need to be
distinguished even though they will be lumped back together when
talking to legacy software. Internally, we need to distinguish
between connections and services. A service is not a connection.
Note that the new Google https allows many short lived streams,
hence many connections, identified by a single server service port
and a single random client port, which ordinarily would identify a
single connection. A connection corresponds to a single concurrent
process within client software, and single concurrent process within
server software, and many messages may pass back and forth between
these two processes and are handled sequentially by those
processes, who have retrospective agreement about their total shared state.
So we have four very different kinds of things, which old type ports
mangle together
1. a service, which is always available as long as the host is up
and the internet is working, which might have no activity for
a very long time, or might have thousands of simultaneous
connections to computers from all over the internet
1. a connection, which might live while inactive for a very long time,
or might have many concurrent streams active simultaneously
1. a stream which has a single concurrent process attached to it
at both ends, and typically lives only to send a message and
receive a reply. A stream may pass many messages back and
forth, which both ends process sequentially. If a stream is
inactive for longer than a quite short period, it is likely to be
ungracefully terminated. Normally, it does something, and
then ends gracefully, and the next stream and the next
concurrent process starts when there is something to do. While a
stream lives, both ends maintain state, albeit in a request-reply
exchange that state lives only briefly.
1. A message.
Representing all this as a single kind of port, and packets going
between ports of a single kind, inherently leads to the mess that we
now have. They should have been thought of as different derived
classes from a common base class.
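The four kinds of thing above can be sketched as the derived classes of a common base that they should have been; the class names and structure are purely illustrative:

```python
# Illustrative class hierarchy for the four kinds of thing that old
# type ports mangle together. Not a real implementation.

class Endpoint:
    """Common base: something packets can be addressed to."""

class Service(Endpoint):
    """Always available while the host is up; may be idle for a very
    long time, or have thousands of simultaneous connections."""
    def __init__(self):
        self.connections = []

class Connection(Endpoint):
    """May live while inactive for a very long time; may carry many
    concurrent streams."""
    def __init__(self, service):
        self.streams = []
        service.connections.append(self)

class Stream(Endpoint):
    """One concurrent process attached at each end; typically lives
    only to send a message and receive a reply."""
    def __init__(self, connection):
        self.messages = []
        connection.streams.append(self)

class Message:
    def __init__(self, stream, payload):
        self.payload = payload
        stream.messages.append(self)

svc = Service()
conn = Connection(svc)
stream = Stream(conn)
Message(stream, b"request")
assert len(svc.connections) == 1 and len(conn.streams) == 1
```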
[Endpoint-Independent Mapping]:https://datatracker.ietf.org/doc/html/rfc4787
{target="_blank"}
Existing software is designed to work with the explicit white listing
provided by port forwarding through NATs with [Endpoint-Independent Mapping],
and the implicit (but inconveniently
transient) white listing provided by NAT translation, so we make it
look like that to legacy software. To legacy client software, it is
as if it were sending its packets through a NAT, and to legacy server
software, as if it were sending through a NAT with port forwarding. Albeit
we make the mapping extremely long lived, since we can rely on
stable identities and have no shortage of them. And we also want
the port mappings (actually internal port whitelistings, they would
be mappings if this was actual NAT) associated with each such
mapping to be extremely stable and long lived.
[Endpoint-Independent Mapping] means that the NAT reuses the
address and port mapping for subsequent packets sent from the
same internal port (X:x) to any external IP address and port (Y:y).
X1':x1' equals X2':x2' for all values of Y2:y2, which our architecture
inherently tends to force unless we do something excessively clever,
since we should not muck with ports randomly chosen. For us, [Endpoint-Independent Mapping] means that the mapping between
external public keys of random strangers not listed in our
configuration files, and the internal ranges of the Wireguard fork
interface is stable, very long lived and *independent of port numbers*.
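In RFC 4787's notation, Endpoint-Independent Mapping amounts to a table keyed only by the internal endpoint, with the destination deliberately ignored. The addresses and port range in this sketch are made up:

```python
# Sketch of Endpoint-Independent Mapping (RFC 4787): the external
# mapping for an internal endpoint X:x is reused for packets sent to
# any external endpoint Y:y.

mappings = {}            # internal (X, x) -> external (X', x')
next_external_port = 40000

def external_mapping(internal, _destination):
    """The destination parameter is deliberately unused -- that is
    exactly what makes the mapping endpoint-independent."""
    global next_external_port
    if internal not in mappings:
        mappings[internal] = ("198.51.100.1", next_external_port)
        next_external_port += 1
    return mappings[internal]

a = external_mapping(("10.0.0.2", 5000), ("Y1", 443))
b = external_mapping(("10.0.0.2", 5000), ("Y2", 8080))
assert a == b  # X1':x1' equals X2':x2' for all values of Y:y
```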
## Noise architecture
[Noise](https://noiseprotocol.org/) is an architecture and a design document, not source code.
Example source code exists for it, though the
[C example](https://github.com/rweather/noise-c) uses a build architecture that
may not fit with what I want, and uses protobuf, enemy software. It
also is designed to use several different implementations of the
core crypto protocols, one of them being libsodium, while I want a
pure libsodium only version. It might be easier to implement my
own version, using the existing versions as a guide, in particular and
especially Wireguard's version, since it is in wide use. Probably have
to walk through the existing version.
Noise is built around the ingenious central concept of using as the
nonce the hash of past shared and acknowledged data, which is
AEAD secured but sent in the clear. Which saves significant space
on very short messages, since you have to secure shared state
anyway. It regularly and routinely renegotiates keys, thus has no $2^{64}$
limit on messages. A 128 bit hash sample suffices for the nonce,
since the nonce of the next message will reflect the 256 bit hash of
the previous message, hence contriving a hash that has the same
nonce does the adversary no good. It is merely a denial of service.
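The central idea can be illustrated in miniature: both ends keep a running hash of past shared, acknowledged data and derive the nonce from it, so no nonce need be sent on the wire. This is only a sketch of the concept, with SHA-256 standing in; it is not the actual Noise construction:

```python
import hashlib

# Both ends fold acknowledged messages into a shared transcript hash
# and derive each message's nonce from it, so nonces never go on the
# wire. Illustration only.

def advance(transcript_hash, message):
    """Fold a newly acknowledged message into the shared transcript."""
    return hashlib.sha256(transcript_hash + message).digest()

def nonce_from(transcript_hash):
    """A 128 bit sample of the 256 bit transcript hash suffices."""
    return transcript_hash[:16]

h_alice = h_bob = hashlib.sha256(b"handshake transcript").digest()

# Both ends derive the same implicit nonce from shared state...
assert nonce_from(h_alice) == nonce_from(h_bob)

# ...and advance the transcript identically once a message is
# acknowledged, so they stay in step.
msg = b"first payload"
h_alice, h_bob = advance(h_alice, msg), advance(h_bob, msg)
assert h_alice == h_bob
```

An adversary who contrives a message whose hash yields the same 128 bit nonce sample gains nothing, since the next nonce reflects the full 256 bit hash; at worst it is a denial of service.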
I initially thought that this meant it had to be built on top of a
reliable messaging protocol, and it tends to be described as if it did,
but Wireguard uses a bunch of designs and libraries in its protocol,
with Noise pulling most of them together, and I need to copy,
rather than re-invent their work.
On the face of it, Wireguard does not help with what I want to do.
But I am discovering a whole lot of low level stuff related to
maintaining a connection, and Wireguard incorporates that low level stuff.
Noise goes underneath, and should be integrated with, reliable
messaging. It has a built in message limit of $2^{16}$ bytes. It is not
just an algorithm, but very specific code.
Noise is messaging code. Here now, and present in Wireguard,
as a UDP only cryptographic protocol. I need to implement my
messaging system as a fork of Wireguard.
Wireguard uses base64, and my bright idea of slash6 gets in the
way. I am going to use base52 for any purpose for which my bright idea
would have been useful, so slash6 should be rewritten as base64 regardless.
Using the hash of shared state goes together with immutable
append only Merkle-Patricia trees like ham and eggs, though you
don't need to keep the potentially enormous data structure around.
When a connection has no activity for a little while, you can discard
everything except a very small amount of data, primarily the keys,
the hash, the block number, the MTU, and the expected timings.
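That minimal retained state might be sketched as a small record; the field names here are hypothetical, not the actual design:

```python
from dataclasses import dataclass

# Hypothetical sketch of the very small amount of data worth keeping
# for an idle connection, per the list above.

@dataclass
class IdleConnection:
    send_key: bytes           # primarily the keys...
    receive_key: bytes
    transcript_hash: bytes    # ...the hash of past shared state,
    block_number: int         # the block number,
    mtu: int                  # the MTU,
    expected_rtt_ms: float    # and the expected timings.

idle = IdleConnection(b"send key", b"receive key", b"\x00" * 32, 42, 1280, 35.0)
assert idle.block_number == 42
```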
The Noise system for hashing all past data is complicated and ad
hoc. For greater generality and a simpler, more systematic
fundamental structure with fewer arbitrary decisions about
particular types of data, it needs to be rewritten to hash like an
immutable append only Merkle-Patricia tree. That instantly and
totally breaks interoperability with existing Wireguard, so to talk
to the original Wireguard, our fork has to know what it is talking to.
Presumably Wireguard has a protocol negotiation mechanism, that
you can hook. If it does not, well, it breaks, and the kind of
thing that the public key addresses has to be flagged anyway, since I
am using Ristretto public keys, and existing Wireguard keys are not.
Also, we have to move Wireguard from NACL encryption to Libsodium
encryption, because NACL is an attack vector.
Wireguard messages are distinguishable on the wire, which is odd,
because Noise messages are inherently white noise, and destination
keys are known in advance. Looks like enemy action by the bad guys at NACL.
I think we need a fork that, if a key is a legacy key type, talks
legacy Wireguard, and if a new type (probably coming from our domain
name system, though it can also be placed in `.conf` files), talks
with packets indistinguishable from white noise to an adversary that
does not know the key.
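The proposed fork's behaviour amounts to a dispatch on key type; the tags and return values in this sketch are stand-ins for illustration, not a real wire format:

```python
# Legacy keys speak legacy Wireguard; new type keys speak the variant
# whose packets are indistinguishable from white noise. Illustrative only.

LEGACY = "legacy"        # existing Wireguard Curve25519 key
RISTRETTO = "ristretto"  # new type key, probably from our domain name system

def protocol_for(key_type):
    if key_type == LEGACY:
        return "legacy wireguard"     # packets distinguishable on the wire
    if key_type == RISTRETTO:
        return "white noise variant"  # indistinguishable without the key
    raise ValueError("unknown key type")

assert protocol_for(LEGACY) == "legacy wireguard"
assert protocol_for(RISTRETTO) == "white noise variant"
```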
Old type session initiation messages are distinguishable from
random noise. For new type session initiation messages to a server
with an old type id and a new type id on the same port, make sure
that the new type session initiation packet does not match, which
may require both ends to try a variety of guesses if its expectations
are violated. That opens a DOS attack, but that is OK: you just
shut down that connection. DOS resistance is going to require
messages readily distinguishable from random noise, but we do not
send those messages unless facing workloads suggestive of DOS, such
as a heavy session initiation load.
Ristretto keys are uncommon, and are recognizable as Ristretto
keys, but not if they are sent in unreduced form.
Build on top of a fork of Wireguard a messaging system that delivers
messages not to network addresses, but to Zooko names (which
might well map to a particular port on a particular host, but whose
network address and port may change without people noticing or caring.)
Noise is a messaging protocol. Wireguard is a messaging protocol
built on top of it that relies on public keys for routing messages.
Most of the work is done. It is not what I want built, but it has an
enormous amount of commonality. I plan a very different
architecture, but that is a re-arrangement of existing structures
already done. I am going to want Kademlia and a blockchain for the
routing, rather than a pile of local text files mapping IPs to nameless
public keys. Wireguard is built on `.conf` text files the way the
Domain name system was built on `host` files. It almost does the job,
but needs a Kademlia based domain name system on top of and integrated with it.
# [Libsodium](./building_and_using_libraries.html#instructions-for-libsodium)
# I2P
The [Invisible Internet Project](https://geti2p.net/en/about/intro) provides a great deal of the chat capability that you want. You need to interface with their stuff, rather than duplicate it. In particular, your wallet identifiers need to be I2P identifiers, or have corresponding I2P identifiers, and your anonymized transactions should use the I2P network.
They have a substitute for UDP, and a substitute for TCP, and your anonymized transactions are going to use that.
# Amber
[Amber](https://github.com/bernedogit/amber)
Not as fast and efficient as libsodium, and further from Bernstein. Supports base 58, but [base58check](https://en.bitcoin.it/wiki/Base58Check_encoding#Base58_symbol_chart) is specifically bitcoin protocol, supporting run time typed checksummed cryptographically strong values. Note that any value you are displaying in base 58 form might as well be bitstreamed, for the nearest match between base 58 and base two is that 58^7^ is only very slightly larger than 2^41^, so you might as well use your prefix free encoding for the prefix.
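A quick check of the arithmetic above: 58^7^ exceeds 2^41^ by less than half a percent, so base 58 packs about 41 bits per 7 characters.

```python
# 58^7 = 2,207,984,167,552 is only very slightly larger than
# 2^41 = 2,199,023,255,552.
assert 58**7 > 2**41
assert 58**7 / 2**41 < 1.005  # within half a percent
```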
[Curve25519](https://github.com/msotoodeh/curve25519)
Thirty two byte public key, thirty two byte private key.
Key agreement is X25519
Signing is ED25519. Sixty four byte signature.
Trouble is that Amber does not include Bernstein's assembly language optimizations.
[ED25519/Donna](https://github.com/floodyberry/ed25519-donna) does include Bernstein's assembly language optimizations, but is designed to compile against OpenSSL. Probably needs some customization to compile against Amber. Libsodium is designed to be uncontaminated by NSA.
ED25519 does not directly support [Schnorr signatures](schnorr-signatures.pdf), being nonprime. Schnorr signatures can do multisig, useful for atomic exchanges between blockchains, which are multisig, or indeed arbitrary algorithm sig. With some cleverness and care, they support atomic exchanges between independent block chains.
There is an explanation of how to do [Schnorr multisignatures](https://www.ietf.org/archive/id/draft-ford-cfrg-cosi-00.txt) [using ED25519](https://crypto.stackexchange.com/questions/50448/schnorr-signatures-multisignature-support#50450).
The Amber library packages all these in what is allegedly easy to incorporate form, but does not have Schnorr multisignatures.
[Bernstein paper](https://ed25519.cr.yp.to/software.html). 
The fastest library I can find for pairing based crypto is [herumi](https://github.com/herumi/mcl). 
How does this compare to [Curve25519](https://github.com/bernedogit/amber)?
There is a good discussion of the performance tradeoff for crypto and IOT in [this Internet Draft](https://datatracker.ietf.org/doc/draft-ietf-lwig-crypto-sensors/), currently in IETF last call: 
From the abstract:
> This memo describes challenges associated with securing resource-
> constrained smart object devices. The memo describes a possible
> deployment model where resource-constrained devices sign message
> objects, discusses the availability of cryptographic libraries for
> small devices and presents some preliminary experiences with those
> libraries for message signing on small devices. Lastly, the memo
> discusses trade-offs involving different types of security
> approaches.
The draft contains measurements and evaluations of libraries, allegedly
including Herumi. But I don't see any references to the Herumi library in
that document, nor any evaluations of the time required for pairing based
cryptography in that document. Relic-Toolkit is not Herumi and is supposedly
markedly slower than Herumi.
Looks like I will have to compile the libraries myself and run tests on them.