wallet/docs/generating_numbers_unpredictable_to_an_attacker.md

151 lines
6.1 KiB
Markdown
Raw Permalink Normal View History

---
title: Generating numbers unpredictable to an attacker
...
```default
From: Kent Borg <kentborg@borg.org> 2021-03-30
To: Cryptography Mailing List
```
Entropy is important to RNGs but unfortunately RNG people are at risk of
devoutly worshiping at the alter of "Countable Entropy", blinded to
realities, ready with jeers for anyone who does not share their extreme
theology.
These people are so in the thrall of the theoretical that they are blinded to
the practical and any other theories.
And for practical purposes, it is the unguessability of the RNG that
matters. Any source of unguessable data is a good thing to use to drive an
RNG. Even sources that are dismissed as "squish" by the most devout and
most blinded can be good. And there is a great example that these disciples
can't see.
# Time Distribution is Hard
NTP is really great, I love it, so cool. It can set my computer's clock with a
precision measured in milliseconds. Very impressive it can do this just by
applying algorithms to hardware that is already present.
If one wants better, the next option is to get time from GPS. According to
gps.gov a specialized receiver, at a fixed location, can know the time
within 40ns. This is pretty cool, too. It is good enough to synchronize RF
signals between CDMA cell sites so phones can communicate with more
than one site as the same time.
GPS also depends on billions of dollars of infrastructure with an annual
budget that must be in the millions. People think GPS is about location,
but at its core it is really about time distribution. From end-to-end a design
where every component is doing its best to carefully keep and distribute
precise time. If one pays attention to details and has the money for good
hardware (far more than just a smartphone), to get 40ns is very cool.
# Guessing Time is Even Harder
With all that money can constructive effort, one can do 40ns. What are you
going to do to do better? Get specific. (Warning, it's not going to be easy.)
# Cheap Unguessability
A 1 GHz clock has a cycle time of 1ns. (Is it even possible to buy an Intel-based
machine that runs slower than 1GHz these days?) 1ns is a lot
smaller than 40ns. You don't know the value of my clock.
Intel chips have a timestamp counter that increments with every tick of the
system clock. You don't know the value of my counter.
The system clock isn't fed to the CPU, the CPU is fed a much lower
frequency, that it then multiplies up using analog on-chip PLL circuitry.
That clock is then (carefully) distributed on-chip. And even then, different
parts of the chip are in different clock domains, because clock distribution
and synchronization is hard.
So the "system clock" doesn't exist outside the CPU, it is only a "CPU clock",
and not all parts of the CPU are even privy to it.
No one at any distance outside that chip knows the value of the timestamp
counter. A program might contain the instruction to read the timestamp
counter, but by the time anything is done with that value, it will have
changed.
Is there "entropy" in that system clock? Some, but only some. The PLL will
have some jitter, the precision of the lower frequency input clock will have
iffy precision and be subject to drift.
Is there "unguessability" in that system clock? Plenty! At least to any
observer at any distance (i.e., outside the computer).
Remember, it takes billions of dollars and lots of careful design and
cooperation to distribute 40ns. time. No such effort nor expense has been made
to tell the world the precise value of my 1ns period (or less) CPU clock.
No one outside my computer knows its precise value.
# Back on Topic
Intel hardware has a great source of unguessability in its timestamp
counter. All you need is an uncorrelated sampling of this clock. Say, a
network interrupt.
I know the squish patrol is now all upset, because external observers can
be the one's sending these packets with careful timing. So what? The
timing can't be careful enough. The value that is read from the timestamp
counter in servicing that interrupt depends on knowing edge timings far
more closely than 1ns, for every time the observer guesses a value on the
wrong side of one of these edges, one bit of unguessability slips by.
# RNGs are Still Hard
A (1) uncorrelated sampling of a (2) fast clock is, indeed, a good source of
unguessability.
But, make sure both those things be true.
Is virtualization messing with how these things work? Is variable clock
scaling messing with it? Have interrupts been virtualized in some
predictable way? Is the timestamp counter being messed with in an
attempt to have it not appear to be warped by clock scaling and effectively
running much slower? Is some OS scheduling algorithm synchronizing
interrupt servicing with timestamp values?
Just because there is an underappreciated way to feed an RNG doesn't
mean there aren't plenty of ways to still mess it up. ("Um, it turns out the
RNG isn't in production builds." Who will notice?)
Implementation matters.
But the fact remains time distribution is hard, the period of a gigahertz clock is small. No one at any distance knows its value. An awful lot of computers out there can use this to drive their RNGs.
-kb, the Kent who laments that Arm CPUs didn't have something like a timestamp counter last he looked.
# Attacks
```default
From: Barney Wolff <barney@databus.com> 2021-05-31
To: Cryptography Mailing List
```
Surely this depends on how many guesses an attacker is
allowed before being detected and blocked. If there's
no penalty for guessing wrong, as with an offline attack,
I doubt the GHz ticker can contribute more than about 20
bits or so.
# Implementation
```default
From: jrzx <jrzx@protonmail.ch> 2021-06-06
To: Cryptography Mailing List
```
Every network or disk event provides several bits of unguessability. You
are going to accumulate a 128 bits in a hundred milliseconds or so.
Accumulate the bits into Knuth's additive number generator 3.2.2, then
hash the seed.
Continue accumulating randomness into the seed when you get
uncorrelated events. Continue hashing the seed when you need more
random numbers.
The attacker performing an offline attack will have to guess all 128 bits.