---
title: Merkle-patricia Dag
# katex
...
# Definition
## Merkle-patricia Trees
A Merkle-patricia tree is a way of hashing a map, an associative
array, such that the map can have entries added to it, or removed
from it, without having to rehash the entire map, and such that one
can prove a subset of the map, such as a single mapping, is part of
the whole map, without needing to have the whole map present to
construct the hash.
The need to have the entire blockchain present to validate the
current state of the blockchain, and any particular fact about the
blockchain, is a huge problem with existing blockchains, and it is
rapidly becoming a bigger problem as they grow enormous.
In a Merkle dag, vertices are data structures, and edges are hashes,
as if one followed a link by looking up the preimage of a hash.
Obviously this is seldom an efficient way of actually implementing
edges, and in practice one is going to use a pointer or handle for
data structures in memory, and an oid for structures in a database,
but these are internal representations of the structure defined and
implied by the hashes. The canonical form has no pointers, no oids,
and no handles. In communications between humans and machines, and
when defining algorithms and operations on the data structure, the
algorithm should be defined, and the data communicated, as if one
were actually using the hashes, rather than oids and pointers.
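As a minimal sketch of that distinction, assuming thirty two byte hashes and a binary vertex (the type and field names here are illustrative, not a canonical form defined by this document):

```cpp
// Sketch only: the canonical form of a Merkle dag vertex, in which the edges
// are hashes, versus a machine-local form in which those edges have been
// resolved to pointers. Thirty two byte hashes are an assumption.
#include <array>
#include <cstdint>

using Hash = std::array<std::uint8_t, 32>;

// Canonical form: no pointers, no oids, no handles. This is what gets hashed
// and what algorithms are defined over.
struct CanonicalVertex {
    Hash left;    // hash of the left child's canonical form
    Hash right;   // hash of the right child's canonical form
};

// Internal representation: the same vertex after looking up the preimages,
// held by pointer for efficient traversal. Never hashed or sent on the wire.
struct InMemoryVertex {
    const InMemoryVertex* left;
    const InMemoryVertex* right;
    Hash hash;    // hash of the corresponding CanonicalVertex
};
```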
Its practical application is constructing a global consensus on what
public keys have the right to control what digital assets (such as
crypto currencies, and globally defined human readable names) and
proving that everyone who matters agrees on ownership.
If a large group of peers, each peer acting on behalf of a large group
of clients, each of whom has rights to a large number of digital assets,
agree on what public keys are entitled to control what digital assets,
then presumably their clients also agree, or they would not use that
peer.
Thus, for example, we don’t want the Certificate Authority to be able
to tell Bob that his public key is a public key whose corresponding
secret key is on his server, while at the same time telling Carol that
Bob’s public key is a public key whose corresponding secret key is in
fact controlled by the secret police.
The Merkle-patricia tree not only allows peers to form a consensus on an
enormous body of data, it allows clients to efficiently verify that any
quite small piece of data, any datum, is in accord with that consensus.
## Patricia trees
A patricia tree is a way of structuring a potentially very large list of
bitstrings sorted by bitstring such that bitstrings can be added or
deleted without resorting to shifting the whole list of bitstrings.
In practice, we are not interested in bitstrings, we are interested in
fields. We want to represent an sql table by a patricia tree, and do sql
operations on the tree as if it were a table.
But it is in fact a tree, and its interior vertices do not have complete
fields, they have bitstrings representing part of a field. If we have a
binary patricia tree representing a database table with $n$ entries, it
will have $n$ leaf vertices, and $n-1$ interior vertices.
So, to map from a bitstring to a field representing the primary key
of an sql index, we append to the bitstring a $1$ bit, followed by as
many $0$ bits as are needed to bring it up to one bit past the right
boundary of the field.
If it is already at the right boundary, we append merely one
additional $1$ bit.
The final, additional bit is a flag indicating a final vertex, a leaf
vertex of the index: false ($0$) for interior vertices, true ($1$) for
leaf vertices of the index -- so we now have a full field, plus a flag.
A bitstring represents the path through the Merkle patricia tree to
a vertex, and we will, for consistency with sql database terminology,
call the bitstring padded to one bit past the field boundary the key,
the key being the sql field plus the one additional trailing bit, the
field boundary flag. (Because we are dealing with the tree
representation of an sql table, we need to know whether we have
finally reached the actual record, or are still walking through the
index, and the field boundary flag tells us which.)
To obtain the bitstring from the key, we remove the trailing $0$ bits
and the last $1$ bit. Which is to say, if the field boundary flag is
true, our bitstring is the field, and if the field boundary flag is
false, our bitstring is the key with its last $1$ bit and the following
$0$ bits discarded.
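A minimal sketch of this key mapping, assuming the bitstring fits in a machine word and is carried as a (value, length) pair; the representation and names are illustrative assumptions:

```cpp
// Sketch only: mapping between a bitstring and its key, for an integer primary
// key of `field_width` bits. The (value, length) representation of bitstrings
// is an assumption made for illustration.
#include <bit>
#include <cassert>
#include <cstdint>

struct Bitstring {
    std::uint64_t value;  // the bits, most significant first within `length`
    unsigned length;      // number of bits, at most field_width
};

// Key = field_width + 1 bits: append a 1 bit, then 0 bits out to one bit past
// the field boundary. The final bit is therefore the field boundary flag:
// 1 for a leaf vertex of the index, 0 for an interior vertex.
std::uint64_t KeyFromBitstring(Bitstring b, unsigned field_width) {
    assert(b.length <= field_width);
    unsigned pad = field_width - b.length;        // zero for a leaf bitstring
    return ((b.value << 1) | 1) << pad;
}

// Inverse: discard the trailing 0 bits and the final 1 bit.
Bitstring BitstringFromKey(std::uint64_t key, unsigned field_width) {
    assert(key != 0);
    unsigned trailing = std::countr_zero(key);    // the padding 0 bits
    return Bitstring{key >> (trailing + 1), field_width - trailing};
}
```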
In a patricia tree each vertex is associated with a bitstring.
When you walk the tree to the left child of a vertex, you add a
zero bit, plus the bits, if any, associated with that link, to predict the
bitstring of the left child, and when you walk to the right hand child,
a one bit, plus the bits, if any, associated with that link.
This enables you, given the bitstring you start with and the bitstring of
the vertex you want to find, to determine the path through the patricia tree.
And, if it is a Merkle patricia tree, this enables you to produce a
short, efficient proof not only of the presence of a certain datum in an
enormous pile of data, but also of its absence: to prove absence, one
exhibits the hash path to the vertex where the walk towards that
bitstring diverges, showing that no branch matching it exists.
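A minimal sketch of such a walk, assuming an in-memory representation with explicit child pointers and per-link skip bits; the layout and names are illustrative assumptions, not a canonical form:

```cpp
// Sketch only: each step appends the chosen 0 or 1 bit plus the skip bits, if
// any, carried by that link, and a mismatch against the sought bitstring
// proves the datum absent.
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <vector>

struct Vertex {
    std::vector<bool> skip;               // bits carried by the link into this vertex
    std::unique_ptr<Vertex> child[2];     // child[0]: append a 0 bit, child[1]: a 1 bit
    std::optional<std::string> payload;   // present on leaf vertices only
};

// Walk from the root towards the vertex whose bitstring identity is `bits`.
// Returns the payload if that leaf is present, std::nullopt if it is absent.
std::optional<std::string> Lookup(const Vertex& root, const std::vector<bool>& bits) {
    const Vertex* v = &root;
    std::size_t i = 0;
    for (;;) {
        for (bool b : v->skip) {          // skip bits must agree with the sought bitstring
            if (i >= bits.size() || bits[i] != b) return std::nullopt;
            ++i;
        }
        if (i == bits.size()) return v->payload;
        const Vertex* c = v->child[bits[i] ? 1 : 0].get();
        if (!c) return std::nullopt;      // no such branch: proof of absence
        ++i;                              // the left/right choice contributes one bit
        v = c;
    }
}
```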
Suppose we have a database table whose primary key is a three bit
integer, and it contains four records, with oids 2, 4, 5, and 6. The
big endian representations of those primary keys are 0b010, 0b100,
0b101, and 0b110.
The resulting patricia tree with infix keys is:
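(sketched here with interior vertices labelled by their bitstring identity, and skip bits marked on the links that carry them)

```
                     root ""
          0, skip 10 /     \ 1
                    /       \
              leaf 010    vertex 1
                        0 /      \ 1, skip 0
                         /        \
                   vertex 10    leaf 110
                   0 /    \ 1
                    /      \
              leaf 100   leaf 101
```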
On the rightmost path from the top, the path gains two $1$ bits
because it goes right both times, then a $0$ bit from the link, thus the
bitstring identity of the path and the vertex is $110$.
We call the bits that a link adds, in addition to the bit that results
from the choice to go left or right, the skip bits, because we are
skipping levels in the binary tree.
On the leftmost path from the top, the path gains one $0$ bit
because it goes left, then $10$ from the link, thus the
bitstring identity of the path and the vertex is $010$.
The two middle paths have identity purely from the left/right
choices, not receiving any additional bits from the links other than
the bit that comes from going left or right.
Each bitstring, thus each key field, identifies a vertex and a path
through the patricia tree.
We do not necessarily want to actually manipulate or represent
the bitstrings of vertices and skip fields as bitstrings. It is likely to
be a good deal more convenient to represent and manipulate keys, and to
represent the skip bits by the key of the target vertex.
Fields have meanings for the application using the Merkle patricia
dag, bitstrings lack meaning.
But to understand what a patricia tree is, and to manipulate it, our
actions have to be equivalent to an algorithm described in terms of
bitstrings. We use keys because computers manipulate bytes better
than bits, just as we use pointers because we don't want to look up
the preimages of hashes in a gigantic table of hashes. But a Merkle
tree algorithm must work as if we were looking up preimages by
their hash, and sometimes we will have to look up a preimage by its
hash; and a patricia tree algorithm must work as if we were manipulating
bitstrings, and sometimes we will have to manipulate bitstrings.
The total number of vertices equals twice the number of leaves
minus one. Each parent node has as its identifier a sequence of
bits, not necessarily aligned on field boundaries, that both its children
have in common.
A Merkle-patricia dag is a patricia tree with binary radix (which is
the usual way patricia trees are implemented) where the hash of each
node depends on the hash and the skip of its two children, which means
that each node contains proof of the entire state of all its descendant
nodes.
The skip of a branch is the bit string that differentiates its bit
string from that of its parent, with the first such bit excluded, as it
is implied by being a left or right branch. This is often the empty
bitstring, which, when mapped to a byte string for hashing purposes, maps
to the empty byte string.
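A minimal sketch of that hashing rule; HashOf here is only a placeholder for whatever cryptographic hash an implementation would actually use, and the preimage layout is an illustrative assumption, not a canonical form:

```cpp
// Sketch only: the hash of an interior vertex covers the hash and the skip of
// each of its two children, so it commits to the entire state of all its
// descendants. HashOf is a placeholder, NOT a cryptographic hash; a real
// implementation would use something like SHA-256, with an exactly specified
// canonical preimage layout.
#include <array>
#include <cstdint>
#include <initializer_list>
#include <vector>

using Hash = std::array<std::uint8_t, 32>;

Hash HashOf(const std::vector<std::uint8_t>& preimage) {
    Hash h{};
    std::uint64_t acc = 1469598103934665603ull;      // FNV-1a, for illustration only
    for (std::uint8_t b : preimage) {
        acc = (acc ^ b) * 1099511628211ull;
        h[acc % h.size()] ^= static_cast<std::uint8_t>(acc);
    }
    return h;
}

struct Child {
    Hash hash;                       // hash of the child vertex
    std::vector<std::uint8_t> skip;  // skip bits mapped to a byte string, often empty
};

Hash InteriorVertexHash(const Child& left, const Child& right) {
    std::vector<std::uint8_t> preimage;
    for (const Child* c : {&left, &right}) {
        preimage.insert(preimage.end(), c->skip.begin(), c->skip.end());
        preimage.insert(preimage.end(), c->hash.begin(), c->hash.end());
    }
    return HashOf(preimage);
}
```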
It would often be considerably faster and more efficient to hash the
full bitstring, rather than the skip, and that may sometimes be not
merely OK, but required. But often we want the hash to depend only on
the data, and be independent of the metadata, as when the leaf index is
an arbitrary precision integer representing the global order of a
transaction, an order that is going to be constructed at some later time
and determined by a different authority.
Most of the time we will be using the tree to synchronize two
blocks of pending transactions, so though a count of the number of
children of a vertex or an edge is not logically part of a Merkle-patricia
tree, it will make synchronization considerably more
efficient, since the peer that has the block with fewer children wants
information from the peer that has the node with more children.
## A sequential append only collection of postfix binary trees
This data structure means that instead of having one gigantic
proof that takes weeks to evaluate that the entire blockchain is
valid, you have an enormous number of small proofs that each
particular part of the blockchain is valid. This has three
advantages over the chain structure.
1. A huge problem with proof of stake is "nothing at stake".
There is nothing stopping the peers from pulling a whole
new history out of their pocket. With this data structure, there is
something stopping them. They cannot pull a brand new history out of
their pocket, because the clients have a collection of very old roots of
very large balanced binary Merkle trees of blocks. They keep the hash
paths to all their old transactions around, and if the peers invent a
brand new history, the clients find that the context of all their old
transactions has changed.
1. It protects clients against malicious peers, since any claim the peer
makes about the total state of the blockchain can be proven with
$\bigcirc(\log_2 n)$ hashes.
1. If a block gets lost or corrupted, the peer can identify the one
specific block that is a problem. At present peers have to download,
or at least re-index, the entire blockchain far too often, and a full
re-index takes days or weeks.
This is not a Merkle-patricia tree. This is a generalization of a Merkle
patricia dag to support immutability.
The intended usage is an immutable append only dag.
In a binary patricia tree each vertex has two links to other vertices,
one of which corresponds to appending a $0$ bit to the bitstring that
identifies the vertex and the path to the vertex, and one of which
corresponds to adding a $1$ bit to the bitstring.
In an immutable append only Merkle patricia dag, vertices identified by bit
strings ending in a $0$ bit have a third hash link, that links to a vertex
whose bit string is truncated back by zeroing the prior $1$ bit and removing
any $0$ bits following it. Thus, whereas in a blockchain (Merkle chain) you
need $n$ hashes to reach and prove data $n$ blocks back, in an immutable
append only Merkle patricia dag, you only need $\bigcirc(\log_2 n)$ hashes
to reach a vertex of the blockdag $n$ blocks back.
The vertex $0010$ has an extra link back to the vertex $000$, the
vertices $0100$ and $010$ have extra links back to the vertex $00$, the
vertices $1000$, $100$, and $10$ have extra links back to the vertex $0$,
and so on and so forth.
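A minimal sketch of how the target of that extra link can be computed, assuming bitstrings carried as (value, length) pairs; the representation is an illustrative assumption, not a wire format:

```cpp
// Sketch only: the extra back link of a vertex whose bitstring ends in a 0 bit
// points to the vertex obtained by zeroing the last 1 bit and dropping the
// 0 bits after it.
#include <bit>
#include <cassert>
#include <cstdint>

struct Bitstring {
    std::uint64_t value;  // the bits, most significant first within `length`
    unsigned length;      // number of bits in the bitstring
};

// Returns the bitstring of the vertex the third hash link points back to.
Bitstring BackLinkTarget(Bitstring b) {
    assert(b.length > 0 && (b.value & 1) == 0);   // only vertices ending in a 0 bit
    assert(b.value != 0);                         // there must be a prior 1 bit to zero
    unsigned trailing = std::countr_zero(b.value);  // trailing 0 bits to drop
    return Bitstring{(b.value >> trailing) - 1, b.length - trailing};
}

// Examples from the text:
//   BackLinkTarget({0b0010, 4}) == {0b000, 3}
//   BackLinkTarget({0b0100, 4}) == {0b00, 2}
//   BackLinkTarget({0b1000, 4}) == {0b0, 1}
```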
This enables clients to reach any previous vertex through a chain of
hashes, and thus means that each new item in the sequence is a hash
over all previous data in the tree. Each new item has a hash
commitment to all previous items.
The clients keep the old roots of the balanced binary trees of
blocks around, so the peers cannot sodomize them. This will matter
more and more as the blockchain gets bigger and bigger, resulting
in ever fewer peers with ever greater power and ever more clients,
whose interests are apt to be different from those of the ever fewer,
ever greater, and ever more powerful peers.
The superstructure of balanced binary Merkle trees allows us to
verify any part of it with only $\bigcirc(\log n)$ hashes, and thus to
verify that one version of this data structure that one party is using
is a later version of the same data structure that another party is using.
This reduces the amount of trust that clients have to place in peers.
When the blockchain gets very large there will be rather few peers
and a great many clients, thus there will be a risk that the peers will
plot together to bugger the clients. This structure enables a client
to verify that any part of the blockchain is what his peer says it is,
and thus avoids the risk that a peer may tell different clients different
accounts of the consensus. Two clients can quickly verify that they
are on the same total order and total set of transactions, and that
any item that matters to them is part of this same total order and
total set.
When the chain becomes very big, sectors and disks will be failing
all the time, and we don't want such failures to bring everything to a
screaming halt. At present, such failures far too often force you to
reindex the blockchain, and redownload a large part of it, and this
happens more and more as the blockchain becomes enormous.
And, when the chain becomes very big, most people will be
operating clients, not peers, and they need to be able to ensure
that the peers are not lying to them.
### storage
We would like to represent an immutable append only data
structure by append only files, and by sql tables with sequential and
ever growing oids.
When we defined the key for a Merkle patricia tree, the key
definition gave us a parent node whose key falls in the middle of
its children's keys, infix order. For the tree depicted above, we want
postfix order.
Normally, if the bitstring is a full field width, the vertex contains the
information we actually care about, while if the bitstring is less than
the field width, it just contains hashes ensuring the data is
immutable, that the past consensus has not been changed
underneath us. So, regardless of how the data is actually physically
stored on disk, these belong in different sql tables.
So, the oid of a vertex that has a full field width sized bitstring is
simply that bitstring, while the oid of its parent vertices is obtained
by appending $1$ bits to pad the bitstring out to full field width, and
subtracting a count of the number of $1$ bits in the original bitstring,
std::popcount, which gives us sequential and ever increasing oids
for the parent vertices, if the leaf vertices, the vertices with full field
width bitstrings, are sequential and ever increasing.
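A minimal sketch of that oid rule, again assuming bitstrings carried as (value, length) pairs and a field width that fits in a machine word:

```cpp
// Sketch only: oids for the tree's vertices, using std::popcount as the text
// suggests. Bitstrings as (value, length) pairs are an illustrative assumption.
#include <bit>
#include <cassert>
#include <cstdint>

// Leaf vertex (full field width bitstring): the oid is the bitstring itself.
std::uint64_t LeafOid(std::uint64_t bits) { return bits; }

// Interior vertex: pad the bitstring with 1 bits out to the full field width,
// then subtract the count of 1 bits in the original bitstring.
std::uint64_t InteriorOid(std::uint64_t bits, unsigned length, unsigned field_width) {
    assert(length < field_width && field_width < 64);
    unsigned pad = field_width - length;
    std::uint64_t padded = (bits << pad) | ((std::uint64_t{1} << pad) - 1);
    return padded - static_cast<unsigned>(std::popcount(bits));
}
```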
Let us suppose the leaf nodes of the tree depicted above are fixed size $c$,
and the interior vertices are fixed size $d$ ($d$ is probably thirty two or
sixty four bytes), and they are being physically stored in
memory or a file in sequence.
Let us suppose the leaf nodes are stored with the interior vertices
and are sequentially numbered.
Then the location of leaf node $n$ begins at
$n\times c+\big(n-\operatorname{popcount}(n)\big)\times d$
(which unfortunately lacks a simple
relationship to the bitstring of a vertex corresponding to a complete
field, which is the field that represents the meaning that we actually
care about).
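That offset formula, as a one line sketch:

```cpp
// Sketch only: byte offset of leaf n when leaves of fixed size c and interior
// vertices of fixed size d are stored in one sequence, with n - popcount(n)
// interior vertices preceding leaf n.
#include <bit>
#include <cstdint>

std::uint64_t LeafOffset(std::uint64_t n, std::uint64_t c, std::uint64_t d) {
    return n * c + (n - static_cast<unsigned>(std::popcount(n))) * d;
}
```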
# Blockchain
A Merkle-patricia block chain represents an immutable past and a constantly
changing present: an immutable and ever growing sequence of transactions,
and also the large and mutable present state that results from those
transactions, the database of unspent transaction outputs.
When we are assembling a new block, the records live in memory as native
format C++ objects. Upon a new block being finalized, they get written
to disk in key order, with implementation dependent offsets between
records and implementation dependent compression, which compression
likely reflects canonical form. Once written to disk, they are accessed
through native format records in memory, disk records being brought into
memory in native format, with the least recently loaded, or least
recently used, entry getting discarded. Even when we are
operating at a larger scale than Visa, a block representing five minutes
of transactions fits easily in memory.
Further, a patricia tree is a tree. But when we have the Merkle
patricia tree representing registered names organized by name, or the
Merkle-patricia tree representing as yet unspent transaction outputs, we
want its Merkle characteristic to represent a directed acyclic graph. If two
branches have the same hash, despite being at different positions and
depths in the tree, all their children will be identical. And we want to
take advantage of this, in that the block chain will be a directed acyclic
graph, each block being a tree representing the state of the system at
that block commitment, but that tree points back into previous block
commitments for those parts of the state of the system that have not
changed. So the hash of a node in such a tree will identify, probably
through an OID, a record of the block it was originally constructed
for, and its index in that tree.
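A minimal sketch of such an internal locator, assuming it is simply an oid plus an index; the names are illustrative assumptions, not part of the canonical form:

```cpp
// Sketch only: how a hash might be resolved internally to the physical record
// it was originally constructed for. Internal representation, never hashed.
#include <cstdint>

struct NodeLocator {
    std::uint64_t block_oid;  // oid of the block the node was originally constructed for
    std::uint64_t index;      // index of the node within that block's tree
};
```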
A Merkle-patricia directed acyclic graph, Merkle-patricia dag, is a
Merkle dag, like a git repository or the block chain, with the patricia
key representing the path of hashes, and acting as an index through that
chain of hashes to find the data that you want.
The key will thread through different computers under the control of
different people, thus providing a system of witness that the current
global consensus hash accurately reflects past global consensus hashes,
and that each entity's version of the past agrees with the version it
previously espoused.
This introduces some complications when a portion of the tree represents
a database table with more than one index.
Ethereum has a discussion and
definition of this
data structure.
Suppose, when the system is at scale, we have a thousand trillion entries
in the public, readily accessible, and massively replicated part of the
blockchain. (I intend that every man and his dog will also have a
sidechain, every individual, every business. The individual will
normally not have his side chain publicly available, but in the event of
a dispute, may make a portion of it visible, so that certain of his
payments, and the invoices they were payments for, become visible to
others.)
In that case, a new transaction output is typically going to require
forty thirty two byte hashes, taking up about two kilobytes in total on
any one peer. And a single person to person payment is typically going to
take ten transaction outputs or so, taking twenty kilobytes in total on
any one peer. And this is going to be massively replicated by a few
hundred peers, taking about four megabytes in total.
(A single transaction will typically be much larger than this, because
it will mingle several person to person payments.)
Right now you can get a system with sixty four terabytes of hard disk
and thirty two gigabytes of ram for under six thousand dollars, south of
a hundred dollars per terabyte, so storing everything forever is going
to cost about a twentieth of a cent per person to person payment. And a
single such machine will be good to hold the whole blockchain for the
first few trillion person to person payments, good enough to handle
PayPal volumes for a year.
“OK”, I hear you say. “And after the first few trillion transactions?”.
Well then, if we have a few trillion transactions a year, and only a few
hundred peers, then the clients of any one peer will be doing about ten
billion transactions a year. If he profits half a cent per transaction,
he is making about fifty million a year. He can buy a few more sixty
four terabyte computers every year.
The target peer machine we will write for will have thirty two gigabytes
of ram and sixty four terabytes of hard disk, but our software should
run fine on a small peer machine, four gigabytes of ram and two
terabytes of hard disk, until the crypto currency surpasses bitcoin.
# vertex identifiers
We need a canonical form for all data structures, the form which is
hashed, even if it is not convenient to use or manipulate the data in
that form on a particular machine with particular hardware and a
particular compiler.
A patricia tree representation of a field and record of fields does
not gracefully represent variable sized records.
If we represented the bitstring that corresponds to the block
number, the block height, as having a large number of leading
zero bits, so that it corresponds to a sixty three bit integer (we need
the additional low order bit for operations translating the bitstring
to its representation as a key field or oid field), a fixed field of sixty
four bits will do us fine for a trillion years or so.
But I have an aesthetic objection to representing things that are not
fixed sized as fixed sized.
Therefore I am inclined to represent bit strings as a count of bytes, a
byte string containing the zero padded bitstring, the bitstring being
byte aligned with the field boundary, and a count of the distance in
bits between the right edge of the bitstring and the right edge of
the field, that being the height of the interior vertex above the
leaf vertices containing the actual data that we are interested in, in
its representation as an sql table.
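A minimal sketch of that representation; the struct and field names are illustrative assumptions, not a wire format defined here:

```cpp
// Sketch only: the proposed canonical representation of a bitstring. The byte
// count is implicit in the byte string; names are illustrative assumptions.
#include <cstdint>
#include <vector>

struct CanonicalBitstring {
    std::vector<std::uint8_t> bytes;   // zero padded bitstring, byte aligned with the field boundary
    std::uint32_t bits_to_field_edge;  // distance in bits from the right edge of the bitstring to
                                       // the right edge of the field: the height of the interior
                                       // vertex above the leaf vertices
};
```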
We do not hash the leading zero bytes of a bitstring that is part of
an integer field because we do not know, and do not care, how
many zero bytes there are. A particular machine running a program
compiled by a particular compiler will represent that integer with a
particular sufficiently large machine word, but the hash cannot
depend on the word size of a particular machine.
A particular machine will represent the bitstring that is part of an
integer field with a particular number of leading zero bytes,
depending on its hardware and its compiler, but this cannot be
allowed to affect the representation on the wire or the value of the hash.
If one peer represents the block number as a thirty two bit value,
and another peer as a sixty four bit value, and thus thinks the
bitstring has four more leading zero bytes than the former peer
does, this should have no effect. They should both get the same
hashes, because the preimage of our hash should be independent
of the number of leading zero bytes.
For integer fields such as the block number, we would like to
represent integers in a form independent of the computer word
size, so we do not know the alignment from start of field for a
bitstring that is part of an integer field, only the size of the byte
aligned bitstring, and how far the end of the bitstring is from
the end of the integer field. Each particular peer executing the algorithm then applies as many leading zero bytes to the bitstring
as suits the way it represents integers.
The skip field of a link crossing a field boundary into an integer field
should not tell the machine following that link how many leading
zero bytes to add to the bitstring, but where the first non zero byte
of the bitstring is above the right edge of the integer, and the peer
interpreting that skip field will add as many leading zero bytes to
the bitstring as it finds handy for its hardware.
Some fields, notably text strings, do not have a definite right hand
boundary, but instead represent the boundary inline. In that case, we
represent the vertex depth below the start of field, rather than the
vertex height above the end of field.
We always start walking the vertexes representing an immutable
append only Merkle patricia tree knowing the bitstring, so their
preimages do not need to contain a vertex bitstring, nor do their
links need to add bits to the bitstring, because all the bits added
or subtracted are implicit in the choice of branch to take, so those
links do not contain representations of skip field bit string either.
However, when passing blocks around, we do need to communicate
the bitstring of a block, and when passing a hash path, we do need
to communicate the bitstring of the root vertex of the path, and
many of the hashes will be interior to a block, and thus their links
do need bitstrings for their skip fields, and we will need to sign
those messages, thus need to hash them, so we need a canonical
hash of a bitstring, which requires a canonical representation of a bitstring. A bitstring lives in a field, so the position of the bitstring
relative to the field boundary needs a canonical representation,
though when we are walking the tree, this information is usually
implicit, so does not inherently need to be present in the preimage of
the vertex. But, since we are putting byte aligned bitfields in byte
strings, we need to know where the bitstring of a skip field for a link
ends within the byte, which is most conveniently done by giving the
field alignment of the end of the bitstring within the field as part of
vertex skip fields - assuming there is a skip field, which there
frequently will not be.