---
title: Merkle-patricia Dac
# katex
---
# Definition
## Merkle-patricia Trees
A Merkle-patricia tree is a way of hashing a map, an associative array,
such that the map can have stuff added to it, or removed from it,
without having to rehash the entire map, and such that one can prove a
subset of the map, such as a single mapping, is part of the whole map,
without needing to have the whole map present to construct the
hash.
Its practical application is constructing a global consensus on what
public keys have the right to control what digital assets (such as
crypto currencies, and globally defined human readable names) and
proving that everyone who matters agrees on ownership.
If a large group of peers, each peer acting on behalf of a large group
of clients each of whom have rights to a large number of digital assets,
agree on what public keys are entitled to control what digital assets,
then presumably their clients also agree, or they would not use that
peer.
Thus, for example, we don't want the Certificate Authority to be able
to tell Bob that his public key is a public key whose corresponding
secret key is on his server, while at the same time telling Carol that
Bob's public key is a public key whose corresponding secret key is in
fact controlled by the secret police.
The Merkle-patricia tree not only allows peers to form a consensus on an
enormous body of data, it allows clients to efficiently verify that any
quite small piece of data, any datum, is in accord with that consensus.
## Patricia trees
A patricia tree is a way of structuring a potentially very large list of
bitstrings sorted by bitstring, with a binary point implied somewhere in the
bitstring, such that bitstrings can be added or deleted without re-sorting or
shifting the whole list of bitstrings.
A patricia tree defines a set of keys, the keys of its leaf nodes. Or perhaps
the keys _are_ its leaf nodes, all the information may well be in the strings
defined. The keys may be bounded on the left, but unbounded on the right,
for example strings, unbounded on the left but bounded on the right, for example
arbitrary precision integers, bounded both left and right, for example sixty
four bit integers, or unbounded on either side, for example binary arbitrary
precision floating point.
If unbounded left, then the edge pointing at the root node has to give its
height above the binary point. The bitstring of the root node is always the
empty string, but where is that empty string positioned in height with respect
to the binary point?
We cannot reference nodes by bitstring in the canonical form, because the
number of leading zeroes in a bitstring will change over time as the tree
gets deeper. We have to represent nodes by their height plus the bitstring
starting at the first non zero bit, or by the key, which is the bitstring
with a one bit and several zero bits appended, to align the significance of
bits in different bitstrings, in which case again only the first non zero bit
matters.
When we give a chain of vertexes, starting at the root vertex, the compact
representation of the location of a vertex in the tree is its vertical
position in the tree, plus the bitstring starting at the first nonzero bit. If
it is a tree of items with fixed right bound, items identified by their
integer sequence, then we give the height above the leaves, since this will
not change as the tree grows. If fixed left bound, for example names as utf8
strings, the depth from the root. If neither bound is fixed, a case we are
unlikely to have to deal with, the signed height or depth from some arbitrary
starting point which will never change as the tree grows. Thus we always
need to implicitly or explicitly define the bit alignment in the bitstring.
If a leaf is in a patricia tree representing values with fixed right bound,
for example oids, the usual case, then the bitstring of the leaf is its oid,
or its oid minus one, which does not need leading zeroes.
We need to be able to represent a bitstring containing all zeroes, thus if a
bitstring contains any ones, we have to represent that one. The height,
therefore, is the alignment of the right edge of the bitstring, hence we can
leave out leading zeroes in the bitstring, and indeed must leave them out of
the canonical form of a tree representing values with no fixed left bound,
such as indefinite precision integers, so that the canonical form does not
change when a parent node is placed on top of a former root node.
The key to a node, whether a vertex or a leaf, is the bitstring aligned by
padding it with a one bit, followed by as many zero bits as needed.
The total number of vertexes equals twice the number of leaves
minus one. Each parent node has as its key the bit string, a sequence of
bits not necessarily aligned on byte boundaries, that both its children
have in common. This creates the prefix problem for patricia trees
mapping keys that have no right bound, mapping variable length keys. We
cannot permit one key of the map to be the prefix of another key. If,
however, the key is self delimiting, as with null terminated strings,
no key can be the prefix of another key, and this tends to be the usual
way that variable length values are used as map keys. There is a wide
variety of too clever by half ways of dealing with prefix keys, but they
all involve messing up a rather elegant algorithm with considerable
complexity and surprising code paths in special cases and fencepost
cases. It is better just to not allow prefix keys, as for example by
having strings null terminated. If we really wanted to define arbitrary
bit strings as leaf keys of a patricia tree, which I doubt we will, it
would be better to encode them in self delimiting format. No string in
self delimiting format can be the prefix of another string.
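The null terminator trick can be sketched directly; `bit_key` here is a hypothetical helper rendering keys as toy text bitstrings, not part of any format defined in this document:

```python
def bit_key(s: bytes) -> str:
    """Null-terminate a byte string and render it as a bit string.

    The terminator guarantees that no key can be a proper prefix of
    another, because the 0x00 byte appears in every key exactly once,
    at the end."""
    assert b"\x00" not in s, "payload must not contain the terminator"
    terminated = s + b"\x00"
    return "".join(f"{byte:08b}" for byte in terminated)

# No null-terminated key is a proper prefix of another:
a, b = bit_key(b"cat"), bit_key(b"catalog")
assert not b.startswith(a) and not a.startswith(b)
```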
A Merkle-patricia dac is a patricia tree with binary radix (which is
the usual way patricia trees are implemented) where the hash of each
node depends on the hash and the skip of its two children, which means
that each node contains proof of the entire state of all its descendant
nodes.
The skip of a branch is the bit string that differentiates its bit
string from its parent, with the first such bit excluded as it is
implied by being a left or right branch. This is often the empty
bitstring, which when mapped to a byte string for hashing purposes, maps
to the empty byte string.
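As a sketch of the skip, with bitstrings as '0'/'1' text rather than the packed form used on the wire:

```python
def skip_bits(parent: str, child: str) -> str:
    """Return the skip of a child vertex: the bits of the child's
    bitstring beyond the parent's, minus the leading branch bit,
    which is implied by whether it is the left or right child.
    Bitstrings here are plain '0'/'1' text, a toy representation."""
    assert child.startswith(parent) and len(child) > len(parent)
    return child[len(parent) + 1:]

# A direct child has the empty skip:
assert skip_bits("10", "101") == ""
# A child three levels down skips two bits:
assert skip_bits("10", "10110") == "10"
```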
It would often be considerably faster and more efficient to hash the
full bitstring, rather than the skip, and that may sometimes be not
merely OK, but required, but often we want the hash to depend only on
the data, and be independent of the metadata, as when the leaf index is
an arbitrary precision integer representing the global order of a
transaction, that is going to be constructed at some later time and
determined by a different authority.
Most of the time we will be using the tree to synchronize two sets of
pending transactions, so though a count of the number of children of a vertex
or an edge is not logically part of a Merkle-patricia tree, it will make
synchronization considerably more efficient, since the peer that has the node
with fewer children wants information from the peer that has the node with
more children.
# Representation
The canonical form will not directly reflect the disk organization.
The canonical form of a sparse tree is that each vertex is represented by the
hash of its two children, and the bitstring of the offset of each child from
its parent, minus the leading bit of that bitstring. The root node, of
course, has an empty bitstring.
Often it will be more compact to transmit the child itself rather than the
hash, in reverse polish notation order, from which the hash can be generated.
To form the hash of a node, we need the hashes and relative bitstrings of its
children, but if we already have the children, identified by reverse polish
position in the stream or by their bitstrings relative to a common ancestor,
we don't need and should not represent the hashes, giving them implicitly
rather than explicitly.
Since we cannot hash a bitstring, only a bytestring, the bitstring will be
hashed in its representation as a bytecount represented by a variable
precision integer, followed by that many bytes, with the bitstring being
padded if needed to an integer number of bytes by adding a one bit followed
by as many zero bits as needed. Thus a two hundred fifty six bit bitstring
requires a count of thirty three, plus thirty three bytes, the last byte
being 0x80.
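That encoding can be sketched as follows, with a single length byte standing in for the variable precision integer count:

```python
def canonical_bytes(bits: str) -> bytes:
    """Encode a bit string for hashing: a byte count, then the bits
    padded to a whole number of bytes with a one bit followed by as
    many zero bits as needed."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    body = bytes(int(padded[i:i + 8], 2)
                 for i in range(0, len(padded), 8))
    # The real format uses a variable precision integer for the
    # count; one byte suffices for this sketch.
    return bytes([len(body)]) + body

# A 256 bit string pads to 33 bytes, the last byte being 0x80:
enc = canonical_bytes("1" * 256)
assert enc[0] == 33 and len(enc) == 34 and enc[-1] == 0x80
```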
In memory as a C++ object, the bitstring may conveniently be
represented by an integer of at least sixty four bits, with the bitstring bit
aligned so that the significant bits in one bitstring line up with bits of the
same significance in another bitstring, and padded right with a one bit
followed by as many zero bits as needed. In the canonical form, however, the
left edge of the bitstring vertex identifier is the left edge of the
bitstring, and the length of the bitstring is the depth of the vertex from
the root. The left edge of a relative bitstring, identifying a child is one
bit to the right of the bitstring identifier of its parent. The child's
bitstring vertex identifier is the parent bitstring vertex identifier, plus a
zero bit for the left child and a one bit for the right child, plus the bits
of the relative bitstring. The canonical hash of the parent is the hash of
its left child, plus the relative bitstring of its left child, and similarly
for its right child.
Conceptually and canonically, it is equivalent to a patricia tree where
the children of a node are identified by hashes rather than pointers.
The hashes are taken over the canonical form, and are unaffected by
location in memory and the representation, which is not necessarily
canonical. If the actual representation is in a database, it is likely
to be represented in a way that makes recursive sql statements work.
Since we in practice cannot find the thing referred to by its hash, any actual
representation of the canonical form must contain additional information
telling us where to find the data referred to, but this additional information
is likely to vary from one situation to the next, and is not canonical.
For this canonical form to work as a direct representation, we would need a
universal way of finding the pre-image of any hash, which would be costly, and
would deny us some useful cryptographic capabilities where a party reveals a
pre-image and a hidden part of the Merkle-patricia tree, typically when a
transaction goes bad, and he would then make public the conversation leading
to it. Also, the tree will eventually grow enormous, and have numerous side
chains attached to it, in which case only the party or parties operating the
side chain can reverse their pre-images.
But the forms actually used should be a representation of a Merkle
patricia tree with hashes and skip fields in place of pointers.
## Balanced binary trees of fixed height.
We will represent an immutable and ever growing data structure as a collection
of balanced binary trees, and a balanced binary tree of fixed height makes
much of the information in this representation redundant, which suggests that
it may be desirable to use a more efficient and direct canonical form to
ensure that the immutable append only data structure is canonically immutable
and append only.
The schere pointing at a balanced binary tree will say that it is a balanced
binary tree whose leaves are objects of a certain type (have a certain schema)
and give the height of the root, assuming that they all have the same schema.
If they have different schemas, then the leaves will be of type schere.
The patricia bit string for each vertex of the balanced binary tree is
implicitly given by its position within the tree, so we do not represent it in
the canonical form, though we may well represent it in the actual
representation.
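A minimal sketch of why the bitstring is redundant here: in a balanced binary tree of fixed height, the patricia bitstring of a vertex is just its index at its depth, written in binary.

```python
def implicit_bitstring(depth: int, index: int) -> str:
    """In a balanced binary tree of fixed height, the patricia
    bitstring of a vertex is its index at that depth written in
    binary, so it need not be stored in the canonical form."""
    assert 0 <= index < (1 << depth)
    return format(index, f"0{depth}b") if depth else ""

assert implicit_bitstring(0, 0) == ""      # the root
assert implicit_bitstring(3, 5) == "101"   # third level, sixth vertex
```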
## Hashing
Hashing depends on the schema: to hash the bitstream, one has to parse
it into fields and records by the schema, and distinguish between index
nodes and record nodes, which are hashed and represented differently, a
record node being self contained, and an index node depending on its
relationships.
In a dac, rather than a tree, an index node might be referenced by
multiple different entities, so in that case we want the hash to only
depend on the part of the key field that it governs, independent of the
parent part of its key field.
Further, a transaction is a group of records, and we want to represent a
transaction locally, so that its records are physically close together
in storage.
Which implies a transformation, that the canonical form, which knows
nothing about storage location, can have portions represented as a
position relative form, in which a group of records is kept in depth
first tree order, with the boundaries of the group having hashes linking
them to the outside, but internally, when converting back into canonical
form, we recalculate the hashes. For a given schema we might do this one
way in one context, or another way in another context, or have
subschemas.
Of course, tree order assumes we have a tree. In general, we have a dac,
not a tree, the most important case here being the tree of names, where
we are continually issuing new roots for the tree, but we don't want to
continually issue new leaves.
In the canonical form of the Merkle-patricia tree we act as if hashes
were reversible. Of course they are not, nor do we have a general
universal look up table for reversing them. Rather, you have to hit up a
server that can reverse the hashes you care about, which it may do by
looking up a ginormous hash table, or more likely do by having the
Merkle-patricia tree on disk or in memory in the ordinary patricia form
of links pointing at file relative or absolute locations on disk or in
memory.
# Blockchain
Of course we want more than this: a Merkle-patricia block chain,
meaning *an immutable past and a constantly changing present*.
Which represents an immutable and ever growing sequence of transactions,
and also a large and mutable present state of the present database that
is the result of those transactions, the database of unspent transaction
outputs.
When we are assembling a new block, the records live in memory as native
format C++ objects. Upon a new block being finalized, they get written
to disk in key order, with implementation dependent offsets between
records and implementation dependent compression, which compression
likely reflects canonical form. Once written to disk, they are accessed
by native format records in memory, which access by bringing disk
records into memory in native format, but the least recently loaded
entry, or least recently used entry, gets discarded. Even when we are
operating at larger scale than Visa, a block representing five minutes
of transactions fits easily in memory.
Further, a patricia tree is a tree. But when we have the Merkle-patricia
tree representing registered names organized by names, or the
Merkle-patricia tree representing as yet unspent transaction outputs, we
want its Merkle characteristic to represent a directed acyclic graph. If two
branches have the same hash, despite being at different positions and
depths in the tree, all their children will be identical. And we want to
take advantage of this in that the block chain will be a directed acyclic
graph, each block being a tree representing the state of the system at
that block commitment, but that tree points back into previous block
commitments for those parts of the state of the system that have not
changed. So the hash of a node in such a tree will identify, probably
through an OID, a record of the block it was originally constructed
for, and its index in that tree.
A Merkle-patricia directed acyclic graph, Merkle-patricia dac, is a
Merkle dac, like a git repository or the block chain, with the patricia
key representing the path of hashes, and acting as index through that
chain of hashes to find the data that you want. 
The key will thread through different computers under the control of
different people, thus providing a system of witness that the current
global consensus hash accurately reflects past global consensus hashes,
and that each entity's version of the past agrees with the version it
previously espoused.
This introduces some complications when a portion of the tree represents
a database table with more than one index. 
[Ethereum has a discussion and
definition](https://github.com/ethereum/wiki/wiki/Patricia-Tree) of this
data structure.
Suppose, when the system is at scale, we have a thousand trillion entries
in the public, readily accessible, and massively replicated part of the
blockchain. (I intend that every man and his dog will also have a
sidechain, every individual, every business. The individual will
normally not have his side chain publicly available, but in the event of
a dispute, may make a portion of it visible, so that certain of his
payments, and the invoices they were payments for, become visible to
others.)
In that case, a new transaction output is typically going to require
forty thirty two byte hashes, taking up about two kilobytes in total on
any one peer. And a single person to person payment is typically going to
take ten transaction outputs or so, taking twenty kilobytes in total on
any one peer. And this is going to be massively replicated by a few
hundred peers, taking about four megabytes in total.
(A single transaction will typically be much larger than this, because
it will mingle several person to person payments.)
Right now you can get a system with sixty four terabytes of hard disk and
thirty two gigabytes of ram for under six thousand dollars, south of a
hundred dollars per terabyte, so storing everything forever is going to cost
about a twentieth of a cent per person to person payment. And a single
such machine will be good to hold the whole blockchain for the first few
trillion person to person payments, good enough to handle paypal volumes
for a year.
“OK”, I hear you say. “And after the first few trillion transactions?”.
Well then, if we have a few trillion transactions a year, and only a few
hundred peers, then the clients of any one peer will be doing about ten
billion transactions a year. If he profits half a cent per transaction,
he is making about fifty million a year. He can buy a few more sixty
four terabyte computers every year.
The target peer machine we will write for will have thirty two gigabytes
of ram and sixty four terabytes of hard disk, but our software should
run fine on a small peer machine, four gigabytes of ram and two
terabytes of hard disk, until the crypto currency surpasses bitcoin.
Because we will employ fixed size transaction units, larger currency
amounts will be broken into tens, twenties, fifties, hundreds, two
hundreds, five hundreds, thousands, two thousands and so forth, and
because we will be using a blockchain in the form of a Merkle-patricia
dac, our transactions will take up several times as much space as a similar
bitcoin transaction, and currently bitcoin transactions take up several
hundred megabytes. But this is OK, because the Merkle-patricia dac gives
client wallets far more power than on the bitcoin system, so we can get
by with far fewer peer wallets and far more client wallets.
------------------------------------------------------------------------
# Packed Form
Assume we have an ordered sequence of records, as if the result of a
database query with the index fields first.
Suppose we have a potentially very large sequence of records, and we
want to generate the Merkle-patricia hash, so that we can generate an
efficient proof that one and only one record with a certain prefix
appears in this pile.
We want to convert from sequence of records form to patricia form.
If we look at the difference between each pair of adjacent records, we get
an index node, which is the position of the first bit at which they differ.
This gives a patricia tree expressed in infix order, a sequence of records
interleaved with a sequence of bit positions at which two records differ.
We want to convert infix order to postfix order, or equivalently, hash
them as if we already had them in postfix order.
Suppose we want to output self delimiting records, interleaved with
postfix indicators of the difference position. We want the least index
output last, so that we can do a second sequential pass, hashing each
record when we encounter a record and putting that hash on the stack,
and when we encounter an index, hashing two hashes from the stack with
the index, and putting the resulting hash on the stack.
So, we find the difference position between the current record and the
next, then we output the current record, and make the next record the
current record. If the difference position is less than the difference
position on top of the stack, we output difference positions from the
stack until the difference position on top of the stack is less than the
current difference position, since a greater difference position on the
stack means the node it represents is a child node of the current node.
We then put the current difference position on the stack, and repeat for
the next record.
A similar algorithm simply generates the hash as if we had the literal
nodes.
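The two passes above can be sketched in one, assuming toy record and index tags in place of the real schema driven hashing, and records already sorted by bitstring:

```python
from hashlib import sha256

def diff_pos(a: bytes, b: bytes) -> int:
    """Bit position at which two records first differ."""
    for i in range(min(len(a), len(b)) * 8):
        if (a[i // 8] >> (7 - i % 8)) & 1 != (b[i // 8] >> (7 - i % 8)) & 1:
            return i
    return min(len(a), len(b)) * 8

def fold(hashes, stack):
    """Pop one pending index node: hash its two children together
    with the difference position, in postfix fashion."""
    p = stack.pop()
    right, left = hashes.pop(), hashes.pop()
    hashes.append(sha256(b"I" + p.to_bytes(4, "big") + left + right).digest())

def postfix_hash(records):
    """One sequential pass over sorted records, producing the root
    hash without ever materializing the tree."""
    hashes = [sha256(b"R" + records[0]).digest()]
    stack = []                    # pending difference positions
    for prev, cur in zip(records, records[1:]):
        d = diff_pos(prev, cur)
        # Deeper pending index nodes (larger positions) are complete.
        while stack and stack[-1] > d:
            fold(hashes, stack)
        stack.append(d)
        hashes.append(sha256(b"R" + cur).digest())
    while stack:
        fold(hashes, stack)
    return hashes[0]              # the root hash
```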
A full path to a leaf node, proving that the leaf node is represented in
the tree, contains not the hashes of the things in the path, but the off
path hashes and off path keys.
An incomplete tree has the same data structure as a full tree, but with
missing nodes. A full path is a more compact representation of an
incomplete tree, and is treated as a compressed form of an incomplete
tree with a single leaf node present. You can regenerate an incomplete
tree from a full path, and a full path can be generated for any non
missing leaf node in an incomplete tree.
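Verifying a full path can be sketched as follows; the node layout hashed here (hash, skip, hash, skip) is a simplified stand-in for the canonical hashing rules described earlier:

```python
from hashlib import sha256

def verify_path(leaf_hash: bytes, path: list, root: bytes) -> bool:
    """Check a full path against a trusted root hash.

    `path` runs leaf to root; each entry is (side, sibling_hash,
    own_skip, sibling_skip), side being 0 when our node is the left
    child.  Only the off path hashes and off path skips are needed;
    the on path hashes are recomputed."""
    h = leaf_hash
    for side, sibling, own_skip, sib_skip in path:
        if side == 0:                     # we are the left child
            left, right = (h, own_skip), (sibling, sib_skip)
        else:                             # we are the right child
            left, right = (sibling, sib_skip), (h, own_skip)
        h = sha256(left[0] + left[1].encode()
                   + right[0] + right[1].encode()).digest()
    return h == root

# A two leaf tree: the path for leaf A is just its sibling B.
hA, hB = sha256(b"A").digest(), sha256(b"B").digest()
root = sha256(hA + hB).digest()
assert verify_path(hA, [(0, hB, "", "")], root)
```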
Rather than a chain of blocks, we have a Merkle-patricia dac of blocks,
where the index is the block number. This means that the state of any
block can be proved to be part of the global consensus with a proof of
length logarithmic in the total block number. Thus peers can provide
clients with short proofs, so that clients do not have to take
assertions by peers on trust.
We also have a small number of Merkle-patricia dacs representing the
state of the system at any given block time, for example a Merkle
patricia dac of unspent transaction outputs, and a trie linking globally
unique human readable names to probabilistically unique public keys.
These trees change each new block, though their state in past blocks is
immutable.  Each new block contains not the entire new Merkle-patricia
dac, which is apt to be enormous, but only those nodes that have
changed.  A new block contains the new roots of the new Merkle-patricia
dacs and their new descendants, which link to unchanged descendants in
past blocks.
Peers synchronize their state by sharing new information to form a new
block. They efficiently discover what they have in common, and what is
new, by sharing the root of the Merkle-patricia dac describing the new
block, and then give each other the new information, after the fashion
of usenet news. 
A new block is not just a list of new events, but it is generated from a
list of new events, which are themselves listed in a Merkle-patricia
dac. To compare nodes in the tree of new events, a peer sends its
neighbours an offset into the hash, and then, for each node, the leading
part of its key, a one bit flag to indicate if the node has children,
and a portion of its hash that shrinks with each level:
1. for the node itself, a hundred and twenty eight bits of hash,
2. for each of its two children, sixty bits of hash,
3. for each of the four grandchildren, twenty eight bits of hash,
4. for each of the eight great grandchildren, thirteen bits of hash,
5. for each of the sixteen great great grandchildren, six bits of hash,
6. for each of the thirty two great great great grandchildren, three
   bits of hash,
7. and for each of the sixty four great great great great
   grandchildren, two bits of hash.
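The level of detail tapers off quickly, so one round of comparison stays small; a back of the envelope check:

```python
# Hash bits sent per node at each depth of the comparison message:
# the node itself gets a hundred and twenty eight bits, its two
# children sixty each, and so on down to two bits apiece for the
# sixty four nodes six levels down.
bits_per_node = [128, 60, 28, 13, 6, 3, 2]
nodes = sum(1 << depth for depth in range(len(bits_per_node)))
hash_bits = sum((1 << depth) * bits
                for depth, bits in enumerate(bits_per_node))
assert nodes == 127          # vertexes covered by one message
assert hash_bits == 784      # under a hundred bytes of hash material
```

So a single message carries under a hundred bytes of hash material covering a hundred and twenty seven vertexes, before counting the keys and flags.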
This tells it which subtrees are definitely different, and which are
definitely new, and for each subtree definitely different, it sends more
comparison data for that node, and for each subtree definitely missing,
it sends that subtree. Once there are no more subtrees to be sent, it
repeats the process starting at the root once again.
When it has some information about which subtrees are definitely
missing, which are probably the same, and which are definitely
different, it then sends much the same, but for each one missing, sends
the full subtree, for each probably the same, skips it, and for each
definitely different, doubles the level of detail, repeating with a
different part of the hash, and twice as many bits.
So now we need more than a one bit flag. We need to distinguish between
the cases:
1. no children
2. sending a full item, a leaf node
3. skipping a child subtree because we are likely in agreement
4. dropping down to a lower level of detail, for example from thirteen
   bits of hash to six bits of hash. If we are down to two bits of
   hash, we are always going to leave out the children.
5. not dropping down to a lower level of detail, keeping the same
   number of bits in the hash.
The intent is to discover what parts of the tree we have agreement on,
and send an image of the tree with those parts skipped over. Rinse and
repeat. We annotate our model of the tree with the probability that a
subtree is identical. If the other guy sent us a leaf node, we know the
node is identical, and every hash fragment that agrees exponentially
increases the probability that a subtree is identical. If we recently
got a leaf node from its source, not from sharing, we know for sure that
the other guy does not have it, and send it to him.
Ok, that covers information gathering, but what about the final stages,
when we are going to throw out data that was late in coming, for the
sake of consensus?
A proposed final hash of all items to be in a block for a certain period
is announced. And now, the job is to get those items that are missing,
and tag those items that are not in the proposed final hash, and
exclude them from a version of the tree. So instead of “definitely not
present in the other guy's hash” meaning you send the guy your item, it
now means you exclude it, and see if that gets your root hash to agree.
A peer in good standing endorses the proposed final hash, the root of
the block, if it can get its trees to agree.
When building a block, the peers share these new events. When coming to
a consensus, the peers attempt to get agreement on the new events in the
block. But the block will also contain diffs on the Merkle-patricia dac
of unspent transaction outputs, and the Merkle-patricia dac of spent
transaction inputs. The peers need to maintain these trees so that
clients can see proof of consensus on the tries, so that a peer cannot
mislead a client, and a peer should only vote for a consensus if it can
generate the same root hash for the block, and thus has the same tries
describing the entire block chain and providing access to the block
chain for clients.
After agreement on the Merkle-patricia dac of all new items for the new
block, a peer generates the revisions of the other Merkle-patricia
dacs, and adds them, the generated items, to the block, and we should
get final agreement. If a peer gets final agreement and all block items
are valid, then it votes for the consensus.
The block contains the root of a Merkle-patricia dac that contains all
previous blocks (but not the current block) thus not so much a block
chain, as a block trie, which means that the proof of any fact about the
state of the block trie is reasonably short, of order log N hashes,
where N is the number of items in the block trie. 
# Node identifiers
We will call the thing that identifies a node a node infix order, and
the member of the subset that the patricia tree identifies a key -
because we are generally using it as a map key. Part of the map key is
part of the node infix order.
But the canonical form of the map key is a bitstring, which is what we are going to hash.
The node infix order is the representation of the bitstring with significant
bits in the different bitstrings aligned, padded on the right with a one bit
and as many zero bits as needed, and for a tree of quantities unbounded on the
left, padded with a one bit and as many zero bits as needed on the left.
In practice all integers have some finite bound, but one does not want
the computer word length to affect the data on the wire or the result of
the hash, so it is usually preferable to structure the data and algorithms
so that the actual left bound has no effect provided it is sufficiently
large, as if the integers had unbounded precision. We will not actually
need numbers larger than 2^64^ until Anno Domini ten thousand or so, but
when we do, it will have no effect on blockchain format or previous
hash results, hence for integers we will use a tree with right bound but
no left bound.
But a patricia tree is bit string oriented. So for integer indexes, in
order that we can ascertain which nodes correspond to the leaf nodes, we
need to have, associated with the root node, the length of the bit
string for the leaf node. The length needs to be a run time value, rather
than a compile time value. But for hashes, it can be a compile time value.
To map from bit strings to byte strings, and to have a bit string index
for leaf nodes and tree nodes, we either append 10000\... or 011111\...
to bitstring, to get a one to one map from byte strings to bit strings,
and from bit strings to integers.
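The 10000\... padding gives a one to one map, which a toy version makes easy to check:

```python
def bits_to_int(bits: str) -> int:
    """Append 1000... padding to a bit string to get an injective map
    into fixed width integers: distinct bit strings, including those
    that differ only in trailing zeroes, map to distinct values.
    Toy version using 8 bit words."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    return int(padded, 2)

# '010' and '0100' are distinct bit strings and stay distinct:
assert bits_to_int("010") != bits_to_int("0100")
# The empty bit string has a well defined image too:
assert bits_to_int("") == 0b10000000
```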
Compiler intrinsics are generally ffs ([find first
set](https://infogalactic.com/info/Find_first_set)) or ctz (count
trailing zeros). For Microsoft compilers use `_BitScanForward` &
`_BitScanReverse`. For GCC use `__builtin_ffs`, `__builtin_clz`, and
`__builtin_ctz`. So it is probably faster to represent a bit string as a
word string by appending 100000\... than 0111111\.... See [Microsoft
`__lzcnt()`](https://docs.microsoft.com/en-us/cpp/intrinsics/lzcnt16-lzcnt-lzcnt64?view=vs-2019)
and [gcc
`__builtin_clz()`.](https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html)
`_BitScanReverse` is portable between processors, and `lzcnt` is not, so
you need a runtime check at the start of the program to see if your code
can run.
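Without intrinsics, the same operation is a couple of arithmetic steps; a sketch of count trailing zeros, the operation that locates the 1000\... pad bit and hence the real length of a bit string packed into a word:

```python
def trailing_zero_count(x: int) -> int:
    """Portable count trailing zeros, the operation the compiler
    intrinsics above provide in a single instruction.  With 1000...
    padding this locates the pad bit."""
    assert x != 0, "an all-zero word has no set bit to find"
    # x & -x isolates the lowest set bit; its bit length gives its
    # position.
    return (x & -x).bit_length() - 1

assert trailing_zero_count(0b10000000) == 7
assert trailing_zero_count(0b0101_1000) == 3
```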
If the leaf nodes correspond to the integers 1 to N, where N is at most
m bits long, then the bit strings of the leaf nodes are m+1 bits,
because we have 2N-1 vertexes in the tree, and 2N edges.
# Sparse or sequential, complete or partial, Merkle-patricia trees.
## Sparse and complete
As, for example, a Merkle-patricia tree of a map that maps globally
unique human readable and writeable names to public keys.
A patricia tree with no right bound on its keys is necessarily sparse.
Thus, for example, in a patricia tree of strings terminated by a null
byte, the null byte ensures a gap between the address spaces governed
by two successive keys.
If a node in the tree has direct children (no skips), or the leaf nodes
are sequential and contiguous, then its hash is the hash of the hashes
of its children. If the tree is sparse, part of what we know is the gaps,
hence the parent node has to hash the skip links. If the tree is
sequential and contiguous, then the root node has to directly hash its
direct descendants, and also the size of the tree.
A block is a sparse tree, while a blockchain is a sequential tree of
sparse trees. The sparse tree in a block will itself contain sequential
trees, and the root hashes of numerous sequential trees.
In a sequential Merkle-patricia tree, the hash of the parent node is
simply the hash of the hashes of its two child nodes. But in a sparse
tree, we want two maps from keys to objects with the same sequence of
the same objects, but different key values, to have different hashes, so
the hash of a parent node has to be the hash of the hashes of its two child
nodes, plus the hash of the portion of each child's key that governs
the skip link to that child. The portion of the child's key that governs
the skip link is (level difference − 1) bits long.
But computers do not handle bit fields easily, and databases do not
handle them at all, plus, how do you hash a bit string, such as the leaf
indicator?
Any variable length field creates ambiguity in the hash, so that two
values could be hashed as the same stream of bytes. To avoid this
outcome, the canonical format for a bit string will be an integer in
stream format specifying the number of bytes, followed by the bit
string, followed by a zero bit, followed by enough one bits to fill to
the next byte boundary. If the representation of a bit string as an
integral number of bytes has some 0xff bytes at the end, it is not in
canonical format and gets truncated before hashing till it no longer has
any 0xff trailing bytes.
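The pad-with-a-zero-then-ones step can be sketched as follows (the function name is an assumption; the stream format length prefix is left out):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Canonical byte encoding described above: the bit string, then a 0 bit,
// then 1 bits out to the byte boundary. Bits are packed most significant
// bit first within each byte.
std::vector<std::uint8_t> canonical_bytes(const std::vector<bool>& bits) {
    std::vector<bool> padded = bits;
    padded.push_back(false);            // terminating 0 bit
    while (padded.size() % 8 != 0)
        padded.push_back(true);         // 1 bits fill out the last byte
    std::vector<std::uint8_t> out(padded.size() / 8, 0);
    for (std::size_t i = 0; i < padded.size(); ++i)
        if (padded[i])
            out[i / 8] |= std::uint8_t(0x80u >> (i % 8));
    return out;
}
```

Because the padding always contributes exactly one 0 bit, the last byte of a canonical encoding can never be 0xff, which is what makes the truncation rule above decodable.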
If a bit string has a start position that is not necessarily byte
aligned, and is known from context, we left pad it with zeroes. If its
start position is not known from context, we provide the starting bit
position as an integer in stream format. At this point in the code we
are deep in the bitbashing weeds, and are no longer worried about
passing the bit string around as a regular byte string.
Assuming no prefix problem, one way or another the index of a
node can be two fields: the bit string, and the number of bits in the
bit string. But since we already have to represent the number of bytes
in the field representing the bit string, we might use the 0111\...
canonical format trick, so that we can use the more familiar, standard,
and convenient infix order of byte strings, in which case we will have to
pad the map key to form the node infix order.

However we do this, it is an implementation detail that should not
affect the canonical form or the root hash, and the appended 0111\...
form simplifies fencepost problems in interpreting what is on the wire.
Otherwise we are always going to be bothered by distinguishing the bit
string 0101000 from the bit string 0101. One less `if` to screw up.
Since a sequential Merkle-patricia tree always maps the integers from
zero to n-1 to the objects, hashing link information is redundant; we do
not actually need any link information. It is sufficient that the root
hash defines the objects and the sequence.
## Sequential and complete
The tree bears a simple and natural relationship to a linear vector of
leaves and a linear vector, one smaller, of nodes.
Each node appears with a position in the linear vector one less than the
position of the leaf that required it to be added to the tree.
Size eight Merkle-patricia tree by:
- patricia bitstring
- right padded patricia bitstring (key)
::: {style="text-align: center;"}
“”\
1000\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
“0”\
0100\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
“00”\
0010\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
"000"\
0001\
:::
::: {style="float: right; text-align: center; width: 45%;"}
"001"\
0011\
:::
:::
::: {style="float: right; text-align: center; width: 45%;"}
"01"\
0110\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
"010"\
0101\
:::
::: {style="float: right; text-align: center; width: 45%;"}
"011"\
0111\
:::
:::
:::
::: {style="float: right; text-align: center; width: 45%;"}
"1"\
1100\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
"10"\
1010\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
"100"\
1001\
:::
::: {style="float: right; text-align: center; width: 45%;"}
"101"\
1011\
:::
:::
::: {style="float: right; text-align: center; width: 45%;"}
"11"\
1110\
::: {.clear}
:::
::: {style="float: left; text-align: center; width: 45%;"}
"110"\
1101\
:::
::: {style="float: right; text-align: center; width: 45%;"}
"111"\
1111\
:::
:::
:::
:::
::: {.clear}
:::
\
\
\
\
\
\
\
\
\
\
We want to collapse the tree into a linear
list, so that we can find the correct node without walking one bit at a
time through a binary tree, both in order to represent relatively small
blocks of hashes, and also to represent the top of very large trees.
We extend each bit string to a fixed and uniform size by appending a one
bit, followed by as many zero bits as are necessary to fill to the standard
size, so that we can use uniform length bit strings, instead of variable sized
bit strings, so that we can use them directly to access an array in memory,
or OIDs in a database.
The resulting padded bit strings (keys) are in infix order. But for an
immutable append only file or database, we want postfix order, which is
harder. And we don't want to be restricted to only having a power of two
objects. We want to have an arbitrary number of objects, and add an
arbitrary number of objects without changing the existing tree.
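The padded keys in the diagram above come out of a single shift; a minimal sketch (the name `padded_key` is my own):

```cpp
#include <cstdint>

// Right-pad a patricia bit string of `len` bits to a fixed `width` bits by
// appending a 1 bit and then 0 bits, matching the diagram above
// ("" -> 1000, "01" -> 0110 for width 4). Assumes len < width.
std::uint64_t padded_key(std::uint64_t bits, unsigned len, unsigned width) {
    return ((bits << 1) | 1u) << (width - len - 1);
}
```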
A binary postfix tree, power of two, is going to look like the diagram further below.
For an append only structure, the position of a leaf node is the number of prior
leaf vertexes, plus the number of prior internal vertexes, and for an SQL append
only database, the oid of a leaf node is the number of prior leaf nodes, plus one.
So a leaf oid is:
$$\displaystyle\frac{key}{2}+1$$
where $key$ is the patricia bitstring right padded to a fixed size by a one
bit, followed by as many zero bits as needed.
Let $\displaystyle{C_{key}}$ be the count of bits in the key (which in C++
is `std::bitset::count()`; C++ provided no portable access to the cool
intrinsic assembly instructions before C++20's `std::popcount`).
The number of internal vertices prior to a leaf is
$$\displaystyle{\frac{key}{2}+1-C_{key}}$$
So the leaf position in postfix order is:
$$\displaystyle{size_{leaf}*\frac{key}{2}+size_{vertex}*({\frac{key}{2}+1-C_{key}})}$$
Let the height of a vertex be $h_{key}$, the number of trailing zeroes in the key.
Then the one based postfix vertex oid, supposing vertexes are in a separate data structure, is
$$\displaystyle{\frac{key}{2}+2^{h_{key}-1}-C_{key}}$$
or equivalently
$$\displaystyle{\frac{(key-1)\mathbin{|}key}{2}+1-C_{key}}$$
since $(key-1)\mathbin{|}key$ merely fills the trailing zeroes of the key with
ones. The postfix byte position is then $size_{vertex}$ times the oid less one.
Given the vertex oid and the leaf oid, the absolute position in the direct immutable append only file is easy to calculate.
Everyone seems to wind up using regular [C bit twiddling hacks], because
hardware intrinsics are erratically available, and because the efficiency
improvement of hardware intrinsics is seldom worth the thought.
[C bit twiddling hacks]:
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
"Bit Twiddling Hacks"
[uint64_t bitcount(uint64_t c)](../bit_hacks.h)
To do the reverse operation, finding the key (the padded patricia index)
from the postfix position, make the starting guess that the $C_{key}$ adjustment
was zero, find the corresponding patricia key, and then walk the tree from
where you guessed you were to where you should be: find
the predicted postfix position of your guess, find the order of the highest
order bit where they differ, and walk the postfix position and padded
patricia key (infix position) in parallel.
## Adding to a sequential and complete Merkle-patricia tree
Well that solves the problem of a postfix tree, but how do we apply this to
solve the problem of the number of items not being a power of two?
### A sequential append only collection of postfix binary trees
<svg
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
width="29em" height="17em"
viewBox="8 209 164 102"
style="background-color:ivory" stroke-width=".6" stroke-linecap="round">
<g font-family="'Times New Roman'" font-size="6" font-weight="400" fill-rule="evenodd">
<g id="height_3_tree" fill="none" >
<path stroke="#b00000"
d="
M71.36 234.686s2.145-.873 3.102 0c1.426 1.303 14.645 21.829 16.933 23.136 1.302.745 4.496.45 5-2.3
M145.916 220c0-.93.124-.992.992-1.364.869-.373 2.42-.373 3.04.558.62.93-2.852-4.94 18.607 38.394.715 1.443 2.348 1.186 4-2
M147.218 218.5c1.303-.124 1.675.062 2.11.93.434.868.558 3.846.558 3.846-.25 2.496.31 3.597-1.365 19.166-1.675 15.568-1.54 21.825-.744 24.872.744 3.853 3.0 2.853 5.2 .295
M71.36 234.686c2.42-.434 2.916-.93 6.079-.186 3.163.745 4.466 1.551 12.715 5.52 8.25 3.97 37.774 3.66 41.31 2.606C134.999 241.57 136 240 137 239
M71.36 234.686s1.551-.558 2.171.186c.62.745 2.481 4.528 1.8 10.545-.683 6.016-2.854 20.719-2.854 22.577 0 2.171 1.116 2.482 2.543 1.8C76.447 269.11 76 268 77 264"/>
<path stroke="#80e080"
d="
M10 253c-3.536 5.335-4.29 7.765-4.466 12.095-.139 3.405.677 6 3.3 2.094
M18 245c1.799-1.55 20.903-7.319 35.603-10.855 14.7-3.535 18.918-7.69 52.35-8.621 33.432-.93 36.037-.869 38-3
M21 248c2.17-.372 6.744-1.343 10.792-1.736 5.858-.57 18.925-2.228 29-4.094 3.36-.728 7.673-1.492 7.618-2.812
M16.963 253.232c-.0 0 3 4 5.2 4.2 1.7 .7 4.093 .6 7-2.3"/>
<path stroke="#000000"
d="
M70.077 236c-.162-6.288 74.008-3.04 76-12
m-7.417 12c.161-6.127 9.306-6.425 8.5-12"/>
<g id="height_2_tree">
<path stroke="#b00000"
d="
M31.3 250.83c1.054-.372 2.046-.744 2.357.31.31 1.055-.571 11.044.682 15.569C36.013 272.766 38 269.675 39 266"/>
<path stroke="#000000"
d="
M29.767 252c1.774-4 38.858-5.904 39.18-12
M60 252c0-4.875 10.41-6.871 11.205-12"/>
<g id="height_1_tree">
<path
style="stroke:#B00000"
d="m 10,264.1 c 1,-2.2 3.2,-2 3.85,-0.4 C 15,267 20,273 21,266"
id="prev_leaf_link"/>
<path stroke="#000"
d="M10.09 264c0-4 18.868-.062 19.174-8.062M21.866 264c0-3.008 8.893-1.544 9.513-8"/>
<g id="leaf_vertex" >
<g style="stroke:#000000;">
<path
d="M 11.7,265 8,271
M 11.7,265 9.5,271
M 11.7,265 11,271"
id="path1024" />
</g>
<rect id="merkle_vertex" width="4" height="4" x="8" y="264" fill="#00f"/>
</g>
<use width="100%" height="100%" transform="translate(12)" xlink:href="#leaf_vertex"/>
<use width="100%" height="100%" transform="translate(20 -12)" xlink:href="#merkle_vertex"/>
</g>
<use width="100%" height="100%" transform="translate(30)" xlink:href="#height_1_tree"/>
<use width="100%" height="100%" transform="translate(60 -28)" xlink:href="#merkle_vertex"/>
</g>
<g width="100%" height="100%" >
<use transform="translate(68)" xlink:href="#height_2_tree"/>
<use transform="translate(136 -44)" xlink:href="#merkle_vertex"/>
<use transform="translate(144)" xlink:href="#height_1_tree"/>
</g>
</g>
<g id="blockchain_id" >
<ellipse cx="14" cy="249" fill="#80e080" rx="8" ry="5"/>
<text>
<tspan x="11.08" y="251.265">id</tspan>
</text>
</g>
<rect width="168" height=".4" x="8" y="276" fill="#000"/>
<text y="278">
<tspan dy="8" x="6" >Immutable append only file as a Merkle chain</tspan>
</text>
<use transform="translate(0,50)" xlink:href="#blockchain_id"/>
<path
style="fill:none;stroke:#80e080;"
d="m 18,297 c 4,-6 4,-6 5.6,3 C 25,305 28,304 28.5,300"/>
<g id="4_leaf_links">
<g id="2_leaf_links">
<g id="leaf_link">
<path
style="fill:none;stroke:#000000;"
d="m 29,299 c 4,-6 4,-6 5.6,3 C 35,305 38,304 38.5,300"/>
<use transform="translate(20,33)" xlink:href="#leaf_vertex"/>
</g>
<use transform="translate(10,0)" xlink:href="#leaf_link"/>
</g>
<use transform="translate(20,0)" xlink:href="#2_leaf_links"/>
</g>
<use transform="translate(40,0)" xlink:href="#4_leaf_links"/>
<use transform="translate(80,0)" xlink:href="#4_leaf_links"/>
<use transform="translate(140,33)" xlink:href="#leaf_vertex"/>
<text y="208">
<tspan dy="8" x="6" >Immutable append only file as a collection of</tspan>
<tspan dy="8" x="6" >balanced binary Merkle trees</tspan>
<tspan dy="8" x="6" >in postfix order</tspan>
</text>
</g>
</svg>
The superstructure of balanced binary Merkle trees allows us to verify any
part of it with only $O(log)$ hashes, and thus to verify that one version of
this data structure that one party is using is a later version of the same data
structure that another party is using.
This reduces the amount of trust that clients have to place in peers. When
the blockchain gets very large there will be rather few peers and a great
many clients, thus there will be a risk that the peers will plot together to
bugger the clients. This structure enables a client to verify that any part of
the blockchain is what his peer says it is, and thus avoids the risk that peers
may tell different clients different accounts of the consensus. Two clients
can quickly verify that they are on the same total order and total set of
transactions.
Edges of the graph are represented by hashes, and thus can only be travelled
from right to left. Vertices are represented by their hash, and in their
canonical form contain child hashes and their full padded Merkle patricia
key within a tree as an arbitrary precision integer, and thus their implicit
postfix position within a tree. These identities provide implicit edges that
can be traversed in any direction with respect to a given total order, while
hash edges are agnostic of total order, and are what we construct the
consensus about total order from.
The bottom most part of the structure consists of data structures that do
not have a canonical order. But when we are figuring out how to order
them, we have to construct vertices on top of them that do have a
canonical order, where each vertex contains a hash commitment to a total
past in total order and a patricia key representing its position in the total
order.
When the chain becomes very big, sectors and disks will be failing all the
time, and we don't want such failures to bring everything to a screaming halt.
And, when the chain becomes very big, most people will be operating
clients, not peers, and they need to be able to ensure that the peers are not
lying to them.
If our initial tree has a size of zero, this is the same as creating
a sequential and complete Merkle-patricia tree.
Since the whole point of a Merkle tree is immutable entities, we seldom
want to insert, delete, or update anything but the right hand edge of a
sequential Merkle-patricia tree, and normally only want to insert on the
right hand edge.
Thus a sequential Merkle-patricia tree is not exactly a block chain,
since each block does not contain the hash of the previous block. If it
did, you would potentially have to receive and calculate a lot of hashes
of hashes to ascertain that block one thousand did indeed chain to
block five hundred. This structure means that clients can calculate the
validity of the block chain for those parts of it that contain
transactions that concern them, and know that everyone else doing
similar calculations is getting results that show the same consensus as
they are getting; thus calculating the validity of the block chain is
distributed to all clients, without all clients needing to deal with the
entire block chain.
If all the peers get together to screw over one client or a few clients,
that client is going to have cryptographic proof of misconduct. On
publishing that proof, large numbers of people are likely to blacklist
those peers, resulting in a fork in the blockchain. We could automate
this process, with everyone automatically disregarding the signatures of
peers for which a proof existed that they had changed the rules in a way
inconsistent with the rules implemented in a client, so that if nine
tenths of the peers change their software, and nine tenths of the
clients do not, we automatically get a fork with nine tenths of the
clients on one block chain with a tenth of the peers, and one tenth of
the clients on the blockchain with nine tenths of the peers.
This architecture allows a client peer arrangement where, to pervert the
blockchain, you have to synchronously pervert everyone's software
everywhere, or most people's software, whereas with bitcoin, a few big
miners can pervert the blockchain.
Bitcoin will fail because power over the blockchain lies with a few big
miners, and governments will eventually twist their arms, or themselves
get into the business of mining. This is inherent in the scaling
problems of bitcoin. Back when everyone was a miner, and everyone had the
whole blockchain on their machine, power was distributed in a way that
ensured that the blockchain was conducted according to consensus rules,
but as the blockchain gets bigger and bigger, fewer and fewer people
host the complete blockchain, and fewer and fewer people mine. So we are
back to the situation we wanted to escape, where a rather small number
of people have power over other people's money.
A client has partial trees for all his transactions in the blockchain,
and if all clients check their own particular part of the block chain,
the entire blockchain is checked.
Well, this is a great layout if we have a data structure that fits entirely
in memory, and it is a great layout if we have an enormous mutable file
and are appending some bits to it, and have log n chunks mapped into
memory leading to the part where we are appending stuff.

But if we are frequently constructing partial hash trees here and there,
the trouble is we have log n non local things to look up. If we are
putting a big Merkle-patricia tree in a database, better to infix order
by the size of the bit string, then the content of the bit string filled
out to the long word size with infix order bits, or if it does not fit
in a long word, as a blob. (Sqlite sorts nulls first, then integers,
then strings, then blobs, and if you write something as an integer and
read it as a blob, it gets converted to a string representing a decimal.)
Suppose we want a literally immutable append only file, representing a
sequential patricia Merkle tree, where we append each completed binary
tree?
Well in that case the high nodes have to follow each completed subtree,
and our postfix order would be:
| | | | | | | | |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
| | 00 | | 01 | | 10 | | 11 |
| | | | 0 | | | | 1 |
| | | | | | | | “” |
Thus, to construct the postfix order for the leaf position in the tree
using right side nodes, leaf `n` has postfix order
`(n<<1)-std::bitset<64>(n).count()`. Which is considerably more
complicated than the infix order for leaves with nodes uniformly
intermixed with them.
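As a check, the leaf formula written out (the function name is my own), using `std::bitset` as the document does:

```cpp
#include <bitset>
#include <cstdint>

// With right side nodes packed after each completed subtree, leaf n lands
// at postfix position 2n - popcount(n): 000 -> 0, 001 -> 1, 010 -> 3, ...
inline std::uint64_t leaf_postfix(std::uint64_t n) {
    return (n << 1) - std::bitset<64>(n).count();
}
```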
Further, the not-a-power-of-two case is more complicated. To figure out
which nodes at the right side are skipped, we have to have the infix order
of the node. Then we pack all the non skipped right hand nodes following
the last leaf node. There is no simple way to identify a skipped node
from the postfix order.
But the postfix order has the huge advantage for enormous data
structures that when you are constructing, access is sequential and
forwards only, and if you are making an ever larger file, the additions
are append only.
But this is essentially a plan to build our own database, presumably
with a second file containing a copy of only higher nodes, and a third
file containing a copy of higher nodes still, until one gets down to a
copy of the high nodes that fits entirely within memory. X64 has 64 byte
blocks in its top level cache, and can map seventy terabyte files
into its copious virtual address space. Four kilobyte blocks for disk access
tend to be fastest, but 64 kilobyte blocks are only marginally slower, though in
some algorithms they waste more memory. Microsoft recommends 64 kilobyte
disk blocks for servers. This suggests a structure in which every time
we have a new two byte word in the bitstring, we have a different offset
corresponding to a different area on disk and memory, a hierarchy of
files each 256 or 65536 times smaller than the other.
A radix 256 Merkle-patricia tree on top of a radix 2 Merkle-patricia
tree would be referencing sixteen kilobyte blocks, which sounds like it
is near the sweet spot, in which case the cache file for the equivalent
of the bitcoin blockchain will fit into a gigabyte of ram.
It is premature to think of designing this. After two years at bitcoin
volumes, our blockchain will be two hundred gigabytes, at which point we
might want to think of a custom format for sequential patricia trees as
a collection of immutable files, append only files which grow into
immutable files, and wal files that get idempotently accumulated into
the append only files.
For the first level nodes, the ones directly above the leaves, the
postfix order is 000🡒00010, 001🡒00101, 010🡒01001, 011🡒01100, 100🡒10001,
101🡒10100, 110🡒11000, 111🡒11011. We append two zeroes, subtract the
count of the bits of the bitstring, and add two.
Or equivalently, we take the infix order of first level node, subtract
the count of the bits of the infix order, and add two.
For the second level nodes 00🡒00110, 01🡒01101, 10🡒10101, 11🡒11100, we
append three zeroes, subtract the count of the bits, and add six to get
the postfix order.
For the third level nodes 0🡒01110, 1🡒11101, we append four zeroes, add
fourteen, and subtract the count of the bits to get the postfix order.
For the fourth level node, null🡒11110, we append five zeroes, add
thirty, and subtract the count of the bits to get the postfix
order.
The general formula appears to be: shift the bitstring left by
$level+1$, then add $2^{level+1}-2-count$.
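That general rule can be written out and checked against the worked values above; note that with leaves taken as level zero it also reduces to the leaf formula $2n - count(n)$. A minimal sketch (the name is my own):

```cpp
#include <bitset>
#include <cstdint>

// Postfix position of node `bits` at `level` (leaves are level 0):
// (bits << (level+1)) + 2^(level+1) - 2 - popcount(bits).
inline std::uint64_t node_postfix(std::uint64_t bits, unsigned level) {
    return (bits << (level + 1)) + (1ull << (level + 1)) - 2
           - std::bitset<64>(bits).count();
}
```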
Reversing the postfix order to get the bitstring and the level seems
rather hard. You have to truncate, then find `x: (a+x).count()=x`. And
since there is no clean and elegant way of finding `count()`, it is not
likely that there is a clean and elegant way of finding `x`. But it is
very easy, given the level, to find the parent, the children, and the
sibling, since these are at fixed offsets. Looks like when iterating
through structures in postfix order, you have to keep the level and the
bitstring implicitly or explicitly around, whereas with infix order
there is a clean and simple relationship between the infix order and
level and bitstring.
To reconstruct the bitstring from the postfix order, the fastest way is
probably to construct it one bit at a time by conceptually walking the
tree from the root until we match the postfix order, not recalculating
count every time, but incrementing it every time we move right in the
tree, and representing the level by the power of two that we will add to
the bitstring when we move right.
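The walk just described can be sketched as follows, reusing the forward formula at each step. Here `depth` is the level of the root (4 in the worked example above), the position is assumed valid, and the names are my own:

```cpp
#include <bitset>
#include <cstdint>
#include <utility>

// Forward formula: postfix position of node `bits` at `level`
// (leaves are level 0).
inline std::uint64_t node_postfix(std::uint64_t bits, unsigned level) {
    return (bits << (level + 1)) + (1ull << (level + 1)) - 2
           - std::bitset<64>(bits).count();
}

// Recover (bits, level) from a postfix position by walking down from the
// root. The left subtree occupies the postfix positions up to and
// including the left child's own position; otherwise descend right.
std::pair<std::uint64_t, unsigned>
from_postfix(std::uint64_t pos, unsigned depth) {
    std::uint64_t bits = 0;
    unsigned level = depth;
    while (node_postfix(bits, level) != pos) {
        --level;
        bits <<= 1;                           // tentatively go left
        if (pos > node_postfix(bits, level))
            bits |= 1;                        // past the left subtree: go right
    }
    return {bits, level};
}
```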
## Incomplete.
We take a list of hashes and their node infix orders, the offset for
sequential trees, and the bit strings for left trees, and stuff them
into a map mapping node ids to hashes, using a map that allows random and
sequential access. The representation of a sparse and incomplete tree is
similar to the representation of a sequential and incomplete vector. We
provide the nodes that allow the construction of the chain of hashes
from the object to be authenticated, but we do not provide their
children.
## Sparse.
In a sparse Merkle-patricia tree, we are not going to do bit bashing on
the key, because it is likely inconveniently large, and because we are
likely to be counting the height from the left. But we might do it
for code re-use reasons, where the keys are integers of moderate size.
It is customary to define the least significant bit as bit zero, which
is the convention I have followed in the description of sequential
patricia Merkle trees. So in a patricia tree with node height measured
from the right hand side, nodes have a height, which is zero for leaf
nodes, and variable for the root node. In a sparse tree, one is apt to
measure from the left, so nodes have a depth, not a height. The
depth of the root node is always zero, and leaves have a variable depth.

To make the code templatable, a patricia tree type will need a
member `static constexpr bool left_edge = true;` for trees with node
height measured from the left edge, and `false` for trees measured from
the right edge. The depth of bit three of a byte in position seven in an
array is `8*7+3`. The node depth is equal to the position of the bit
that selected that node. If a node is the parent of all keys with the
most significant bit set in the start byte of the key, it has depth one.
The standard algorithm for entering a new item assumes you are using the
patricia tree as the infix order, but databases and map files do not do
this. When we are big and successful, we will write a version of sqlite
that supports sharding and Merkle-patricia infix orders.
And because the tree is sparse, it is probably coming in somewhat non
sequentially, with updates in random locations, though block updates are
likely to be sequential.
Nonetheless, the analysis of sequential blocks suggests we will get more
locality of reference on disk and in memory if we put all the patricia
nodes, leaf and vertex, in one big table, with the key of a vertex node
locating it close to the leaf node that necessitated its creation,
though we will need the bit offset to be part of this table. The height
means we don't need keys of different levels to be guaranteed unique,
because when we are after a particular level, we use the height, likely
measured from the right hand side, to select the correct candidate in
the improbable event of a collision.
The bitstring of a node infix order is going, in practice, to be sorted on
whole bytes or whole words, rather than its exact bit length, resulting
in collisions; therefore we have to sort on the bitstring and height, or,
as in a sequential patricia tree, pad the bitstring with a
representation of the height. Which for long keys is going to be a bit
verbose, though hardly a show stopper. We could pad the bitstring with
the difference between the height and the height rounded to the nearest
number of whole words, but this would have the irritating result that
256 bit hashes become 257 bit hashes, unless we truncated to 255 bits,
which is harmless but nonstandard and apt to result in complications all
over the place. And we would still be looking at the height, if only to
know the number of bytes or words.
The analysis of sequential hash trees suggests that it is going to be
more efficient to have all nodes of all levels in the same table,
possibly including leaf nodes. We could store the complete keys of
parent and its children in a node, but this is rather redundant. If we
take the bit field offset from the left, rather than from the right as
in sequential hash trees, then we truncate the key, conceptually at the
bit indicated by the bit offset, and only have the parts of the keys of
the children that are beyond the bit offset in the node.
In a sparse tree, we are just looking up the key to find the thing
referred to, which is probably referenced by an oid, so it may well be a
pointless indirection to have leaf nodes in the same table as their
parent node, which already has both the key to the child and an oid to
the child. Maybe we just have a leaf flag that says “this oid is not an
oid to yet another internal node, but an oid to something else,
somewhere else”.
Two maps, one of them being the map that you are making a Merkle
patricia hash of, and one containing the nodes. The nodes
point to other nodes, or they themselves contain the leaf value, the
leaf value being the hash of the map key and object being hashed.
On the wire, we don't have a representation of the sparse tree. The
nodes get generated on the receiving end when we send the objects
themselves. But we do have representations of incomplete sparse trees,
which are just a list of nodes with their hash values and keys, but not
their child links, that you will need to construct the root node from a
subset of the objects.
In rare and exceptional cases, the key of the root node may not be the empty
bit string, so when specifying the root node of a sparse tree, we have
to specify its bit string. But when we send a partial tree, we assume
the recipient does not have, and does not intend to build, the complete
tree, and that he receives the root node by some special path that
authenticates it.
A node with children will be inherently different to a node without,
because it is going to need the data to construct the infix orders of
its children, which is a good argument for putting them in distinct
tables. If we put leaf hashes and vertex hashes in the same table, we
are massively violating normal form. On the other hand, Sqlite3 supports
this by having variant typing. For sparse trees, it is not usually a good idea
to put the children in the same table. For sequential trees, it is usually the
way to go.
## Merkle-patricia Tree of 256 bit values.
Typically these are going to be a tree of public keys and/or hashes.

Of course a database cannot handle infix order values that are not a
multiple of eight bits, and is likely to prefer a multiple of sixteen
bits, so we will make the key field of the infix order value the
prefix bits that all the leaves of a node have in common, with the bit
string 0 appended, rounded up to a whole number of bytes with additional
one bits.
The depth of a node may be determined by taking the number of bytes
times eight, and subtracting the bits from the last zero leftwards.
Leaf nodes will be special cased.
If leaf nodes are part of the node map, we need 257 bits, but we are
also going to need special case handling of leaf nodes, which is
effectively the 257th bit.
Since our infix order has two fields, it is efficient to reference nodes
by OID, rather than by key, though you frequently have to find a node by
its key field and its depth field. So each internal node will have its
depth, (or rather a value reversibly constructed from its depth) and its
key (or rather a value reversibly constructed from its key and its
depth), and the OIDs of its two children. For a leaf node, the child OID
fields have a different meaning.
To reconstruct the hash of a node, we look up the two children by OID,
and hash their hashes and the bit strings that differentiate them from
their parent and each other, or rather the bytes containing the bit
strings that differentiate them from their parent and each other. We
hash the hash and bit string of one node first, then the hash and bit
string of the next node, to avoid the possibility that one concatenated
pair of bit strings might equal another pair of concatenated bit
strings.
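The anti-ambiguity point above amounts to length-prefixing each field before hashing, so no two distinct pairs can serialize to the same byte stream. A minimal sketch (the helper names are my own; a real implementation would feed the result to a cryptographic hash rather than leave it as a string):

```cpp
#include <string>

// One length byte suffices for this sketch; real keys would use the
// stream format integer described earlier in this document.
std::string encode_field(const std::string& s) {
    return std::string(1, char(s.size())) + s;
}

// Serialize a (child hash, key fragment) pair unambiguously: because each
// field carries its own length, ("ab","c") and ("a","bc") encode
// differently, unlike plain concatenation.
std::string encode_pair(const std::string& a, const std::string& b) {
    return encode_field(a) + encode_field(b);
}
```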
If we are inserting a bunch of new map entries with the intent of
recalculating the new value of the root node once the insertions are
done, we sort them into a temporary table in order of their map key, and
after each insertion calculate the new values of the parent nodes up to
but excluding the parent node that is the parent of both the item
recently inserted, and the next item to be inserted. If no next item to
be inserted, up to and including the top.
Pretty sure that this guarantees all nodes get recalculated as necessary
and only as necessary, but we will need a debug check to make sure we have
not missed a fence post problem. Have a forced re-evaluation of the
entire tree to check for internal consistency.
When we are generating blocks on the block chain on an ordinary computer
with an ordinary internet connection (16mbps down, 3mbps up), and
blocks typically take 300 seconds each, that implies that blocks are
smaller than a gigabyte, so we can prepare blocks entirely in memory,
either as an in memory sqlite database, or in a custom format using
links rather than OIDs, and key values of the infix order in sixty four
bit integers rather than bytes. Once a block is committed to the global
blockchain, however, it has to acquire global oids, hence goes in a
different database, probably in a slightly different database format,
since all the oids for a given kind of item are sequential. But the hash of
a block is calculated in a way entirely independent of the oids that
will be assigned to it. Perhaps when it is committed, it gets capped by
additional data telling us what oids were assigned, but including the
oids when calculating the root hash of a block under construction, and
therefore suffering frequent inserts in random positions, would make it
too costly to insert new data.
Such information about the type and number of objects in the block, and
the block's position on the block chain, should not, on the DRY
principle, be part of the block or the block root hash, though it might
well be useful to include such information in the authentication block
(authentication blocks alternate with blocks containing actual updates
on the block chain).

When combining two proposed blocks, it is good to start by merging the
larger block into the smaller block, and, as a check that everything is
working as it should, to make sure that the counts are consistent, this
being a low cost check against glitches in calculating the hash. We
don't need to incorporate such a consistency check into the blockchain
itself until the authentication block, which attests to the previous
root and previous sequence number of the block chain, and thus
necessarily contains its own sequence number, unlike a data block.
The hash of a block should be independent of its oids, but we do not
want a glitch to result in the oids of a peer getting out of sync with
those of its peers, with that peer attempting to sail on unaware that it
is out of consensus; therefore the hash of an authentication block
should contain a hash that depends on the final authoritative oids of
all blocks in the preceding block chain, or on all oids affected by the
final block.
On the DRY principle, objects inside a block should not contain
information as to what block they are in, and a block should not contain
information as to what blockchain it is in nor what block number it is.
But a block should be accompanied by context information saying what
block number it is and what block chain it is in, and the root of the
blockchain should be accompanied by context information saying what
block chain it is and how many blocks are in it.
An authentication block attests to the recent previous block of the
block chain in isolation from context, and therefore links to its own
context; but in general, the meaning of a hash is given by the object
that hashes to it. And we somehow have to find or construct the object
that hashes to it. The object that hashes to it therefore needs to
contain information that implies its size and how it is to be hashed,
what it means. With a hash, we ad hoc provide context information to
hint where to find the data that generates the hash, but the chain to
more and more global meanings is on the inside of the information being
hashed, not on the outside. Every transaction provides the oids of the
unspent transaction outputs that it refers to and the signatures
authenticated by their public keys. When an object contains an unhashed
reference to data outside itself, it must hash the broadest context in
which that data occurs. Thus a transaction must contain or imply the oid
of a previous root hash of the blockchain containing all these
transaction outputs, and the hash of the transaction must hash that
root; thus each transaction authenticates, and is authenticated by, the
previous blockchain.
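A minimal sketch of that chaining rule: the transaction's hash commits to a recent blockchain root, so the oids it cites are bound to the context that defines them. The field layout, fixed-width encodings, and SHA-256 here are assumptions for illustration:

```python
import hashlib

def transaction_hash(prev_root: bytes, input_oids: list, payload: bytes) -> bytes:
    """Hash a transaction so that it commits to the blockchain root that
    gives its input oids their meaning.  The oids alone are just hints;
    hashing prev_root chains the transaction to the context defining them."""
    h = hashlib.sha256()
    h.update(prev_root)                            # root that defines the oids
    h.update(len(input_oids).to_bytes(4, "big"))   # count, for unambiguous framing
    for oid in input_oids:
        h.update(oid.to_bytes(8, "big"))           # oid of an unspent output
    h.update(payload)                              # signatures, amounts, etc.
    return h.digest()
```

Any peer that disagrees about the prior root will compute a different transaction hash, so the disagreement surfaces immediately instead of propagating.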
Repeating: where an object contains a reference to an object outside
itself, and the reference is not itself a hash, which is to say, when
the reference is an oid, then its hash has to chain to that object,
preferably by chaining to the largest object in which that object
occurs. The oid is merely a hint for finding data that chains into the
hash of the object, and the derivation of the correct hash depends on
correctly identifying the thing that the oid refers to, the data
structure into which the oid is a pointer. The oid could merely be the
eleventh hash in a sequential map of seventeen hashes, and to know what
the object means, you have to know, or correctly guess from context,
what it is. So for the hash of the object to uniquely determine the
meaning of the object, it would have to hash the root of the
Merkle-patricia tree of the map that defines the oid. Objects containing
oids have to chain to the context that gives meaning to their oids,
which context chains to the larger context, the largest context of them
all being a recent past root of the global blockchain, which provides
unique oids, global to the blockchain, for each type of oid that is
directly in the blockchain. And to the extent that the context providing
oids provides separate oid sequences for each type of object within it,
so that an oid can refer to multiple things of multiple types within it,
the type of the object, or other data within the object, has to imply
the type of the oid.
In short, if an object mapped by a Merkle-patricia tree references
another object, it should either reference it by hash, or by the hash of
a Merkle-patricia tree containing that object together with the key of
the referenced object within that tree, or the hash of the referencing
object should depend on the hash of the objects that it references, or
on the hash of the Merkle-patricia tree containing them, in order that
the hash of the referencing object, and thus the hash of the
Merkle-patricia tree containing it, uniquely identifies everything that
the object references.
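Referencing an object by "hash of the tree containing it, plus its key" works because the tree root commits to every leaf: verifying such a reference is a standard Merkle audit-path check. A sketch for a plain binary Merkle tree (the patricia variant would also fold the differentiating bit strings into each step, as described earlier; function name and path encoding are assumptions):

```python
import hashlib

def verify_membership(root: bytes, leaf: bytes, path: list) -> bool:
    """Check that `leaf` is committed to by `root`, given an audit path of
    (sibling_hash, side) pairs from leaf to root; side 'L' means the
    sibling sits on the left at that level."""
    h = leaf
    for sibling, side in path:
        pair = sibling + h if side == "L" else h + sibling
        h = hashlib.sha256(pair).digest()  # climb one level toward the root
    return h == root
```

This is what lets a client verify a single mapping against the consensus root without holding the whole map, as promised in the definition section.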
# Implementation of the specific Merkle-patricia trees we will need.
We have a sequential tree of attestation blocks containing proof of
consensus, each of which chains to the previous block, and each of which
contains the root hash of the tree as it was before this block was
added.
We have a sequential tree of transactions, with each attestation block
attesting to the state of the sequential tree of transactions as it was
at that time, so that each participant knows that they agree with the
consensus on transactions, and they agree on the specific sequence
number of each transaction. They also agree on the current state after
all transactions have been applied: they agree on the sparse tree of
unspent transaction outputs.
In order to generate that order, each full peer has to construct a
sparse tree ordered by hash code. When they reach agreement on the root
of that hash tree, they then proceed to agreement on the sequential list
of transactions, and the current state of unspent transaction outputs.
## Sparse tree of hashes
Because the bit string is going to be generally quite short, or else
exactly two hundred and fifty six bits, we treat this as a list of
bitstrings, with each list composed of lists of bitstrings, and each
list and sublist prefixed by a bitstring corresponding to their common
prefix. We don't try to save space by omitting the (typically short)
common part of the prefix, and we pad the bit string to a whole word
with the padding sequence 1000000\....
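The 1000… padding is self-delimiting: always append a single 1 bit, then 0 bits up to the word boundary, so the original length is recoverable by stripping from the right. A sketch over '0'/'1' character strings (the real code would of course operate on machine words; function names are hypothetical):

```python
def pad_bits(bits: str, word_bits: int = 64) -> str:
    """Pad a bit string (as '0'/'1' characters) to a whole word with the
    self-delimiting sequence 1000...; a string already ending on a word
    boundary gains a whole extra word, keeping the scheme reversible."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % word_bits)
    return padded

def unpad_bits(padded: str) -> str:
    """Strip the 1000... padding: drop trailing zeros, then the final 1."""
    return padded.rstrip("0")[:-1]
```

Note that the mandatory leading 1 of the padding is what distinguishes a bit string that happens to end in zeros from the padding itself.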
Each node contains a bit count and a count of its descendants. We
special case bit strings corresponding to whole words, and we special
case that special case when the bitstring is exactly two hundred and
fifty six bits.
We dont have a representation for the empty list, but we do have a
representation for a list consisting of a single item. So we build the
tree by starting with a single item, and then adding items to the tree.
The case where we have no items is handled separately and specially. We
never have an empty tree. A single item is itself a tree, and you can
add a tree to a tree. A tree is, of course, composed of trees.
Because we are not prefix compressing the items in the tree of hashes,
our skip links do not contain the skip data, and the hash of a tree with
more than one item does not hash the common prefix, but simply hashes
the hashes of the subtrees. The hash of a leaf node is, of course,
itself.
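So the hash rule for this in-memory tree is simply: a leaf, itself a 256-bit hash, is its own hash, and an interior node hashes the concatenated hashes of its subtrees, with no prefix material mixed in. A sketch, representing interior nodes as lists and leaves as raw digests (SHA-256 assumed):

```python
import hashlib

def tree_hash(node) -> bytes:
    """Hash of the sparse tree of hashes: a leaf (bytes) is its own hash;
    an interior node (list) hashes only its children's hashes in order,
    since the uncompressed items already carry their full prefixes."""
    if isinstance(node, bytes):       # leaf: a 256-bit hash, its own hash
        return node
    h = hashlib.sha256()
    for child in node:                # interior: fold in each subtree's hash
        h.update(tree_hash(child))
    return h.digest()
```

Because this tree lives only in memory and its root alone is recorded in the blockchain, the hash rule can stay this simple: nothing on disk ever has to reproduce it incrementally.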
The root of this tree is recorded in the blockchain, but we do not need
to store the tree itself in the blockchain. It lives only in memory. The
sequential tree of transactions that it gives rise to lives on disk.