---
title: Merkle-patricia Dac
# katex
---

# Definition

## Merkle-patricia Trees

A Merkle-patricia tree is a way of hashing a map, an associative
array, such that entries can be added to the map, or removed from it,
without having to rehash the entire map, and such that one can prove
that a subset of the map, such as a single mapping, is part of the
whole map without needing to have the whole map present to construct
the hash.

The need to have the entire blockchain present to validate the
current state of the blockchain, or any particular fact about the
blockchain, is a huge problem with existing blockchains as they
grow enormous.

In a Merkle dag, vertices are data structures and edges are hashes,
as if one followed a link by looking up the preimage of a hash.
Obviously this is seldom an efficient way of actually implementing
edges. In practice one is going to use a pointer or handle for
data structures in memory, and an oid for structures in a database,
but these are internal representations of the structure defined and
implied by the hashes. The canonical form has no pointers, no oids
and no handles. In communications between humans and machines, and
when defining algorithms and operations on the data structure, the
algorithm should be defined, and the data communicated, as if one
were actually using the hashes, rather than oids and pointers.

Its practical application is constructing a global consensus on which
public keys have the right to control which digital assets (such as
crypto currencies and globally defined human readable names), and
proving that everyone who matters agrees on ownership.

If a large group of peers, each peer acting on behalf of a large group
of clients, each of whom has rights to a large number of digital assets,
agree on which public keys are entitled to control which digital assets,
then presumably their clients also agree, or they would not use that
peer.

Thus, for example, we don’t want the Certificate Authority to be able
to tell Bob that his public key is a public key whose corresponding
secret key is on his server, while at the same time telling Carol that
Bob’s public key is a public key whose corresponding secret key is in
fact controlled by the secret police.

The Merkle-patricia tree not only allows peers to form a consensus on an
enormous body of data, it allows clients to efficiently verify that any
quite small piece of data, any datum, is in accord with that consensus.

## Patricia trees

A patricia tree is a way of structuring a potentially very large list of
bitstrings, sorted by bitstring, such that bitstrings can be added or
deleted without having to shift the whole list of bitstrings.

In practice, we are not interested in bitstrings, we are interested in
fields. We want to represent an sql table by a patricia tree, and do sql
operations on the tree as if it were a table.

But it is in fact a tree, and its interior vertices do not have complete
fields, they have bitstrings representing part of a field. If we have a
binary patricia tree representing a database table with $n$ entries, it
will have $n$ leaf vertices and $n-1$ internal vertices.

So, to map from a bitstring to a field representing the primary key
of an sql index, we append to the bitstring a `1` bit, followed by as
many `0` bits as are needed to bring it up to one bit past the right
boundary of the field, that is, to the width of the field plus one
additional bit.

If the bitstring is already at the right boundary, we append merely
the one additional `1` bit.

The additional bit is a flag indicating a final vertex, a leaf vertex of
the index: false (`0`) for interior vertices, true (`1`) for leaf
vertices of the index. So we now have a full field, plus a flag.

A bitstring represents the path through the Merkle-patricia tree to a
vertex, and we will, for consistency with sql database terminology,
call the bitstring padded to one bit past the field boundary the key,
the key being the sql field plus the one additional trailing bit, the
field boundary flag. (We are dealing with the tree representation of
an sql table, so we need to know whether we have finally reached the
actual record or are still walking through the index, and the field
boundary flag records which.)

To obtain the bitstring from the key, we remove the trailing `0` bits
and the last `1` bit. Which is to say, if the field boundary flag is
true, our bitstring is the field, and if the field boundary flag is
false our bitstring is the field with its last `1` bit and the
following `0` bits discarded.

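The mapping between bitstrings and keys can be sketched in C++ as follows. This is a minimal illustration rather than a definitive implementation: the names `Key`, `bitstring_to_key` and `key_to_bitstring` are hypothetical, bitstrings are written as text of `0` and `1` characters for readability, and the field is assumed to be narrower than sixty four bits. With a three bit field, the empty bitstring maps to field 4 with the flag false, and the bitstring `010` maps to field 2 with the flag true.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical representation of a key: the sql field plus the boundary flag.
struct Key {
    std::uint64_t field;  // the sql field, `width` bits wide
    bool is_leaf;         // the field boundary flag
};

// Append a 1 bit, then 0 bits, until the string is width + 1 bits long.
// The top `width` bits are the field and the final bit is the flag.
inline Key bitstring_to_key(const std::string& bits, unsigned width) {
    assert(width < 64 && bits.size() <= width);
    std::string padded = bits + '1';
    padded.append(width + 1 - padded.size(), '0');
    std::uint64_t value = 0;
    for (char c : padded) value = (value << 1) | (c == '1' ? 1u : 0u);
    return Key{value >> 1, (value & 1u) != 0};
}

// Inverse mapping: drop the trailing 0 bits and the last 1 bit.
// Assumes the key is valid, so it contains at least one 1 bit.
inline std::string key_to_bitstring(const Key& key, unsigned width) {
    std::uint64_t value = (key.field << 1) | (key.is_leaf ? 1u : 0u);
    std::string padded(width + 1, '0');
    for (unsigned i = 0; i <= width; ++i)
        if ((value >> (width - i)) & 1u) padded[i] = '1';
    return padded.substr(0, padded.rfind('1'));
}
```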

In a patricia tree each vertex is associated with a bitstring.

When you walk the tree to the left child of a vertex, you add a zero bit, plus the bits, if any, associated with that link, to obtain the bitstring of the left child, and when you walk to the right hand child, you add a one bit, plus the bits, if any, associated with that link.

This enables you, given the bitstring you start with and the bitstring
of the vertex you want to find, to find the path through the patricia
tree.

And, if it is a Merkle patricia tree, this enables you to produce a
short efficient proof that proves not only the presence of a certain
datum in an enormous pile of data, but also the absence of a datum.

Suppose we have a database table whose primary key is a three bit
integer, and it contains four records, with oids 2, 4, 5, and 6. The
big endian representations of those primary keys are 0b010, 0b100,
0b101, and 0b110.

The resulting patricia tree is:

<svg
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
width="29em" height="24em"
viewBox="0 -10 320 265"
style="background-color:#FF9" stroke-width="1.5"
stroke-linecap="round" >
<g font-family="'Times New Roman'" font-size="10" font-weight="400"
fill-rule="evenodd" fill="black" >
<path stroke="#000000" fill="none" d="
M 156,18 c10,60 -121,149 -111,209
M 156,18 c-8,55 70,20 60,69
c 8,55 -70,20 -63,70
M 273,227 c20,-30 -60,-100 -57,-140
M 197,227 c12,-35 -52,-35 -44,-70
c 2,35 -38,35 -32,70" />
<g font-weight="800" fill="#FFF">
<g id="link_bits" font-size="18">
<text>
<tspan y="74" x="134">10</tspan>
<tspan y="134" x="228">0</tspan></text>
</g>
</g>
<use font-weight="400" xlink:href="#link_bits"/>
<g id="rect">
<rect width="66" height="27" x="123" y="4" rx="5"
fill="#0FF" />
<text y="2">
<tspan dy="12" x="126">bitstring</tspan>
<tspan dy="12" x="126" >key</tspan></text>
</g>
<text y="2">
<tspan dy="12" x="172" >""</tspan>
<tspan dy="12" x="153" >4, false</tspan></text>
<use transform="translate(60 70)" xlink:href="#rect"/>
<text y="72">
<tspan dy="12" x="238" >1</tspan>
<tspan dy="12" x="215" >6, false</tspan></text>
<use transform="translate(-3 140)"
xlink:href="#rect"/>
<text y="142">
<tspan dy="12" x="170" >10</tspan>
<tspan dy="12" x="152" >5, false</tspan></text>
<g transform="translate(-43 50)">
<use transform="translate(-68 160)"
xlink:href="#rect"/>
<text y="162">
<tspan dy="12" x="98" >010</tspan>
<tspan dy="12" x="91" >2, true</tspan></text>
<use transform="translate(8 160)"
xlink:href="#rect"/>
<text y="162">
<tspan dy="12" x="174" >100</tspan>
<tspan dy="12" x="167" >4, true</tspan></text>
<use transform="translate(84 160)"
xlink:href="#rect"/>
<text y="162">
<tspan dy="12" x="250" >101</tspan>
<tspan dy="12" x="243" >5, true</tspan></text>
<use transform="translate(160 160)"
xlink:href="#rect"/>
<text y="162">
<tspan dy="12" x="326" >110</tspan>
<tspan dy="12" x="319" >6, true</tspan></text>
</g>
</g>
</svg>

On the rightmost path from the top, the path gains two $1$ bits
because it goes right both times, then a $0$ bit from the link, thus the
bitstring identity of the path and the vertex is $110$.

We call the bits that a link adds, in addition to the bit that results
from the choice to go left or right, the skip bits, because we are
skipping levels in the binary tree.

On the leftmost path from the top, the path gains one $0$ bit
because it goes left, then $10$ from the link, thus the
bitstring identity of the path and the vertex is $010$.

The two middle paths have identity purely from the left/right
choices, not receiving any additional bits from the links other than
the bit that comes from going left or right.

Each bitstring, thus each key field, identifies a vertex and a path
through the patricia tree.

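As an illustration, the rule for accumulating a vertex bitstring along a path can be written in a few lines of C++. The names `Step` and `walk` are invented for this sketch, and bitstrings are written as text of `0` and `1` characters. Starting from the empty bitstring, going right and then right again over a link with skip bits `0` yields `110`, and going left over a link with skip bits `10` yields `010`, matching the four record tree drawn above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One step of a walk: which child was chosen, and the skip bits on that link.
struct Step {
    bool go_right;
    std::string skip_bits;  // often empty
};

// Bitstring of the vertex reached from `start` by following `steps`.
inline std::string walk(std::string start, const std::vector<Step>& steps) {
    for (const Step& s : steps) {
        start += s.go_right ? '1' : '0';  // bit implied by the left/right choice
        start += s.skip_bits;             // bits contributed by the link itself
    }
    return start;
}
```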

We do not necessarily want to actually manipulate or represent
the bitstrings of vertices and skip fields as bitstrings. It is likely to
be a good deal more convenient to represent and manipulate keys, and to
represent the skip bits by the key of the target vertex.

Fields have meanings for the application using the Merkle patricia
dag, bitstrings lack meaning.

But to understand what a patricia tree is, and to manipulate it, our
actions have to be equivalent to an algorithm described in terms of
bitstrings. We use keys because computers manipulate bytes better
than bits, just as we use pointers because we don't want to look up
the preimages of hashes in a gigantic table of hashes. But a Merkle
tree algorithm must work as if we were looking up preimages by
their hash, and sometimes we will have to look up a preimage by its
hash, and a patricia tree algorithm must work as if we were
manipulating bitstrings, and sometimes we will have to manipulate
bitstrings.

The total number of vertices equals twice the number of leaves
minus one. Each parent node has as its identifier a sequence of
bits, not necessarily aligned on field boundaries, that both its
children have in common.

A Merkle-patricia dac is a patricia tree with binary radix (which is
the usual way patricia trees are implemented) where the hash of each
node depends on the hash and the skip of its two children, which means
that each node contains proof of the entire state of all its descendant
nodes.

The skip of a branch is the bit string that differentiates its bit
string from its parent, with the first such bit excluded as it is
implied by being a left or right branch. This is often the empty
bitstring, which when mapped to a byte string for hashing purposes, maps
to the empty byte string.

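A sketch of how the hash of an interior vertex might be computed from the hash and skip of each child. Everything specific here is an assumption made for illustration: the function names, the preimage layout (a tag byte, then a length prefixed skip and a hash for each child), writing skips as text, and above all the hash itself, which is a 64 bit FNV-1a stand-in that is not cryptographically secure. A real implementation would use a collision resistant hash and the canonical byte representation of the skip. Because each parent hash covers both children, a proof that one record belongs to the tree needs only the sibling hashes and skips along its path.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in hash (64 bit FNV-1a). NOT cryptographic, used only so that the
// sketch is self-contained.
inline std::uint64_t toy_hash(const std::string& preimage) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : preimage) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Big endian bytes of a hash, so that it can be embedded in a parent preimage.
inline std::string to_bytes(std::uint64_t h) {
    std::string out(8, '\0');
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<char>((h >> (56 - 8 * i)) & 0xff);
    return out;
}

// Hypothetical leaf preimage: a tag byte followed by the record.
inline std::uint64_t leaf_hash(const std::string& record) {
    return toy_hash("L" + record);
}

// Hypothetical interior preimage: a tag byte, then for each child the skip
// length, the skip (as text of '0'/'1'), and the hash of the child.
inline std::uint64_t interior_hash(const std::string& left_skip, std::uint64_t left,
                                   const std::string& right_skip, std::uint64_t right) {
    std::string pre = "I";
    pre += static_cast<char>(left_skip.size());
    pre += left_skip;
    pre += to_bytes(left);
    pre += static_cast<char>(right_skip.size());
    pre += right_skip;
    pre += to_bytes(right);
    return toy_hash(pre);
}
```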

It would often be considerably faster and more efficient to hash the
full bitstring, rather than the skip, and that may sometimes be not
merely OK, but required. But often we want the hash to depend only on
the data, and be independent of the metadata, as when the leaf index is
an arbitrary precision integer representing the global order of a
transaction, which is going to be constructed at some later time and
determined by a different authority.

Most of the time we will be using the tree to synchronize two
blocks of pending transactions, so though a count of the number of
children of a vertex or an edge is not logically part of a
Merkle-patricia tree, it will make synchronization considerably more
efficient, since the peer that has the block with fewer children wants
information from the peer that has the block with more children.

## A sequential append only collection of postfix binary trees

<svg
xmlns="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
width="29em" height="17em"
viewBox="0 186 220 129"
style="background-color:ivory" stroke-width="1"
stroke-linecap="round" >
<g font-family="'Times New Roman'" font-size="10"
font-weight="400" fill-rule="evenodd" fill="black" >
<path stroke="#0D0" fill="none" d="
M13,249 c-10,10 -11,30 -3,17
M13 249 c10,10 15,10.5 17,5
M13 249 c0,-16 30,-22 66,-22 s67,7 66,-5
M13 249 c30,-16 57,2 56,-11
"/>
<g id="blockchain_id" >
<ellipse cx="14" cy="249" fill="#00D000" rx="8" ry="5"/>
<text fill="black">
<tspan x="11.08" y="251.265">id</tspan>
</text>
</g>
<g id="balanced merkle tree" fill="none">
<g id="height_4_tree" >
<path stroke="#F00"
d="
M146,222 C146,217 152,217 156,219 S202,238.2
206,240 c4,2 8,3 8,-2
M146,222 C146,217 152,217 154,221 s4,8 12,28
c8,20 9,-3 7,5
M146,222
C146,217 151,217 152,222
q2,10 -2,33 t4,11
M0,0 c -1,10 1,27 6.5,14
"/>
<path stroke="#000"
d="
M146, 220 c4,-20 151,2 151.5,-12
m-7,12 c0,-6 9,-6 8.5,-12"
/>
<g id="height_3_tree">
<path stroke="#F00"
d="
M70,237 c5,-9 5,0 15,12 s11.5,7 14,3
M70,237 c5,-9 5,0 2.5,20 s5,13 5,9
"/>
<path stroke="#000"
d="
M70,237 c0,-8 10,-5 20,-5
c10,0 62,4 55.5,-10
m-7.5,16 c-2,-8 11,-6 9,-15
"/>
<g id="height_2_tree">
<path stroke="#F00"
d="
M29,254 c13,-18 -3,32 11,12
"/>
<path stroke="#000" d="
M30,254 c -4,-16 42,0 40,-16
"/>
<path id="uplink_1-2" stroke="#000" d="
M60 254 c -2,-10 12,-2 11,-16
"/>
<g id="height_1_tree">
<path stroke="#000"
d="
M10,266 c-4,-12 22,-1 20,-12
M22,266 c-2,-9 9,0 9,-12
"/>
<g id="leaf_vertex" >
<g style="stroke:#000;" stroke-width="0.6">
<path id="path1024"
d="
M 11.7,265 8,271
M 11.7,265 9.5,271
M 11.7,265 11,271
"/>
</g>
<rect id="merkle_vertex" width="4" height="4" x="8" y="264" fill="#00F"/>
</g><!-- end id="leaf vertex" -->
<use width="100%" height="100%" transform="translate(12)" xlink:href="#leaf_vertex"/>
<use width="100%" height="100%" transform="translate(20 -12)" xlink:href="#merkle_vertex"/>
</g><!-- end id="height_1_tree" -->
<use width="100%" height="100%" transform="translate(30)" xlink:href="#height_1_tree"/>
<use width="100%" height="100%" transform="translate(60 -28)" xlink:href="#merkle_vertex"/>
</g><!-- end id="height_2_tree" -->
<g width="100%" height="100%" >
<use transform="translate(68)" xlink:href="#height_2_tree"/>
<use transform="translate(136 -44)" xlink:href="#merkle_vertex"/>
</g>
</g><!-- end id="height_3_tree" -->
<use transform="translate(144)" xlink:href="#height_3_tree"/>
<use transform="translate(288 -60)" xlink:href="#merkle_vertex"/>
</g><!-- end id="height_4_tree" -->
</g> <!-- end id="balanced merkle tree" -->
<text y="188">
<tspan dy="12" x="6" >Immutable append only file as a collection of</tspan>
<tspan dy="12" x="6" >balanced binary Merkle trees</tspan>
<tspan dy="12" x="6" >in postfix order</tspan>
</text>
<g id="merkle_chain">
<use transform="translate(0,50)" xlink:href="#blockchain_id"/>
<path
style="fill:none;stroke:#00D000;"
d="m 18,297 c 4,-6 4,-6 5.6,3 C 25,305 28,304 28.5,300"/>
<g id="16_leaf_links">
<g id="8_leaf_links">
<g id="4_leaf_links">
<g id="2_leaf_links">
<g id="leaf_link">
<path
style="fill:none;stroke:#000;"
d="m 29,299 c 4,-6 4,-6 5.6,3 C 35,305 38,304 38.5,300"/>
<use transform="translate(20,33)" xlink:href="#leaf_vertex"/>
</g><!-- end id="leaf link" -->
<use transform="translate(10,0)"
xlink:href="#leaf_link"
/>
</g> <!-- end id="2 leaf links" -->
<use transform="translate(20,0)"
xlink:href="#2_leaf_links"
/>
</g> <!-- end id="4 leaf links" -->
<use transform="translate(40,0)"
xlink:href="#4_leaf_links"
/>
</g> <!-- end id="8 leaf links" -->
<use transform="translate(80,0)"
xlink:href="#8_leaf_links"
/>
</g> <!-- end id="16 leaf links" -->
<use transform="translate(160,0)"
xlink:href="#16_leaf_links"
/>
</g> <!-- end id="merkle chain" -->
<rect width="210" height=".4" x="8" y="276" fill="#000"/>
<text y="280">
<tspan dy="8" x="6" >Immutable append only file as a Merkle chain</tspan>
</text>
</g>
</svg>

This data structure means that instead of having one gigantic
proof, which takes weeks to evaluate, that the entire blockchain is
valid, you have an enormous number of small proofs that each
particular part of the blockchain is valid. This has two huge
advantages over the chain structure.

1. It protects clients against malicious peers, since any claim
    the peer makes about the total state of the blockchain can
    be proven with $O(\log_2 n)$ hashes.
1. If a block gets lost or corrupted, the peer can identify the one
    specific block that is the problem. At present, peers have to
    download again, or at least re-index, the entire blockchain far
    too often.

This is not a Merkle-patricia tree. This is a generalization of a Merkle
patricia dag to support immutability.

The intended usage is an immutable append only dag.

In a binary patricia tree each vertex has two links to other vertices,
one of which corresponds to appending a $0$ bit to the bitstring, and
one of which corresponds to appending a $1$ bit to the bitstring.

In this dag, vertices that have bit strings ending in a $0$ bit have a
third link, which links to a vertex whose bit string is truncated back
to the previous $0$ bit, a shorter bitstring.

This enables one to reach any previous vertex through a chain of
hashes, and thus means that each new item in sequence is a hash of
all previous data in the tree.

The superstructure of balanced binary Merkle trees allows us to
verify any part of it with only $O(\log n)$ hashes, and thus to verify
that one version of this data structure that one party is using is a
later version of the same data structure that another party is using.

This reduces the amount of trust that clients have to place in peers.
When the blockchain gets very large there will be rather few peers
and a great many clients, thus there will be a risk that the peers will
plot together to bugger the clients. This structure enables a client
to verify that any part of the blockchain is what his peer says it is,
and thus avoids the risk that a peer may tell different clients
different accounts of the consensus. Two clients can quickly verify
that they are on the same total order and total set of transactions,
and that any item that matters to them is part of this same total order
and total set.

When the chain becomes very big, sectors and disks will be failing
all the time, and we don't want such failures to bring everything to a
screaming halt. At present, such failures far too often force you to
reindex the blockchain and redownload a large part of it, and this
happens more and more as the blockchain becomes enormous.

And, when the chain becomes very big, most people will be
operating clients, not peers, and they need to be able to ensure
that the peers are not lying to them.

### storage

We would like to represent an immutable append only data
structure by append only files, and by sql tables with sequential and
ever growing oids.

When we defined the key for a Merkle patricia tree, the key
definition gave us the parent node with a key field in the middle of
its children, infix order.

For this dag, we would like to define an oid field so that the oid
field of a parent follows the oid fields of its children, postfix order.

Let us suppose the leaf nodes of the tree depicted above are of fixed size $c$, and the interior vertices are of fixed size $d$ ($d$ is probably thirty two or sixty four bytes), and that they are being physically stored in
memory or a file in sequence.

Let us suppose the leaf nodes are stored with the interior vertices
and are sequentially numbered.

Then leaf node $n$, counting from zero, begins at location $n\times c+\big(n-\operatorname{popcount}(n)\big)\times d$, where $\operatorname{popcount}$ is `std::popcount`, the number of $1$ bits in $n$. (This unfortunately lacks a simple
relationship to the bitstring of a vertex corresponding to a complete
field, which is the field that represents the meaning that we actually
care about.)

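The location formula can be checked against a brute force layout. In this sketch leaves are counted from zero, so the first $n$ leaves form one perfect binary tree for each $1$ bit of $n$, and those trees contain $n-\operatorname{popcount}(n)$ interior vertices between them. The function names are hypothetical, and `std::bitset::count` is used in place of `std::popcount` only so that the code does not require C++20.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Number of 1 bits; gives the same result as C++20 std::popcount.
inline std::uint64_t popcount64(std::uint64_t n) {
    return std::bitset<64>(n).count();
}

// Byte offset of leaf n (counting from zero): n leaves of size c precede it,
// plus the n - popcount(n) interior vertices of size d already written.
inline std::uint64_t leaf_offset(std::uint64_t n, std::uint64_t c, std::uint64_t d) {
    return n * c + (n - popcount64(n)) * d;
}

// Brute force check: write leaves one at a time, and write each interior
// vertex as soon as both of its children have been written (postfix order).
inline std::vector<std::uint64_t> simulate_leaf_offsets(std::uint64_t leaves,
                                                        std::uint64_t c,
                                                        std::uint64_t d) {
    std::vector<std::uint64_t> offsets;
    std::uint64_t position = 0;
    for (std::uint64_t n = 0; n < leaves; ++n) {
        offsets.push_back(position);
        position += c;
        // Leaf n completes one subtree for each trailing 1 bit of n.
        for (std::uint64_t m = n; m & 1u; m >>= 1) position += d;
    }
    return offsets;
}
```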

We can calculate the location of an interior vertex from the number
of the largest numbered leaf node that it could be a parent of:\
To find the oid of a vertex accessed as an sql table, pad its bitstring
out to the field width plus one with $1$ bits (equivalent to
subtracting one from the key and oring the result with the key), then
subtract the `std::popcount` of the bitstring. The result is the
sequential and always incrementing oid, such that the oid of a parent
is always one greater than the oid of its right child.

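The same calculation in C++, working directly from the key. This is a sketch under the key definition given earlier (the bitstring, then a `1` bit, then `0` bits, field width plus one bits in all); the name `postfix_oid` is hypothetical, and with this formula oids count from one. For a three bit field the root, whose key is `0b1000`, gets oid 15, and its right child, whose key is `0b1100`, gets oid 14.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// `key` is the bitstring, then a 1 bit, then 0 bits, width + 1 bits in total.
inline std::uint64_t postfix_oid(std::uint64_t key) {
    assert(key != 0);  // a valid key always contains the marker 1 bit
    // Subtracting one and oring turns the trailing 0 bits into 1 bits,
    // which is the bitstring padded out with 1 bits.
    std::uint64_t padded = key | (key - 1);
    // The key has exactly one more 1 bit than the bitstring (the marker bit).
    std::uint64_t ones_in_bitstring = std::bitset<64>(key).count() - 1;
    return padded - ones_in_bitstring;
}
```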

If the field is an integer, the block height, the number of blocks in
the blockchain, the oid is one bit larger and approximately twice the
size of that integer, assuming that we are putting vertices and block
roots in the same sql table. (Which we probably won't.)

# Blockchain

A Merkle-patricia block chain represents *an immutable past and a constantly changing present*.

That is, it represents an immutable and ever growing sequence of
transactions, and also a large and mutable present state, the database
that is the result of those transactions, the database of unspent
transaction outputs.

When we are assembling a new block, the records live in memory as native
format C++ objects. Upon a new block being finalized, they get written
to disk in key order, with implementation dependent offsets between
records and implementation dependent compression, which compression
likely reflects canonical form. Once written to disk, they are accessed
through native format records in memory, by bringing disk records into
memory in native format, with the least recently loaded entry, or least
recently used entry, getting discarded. Even when we are operating at
larger scale than Visa, a block representing five minutes of
transactions fits easily in memory.

Further, a patricia tree is a tree. But when we have the Merkle
patricia tree representing registered names organized by name, or the
Merkle-patricia tree representing as yet unspent transaction outputs,
we want its Merkle characteristic to represent a directed acyclic
graph. If two branches have the same hash, despite being at different
positions and depths in the tree, all their children will be identical.
And we want to take advantage of this, in that the block chain will be
a directed acyclic graph, each block being a tree representing the
state of the system at that block commitment, but that tree points back
into previous block commitments for those parts of the state of the
system that have not changed. So the hash of a node in such a tree will
identify, probably through an OID, a record of the block it was
originally constructed for, and its index in that tree.

A Merkle-patricia directed acyclic graph, Merkle-patricia dac, is a
Merkle dac, like a git repository or the block chain, with the patricia
key representing the path of hashes, and acting as an index through that
chain of hashes to find the data that you want.

The key will thread through different computers under the control of
different people, thus providing a system of witness that the current
global consensus hash accurately reflects past global consensus hashes,
and that each entity's version of the past agrees with the version it
previously espoused.

This introduces some complications when a portion of the tree represents
a database table with more than one index.

[Ethereum has a discussion and
definition](https://github.com/ethereum/wiki/wiki/Patricia-Tree) of this
data structure.

Suppose, when the system is at scale, we have a thousand trillion entries
in the public, readily accessible, and massively replicated part of the
blockchain. (I intend that every man and his dog will also have a
sidechain, every individual, every business. The individual will
normally not have his side chain publicly available, but in the event of
a dispute, may make a portion of it visible, so that certain of his
payments, and the invoice they were payments for, become visible to
others.)

In that case, a new transaction output is typically going to require
forty thirty two byte hashes, taking up about two kilobytes in total on
any one peer. And a single person to person payment is typically going to
take ten transaction outputs or so, taking twenty kilobytes in total on
any one peer. And this is going to be massively replicated by a few
hundred peers, taking about four megabytes in total.

(A single transaction will typically be much larger than this, because
it will mingle several person to person payments.)

Right now you can get a system with sixty four terabytes of hard disk
and thirty two gigabytes of ram for under six thousand dollars, south
of a hundred dollars per terabyte, so storing everything forever is
going to cost about a twentieth of a cent per person to person payment.
And a single such machine will be good to hold the whole blockchain for
the first few trillion person to person payments, good enough to handle
paypal volumes for a year.

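The arithmetic behind the storage and cost estimates can be laid out explicitly. This is only a restatement of the figures above, assuming that "a few hundred" peers means two hundred and taking the price of disk as one hundred dollars per terabyte:

$$
\begin{aligned}
\text{hashes per transaction output} &= 40 \times 32\ \text{bytes} = 1280\ \text{bytes}, \text{ taken above as about } 2\ \text{KB}\\
\text{per payment, on one peer} &\approx 10 \times 2\ \text{KB} = 20\ \text{KB}\\
\text{per payment, on all peers} &\approx 200 \times 20\ \text{KB} = 4\ \text{MB}\\
\text{cost per payment} &\approx 4\ \text{MB} \times \frac{100\ \text{dollars}}{10^{6}\ \text{MB}} = 0.0004\ \text{dollars} = 0.04\ \text{cents}
\end{aligned}
$$

That is about a twenty-fifth of a cent, the same order as the twentieth of a cent quoted above.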

“OK”, I hear you say. “And after the first few trillion transactions?”

Well then, if we have a few trillion transactions a year, and only a few
hundred peers, then the clients of any one peer will be doing about ten
billion transactions a year. If he profits half a cent per transaction,
he is making about fifty million a year. He can buy a few more sixty
four terabyte computers every year.

The target peer machine we will write for will have thirty two gigabytes
of ram and sixty four terabytes of hard disk, but our software should
run fine on a small peer machine, four gigabytes of ram and two
terabytes of hard disk, until the crypto currency surpasses bitcoin.

Because we will employ fixed size transaction units (larger currency
amounts will be broken into tens, twenties, fifties, hundreds, two
hundreds, five hundreds, thousands, two thousands and so forth), and
because we will be using a blockchain in the form of a Merkle-patricia
dac, our transactions will take up several times as much space as a
similar bitcoin transaction, and currently bitcoin transactions take up
several hundred bytes each. But this is OK, because the Merkle-patricia
dac gives client wallets far more power than on the bitcoin system, so
we can get by with far fewer peer wallets and far more client wallets.

# vertex identifiers

We need a canonical form for all data structures, the form which is
hashed, even if it is not convenient to use or manipulate the data in
that form on a particular machine with particular hardware and a
particular compiler.

A patricia tree representation of a field, and of a record of fields,
does not gracefully represent variable sized records.

If we represented the bitstring that corresponds to the block
number, the block height, as having a large number of leading
zero bits, so that it corresponds to a sixty three bit integer (we need
the additional low order bit for operations translating the bitstring
to its representation as a key field or oid field), a fixed field of
sixty four bits will do us fine for a trillion years or so.

But I have an aesthetic objection to representing things that are not
fixed sized as fixed sized.

Therefore I am inclined to represent bit strings as a count of bytes, a
byte string containing the zero padded bitstring, the bitstring being
byte aligned with the field boundary, and a count of the distance in
bits between the right edge of the bitstring and the right edge of
the field, that being the height of the interior vertex above the
leaf vertices containing the actual data that we are interested in, in
its representation as an sql table.

Some fields, notably text strings, do not have a definite right hand
boundary, representing the boundary inline. In that case, we
represent the vertex depth below the start of field, rather than the
vertex height above the end of field.

We always start walking the vertices representing an immutable
append only Merkle patricia tree knowing the bitstring, so their
preimages do not need to contain a vertex bitstring, nor do their
links need to add bits to the bitstring, because all the bits added
or subtracted are implicit in the choice of branch to take, so those
links do not contain representations of the skip field bit string
either.

However, when passing blocks around, we do need to communicate
the bitstring of a block, and when passing a hash path, we do need
to communicate the bitstring of the root vertex of the path. Many of
the hashes will be interior to a block, and thus their links
do need bitstrings for their skip fields. We will need to sign
those messages, thus need to hash them, so we need a canonical
hash of a bitstring, which requires a canonical representation of a
bitstring.

A bitstring lives in a field, so the position of the bitstring
relative to the field boundary needs a canonical representation,
though when we are walking the tree, this information is usually
implicit, so does not inherently need to be present in the preimage of
the vertex. But, since we are putting byte aligned bitfields in byte
strings, we need to know where the bitstring of a skip field for a link
ends within the byte, which is most conveniently done by giving the
field alignment of the end of the bitstring within the field.

For hash fields and elliptic point fields, it is more efficient to give
the alignment from the start of field, rather than the end, since the
skip field bitstring usually ends close to the start of field, but for
integer fields such as the block number, it is more efficient to give
the alignment from the end. Indeed, we would like to represent integers
in a form independent of the computer word size, so we do not know the
alignment from start of field for a bitstring that is part of an
integer field, only the size of the byte aligned bitstring, and how far
it is from the end of the integer field.