forked from cheng/wallet
wallet/docs/merkle_patricia_dag.md
2022-05-23 15:43:42 +10:00


---
title: Merkle-patricia Dag
# katex
...

# Definition

## Merkle-patricia Trees

A Merkle-patricia tree is a way of hashing a map, an associative array, such that the map can have stuff added to it, or removed from it, without having to rehash the entire map, and such that one can prove a subset of the map, such as a single mapping, is part of the whole map, without needing to have the whole map present to construct the hash.

The need to have the entire blockchain present to validate the current state of the blockchain, and any particular fact about a blockchain, is a huge problem with existing blockchains, as they grow enormous, and it is rapidly becoming a bigger problem.

In a Merkle dag, vertices are data structures, and edges are hashes, as if one followed a link by looking up the preimage of a hash. Obviously this is seldom an efficient way of actually implementing an edge, and one is in practice going to use a pointer or handle for data structures in memory, and an oid for structures in a database, but this is an internal representation. The canonical form has no pointers, no oids, and no handles; these are internal representations of the structure defined and implied by the hashes. In communications between humans and machines, and when defining algorithms and operations on the data structure, the algorithm should be defined, and the data communicated, as if one was actually using the hashes, rather than oids and pointers.

Its practical application is constructing a global consensus on what public keys have the right to control what digital assets (such as crypto currencies, and globally defined human readable names) and proving that everyone who matters agrees on ownership. If a large group of peers, each peer acting on behalf of a large group of clients each of whom have rights to a large number of digital assets, agree on what public keys are entitled to control what digital assets, then presumably their clients also agree, or they would not use that peer.
Thus, for example, we don't want the Certificate Authority to be able to tell Bob that his public key is a public key whose corresponding secret key is on his server, while at the same time telling Carol that Bob's public key is a public key whose corresponding secret key is in fact controlled by the secret police.

The Merkle-patricia tree not only allows peers to form a consensus on an enormous body of data, it allows clients to efficiently verify that any quite small piece of data, any datum, is in accord with that consensus.

## Patricia trees

A patricia tree is a way of structuring a potentially very large list of bitstrings, sorted by bitstring, such that bitstrings can be added or deleted without resorting to shifting the whole list of bitstrings.

In practice, we are not interested in bitstrings, we are interested in fields. We want to represent an sql table by a patricia tree, and do sql operations on the tree as if it were a table. But it is in fact a tree, and its interior vertices do not have complete fields, they have bitstrings representing part of a field. If we have a binary patricia tree representing a database table with n entries, it will have n leaf vertices, and n-1 internal vertices.

So, to map from a bitstring to a field representing the primary key of an sql index, we append to the bitstring a 1 bit, followed by as many 0 bits as are needed to bring it up to the right boundary of the field, plus one additional bit. If it is already at the right boundary, we append merely the one additional bit. The additional bit is a flag indicating a final vertex, a leaf vertex of the index: false (0) for interior vertices, true (1) for leaf vertices of the index -- so we now have a full field, plus a flag.
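The padding rule just described, and its obvious inverse, can be sketched in C++. This is a minimal sketch, not the document's implementation: the names `encode_key` and `decode_key`, and holding bitstrings in machine integers as a (bits, length) pair, are illustrative assumptions; `W` is the field width in bits, three in the worked example that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Key encoding sketch. An interior vertex's bitstring gets a 1 bit
// appended, then 0 bits out to the field boundary, plus a false flag;
// a leaf's bitstring is already full width and gets a true flag.
struct Key {
    uint64_t field;  // the padded bitstring, W bits wide
    bool leaf;       // field boundary flag
};

Key encode_key(uint64_t bits, unsigned len, unsigned W) {
    if (len == W) return {bits, true};  // already at the right boundary
    unsigned pad = W - len;
    return {(bits << pad) | (uint64_t{1} << (pad - 1)), false};
}

// Inverse: if the flag is true the bitstring is the field itself;
// otherwise strip the trailing 0 bits and the final 1 bit.
std::pair<uint64_t, unsigned> decode_key(Key k, unsigned W) {
    if (k.leaf) return {k.field, W};
    unsigned tz = 0;
    while (((k.field >> tz) & 1) == 0) ++tz;  // count trailing zeros
    return {k.field >> (tz + 1), W - tz - 1};
}
```

With W = 3 this reproduces the keys of the worked example: the root's empty bitstring encodes as 4 with a false flag, vertex 10 as 5 false, and leaf 010 as 2 true.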
A bitstring represents the path through the merkle patricia tree to a vertex, and we will, for consistency with sql database terminology, call the bitstring padded to one bit past the field boundary the key, the key being the sql field plus the one additional trailing bit, the field boundary flag. (Because we are dealing with the tree representation of an sql table, we need to know whether we have finally reached the actual record, or are still walking through the index, and the field boundary flag represents whether we have reached the actual record or not.)

To obtain the bitstring from the key, we remove the trailing 0 bits and the last 1 bit. Which is to say, if the field boundary flag is true, our bitstring is the field, and if the field boundary flag is false, our bitstring is the field with the last 1 bit and the following 0 bits discarded.

In a patricia tree each vertex is associated with a bitstring. When you walk the tree to the left child of a vertex, you add a zero bit, plus the bits, if any, associated with that link, to produce the bitstring of the left child, and when you walk to the right hand child, a one bit, plus the bits, if any, associated with that link. This enables you, given the bitstring you start with and the bitstring of the vertex you want, to find the path through the patricia tree. And, if it is a Merkle patricia tree, this enables you to produce a short efficient proof of not only the presence of a certain datum in an enormous pile of data, but also the absence of a datum.

Suppose we have a database table whose primary key is a three bit integer, and it contains four records, with oids 2, 4, 5, and 6.
The big endian representations of those primary keys are 0b010, 0b100, 0b101, and 0b110. The resulting patricia tree, with infix keys, is:

| bitstring | key      |
|-----------|----------|
| ""        | 4, false |
| 1         | 6, false |
| 10        | 5, false |
| 010       | 2, true  |
| 100       | 4, true  |
| 101       | 5, true  |
| 110       | 6, true  |

On the rightmost path from the top, the path gains two 1 bits because it goes right both times, then a 0 bit from the link, thus the bitstring identity of the path and the vertex is 110. We call the bits that a link adds, in addition to the bit that results from the choice to go left or right, the skip bits, because we are skipping levels in the binary tree. On the left most path from the top, the path gains one 0 bit because it goes left, then 10 from the link, thus the bitstring identity of the path and the vertex is 010. The two middle paths have identity purely from the left/right choices, not receiving any additional bits from the links other than the bit that comes from going left or right.

Each bitstring, thus each key field, identifies a vertex and a path through the patricia tree.

We do not necessarily want to actually manipulate or represent the bitstrings of vertices and skip fields as bitstrings. It is likely to be a good deal more convenient to represent and manipulate keys, and to represent the skip bits by the key of the target vertex. Fields have meanings for the application using the Merkle patricia dag, bitstrings lack meaning. But to understand what a patricia tree is, and to manipulate it, our actions have to be equivalent to an algorithm described in terms of bitstrings.

We use keys because computers manipulate bytes better than bits, just as we use pointers because we don't want to look up the preimages of hashes in a gigantic table of hashes. But a Merkle tree algorithm must work as if we were looking up preimages by their hash, and sometimes we will have to look up a preimage by its hash; and a patricia tree algorithm must work as if we were manipulating bitstrings, and sometimes we will have to manipulate bitstrings.
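To make the walk concrete, here is a sketch of membership lookup over the example tree above. The `Vertex` struct, the index-based node pool, and `contains` are illustrative assumptions, not the document's implementation; skip bits are folded into each child's stored bitstring, so arriving at a child checks the whole accumulated prefix at once, which is what gives proofs of absence.

```cpp
#include <cassert>
#include <cstdint>

// Each vertex stores its full bitstring as (bits, len); walking left or
// right extends the path by one chosen bit plus any skip bits.
struct Vertex {
    uint64_t bits; unsigned len;  // bitstring identifying the vertex
    int left = -1, right = -1;    // child indices, -1 for none
};

// The example tree for keys 0b010, 0b100, 0b101, 0b110.
Vertex pool[] = {
    {0b0, 0, 1, 2},   // 0: root ""
    {0b010, 3},       // 1: leaf 010 (skip "10" on the link)
    {0b1, 1, 3, 4},   // 2: vertex 1
    {0b10, 2, 5, 6},  // 3: vertex 10
    {0b110, 3},       // 4: leaf 110 (skip "0" on the link)
    {0b100, 3},       // 5: leaf 100
    {0b101, 3},       // 6: leaf 101
};

// True if the 3-bit key is present: choose left/right by the next bit
// of the key, then check the child's stored prefix against the key.
bool contains(uint64_t key) {
    int v = 0;
    while (pool[v].len < 3) {
        unsigned bit = (key >> (3 - pool[v].len - 1)) & 1;
        int child = bit ? pool[v].right : pool[v].left;
        if (child < 0) return false;
        if ((key >> (3 - pool[child].len)) != pool[child].bits)
            return false;  // prefix mismatch: key provably absent
        v = child;
    }
    return pool[v].bits == key;
}
```

A mismatch against a child's prefix, as for key 0b011 against leaf 010, is exactly the evidence a Merkle-patricia proof of absence packages up.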
The total number of vertices equals twice the number of leaves minus one.

Each parent node has as its identifier a sequence of bits, not necessarily aligned on field boundaries, that both its children have in common.

A Merkle-patricia dag is a patricia tree with binary radix (which is the usual way patricia trees are implemented) where the hash of each node depends on the hash and the skip of its two children, which means that each node contains proof of the entire state of all its descendant nodes.

The skip of a branch is the bit string that differentiates its bit string from its parent's, with the first such bit excluded as it is implied by being a left or right branch. This is often the empty bitstring, which, when mapped to a byte string for hashing purposes, maps to the empty byte string.

It would often be considerably faster and more efficient to hash the full bitstring, rather than the skip, and that may sometimes be not merely OK, but required; but often we want the hash to depend only on the data, and be independent of the metadata, as when the leaf index is an arbitrary precision integer representing the global order of a transaction, that is going to be constructed at some later time and determined by a different authority.

Most of the time we will be using the tree to synchronize two blocks of pending transactions, so though a count of the number of children of a vertex or an edge is not logically part of a Merkle-patricia tree, it will make synchronization considerably more efficient, since the peer that has the block with fewer children wants information from the peer that has the node with more children.
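The hashing rule just stated -- each node's hash depends on the hash and the skip of its two children -- might be sketched like this. The preimage layout and all names here are illustrative assumptions, and FNV-1a stands in for a real cryptographic hash; the point is only that both children's hashes and both skips enter the preimage, so the parent commits to the whole subtree.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>

// FNV-1a, a stand-in for a real cryptographic hash.
uint64_t fnv1a(const std::string& data) {
    uint64_t h = 1469598103934665603ull;
    for (unsigned char c : data) { h ^= c; h *= 1099511628211ull; }
    return h;
}

struct Child {
    uint64_t hash;     // Merkle hash of the child vertex
    std::string skip;  // skip bits of the link, "" when no levels skipped
};

// Hash of an interior vertex from its two children: the preimage
// covers each child's hash and its length-prefixed skip bits.
uint64_t vertex_hash(const Child& left, const Child& right) {
    std::string pre;
    for (const Child* c : {&left, &right}) {
        for (int i = 0; i < 8; ++i) pre += char((c->hash >> (8 * i)) & 0xff);
        pre += char(c->skip.size());  // length prefix keeps skips unambiguous
        pre += c->skip;
    }
    return fnv1a(pre);
}
```

Note how the empty skip contributes only its zero length prefix, matching the text's observation that the empty bitstring maps to the empty byte string.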
## A sequential append only collection of postfix binary trees

*Figure: Immutable append only file as a collection of balanced binary Merkle trees in postfix order*

*Figure: Immutable append only file as a Merkle chain*

This data structure means that instead of having one gigantic proof that takes weeks to evaluate that the entire blockchain is valid, you have an enormous number of small proofs that each particular part of the blockchain is valid.

This has four advantages over the chain structure.

1. A huge problem with proof of stake is "nothing at stake". There is nothing stopping the peers from pulling a whole new history out of their pocket.
    With this data structure, there is something stopping them. They cannot pull a brand new history out of their pocket, because the clients have a collection of very old roots of very large balanced binary merkle trees of blocks. They keep the hash paths to all their old transactions around, and if the peers invent a brand new history, the clients find that the context of all their old transactions has changed.
1. If a block gets lost or corrupted, the peer can identify the one specific block that is a problem. At present peers have to download, or at least re-index, the entire blockchain far too often, and a full re-index takes days or weeks.
1. It protects clients against malicious peers, since any claim the peer makes about the total state of the blockchain can be proven with $\bigcirc(\log_2 n)$ hashes.
1. We don't want the transaction metadata to be handled outside the secure wallet system, so we need client wallets interacting directly with other client wallets, so we need any client to be able to verify that the other client is on a consensus about the state of the blockchain that is a successor, predecessor, or the same as its consensus -- that each client can itself verify that the consensus claimed by its peer is generally accepted.

This is not a Merkle-patricia tree. This is a generalization of a Merkle patricia dag to support immutability. The intended usage is an immutable append only dag.

In a binary patricia tree each vertex has two links to other vertices, one of which corresponds to appending a 0 bit to the bitstring that identifies the vertex and the path to the vertex, and one of which corresponds to appending a 1 bit to the bitstring. In an immutable append only Merkle patricia dag, vertices identified by bit strings ending in a 0 bit have a third hash link, which links to a vertex whose bit string is obtained by removing the trailing 0 bits back to the rightmost 1 bit and zeroing that 1 bit.
Thus, whereas in a blockchain (Merkle chain) you need n hashes to reach and prove a vertex n blocks back, in an immutable append only Merkle patricia dag, you only need $\bigcirc(\log_2 n)$ hashes to reach a vertex n blocks back. The vertex 0010 has an extra link back to the vertex 000, the vertices 0100 and 010 have extra links back to the vertex 00, the vertices 1000, 100, and 10 have extra links back to the vertex 0, and so on and so forth.

This enables clients to reach any previous vertex through a chain of hashes, and thus means that each new item in the sequence is a hash of all previous data in the tree. Each new item has a hash commitment to all previous items. The clients keep the old roots of the balanced binary trees of blocks around, so the peers cannot sodomize them. This will matter more and more as the blockchain gets bigger and bigger, resulting in ever fewer peers with ever greater power and ever more clients, whose interests are apt to be different from those of the ever fewer, ever greater, and more powerful peers.

The superstructure of balanced binary Merkle trees allows us to verify any part of it with only $\bigcirc(\log n)$ hashes, and thus to verify that one version of this data structure that one party is using is a later version of the same data structure that another party is using. This reduces the amount of trust that clients have to place in peers. When the blockchain gets very large there will be rather few peers and a great many clients, thus there will be a risk that the peers will plot together to bugger the clients. This structure enables a client to verify that any part of the blockchain is what his peer says it is, and thus avoids the risk that a peer may tell different clients different accounts of the consensus. Two clients can quickly verify that they are on the same total order and total set of transactions, and that any item that matters to them is part of this same total order and total set.
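The third link's target can be computed directly from a vertex's bitstring. A sketch, holding the bitstring as a (bits, len) pair of machine integers (the name `back_link` and this representation are illustrative assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Back link target: strip the trailing 0 bits back to the rightmost
// 1 bit, then zero that 1 bit. An all-zero bitstring is the leftmost
// vertex and has no predecessor, signalled here as {0, 0}.
std::pair<uint64_t, unsigned> back_link(uint64_t bits, unsigned len) {
    while (len > 0 && (bits & 1) == 0) { bits >>= 1; --len; }  // drop trailing 0s
    if (len == 0) return {0, 0};  // nothing earlier in the sequence
    return {bits ^ 1, len};       // zero the rightmost 1 bit
}
```

Applied repeatedly, this walks from any vertex back toward the start of the sequence in $\bigcirc(\log_2 n)$ hops, matching the examples in the text: 0010 goes to 000, 0100 to 00, and 10 to 0.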
When the chain becomes very big, sectors and disks will be failing all the time, and we don't want such failures to bring everything to a screaming halt. At present, such failures far too often force you to reindex the blockchain, and redownload a large part of it, which happens more and more as the blockchain becomes enormous. And, when the chain becomes very big, most people will be operating clients, not peers, and they need to be able to ensure that the peers are not lying to them.

### storage

We would like to represent an immutable append only data structure by append only files, and by sql tables with sequential and ever growing oids.

When we defined the key for a Merkle patricia tree, the key definition gave us the parent node with a key field in the middle of its children: infix order. For the tree depicted above, we want postfix order.

Normally, if the bitstring is a full field width, the vertex contains the information we actually care about, while if the bitstring is less than the field width, it just contains hashes ensuring the data is immutable, that the past consensus has not been changed underneath us. So, regardless of how the data is actually physically stored on disk, these belong in different sql tables.

So, the oid of a vertex that has a full field width sized bitstring is simply that bitstring, while the oid of its parent vertices is obtained by appending 1 bits to pad the bitstring out to full field width, and subtracting a count of the number of 1 bits in the original bitstring, `std::popcount`, which gives us sequential and ever increasing oids for the parent vertices, if the leaf vertices, the vertices with full field width bitstrings, are sequential and ever increasing.

Let us suppose the leaf nodes of the tree depicted above are fixed size c, and the interior vertices are fixed size d (d is probably thirty two or sixty four bytes), and they are being physically stored in memory or a file in sequence.
Let us suppose the leaf nodes are stored with the interior vertices and are sequentially numbered. Then the location of leaf node n begins at $n\times c+\big(n-\operatorname{popcount}(n)\big)\times d$ (which unfortunately lacks a simple relationship to the bitstring of a vertex corresponding to a complete field, which is the field that represents the meaning that we actually care about).

# Blockchain

A Merkle-patricia block chain represents an immutable past and a constantly changing present: an immutable and ever growing sequence of transactions, and also a large and mutable present state of the database that is the result of those transactions, the database of unspent transaction outputs.

When we are assembling a new block, the records live in memory as native format C++ objects. Upon a new block being finalized, they get written to disk in key order, with implementation dependent offsets between records and implementation dependent compression, which compression likely reflects canonical form. Once written to disk, they are accessed through native format records in memory, bringing disk records into memory in native format, with the least recently loaded, or least recently used, entry getting discarded. Even when we are operating at larger scale than Visa, a block representing five minutes of transactions fits easily in memory.

Further, a patricia tree is a tree. But when we have the Merkle patricia tree representing registered names organized by name, or the Merkle-patricia tree representing as yet unspent transaction outputs, we want its Merkle characteristic to represent a directed acyclic graph. If two branches have the same hash, despite being at different positions and depths in the tree, all their children will be identical.
And we want to take advantage of this, in that the block chain will be a directed acyclic graph, each block being a tree representing the state of the system at that block commitment, but that tree points back into previous block commitments for those parts of the state of the system that have not changed. So the hash of a node in such a tree will identify, probably through an oid, a record of the block it was originally constructed for, and its index in that tree.

A Merkle-patricia directed acyclic graph, Merkle-patricia dag, is a Merkle dag, like a git repository or the block chain, with the patricia key representing the path of hashes, and acting as an index through that chain of hashes to find the data that you want. The key will thread through different computers under the control of different people, thus providing a system of witness that the current global consensus hash accurately reflects past global consensus hashes, and that each entity's version of the past agrees with the version it previously espoused. This introduces some complications when a portion of the tree represents a database table with more than one index. Ethereum has a discussion and definition of this data structure.

Suppose, when the system is at scale, we have a thousand trillion entries in the public, readily accessible, and massively replicated part of the blockchain. (I intend that every man and his dog will also have a sidechain, every individual, every business. The individual will normally not have his side chain publicly available, but in the event of a dispute, may make a portion of it visible, so that certain of his payments, and the invoices they were payments for, become visible to others.)

In that case, a new transaction output is typically going to require forty thirty two byte hashes, taking up about two kilobytes in total on any one peer. And a single person to person payment is typically going to take ten transaction outputs or so, taking twenty kilobytes in total on any one peer.
And this is going to be massively replicated by a few hundred peers, taking about four megabytes in total. (A single transaction will typically be much larger than this, because it will mingle several person to person payments.)

Right now you can get a system with sixty four terabytes of hard disk and thirty two gigabytes of ram for under six thousand dollars, south of a hundred dollars per terabyte, so storing everything forever is going to cost about a twentieth of a cent per person to person payment. And a single such machine will be good to hold the whole blockchain for the first few trillion person to person payments, good enough to handle paypal volumes for a year.

“OK”, I hear you say. “And after the first few trillion transactions?”

Well then, if we have a few trillion transactions a year, and only a few hundred peers, then the clients of any one peer will be doing about ten billion transactions a year. If he profits half a cent per transaction, he is making about fifty million a year. He can buy a few more sixty four terabyte computers every year.

The target peer machine we will write for will have thirty two gigabytes of ram and sixty four terabytes of hard disk, but our software should run fine on a small peer machine, four gigabytes of ram and two terabytes of hard disk, until the crypto currency surpasses bitcoin.

# vertex identifiers

We need a canonical form for all data structures, the form which is hashed, even if it is not convenient to use or manipulate the data in that form on a particular machine with particular hardware and a particular compiler.

A patricia tree representation of a field and record of fields does not gracefully represent variable sized records.
If we represented the bitstring that corresponds to the block number, the block height, as having a large number of leading zero bits, so that it corresponds to a sixty three bit integer (we need the additional low order bit for operations translating the bitstring to its representation as a key field or oid field), a fixed field of sixty four bits will do us fine for a trillion years or so. But I have an aesthetic objection to representing things that are not fixed sized as fixed sized.

Therefore I am inclined to represent bit strings as a count of bytes, a byte string containing the zero padded bitstring, the bitstring being byte aligned with the field boundary, and a count of the distance in bits between the right edge of the bitstring and the right edge of the field, that being the height of the interior vertex above the leaf vertices containing the actual data that we are interested in, in its representation as an sql table.

We do not hash the leading zero bytes of a bitstring that is part of an integer field, because we do not know, and do not care, how many zero bytes there are. A particular machine running a program compiled by a particular compiler will represent that integer with a particular sufficiently large machine word, but the hash cannot depend on the word size of a particular machine. A particular machine will represent the bitstring that is part of an integer field with a particular number of leading zero bytes, depending on its hardware and its compiler, but this cannot be allowed to affect the representation on the wire or the value of the hash. If one peer represents the block number as a thirty two bit value, and another peer as a sixty four bit value, and thus thinks the bitstring has four more leading zero bytes than the former peer does, this should have no effect. They should both get the same hashes, because the preimage of our hash should be independent of the number of leading zero bytes.
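A minimal sketch of encoding an integer for hashing so that the preimage is independent of leading zero bytes, as just required: emit the big endian bytes with the leading zero bytes stripped. The name `canonical_uint` is an illustrative assumption, not the document's wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal big endian encoding: a 32-bit and a 64-bit machine word
// holding the same block number produce the same byte string, so the
// hash preimage does not depend on word size.
std::vector<uint8_t> canonical_uint(uint64_t v) {
    std::vector<uint8_t> out;
    for (int i = 7; i >= 0; --i) {
        uint8_t b = uint8_t(v >> (8 * i));
        if (!out.empty() || b != 0) out.push_back(b);  // skip leading zeros
    }
    return out;  // empty for v == 0
}
```

On the wire such a byte string would be length prefixed, per the count-of-bytes representation described above; the receiving peer then re-pads with as many leading zero bytes as suits its own word size.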
For integer fields such as the block number, we would like to represent integers in a form independent of the computer word size, so we do not know the alignment from the start of field for a bitstring that is part of an integer field, only the size of the byte aligned bitstring, and how far the end of the bitstring is from the end of the integer field. Each particular peer executing the algorithm then applies as many leading zero bytes to the bitstring as suits the way it represents integers.

The skip field of a link crossing a field boundary into an integer field should not tell the machine following that link how many leading zero bytes to add to the bitstring, but where the first non zero byte of the bitstring is above the right edge of the integer, and the peer interpreting that skip field will add as many leading zero bytes to the bitstring as it finds handy for its hardware.

Some fields, notably text strings, do not have a definite right hand boundary, representing the boundary inline. In that case, we represent the vertex depth below the start of field, rather than the vertex height above the end of field.

We always start walking the vertices representing an immutable append only Merkle patricia tree knowing the bitstring, so their preimages do not need to contain a vertex bitstring, nor do their links need to add bits to the bitstring, because all the bits added or subtracted are implicit in the choice of branch to take, so those links do not contain representations of the skip field bit string either.
However, when passing blocks around, we do need to communicate the bitstring of a block, and when passing a hash path, we do need to communicate the bitstring of the root vertex of the path, and many of the hashes will be interior to a block, and thus their links do need bitstrings for their skip fields. And we will need to sign those messages, thus need to hash them, so we need a canonical hash of a bitstring, which requires a canonical representation of a bitstring.

A bitstring lives in a field, so the position of the bitstring relative to the field boundary needs a canonical representation, though when we are walking the tree this information is usually implicit, so does not inherently need to be present in the preimage of the vertex. But, since we are putting byte aligned bitfields in byte strings, we need to know where the bitstring of a skip field for a link ends within the byte, which is most conveniently done by giving the field alignment of the end of the bitstring within the field as part of the vertex skip fields -- assuming there is a skip field, which there frequently will not be.