# Blockdag Consensus

## Hedera, Bitcoin Proof of Work, and Paxos

### Paxos

All consensus algorithms that work are equivalent to Paxos.

All consensus algorithms that continue to work despite Byzantine Fault and Brigading are equivalent to Byzantine Fault Tolerant Paxos.

But Paxos is not in fact an algorithm. It is rather an idea that underlies actual useful algorithms, and in so far as it is described as an algorithm, the description is misleading: the algorithm as described covers many different things that you are unlikely to be interested in doing, or even comprehending, and is incapable of doing all sorts of things that you are likely to need done. Even worse, it is tacitly specific to one particular common use case, which it studiously avoids mentioning, and it omits all of the things that you actually need to couple it to that specific case. This makes the description utterly mysterious, because the writer has all the specific details of this common case in mind, but is carefully avoiding any mention of what he has in mind. These things are left out of scope of the algorithm as given in the interests of maximum generality, but the algorithm as given is not in fact very general, and makes no sense and is of no use without them.

Despite the studious effort to be as generic as possible by omitting all of the details required to make it actually do anything useful, the algorithm as given is the simplest and most minimal example of the concept, implementing one specific form of Paxos in one specific way, and as given, will very likely not accomplish what you need done.

Paxos assumes that each peer knows exactly how many peers there should be, though some of them may be permanently or temporarily unresponsive or permanently or temporarily out of contact.

In Paxos, every peer repeatedly sends messages to every other peer, and every peer keeps track of those messages, which if you have a lot of peers adds up to a lot of overhead.

Hedera assumes that each peer knows exactly how many peers there should be, and that each peer eventually gets through.

Which is a much stronger assumption than that made by Paxos or Bitcoin.

In Hedera, each peer's state eventually becomes known to every other peer, even though it does not necessarily communicate directly with every other peer, which if you have a whole lot of peers still adds up to a whole lot of overhead, though not as much as Paxos. It can handle more peers than Paxos, but with too many peers, the overhead is still going to bite.

A blockdag algorithm such as Hedera functions by in effect forking all the time, and resolving those forks very fast, but if you have almost as many forks as you have peers, resolving all those forks is still going to require receiving a great deal of data, processing a great deal of data, and sending a great deal of data.

Hedera and Paxos can handle a whole lot of transactions very fast, but they cannot reach consensus among a very large number of peers in a reasonable time.

Bitcoin does not know or care how many peers there are, though it does know and care roughly how much hashing power there is, but this is roughly guesstimated over time, over a long time, over a very long time, over a very very long time. It does not need to know exactly how much hashing power there is at any one time.

If there are a very large number of peers, this only slows Bitcoin consensus time down logarithmically, not linearly, while the amount of data per round that any one peer has to handle under Hedera is roughly $O(N \log N)$, where $N$ is the number of peers. Bitcoin can handle an astronomically large number of peers, unlike Hedera and Paxos, because Bitcoin does not attempt to produce a definitive, known and well defined consensus. It just provides a plausible guess of the current consensus, and over time you get exponentially greater certainty about the long past consensuses. No peer ever knows the current consensus for sure, it just operates on the recent best guess of its immediate neighbours in the network of what the recent consensus likely is. If it is wrong, it eventually finds out.

### Equivalence of Proof of Work and Paxos

Bitcoin is of course equivalent to Byzantine Fault Tolerant Paxos, but I compare it to Paxos because Paxos is difficult to understand, and Byzantine Fault Tolerant Paxos is nigh incomprehensible.

In Paxos, before a peer suggests a value to its peers, it must obtain permission from a majority of peers for that suggestion. And when it seeks permission from each peer, it learns if a value has already been accepted by that peer. If so, it has to accept that value, only propose that value in future, and never propose a different value. Which if everyone always gets through, means that the first time someone proposes a value, that value, being the first his peers have seen, will be accepted by someone, if only by that peer himself.
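The permission mechanics described above can be sketched as the acceptor side of single-decree Paxos. This is a minimal illustration, not a production implementation; the names `Acceptor`, `prepare`, and `accept` are my own, not from any particular library.

```python
class Acceptor:
    """One peer's acceptor state in single-decree Paxos."""

    def __init__(self):
        self.promised_n = -1        # highest ballot number promised so far
        self.accepted_n = -1        # ballot at which a value was accepted
        self.accepted_value = None  # the accepted value, if any

    def prepare(self, n):
        """Phase 1: a proposer asks permission to propose with ballot n.
        The reply reveals any value this acceptor already accepted, which
        the proposer is then obliged to adopt as its own proposal."""
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", self.promised_n, None)

    def accept(self, n, value):
        """Phase 2: the proposer asks the acceptor to accept a value."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return "accepted"
        return "rejected"
```

A later proposer that calls `prepare` with a higher ballot learns of the earlier accepted value and must propose it in future, which is how the first accepted value tends to become the consensus.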

Paxos is in effect a method for figuring out who was "first", in an environment where, due to network delays and lost packets, it is difficult to figure out, or even define, who was first. But if most packets mostly get through quickly enough, the peer that was first by clock time will usually get his way. Similarly in Bitcoin, the first miner to construct a valid block at block height N usually winds up defining the consensus for the block at block height N.

This permission functionality of Paxos is equivalent to the gossip process in Bitcoin, where a peer learns what the current block height is, and seeks to add another block, rather than attempt to replace an existing block.

In Paxos, once one peer accepts one value, it will eventually become the consensus value, assuming that everyone eventually gets through and that the usual network problems do not foul things up. Thus Paxos can provide a definitive result eventually, while Bitcoin's results are never definitive, merely exponentially probable.

In Paxos, a peer learns of the definitive and final consensus when it discovers that a majority of peers have accepted one value. Which if several values are in play can take a while, but eventually it is going to happen. In Bitcoin, when the blockchain forks, eventually more hashing power piles on one branch of the fork than the other, and eventually everyone can see that more hashing power has piled on one fork than the other, but there is no moment when a peer discovers that one branch is definitive and final. It just finds that one branch is becoming more and more likely, and all the other branches less and less likely.

Thus Paxos has a stronger liveness property than Bitcoin, but this difference is in practice not important, for Paxos may take an indefinitely long time before it can report a definite and final consensus, while Bitcoin takes a fairly definite time to report that it is nearly certain about the consensus value and that that value is unlikely to change.

## Bitcoin does not scale to competing with fiat currency

Bitcoin is limited to ten transactions per second. Credit card networks handle about ten thousand transactions per second.

We will need a crypto coin that enables seven billion people to buy a lollipop.

Blockdag consensus can achieve sufficient speed.

There are thirty or more proposed blockdag systems, and the number grows rapidly.

While blockdags can handle very large numbers of transactions, it is not obvious to me that any of the existing blockdag algorithms can handle very large numbers of peers. When actually implemented, they always wind up privileging a small number of special peers, resulting in hidden centralization, as somehow these special and privileged peers all seem to be in the same data centre as the organization operating the blockchain.

Cardano has a very clever, too clever by half, algorithm to generate random numbers known to everyone and unpredictable and uncontrollable by anyone, with which to distribute specialness fairly and uniformly over time, but this algorithm runs in one centre, rather than using speed of light delay based fair randomness algorithms, which makes me wonder if it is distributing specialness fairly, or operating at all.

I have become inclined to believe that there is no way around making some peers special, but we need to distribute the specialness fairly and uniformly, so that every peer gets his turn being special at a certain block height, with the proportion of block heights at which he is special being proportional to his stake.

If the number of peers that have a special role in forming the next block is very small, and the selection and organization of those peers is not furtively centralized to make sure that only one such group forms, but rather organized directly by those special peers themselves, we wind up with forks sometimes, I hope infrequently, because the special peers should most of the time successfully self organize into a single group that contains almost all of the most special peers. If however we have another, somewhat larger, group of peers that have a special role in deciding which branch of a fork is the most popular, a two phase blockdag, I think we can preserve blockdag speed without blockdag de facto concentration of power.

The algorithm will only have bitcoin liveness, rather than paxos liveness, which is the liveness most blockdag algorithms seek to achieve.

I will have to test this empirically, because it is hard to predict, or even to comprehend, limits on consensus bandwidth.

## Bitcoin is limited by its consensus bandwidth

Not by its network bandwidth.

Bitcoin makes the miners wade through molasses. Very thick molasses. That is what proof of work is. If there is a fork, it discovers consensus by noticing which fork has made the most progress through the molasses.

This takes a while. And if there are more forks, it takes longer. To slow down the rate of forks, it makes the molasses thicker. If the molasses is thicker, this slows down fork formation more than it slows down the resolution of forks. It needs to keep the rate of new blocks down slow enough that a miner usually discovers the most recent block before it attempts to add a new block. And if a miner does add a new block at roughly the same time as another miner adds a new block, quite a few more blocks have to be added before the fork is resolved. And as the blocks get bigger, it takes longer for them to circulate. So bigger blocks need thicker molasses. If forks form faster than they can be resolved, no consensus.
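The "progress through the molasses" rule amounts to comparing cumulative expected work, not branch length. A minimal sketch, assuming each block simply records its difficulty as a number of leading target bits (an illustrative representation, not Bitcoin's actual encoding):

```python
# Fork resolution by cumulative proof of work: the branch that has waded
# furthest through the molasses wins, even if it has fewer blocks.

def cumulative_work(branch):
    # Expected hashing work for a block is proportional to 2**difficulty_bits:
    # the harder the target, the thicker the molasses.
    return sum(2 ** block["difficulty_bits"] for block in branch)

def choose_branch(branches):
    """Pick the branch a peer would continue to mine on."""
    return max(branches, key=cumulative_work)
```

For example, a two-block branch mined at difficulty 21 bits represents more work than a three-block branch at 20 bits, so it wins despite being shorter.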

### The network bandwidth limit

The net bandwidth limit on adding transactions is not a problem.

What bites every blockchain is consensus bandwidth limit, how fast all the peers can agree on the total order of transactions, when transactions are coming in fast.

Suppose a typical transaction consists of two input coins, a change output coin, and the actual payment. (I use the term coin to refer to transaction inputs and outputs, although they don't come in any fixed denominations except as part of anti tracking measures.)

Each output coin consists of a payment amount, suppose around sixty four bits, and a public key, two hundred and fifty six bits. It also has a script reference covering any special conditions on what constitutes a valid spend, which might have a lot of long arguments, but generally will not, so the script reference will normally be one byte.

The input coins can be a hash reference to a coin in the consensus blockchain, two hundred and fifty six bits, or they can be a reference by total order within the blockchain, sixty four bits.

We can use a Schnorr group signature, which is five hundred and twelve bits no matter how many coins are being signed, no matter how many people are signing, and no matter if it is an n of m signature.

So a typical transaction, assuming we have a good compact representation of transactions, should be around 1680 bits, maybe less.
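The estimate can be checked with simple arithmetic, using the figures above with both inputs referenced by hash:

```python
# Back-of-envelope size of a typical transaction: two inputs, two outputs,
# one Schnorr group signature. All figures are from the estimates above.

AMOUNT_BITS = 64       # payment amount in each output
PUBKEY_BITS = 256      # public key in each output
SCRIPT_REF_BITS = 8    # one byte script reference in the common case
HASH_REF_BITS = 256    # an input referenced by hash
SIGNATURE_BITS = 512   # one group signature covers the whole transaction

output_bits = AMOUNT_BITS + PUBKEY_BITS + SCRIPT_REF_BITS   # 328 bits each
total_bits = 2 * HASH_REF_BITS + 2 * output_bits + SIGNATURE_BITS
print(total_bits)  # 1680

# Referencing inputs by 64 bit total order instead of by hash saves
# 2 * (256 - 64) = 384 bits, giving 1296 bits: the "maybe less".
```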

At scale you inevitably have a large number of clients and a small number of full peers. Say several hundred peers, a few billion clients, most of them lightning gateways. So we can assume every peer has a good connection.

A typical, moderately good, home connection is thirty Mbps download, but its upload is only ten Mbps or so.

So if our peers are typical decent home connections, and they will be a lot better than that, bandwidth limits them to adding transactions at 10 Mbps, about six thousand transactions per second, Visa card magnitude. Though if such a large number of transactions are coming in that fast, blockchain storage requirements will be very large, around 24 TiB, about three or four standard home desktop disk drives. But by the time we get to that scale, all peers will be expensive dedicated systems, rather than a background process using its owner's spare storage and spare bandwidth, running on the same desktop that its owner uses to shop at Amazon.
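The throughput figure is simple division of the upload bandwidth by the transaction size estimated earlier:

```python
upload_bps = 10_000_000   # ten Mbps upload on a decent home connection
tx_bits = 1680            # typical transaction size estimated above
tps = upload_bps / tx_bits
print(round(tps))  # 5952, roughly six thousand transactions per second
```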

Which, if everyone in the world is buying their lollipops on the blockchain, will still need most people using the lightning network layer rather than the blockchain layer. But everyone will still routinely access the blockchain layer directly, thus ensuring that problems with their lightning gateways are resolved by a peer they can choose, rather than by their lightning network wallet provider, and thus that we can have a truly decentralized lightning network.

We will not necessarily get a truly decentralized lightning layer, but a base layer capable of handling a lot of transactions makes it physically possible.

So if bandwidth is not a problem, why is bitcoin so slow?

The bottleneck in bitcoin is that to avoid too many forks, which waste time with fork resolution, you need a fair bit of consensus on the previous block before you form the next block.

And bitcoin consensus is slow, because the way a fork is resolved is that peers that received one branch of the fork first continue to work on that branch, while peers that received the other branch first continue to work on that branch, until one branch gets ahead of the other, whereupon the leading branch spreads rapidly through the peers. With proof of stake, that is not going to work: one can lengthen a branch as fast as one pleases. Instead, each branch has to be accompanied by evidence of the weight of stake of peers on that branch. Which means the winning branch can start spreading immediately.

## Blockdag to the rescue

On a blockdag, you don't need a fair bit of consensus on the previous block to avoid too many forks forming. Every peer is continually forming his own fork, and these forks reach consensus about their left great grandchild, or left great great … great grandchild. The blocks that eventually become the consensus as leftmost blocks form a blockchain. So we can roll right ahead, and groups of blocks that deviate from the consensus, which is all of them but one, eventually get included, but later in the total order than they initially thought they were.

In a blockdag, each block has several children, instead of just one. The total order starting from any one block is a depth first search: the left blocks come before the right blocks, and the child blocks come before the parent block. Each block may be referenced by several different parent blocks, but only the first reference in the total order matters.

Each leftmost block defines the total order of all previous blocks, the total order being the dag in depth first order.

Each peer disagrees with all the other peers about the total order of recent blocks and recent transactions, each is its own fork, but they all agree about the total order of older blocks and older transactions.
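The ordering rule just described, depth first with children before parents, left before right, and only the first reference to a block counting, can be sketched as follows. The dict-of-`children` representation is illustrative only:

```python
def total_order(tip):
    """Total order of all blocks reachable from `tip`: depth first,
    leftmost child first, children before parent, each block placed
    at its first reference only."""
    seen, out = set(), []

    def visit(block):
        if block["label"] in seen:   # only the first reference matters
            return
        seen.add(block["label"])
        for child in block["children"]:  # left to right
            visit(child)
        out.append(block["label"])       # parent comes after its children
    visit(tip)
    return out
```

If two parent blocks both reference the same earlier block, the reference through the leftmost path determines its position in the total order, and the later reference is ignored.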

### Previous work

There are umpteen proposals for blockdags, most of them garbage, but the general principle is sound.

For a bunch of algorithms that plausibly claim to approach the upload limit, see:

  • Scalable and probabilistic leaderless BFT consensus through metastability

    This explains the underlying concept: a peer looks at the dag, makes its best guess as to which way consensus is going, and joins the seeming consensus, which makes it more likely to become the actual consensus.

    Which is a good way of making arbitrary choices where it does not matter which choice everyone makes, provided that they all make the same choice, even though it is an utterly disastrous way of making choices where the choice matters.

    This uses an algorithm that rewards fast mixing peers by making their blocks appear earlier in the total order. This algorithm does not look incentive compatible to me. It looks to me that if all the peers are using that algorithm, then any one peer has an incentive to use a slightly different algorithm.

    The authors use the term Byzantine fault incorrectly, referring to behavior that suggests the unpredictable failures of an unreliable data network as Byzantine failure. No, a Byzantine fault suggests Byzantine defection, treachery, and failure to follow process. It is named after Byzantium because of the stuff that happened during the decline of the Byzantine empire.

  • Prism: Deconstructing the blockchain to approach physical limits

    A messy, unclear, and overly complicated proposed implementation of the blockdag algorithm, which, however, makes the important point that it can go mighty fast, that the physical limits on consensus are bandwidth, storage, and communication delay, and that we can approach these limits.

  • Blockmania: from block dags to consensus

    This brings the important concept, that the tree structure created by gossiping the blockdag around is the blockdag, and also is the data you need to create consensus, bringing together things that were separate in Prism, radically simplifying what is complicated in Prism by uniting data and functionality that Prism divided.

    This study shows that the Blockmania implementation of the blockdag is equivalent to the Practical Byzantine Fault Tolerant consensus algorithm, only a great deal faster, more efficient, and considerably easier to understand.

    The Practical Byzantine Fault Tolerant consensus algorithm is an implementation of the Paxos protocol in the presence of Byzantine faults, and the Paxos protocol is already hard enough to understand.

    So anyone who wants to implement consensus in a system where Byzantine failure and Byzantine defection is possible should forget about Paxos, and study blockdags.

  • A highly scalable, decentralized dag-based consensus algorithm

    Another blockdag algorithm, but one whose performance has been tested. It can handle high bandwidth and lots of transactions, and achieves fast Byzantine fault resistant total order consensus in time $O(6\lambda)$, where $\lambda$ is the upper bound of the network's gossip period.

  • Blockchain-free cryptocurrencies: A framework for truly decentralised fast transactions

    These transactions are indeed truly decentralized, fast, and free from blocks, assuming all participants download the entire set of transactions all the time.

    The problem with this algorithm is that when the blockchain grows enormous, most participants will become clients, and only a few giant peers will keep the whole transaction set, and this system, because it does not provide a total order of all transactions, will then place all the power in the hands of the peers.

    We would like the clients to have control of their private keys, and thus they must publish their public keys with the money they spend, in which case the giant peers must exchange blocks of information containing those keys, and we are back to having blocks.

    The defect of this proposal is that it converges not to a total order on all past transactions, but merely to a total set of all past transactions. Since the graph is a graph of transactions, not blocks, double spends are simply excluded, so a total order is not needed. While you can get by with a total set, a total order enables you to do many things a total set does not, such as publish two conflicting transactions and resolve them.

    Total order can represent consensus decisions that total set cannot easily represent, perhaps cannot represent at all. We need a blockdag algorithm that gives us consensus on the total order of blocks, not just the set of blocks.

    In a total order, you do not just converge to the same set, you converge to the same order of the set. Having the same total order of the set makes it, among other things, a great deal easier and faster to check that you have the same set. Plus your set can contain double spends, which you are going to need if the clients themselves can commit transactions through the peers, if the clients themselves hold the secret keys and do not need to trust the peers.

## Proposed blockdag implementation

The specific details of many of these proposed systems are rather silly and often vague, typical academic exercises unconcerned with real world issues, but the general idea that the academics intend to illustrate is sound and should work, certainly can be made to work. They need to be understood as academic illustrations of the idea of the general algorithm for fast and massive blockdag consensus, and not necessarily intended as ready to roll implementations of that idea.

Here is an even more vague outline of my variant of this idea, which I name Yabca, “Yet another blockdag consensus algorithm”.

I propose proof of stake. The stake of a peer is not the stake it owns, but the stake that it has injected into the blockchain on behalf of its clients and that its clients have not spent yet. Each peer pays on behalf of its clients for the amount of space it takes up on the blockchain, though it does not pay in each block. It makes an advance payment that will cover many transactions in many blocks. The money disappears: built in deflation, instead of built in inflation. Each block is a record of what a peer has injected.

The system does not pay the peers for generating a total order of transactions. Clients pay peers for injecting transactions. We want the power to be in the hands of people who own the money, thus governance will have a built in bias towards appreciation and deflation, rather than inflation.

The special sauce that makes each proposed blockdag different from each of the others is how each peer decides what consensus is forming about the leftmost edge of the dag, the graph analysis that each peer performs. And this, my special sauce, I will explain when I have something running.

Each peer adopts, as the leftmost child of its latest block, a previous block that looks like a good candidate for consensus. It looks like a good candidate because its left child has a left child that looks like consensus actually is forming around that grandchild, in part because the left child has a … left child has a … left child that looks like it might have consensus, until eventually, as new blocks pile on top of old blocks, we actually do get consensus about the leftmost child sufficiently deep in the dag from the latest blocks.

The blockdag can run fast because all the forks that are continually forming eventually get stuffed into the consensus total order somewhere. So we don't have to impose a speed limit to prevent excessive forking.

## Cost of storage on the blockchain

Tardigrade charges $120 per terabyte per year for storage, and $45 per terabyte for download.

We have a pile of theory, though no practical experience, suggesting that a blockdag can approach the physical limits, and that its limits are going to be bandwidth and storage.

Storage on the blockdag is going to cost more, because it is massively replicated, so say three hundred times as much. And it is going to be optimized for tiny fragments of data, while Tardigrade is optimized for enormous blocks of data, so say three times as much on top of that. A thousand times as expensive to store should be in the right ballpark.

When you download, you are downloading from only a single peer on the blockdag, but you are downloading tiny fragments dispersed over a large pile of data, so again, a thousand times as expensive to download sounds like it might be in the right ballpark.

Then storing a chain of keys and the accompanying roots of total state, with one new key per day for ten years, will cost about two dollars over ten years.

Ten megabytes is a pretty big pile of human readable documentation. Suppose you want to store ten megabytes of human readable data, and read and write access costs a thousand times what Tardigrade charges: it will cost about twelve dollars over ten years.
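Both estimates follow from the assumed thousand-fold markup over Tardigrade's storage price. The ~450 bytes per daily key-and-root entry below is my own assumption, chosen to illustrate how the two dollar figure could arise:

```python
PRICE_PER_TB_YEAR = 120.0   # Tardigrade's storage price, dollars
MARKUP = 1000               # guessed blockdag multiplier from above
TB = 1e12                   # bytes per terabyte

def blockdag_storage_cost(n_bytes, years):
    """Dollars to store n_bytes on the blockdag for the given years,
    under the thousand-fold markup assumption."""
    return n_bytes / TB * PRICE_PER_TB_YEAR * MARKUP * years

# Ten megabytes of documentation, stored for ten years:
print(blockdag_storage_cost(10e6, 10))            # about 12 dollars

# A daily key plus root of total state, assuming ~450 bytes per day,
# accumulated over ten years and stored for ten years:
print(blockdag_storage_cost(450 * 365 * 10, 10))  # about 2 dollars
```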

So, we should consider the blockdag as an immutable store of arbitrary typed data, a reliable broadcast channel, where some types are executable and, when executed, cause a change in mutable total state: typically, a new unspent coin record is added, and an old unspent coin record is deleted.

In another use, a valid update to a chain of signatures should cause a change in the signature associated with a name, the association being mutable state controlled by immutable data. Thus we can implement corporations on the blockdag by a chain of signatures, each of which represents [an n of m multisig](./PracticalLargeScaleDistributedKeyGeneration.pdf "Practical Large Scale Distributed Key Generation").