---
title: Blockdag Consensus
---

# Hedera, Bitcoin Proof of Work, and Paxos

## Paxos

All consensus algorithms that work are equivalent to Paxos.

All consensus algorithms that continue to work despite Byzantine faults
and brigading are equivalent to Byzantine Fault Tolerant Paxos.

But Paxos is not in fact an algorithm. It is rather an idea that underlies
actual useful algorithms, and in so far as it is described as an algorithm, it
is wrong, for the algorithm as described covers many different things that
you are unlikely to be interested in doing, or even comprehending, and is
incapable of doing all sorts of things that you are likely to need done.
Even worse, it is totally specific to one particular common use case, which
it studiously avoids mentioning, and does not mention any of the things
that you actually need to couple it to this specific case, making the
description utterly mysterious, because the writer has all the specific
details of this common case in mind, but is carefully avoiding any mention
of what he has in mind. These things are out of scope of the algorithm as
given, in the interests of maximum generality, but the algorithm as given
is not in fact very general, and makes no sense and is no use without them.

Despite the studious effort to be as generic as possible by omitting all of
the details required to make it actually do anything useful, the algorithm as
given is the simplest and most minimal example of the concept,
implementing one specific form of Paxos in one specific way, and as
given, will very likely not accomplish what you need to do.

Paxos assumes that each peer knows exactly how many peers there should
be, though some of them may be permanently or temporarily unresponsive
or permanently or temporarily out of contact.

In Paxos, every peer repeatedly sends messages to every other peer, and
every peer keeps track of those messages, which if you have a lot of peers
adds up to a lot of overhead.

Hedera assumes that each peer knows exactly how many peers there
should be, *and that each peer eventually gets through*.

Which is a much stronger assumption than that made by Paxos or Bitcoin.

In Hedera, each peer's state eventually becomes known to every other
peer, even though it does not necessarily communicate directly with every
other peer, which if you have a whole lot of peers still adds up to a whole
lot of overhead, though not as much as Paxos. It can handle more peers
than Paxos, but with too many peers it is still going to bite.

A blockdag algorithm such as Hedera functions by in effect forking all the
time, and resolving those forks very fast, but if you have almost as many
forks as you have peers, resolving all those forks is still going to require
receiving a great deal of data, processing a great deal of data, and sending
a great deal of data.

Hedera and Paxos can handle a whole lot of transactions very fast, but
they cannot reach consensus among a very large number of peers in a
reasonable time.

Bitcoin does not know or care how many peers there are, though it does
know and care roughly how much hashing power there is, but this is
roughly guesstimated over time, over a long time, over a very long time,
over a very very long time. It does not need to know exactly how much
hashing power there is at any one time.

If there are a very large number of peers, this only slows Bitcoin
consensus time down logarithmically, not linearly, while the amount of
data per round that any one peer has to handle under Hedera is roughly
$O\big(N\log(N)\big)$, where $N$ is the number of peers. Bitcoin can handle an
astronomically large number of peers, unlike Hedera and Paxos, because
Bitcoin does not attempt to produce a definitive, known and well defined
consensus. It just provides a plausible guess of the current consensus, and
over time you get exponentially greater certainty about long past
consensuses. No peer ever knows the current consensus for sure; it just
operates on the recent best guess of its immediate neighbours in the
network of what the recent consensus likely is. If it is wrong, it eventually
finds out.

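
That exponential certainty can be made concrete with the attacker catch-up probability from the Bitcoin whitepaper: the chance that an attacker holding a minority fraction $q$ of the hashing power ever overtakes a branch that is $z$ blocks ahead falls off exponentially in $z$. A minimal sketch:

```python
from math import exp, factorial

# The catch-up probability from the Bitcoin whitepaper: the chance that an
# attacker with fraction q of the hashing power ever overtakes a branch
# that is z blocks ahead.

def catchup_probability(q: float, z: int) -> float:
    p = 1.0 - q
    lam = z * (q / p)  # expected attacker progress while honest miners gain z
    s = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam**k / factorial(k)
        s -= poisson * (1.0 - (q / p) ** (z - k))
    return s

for z in (1, 2, 4, 6):
    # certainty about a block grows exponentially as it is buried deeper
    print(z, catchup_probability(0.1, z))
```
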
## Equivalence of Proof of Work and Paxos

Bitcoin is of course equivalent to Byzantine Fault Tolerant Paxos, but I
compare it to Paxos because Paxos is difficult to understand, and Byzantine
Fault Tolerant Paxos is nigh incomprehensible.

In Paxos, before a peer suggests a value to its peers, it must obtain
permission from a majority of peers for that suggestion. And when it seeks
permission from each peer, it learns if a value has already been accepted
by that peer. If so, it has to accept that value, only propose that value in
future, and never propose a different value. Which, if everyone always gets
through, means that the first time someone proposes a value, that value,
being the first his peers have seen, will be accepted by someone, if only by
that peer himself.

Paxos is in effect a method for figuring out who was "first", in an
environment where, due to network delays and lost packets, it is difficult
to figure out, or even define, who was first. But if most packets mostly get
through quickly enough, the peer that was first by clock time will usually
get his way. Similarly in Bitcoin, the first miner to construct a valid block at
block height $N$ usually winds up defining the consensus for the block at
block height $N$.

This permission functionality of Paxos is equivalent to the gossip process
in Bitcoin, where a peer learns what the current block height is, and seeks
to add another block, rather than attempt to replace an existing block.

In Paxos, once one peer accepts one value, it will eventually become the
consensus value, assuming that everyone eventually gets through and that
the usual network problems do not foul things up. Thus Paxos can provide
a definitive result eventually, while Bitcoin's results are never definitive,
merely exponentially probable.

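
The permission rule can be sketched as single-decree Paxos in miniature. In this toy model the message passing is elided into direct calls, nobody is Byzantine, and the class and function names are hypothetical; the key point is that a proposer who learns some acceptor already accepted a value must adopt that value:

```python
# Toy single-decree Paxos: a proposer that learns of an already-accepted
# value must adopt it and may never propose a different one.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot number promised
        self.accepted = None   # (ballot, value) already accepted, or None

    def prepare(self, ballot):
        # Phase 1: grant permission if this ballot is the highest seen,
        # and report any value already accepted.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, self.accepted

    def accept(self, ballot, value):
        # Phase 2: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    grants = [a.prepare(ballot) for a in acceptors]
    granted = [prior for ok, prior in grants if ok]
    if len(granted) <= len(acceptors) // 2:
        return None                  # no majority permission
    prior = max((p for p in granted if p), default=None)
    if prior is not None:
        value = prior[1]             # must adopt the accepted value
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, 1, "A"))  # A
print(propose(acceptors, 2, "B"))  # A: the later proposer adopts "A"
```
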
In Paxos, a peer learns of the definitive and final consensus when it
discovers that a majority of peers have accepted one value. Which, if
several values are in play, can take a while, but eventually it is going to
happen. In Bitcoin, when the blockchain forks, eventually more hashing
power piles on one branch of the fork than the other, and eventually
everyone can see that more hashing power has piled on one fork than the
other, but there is no moment when a peer discovers that one branch is
definitive and final. It just finds that one branch is becoming more and
more likely, and all the other branches less and less likely.

Thus Paxos has a stronger liveness property than Bitcoin, but this
difference is in practice not important, for Paxos may take an indefinitely
long time before it can report a definite and final consensus, while Bitcoin
takes a fairly definite time to report that it is nearly certain about the
consensus value and that that value is unlikely to change.

# Bitcoin does not scale to competing with fiat currency

Bitcoin is limited to ten transactions per second. Credit card networks
handle about ten thousand transactions per second.

We will need a crypto coin that enables seven billion people to buy a lollipop.

Blockdag consensus can achieve sufficient speed.

There are thirty or more proposed blockdag systems, and the number grows rapidly.

While blockdags can handle very large numbers of transactions, it is not
obvious to me that any of the existing blockdag algorithms can handle
very large numbers of peers. When actually implemented, they always
wind up privileging a small number of special peers, resulting in hidden
centralization, as somehow these special and privileged peers all seem to
be in the same data centre as the organization operating the blockchain.

Cardano has a very clever, too clever by half, algorithm to generate
random numbers known to everyone and unpredictable and uncontrollable
by anyone, with which to distribute specialness fairly and uniformly over
time, but this algorithm runs in one centre, rather than using speed of light
delay based fair randomness algorithms, which makes me wonder if it is
distributing specialness fairly, or operating at all.

I have become inclined to believe that there is no way around making
some peers special, but we need to distribute the specialness fairly and
uniformly, so that every peer gets his turn being special at a certain block
height, with the proportion of block heights at which he is special being
proportional to his stake.

If the number of peers that have a special role in forming the next block is
very small, and the selection and organization of those peers is not
furtively centralized to make sure that only one such group forms, but
rather organized directly by those special peers themselves, we will wind
up with forks sometimes, I hope infrequently, because the special peers
should most of the time successfully self organize into a single group that
contains almost all of the most special peers. If however, we have another,
somewhat larger group of peers that have a special role in deciding which
branch of the fork is the most popular, a two phase blockdag, I think we can
preserve blockdag speed without blockdag de-facto concentration of power.

The algorithm will only have Bitcoin liveness, rather than Paxos liveness,
which is the liveness most blockdag algorithms seek to achieve.

I will have to test this empirically, because it is hard to predict, or even to
comprehend, limits on consensus bandwidth.

## Bitcoin is limited by its consensus bandwidth

Not by its network bandwidth.

Bitcoin makes the miners wade through molasses. Very thick molasses.
That is what proof of work is. If there is a fork, it discovers consensus by
noticing which fork has made the most progress through the molasses.

This takes a while. And if there are more forks, it takes longer. To slow
down the rate of forks, it makes the molasses thicker. If the molasses is
thicker, this slows down fork formation more than it slows down the
resolution of forks. It needs to keep the rate of new blocks down slow
enough that a miner usually discovers the most recent block before it
attempts to add a new block. And if a miner does add a new block at
roughly the same time as another miner adds a new block, quite a few
more blocks have to be added before the fork is resolved. And as the
blocks get bigger, it takes longer for them to circulate. So bigger blocks
need thicker molasses. If forks form faster than they can be resolved, no
consensus.

## The network bandwidth limit

The net bandwidth limit on adding transactions is not a problem.

What bites every blockchain is the consensus bandwidth limit: how fast all
the peers can agree on the total order of transactions, when transactions
are coming in fast.

Suppose a typical transaction consists of two input coins, a change output
coin, and the actual payment. (I use the term coin to refer to transaction
inputs and outputs, although they don’t come in any fixed denominations
except as part of anti tracking measures.)

Each output coin consists of a payment amount, suppose around sixty four
bits, and a public key, two hundred and fifty six bits. It also has a script
reference for any special conditions as to what constitutes a valid spend,
which might have a lot of long arguments, but it generally will not, so the
script reference will normally be one byte.

The input coins can be a hash reference to a coin in the consensus
blockchain, two hundred and fifty six bits, or they can be a reference by
total order within the blockchain, sixty four bits.

We can use a Schnorr group signature, which is five hundred and twelve
bits no matter how many coins are being signed, no matter how many
people are signing, and no matter if it is an n of m signature.

So a typical transaction, assuming we have a good compact representation
of transactions, should be around 1680 bits, maybe less.

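
As a quick arithmetic check of that figure, assuming two outputs (the change coin and the payment) and a one-byte script reference on each:

```python
# Back-of-envelope check of the ~1680 bit estimate above. Field widths
# are the ones given in the text: two inputs, two outputs, one signature.

AMOUNT, PUBKEY, SCRIPT_REF = 64, 256, 8   # bits per output-coin field
HASH_REF, ORDER_REF = 256, 64             # bits per input-coin reference
SIGNATURE = 512                           # one Schnorr group signature

output_coin = AMOUNT + PUBKEY + SCRIPT_REF          # 328 bits
tx_hash_refs = 2 * HASH_REF + 2 * output_coin + SIGNATURE
tx_order_refs = 2 * ORDER_REF + 2 * output_coin + SIGNATURE

print(tx_hash_refs)   # 1680 bits with hash references
print(tx_order_refs)  # 1296 bits with total-order references: "maybe less"
```
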
At scale you inevitably have a large number of clients and a small number
of full peers. Say several hundred peers, a few billion clients, most of them
lightning gateways. So we can assume every peer has a good connection.

A typical, moderately good, home connection is thirty Mbps download but
its upload connection is only ten Mbps or so.

So if our peers are typical decent home connections, and they will be a lot
better than that, bandwidth limits them to adding transactions at 10 Mbps,
six thousand transactions per second, Visa card magnitude. Though if such
a large number of transactions are coming in so fast, blockchain storage
requirements will be very large, around 24 TiB, about three or four
standard home desktop system disk drives. But by the time we get to that
scale all peers will be expensive dedicated systems, rather than a
background process using its owner's spare storage and spare bandwidth,
running on the same desktop that its owner uses to
shop at Amazon.

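
The six thousand figure is just the upload link divided by the transaction size estimated earlier:

```python
# The "six thousand transactions per second" figure: a 10 Mbps upload
# link divided by the ~1680 bit transaction estimate.

TX_BITS = 1680
UPLOAD_BPS = 10_000_000  # 10 Mbps upload

tps = UPLOAD_BPS // TX_BITS
print(tps)  # 5952: Visa card magnitude
```
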
Which, if everyone in the world is buying their lollipops on the blockchain,
will still need most people using the lightning network layer, rather than
the blockchain layer, but everyone will still routinely access the blockchain
layer directly, thus ensuring that problems with their lightning
gateways are resolved by a peer they can choose, rather than resolved by
their lightning network wallet provider, thus ensuring that we can have a
truly decentralized lightning network.

We will not necessarily *get* a truly decentralized lightning layer, but a base
layer capable of handling a lot of transactions makes it physically possible.

So if bandwidth is not a problem, why is Bitcoin so slow?

The bottleneck in Bitcoin is that to avoid too many forks, which waste time
with fork resolution, you need a fair bit of consensus on the previous block
before you form the next block.

And Bitcoin consensus is slow, because the way a fork is resolved is that
miners that received one branch of the fork first continue to work on that
branch, while miners that received the other branch first continue to work
on that branch, until one branch gets ahead of the other branch, whereupon
the leading branch spreads rapidly through the peers. With proof of stake,
that is not going to work: one can lengthen a branch as fast as one pleases.
Instead, each branch has to be accompanied by evidence of the weight of
stake of peers on that branch. Which means the winning branch can start
spreading immediately.

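
The stake-weight rule can be sketched as follows; the peer names, stakes, and endorsement lists are hypothetical:

```python
# A sketch of the stake-weight rule above: each branch circulates with
# evidence of the stake endorsing it, and the heavier branch wins
# immediately, with no need to out-lengthen the other branch.

def branch_weight(endorsements):
    """Total stake endorsing a branch, counting each peer once."""
    return sum(dict(endorsements).values())

branch_a = [("alice", 30), ("bob", 20)]
branch_b = [("carol", 25), ("dave", 10), ("carol", 25)]  # duplicate counted once

winner = "a" if branch_weight(branch_a) > branch_weight(branch_b) else "b"
print(winner)  # a: 50 stake beats 35
```
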
# Blockdag to the rescue

On a blockdag, you don’t need a fair bit of consensus on the previous
block to avoid too many forks forming. Every peer is continually forming
his own fork, and these forks reach consensus about their left great grand
child, or left great great … great grandchild. The blocks that eventually
become the consensus as leftmost blocks form a blockchain. So we can
roll right ahead, and groups of blocks that deviate from the consensus,
which is all of them but one, eventually get included, but later in the total
order than they initially thought they were.

In a blockdag, each block has several children, instead of just one. The
total order starting from any one block is depth first search. The left
blocks come before the right blocks, and the child blocks come before the
parent block. Each block may be referenced by several different parent
blocks, but only the first reference in the total order matters.

Each leftmost block defines the total order of all previous blocks, the
total order being the dag in depth first order.

Each peer disagrees with all the other peers about the total order of recent
blocks and recent transactions, each is its own fork, but they all agree
about the total order of older blocks and older transactions.

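
That total order can be sketched as a depth-first traversal; the block names and the `children` map here are hypothetical, standing in for the references a block carries:

```python
# The total order described above: depth-first from a tip block, leftmost
# child first, children before their parent, counting only the first
# reference to each block.

def total_order(block, children):
    """children maps a block id to the blocks it references, leftmost first."""
    seen, order = set(), []

    def visit(b):
        if b in seen:          # only the first reference matters
            return
        seen.add(b)
        for child in children.get(b, ()):   # left blocks before right blocks
            visit(child)
        order.append(b)        # child blocks before the parent block

    visit(block)
    return order

# Two parents sharing a child: the shared block is ordered by its
# first (leftmost) reference only.
dag = {"tip": ["a", "b"], "a": ["g"], "b": ["g"], "g": []}
print(total_order("tip", dag))  # ['g', 'a', 'b', 'tip']
```
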
## Previous work

[There are umpteen proposals for blockdags](./SoK_Diving_into_DAG-based_Blockchain_Systems), most of them garbage, but the general principle is sound.

For a bunch of algorithms that plausibly claim to approach the upload
limit, see:

* [Scalable and probabilistic leaderless bft consensus through metastability](https://files.avalabs.org/papers/consensus.pdf)

  This explains the underlying concept: a peer looks at the dag,
  makes its best guess as to which way consensus is going, and joins
  the seeming consensus, which makes it more likely to become the
  actual consensus.

  Which is a good way of making arbitrary choices where it does not
  matter which choice everyone makes, provided that they all make
  the same choice, even though it is an utterly disastrous way of
  making choices where the choice matters.

  This uses an algorithm that rewards fast mixing peers by making
  their blocks appear earlier in the total order. This algorithm does
  not look incentive compatible to me. It looks to me that if all the
  peers are using that algorithm, then any one peer has an incentive
  to use a slightly different algorithm.

  The authors use the term Byzantine fault incorrectly, referring to
  behavior that suggests the unpredictable failures of an unreliable
  data network as Byzantine failure. No, a Byzantine fault suggests
  Byzantine defection, treachery, and failure to follow process. It is
  named after Byzantium because of the stuff that happened during
  the decline of the Byzantine empire.

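
  The metastable sampling loop can be illustrated with a toy simulation, in the spirit of the paper's simplest protocol (Slush) but heavily simplified here: every peer repeatedly polls a few random peers and adopts the majority colour, so an evenly split network tips toward one arbitrary but common choice.

  ```python
  import random

  # Toy metastable sampling: each peer polls k random peers per round
  # and adopts the majority colour among the sample.

  def slush_round(colours, k=5):
      """One synchronous round: every peer polls k random peers."""
      return ["red" if random.choices(colours, k=k).count("red") * 2 > k
              else "blue"
              for _ in colours]

  random.seed(0)
  peers = ["red"] * 50 + ["blue"] * 50
  rounds = 0
  while len(set(peers)) > 1 and rounds < 1000:
      peers = slush_round(peers)
      rounds += 1
  print(len(set(peers)))  # 1: the network settles on a single colour
  ```
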
* [Prism: Deconstructing the blockchain to approach physical limits](https://arxiv.org/pdf/1810.08092.pdf)

  A messy, unclear, and overly complicated proposed implementation
  of the blockdag algorithm, which, however, makes the important
  point that it can go mighty fast, that the physical limits on
  consensus are bandwidth, storage, and communication delay, and
  that we can approach these limits.

* [Blockmania: from block dags to consensus](https://arxiv.org/pdf/1809.01620.pdf)

  This brings in the important concept that the tree structure created by
  gossiping the blockdag around _is_ the blockdag, and also is the data
  you need to create consensus, bringing together things that were
  separate in Prism, radically simplifying what is complicated in
  Prism by uniting data and functionality that Prism divided.

  This study shows that the Blockmania implementation of the
  blockdag is equivalent to the Practical Byzantine Fault Tolerant
  consensus algorithm, only a great deal faster, more efficient, and
  considerably easier to understand.

  The Practical Byzantine Fault Tolerant consensus algorithm is an
  implementation of the Paxos protocol in the presence of Byzantine
  faults, and the Paxos protocol is already hard enough to understand.

  So anyone who wants to implement consensus in a system where
  Byzantine failure and Byzantine defection is possible should forget
  about Paxos, and study blockdags.

* [A highly scalable, decentralized dag-based consensus algorithm](https://eprint.iacr.org/2018/1112.pdf)

  Another blockdag algorithm, but one whose performance has been
  tested. It can handle high bandwidth, lots of transactions, and
  achieves fast Byzantine fault resistant total order consensus in
  time $O(6\lambda)$, where $\lambda$ is the upper bound of the
  network’s gossip period.

* [Blockchain-free cryptocurrencies: A framework for truly decentralised fast transactions](https://eprint.iacr.org/2016/871.pdf)

  These transactions are indeed truly decentralized, fast, and free from
  blocks, assuming all participants download the entire set of
  transactions all the time.

  The problem with this algorithm is that when the blockchain grows
  enormous, most participants will become clients, and only a few giant
  peers will keep the whole transaction set, and this system, because it
  does not provide a total order of all transactions, will then place all
  the power in the hands of the peers.

  We would like the clients to have control of their private
  keys, thus they must publish their public keys with the money they
  spend, in which case the giant peers must exchange blocks of
  information containing those keys, and it is back to having blocks.

  The defect of this proposal is that it converges not to a total
  order on all past transactions, but merely to a total set of all past
  transactions. Since the graph is a graph of transactions, not blocks,
  double spends are simply excluded, so a total order is not needed.
  While you can get by with a total set, a total order enables you to do
  many things a total set does not let you do, such as publish two
  conflicting transactions and resolve them.

  A total order can represent consensus decisions that a total set cannot
  easily represent, perhaps cannot represent at all. We need a
  blockdag algorithm that gives us consensus on the total order of
  blocks, not just the set of blocks.

  In a total order, you do not just converge to the same set, you
  converge to the same order of the set. Having the same total order
  of the set makes it, among other things, a great deal easier
  and faster to check that you have the same set. Plus your set can
  contain double spends, which you are going to need if the clients
  themselves can commit transactions through the peers, if the clients
  themselves hold the secret keys and do not need to trust the peers.

# Proposed blockdag implementation

The specific details of many of these proposed systems are rather silly and
often vague, typical academic exercises unconcerned with real world
issues, but the general idea that the academics intend to illustrate is sound
and should work, certainly can be made to work. They need to be
understood as academic illustrations of the idea of the general algorithm
for fast and massive blockdag consensus, and not necessarily intended as
ready to roll implementations of that idea.

Here is an even more vague outline of my variant of this idea, which I
name Yabca, “Yet another blockdag consensus algorithm”.

I propose proof of stake. The stake of a peer is not the stake it owns, but
the stake that it has injected into the blockchain on behalf of its clients
and that its clients have not spent yet. Each peer pays on behalf of its
clients for the amount of space it takes up on the blockchain, though it does
not pay in each block. It makes an advance payment that will cover many
transactions in many blocks. The money disappears, built in deflation,
instead of built in inflation. Each block is a record of what a peer has
injected.

The system does not pay the peers for generating a total order of
transactions. Clients pay peers for injecting transactions. We want the
power to be in the hands of people who own the money, thus governance will
have a built in bias towards appreciation and deflation, rather than
inflation.

The special sauce that makes each proposed blockdag different from each
of the others is how each peer decides what consensus is forming about
the leftmost edge of the dag, the graph analysis that each peer performs.
And this, my special sauce, I will explain when I have something running.

Each peer adopts as its leftmost child for its latest block a previous block
that looks like a good candidate for consensus, which looks like a good
candidate for consensus because its left child has a left child that looks
like consensus actually is forming around that grandchild, in part because
the left child has a … left child has a … left child that looks like it might
have consensus, until eventually, as new blocks pile on top of old blocks, we
actually do get consensus about the leftmost child sufficiently deep in
the dag from the latest blocks.

The blockdag can run fast because all the forks that are continually
forming eventually get stuffed into the consensus total order somewhere.
So we don’t have to impose a speed limit to prevent excessive forking.

# Cost of storage on the blockchain.

Tardigrade charges $120 per year per terabyte of storage, and $45 per terabyte of download.

We have a pile of theory, though no practical experience, suggesting that a blockdag can approach the physical limits, and that its limits are going to be bandwidth and storage.

Storage on the blockdag is going to cost more, because it is massively
replicated, so say three hundred times as much, and it is going to be
optimized for tiny fragments of data while Tardigrade is optimized for
enormous blocks of data, so say three times as much on top of that: a
thousand times as expensive to store should be in the right ballpark.

When you download, you are downloading from only a single peer on the blockdag, but you are downloading tiny fragments dispersed over a large pile of data, so again, a thousand times as expensive to download sounds like it might be in the right ballpark.

Then storing a chain of keys and the accompanying roots of total state,
with one new key per day for ten years, will cost about two dollars over
ten years.

Ten megabytes is a pretty big pile of human readable documentation. Let
us suppose you want to store ten megabytes of human readable data, and
read and write access costs a thousand times what Tardigrade costs: that
will cost about twelve dollars.

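
A back-of-envelope check of those figures, assuming the quoted Tardigrade price, the thousand-fold multiplier, and a hypothetical five-hundred-byte record for each daily key:

```python
# Rough check of the cost figures above. The 500 byte per-key record
# size is a hypothetical stand-in, not a figure from the text.

TARDIGRADE_STORE = 120 / 1e6   # dollars per MB per year ($120/TB/year)
MULTIPLIER = 1000              # replication times tiny-fragment overhead

blockdag_store = TARDIGRADE_STORE * MULTIPLIER   # about $0.12 per MB per year

# Ten megabytes of documentation kept for ten years:
print(round(10 * blockdag_store * 10))            # about $12

# A chain of keys: one record per day for ten years:
key_chain_mb = 3650 * 500 / 1e6
print(round(key_chain_mb * blockdag_store * 10))  # about $2
```
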
So, we should consider the blockdag as an immutable store of arbitrary
typed data, a reliable broadcast channel, where some types are executable,
and, when executed, cause a change in mutable total state, typically that
a new unspent coin record is added, and an old unspent coin record is
deleted.

In another use, a valid update to a chain of signatures should cause a
change in the signature associated with a name, the association being
mutable state controlled by immutable data. Thus we can implement
corporations on the blockdag by a chain of signatures, each of which
represents [an n of m multisig](./PracticalLargeScaleDistributedKeyGeneration.pdf "Practical Large Scale Distributed Key Generation").