---
title: Paxos
...

Paxos addresses Arrow’s theorem, and the difficulty of having a reliable
broadcast channel. You want a fifty percent vote, so you don’t want anyone
voting for two candidates, at least not until one vote has been invalidated
by timeout, and you want to somehow have a single arbitrarily selected
agenda to vote up or down. And, having been voted up, you don’t want
anything else to be voted up, so that you can definitely know when an
agenda has been selected.

But Paxos assumes that many of these problems, such as who is eligible to
vote and what their vote is worth, have solutions that have been somewhat
arbitrarily predetermined by the engineer setting things up, and that we
don’t have the problem of the Roman Senate and popular assembly, where only
about a third of the Senate actually showed up to vote, and an
insignificant number of those eligible to vote in the popular assembly
showed up, most of them clients of senators. So Paxos is not worried about
people gaming the system to exclude voters they do not want, nor worried
about people gaming the somewhat arbitrary preselection of the agenda to
be voted up and down.

# Analysing [Paxos Made Simple] in terms of Arrow and total order broadcast

[Paxos Made Simple]:./paxos-simple.pdf

The trouble with Lamport’s proposal, described in [Paxos Made Simple], is
that it assumes no Byzantine failure, and that therefore the total order
broadcast channel is trivial, and it assumes that any proposal will be
acceptable, that all anyone cares about is converging on one proposal,
therefore it always converges on the first proposal accepted by one
acceptor.

> 1. A proposer chooses a new proposal number n and sends a `prepare n`
>    request to each member of some set of acceptors, asking it to respond
>    with:
>    a. A promise never again to accept a proposal numbered less than
>       `n`, and
>    b. The proposal with the highest number less than `n` that it has
>       accepted, if any.
> 2. If the proposer receives the requested responses from a majority of
>    the acceptors, then it can issue an `accept n v` request with number
>    `n` and value `v`, where `v` is the value of the highest-numbered
>    proposal among the responses, or is any value selected by the
>    proposer if the responders reported no proposals accepted.

So the proposer either signs on with the existing possible consensus `v`,
and notifies a bunch of acceptors of the consensus, or initiates a new
possible consensus `v`.

The assumption that `n` can be arbitrary seems to assume the proposers are
all agents of the same human, so do not care which proposal is accepted.
But we intend that they be agents of different humans. But let us figure
out how everything fits together before critiquing that.

> if an acceptor ignores a `prepare` or `accept` request because it has
> already received a prepare request with a higher number, then it should
> probably inform the proposer, who should then abandon its proposal. This
> is a performance optimization that does not affect correctness.

If a majority of acceptors accept some proposal, then we have a result.
But we do not yet have everyone, or indeed anyone, knowing the result.
Whereupon we have the total order broadcast channel problem, which Lamport
hand waves away. The learners are going to learn it. Somehow.

And once we have a result accepted, we then happily go on to the next
round. Well, suppose the leader’s proposal is just intolerable? Lamport
assumes a level of concord that is unlikely to exist.
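To make the quoted rules concrete, here is a minimal sketch of the
acceptor side. The class layout, message tuples, and field names are my
own illustrative assumptions, not anything from Lamport’s paper.

```python
# Minimal sketch of a single Paxos acceptor, following the two quoted
# rules. Message shapes are illustrative assumptions.

class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest prepare number promised
        self.accepted_n = -1    # number of the highest proposal accepted
        self.accepted_v = None  # value of that proposal, if any

    def on_prepare(self, n):
        # Rule 1: promise never again to accept a proposal numbered less
        # than n, and report the highest-numbered proposal accepted so far.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted_n, self.accepted_v)
        return ("nack", self.promised_n)  # the optimization quoted above

    def on_accept(self, n, v):
        # Rule 2: accept unless a higher-numbered prepare has been promised.
        if n >= self.promised_n:
            self.promised_n = self.accepted_n = n
            self.accepted_v = v
            return ("accepted", n, v)
        return ("nack", self.promised_n)
```

The proposer side then adopts the highest-numbered `accepted_v` reported
in the promises, or its own value if none was reported.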
As for proposers who want some different value accepted: well, screw them,
they have to propose the same value already accepted by an acceptor. So we
are going to get a definite result. The worst case outcome is that
proposers keep issuing new higher numbered proposals before any proposal
is accepted. Lamport is assuming no Byzantine failure – but, assuming no
Byzantine failure, this is going to generate a definite result sooner or
later.

But because we could have overlapping proposals with no acceptances,
Lamport concludes, we need a “distinguished proposer”, a leader, the
primus inter pares, the chief executive officer. As an efficiency
requirement, not a requirement to reach consensus.

The trouble with Lamport’s Paxos algorithm is that as soon as it becomes
known that one acceptor has accepted one proposal, everyone should
converge to it. But suppose everyone becomes aware of the proposal, and
49% of acceptors think it is great, and 51% of acceptors think it sucks
intolerably? If a leader’s proposal could be widely heard, and widely
rejected, we have a case not addressed by Lamport’s Paxos protocol.
Lamport does not appear to think about the case where the leader’s
proposals are rejected because they are just objectionable.

# Analysing [Practical Byzantine Fault Tolerance]

[Practical Byzantine Fault Tolerance]:./byzantine_paxos.pdf

[Practical Byzantine Fault Tolerance] differs from [Paxos Made Simple] in
having “views”, where a change of leader (what they call “the primary”) is
accomplished by a change of “view”, and in having three phases,
pre-prepare, prepare, and commit, instead of two phases, prepare and
accept.

Pre-prepare is the “primary” (leader, CEO, primus inter pares) notifying
the “replicas” (peers) of the total order of a client message.

Prepare is the “replicas” (peers, total order broadcast channel) notifying
each other of the association between total order and message digest.

Commit is the “replicas” and the client learning that $33\%+1$ of the
“replicas” (peers, total order broadcast channel) agree on the total order
of the client’s message.

# Analysing Raft Protocol

The [raft protocol] is inherently insecure against Byzantine attacks,
because the leader is fully responsible for managing log replication on
the other servers of the cluster.

[raft protocol]: https://ipads.se.sjtu.edu.cn/_media/publications/wang_podc19.pdf

We obviously want the pool to be replicated peer to peer, with the primus
inter pares (leader, what [Practical Byzantine Fault Tolerance] calls “the
primary”) organizing the peers to vote for one block after they have
already come close to agreement, the only differences being transactions
not yet widely circulated, or disagreements over which of two conflicting
spends is to be incorporated in the next block.

I am pretty sure that this mapping of Byzantine Paxos to blockchain is
garbled, confused, and incorrect. I am missing something, misunderstanding
something; there are a bunch of phases that matter which I am leaving out,
unaware that I have left them out. I will have to revisit this.

The Paxos protocol should be understood as a system wherein peers agree on
a total ordering of transactions. Each transaction happens within a block,
and each block has a sequential integer identifier. Each transaction
within a valid block must be non-conflicting with every other transaction
within the block and consistent with all past transactions, so that
although the block defines a total order on every transaction within the
block, all transactions can be applied in parallel.
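That validity rule can be made concrete with a small sketch. Modelling a
transaction as a set of consumed inputs and a set of created outputs is my
own illustrative assumption; the papers say nothing about the transaction
format.

```python
# Sketch of the block validity rule: every transaction must be consistent
# with past state, and no two transactions within the block may conflict,
# so the whole block can be applied in parallel.

def block_is_valid(block_txs, unspent):
    """block_txs: list of (inputs, outputs) pairs, inputs and outputs
    being sets of output identifiers.
    unspent: set of outputs available before this block."""
    consumed = set()
    for inputs, outputs in block_txs:
        for i in inputs:
            if i not in unspent:  # inconsistent with past transactions
                return False
            if i in consumed:     # conflicts with a sibling transaction
                return False
        consumed |= inputs
    return True
```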
The problem is that we need authoritative agreement on what transactions
are part of block N.

Proposed transactions flood fill through the peers. A single distinguished
entity must propose a block: the pre-prepare message. Notice of this
proposal, the root hash of the proposal, flood fills through the peers,
and peers notify each other of this proposal and that they are attempting
to synchronize on it. Synchronizing on the block and validating it are
likely to require huge amounts of bandwidth and processing power, and will
take significant time.

If a peer successfully synchronizes, he issues a prepare message. If
something is wrong with the block, he issues a nack, a vote against the
proposed block, but the nack is informational only. It signals that peers
should get ready for a view change, but it is an optimization only. If a
peer receives a voting majority of prepare messages, he issues a commit
message (see the sketch below).

And that is the Paxos protocol for that block of transactions. We then go
into the Paxos protocol for the block of prepare messages that proves a
majority voted “prepare”. The block of prepare messages chains to the
block of transactions, and the block of transactions chains to the
previous block of prepare messages.

And if time goes by and we have not managed a commit, perhaps because
there are a lot of nacks due to bad transactions, perhaps because the
primus inter pares claims to have transactions that not everyone has and
is then unable to provide them (maybe the internet went down for it),
peers become open to another pre-prepare message from the next in line to
be primus inter pares.

In order that we can flood fill, we have to be able to simultaneously
synchronize on several different views of the pool of transactions. If
synchronization on a proposed block is stalling, perhaps because of
missing transactions, we end up synchronizing on multiple proposed blocks
proposed by multiple entities. Which is great if one runs to completion
and the others do not, but we are likely to run into multiple proposed
blocks succeeding. In which case, we have what
[Castro and Liskov](./byzantine_paxos.pdf) call a view change.

If things are not going anywhere, a peer issues a view change message
which flood fills around the pool, nominating the next peer in line for
primus inter pares, or the peer nominated in the existing pool of view
change messages. When it has a majority for a view change in its pool, it
will then pay attention to a pre-prepare message from the new primus inter
pares (which it may well have already received), and the new primus inter
pares restarts the protocol for deciding on the next block, issuing a new
pre-prepare message (likely one that many of the peers have already
synchronized on).

A peer has a pre-prepare message. He has a recent vote that the entity
that issued the pre-prepare message is primus inter pares. He has, or
synchronizes on, the block whose root hash is the one specified in that
pre-prepare message issued by that primus inter pares. When he has the
block, and it is valid, then away we go. He flood fills a prepare message,
which prepare message chains to the block, and to the latest vote for
primus inter pares.

Each new state of the blockchain has a final root hash at its peak, and
each peer that accepts the new state of the blockchain has a pile of
commits that add up to a majority committing to this new peak. But it is
probable that different peers will have different piles of commits.
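Here is the promised sketch of a peer’s handling of one block round:
prepare once it has synchronized on the proposed block and found it valid,
commit once a voting majority of peers have prepared. The names, the
message tuples, and the simple more-than-half majority rule are my own
assumptions.

```python
# Sketch of one peer's view of a single block round. flood_fill is
# assumed to be a callable that gossips a message to the other peers.

class BlockRound:
    def __init__(self, root_hash, n_peers, my_id, flood_fill):
        self.root_hash = root_hash
        self.n_peers = n_peers
        self.my_id = my_id
        self.flood_fill = flood_fill
        self.prepared_by = set()
        self.committed = False

    def on_synchronized(self, block_valid):
        # prepare on successful sync; nack (informational only) if
        # something is wrong with the block
        kind = "prepare" if block_valid else "nack"
        self.flood_fill((kind, self.root_hash, self.my_id))

    def on_prepare(self, peer_id):
        # commit once a voting majority of peers have prepared
        self.prepared_by.add(peer_id)
        if not self.committed and 2 * len(self.prepared_by) > self.n_peers:
            self.committed = True
            self.flood_fill(("commit", self.root_hash, self.my_id))
```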
Whenever an entity wants to change the blockchain state (issues a
pre-prepare message), it will propose an addition to the blockchain that
contains one particular pile of commits validating the previous state of
the blockchain. Thus each block contains one specific view of the
consensus validating the previous root. But it does not contain the
consensus validating itself. That is in the pool, not yet in the
blockchain.

A change of primus inter pares is itself a change in blockchain state, so
when the new primus inter pares issues a new proposed commit, which is to
say a new pre-prepare message, it is going to contain, in addition to the
pile of commits that the peers may have already synchronized on, one
specific view of the pile of what
[Castro and Liskov](./byzantine_paxos.pdf) call view change messages.

A valid block is going to have a valid pre-prepare message from a valid
primus inter pares, and, if necessary, the vote for that primus inter
pares. But the change in primus inter pares, for each peer, took place
when that peer had a majority of votes for the primus inter pares, and the
commit to the new block took place when that peer had a majority of votes
for the block. For each peer, the consensus depends on the votes in his
pool, not the votes that subsequently get recorded in the blockchain.

So, a peer will not normally accept a proposed transaction from a client
if it already has a conflicting transaction. There is nothing in the
protocol enforcing this, but if the double spend is coming from Joe Random
Scammer, it is just extra work for that peer and all the other peers. But
once a peer has accepted a double spend transaction, finding consistency
is likely to be easier in the final blockchain, where that transaction
simply has no outputs. Otherwise, for the sake of a premature
optimization, we complicate the algorithm for reaching consensus.

# Fast Byzantine multi Paxos

# Generating the next block

But in the end, we have to generate the next block, which includes some
transactions and not others.

In the bitcoin blockchain, the transactions flood fill through all miners,
one miner randomly wins the right to decide on the block, forms the block,
and the block flood fills through the network.

In our chain, the primus inter pares proposes a block; peers synchronize
on it, which should be fast because they have already synchronized with
each other, and, if there is nothing terribly wrong with it, send in their
signatures. When the primus inter pares has enough signatures, he then
sends out the signature block, containing all the signatures (a sketch of
this round, from the primus inter pares’ side, follows below).

If we have a primus inter pares, and his proposal is acceptable, then
things proceed straightforwardly through the same synchronization process
as we use in flood fill, except that we are focusing on flood filling
older information to make sure everyone is in agreement.

After a block is agreed upon, the peers focus on flood filling around all
the transactions that they possess. This corresponds to the client phase
of Fast Byzantine Collapsed MultiPaxos. When the time for the next block
arrives, they stop flood filling, apart from flood filling any old
transactions that they received before the previous block was agreed but
which were not included in the previous block, and focus on achieving
agreement on those old transactions, plus a subset of their new
transactions. They try to achieve agreement by postponing new
transactions, and flooding old transactions around.
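Here is the promised sketch of the block generation round from the primus
inter pares’ side: propose a root hash, collect signatures from peers that
synchronized successfully, publish the signature block once enough peers
have signed. `broadcast`, `sign_if_valid`, and the `threshold` parameter
are hypothetical names of my own, standing in for the flood fill
machinery.

```python
# Sketch of one block generation round, seen from the primus inter pares.
# broadcast gossips a message to all peers; peer.sign_if_valid returns a
# signature if that peer synchronized on the block and found it valid,
# else None. Both are illustrative assumptions.

def run_round(root_hash, peers, broadcast, threshold):
    broadcast(("pre-prepare", root_hash))
    signatures = {}
    for peer in peers:
        sig = peer.sign_if_valid(root_hash)
        if sig is not None:
            signatures[peer.id] = sig
        if len(signatures) >= threshold:
            # enough signatures: wrap them up in the signature block
            broadcast(("signature-block", root_hash, signatures))
            return signatures
    return None  # no quorum; peers will eventually start a view change
```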
When a peer is in the Paxos phase, a synchronization event corresponds to
what [Castro and Liskov 4.2](./byzantine_paxos.pdf) call a group commit.
Synchronization events with the primus inter pares result in what
[Castro and Liskov](./byzantine_paxos.pdf) call a pre-prepare multicast,
though if the primus inter pares times out, or its proposal is rejected,
we then go into what [Castro and Liskov 4.4](./byzantine_paxos.pdf) call
view changes.

In their proposal, there is a designated sequence, and if one primus inter
pares fails, then you go to the next, and the next, thus reducing the
stalling problem when two entities are trying to be primus inter pares.
They assume an arbitrary sequence number, pre-assigned. We will instead go
through the leading signatories of the previous block, with a succession
of timeouts. “Hey, the previous primus inter pares has not responded.
Let’s hear it from the leading signatory of the previous block. Hey, no
response from him either. Let us try the second leading signatory.” Trying
for a different consensus nominator corresponds to what Castro and Liskov
call “view change”.

The primus inter pares (Castro and Liskov’s primary, whose reign is Castro
and Liskov’s view) issues a proposed root hash for the next block (a short
message). Everyone chats to everyone else, announcing that they are
attempting to synchronize on that root hash, also short messages,
preparatory to long messages as they attempt to synchronize. If they
succeed in generating the proposed block, and there is nothing wrong with
the block, they send approvals (short messages) to everyone else. At some
point the primus inter pares wraps up a critical mass of approvals in an
approval block (a potentially long message). When everyone has a copy of
the proposed block and the approval block, the blockchain has added
another immutable block.

Castro and Liskov’s pre-prepare message is the primary telling everyone
“Hey, let us try for the next block having this root hash.”

Castro and Liskov’s prepare message is each peer telling all the other
peers “Trying to synchronize on this announcement”, confirming that the
primus inter pares is active, and assumed to be sane.

Castro and Liskov’s commit message is each peer telling all the other
peers “I see a voting majority of peers trying to synchronize on this root
hash.”

At this point Castro and Liskov’s protocol is complete – but of course
there is no guarantee that we will be able to synchronize on this root
hash: it might contain invalid transactions, it might reference
transactions that get lost because peers go down or internet
communications are interrupted. So our protocol is not complete. The peers
have not committed to the block. They have committed to commit to the
block if they have it and it is valid.

So, we have a Paxos process where they agree to try for a block of
transactions with one specific root hash. Then everyone tries to sync on
the block. Then we have a second Paxos process where they agree that they
have a block of signatories agreeing that they have the block of
transactions and that it is valid. The block of transactions Merkle chains
to previous blocks of signatories and transactions, and the block of
signatories Merkle chains to previous blocks of transactions and
signatories. Each short message of commitment contains a short proof that
it chains to previous commitments, which a peer already has and has
already committed to.
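The alternating chain is easy to see in a sketch: each block of
transactions chains to the previous block of signatories, and each block
of signatories chains to the block of transactions it approves. The
hashing details here are illustrative assumptions, not a specification.

```python
# Sketch of the alternating Merkle chain of transaction blocks and
# signature blocks described above.

import hashlib

def h(*parts):
    # hash a concatenation of byte strings; returns bytes so that
    # results can be chained into further hashes
    return hashlib.sha256(b"".join(parts)).hexdigest().encode()

def tx_block_hash(tx_root, prev_sig_block):
    # the block of transactions chains to the previous block of signatories
    return h(b"tx", tx_root, prev_sig_block)

def sig_block_hash(sig_root, tx_block):
    # the block of signatories chains to the block of transactions,
    # and through it to all earlier blocks
    return h(b"sig", sig_root, tx_block)

# genesis, then one full round of the two Paxos processes:
genesis = h(b"genesis")
b1 = tx_block_hash(h(b"tx root 1"), genesis)
s1 = sig_block_hash(h(b"signature root 1"), b1)
b2 = tx_block_hash(h(b"tx root 2"), s1)  # the next round chains to s1
```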
When a peer has the block of transactions that a voting majority of peers
have agreed to commit to, and the block is valid, it announces that it has
the block and that the block has majority support, and it goes into the
flood fill transactions phase.

In the flood fill transactions phase, it randomly synchronizes its pool
data with other random peers, each peer giving the other the pool items it
does not yet possess (a sketch of this pairwise sync appears below). It
also enters into a Paxos consensus phase where peers agree on the
authoritative block of signatures, and the time for the next block of
transactions, so that each not only has a majority of signatures, but the
same block of signatures forming a majority.

When the time for forming the next block arrives, it switches to flood
filling only old data around, the point being to converge on a common set
of transactions. After convergence on a common set of transactions has
been going for a while, they expect an announcement of a proposed
consensus on those transactions by the primus inter pares, the pre-prepare
message of the Paxos protocol. They then proceed to Paxos consensus on the
intended block, by way of the prepare and commit messages, followed, if
all goes well, by a voting majority of peers announcing that they have the
block and it is valid.

Our implementation of Byzantine Paxos differs from that of Castro and
Liskov in that a block corresponds to what they call a stable checkpoint,
and also corresponds to what they call a group transaction. If every peer
swiftly received every transaction, and then rejected conflicting
transactions, then it would approximate their protocol, but you cannot say
a transaction is rollback proof until you reach a stable checkpoint, and
if rollbacks are possible, people are going to game the system. We want
authoritative consensus on what transactions have been accepted more than
we want prompt response, as decentralized rollback is apt to be chaotic,
unpredictable, and easily gamed. They prioritized prompt response, while
allowing the possibility of inconsistent response – which was OK in their
application, because no one wanted to manipulate the system into giving
inconsistent responses, and inconsistent responses did not leave angry
clients out of money.
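Returning to the flood fill transactions phase: here is the promised
sketch of the pairwise pool synchronization, each peer giving the other
the pool items it lacks. Representing the pool as a dict keyed by item
hash is an illustrative assumption; a real implementation would exchange
inventories of hashes before shipping the items themselves.

```python
# Sketch of one pairwise synchronization event in the flood fill phase.
# pool_a and pool_b map item hash -> item.

def sync_pools(pool_a, pool_b):
    for k, v in list(pool_a.items()):
        pool_b.setdefault(k, v)  # give B whatever A has and B lacks
    for k, v in list(pool_b.items()):
        pool_a.setdefault(k, v)  # give A whatever B has and A lacks
```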