---
title: Scaling, trust and clients
---

# Client trust

When there are billions of people using the blockchain, it will inevitably
be fully verified by only a few hundred, or at most a few thousand, major
peers, who will inevitably have [interests that do not necessarily coincide]
with those of the billions of users, who will inevitably have only client
wallets.

[interests that do not necessarily coincide]:https://vitalik.ca/general/2021/05/23/scaling.html
"Vitalik Buterin talks blockchain scaling"

And a few hundred seems to be the minimum size required to stop peers
with a lot of clients from doing nefarious things. At scale, we are going to
approach the limits of distributed trust.

There are several cures for this. Well, not cures, but measures that can
alleviate the disease.

None of these are yet implemented, and we will not get around to
implementing them until we start to take over the world. But it is
necessary that what we do implement be upwards compatible with this
scaling design:

## proof of stake

Make the stake of a peer the value of the coins (unspent transaction outputs)
that were injected into the blockchain through that peer. This ensures that
the interests of the peers will be aligned with those of the whales, those
who hold a whole lot of value on the blockchain. Same principle as a well
functioning company board: a company board directly represents major
shareholders, whose interests are for the most part aligned with those of
ordinary shareholders. (This is apt to fail horribly when an accounting or
law firm is on the board, or a converged investment fund.) This measure
gives power to the whales, who do not want their hosts to do nefarious things.
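
As a minimal sketch of that stake rule, assuming every unspent output
records the peer through which it was injected (the `Utxo` record and
`peer_stakes` function below are illustrative, not any existing
implementation):

```python
# Minimal sketch of the stake rule: a peer's stake is the total value of
# unspent outputs that entered the chain through it. All names hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Utxo:
    value: int           # value of the unspent output
    injecting_peer: str  # peer through which the coin entered the chain

def peer_stakes(utxos: list[Utxo]) -> dict[str, int]:
    """Sum each peer's injected, still-unspent value."""
    stakes: dict[str, int] = defaultdict(int)
    for u in utxos:
        stakes[u.injecting_peer] += u.value
    return stakes
```

Under this rule a peer's weight rises and falls with the value its clients
keep on the chain.
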
## client verification

Every single client verifies the transactions that it is directly involved in,
and a subset of the transactions that gave rise to the coins that it receives.

If it verified the ancestry of every coin it received all the way back, it
would have to verify the entire blockchain, but it can verify the biggest
ancestor of the biggest ancestor and a random subset of ancestors, so that
invalid transactions are going to immediately generate problems. If every
client unpredictably verifies a small number of transactions, the net effect
is going to be that most transactions are going to be unpredictably verified
by several clients.
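
A minimal sketch of that sampling policy, with a hypothetical transaction
layout and a stubbed `verify_tx`; the point is the traversal, not the
verification itself:

```python
# Sketch: verify this transaction, its biggest ancestor, and a random
# sample of its other ancestors, recursively. Layout is hypothetical.
import random
from dataclasses import dataclass, field

@dataclass
class Tx:
    inputs: list["TxInput"] = field(default_factory=list)

@dataclass
class TxInput:
    value: int
    source_tx: Tx

def verify_tx(tx: Tx) -> bool:
    """Stub: a real wallet checks signatures and amounts here."""
    return True

def spot_check(tx: Tx, depth: int = 8, sample: int = 2) -> bool:
    if not verify_tx(tx):
        return False
    if depth == 0 or not tx.inputs:
        return True
    ancestors = sorted(tx.inputs, key=lambda i: i.value, reverse=True)
    chosen = [ancestors[0]]  # always follow the biggest ancestor
    chosen += random.sample(ancestors[1:], min(sample, len(ancestors) - 1))
    return all(spot_check(i.source_tx, depth - 1, sample) for i in chosen)
```

Because the sample is unpredictable, a peer serving many clients cannot
know in advance which invalid transactions will escape scrutiny.
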
## sharding, many blockchains

Coins in a shard are shares in [sovereign cipher corporations] whose
primary asset is a coin on the primary blockchain that vests power over
their name and assets in a frequently changing public key. Every time
money moves from the mainchain to a sidechain, or from one sidechain to
another, the old coin is spent, and a new coin is created. The public key on
the mainchain coin corresponds to [a frequently changing secret that is distributed]
between the peers on the sidechain in proportion to their stake.

The mainchain transaction is a big transaction between many sidechains
that contains a single output or input from each sidechain, with each
single input or output from each sidechain representing many single
transactions between sidechains, and each single transaction between
sidechains representing many single transactions between many clients of
each sidechain.

The single big mainchain transaction Merkle chains to the total history of
each sidechain, and each client of a sidechain can verify any state
information about his sidechain against the most recent sidechain
transaction on the mainchain, and routinely does.
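
A sketch of what that routine client check might look like, assuming the
client holds a Merkle branch from its own sidechain record up to the root
committed in the latest mainchain transaction (the hashing scheme is
illustrative, not a specification):

```python
# Sketch: check a sidechain record against the state root committed on the
# mainchain, by walking a Merkle branch from leaf to root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_branch(leaf: bytes, branch: list[tuple[bytes, str]], root: bytes) -> bool:
    """Each branch step supplies the sibling hash and which side it sits on."""
    node = h(leaf)
    for sibling, side in branch:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root
```

The client fetches the root from the most recent mainchain transaction for
its sidechain and the branch from any sidechain peer, trusting neither
alone.
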
## lightning layer

The [lightning layer] is the correct place for privacy and contracts – because
we do not want every transaction, let alone every contract, appearing on
the mainchain. Keeping as much stuff as possible *off* the blockchain helps
with both privacy and scaling.

## zk-snarks

Zk-snarks are not yet a solution. They have enormous potential
benefits for privacy and scaling, but as yet, no one has quite found a way.

A zk-snark is a succinct proof that code *was* executed on an immense pile
of data, and produced the expected, succinct, result. It is a witness that
someone carried out the calculation he claims he did, and that the calculation
produced the result he claimed it did. So not everyone has to verify the
blockchain from beginning to end. And not everyone has to know what
inputs justified what outputs.

The innumerable privacy coins around based on zk-snarks are just not
doing what has to be done to make a zk-snark privacy currency that is
viable at any reasonable scale. They are scams, either intentionally or,
through negligence, unintentionally. All the zk-snark coins are doing the
step from set $N$ of valid coins, valid unspent transaction outputs, to set
$N+1$, in the old fashioned Satoshi way, and sprinkling a little bit of
zk-snark magic privacy pixie dust on top (because the task of producing a
genuine zk-snark proof of coin state for step $N$ to step $N+1$ is just too big
for them). Which is, intentionally or unintentionally, a scam.

Zk-snarks are not yet an effective solution for scaling the blockchain, for to
scale the blockchain you need a concise proof that any spend in the blockchain
was only spent once, and while a zk-snark proving this is concise and
capable of being quickly evaluated by any client, generating the proof is
an enormous task. Lots of work is being done to render this task
manageable, but as yet, last time I checked, not manageable at scale.
Rendering it efficient would be a total game changer, radically changing
the problem.

The fundamental problem is that in order to produce a compact proof that
the set of coins, unspent transaction outputs, of state $N+1$ was validly
derived from the set of coins at state $N$, you actually have to have those
sets of coins, which are not very compact at all, and generate a compact
proof about a tree lookup and cryptographic verification for each of the
changes in the set.

This is an inherently enormous task at scale, which will have to be
factored into many, many subtasks, performed by many, many machines.
Factoring the problem up is hard, for it not only has to be factored, divided
up, it has to be divided up in a way that is incentive compatible, or else
the blockchain is going to fail at scale because of peer misconduct:
transactions are just not going to be validated. Factoring a problem is hard,
and factoring that has to be mindful of incentive compatibility is
considerably harder. I am seeing a lot of good work grappling with the
problem of factoring, dividing the problem into manageable subtasks, but
it seems to be totally oblivious to the hard problem of incentive
compatibility at scale.

Incentive compatibility was Satoshi's brilliant insight, and the client trust
problem is the failure of Satoshi's solution to that problem to scale. Existing
zk-snark solutions fail at scale, though in a different way. With zk-snarks,
the client can verify the zk-snark, but producing a valid zk-snark in the
first place is going to be hard, and will rapidly get harder as the scale
increases.

A zk-snark that succinctly proves that the set of coins (unspent transaction
outputs) at block $N+1$ was validly derived from the set of coins at
block $N$, and that can also prove that any given coin is or is not in that
set, is going to have to be a proof about many, many zk-snarks produced
by many, many machines: a proof about a very large dag of zk-snarks,
each zk-snark a vertex in the dag proving some small part of the validity
of the step from consensus state $N$ of valid coins to consensus state
$N+1$ of valid coins. The owners of each of those machines that produced a
vertex for the step from set $N$ to set $N+1$ will need a reward proportionate
to the task that they have completed, the validity of the reward will
need to be part of the proof, and there will need to be a market in those
rewards, with each vertex in the dag preferring the cheapest source of
child vertexes. Each of the machines would only need to have a small part
of the total state $N$, and a small part of the transactions transforming state
$N$ into state $N+1$. This is hard but doable, but I am just not seeing it done yet.
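
A hedged sketch of the shape of such a dag, with `prove` and `fee_for` as
stubs for a real recursive-snark backend and a real reward market; none of
these names come from an existing library:

```python
# Sketch of a dag of snarks whose root proof also certifies the rewards
# claimed by every machine that contributed a vertex.
from dataclasses import dataclass, field

def prove(claim: bytes, child_proofs: list[bytes]) -> bytes:
    """Stub for a recursive-snark prover; building a real one is the hard part."""
    return b"proof-of:" + claim

def fee_for(claim: bytes) -> int:
    """Stub: market price for producing this vertex."""
    return 1

@dataclass
class SnarkVertex:
    claim: bytes     # the small slice of the N -> N+1 step proved here
    proof: bytes     # snark over the claim and all child proofs
    reward: int      # payment claimed for producing this vertex
    children: list["SnarkVertex"] = field(default_factory=list)

def aggregate(children: list["SnarkVertex"], claim: bytes) -> SnarkVertex:
    """A parent proves that every child proof verified and that every child's
    reward is valid, so the root proof certifies the rewards too."""
    proof = prove(claim, [c.proof for c in children])
    reward = sum(c.reward for c in children) + fee_for(claim)
    return SnarkVertex(claim, proof, reward, children)
```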

I see good [proposals for factoring the work], but I don't see them
addressing the incentive compatibility problem. It needs a whole-picture
design, rather than a part-of-the-picture design. A true zk-snark solution
has to shard the problem of producing state $N+1$, the set of unspent
transaction outputs, from state $N$, so it should also shard the problem of
producing a consensus on the total set and order of transactions.

[proposals for factoring the work]:https://hackmd.io/@vbuterin/das
"Data Availability Sampling Phase 1 Proposal"

### The problem with zk-snarks

Last time I checked, [Cairo] was not ready for prime time.

[Cairo]:https://starkware.co/cairo/
"Cairo - StarkWare Industries Ltd."

Maybe it is ready now.

The two basic problems with zk-snarks are that, even though a zk-snark
proving something about an enormous data set is quite small and can be
quickly verified by anyone, it requires enormous computational resources
to generate the proof, and that the end user has no easy way of knowing
that the verification verifies what it is supposed to verify.

To solve the first problem, we need distributed generation of the proof,
constructing a zk-snark that is a proof about a dag of zk-snarks,
effectively a zk-snark implementation of the map-reduce algorithm for
massive parallelism. In general map-reduce requires trusted shards that
will not engage in Byzantine defection, but with zk-snarks they can be
untrusted, allowing the problem to be massively distributed over the
internet.
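
A sketch of map-reduce with untrusted shards, where each worker returns
its result together with a proof, so the reducer needs no trust in the
workers; `make_proof` and `check_proof` are stubs for a real proving
system, and `process` and `combine` are the ordinary map and reduce
functions:

```python
# Sketch: map-reduce over untrusted workers, made safe by proofs.
def process(chunk):                 # hypothetical map function
    return sorted(chunk)

def combine(results):               # hypothetical reduce function
    return [x for r in results for x in r]

def make_proof(inputs, result) -> bytes:
    return b"stub-proof"            # stub: a real snark prover goes here

def check_proof(result, proof) -> bool:
    return proof == b"stub-proof"   # stub verifier

def map_step(chunk):
    result = process(chunk)
    return result, make_proof(chunk, result)

def reduce_step(parts):
    verified = [r for r, p in parts if check_proof(r, p)]  # reject Byzantine workers
    combined = combine(verified)
    # The reducer proves its own step too, so the final proof certifies
    # the whole tree of untrusted work.
    return combined, make_proof(verified, combined)
```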

To solve the second problem, we need an [intelligible scripting language for
generating zk-snarks], a scripting language that generates serial verifiers
and massively parallel map-reduce proofs.

[intelligible scripting language for
generating zk-snarks]:https://www.cairo-lang.org
"Welcome to Cairo
A Language For Scaling DApps Using STARKs"

Both problems are being actively worked on. Both problems need a good deal
more work, last time I checked. For end user trust in client wallets
relying on zk-snark verification to be valid, at least some of the end
users of client wallets will need to themselves generate the verifiers from
the script.

For trust based on zk-snarks to be valid, a very large number of people
must themselves have the source code to a large program that was
executed on an immense amount of data, and must themselves build and
run the verifier to prove that this code was run on the actual data at least
once, and produced the expected result, even though very few of them will
ever execute that program on actual data, and there is too much data for
any one computer to ever execute the program on all the data.

Satoshi's fundamental design was that all users should verify the
blockchain, which becomes impractical when the blockchain approaches four
hundred gigabytes. A zk-snark design needs to redesign blockchains from
the beginning, with distributed generation of the proof, with the proof for
each step in the chain, from mutable state $N$ to mutable state $N+1$, from set
$N$ of coins, unspent transaction outputs, to set $N+1$ of coins, only being
generated once, or generated a quite small number of times, with its
generation being distributed over all peers through map-reduce, while the
proof is verified by everyone, peer and client.

For good verifier performance, with acceptable prover performance, one
should construct a stark that can be verified quickly, and then produce
a libsnark proof that it was verified at least once ([libsnark proof generation
being costly], but the proofs are very small and quickly verifiable).
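
A sketch of that composition, with stubs standing in for the stark and
libsnark provers (these are not the actual StarkWare or libsnark APIs):

```python
# Sketch: a stark proves the heavy computation; a small snark then proves
# "the stark verifier accepted", which is a small statement.
def stark_prove(computation, witness) -> bytes:
    return b"stark"                  # stub: big proof, cheap to generate

def stark_verify(stark: bytes) -> bool:
    return stark == b"stark"         # stub: fast verifier

def snark_prove(statement, inputs) -> bytes:
    assert statement(inputs)         # stub: prove the verifier accepted
    return b"tiny-snark"

def compose_proof(computation, witness) -> bytes:
    stark = stark_prove(computation, witness)
    # The snark's statement is only "stark_verify(stark) returned True",
    # so the snark stays cheap to produce and tiny for clients to check.
    return snark_prove(statement=stark_verify, inputs=stark)
```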

At the end of the day, we still need the code generating and executing the
verification of zk-snarks to be massively replicated, in order that all
this rigmarole with zk-snarks and starks is actually worthy of producing
trust.

[libsnark proof generation being costly]:https://eprint.iacr.org/2018/046.pdf
"Scalable computational integrity:
section 1.3.2: concrete performance"

This is not a problem I am working on, but I would be happy to see a
solution. I am seeing a lot of scam solutions that sprinkle zk-snarks over
existing solutions as magic pixie dust, like putting wings on a solid fuel
rocket and calling it a space plane.

[lightning layer]:lightning_layer.html

[sovereign cipher corporations]:social_networking.html#many-sovereign-corporations-on-the-blockchain

[a frequently changing secret that is distributed]:multisignature.html#scaling

# sharding within each single very large peer

Sharding within a single peer is an easier problem than sharding the
blockchain between mutually distrustful peers capable of Byzantine
defection, and the solutions are apt to be more powerful and efficient.

When we go to scale, when we have very large peers on the blockchain,
we are going to have to have sharding within each very large peer, which will
multiprocess in the style of Google's massively parallel multiprocessing,
where scaling and multiprocessing are embedded in interactions with the
massively distributed database: either we build on top of an existing
distributed database such as Rlite or Cockroach, or we extend the
consensus algorithm so that the shards of each cluster form their own
distributed database, or we extend the consensus algorithm so that peers can
shard. As preparation for the latter possibility, we need to have each peer
only form gossip events with a small and durable set of peers with which it
has lasting relationships, because the events, as we go to scale, tend to
have large and unequal costs and benefits for each peer. Durable
relationships make sharding possible, but we will not worry too much about
sharding until a forty terabyte blockchain comes in sight.

For sharding, each peer has a copy of a subset of the total blockchain, and
some peers have a parity set of many such subsets. Each peer has a subset
of the set of unspent transaction outputs as of consensus on total order at
one time, and is working on constructing a subset of the set of unspent
transactions as of a recent consensus on total order. Each peer has all the
root hashes of all the balanced binary trees of all the subsets, but not all
the subsets. Each peer has durable relationships with a set of peers that
together have the entire collection of subsets, and two durable relationships
with peers that have parity sets of all the subsets.

Each subset of the append-only immutable set of transactions is represented
by a balanced binary tree of hashes representing $2^n$ blocks of
the blockchain, and each subset of the mutable set of unspent transaction
outputs is a subsection of the Merkle-patricia tree of transaction outputs,
which is part of a directed acyclic graph of all consensus sets of all past
consensus states of transaction outputs; but no one keeps that entire graph
around once it gets too big, as it rapidly will, only various subsets of it.

But they keep around the hashes that can prove that any subset of it was
part of the consensus at some time.
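
A sketch of the split between what every peer keeps and what only some
peers keep, with hypothetical field names:

```python
# Sketch: a sharded peer stores every root hash, but the bodies of only
# its own slice of the data.
from dataclasses import dataclass, field

@dataclass
class ShardedPeerState:
    # Roots everyone keeps: one per balanced binary tree over 2^n blocks,
    # and one per consensus Merkle-patricia tree of unspent outputs.
    block_range_roots: dict[int, bytes] = field(default_factory=dict)
    utxo_roots: dict[int, bytes] = field(default_factory=dict)
    # Bodies only this peer keeps: its own slice of the chain.
    local_block_ranges: dict[int, list[bytes]] = field(default_factory=dict)

    def holds(self, range_index: int) -> bool:
        """A peer can serve proofs only for ranges it stores, but can check
        any proof against the roots, which everyone stores."""
        return range_index in self.local_block_ranges
```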

Gossip vertexes immutably added to the immutable chain of blocks will
contain the total hash of the state of unspent transactions as of a previous
consensus block; thus the immutable and ever growing blockchain will contain
an immutable record of all past consensus Merkle-patricia trees of
unspent transaction outputs, and thus of the past consensus about the
dynamic and changing state resulting from the immutable set of all past
transactions.

For very old groups of blocks to be discardable, it will from time to time be
necessary to add repeat copies of old transaction outputs that are still
unspent, so that the old transactions that gave rise to them can be
discarded, and one can then re-evaluate the state of the blockchain starting
from the middle, rather than the very beginning.
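
A sketch of that refresh rule, with a hypothetical `Output` record and
`reissue` helper:

```python
# Sketch: periodically re-emit still-unspent outputs older than a horizon,
# so the blocks that originally created them can be discarded.
from dataclasses import dataclass

@dataclass
class Output:
    value: int
    created_at: int  # block height at which this output was created

    def reissue(self, at_height: int) -> "Output":
        """Repeat copy of the same output, recorded at a new height."""
        return Output(self.value, at_height)

def checkpoint(utxo_set: list[Output], height: int, horizon: int):
    """Return repeat copies of old unspent outputs, and the height below
    which old blocks may now be discarded."""
    stale = [u for u in utxo_set if u.created_at < height - horizon]
    refreshed = [u.reissue(at_height=height) for u in stale]
    # Once the repeat copies reach consensus, validation can start from
    # this checkpoint instead of from the genesis block.
    return refreshed, height - horizon
```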