---
title: Scaling, trust and clients
---

# Client trust

When there are billions of people using the blockchain, it will inevitably
be fully verified by only a few hundred, or at most a few thousand, major
peers, who will inevitably have [interests that do not necessarily coincide]
with those of the billions of users, who will have only client wallets.

[interests that do not necessarily coincide]:https://vitalik.ca/general/2021/05/23/scaling.html
"Vitalik Buterin talks blockchain scaling"

And a few hundred seems to be the minimum size required to stop peers
with a lot of clients from doing nefarious things. At scale, we are going to
approach the limits of distributed trust.

There are several cures for this. Well, not cures, but measures that can
alleviate the disease.

None of these are yet implemented, and we will not get around to
implementing them until we start to take over the world. But it is
necessary that what we do implement be upwards compatible with this
scaling design:

## proof of stake

Make the stake of a peer the value of coins (unspent transaction outputs)
that were injected into the blockchain through that peer. This ensures that
the interests of the peers will be aligned with those of the whales, those
who hold a whole lot of value on the blockchain. Same principle as a well
functioning company board: a company board directly represents major
shareholders, whose interests are for the most part aligned with those of
ordinary shareholders. (This is apt to fail horribly when an accounting or
law firm is on the board, or a converged investment fund.) This measure
gives power to the whales, who do not want their hosts to do nefarious things.
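
The stake rule is simple enough to state in a few lines. A minimal sketch,
assuming a hypothetical UTXO set of (injecting peer, value) pairs rather
than the actual data model:

```python
# Hedged sketch: a peer's stake is the value of the still-unspent
# outputs that entered the blockchain through that peer. The
# (injecting_peer, value) shape of the UTXO set is an illustrative
# assumption, not the real data structure.
def peer_stake(peer_id, utxo_set):
    return sum(value for injecting_peer, value in utxo_set
               if injecting_peer == peer_id)
```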

## client verification

Every single client verifies the transactions that it is directly involved in,
and a subset of the transactions that gave rise to the coins that it receives.

If it verified the ancestry of every coin it received all the way back, it
would have to verify the entire blockchain, but it can verify the biggest
ancestor of the biggest ancestor and a random subset of ancestors, so
invalid transactions are going to immediately generate problems. If every
client unpredictably verifies a small number of transactions, the net effect
is that most transactions are going to be unpredictably verified by
several clients.
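
A minimal sketch of this random spot check, assuming hypothetical
`get_transaction` and `verify` helpers and an in-memory transaction shape
with `inputs`, `value`, and `source_txid` fields; the sampling parameters
are illustrative:

```python
import random

SAMPLES_PER_HOP = 2   # illustrative tuning parameter
MAX_DEPTH = 8         # how far back to walk the ancestry

def spot_check(txid, get_transaction, verify, depth=0):
    """Verify a transaction, always follow its biggest ancestor,
    and sample a few of the others, to a bounded depth."""
    tx = get_transaction(txid)
    if not verify(tx):
        return False                        # an invalid ancestor surfaces at once
    if depth >= MAX_DEPTH or not tx.inputs:
        return True
    by_value = sorted(tx.inputs, key=lambda i: i.value, reverse=True)
    sampled = random.sample(by_value[1:], min(SAMPLES_PER_HOP, len(by_value) - 1))
    for txin in [by_value[0]] + sampled:    # biggest ancestor plus a random subset
        if not spot_check(txin.source_txid, get_transaction, verify, depth + 1):
            return False
    return True
```

Because the sample is unpredictable, a peer hosting an invalid transaction
cannot know which clients will look at it, only that some eventually will.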

## sharding, many blockchains

Coins in a shard are shares in [sovereign cipher corporations] whose
primary asset is a coin on the primary blockchain that vests power over
their name and assets in a frequently changing public key. Every time
money moves from the main chain to a sidechain, or from one sidechain to
another, the old coin is spent, and a new coin is created. The public key on
the mainchain coin corresponds to [a frequently changing secret that is distributed]
between the peers on the sidechain in proportion to their stake.

The mainchain transaction is a big transaction between many sidechains
that contains a single output or input from each sidechain, with each
single input or output from each sidechain representing many single
transactions between sidechains, and each single transaction between
sidechains representing many single transactions between many clients of
each sidechain.

The single big mainchain transaction merkle chains to the total history of
each sidechain, and each client of a sidechain can verify any state
information about his sidechain against the most recent sidechain
transaction on the mainchain, and routinely does.
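
A minimal sketch of such a check, verifying a piece of sidechain state
against a root hash committed in the mainchain transaction; the audit-path
layout and the use of sha256 are assumptions, not the actual wire format:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_against_mainchain(leaf: bytes, path, mainchain_root: bytes) -> bool:
    """path: list of (sibling_hash, sibling_is_left) pairs climbing
    from the leaf to the root committed on the mainchain."""
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == mainchain_root
```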

## lightning layer

The [lightning layer] is the correct place for privacy and contracts, because
we do not want every transaction, let alone every contract, appearing on
the mainchain. Keeping as much stuff as possible *off* the blockchain helps
with both privacy and scaling.

## zk-snarks

Zk-snarks are not yet a solution. They have enormous potential
benefits for privacy and scaling, but as yet, no one has quite found a way.

A zk-snark is a succinct proof that code *was* executed on an immense pile
of data, and produced the expected, succinct, result. It is a witness that
someone carried out the calculation he claims he did, and that calculation
produced the result he claimed it did. So not everyone has to verify the
blockchain from beginning to end. And not everyone has to know what
inputs justified what outputs.

The innumerable privacy coins based on zk-snarks are just not
doing what has to be done to make a zk-snark privacy currency that is
viable at any reasonable scale. They are intentionally scams, or by
negligence, unintentionally scams. All the zk-snark coins are doing the
step from set $N$ of valid coins, valid unspent transaction outputs, to set
$N+1$, in the old fashioned Satoshi way, and sprinkling a little bit of
zk-snark magic privacy pixie dust on top (because the task of producing a
genuine zk-snark proof of coin state for step $N$ to step $N+1$ is just too big
for them). Which is, intentionally or unintentionally, a scam.

Zk-snarks are not yet an effective solution for scaling the blockchain, for
to scale the blockchain, you need a concise proof that any coin in the
blockchain was spent only once, and while a zk-snark proving this is concise
and capable of being quickly evaluated by any client, generating the proof is
an enormous task. Lots of work is being done to render this task
manageable, but as yet, last time I checked, it was not manageable at scale.
Rendering it efficient would be a total game changer, radically changing
the problem.

The fundamental problem is that in order to produce a compact proof that
the set of coins, unspent transaction outputs, of state $N+1$ was validly
derived from the set of coins at state $N$, you actually have to have those
sets of coins, which are not very compact at all, and generate a compact
proof about a tree lookup and cryptographic verification for each of the
changes in the set.

This is an inherently enormous task at scale, which will have to be
factored into many, many subtasks, performed by many, many machines.
Factoring the problem up is hard, for it not only has to be factored, divided
up, it has to be divided up in a way that is incentive compatible, or else
the blockchain is going to fail at scale because of peer misconduct:
transactions are just not going to be validated. Factoring a problem is hard,
and factoring that has to be mindful of incentive compatibility is
considerably harder. I am seeing a lot of good work grappling with the
problem of factoring, dividing the problem into manageable subtasks, but
it seems to be totally oblivious to the hard problem of incentive
compatibility at scale.

Incentive compatibility was Satoshi's brilliant insight, and the client trust
problem is a failure of Satoshi's solution to that problem to scale. Existing
zk-snark solutions fail at scale, though in a different way. With zk-snarks,
the client can verify the zk-snark, but producing a valid zk-snark in the
first place is going to be hard, and will rapidly get harder as the scale
increases.

A zk-snark that succinctly proves that the set of coins (unspent transaction
outputs) at block $N+1$ was validly derived from the set of coins at
block $N$, and can also prove that any given coin is or is not in that set,
is going to have to be a proof about many, many zk-snarks produced
by many, many machines, a proof about a very large dag of zk-snarks,
each zk-snark a vertex in the dag proving some small part of the validity
of the step from consensus state $N$ of valid coins to consensus state
$N+1$ of valid coins. The owners of each of those machines that produced a
tree vertex for the step from set $N$ to set $N+1$ will need a reward
proportionate to the task that they have completed, the validity of the
reward will need to be part of the proof, and there will need to be a market
in those rewards, with each vertex in the dag preferring the cheapest source
of child vertexes. Each of the machines would only need to have a small part
of the total state $N$, and a small part of the transactions transforming state
$N$ into state $N+1$. This is hard but doable, but I am just not seeing it done yet.
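
A hedged sketch of the shape of such a dag, with the reward claimed by each
machine carried in its vertex so that the validity of rewards can be part of
the proof; `ProofVertex` and its fields are illustrative, not a real
recursive-snark API:

```python
from dataclasses import dataclass, field

@dataclass
class ProofVertex:
    claim: bytes      # commitment to the small part of the N -> N+1 step proven here
    reward: int       # reward claimed by the machine that produced this vertex
    children: list["ProofVertex"] = field(default_factory=list)

def total_reward(v: ProofVertex) -> int:
    """Rewards sum up the dag: the root proof attests every claim
    and every reward beneath it."""
    return v.reward + sum(total_reward(c) for c in v.children)
```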

I see good [proposals for factoring the work], but I don't see them
addressing the incentive compatibility problem. It needs a whole picture
design, rather than a part of the picture design. A true zk-snark solution
has to shard the problem of producing state $N+1$, the set of unspent
transaction outputs, from state $N$, so it should also shard the problem of
producing a consensus on the total set and order of transactions.

[proposals for factoring the work]:https://hackmd.io/@vbuterin/das
"Data Availability Sampling Phase 1 Proposal"

### The problem with zk-snarks

Last time I checked, [Cairo] was not ready for prime time.

[Cairo]:https://starkware.co/cairo/
"Cairo - StarkWare Industries Ltd."

Maybe it is ready now.

The two basic problems with zk-snarks are that, even though a zk-snark
proving something about an enormous data set is quite small and can be
quickly verified by anyone, it requires enormous computational resources to
generate the proof, and that the end user has no way of knowing that the
verification verifies what it is supposed to verify.

To solve the first problem, we need distributed generation of the proof,
constructing a zk-snark that is a proof about a dag of zk-snarks,
effectively a zk-snark implementation of the map-reduce algorithm for
massive parallelism. In general map-reduce requires trusted shards that
will not engage in Byzantine defection, but with zk-snarks they can be
untrusted, allowing the problem to be massively distributed over the
internet.
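
A minimal sketch of this map-reduce shape, assuming hypothetical
`prove_leaf` and `prove_join` provers rather than any real snark library:

```python
def aggregate_proof(shards, prove_leaf, prove_join):
    """Map: prove each shard of the state transition independently.
    Reduce: pairwise join child proofs until one succinct root proof
    remains, attesting that every child proof was verified."""
    layer = [prove_leaf(s) for s in shards]
    while len(layer) > 1:
        joined = [prove_join(a, b) for a, b in zip(layer[0::2], layer[1::2])]
        if len(layer) % 2:
            joined.append(layer[-1])    # carry an odd proof up a level
        layer = joined
    return layer[0]
```

Because each join proves that its children were verified, a dishonest shard
cannot slip an invalid leaf past the root proof, which is what lets the
shards be untrusted.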

To solve the second problem, we need an [intelligible scripting language for
generating zk-snarks], a scripting language that generates serial verifiers
and massively parallel map-reduce proofs.

[intelligible scripting language for
generating zk-snarks]:https://www.cairo-lang.org
"Welcome to Cairo
A Language For Scaling DApps Using STARKs"

Both problems are being actively worked on. Both problems need a good deal
more work, last time I checked. For end user trust in client wallets
relying on zk-snark verification to be valid, at least some of the end
users of client wallets will need to themselves generate the verifiers from
the script.

For trust based on zk-snarks to be valid, a very large number of people
must themselves have the source code to a large program that was
executed on an immense amount of data, and must themselves build and
run the verifier to prove that this code was run on the actual data at least
once, and produced the expected result, even though very few of them will
ever execute that program on actual data, and there is too much data for
any one computer to ever execute the program on all the data.

Satoshi's fundamental design was that all users should verify the
blockchain, which becomes impractical when the blockchain approaches four
hundred gigabytes. A zk-snark design needs to redesign blockchains from
the beginning, with distributed generation of the proof, with the proof for
each step in the chain, from mutable state $N$ to mutable state $N+1$, from
set $N$ of coins, unspent transaction outputs, to set $N+1$ of coins, being
generated only once or a quite small number of times, with its generation
distributed over all peers through map-reduce, while the proof is verified
by everyone, peer and client.

For good verifier performance, with acceptable prover performance, one
should construct a stark that can be verified quickly, and then produce
a libsnark proof that it was verified at least once ([libsnark proof generation
being costly], but the proofs are very small and quickly verifiable).

At the end of the day, we still need the code generating and executing the
verification of zk-snarks to be massively replicated, in order that all
this rigmarole with zk-snarks and starks is actually worthy of producing
trust.

[libsnark proof generation being costly]:https://eprint.iacr.org/2018/046.pdf
"Scalable computational integrity:
section 1.3.2: concrete performance"

This is not a problem I am working on, but I would be happy to see a
solution. I am seeing a lot of scam solutions that sprinkle zk-snarks over
existing solutions as magic pixie dust, like putting wings on a solid fuel
rocket and calling it a space plane.

[lightning layer]:lightning_layer.html

[sovereign cipher corporations]:social_networking.html#many-sovereign-corporations-on-the-blockchain

[a frequently changing secret that is distributed]:multisignature.html#scaling

# sharding within each single very large peer

Sharding within a single peer is an easier problem than sharding the
blockchain between mutually distrustful peers capable of Byzantine
defection, and the solutions are apt to be more powerful and efficient.

When we go to scale, when we have very large peers on the blockchain,
we are going to have to have sharding within each very large peer, which will
multiprocess in the style of Google's massively parallel multiprocessing,
where scaling and multiprocessing are embedded in interactions with the
massively distributed database: either we build on top of an existing
distributed database such as Rlite or Cockroach, or we extend the
consensus algorithm so that the shards of each cluster form their own
distributed database, or we extend the consensus algorithm so that peers can
shard. As preparation for the latter possibility, we need to have each peer
only form gossip events with a small and durable set of peers with which it
has lasting relationships, because the events, as we go to scale, tend to
have large and unequal costs and benefits for each peer. Durable
relationships make sharding possible, but we will not worry too much about
sharding until a forty terabyte blockchain comes in sight.

For sharding, each peer has a copy of a subset of the total blockchain, and
some peers have a parity set of many such subsets; each peer has a subset
of the set of unspent transaction outputs as of consensus on total order at
one time, and is working on constructing a subset of the set of unspent
transactions as of a recent consensus on total order; each peer has all the
root hashes of all the balanced binary trees of all the subsets, but not all
the subsets; each peer has durable relationships with a set of peers that
together have the entire collection of subsets, and two durable
relationships with peers that have parity sets of all the subsets.
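
A minimal sketch of a parity set over equal-sized subsets, using simple XOR
parity for illustration (a real deployment would more plausibly use a proper
erasure code such as Reed-Solomon); any one lost subset is recoverable from
the survivors plus the parity block:

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    # subsets are assumed padded to equal length
    return bytes(x ^ y for x, y in zip(a, b))

def parity(subsets: list[bytes]) -> bytes:
    """One parity block covering every subset in the set."""
    return reduce(xor_blocks, subsets)

def recover_lost(surviving: list[bytes], par: bytes) -> bytes:
    """XORing the parity block with all surviving subsets yields
    the single missing subset."""
    return reduce(xor_blocks, surviving, par)
```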

Each subset of the append only immutable set of transactions is represented
by a balanced binary tree of hashes representing $2^n$ blocks of
the blockchain, and each subset of the mutable set of unspent transaction
outputs is a subsection of the Merkle-patricia tree of transaction outputs,
which is part of a directed acyclic graph of all consensus sets of all past
consensus states of transaction outputs, but no one keeps that entire graph
around once it gets too big, as it rapidly will, only various subsets of it.
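
A minimal sketch of the balanced binary hash tree over a group of $2^n$
blocks; sha256 stands in for whatever hash the chain actually uses:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def group_root(blocks: list[bytes]) -> bytes:
    """Root hash over 2**n blocks; a peer holding only this root can
    still check membership proofs for blocks it does not store."""
    assert blocks and (len(blocks) & (len(blocks) - 1)) == 0, "expects 2**n blocks"
    layer = [h(b) for b in blocks]
    while len(layer) > 1:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]
```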

But they keep the hashes around that can prove that any subset of it was
part of the consensus at some time.

Gossip vertexes immutably added to the immutable chain of blocks will
contain the total hash of the state of unspent transactions as of a previous
consensus block; thus the immutable and ever growing blockchain will contain
an immutable record of all past consensus Merkle-patricia trees of
unspent transaction outputs, and thus of the past consensus about the
dynamic and changing state resulting from the immutable set of all past
transactions.

For very old groups of blocks to be discardable, it will from time to time be
necessary to add repeat copies of old transaction outputs that are still
unspent, so that the old transactions that gave rise to them can be
discarded, and one can then re-evaluate the state of the blockchain starting
from the middle, rather than the very beginning.
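
A hedged sketch of that re-commitment rule; every name here is
illustrative, not the actual data model:

```python
def recommit_old_utxos(utxo_set, horizon_height, current_block):
    """Any output created before the horizon that is still unspent
    gets a fresh copy in the current block, so replay can start from
    this block instead of the genesis block, and the block groups
    before the horizon can eventually be discarded."""
    for utxo in utxo_set:
        if utxo.created_height < horizon_height:
            current_block.outputs.append(utxo.repeat_copy())
```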