---
title: Scaling, trust and clients
---

# Client trust

When there are billions of people using the blockchain, it will inevitably
be fully verified by only a few hundred, or at most a few thousand, major
peers, who will inevitably have [interests that do not necessarily coincide]
with those of the billions of users, who will inevitably have only client
wallets.

[interests that do not necessarily coincide]:https://vitalik.ca/general/2021/05/23/scaling.html
"Vitalik Buterin talks blockchain scaling"

And a few hundred seems to be the minimum size required to stop peers
with a lot of clients from doing nefarious things. At scale, we are going to
approach the limits of distributed trust.

There are several cures for this. Well, not cures, but measures that can
alleviate the disease.

None of these are yet implemented, and we will not get around to
implementing them until we start to take over the world. But it is
necessary that what we do implement be upwards compatible with this
scaling design:

## proof of stake

Make the stake of a peer the value of the coins (unspent transaction outputs)
that were injected into the blockchain through that peer. This ensures that
the interests of the peers will be aligned with those of the whales, those
who hold a whole lot of value on the blockchain. Same principle as a well
functioning company board: a company board directly represents major
shareholders, whose interests are for the most part aligned with those of
ordinary shareholders. (This is apt to fail horribly when an accounting or
law firm is on the board, or a converged investment fund.) This measure
gives power to the whales, who do not want their hosts to do nefarious things.
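
As a minimal sketch of that stake rule, assuming every unspent output
records the peer through which it was injected (the `Utxo` record and
`peer_stakes` function below are illustrative, not any existing
implementation):

```python
# Minimal sketch of the stake rule: a peer's stake is the total value of
# unspent outputs that entered the chain through it. All names hypothetical.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Utxo:
    value: int           # value of the unspent output
    injecting_peer: str  # peer through which the coin entered the chain

def peer_stakes(utxos: list[Utxo]) -> dict[str, int]:
    """Sum each peer's injected, still-unspent value."""
    stakes: dict[str, int] = defaultdict(int)
    for u in utxos:
        stakes[u.injecting_peer] += u.value
    return stakes
```

Under this rule a peer's weight rises and falls with the value its clients
keep on the chain.
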
## client verification

Every single client verifies the transactions that it is directly involved in,
and a subset of the transactions that gave rise to the coins that it receives.

If it verified the ancestry of every coin it received all the way back, it
would have to verify the entire blockchain, but it can verify the biggest
ancestor of the biggest ancestor and a random subset of ancestors, so that
invalid transactions are going to immediately generate problems. If every
client unpredictably verifies a small number of transactions, the net effect
is going to be that most transactions are going to be unpredictably verified
by several clients.
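
A minimal sketch of that sampling policy, with a hypothetical transaction
layout and a stubbed `verify_tx`; the point is the traversal, not the
verification itself:

```python
# Sketch: verify this transaction, its biggest ancestor, and a random
# sample of its other ancestors, recursively. Layout is hypothetical.
import random
from dataclasses import dataclass, field

@dataclass
class Tx:
    inputs: list["TxInput"] = field(default_factory=list)

@dataclass
class TxInput:
    value: int
    source_tx: Tx

def verify_tx(tx: Tx) -> bool:
    """Stub: a real wallet checks signatures and amounts here."""
    return True

def spot_check(tx: Tx, depth: int = 8, sample: int = 2) -> bool:
    if not verify_tx(tx):
        return False
    if depth == 0 or not tx.inputs:
        return True
    ancestors = sorted(tx.inputs, key=lambda i: i.value, reverse=True)
    chosen = [ancestors[0]]  # always follow the biggest ancestor
    chosen += random.sample(ancestors[1:], min(sample, len(ancestors) - 1))
    return all(spot_check(i.source_tx, depth - 1, sample) for i in chosen)
```

Because the sample is unpredictable, a peer serving many clients cannot
know in advance which invalid transactions will escape scrutiny.
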
## sharding, many blockchains

Coins in a shard are shares in [sovereign cipher corporations] whose
primary asset is a coin on the primary blockchain that vests power over
their name and assets in a frequently changing public key. Every time
money moves from the mainchain to a sidechain, or from one sidechain to
another, the old coin is spent, and a new coin is created. The public key on
the mainchain coin corresponds to [a frequently changing secret that is distributed]
between the peers on the sidechain in proportion to their stake.

The mainchain transaction is a big transaction between many sidechains
that contains a single output or input from each sidechain, with each
single input or output from each sidechain representing many single
transactions between sidechains, and each single transaction between
sidechains representing many single transactions between many clients of
each sidechain.

The single big mainchain transaction Merkle chains to the total history of
each sidechain, and each client of a sidechain can verify any state
information about his sidechain against the most recent sidechain
transaction on the mainchain, and routinely does.
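
A sketch of what that routine client check might look like, assuming the
client holds a Merkle branch from its own sidechain record up to the root
committed in the latest mainchain transaction (the hashing scheme is
illustrative, not a specification):

```python
# Sketch: check a sidechain record against the state root committed on the
# mainchain, by walking a Merkle branch from leaf to root.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_branch(leaf: bytes, branch: list[tuple[bytes, str]], root: bytes) -> bool:
    """Each branch step supplies the sibling hash and which side it sits on."""
    node = h(leaf)
    for sibling, side in branch:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root
```

The client fetches the root from the most recent mainchain transaction for
its sidechain and the branch from any sidechain peer, trusting neither
alone.
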
## lightning layer

The [lightning layer] is the correct place for privacy and contracts – because
we do not want every transaction, let alone every contract, appearing on
the mainchain. Keeping as much stuff as possible *off* the blockchain helps
with both privacy and scaling.

## zk-snarks

Zk-snarks are not yet a solution. They have enormous potential
benefits for privacy and scaling, but as yet, no one has quite found a way.

A zk-snark is a succinct proof that code *was* executed on an immense pile
of data, and produced the expected, succinct, result. It is a witness that
someone carried out the calculation he claims he did, and that the calculation
produced the result he claimed it did. So not everyone has to verify the
blockchain from beginning to end. And not everyone has to know what
inputs justified what outputs.

The innumerable privacy coins around based on zk-snarks are just not
doing what has to be done to make a zk-snark privacy currency that is
viable at any reasonable scale. They are scams, either intentionally or,
through negligence, unintentionally. All the zk-snark coins are doing the
step from set $N$ of valid coins, valid unspent transaction outputs, to set
$N+1$, in the old fashioned Satoshi way, and sprinkling a little bit of
zk-snark magic privacy pixie dust on top (because the task of producing a
genuine zk-snark proof of coin state for step $N$ to step $N+1$ is just too big
for them). Which is, intentionally or unintentionally, a scam.

Zk-snarks are not yet an effective solution for scaling the blockchain, for to
scale the blockchain you need a concise proof that any spend in the blockchain
was only spent once, and while a zk-snark proving this is concise and
capable of being quickly evaluated by any client, generating the proof is
an enormous task. Lots of work is being done to render this task
manageable, but as yet, last time I checked, not manageable at scale.
Rendering it efficient would be a total game changer, radically changing
the problem.

The fundamental problem is that in order to produce a compact proof that
the set of coins, unspent transaction outputs, of state $N+1$ was validly
derived from the set of coins at state $N$, you actually have to have those
sets of coins, which are not very compact at all, and generate a compact
proof about a tree lookup and cryptographic verification for each of the
changes in the set.

This is an inherently enormous task at scale, which will have to be
factored into many, many subtasks, performed by many, many machines.
Factoring the problem up is hard, for it not only has to be factored, divided
up, it has to be divided up in a way that is incentive compatible, or else
the blockchain is going to fail at scale because of peer misconduct:
transactions are just not going to be validated. Factoring a problem is hard,
and factoring that has to be mindful of incentive compatibility is
considerably harder. I am seeing a lot of good work grappling with the
problem of factoring, dividing the problem into manageable subtasks, but
it seems to be totally oblivious to the hard problem of incentive
compatibility at scale.

Incentive compatibility was Satoshi's brilliant insight, and the client trust
problem is the failure of Satoshi's solution to that problem to scale. Existing
zk-snark solutions fail at scale, though in a different way. With zk-snarks,
the client can verify the zk-snark, but producing a valid zk-snark in the
first place is going to be hard, and will rapidly get harder as the scale
increases.

A zk-snark that succinctly proves that the set of coins (unspent transaction
outputs) at block $N+1$ was validly derived from the set of coins at
block $N$, and that can also prove that any given coin is or is not in that
set, is going to have to be a proof about many, many zk-snarks produced
by many, many machines: a proof about a very large dag of zk-snarks,
each zk-snark a vertex in the dag proving some small part of the validity
of the step from consensus state $N$ of valid coins to consensus state
$N+1$ of valid coins. The owners of each of those machines that produced a
vertex for the step from set $N$ to set $N+1$ will need a reward proportionate
to the task that they have completed, the validity of the reward will
need to be part of the proof, and there will need to be a market in those
rewards, with each vertex in the dag preferring the cheapest source of
child vertexes. Each of the machines would only need to have a small part
of the total state $N$, and a small part of the transactions transforming state
$N$ into state $N+1$. This is hard but doable, but I am just not seeing it done yet.
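
A hedged sketch of the shape of such a dag, with `prove` and `fee_for` as
stubs for a real recursive-snark backend and a real reward market; none of
these names come from an existing library:

```python
# Sketch of a dag of snarks whose root proof also certifies the rewards
# claimed by every machine that contributed a vertex.
from dataclasses import dataclass, field

def prove(claim: bytes, child_proofs: list[bytes]) -> bytes:
    """Stub for a recursive-snark prover; building a real one is the hard part."""
    return b"proof-of:" + claim

def fee_for(claim: bytes) -> int:
    """Stub: market price for producing this vertex."""
    return 1

@dataclass
class SnarkVertex:
    claim: bytes     # the small slice of the N -> N+1 step proved here
    proof: bytes     # snark over the claim and all child proofs
    reward: int      # payment claimed for producing this vertex
    children: list["SnarkVertex"] = field(default_factory=list)

def aggregate(children: list["SnarkVertex"], claim: bytes) -> SnarkVertex:
    """A parent proves that every child proof verified and that every child's
    reward is valid, so the root proof certifies the rewards too."""
    proof = prove(claim, [c.proof for c in children])
    reward = sum(c.reward for c in children) + fee_for(claim)
    return SnarkVertex(claim, proof, reward, children)
```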

I see good [proposals for factoring the work], but I don't see them
addressing the incentive compatibility problem. It needs a whole-picture
design, rather than a part-of-the-picture design. A true zk-snark solution
has to shard the problem of producing state $N+1$, the set of unspent
transaction outputs, from state $N$, so it should also shard the problem of
producing a consensus on the total set and order of transactions.

[proposals for factoring the work]:https://hackmd.io/@vbuterin/das
"Data Availability Sampling Phase 1 Proposal"

### The problem with zk-snarks

Last time I checked, [Cairo] was not ready for prime time.

[Cairo]:https://starkware.co/cairo/
"Cairo - StarkWare Industries Ltd."

Maybe it is ready now.

The two basic problems with zk-snarks are that, even though a zk-snark
proving something about an enormous data set is quite small and can be
quickly verified by anyone, it requires enormous computational resources
to generate the proof, and that the end user has no easy way of knowing
that the verification verifies what it is supposed to verify.

To solve the first problem, we need distributed generation of the proof,
constructing a zk-snark that is a proof about a dag of zk-snarks,
effectively a zk-snark implementation of the map-reduce algorithm for
massive parallelism. In general map-reduce requires trusted shards that
will not engage in Byzantine defection, but with zk-snarks they can be
untrusted, allowing the problem to be massively distributed over the
internet.
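
A sketch of map-reduce with untrusted shards, where each worker returns
its result together with a proof, so the reducer needs no trust in the
workers; `make_proof` and `check_proof` are stubs for a real proving
system, and `process` and `combine` are the ordinary map and reduce
functions:

```python
# Sketch: map-reduce over untrusted workers, made safe by proofs.
def process(chunk):                 # hypothetical map function
    return sorted(chunk)

def combine(results):               # hypothetical reduce function
    return [x for r in results for x in r]

def make_proof(inputs, result) -> bytes:
    return b"stub-proof"            # stub: a real snark prover goes here

def check_proof(result, proof) -> bool:
    return proof == b"stub-proof"   # stub verifier

def map_step(chunk):
    result = process(chunk)
    return result, make_proof(chunk, result)

def reduce_step(parts):
    verified = [r for r, p in parts if check_proof(r, p)]  # reject Byzantine workers
    combined = combine(verified)
    # The reducer proves its own step too, so the final proof certifies
    # the whole tree of untrusted work.
    return combined, make_proof(verified, combined)
```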

To solve the second problem, we need an [intelligible scripting language for
generating zk-snarks], a scripting language that generates serial verifiers
and massively parallel map-reduce proofs.

[intelligible scripting language for
generating zk-snarks]:https://www.cairo-lang.org
"Welcome to Cairo
A Language For Scaling DApps Using STARKs"

Both problems are being actively worked on. Both problems need a good deal
more work, last time I checked. For end user trust in client wallets
relying on zk-snark verification to be valid, at least some of the end
users of client wallets will need to themselves generate the verifiers from
the script.

For trust based on zk-snarks to be valid, a very large number of people
must themselves have the source code to a large program that was
executed on an immense amount of data, and must themselves build and
run the verifier to prove that this code was run on the actual data at least
once, and produced the expected result, even though very few of them will
ever execute that program on actual data, and there is too much data for
any one computer to ever execute the program on all the data.

Satoshi's fundamental design was that all users should verify the
blockchain, which becomes impractical when the blockchain approaches four
hundred gigabytes. A zk-snark design needs to redesign blockchains from
the beginning, with distributed generation of the proof, with the proof for
each step in the chain, from mutable state $N$ to mutable state $N+1$, from set
$N$ of coins, unspent transaction outputs, to set $N+1$ of coins, only being
generated once, or generated a quite small number of times, with its
generation being distributed over all peers through map-reduce, while the
proof is verified by everyone, peer and client.

For good verifier performance, with acceptable prover performance, one
should construct a stark that can be verified quickly, and then produce
a libsnark proof that it was verified at least once ([libsnark proof generation
being costly], but the proofs are very small and quickly verifiable).
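
A sketch of that composition, with stubs standing in for the stark and
libsnark provers (these are not the actual StarkWare or libsnark APIs):

```python
# Sketch: a stark proves the heavy computation; a small snark then proves
# "the stark verifier accepted", which is a small statement.
def stark_prove(computation, witness) -> bytes:
    return b"stark"                  # stub: big proof, cheap to generate

def stark_verify(stark: bytes) -> bool:
    return stark == b"stark"         # stub: fast verifier

def snark_prove(statement, inputs) -> bytes:
    assert statement(inputs)         # stub: prove the verifier accepted
    return b"tiny-snark"

def compose_proof(computation, witness) -> bytes:
    stark = stark_prove(computation, witness)
    # The snark's statement is only "stark_verify(stark) returned True",
    # so the snark stays cheap to produce and tiny for clients to check.
    return snark_prove(statement=stark_verify, inputs=stark)
```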

At the end of the day, we still need the code generating and executing the
verification of zk-snarks to be massively replicated, in order that all
this rigmarole with zk-snarks and starks is actually worthy of producing
trust.

[libsnark proof generation being costly]:https://eprint.iacr.org/2018/046.pdf
"Scalable computational integrity:
section 1.3.2: concrete performance"

This is not a problem I am working on, but I would be happy to see a
solution. I am seeing a lot of scam solutions that sprinkle zk-snarks over
existing solutions as magic pixie dust, like putting wings on a solid fuel
rocket and calling it a space plane.

[lightning layer]:lightning_layer.html

[sovereign cipher corporations]:social_networking.html#many-sovereign-corporations-on-the-blockchain

[a frequently changing secret that is distributed]:multisignature.html#scaling

# sharding within each single very large peer

Sharding within a single peer is an easier problem than sharding the
blockchain between mutually distrustful peers capable of Byzantine
defection, and the solutions are apt to be more powerful and efficient.

When we go to scale, when we have very large peers on the blockchain,
we are going to have to have sharding within each very large peer, which will
multiprocess in the style of Google's massively parallel multiprocessing,
where scaling and multiprocessing are embedded in interactions with the
massively distributed database: either we build on top of an existing
distributed database such as Rlite or Cockroach, or we extend the
consensus algorithm so that the shards of each cluster form their own
distributed database, or we extend the consensus algorithm so that peers can
shard. As preparation for the latter possibility, we need to have each peer
only form gossip events with a small and durable set of peers with which it
has lasting relationships, because the events, as we go to scale, tend to
have large and unequal costs and benefits for each peer. Durable
relationships make sharding possible, but we will not worry too much about
sharding until a forty terabyte blockchain comes in sight.

For sharding, each peer has a copy of a subset of the total blockchain, and
some peers have a parity set of many such subsets. Each peer has a subset
of the set of unspent transaction outputs as of consensus on total order at
one time, and is working on constructing a subset of the set of unspent
transactions as of a recent consensus on total order. Each peer has all the
root hashes of all the balanced binary trees of all the subsets, but not all
the subsets. Each peer has durable relationships with a set of peers that
together have the entire collection of subsets, and two durable relationships
with peers that have parity sets of all the subsets.

Each subset of the append-only immutable set of transactions is represented
by a balanced binary tree of hashes representing $2^n$ blocks of
the blockchain, and each subset of the mutable set of unspent transaction
outputs is a subsection of the Merkle-patricia tree of transaction outputs,
which is part of a directed acyclic graph of all consensus sets of all past
consensus states of transaction outputs; but no one keeps that entire graph
around once it gets too big, as it rapidly will, only various subsets of it.

But they keep around the hashes that can prove that any subset of it was
part of the consensus at some time.
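
A sketch of the split between what every peer keeps and what only some
peers keep, with hypothetical field names:

```python
# Sketch: a sharded peer stores every root hash, but the bodies of only
# its own slice of the data.
from dataclasses import dataclass, field

@dataclass
class ShardedPeerState:
    # Roots everyone keeps: one per balanced binary tree over 2^n blocks,
    # and one per consensus Merkle-patricia tree of unspent outputs.
    block_range_roots: dict[int, bytes] = field(default_factory=dict)
    utxo_roots: dict[int, bytes] = field(default_factory=dict)
    # Bodies only this peer keeps: its own slice of the chain.
    local_block_ranges: dict[int, list[bytes]] = field(default_factory=dict)

    def holds(self, range_index: int) -> bool:
        """A peer can serve proofs only for ranges it stores, but can check
        any proof against the roots, which everyone stores."""
        return range_index in self.local_block_ranges
```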

Gossip vertexes immutably added to the immutable chain of blocks will
contain the total hash of the state of unspent transactions as of a previous
consensus block; thus the immutable and ever growing blockchain will contain
an immutable record of all past consensus Merkle-patricia trees of
unspent transaction outputs, and thus of the past consensus about the
dynamic and changing state resulting from the immutable set of all past
transactions.

For very old groups of blocks to be discardable, it will from time to time be
necessary to add repeat copies of old transaction outputs that are still
unspent, so that the old transactions that gave rise to them can be
discarded, and one can then re-evaluate the state of the blockchain starting
from the middle, rather than the very beginning.
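
A sketch of that refresh rule, with a hypothetical `Output` record and
`reissue` helper:

```python
# Sketch: periodically re-emit still-unspent outputs older than a horizon,
# so the blocks that originally created them can be discarded.
from dataclasses import dataclass

@dataclass
class Output:
    value: int
    created_at: int  # block height at which this output was created

    def reissue(self, at_height: int) -> "Output":
        """Repeat copy of the same output, recorded at a new height."""
        return Output(self.value, at_height)

def checkpoint(utxo_set: list[Output], height: int, horizon: int):
    """Return repeat copies of old unspent outputs, and the height below
    which old blocks may now be discarded."""
    stale = [u for u in utxo_set if u.created_at < height - horizon]
    refreshed = [u.reissue(at_height=height) for u in stale]
    # Once the repeat copies reach consensus, validation can start from
    # this checkpoint instead of from the genesis block.
    return refreshed, height - horizon
```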