figured out, at least in outline, how to make

a distributed hash table byzantine fault tolerant

Slight clarification on scalability

Figured out how to make variable length integers that
will be represented in correct order in a patricia
merkle tree.
This commit is contained in:
reaction.la 2023-10-13 21:14:31 +10:00
parent 776c18a4a6
commit 06b9fc4017
No known key found for this signature in database
GPG Key ID: 99914792148C8388
7 changed files with 489 additions and 44 deletions

View File

@ -5,6 +5,66 @@ title: Estimating frequencies from small samples
...
# The problem to be solved
## distributed hash table
The distributed hash table fails horribly in the face of a
significant likelihood of bad behaviour by the participants,
because you do not actually know the state of the network.
The usual procedure (Bittorrent network) is to treat information as
unconditionally valid for two hours, then throw it away,
which is pretty useless if a participant is behind a NAT,
and a disastrous loss of data if he has a long lived network address.
We would like to accumulate on disk very long lived
and rarely changing data about long lived participants,
the backbone of the distributed hash table.
We also want to have an arrangement with peers behind a NAT,
that each will ping the other at certain times with a keep-alive,
and if the expected keep-alive fails to arrive, the ensuing nacks and acks
will re-open the hole in the firewall, and also give us
information on how often each needs to ping the other.
When either concludes that the timing of the pings could be improved,
they renegotiate the schedule with each other,
so that peers behind a NAT with long lived holes do not need frequent pings.
At present, a random lookup serves the function of a keep-alive, resulting in
excessive churn in the DHT.
If we represent the state of the distributed hash table with
metalogistic distributions, the resulting distributed hash table
should be tolerant of Byzantine faults.
(Because a Byzantine faulting peer eventually winds up being rated
as unreliable, and the backbone of the distributed hash table will
be long lived peers with long lived reputations, the reputation
being represented by a metalogistic distribution giving the likelihood
that the information supplied is correct.)
Each peer is identified by its durable public key. For each peer
there is its current network address, and a metalogistic distribution
of the longevity of that network address,
which no one keeps around for very long or distributes very far
if it does not indicate much longevity.
There is also a metalogistic distribution of the likelihood
that hole punching will be needed, and if likely to be needed,
a list of peers that might provide it,
and the likelihood that hole punching will work.
If the first peer in the list is up but fails, the next is not tried.
But if the first peer cannot be contacted, the next is contacted.
And, if hole punching is needed, a metalogistic distribution of
how long the hole is likely to last after punching.
And, most importantly, for our backbone of very long lived peers,
metalogistic distributions of the likelihood of Byzantine fault,
which will provide us with a Byzantine fault tolerant distributed hash table.
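
A minimal sketch of what such a peer record might look like (the field names,
sizes, and the fixed number of metalog terms are illustrative assumptions,
not a settled wire format):

```c++
#include <array>
#include <cstdint>
#include <vector>

// Sketch only: a metalogistic distribution stored as the coefficients of its
// quantile function; four terms is an arbitrary illustrative choice.
struct Metalog {
    std::array<double, 4> a;
};

// Sketch only: one entry in the distributed hash table, keyed by the peer's
// durable public key. All field names and sizes are illustrative assumptions.
struct PeerRecord {
    std::array<uint8_t, 32> durable_public_key;   // the peer's durable identity
    std::array<uint8_t, 18> network_address;      // current IP and port
    Metalog address_longevity;     // likely remaining life of that network address
    Metalog needs_hole_punching;   // likelihood that hole punching will be needed
    std::vector<std::array<uint8_t, 32>> hole_punch_helpers;  // next entry tried only
                                   // if the previous one cannot be contacted
    Metalog hole_lifetime;         // how long a punched hole is likely to last
    Metalog honesty;               // likelihood that information it supplies is correct,
                                   // the basis of Byzantine fault tolerance
};
```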
## protocol negotiation
We could also apply distributions to protocol negotiation,
though this is likely to be colossal overkill.
Because protocols need to be changed, improved, and fixed from time to
time, it is essential to have a protocol negotiation step at the start of every networked interaction, and protocol requirements at the start of every store
and forward communication.
@ -27,6 +87,48 @@ sends or requests enough bits to reliably identify that protocol.  But this
means it must estimate probabilities from limited data. If one's data is
limited, priors matter, and thus a Bayesian approach is required.
### should not worry about protocol identifier size for a long time.
The above is massive overkill.
A quick solution, far less clever than accurately guessing that
two entities are speaking the same language, is to find an integer such
that both parties have a Dewey decimal protocol identifier that
starts with the same integer, and then go with the smaller of the
two Dewey Decimal protocol identifiers.
Dewey decimal numbers that start with the same integer should be different
versions of the same protocol, and if one party can handle the
higher numbered version,
he has to be able to handle all lower numbered versions of that same protocol.
Dewey decimal numbers that start with different integers
represent unrelated protocols.
So if the client says 7.3.2.2.1, and the server has only been
updated to 7.2.0, he replies 7.2.0, and both parties then go
with 7.2.0,
but if he only knows 6.3.3, 1.6.0 and 219.1.0, he replies
"fail, unknown protocol".
People launching a new protocol pick an integer,
and if they are not sure what integers are in use,
they just pick a fairly large integer.
In time, we will wind up with a whole lot of integers that are "in use",
the vast majority of which are no longer in use,
and no one is sure which ones are no longer in use,
so for a new protocol, they pick a sufficiently large random number.
(Assuming we represent these integers by variable length quantities
so that we can go to unlimitedly large integers, or at least integers
in the range [0 to 283 trillion](./variable_length_quantity.html){target="_blank"},
which should be unlimited enough for anyone.
In the unlikely event that there are eventually ten million protocols
floating around the internet,
a random number in that range is unlikely to lead to a collision.)
If there were ten million protocols floating around,
then the theoretically optimal way of representing
protocols would only be three or four bytes smaller,
so doing it this easy way is not a significant waste of space.
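
A sketch of that negotiation rule, assuming a Dewey decimal identifier is
held as a vector of integers (the function names here are made up for
illustration, not part of any existing protocol code):

```c++
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

using Dewey = std::vector<uint64_t>;   // e.g. {7,3,2,2,1} for protocol 7.3.2.2.1

// Sketch of the rule described above: if client and server have identifiers
// starting with the same integer, both go with the smaller of the two;
// otherwise the server replies "fail, unknown protocol".
std::optional<Dewey> negotiate(const Dewey& client,
                               const std::vector<Dewey>& server_protocols) {
    for (const Dewey& server : server_protocols) {
        if (!client.empty() && !server.empty() && client[0] == server[0]) {
            // Same protocol family: whoever can handle the higher numbered
            // version must handle all lower numbered versions, so the smaller
            // identifier is the one both parties can speak.
            bool client_smaller = std::lexicographical_compare(
                client.begin(), client.end(), server.begin(), server.end());
            return client_smaller ? client : server;
        }
    }
    return std::nullopt;   // "fail, unknown protocol"
}
```

With this sketch, a client offering 7.3.2.2.1 to a server that only knows
7.2.0 settles on 7.2.0, and a client offering 8.1 to a server that only knows
6.3.3, 1.6.0 and 219.1.0 gets the failure reply.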
# Bayesian Prior
The Bayesian prior is the probability of a probability, or, if this recursion
@ -68,7 +170,7 @@ take three samples, one of which is X, and two of which are not X, then
our new distribution is the Beta distribution $α+1,β+2$
If our distribution is the Beta distribution α,β, then the probability
that the next sample will be X is $\frac{α}{α+β}$
that the next sample will be X is $$\frac{α}{α+β}$$
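
For example, a minimal sketch of that update rule and the resulting
predictive probability:

```c++
// Minimal sketch of the update rule above: each X sample adds one to alpha,
// each non-X sample adds one to beta, and the predictive probability that
// the next sample is X is alpha / (alpha + beta).
struct Beta {
    double alpha;
    double beta;

    void observe(bool is_x) { (is_x ? alpha : beta) += 1.0; }

    double next_is_x() const { return alpha / (alpha + beta); }
};
```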
If $α$ and $β$ are large, then the Beta distribution approximates a delta
function
@ -78,7 +180,7 @@ equally likely.
That, of course, is a pretty good prior, which leads us to the conclusion
that if we have seen $n$ samples that are green, and $m$ samples that are not
green, then the probability of the next sample being green is $\frac{n+1}{n+m+2}$
green, then the probability of the next sample being green is $$\frac{n+1}{n+m+2}$$
Realistically, until we have seen diverse results there is a finite probability
that all samples are X, or all not X, but no beta function describes this
@ -135,6 +237,11 @@ than a boolean.
# A more realistic prior
## The beta distribution
The Beta distribution has the interesting property that for each new test,
the Bayesian update of the Beta distribution is also a Beta distribution.
Suppose our prior, before we take any samples from the urn, is that the probability that the proportion of samples in the urn that are X is ρ is
$$\frac{1}{3}P_{11} (ρ) + \frac{1}{3}δ(ρ) + \frac{1}{3}δ(1-ρ)$$
@ -183,3 +290,35 @@ $$\frac{(n+1)}{n+2}$$
Which corresponds to our intuition on the question “all men are mortal”. If we find no immortals in one hundred men, we think it highly improbable that we will encounter any immortals in a billion men.
In contrast, if we assume the beta distribution, this implies that the likelihood of the run continuing forever is zero.
## the metalog (metalogistic) distribution
The metalogistic distribution is like the Beta distribution in that
its Bayesian update is also a metalogistic distribution, but has more terms,
as many terms as are required for the nature of the thing being represented.
The Beta distribution plus two delta functions is a metalogistic distribution
if we stretch the definition of the metalogistic distribution slightly.
The Beta distribution represents the probability of a probability
(since we are using it for its Bayesian update capability).
For example, we have a collection of urns containing red and blue balls,
and from time to time we draw a ball out of an urn and replace it,
whereupon the Beta distribution is our best guess
about the likelihood that it contains a certain ratio of red and blue balls
(also assuming the urns are enormously large,
and also always contain at least some red and at least some blue balls).
Suppose, however, the jars contain gravel, the size of each piece
of gravel in a jar being normally distributed, and we want to
estimate the size and standard deviation of the gravel in an urn,
rather than the ratio of red balls and blue balls.
(Well, the size $s$ cannot be normally distributed, because $s$ is strictly non negative, but perhaps $\ln(s)$, or $s\ln(s)$, or $(s/a -a/s)$ is normally distributed.)
Whereupon our Bayesian updates become more complex,
and our prior has to contain difficult to justify information
(no boulders or dust in the urns), but we are still doing Bayesian updates,
hence the Beta distribution, and its generalization
the metalogistic distribution, still applies.
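
For concreteness, here is a sketch of the four term metalog quantile
function referred to above (four terms is an arbitrary choice; more terms
can be added as the shape of the data demands):

```c++
#include <array>
#include <cmath>

// Sketch: the four term metalog quantile function. Given a cumulative
// probability y in (0,1), returns the value of the quantity at that quantile.
// The coefficients a are fitted to the data being summarised; its Bayesian
// update is again a metalog, which is why we use it here.
double metalog_quantile(const std::array<double, 4>& a, double y) {
    double logit = std::log(y / (1.0 - y));   // ln(y/(1-y))
    double centred = y - 0.5;
    return a[0] + a[1] * logit + a[2] * centred * logit + a[3] * centred;
}
```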

View File

@ -2,10 +2,259 @@
title: Libraries
...
This discussion is way out of date because a rust recursive snark library
is now available, and making it public would impose a huge burden on me
of keeping it current and accurate, when events would render it continually
out of date.
A review of potentially useful libraries and utilities.
The material here is usually way out of date and frequently wrong.
It should be treated as a bunch of hints likely to point the reader
in the correct direction, so that the reader can do his homework
on the appropriate library. It should not be taken as gospel.
# Recursive snarks
A horde of libraries are rapidly appearing on GitHub,
most of which have stupendously slow performance,
can only generate proofs for absolutely trivial things,
and take a very long time to do so.
[Nova]:https://github.com/microsoft/Nova
{target="_blank"}
[Nova] claims to be fast, is being frequently updated, needs no trusted setup, and other people are writing toy programs using [Nova].
[Nova] claims you can plug in other elliptic curves, though it sounds like you
might need alarmingly considerable knowledge of elliptic curves in order to
do so.
Plonky had a special purpose hash, such that it was
easy to produce recursive proofs about Merkle trees.
I don't know if Nova can do hashes with useful speed, or hashes at all,
without which no recursive snark system is useful.
We need a hash that has a relatively small circuit.
And it appears that no such hash is known.
Nova is built out of commitments, which are about 256 times bigger than a hash.
A Nova proof is a proof about a merkle tree of commitments.
If we build our blockchain out of Nova commitments, it will be about a couple of
hundred times larger than one built out of regular hashes,
but will still only occupy about ten or twenty gigabytes of storage.
Bandwidth limits restrict us to about twenty transactions a second,
which is still faster than the bitcoin blockchain.
Plus, when we hit ten or twenty transactions per second,
we can shard the blockchain, which we can do because each shard can prove
it is telling the truth about transactions, whereas with bitcoin,
every peer has to evaluate every transaction,
lest one shard conspire to cheat the others.
[Nova] does not appear to have a language.
Representing a proof system as a Turing machine just seems like a bad idea.
It is not a Turing machine.
You don't calculate $a=b*c$; you instead prove that
$a=b*c$, when you already somehow knew $a$, $b$, and $c$.
A Turing machine is a state machine. A proof system is not.
It is often said, and is in a sense true, that the prover produces a proof
that for a given computation he knows an input such that after a
correct execution of the computation he obtains a certain public output.
But that is not what he is doing. The proof system proves that relationships hold between values.
And because it can only prove certain rather arcane and special things about
relationships between values, you have to compute a very large number
of intermediate values such that the relationship you actually want to prove
between the input and the output corresponds to simple relationships between
these intermediate values. But computing those intermediate values belongs
in another language, such as C++ or rust.
With Nova, we would get an algorithm such that you start out with your real input.
You create a bunch of intermediate values in a standard language like C++ or rust,
then you call the proof system to produce a data structure
that can be used to prove relationships between your input and those
intermediate values.
Then you produce the next set of intermediate values,
call your proof system to produce a data structure
that can be used to prove the next set of relationships,
fold those two proof generating data structures together,
rinse and repeat,
and at the end you generate a proof that the set of relationships
the fold represents is valid.
That is procedural, but expressing the relationships is not.
Since your fold is the size of the largest hamiltonian circuit so far,
you want the steps to be all of similar size.
This suggests a functional language (sql). There are, in reality,
no purely functional languages for Turing machines.
Haskell has its monads, sql has update, insert, and delete.
But the natural implementation for a proof system would be a truly purely functional language, an sql without update, insert, or delete, without any operations that actually wrote anything to memory or disk, that simply defined relationships without a state machine that changes state to write data into memory consistent with those changes.
The proof script has to be intelligible, and the same for prover and verifier,
the difference being that the prover interleaves the proof language with ordinary code
in an ordinary language, to produce the values that are going to be proven. The prover
drives the script along with ordinary language code, and the verifier drives it along
with different ordinary language code, but the proof definition that is common
to both of them has no concept of being sequential and driven along,
no concept that things are done in any particular order.
It is a graph of relationships.
The proof language, as is typical of purely functional languages,
should consist of assertions about relationships between immutable
data structures, without expressing the idea that some of these
data structures were created at one time, and destroyed at another.
Some of these values are defined recursively, which means that what
is actually going to happen in practice is that they are going to be
created by a loop, written in the ordinary procedural language
such as Rust or C++, but the proof language should have no concept of that.
if the proof language asserts that $1 \leq n \land n<20 \implies f(n-1)= g(f(n))$,
the ordinary procedural language will likely need to
generate the values of $f(n)$ for $n=1$ to $19$,
and will need to cause the proof language to generate proofs for each value
of $n$ from 1 to 19, but the resulting proof will be independent
of the order in which these proofs were generated.
Purely functional languages like sql do not prescribe an algorithm, and need
a code generator that has to make guesses about what a good algorithm would
be, as exemplified by sqlite's likelihood, likely, and unlikely no-ops.
And with a proof system we will, at least at first, have the human
choose the algorithm, but if he changes the algorithm,
while leaving the proof system language unchanged, the proofs will still work.
The open source nature of Plonky is ... complicated.
The repository on Github has been frozen for two years, so likely
does not represent the good stuff.
# Peer to Peer
[p2p]:https://github.com/elenaf9/p2p
{target="_blank"}
[libp2p]:ipns://libp2p.io/
{target="_blank"}
The generically named [p2p] does exactly what you want to do.
It is a thin wrapper around [libp2p] to allow participants in the
Kademlia cloud to find each other and send each other private messages.
Unfortunately Kademlia is broken and extremely vulnerable
to hostile action,
and [libp2p] has only encryption operations around a broken name system,
which they are making even more user hostile than it already is
because they don't want anyone using it (because it is broken).
It uses encryption libraries
for which there is strong reason to suspect enemy activity,
and does not support Schnorr keys, nor Ristretto,
which is an obstacle to scriptless scripts and joint signatures.
The reason they do not support Schnorr is that they are using nonprime groups,
and doing a Schnorr signature safely in a non prime group is incomprehensibly hard
and very easy to get subtly wrong.
The great strength of Ristretto is that it is a prime order group,
which makes a whole lot of clever cryptography available to do clever things,
Schnorr signatures, scriptless scripts, lightning locks,
and compact joint signatures among them.
[libp2p] have a small set of name systems and public key systems,
and it should not be too hard to add yet another to that set.
It appears to be extensible,
because it has to support no end of old obsolete stuff.
Obviously new stuff has been added from time to time,
so it should be possible to find the additions in git and follow their example.
[multiple transport schemes]:ipns://docs.libp2p.io/concepts/transports/listen-and-dial/
{target="_blank"}
[libp2p] supports [multiple transport schemes], and can support a set
of peers using heterogeneous transport.
So you just have to add another transport scheme,
and not everyone has to update simultaneously.
They use TCP and web, but there is a plugin point for new transport schemes,
so just plug in UDP under an encryption and reliability layer.
[libp2p] should make it possible to access the IPFS
both to write and read stuff, though that might be far from trivial.
You could perhaps publish stuff on IPFS that looks like a normal html
document, but contains embedded cryptographic data giving it more forms
of interaction when viewed in your browser than when viewed in Brave.
Replacing Kademlia for finding peers in the face of
enemy entryist action is a big project, though libp2p seems to have
taken the essential step of identifying peers by their public key,
rather than IP and port address.
[implementations]:http://libp2p-io.ipns.localhost:48084/implementations
{target="_blank"}
[distributed hash table]:https://github.com/libp2p/specs/blob/master/kad-dht/README.md
{target="_blank"}
Their [distributed hash table] (kad-dht) seems to be very much a work in progress. Checking their [implementations] page for the status of various libp2p components, *everything*
seems to be very much a work in progress. Some things are implemented
in some languages which are not implemented in other languages. Rust has
hole punching, C++ does not. But C++ has a whole lot of stuff that Rust
does not. And their documentation on kad-dht has its page blank. On the other hand,
ipfs works, and polkadot works. So, something is usable. libp2p-peer is not implemented
in C++, but is implemented in rust, and is implemented in browser javascript.
Their rendezvous protocol presupposes a single central and known server
which simply records everyone's ID and network address. Unacceptable.
Anyone using this is fake and an enemy. Should be disabled in our fork.
libp2p is a pile of odds and ends and a framework for gluing them together.
But everything they do is crippled by the fact that you don't know the
likely uptime, downtime, or IP stability of an entity. Which cannot
in fact be known, but you can form probability of a probability estimates.
What is needed is that everyone forms their own probability of a probability.
And they compare what they know
(they have a high probability of a probability)
with the other party's estimates, and rate the other party's reliability accordingly.
If we add to that a probability of a probability estimate of IP and port stability,
and use it to govern ping time and keep around time, that goes a long way
to solving the problems with Kademlia.
We can adapt it to the problem
by having them preferentially keep around the data for peers
that have stable ip and a stable port, and,
somewhat less preferentially, keep around peers that have a
stable nat penetration or tracking relationship with a peer that has a
stable ip and stable port. Selective pinging. You rarely ping peers
that have been around for a very long time with stable IP, and you
ping a peer that has a nat penetration relationship by not pinging it,
and instead asking the gateway peer how the relationship is going,
at infrequent intervals. Thus, a peer with stable IP
or stable relationship becomes very widely known.
Well, becomes widely known assuming shills do not register
one billion addresses that happen to be near him.
libp2p is something between actual code and a set of standards -
which standards you comply with so that you can use other people's code.
Someone writes something ad hoc for his use case, stuffs it into libp2p
somewhere somehow so that he can use other people's code.
Then other people use his code.
It is a pile of standards (many of them irritatingly stupid, incomplete, ad hoc,
or workarounds for using defective tools) that enable a
whole lot of people writing this sort of thing to copy a whole lot of
each other's code.
Their [NAT discovery algorithm](https://github.com/libp2p/specs/tree/master/autonat){target="_blank"}
is particularly idiotic and broken. It is not a NAT discovery algorithm,
but a closed port discovery algorithm, and a ludicrously laborious,
costly, indirect, error prone, and inefficient closed port discovery algorithm.
NAT means the other guy sees your network address different from what you see.
In which case your port is probably closed, but could well be open.
If he sees the same network address as you, your port might be open,
but you don't know that,
and talking to the other guy might well temporarily open your port,
with the result that he might tell you that you are not behind a NAT,
when in fact you are, and your ports are normally closed.
The guys writing this stuff are dumb as posts,
and a whole lot of what they write is garbage.
But, nonetheless, a whole lot of people are using libp2p,
and a whole lot of people are doing a whole lot of work on it --
not all of which is ready for prime time.
# Wireguard, Tailwind, and identity

Binary file not shown.

View File

@ -47,9 +47,28 @@ This is in part malicious, the enemy pouring mud into the tech waters. So I need
A zk-snark or a zk-stark proves that someone knows something,
knows a pile of data that has certain properties, without revealing
that pile of data. Such that he has a preimage of a certain hash
and that this preimage has certain properties
such as the property of being a valid transaction.
that pile of data.
The prover produces a proof that for a given computation he knows
an input such that after a correct execution of the computation
he obtains a certain public output - the public output typically
being a hash of a transaction, and certain facts about
the transaction. The verifier can verify this without knowing
the transaction, and the verification takes roughly constant time
even if the prover is proving something about an enormous computation,
an enormous number of transactions.
To use a transaction output as the input to another transaction we need
a proof that this output was committed on the public broadcast channel
of the blockchain to this transaction and no other, and a proof that this
output was itself an output from a transaction whose inputs were committed
to that transaction and no other, and that the inputs and outputs of that
transaction balanced.
So the proof has to recursively prove that all the transactions
that are ancestors of this transaction output were valid all the
way back to the beginning of the blockchain.
You can prove an arbitrarily large amount of data
with an approximately constant sized recursive snark.
So you can verify in a quite short time that someone proved
@ -266,6 +285,13 @@ every block height whose binary representation ends in a one
followed by $m$ zeroes, we use the information in four level $m$
summary blocks, the blocks $2^{m+1}*n + 2^{m-1}- 4*2^{m}$, $2^{m+1}*n + 2^{m-1}- 3*2^{m}$, $2^{m+1}*n + 2^{m-1}- 2*2^{m}$, and $2^{m+1}*n + 2^{m-1}- 1*2^{m}$ to produce an $m+1$ summary block that allows the two oldest remaining level $m$ summary blocks, the blocks $2^{m+1}*n + 2^{m-1}- 4*2^{m}$ and $2^{m+1}*n + 2^{m-1}- 3*2^{m}$ to be dropped.
It is not sufficient to merely forget about old data.
We need to regenerate new blocks because the patricia merkle tree
presented by the public broadcast channel has to prove
that outputs that once were registered as unspent,
and then registered to a commit, or sequence of commits,
are no longer registered at all.
We summarise the data in the earliest two blocks by discarding
every transaction output that was, at the time those blocks were
created, an unspent transaction output, but is now marked as used
@ -345,6 +371,15 @@ height is currently near a hundred thousand, at which height we will
be keeping about fifty blocks around, instead of a hundred thousand
blocks around.
If we are using Nova commitments, which are eight or nine kilobytes,
in place of regular hashes, which are thirty two bytes,
the blockchain will still only occupy ten or twenty gigabytes,
but bandwidth limits will force us to shard
when we reach bitcoin transaction rates. But with recursive snarks,
you *can* shard, because each shard can produce a concise proof that
it is not cheating the others, while with bitcoin,
everyone has to evaluate every transaction to prove that no one is cheating.
# Bigger than Visa
And when it gets so big that ordinary people cannot handle the

View File

@ -994,6 +994,9 @@ justice. (They now rely on a Taiwanese owned and operated chip fab), and
Disney destroyed the Star Wars franchise, turning it into a lecture on social
justice. Debian broke Gnome3 and cannot fix it because of social justice.
[book]:./triple_entry_accounting.html
"triple entry accounting"
Business needs a currency and [book] keeping system that enables them to
operate a business instead of a social justice crusade.
@ -1080,9 +1083,6 @@ will be traded in a way that gives the developers seigniorage.
[triple entry accounting]:./triple_entry_accounting.html
"triple entry accounting"
[book]:./triple_entry_accounting.html
"triple entry accounting"
Software that enables businesses that can resist political pressure is a
superset of software that enables discussion groups that can resist political
pressure. We start by enabling discussion groups, which will be an
@ -1408,14 +1408,14 @@ supposedly respectable and highly regulated people, which does not help
you much if, as in the Great Minority Mortgage Meltdown, the regulators
are engaged in evil deeds, or if, as with Enron and MF Global, the
accountants are all in the pay of powerful men engaged in evil deeds.
Triple entry [book]keeping with immutable journal entries works in a low
Triple entry [book keeping] with immutable journal entries works in a low
trust world of badly behaved elites, works in the circumstances now
prevailing, and, unlike Sox accounting, it does not require wide sharing of
the books.
## Corporate cohesion
The corporation exists by [book]keeping, which enables the shareholders to
The corporation exists by [book keeping], which enables the shareholders to
keep an eye on the board, and the board to keep an eye on the CEO, and our
current system of bookkeeping is failing for lack of trust and honour.

docs/navbar Normal file
View File

@ -0,0 +1,7 @@
<div class="button-bar">
<a href="vision.html">vision</a>
<a href="scalability.html">scalability</a>
<a href="social_networking.html">social networking</a>
<a href="Revelation.html">revelation</a>
</div>

View File

@ -8,50 +8,61 @@ And then I realized that an sql index represented as a merkle-patricia tree inhe
Which is fine if we represent integers as fixed length integers in big endian format,
but does not correctly sort variable length quantities if we follow the standard:
So: To represent variable signed numbers in byte string sortable order:
So: To represent variable length signed numbers in byte string sortable order, so that a strictly sequential sequence of integers with no gaps corresponds one to one to a strictly sequential sequence of byte strings with no gaps:
# For positive signed integers
If the leading bits are $10$, it represents a number in the range\
$0$ ... $2^6-1$ So only one byte
$0$ ... $2^6-1$ So only one byte (two bits of header, six bits to represent $2^{6}$ different
values as the trailing six bits of an ordinary eight bit
positive integer).
If the leading bits are $110$, it represents a number in the range\
$2^6$ ... $2^6+2^{13}-1$ So two bytes
if the leading bits are $1110$, it represents a number in the range\
$2^6+2^{13}$ ... $2^6+2^{13}+2^{20}-1$ So three bytes long
(four bits of header, twenty bits to represent $2^{20}$ different
values as the trailing twenty bits of an ordinary thirty two bit
positive integer in big endian format).
if the leading bits are $1111\,0$, it represents a number in the range\
$2^6+2^{13}+2^{20}$ ... $2^6+2^{13}+2^{20}+2^{27}-1$ So four bytes long
(five bits of header, twenty seven bits to represent $2^{27}$ different
values as the trailing twenty seven bits of an ordinary thirty two bit
positive integer in big endian format).
if the leading bits are $1111\,0$, it represents a number in the range\
if the leading bits are $1111\,10$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}-1$
So five bytes long.
if the leading bits are $1111\,10$, it represents a number in the range\
if the leading bits are $1111\,110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}-1$
So six bytes long.
if the leading bits are $1111\,110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}$
if the leading bits are $1111\,1110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}-1$
So seven bytes long.
if the leading bits are $1111\,1110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}-1$
So eight bytes long.
The reason for these complicated offsets is to ensure that the byte strings are strictly sequential.
if the leading bits are $1111\,1111\,0$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}-1$
So nine bytes long (ten bits of header, sixty two bits to represent $2^{62}$
different values as the trailing sixty two bits of an ordinary sixty four bit positive integer in big endian format).
if the bits of the first byte are $1111\,1111$, we change representations.
Instead that number is represented by a variable
length quantity that is a count of
bytes in the rest of the byte string, which is the number itself in its
natural binary big endian form, with the leading zero bytes discarded.
So we are no longer using these complicated offsets for the number itself,
but are using them for the byte count.
if the leading bits are $1111\,1111\,10$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}+2^{69}-1$
So ten bytes long.
This change in representation simplifies coding and speeds up the transformation,
but costs an extra byte for numbers larger than $2^{48}$ and less than $2^{55}$.
And so on and so forth in the same pattern for positive signed numbers of unlimited size.
The reason for these complicated offsets is to ensure that the byte strings are strictly sequential.
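
A minimal sketch of an encoder for the one to seven byte forms above (the
length prefixed forms for larger numbers, and the negative forms, are left
out, and the function name is made up for illustration):

```c++
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch: encode a non-negative integer into the byte-string-sortable
// variable length form described above. k payload bytes carry 7k-1 payload
// bits behind a header of k one bits followed by a zero bit.
std::vector<uint8_t> encode_positive(uint64_t value) {
    uint64_t offset = 0;                       // smallest value needing k bytes
    for (unsigned k = 1; k <= 7; ++k) {
        uint64_t capacity = uint64_t(1) << (7 * k - 1);   // 2^6, 2^13, 2^20, ...
        if (value < offset + capacity) {
            uint64_t payload = value - offset; // subtract the complicated offset
            std::vector<uint8_t> out(k);
            for (unsigned i = 0; i < k; ++i) { // big endian payload
                out[k - 1 - i] = uint8_t(payload & 0xff);
                payload >>= 8;
            }
            out[0] |= uint8_t(0xff << (8 - k)); // header: k ones then a zero
            return out;
        }
        offset += capacity;
    }
    throw std::overflow_error("longer forms are not covered by this sketch");
}
```

The offset accumulated in the loop is the "complicated offset" above: it
guarantees that increasing integers map to byte strings that sort in the
same order, rolling over to the next length exactly when the shorter form
is exhausted.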
## examples
The bytestring 0xCABC corresponds to the integer 0x0A7C.\
The bytestring 0xEABEEF corresponds to the integer 0x0ABCAF.
# For negative signed integers
@ -86,19 +97,16 @@ if the leading bits are $0000\,0001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}$
So seven bytes long.
if the leading bits are $0000\,0000\,1$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-1$
So eight bytes long.
if the leading bits are $0000\,0000\,01$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-2^{62}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-1$
So nine bytes long (ten bits of header, sixty two bits to represent $2^{62}$
different values as the trailing sixty two bits of an ordinary sixty four bit
negative integer in big endian format).
if the leading bits are $0000\,0000\,001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-2^{62}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-1$ So ten bytes long.
if the bits of the first byte are $0000\,0000$, we change representations.
Instead that number is represented by a variable length quantity that is
*zero minus the count* of bytes in the rest of the byte string,
which is the negative number itself in its natural binary big endian form,
with the leading minus one bytes discarded.
So we are no longer using these complicated offsets for the number itself,
but are using them for the byte count.
We use the negative of the count, in order to get the correct
sort order on the underlying byte strings, so that they can be
represented in a Merkle patricia tree representing an index.
And so on and so forth in the same pattern for negative signed numbers of unlimited size.
@ -118,3 +126,10 @@ and so on and so forth.
In other words, we represent it as the integer obtained
by prepending a leading one bit to the bit string.
# Dewey decimal sequences.
The only thing we ever want to do with Dewey decimal sequences is $<=>$,
and they are always positive numbers less than $10^{14}$, so we represent them as
a sequence of variable length numbers terminated by the number $-1$,
and compare them as bytestrings.