Saner plan for representing bit strings.

Because I realized that the merkle
vertex encloses them and gives them order
and the merkle vertex is itself enclosed
and given order.
reaction.la 2023-11-03 23:19:26 +00:00
parent 022d794d5f
commit a88fc58e71


@@ -1,5 +1,5 @@
---
# katex
#katex
title: Number encoding
sidebar: true
...
@@ -207,22 +207,18 @@ are typically very short, so should be represented by a
variable length quantity. Which does not need to have the correct
bytestring sort order.
So for bitstrings of six bits or less, we represent it as a byte with
a leading zero bit, and the bits following the first one bit are
the bitstring, and if the leading bit is one, it is a byte count
of byte aligned bits. Because we know the start alignment, the
beginning of the bitfield is implicit, and the final byte encodes
a bit field of zero to seven bits. This can represent a bytestring
of one to 128 bytes. However unreasonably large values, representing
variable length bytestrings representing unreasonably large bitstrings,
we reserve for future expansion, since the largest bitstring that will
be valid in normal usage will be thirty three bytes, being a full sized
hash followed by a byte representing the zero length bitstring.
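
A minimal sketch of the one byte short form described above, assuming the bitstring is held as a Python string of `0` and `1` characters. The function names are hypothetical, and the long form with the leading bit set (a byte count of byte aligned bits) is deliberately left out.

```python
def encode_short_bitstring(bits: str) -> int:
    """Pack a bitstring of zero to six bits (e.g. "101") into one byte."""
    if len(bits) > 6 or any(c not in "01" for c in bits):
        raise ValueError("expected at most six characters, each '0' or '1'")
    # Prepend the marker one bit; everything after it is the bitstring.
    # With at most six payload bits the result is below 0x80, so the
    # leading bit of the byte stays zero.
    return int("1" + bits, 2)


def decode_short_bitstring(byte: int) -> str:
    """Recover the bitstring from a byte made by encode_short_bitstring."""
    if not 0 < byte < 0x80:
        # Zero has no marker bit; 0x80 and above is the byte count form.
        raise ValueError("not a short form byte")
    # bin() yields '0b1...'; drop the '0b' prefix and the marker one bit.
    return bin(byte)[3:]
```

Under this reading the zero length bitstring is the byte `0x01`, and `"101"` becomes `0b00001101`.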
I have no end of clever ideas to represent them in fully compressed form,
but all we actually need is a count of the bits of the vertex, the difference
counts for the number of bits in the left and right edges,
the left bytestring which contains the parent and left bitstrings, and,
if the right bitstring is more than one bit longer than the parent bitstring,
the difference bytes for the right bitstring.
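
One possible reading of the fields listed above, as a rough sketch rather than the actual layout. Everything named here (`VertexBits`, `take_bits`, the field names) is hypothetical. It assumes bits are packed most significant bit first, and that the right edge always begins with a one bit just past the parent bitstring, which would explain why difference bytes are only needed when the right bitstring is more than one bit longer than the parent.

```python
from dataclasses import dataclass


def take_bits(data: bytes, count: int) -> str:
    """Return the first `count` bits of `data`, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)[:count]


@dataclass
class VertexBits:
    parent_bits: int         # count of the bits of the vertex
    left_extra: int          # bits the left edge adds beyond the parent
    right_extra: int         # bits the right edge adds beyond the parent
    left_bytes: bytes        # parent bitstring followed by the left bitstring
    right_diff: bytes = b""  # right bits after the first, if right_extra > 1

    def parent_path(self) -> str:
        return take_bits(self.left_bytes, self.parent_bits)

    def left_path(self) -> str:
        return take_bits(self.left_bytes, self.parent_bits + self.left_extra)

    def right_path(self) -> str:
        # Assumption: the first bit of the right edge is an implied one.
        rest = take_bits(self.right_diff, self.right_extra - 1)
        return self.parent_path() + "1" + rest
```

For example, `VertexBits(3, 2, 3, bytes([0b10101000]), bytes([0b01000000]))` would describe a parent `101`, a left edge `10101`, and a right edge `101101`.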
If the bitstring representing the edge brings us to the end of the field, it
is a leaf edge, which is a different type, being a pointer to what is
being indexed rather than a pointer to another patricia vertex and
may have additional data.
If we want to be terribly clever at optimization, if both leaf bitstrings
are only greater by one than the parent bitstring, we have bytes containing
the parent bitstring, otherwise the bytestring containing all the bits of
the longest edge bitstring, plus the difference bytes for the shorter
bitstring if it is longer than its parent by more than one.
An edge in a Merkle-patricia sql index contains the bit path
of the thing pointed to, and the completely unrelated hash of the