This commit is contained in:
reaction.la 2024-02-20 02:12:27 +00:00
parent e14148d4f5
commit e6fc13d078
No known key found for this signature in database
GPG Key ID: 99914792148C8388
3 changed files with 55 additions and 25 deletions

View File

@ -202,36 +202,65 @@ And so on and so forth for signed integers of unlimited size.
# bitstrings
Bitstrings in Merkle-patricia tree representing an sql index
are typically very short, so should be represented by a
variable length quantity. Which does not need to have the correct
bytestring sort order.
It might be convenient to represent the data as a pile of edges,
rather than a pile of vertices, thus solving the problem that
the tree must always start with an edge, not vertex.
This duplicates the start position of every edge,
but this duplication does not matter because the patricia representation of an index,
and the standard and usual database representation of an index,
compresses out the leading duplication.
I have no end of clever ideas to represent them in fully compressed form,
but all we actually need is a count of the bits of the vertex, the difference
counts for the number of bits in the left and right edges,
the left bytestring which contains the parent and left bitstrings, and,
if the right bitstring is more than one bit longer than the parent bitstring,
the difference bytes for the right bitstring.
So we are representing an sql index by table whose primary key is the
bitstring of the start position, and whose values are the
start position and the end position.
The patricia edges of this table live in the same table, just
their values are distinguished from actual leaf values.
If we want to be terribly clever at optimization, if both leaf bitstrings
are only greater by one than the parent bistring, we have bytes containing
the parent bitstring, otherwise the bytestring containing all the bits of
the longest edge bitstring, plus the difference bytes for the shorter
bitstring if it is longer than its parent by more than one.
Variable length bitstrings are represented as variable
length bytestrings by appending a one bit followed by
zero to seven zero bits.
An edge in a Merkle-patricia sql index contains the bit path
of the thing pointed to, and the completely unrelated hash of the
thing pointed to, which contains its own type information.
But sometimes, often, we are indexing things
*by* their hash, so need a flag on a leaf edge to denote this case.
In the table we may compress the end values by discarding
all leading bytes except the overlap byte.
Thus the actual table, containing only the leaf values,
is a virtual table based on a select statement that
excludes the internal edges of the patricia tree from
the table of all edges, and concatenates the compressed
value with the index to form the absolute value.
It is very common for the end value to be very short.
We could save a byte (which is a premature optimization)
as follows:
If S is the length of the bitfield in bits:
If $0\le S \lt 5$, it is represented by the variable
length integer obtained by prepending a set bit to the bitfield.
If $5\le S$ we represent the bit sequence as a byte
sequence prepended with the byte count plus 48,
(leaving a gap of sixteen impermissible values for future expansion)
# Dewey decimal sequences.
The only thing we ever want to do with Dewey decimal sequences is $<=>$,
and they are always positive numbers less than $10^{14}$, so we represent them as
a sequence of variable length numbers terminated by the number minus one
and compare them as bytestrings.
The only operation we ever want to do with Dewey
decimal sequences is $<=>$, and they are always
positive numbers less than $10^{34}$, so we represent
them as a sequence of variable length positive
numbers terminated by a byte that corresponds
to the header of an impermissibly large number, the
byte `0xFFFF`, and compare them as bytestrings.
Albeit we could add, subtract, multiply, and divide
Dewey decimal sequences as polynomials,
which would require signed integer sequences,
but I cannot see any use case for this,
while unsigned integer sequences have the advantage
that the ones used to sort and identify things are always
positive in practice, and one may consider a utf
string to be a very long Dewey decimal sequence.
### SQL blobs.

View File

@ -126,6 +126,7 @@ $$\int \sin(x) dx = \cos(x)$$
$$\sum a_i$$
$$\lfloor{(x+5)÷6}\rfloor = \lceil{(x÷6}\rceil$$
$$\lfloor{(x+5)/6}\rfloor = \lceil{(x/6}\rceil$$
$$0\le S \lt 5$$
Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always
compile correctly, but `\ln` and `\log` is more likely to compile correctly than
`ln` and `log`, which it tends to render as symbols multiplied, rather than one

@ -1 +1 @@
Subproject commit 5fd97439acaa39c443fe9a852d5806f5cb435a55
Subproject commit 73809b8550d3919a95c65f07f94ea91f339b5487