reaction.la 2024-02-20 02:12:27 +00:00
parent e14148d4f5
commit e6fc13d078
No known key found for this signature in database
GPG Key ID: 99914792148C8388
3 changed files with 55 additions and 25 deletions


@@ -202,36 +202,65 @@ And so on and so forth for signed integers of unlimited size.
 # bitstrings
-Bitstrings in Merkle-patricia tree representing an sql index
-are typically very short, so should be represented by a
-variable length quantity. Which does not need to have the correct
-bytestring sort order.
-I have no end of clever ideas to represent them in fully compressed form,
-but all we actually need is a count of the bits of the vertex, the difference
-counts for the number of bits in the left and right edges,
-the left bytestring which contains the parent and left bitstrings, and,
-if the right bitstring is more than one bit longer than the parent bitstring,
-the difference bytes for the right bitstring.
-If we want to be terribly clever at optimization, if both leaf bitstrings
-are only greater by one than the parent bistring, we have bytes containing
-the parent bitstring, otherwise the bytestring containing all the bits of
-the longest edge bitstring, plus the difference bytes for the shorter
-bitstring if it is longer than its parent by more than one.
-An edge in a Merkle-patricia sql index contains the bit path
-of the thing pointed to, and the completely unrelated hash of the
-thing pointed to, which contains its own type information.
-But sometimes, often, we are indexing things
-*by* their hash, so need a flag on a leaf edge to denote this case.
+It might be convenient to represent the data as a pile of edges,
+rather than a pile of vertices, thus solving the problem that
+the tree must always start with an edge, not vertex.
+This duplicates the start position of every edge,
+but this duplication does not matter because the patricia representation of an index,
+and the standard and usual database representation of an index,
+compresses out the leading duplication.
+So we are representing an sql index by table whose primary key is the
+bitstring of the start position, and whose values are the
+start position and the end position.
+The patricia edges of this table live in the same table, just
+their values are distinguished from actual leaf values.
+Variable length bitstrings are represented as variable
+length bytestrings by appending a one bit followed by
+zero to seven zero bits.
+In the table we may compress the end values by discarding
+all leading bytes except the overlap byte.
+Thus the actual table, containing only the leaf values,
+is a virtual table based on a select statement that
+excludes the internal edges of the patricia tree from
+the table of all edges, and concatenates the compressed
+value with the index to form the absolute value.
+It is very common for the end value to be very short.
+We could save a byte (which is a premature optimization)
+as follows:
+If S is the length of the bitfield in bits:
+If $0\le S \lt 5$, it is represented by the variable
+length integer obtained by prepending a set bit to the bitfield.
+If $5\le S$ we represent the bit sequence as a byte
+sequence prepended with the byte count plus 48,
+(leaving a gap of sixteen impermissible values for future expansion)
 # Dewey decimal sequences.
-The only thing we ever want to do with Dewey decimal sequences is $<=>$,
-and they are always positive numbers less than $10^{14}$, so we represent them as
-a sequence of variable length numbers terminated by the number minus one
-and compare them as bytestrings.
+The only operation we ever want to do with Dewey
+decimal sequences is $<=>$, and they are always
+positive numbers less than $10^{34}$, so we represent
+them as a sequence of variable length positive
+numbers terminated by a byte that corresponds
+to the header of an impermissibly large number, the
+byte `0xFFFF`, and compare them as bytestrings.
+Albeit we could add, subtract, multiply, and divide
+Dewey decimal sequences as polynomials,
+which would require signed integer sequences,
+but I cannot see any use case for this,
+while unsigned integer sequences have the advantage
+that the ones used to sort and identify things are always
+positive in practice, and one may consider a utf
+string to be a very long Dewey decimal sequence.
 ### SQL blobs.
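
The new text describes two bitstring encodings. The first pads a bitstring to whole bytes by appending a one bit and then zero to seven zero bits. The second is an optional byte-saving form for short bitfields. Below is a minimal Python sketch of both, offered as an illustration rather than the author's implementation. It rests on several assumptions:

- Bitstrings are modelled as strings of `0`/`1` characters, and all function names are hypothetical.
- The variable length integer format is defined elsewhere in the document and is not shown in this diff, so the sketch assumes a value below 32 is stored as a single byte holding that value.
- The $5\le S$ byte sequence uses the same one-bit padding.
- The byte count fits in one header byte.

```python
# Sketch only: bitstrings are modelled as Python strings of "0"/"1" characters.
# Function names are hypothetical, not taken from the document.


def pad_bitstring(bits: str) -> bytes:
    """Append a one bit, then zero to seven zero bits to reach a byte boundary."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


def unpad_bitstring(data: bytes) -> str:
    """Invert pad_bitstring: drop the trailing zero bits and the appended one bit."""
    if not data or data[-1] == 0:
        raise ValueError("the last byte must contain the appended one bit")
    bits = "".join(format(b, "08b") for b in data)
    return bits.rstrip("0")[:-1]


def encode_short_bitfield(bits: str) -> bytes:
    """S < 5: one byte holding the bits with a set bit prepended.
    S >= 5: a header byte of (byte count + 48) followed by the padded bytes."""
    if len(bits) < 5:
        # Prepending a set bit gives a value in 1..31.
        # ASSUMPTION: the document's varint stores such a value as one byte.
        return bytes([int("1" + bits, 2)])
    body = pad_bitstring(bits)
    header = len(body) + 48  # always >= 49, so 32..47 (sixteen values) stay unused
    if header > 0xFF:
        raise ValueError("sketch assumes the byte count fits in one header byte")
    return bytes([header]) + body


def decode_short_bitfield(data: bytes) -> tuple[str, bytes]:
    """Decode one bitfield from the front of data; return (bits, remaining bytes)."""
    if not data:
        raise ValueError("empty input")
    header = data[0]
    if 1 <= header < 32:
        return format(header, "b")[1:], data[1:]  # strip the prepended set bit
    if header > 48:
        count = header - 48
        body = data[1:1 + count]
        if len(body) != count:
            raise ValueError("truncated bitfield")
        return unpad_bitstring(body), data[1 + count:]
    raise ValueError(f"impermissible header byte {header}")
```

Two examples:

- `pad_bitstring("101")` gives `b"\xb0"`, that is `1011` followed by four zero bits.
- `encode_short_bitfield("10110")` falls into the $5\le S$ case. It gives the header byte 49 (one byte plus 48) followed by `0xB4`.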

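Below is a minimal Python sketch of the Dewey decimal encoding: terminated sequences of positive numbers, compared as bytestrings. The document's own variable length integer format is not shown in this diff, so the sketch uses a hypothetical order-preserving one:

- Values below `0xF0` take one byte.
- Larger values take a header byte `0xF0 + length - 1` followed by the big-endian value.

Components are below $10^{34}$, and every such value fits in at most fifteen bytes, so headers never exceed `0xFE`. That leaves `0xFF`, the header a sixteen-byte (impermissibly large) number would need, free to act as the terminator. The text calls this terminator "the byte `0xFFFF`". A byte only goes up to `0xFF`, so the sketch assumes the one-byte value `0xFF`. The constant and function names are hypothetical.

```python
# Sketch only: a HYPOTHETICAL order-preserving varint, not necessarily the
# document's own encoding. Names are illustrative.
SINGLE_BYTE_LIMIT = 0xF0  # values 1..0xEF are stored as a single byte
TERMINATOR = 0xFF         # header a 16-byte payload would need: impermissibly large
MAX_VALUE = 10**34        # Dewey components are positive and below this bound


def encode_component(v: int) -> bytes:
    """Encode one positive component so that byte order matches numeric order."""
    if not 1 <= v < MAX_VALUE:
        raise ValueError("components must satisfy 1 <= v < 10**34")
    if v < SINGLE_BYTE_LIMIT:
        return bytes([v])
    payload = v.to_bytes((v.bit_length() + 7) // 8, "big")  # minimal length, 1..15
    return bytes([SINGLE_BYTE_LIMIT + len(payload) - 1]) + payload  # 0xF0..0xFE


def encode_dewey(seq) -> bytes:
    """Concatenate the encoded components and append the terminator byte."""
    return b"".join(encode_component(v) for v in seq) + bytes([TERMINATOR])


def decode_dewey(data: bytes) -> tuple[list[int], bytes]:
    """Decode one terminated sequence; return (components, remaining bytes)."""
    components: list[int] = []
    i = 0
    while i < len(data):
        header = data[i]
        if header == TERMINATOR:
            return components, data[i + 1:]
        if header == 0:
            raise ValueError("zero is not a permissible component")
        if header < SINGLE_BYTE_LIMIT:
            components.append(header)
            i += 1
        else:
            length = header - SINGLE_BYTE_LIMIT + 1
            payload = data[i + 1:i + 1 + length]
            if len(payload) != length:
                raise ValueError("truncated component")
            components.append(int.from_bytes(payload, "big"))
            i += 1 + length
    raise ValueError("missing terminator")
```

The terminator sorts above every header byte, so under bytestring comparison a sequence sorts after all of its own extensions. For example, `encode_dewey([1, 2, 3]) < encode_dewey([1, 2])` holds.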

@@ -126,6 +126,7 @@ $$\int \sin(x) dx = \cos(x)$$
 $$\sum a_i$$
 $$\lfloor{(x+5)÷6}\rfloor = \lceil{(x÷6}\rceil$$
 $$\lfloor{(x+5)/6}\rfloor = \lceil{(x/6}\rceil$$
+$$0\le S \lt 5$$
 Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always
 compile correctly, but `\ln` and `\log` is more likely to compile correctly than
 `ln` and `log`, which it tends to render as symbols multiplied, rather than one

@@ -1 +1 @@
-Subproject commit 5fd97439acaa39c443fe9a852d5806f5cb435a55
+Subproject commit 73809b8550d3919a95c65f07f94ea91f339b5487