parent e14148d4f5
commit e6fc13d078
@ -202,36 +202,65 @@ And so on and so forth for signed integers of unlimited size.

# bitstrings

Bitstrings in a Merkle-patricia tree representing an sql index
are typically very short, so should be represented by a
variable length quantity, which does not need to have the correct
bytestring sort order.

It might be convenient to represent the data as a pile of edges,
rather than a pile of vertices, thus solving the problem that
the tree must always start with an edge, not a vertex.
This duplicates the start position of every edge,
but this duplication does not matter because the patricia representation of an index,
and the standard and usual database representation of an index,
compress out the leading duplication.

I have no end of clever ideas to represent them in fully compressed form,
but all we actually need is a count of the bits of the vertex, the difference
counts for the number of bits in the left and right edges,
the left bytestring which contains the parent and left bitstrings, and,
if the right bitstring is more than one bit longer than the parent bitstring,
the difference bytes for the right bitstring.
So we are representing an sql index by a table whose primary key is the
bitstring of the start position, and whose values are the
start position and the end position.
The patricia edges of this table live in the same table, except that
their values are distinguished from actual leaf values.

If we want to be terribly clever at optimization, if both leaf bitstrings
are longer than the parent bitstring by only one bit, we have bytes containing
the parent bitstring, otherwise the bytestring containing all the bits of
the longest edge bitstring, plus the difference bytes for the shorter
bitstring if it is longer than its parent by more than one.

Variable length bitstrings are represented as variable
length bytestrings by appending a one bit followed by
zero to seven zero bits.

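The padding rule for variable length bitstrings can be sketched in Python as follows. This is a minimal illustration, not the project's code: the function names `bits_to_bytes` and `bytes_to_bits` are hypothetical, bitstrings are modelled as text of `0` and `1` characters, and packing most significant bit first is an assumption the text does not state.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pack a bitstring such as '10110' into bytes.

    Appends a single one bit, then zero to seven zero bits,
    so that the total length is a whole number of bytes.
    Assumption: bits are packed most significant bit first.
    """
    if any(c not in "01" for c in bits):
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "1"                 # the terminating one bit
    padded += "0" * (-len(padded) % 8)  # zero to seven zero bits
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


def bytes_to_bits(data: bytes) -> str:
    """Recover the bitstring by dropping the final one bit and the zeros after it."""
    if not data or data[-1] == 0:
        raise ValueError("last byte must contain the terminating one bit")
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[: bits.rindex("1")]


if __name__ == "__main__":
    for example in ["", "1", "10110", "1111111", "00000000"]:
        packed = bits_to_bytes(example)
        print(repr(example), packed.hex(), repr(bytes_to_bits(packed)))
```

Under these assumptions the empty bitstring packs to a single byte, and a bitstring whose length is already a multiple of eight grows by one byte, because the terminating one bit always has to be present.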
An edge in a Merkle-patricia sql index contains the bit path
of the thing pointed to, and the completely unrelated hash of the
thing pointed to, which contains its own type information.
But often we are indexing things
*by* their hash, so need a flag on a leaf edge to denote this case.
In the table we may compress the end values by discarding
all leading bytes except the overlap byte.

Thus the actual table, containing only the leaf values,
is a virtual table based on a select statement that
excludes the internal edges of the patricia tree from
the table of all edges, and concatenates the compressed
value with the index to form the absolute value.

It is very common for the end value to be very short.

We could save a byte (which is a premature optimization)
as follows:

If S is the length of the bitfield in bits:

If $0\le S \lt 5$, it is represented by the variable
length integer obtained by prepending a set bit to the bitfield.

If $5\le S$ we represent the bit sequence as a byte
sequence prepended with the byte count plus 48
(leaving a gap of sixteen impermissible values for future expansion).

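The two cases can be sketched in Python to show where the gap of sixteen values comes from. This is a rough illustration under stated assumptions: the names `short_value`, `short_bits`, and `long_prefix` are hypothetical, bitfields are modelled as text of `0` and `1` characters, and the sketch only computes the leading integer. How that integer is serialized as a variable length integer, and how the bits of a long field are packed into its bytes, is not covered here.

```python
SHORT_LIMIT = 5  # bitfields shorter than this many bits use the short form
LONG_BASE = 48   # the long form is prefixed by the byte count plus this


def short_value(bits: str) -> int:
    """Integer for a bitfield of 0 to 4 bits: a set bit prepended to the field."""
    if len(bits) >= SHORT_LIMIT or any(c not in "01" for c in bits):
        raise ValueError("expected at most four characters, each '0' or '1'")
    return int("1" + bits, 2)


def short_bits(value: int) -> str:
    """Inverse of short_value: drop the prepended set bit."""
    if not 1 <= value < 32:
        raise ValueError("not a short form value")
    return bin(value)[3:]  # bin() yields '0b1...'; skip '0b' and the marker bit


def long_prefix(byte_count: int) -> int:
    """Leading integer for a bitfield of five or more bits held in byte_count bytes."""
    if byte_count < 1:
        raise ValueError("a long bitfield occupies at least one byte")
    return LONG_BASE + byte_count


if __name__ == "__main__":
    print(short_value(""), short_value("0000"), short_value("1111"))
    print(long_prefix(1), long_prefix(4))
```

Short bitfields therefore occupy the values 1 to 31, long prefixes sit above 48, and the sixteen values from 32 to 47 are never produced, which matches the gap the text reserves for future expansion.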
# Dewey decimal sequences.

The only thing we ever want to do with Dewey decimal sequences is $<=>$,
and they are always positive numbers less than $10^{14}$, so we represent them as
a sequence of variable length numbers terminated by the number minus one
and compare them as bytestrings.

The only operation we ever want to do with Dewey
decimal sequences is $<=>$, and they are always
positive numbers less than $10^{34}$, so we represent
them as a sequence of variable length positive
numbers terminated by a byte that corresponds
to the header of an impermissibly large number, the
byte `0xFFFF`, and compare them as bytestrings.

Admittedly, we could add, subtract, multiply, and divide
Dewey decimal sequences as polynomials,
which would require signed integer sequences,
but I cannot see any use case for this,
while unsigned integer sequences have the advantage
that the ones used to sort and identify things are always
positive in practice, and one may consider a utf
string to be a very long Dewey decimal sequence.

### SQL blobs.

@ -126,6 +126,7 @@ $$\int \sin(x) dx = \cos(x)$$
$$\sum a_i$$
$$\lfloor{(x+5)÷6}\rfloor = \lceil{x÷6}\rceil$$
$$\lfloor{(x+5)/6}\rfloor = \lceil{x/6}\rceil$$
$$0\le S \lt 5$$
Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always
compile correctly, but `\ln` and `\log` are more likely to compile correctly than
`ln` and `log`, which it tends to render as symbols multiplied, rather than one

@ -1 +1 @@
Subproject commit 5fd97439acaa39c443fe9a852d5806f5cb435a55
Subproject commit 73809b8550d3919a95c65f07f94ea91f339b5487