parent
e14148d4f5
commit
e6fc13d078
@@ -202,36 +202,65 @@ And so on and so forth for signed integers of unlimited size.
|
|||||||
|
|
||||||
# bitstrings
|
# bitstrings
|
||||||
|
|
||||||
Bitstrings in Merkle-patricia tree representing an sql index
|
It might be convenient to represent the data as a pile of edges,
|
||||||
are typically very short, so should be represented by a
|
rather than a pile of vertices, thus solving the problem that
|
||||||
variable length quantity. Which does not need to have the correct
|
the tree must always start with an edge, not vertex.
|
||||||
bytestring sort order.
|
This duplicates the start position of every edge,
|
||||||
|
but this duplication does not matter because the patricia representation of an index,
|
||||||
|
and the standard and usual database representation of an index,
|
||||||
|
compresses out the leading duplication.
|
||||||
|
|
||||||
I have no end of clever ideas to represent them in fully compressed form,
|
So we are representing an sql index by a table whose primary key is the
|
||||||
but all we actually need is a count of the bits of the vertex, the difference
|
bitstring of the start position, and whose values are the
|
||||||
counts for the number of bits in the left and right edges,
|
start position and the end position.
|
||||||
the left bytestring which contains the parent and left bitstrings, and,
|
The patricia edges of this table live in the same table, just
|
||||||
if the right bitstring is more than one bit longer than the parent bitstring,
|
their values are distinguished from actual leaf values.
|
||||||
the difference bytes for the right bitstring.
|
|
||||||
|
|
||||||
If we want to be terribly clever at optimization, if both leaf bitstrings
|
Variable length bitstrings are represented as variable
|
||||||
are only greater by one than the parent bitstring, we have bytes containing
|
length bytestrings by appending a one bit followed by
|
||||||
the parent bitstring, otherwise the bytestring containing all the bits of
|
zero to seven zero bits.
|
||||||
the longest edge bitstring, plus the difference bytes for the shorter
|
|
||||||
bitstring if it is longer than its parent by more than one.
|
|
||||||
|
|
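The padding rule on the added side of this hunk (append a one bit, then zero to seven zero bits) can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `pad_bitstring` and `unpad_bitstring` are hypothetical, and bitstrings are modelled as Python strings of `'0'` and `'1'` purely for readability.

```python
def pad_bitstring(bits: str) -> bytes:
    """Pack a '0'/'1' string into bytes by appending a one bit
    followed by zero to seven zero bits (hypothetical helper)."""
    if any(c not in "01" for c in bits):
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "1"                    # the terminating one bit
    padded += "0" * (-len(padded) % 8)     # zero to seven zero bits
    return int(padded, 2).to_bytes(len(padded) // 8, "big")


def unpad_bitstring(data: bytes) -> str:
    """Inverse of pad_bitstring: strip the trailing zero bits
    and the final one bit."""
    if not data:
        raise ValueError("a padded bitstring is at least one byte long")
    bits = "".join(f"{byte:08b}" for byte in data)
    end = bits.rfind("1")
    # The terminating one bit must sit in the last byte.
    if end < len(bits) - 8:
        raise ValueError("malformed padding")
    return bits[:end]
```

Under this sketch the empty bitstring becomes the single byte `0x80`, and a bitstring whose length is already a multiple of eight grows by one whole byte.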
||||||
An edge in a Merkle-patricia sql index contains the bit path
|
In the table we may compress the end values by discarding
|
||||||
of the thing pointed to, and the completely unrelated hash of the
|
all leading bytes except the overlap byte.
|
||||||
thing pointed to, which contains its own type information.
|
|
||||||
But sometimes, often, we are indexing things
|
Thus the actual table, containing only the leaf values,
|
||||||
*by* their hash, so need a flag on a leaf edge to denote this case.
|
is a virtual table based on a select statement that
|
||||||
|
excludes the internal edges of the patricia tree from
|
||||||
|
the table of all edges, and concatenates the compressed
|
||||||
|
value with the index to form the absolute value.
|
||||||
|
|
||||||
|
It is very common for the end value to be very short.
|
||||||
|
|
||||||
|
We could save a byte (which is a premature optimization)
|
||||||
|
as follows:
|
||||||
|
|
||||||
|
If S is the length of the bitfield in bits:
|
||||||
|
|
||||||
|
If $0\le S \lt 5$, it is represented by the variable
|
||||||
|
length integer obtained by prepending a set bit to the bitfield.
|
||||||
|
|
||||||
|
If $5\le S$ we represent the bit sequence as a byte
|
||||||
|
sequence prepended with the byte count plus 48,
|
||||||
|
(leaving a gap of sixteen impermissible values for future expansion)
|
||||||
|
|
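The byte-saving scheme above can be sketched as follows. Several details are assumptions, since this hunk does not show them: that a variable length integer in the range 1 to 31 occupies a single byte equal to its value, that the byte sequence for $5\le S$ is the bitfield padded with a one bit and zero to seven zero bits, and that the header fits in one byte. The names `encode_bitfield` and `decode_bitfield` are hypothetical.

```python
def encode_bitfield(bits: str) -> bytes:
    """Encode a '0'/'1' string using the short/long scheme (sketch)."""
    if any(c not in "01" for c in bits):
        raise ValueError("bits must contain only '0' and '1'")
    if len(bits) < 5:
        # Prepend a set bit: the result is an integer in 1..31.
        # Assumption: such a small integer is stored as one byte.
        return bytes([int("1" + bits, 2)])
    # Assumption: long bitfields are padded with a one bit and then
    # zero to seven zero bits to reach a byte boundary.
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    body = int(padded, 2).to_bytes(len(padded) // 8, "big")
    header = len(body) + 48                # byte count plus 48
    if header > 255:
        raise ValueError("sketch only handles one-byte headers")
    return bytes([header]) + body


def decode_bitfield(data: bytes) -> str:
    """Inverse of encode_bitfield (sketch)."""
    if not data:
        raise ValueError("empty input")
    header = data[0]
    if 1 <= header < 32:
        if len(data) != 1:
            raise ValueError("short form is a single byte")
        return f"{header:b}"[1:]           # drop the prepended set bit
    if header < 49:
        # 0, the gap 32..47, and a zero byte count are not produced here.
        raise ValueError("impermissible header value")
    body = data[1:]
    if len(body) != header - 48:
        raise ValueError("byte count does not match header")
    bits = "".join(f"{byte:08b}" for byte in body)
    end = bits.rfind("1")
    if end < len(bits) - 8:
        raise ValueError("malformed padding")
    return bits[:end]
```

With these assumptions the short forms occupy the values 1 to 31, and the sixteen values 32 to 47 are the gap the text reserves for future expansion.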
||||||
# Dewey decimal sequences.
|
# Dewey decimal sequences.
|
||||||
|
|
||||||
The only thing we ever want to do with Dewey decimal sequences is $<=>$,
|
The only operation we ever want to do with Dewey
|
||||||
and they are always positive numbers less than $10^{14}$, so we represent them as
|
decimal sequences is $<=>$, and they are always
|
||||||
a sequence of variable length numbers terminated by the number minus one
|
positive numbers less than $10^{34}$, so we represent
|
||||||
and compare them as bytestrings.
|
them as a sequence of variable length positive
|
||||||
|
numbers terminated by a byte that corresponds
|
||||||
|
to the header of an impermissibly large number, the
|
||||||
|
byte `0xFF`, and compare them as bytestrings.
|
||||||
|
|
||||||
|
Albeit we could add, subtract, multiply, and divide
|
||||||
|
Dewey decimal sequences as polynomials,
|
||||||
|
which would require signed integer sequences,
|
||||||
|
but I cannot see any use case for this,
|
||||||
|
while unsigned integer sequences have the advantage
|
||||||
|
that the ones used to sort and identify things are always
|
||||||
|
positive in practice, and one may consider a utf
|
||||||
|
string to be a very long Dewey decimal sequence.
|
||||||
|
|
||||||
### SQL blobs.
|
### SQL blobs.
|
||||||
|
|
||||||
|
@@ -126,6 +126,7 @@ $$\int \sin(x) dx = \cos(x)$$
|
|||||||
$$\sum a_i$$
|
$$\sum a_i$$
|
||||||
$$\lfloor{(x+5)÷6}\rfloor = \lceil{x÷6}\rceil$$
|
$$\lfloor{(x+5)÷6}\rfloor = \lceil{x÷6}\rceil$$
|
||||||
$$\lfloor{(x+5)/6}\rfloor = \lceil{x/6}\rceil$$
|
$$\lfloor{(x+5)/6}\rfloor = \lceil{x/6}\rceil$$
|
||||||
|
$$0\le S \lt 5$$
|
||||||
Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always
|
Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always
|
||||||
compile correctly, but `\ln` and `\log` are more likely to compile correctly than
|
compile correctly, but `\ln` and `\log` are more likely to compile correctly than
|
||||||
`ln` and `log`, which it tends to render as symbols multiplied, rather than one
|
`ln` and `log`, which it tends to render as symbols multiplied, rather than one
|
||||||
|
@@ -1 +1 @@
|
|||||||
Subproject commit 5fd97439acaa39c443fe9a852d5806f5cb435a55
|
Subproject commit 73809b8550d3919a95c65f07f94ea91f339b5487
|