diff --git a/docs/number_encoding.md b/docs/number_encoding.md
index c7abfcc..fa5d988 100644
--- a/docs/number_encoding.md
+++ b/docs/number_encoding.md
@@ -202,36 +202,65 @@ And so on and so forth for signed integers of unlimited size.
 
 # bitstrings
 
-Bitstrings in Merkle-patricia tree representing an sql index
-are typically very short, so should be represented by a
-variable length quantity. Which does not need to have the correct
-bytestring sort order.
+It might be convenient to represent the data as a pile of edges,
+rather than a pile of vertices, thus solving the problem that
+the tree must always start with an edge, not vertex.
+This duplicates the start position of every edge,
+but this duplication does not matter because the patricia representation of an index,
+and the standard and usual database representation of an index,
+compresses out the leading duplication.
 
-I have no end of clever ideas to represent them in fully compressed form,
-but all we actually need is a count of the bits of the vertex, the difference
-counts for the number of bits in the left and right edges,
-the left bytestring which contains the parent and left bitstrings, and,
-if the right bitstring is more than one bit longer than the parent bitstring,
-the difference bytes for the right bitstring.
+So we are representing an sql index by table whose primary key is the
+bitstring of the start position, and whose values are the
+start position and the end position.
+The patricia edges of this table live in the same table, just
+their values are distinguished from actual leaf values.
 
-If we want to be terribly clever at optimization, if both leaf bitstrings
-are only greater by one than the parent bistring, we have bytes containing
-the parent bitstring, otherwise the bytestring containing all the bits of
-the longest edge bitstring, plus the difference bytes for the shorter
-bitstring if it is longer than its parent by more than one.
+Variable length bitstrings are represented as variable
+length bytestrings by appending a one bit followed by
+zero to seven zero bits.
 
-An edge in a Merkle-patricia sql index contains the bit path
-of the thing pointed to, and the completely unrelated hash of the
-thing pointed to, which contains its own type information.
-But sometimes, often, we are indexing things
-*by* their hash, so need a flag on a leaf edge to denote this case.
+In the table we may compress the end values by discarding
+all leading bytes except the overlap byte.
+
+Thus the actual table, containing only the leaf values,
+is a virtual table based on a select statement that
+excludes the internal edges of the patricia tree from
+the table of all edges, and concatenates the compressed
+value with the index to form the absolute value.
+
+It is very common for the end value to be very short.
+
+We could save a byte (which is a premature optimization)
+as follows:
+
+If S is the length of the bitfield in bits:
+
+If $0\le S \lt 5$, it is represented by the variable
+length integer obtained by prepending a set bit to the bitfield.
+
+If $5\le S$ we represent the bit sequence as a byte
+sequence prepended with the byte count plus 48,
+(leaving a gap of sixteen impermissible values for future expansion)
 
 # Dewey decimal sequences.
 
-The only thing we ever want to do with Dewey decimal sequences is $<=>$,
-and they are always positive numbers less than $10^{14}$, so we represent them as
-a sequence of variable length numbers terminated by the number minus one
-and compare them as bytestrings.
+The only operation we ever want to do with Dewey
+decimal sequences is $<=>$, and they are always
+positive numbers less than $10^{34}$, so we represent
+them as a sequence of variable length positive
+numbers terminated by a byte that corresponds
+to the header of an impermissibly large number, the
+byte `0xFFFF`, and compare them as bytestrings.
+
+Albeit we could add, subtract, multiply, and divide
+Dewey decimal sequences as polynomials,
+which would require signed integer sequences,
+but I cannot see any use case for this,
+while unsigned integer sequences have the advantage
+that the ones used to sort and identify things are always
+positive in practice, and one may consider a utf
+string to be a very long Dewey decimal sequence.
 
 ### SQL blobs.
 
diff --git a/docs/writing_and_editing_documentation.md b/docs/writing_and_editing_documentation.md
index 5aa7bf2..8a3c95e 100644
--- a/docs/writing_and_editing_documentation.md
+++ b/docs/writing_and_editing_documentation.md
@@ -126,6 +126,7 @@ $$\int \sin(x) dx = \cos(x)$$
 $$\sum a_i$$
 $$\lfloor{(x+5)÷6}\rfloor = \lceil{(x÷6}\rceil$$
 $$\lfloor{(x+5)/6}\rfloor = \lceil{(x/6}\rceil$$
+$$0\le S \lt 5$$
 Use `\bigcirc`, not capital O for Omicron $\bigcirc$. `\Omicron` will not always compile correctly, but `\ln` and `\log` is more likely to compile correctly than `ln` and `log`, which it tends to render as symbols multiplied, rather than one
diff --git a/wxWidgets b/wxWidgets
index 5fd9743..73809b8 160000
--- a/wxWidgets
+++ b/wxWidgets
@@ -1 +1 @@
-Subproject commit 5fd97439acaa39c443fe9a852d5806f5cb435a55
+Subproject commit 73809b8550d3919a95c65f07f94ea91f339b5487
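
Note on the `# bitstrings` hunk in `docs/number_encoding.md`: the added text says a variable length bitstring becomes a bytestring "by appending a one bit followed by zero to seven zero bits". A minimal Python sketch of that padding rule follows, offered as an illustration only. The function names `bits_to_bytes` and `bytes_to_bits` are hypothetical and not from the repository, bitstrings are modelled as strings of `'0'` and `'1'` characters for readability, and most-significant-bit-first packing is an assumption the diff does not state.

```python
def bits_to_bytes(bits: str) -> bytes:
    """Pad a bitstring with a one bit, then zero bits up to a byte boundary.

    Assumption: bits are packed most significant bit first.
    """
    if any(c not in "01" for c in bits):
        raise ValueError("bitstring must contain only '0' and '1'")
    padded = bits + "1"                    # the terminating one bit
    padded += "0" * (-len(padded) % 8)     # zero to seven zero bits
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def bytes_to_bits(data: bytes) -> str:
    """Recover the bitstring by dropping the final one bit and trailing zeros."""
    if not data:
        raise ValueError("an encoded bitstring is at least one byte long")
    bits = "".join(f"{b:08b}" for b in data)
    end = bits.rfind("1")
    # The terminating one bit must sit in the last byte.
    if end < len(bits) - 8:
        raise ValueError("invalid padding")
    return bits[:end]
```

Under these assumptions the empty bitstring encodes as the single byte `0x80`, and a bitstring whose length is a multiple of eight costs one extra byte, which is the byte the "premature optimization" in the hunk is trying to save.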