---
title: Variable Length Quantity
---

I originally implemented variable length quantities following the standard.

And then I realized that an SQL index represented as a Merkle-Patricia tree inherently sorts in byte string order.
This is fine if we represent integers as fixed length integers in big endian format,
but it does not correctly sort variable length quantities if we follow the standard.

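To make the sorting problem concrete, here is a minimal sketch. It assumes "the standard" means the MIDI-style variable length quantity (seven payload bits per byte, most significant group first, high bit set on every byte except the last); the function name `vlq_encode` is illustrative, not taken from any implementation of mine.

```python
def vlq_encode(n: int) -> bytes:
    """MIDI-style VLQ (assumed meaning of "the standard"):
    7 bits per byte, most significant group first,
    continuation bit (0x80) set on every byte except the last."""
    if n < 0:
        raise ValueError("unsigned integers only")
    groups = [n & 0x7F]            # last byte: no continuation bit
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


a, b = 16383, 16384
ea, eb = vlq_encode(a), vlq_encode(b)   # FF 7F and 81 80 00

# a < b as integers, yet ea sorts after eb as a byte string.
misordered = a < b and ea > eb
```

Here the two byte encoding of 16383 begins with `0xFF`, while the three byte encoding of 16384 begins with `0x81`, so a byte string comparison puts the larger number first.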

So, to represent variable length signed numbers in byte string sortable order:

# For positive signed integers

If the leading bits are $10$, it represents a number in the range\
$0$ ... $2^6-1$ So only one byte.

If the leading bits are $110$, it represents a number in the range\
$2^6$ ... $2^6+2^{13}-1$ So two bytes.

If the leading bits are $1110$, it represents a number in the range\
$2^6+2^{13}$ ... $2^6+2^{13}+2^{20}-1$ So three bytes long.

If the leading bits are $1111\,0$, it represents a number in the range\
$2^6+2^{13}+2^{20}$ ... $2^6+2^{13}+2^{20}+2^{27}-1$ So four bytes long
(five bits of header, twenty seven bits to represent $2^{27}$ different
values as the trailing twenty seven bits of an ordinary thirty two bit
positive integer in big endian format).

If the leading bits are $1111\,10$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}-1$
So five bytes long.

If the leading bits are $1111\,110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}-1$
So six bytes long.

If the leading bits are $1111\,1110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}-1$
So seven bytes long.

If the leading bits are $1111\,1111\,0$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}-1$
So eight bytes long.

If the leading bits are $1111\,1111\,10$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}-1$
So nine bytes long (ten bits of header, sixty two bits to represent $2^{62}$
different values as the trailing sixty two bits of an ordinary sixty four bit positive integer in big endian format).

If the leading bits are $1111\,1111\,110$, it represents a number in the range\
$2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}$ ... $2^6+2^{13}+2^{20}+2^{27}+2^{34}+2^{41}+2^{48}+2^{55}+2^{62}+2^{69}-1$
So ten bytes long.

And so on and so forth in the same pattern for positive signed numbers of unlimited size.

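The pattern can be stated in general form. The symbols $n$ (length in bytes) and $O_n$ (the offset at which the $n$ byte range starts) are notation introduced for this summary only. An $n$ byte encoding has a header of $n$ one bits followed by a single zero bit, leaving $7n-1$ payload bits, and represents the values $v$ with

$$O_n \le v \le O_n + 2^{7n-1} - 1, \qquad O_n = \sum_{k=1}^{n-1} 2^{7k-1} = \frac{2^6\left(2^{7(n-1)}-1\right)}{2^7-1}$$

The payload is $v - O_n$, written in big endian format. For example, $O_1 = 0$, $O_2 = 2^6$, and $O_3 = 2^6+2^{13}$, which agrees with the one, two, and three byte ranges.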

The reason for these complicated offsets is to ensure that the byte strings are strictly sequential: each integer has exactly one encoding, with no gaps or overlaps between the ranges of successive lengths.

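The scheme for non-negative integers can be sketched as follows. This is a minimal illustration under the reading that an $n$ byte encoding carries $n$ one bits, a zero bit, and $7n-1$ payload bits; the names `encode_nonneg` and `decode_nonneg` are hypothetical, and the decoder assumes it is handed exactly one complete encoding.

```python
def encode_nonneg(v: int) -> bytes:
    """Encode v >= 0 as n bytes: n one bits, a zero bit, 7n-1 payload bits."""
    if v < 0:
        raise ValueError("non-negative integers only")
    n, offset = 1, 0
    # Find the length n whose range [offset, offset + 2**(7n-1)) contains v.
    while v >= offset + (1 << (7 * n - 1)):
        offset += 1 << (7 * n - 1)
        n += 1
    # n one bits at the top of the 8n bit word; the bit below them stays zero
    # because the payload is smaller than 2**(7n-1).
    header = ((1 << n) - 1) << (7 * n)
    return (header | (v - offset)).to_bytes(n, "big")


def decode_nonneg(b: bytes) -> int:
    """Inverse of encode_nonneg for one complete encoding."""
    n = len(b)
    x = int.from_bytes(b, "big")
    if x >> (7 * n - 1) != ((1 << n) - 1) << 1:
        raise ValueError("header does not match length")
    payload = x & ((1 << (7 * n - 1)) - 1)
    offset = sum(1 << (7 * k - 1) for k in range(1, n))
    return offset + payload


# Byte string order should agree with numeric order.
sample = list(range(0, 20000, 7)) + [2**k for k in range(14, 80)]
sorts_correctly = sorted(sample, key=encode_nonneg) == sorted(sample)
```

With this sketch, `encode_nonneg(63)` is the single byte `0xBF` and `encode_nonneg(64)` is the two bytes `0xC0 0x00`, so the encodings of consecutive integers are consecutive byte strings.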

# For negative signed integers

If the leading bits are $01$, it represents a number in the range\
$-2^6$ ... $-1$ So only one byte (two bits of header,
six bits to represent $2^6$ different values as the
trailing six bits of an ordinary eight bit negative integer).

If the leading bits are $001$, it represents a number in the range\
$-2^6-2^{13}$ ... $-2^6-1$ So two bytes (three bits of header,
thirteen bits to represent $2^{13}$ different values as the trailing
thirteen bits of an ordinary sixteen bit negative integer in big endian format).

If the leading bits are $0001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}$ ... $-2^6-2^{13}-1$ So three bytes long.

If the leading bits are $0000\,1$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}$ ... $-2^6-2^{13}-2^{20}-1$
So four bytes long (five bits of header, twenty seven bits to represent
$2^{27}$ different values as the trailing twenty seven bits of
an ordinary thirty two bit negative integer in big endian format).

If the leading bits are $0000\,01$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}$ ... $-2^6-2^{13}-2^{20}-2^{27}-1$
So five bytes long.

If the leading bits are $0000\,001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-1$
So six bytes long.

If the leading bits are $0000\,0001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-1$
So seven bytes long.

If the leading bits are $0000\,0000\,1$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-1$
So eight bytes long.

If the leading bits are $0000\,0000\,01$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-2^{62}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-1$
So nine bytes long (ten bits of header, sixty two bits to represent $2^{62}$
different values as the trailing sixty two bits of an ordinary sixty four bit
negative integer in big endian format).

If the leading bits are $0000\,0000\,001$, it represents a number in the range\
$-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-2^{62}-2^{69}$ ... $-2^6-2^{13}-2^{20}-2^{27}-2^{34}-2^{41}-2^{48}-2^{55}-2^{62}-1$
So ten bytes long.

And so on and so forth in the same pattern for negative signed numbers of unlimited size.

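The negative case can be sketched the same way. This is a minimal illustration, assuming an $n$ byte encoding carries $n$ zero bits, a one bit, and then the trailing $7n-1$ bits of the two's complement form of the value after the offset is added back; the name `encode_neg` is hypothetical.

```python
def encode_neg(v: int) -> bytes:
    """Encode v < 0 as n bytes: n zero bits, a one bit, 7n-1 payload bits."""
    if v >= 0:
        raise ValueError("negative integers only")
    n, offset = 1, 0
    # Find the length n whose range [-offset - 2**(7n-1), -offset - 1] contains v.
    while v < -offset - (1 << (7 * n - 1)):
        offset += 1 << (7 * n - 1)
        n += 1
    width = 7 * n - 1
    # v + offset lies in [-2**width, -1]; adding 2**width yields the
    # trailing `width` bits of its two's complement representation.
    payload = (v + offset) + (1 << width)
    header = 1 << width            # the one bit that ends the run of zeros
    return (header | payload).to_bytes(n, "big")


# Byte string order should agree with numeric order.
sample = list(range(-20000, 0, 7)) + [-(2**k) for k in range(14, 80)]
sorts_correctly = sorted(sample, key=encode_neg) == sorted(sample)
```

With this sketch, `encode_neg(-1)` is the single byte `0x7F`, which sorts immediately before `0x80`, the encoding of zero under the positive scheme, and `encode_neg(-65)` is the two bytes `0x3F 0xFF`, which sorts immediately before `0x40`, the encoding of $-2^6$.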

# Bitstrings

Bitstrings in a Merkle-Patricia tree representing an SQL index
are typically very short, so should be represented by a
variable length quantity, except for the leaf edge,
which is fixed size and large, so should not be
represented by a variable length quantity.

We use the integer zero to represent this special case,
the integer one to represent the zero length bitstring,
integers two and three to represent the two one bit bitstrings,
integers four to seven to represent the four two bit bitstrings,
and so on and so forth.

In other words, we represent a bitstring as the integer obtained
by prepending a leading one bit to the bitstring.
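
The mapping between bitstrings and integers can be sketched as below. This is only an illustration: bitstrings are modelled as Python strings of `'0'` and `'1'` characters, and the function names are hypothetical.

```python
def bitstring_to_int(bits: str) -> int:
    """Prepend a leading one bit and read the result as a binary integer."""
    if any(c not in "01" for c in bits):
        raise ValueError("bitstring must contain only '0' and '1'")
    return int("1" + bits, 2)


def int_to_bitstring(n: int) -> str:
    """Inverse mapping; zero is reserved for the leaf edge special case."""
    if n < 1:
        raise ValueError("zero is reserved for the leaf edge")
    return bin(n)[3:]   # drop the '0b' prefix and the leading one bit


examples = {bits: bitstring_to_int(bits) for bits in ["", "0", "1", "00", "01", "10", "11"]}
```

The empty bitstring maps to one, `"0"` and `"1"` map to two and three, and the two bit bitstrings map to four through seven. The resulting integer would then be stored as a variable length quantity.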