<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style>
body {
max-width: 30em;
margin-left: 2em;
}
p.center {text-align:center;}
</style><title>Universal Code for Integers</title>
</head>
<body>
<h1>Universal Code for Integers</h1>
<p><a href="./index.html"> To Home page</a> </p>
<h2>The problem to be solved</h2>
<p>As computers and networks grow, any fixed-length fields
in protocols tend to become obsolete. Therefore, for future
upward compatibility, we want to have variable-precision
numbers. This class of problem is that of a <a href="https://infogalactic.com/info/Universal_code_%28data_compression%29">
universal code for integers</a>.</p>
<p>But all this stuff is too clever by half.</p>
<p>A way simpler solution in context is that all numbers are potentially sixty-four-bit numbers. Communication is always preceded by protocol negotiation, in which both sides agree on a protocol that guarantees they both know the schemas that will be used to represent records and what is to be done with those records. Each record will be preceded by a schema number, which tells the recipient how to interpret the ensuing record.</p>

<p>A potentially sixty-four-bit number is represented by a group of up to nine bytes in little-endian order. Each of the first eight bytes contains seven bits of the number, with its high-order bit indicating whether the next byte is part of the group. The ninth byte, if there is a ninth byte, is the exception: we shift left by eight instead of seven and use the entire eight bits of the ninth byte, so the group holds at most 8 * 7 + 8 = 64 bits, thus making overflow integers unrepresentable. This representation is good for schema identifiers, protocol identifiers, block numbers, length counts, database record numbers, and times dated from after the start of the information epoch.</p>

<p>Blobs of known size will be stored directly in the record. In the unlikely event that a blob is of variable size, its size will be a length count in the schema, usually followed directly by the blob. If a record contains a variable number of records, again, the schema will have a length count giving the number of records.</p>
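<p>The nine-byte representation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the names <code>encode_u64</code> and <code>decode_u64</code> are invented here, and "little endian" is taken to mean that the first byte carries the least significant seven bits, so that a ninth byte, when present, supplies bits 56 to 63.</p>

<pre>
```python
MAX_U64 = 2**64 - 1


def encode_u64(value):
    """Encode an integer in the range 0 .. 2**64 - 1 as a group of 1 to 9 bytes."""
    if not 0 <= value <= MAX_U64:
        raise ValueError("value does not fit in 64 bits")
    out = bytearray()
    for _ in range(8):
        low7 = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(low7)         # high bit clear: last byte of the group
            return bytes(out)
        out.append(low7 | 0x80)      # high bit set: another byte follows
    out.append(value)                # ninth byte: all eight bits are data
    return bytes(out)


def decode_u64(data, offset=0):
    """Return (value, next_offset) for the group starting at data[offset]."""
    value = 0
    for i in range(8):
        byte = data[offset + i]
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            return value, offset + i + 1
    # Assumption: the ninth byte has no continuation flag and fills bits 56..63.
    value |= data[offset + 8] << 56
    return value, offset + 9


# Usage: a record with a hypothetical schema number 7 and a variable-size blob.
blob = b"hello"
record = encode_u64(7) + encode_u64(len(blob)) + blob
schema, pos = decode_u64(record)
length, pos = decode_u64(record, pos)
assert (schema, record[pos:pos + length]) == (7, blob)
```
</pre>

<p>Because the ninth byte carries no continuation flag, a group can never run past nine bytes, and every decoded value fits in sixty-four bits. The sketch does not guard against truncated input; a real decoder would need to.</p>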
<p>Having spent an unreasonable amount of time and energy on figuring out optimal ways of representing variable-precision numbers, and coming up with the extremely ingenious idea of representing the numbers from 2 to any positive number with an implied probability of
n*(n+1)/(4^n)/8, where n is the number of bits following the first nonzero bit, I decided to throw all that stuff away.</p>
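<p>As a sanity check on that formula, here is a small Python sketch. It assumes that the probability n*(n+1)/(4^n)/8 applies to each individual number, and that exactly 2^n numbers have n bits following their first nonzero bit (2 and 3 for n = 1, 4 to 7 for n = 2, and so on). The function names are invented for this illustration. Under those assumptions the probabilities add up to one, which is what a complete code for the integers from 2 upward requires.</p>

<pre>
```python
from fractions import Fraction


def implied_probability(n):
    """Probability of one number with n bits after its first nonzero bit."""
    return Fraction(n * (n + 1), 8 * 4**n)


def total_mass(max_n):
    """Sum the probabilities of all numbers with 1 .. max_n trailing bits.

    Assumption: there are 2**n distinct numbers for each n.
    """
    return sum(2**n * implied_probability(n) for n in range(1, max_n + 1))


# The partial sums climb toward 1 without ever exceeding it.
for max_n in (1, 2, 10, 50):
    print(max_n, float(total_mass(max_n)))
```
</pre>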
<p style="background-color : #ccffcc; font-size:80%">These documents are
licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative
Commons Attribution-Share Alike 3.0 License</a></p>
</body>
</html>