2022-02-17 22:33:27 -05:00
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style>
body {
max-width: 30em;
margin-left: 2em;
}
p.center {text-align:center;}
table {
border-collapse: collapse;
}
td, th {
padding: 6px;
border: solid 1px black;
}
</style>
<link rel="shortcut icon" href="../../rho.ico">
<title>Serialization and Canonical form</title>
</head>
<body>
<p><a href="../libraries.html"> To Home page</a> </p>
<h1>Serialization and Canonical form</h1><p>
On reflection, using a serialization library is massive overkill. We are serializing records that always have a record type identifier, and their fields are hashes, signatures, and UTF-8 strings, which should already be in network order. So the only thing we have to serialize is integers, for which we might as well write our own serialization code, in an object of type serialization buffer:<pre>
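#include &lt;array&gt;
#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;type_traits&gt;
#include &lt;gsl/span&gt;

namespace ro {
    // Serializes a non-negative integer as big-endian base 128: seven bits
    // per byte, most significant group first, with the high bit set on
    // every byte except the last.
    template&lt;class T&gt; class iserial : public gsl::span&lt;uint8_t&gt; {
    public:
        static_assert(std::is_integral&lt;T&gt;::value, "iserial is only for serializing integers");
        std::array&lt;uint8_t, (sizeof(T)*8+6)/7&gt; blob;
        // The span base points into blob, so a copy would still refer to
        // the original's buffer. Forbid copying.
        iserial(const iserial&amp;) = delete;
        iserial&amp; operator=(const iserial&amp;) = delete;
        iserial(T i) {
            if constexpr (std::is_signed&lt;T&gt;::value) {
                // Don't serialize an integer unless you know for sure it is not negative.
                assert(i &gt;= 0);
            }
            uint8_t* const end = blob.data() + blob.size();
            uint8_t* p = end;
            *(--p) = i &amp; 0x7f;              // last byte: high bit clear
            i &gt;&gt;= 7;
            while (i != 0) {
                *(--p) = (i &amp; 0x7f) | 0x80; // earlier bytes: high bit set
                i &gt;&gt;= 7;
            }
            assert(p &gt;= blob.data());
            *static_cast&lt;gsl::span&lt;uint8_t&gt;*&gt;(this) = gsl::span&lt;uint8_t&gt;(p, end);
        }
    };
    template&lt;typename T&gt; std::enable_if_t&lt;std::is_integral&lt;T&gt;::value, ro::iserial&lt;T&gt;&gt; serialize(T i) {
        return iserial&lt;T&gt;(i);
    }
    // The serialized number format supports non-negative integers of any size,
    // but interpretation of unreasonably large integers is limited by the
    // implementation: bits beyond the 64th are lost.
    // The caller must guarantee that p points at a complete serialized
    // integer. No bounds are checked.
    inline auto ideserialize(uint8_t* p) {
        uint_fast64_t i{ 0 };
        while (*p &amp; 0x80) {
            i = (i | (*p++ &amp; 0x7F)) &lt;&lt; 7;
        }
        return uint64_t(i | *p);
    }
    inline auto ideserialize(gsl::span&lt;uint8_t&gt; in) {
        return ideserialize(&amp;in[0]);
    }
}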
</pre><p>
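The deserializer above trusts its input: it checks no bounds, and it accepts leading 0x80 bytes, so one integer can arrive in more than one encoding. If the same object must always generate the same hash, a decoder for untrusted input probably wants to reject those. A minimal sketch, assuming the same big-endian base 128 format. The names encode_varint and decode_varint_strict are hypothetical, and plain pointers and std::vector stand in for gsl::span to keep the sketch self-contained:<pre>
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;optional&gt;
#include &lt;vector&gt;

// Hypothetical helper: encode v as big-endian base 128, high bit set on
// every byte except the last.
inline std::vector&lt;uint8_t&gt; encode_varint(uint64_t v) {
    uint8_t buf[10];                     // 64 bits need at most ten 7-bit groups
    std::size_t pos = sizeof(buf);
    buf[--pos] = static_cast&lt;uint8_t&gt;(v &amp; 0x7f);            // last byte
    v &gt;&gt;= 7;
    while (v != 0) {
        buf[--pos] = static_cast&lt;uint8_t&gt;((v &amp; 0x7f) | 0x80); // earlier bytes
        v &gt;&gt;= 7;
    }
    return std::vector&lt;uint8_t&gt;(buf + pos, buf + sizeof(buf));
}

// Hypothetical strict decoder. Returns nothing if the input is empty,
// truncated, wider than 64 bits, or not canonical (a leading zero group).
// Bytes after the terminating byte are ignored.
inline std::optional&lt;uint64_t&gt; decode_varint_strict(const uint8_t* p, std::size_t len) {
    if (len == 0 || p[0] == 0x80) return std::nullopt;
    uint64_t v = 0;
    for (std::size_t n = 0; n &lt; len; ++n) {
        if (v &gt;&gt; 57) return std::nullopt;        // shifting by 7 would lose bits
        v = (v &lt;&lt; 7) | (p[n] &amp; 0x7f);
        if ((p[n] &amp; 0x80) == 0) return v;        // high bit clear: last byte
    }
    return std::nullopt;                         // ran out of input
}
</pre><p>
Under these assumptions 300 encodes as the two bytes 0x82 0x2c, and the strict decoder refuses 0x80 0x01, which the permissive one would read as 1.</p><p>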
But our money amounts will typically need around 32 bits or more, up to a maximum of 64 bits, hence cannot be translated to valid UTF-8. We might instead represent them as a small integer and a decimal exponent.</p><p>
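The decimal exponent idea can be sketched as follows, assuming amounts are non-negative 64-bit integers. The type decimal_amount and the functions to_decimal and from_decimal are hypothetical names. Stripping every trailing decimal zero gives each amount exactly one representation, which is what a canonical form needs:<pre>
#include &lt;cstdint&gt;

// Hypothetical type: the amount is mantissa * 10^exponent.
struct decimal_amount {
    uint64_t mantissa;
    uint8_t exponent;
};

// Canonical form: the mantissa never ends in a decimal zero, and zero is {0, 0}.
inline decimal_amount to_decimal(uint64_t amount) {
    decimal_amount d{amount, 0};
    while (d.mantissa != 0 &amp;&amp; d.mantissa % 10 == 0) {
        d.mantissa /= 10;
        ++d.exponent;
    }
    return d;
}

// Inverse of to_decimal. The caller must ensure the result fits in 64 bits.
inline uint64_t from_decimal(decimal_amount d) {
    uint64_t v = d.mantissa;
    for (uint8_t n = 0; n &lt; d.exponent; ++n) v *= 10;
    return v;
}
</pre><p>
With this sketch a round amount such as 2,500,000,000, which needs 32 bits and so five bytes at seven bits per byte, becomes mantissa 25 and exponent 8, one byte each.</p><p>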
We might need floating point for graph analysis, but that is a long way in the future. SQLite3 uses big-endian IEEE 754-2008 64-bit floating point numbers as its canonical interchange format (big-endian byte order is no longer widely used by modern computers). But even if we do the analysis in floating point, we may well find it more convenient to interchange the data as integers, since the floating point values of groups of interest are all likely to be in a narrow range. We may not care to interchange the graph analysis data at all, only the groupings and rankings. Really you only care about one group, the cooperating group.</p><p>
We also need to serialize signatures to human readable format, for embedding in human readable messages. This could be base 58, which we get by suppressing O, o, I, and lower case l, or base 64, which we get by also including -+_$!* (permitted in URLs). For error checking, prefix with enough hash bits to round up to a multiple of six bits, and make the output fixed size. Human readable messages that include sensitive records will always end with a hash of the entire human readable message, truncated to a multiple of three bytes, hence a multiple of six bits: perhaps thirty bytes, which is two hundred and forty bits, or forty characters at six bits per character. The signed transaction output inside the message will always have the full signature of the full record. A full signature of a full record will be thirty-three bytes, 32 bytes of signature and one byte of hash of the signature, to make human transmission of signatures possible, though very difficult.</p><p>
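The base 64 variant might look like the sketch below. The ordering of the alphabet is an assumption, as is the name encode64; the text above fixes only which characters are in it. Input must already be a multiple of three bytes, so each group of three bytes becomes exactly four characters:<pre>
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical ordering: digits, upper case without I and O, lower case
// without l and o, then the six extra symbols. 58 + 6 = 64 characters.
constexpr char alphabet64[] =
    "0123456789"
    "ABCDEFGHJKLMNPQRSTUVWXYZ"
    "abcdefghijkmnpqrstuvwxyz"
    "-+_$!*";
static_assert(sizeof(alphabet64) == 65, "64 characters plus the terminating NUL");

// Encodes three bytes at a time into four six-bit characters.
// Returns an empty string if the length is not a multiple of three:
// the caller is expected to pad with hash bits first.
inline std::string encode64(const std::vector&lt;uint8_t&gt;&amp; in) {
    std::string out;
    if (in.size() % 3 != 0) return out;
    out.reserve(in.size() / 3 * 4);
    for (std::size_t n = 0; n &lt; in.size(); n += 3) {
        const uint32_t w = (uint32_t(in[n]) &lt;&lt; 16) | (uint32_t(in[n + 1]) &lt;&lt; 8) | in[n + 2];
        out += alphabet64[(w &gt;&gt; 18) &amp; 0x3f];
        out += alphabet64[(w &gt;&gt; 12) &amp; 0x3f];
        out += alphabet64[(w &gt;&gt; 6) &amp; 0x3f];
        out += alphabet64[w &amp; 0x3f];
    }
    return out;
}
</pre><p>
Thirty bytes come out as forty characters, and a thirty-three byte signature with its hash byte as forty-four.</p><p>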
<a href="https://github.com/thekvs/cpp-serializers">Review of serializers</a>.</p><p>
We don't want the schema agility of protobuf and Avro. We want something header only, which does not come with half a dozen tools that do half a dozen complicated things. We just want to serialize stuff to canonical form so that it can be transported between different architectures and code generated by different compilers, and so that the same object always generates the same hash.</p>
<h2><a href="https://github.com/niXman/yas">yas</a></h2><p>yas is the fastest when compressing to ordinary density, and the most compact when compressing to high density. It is also header only, unit tested, and compiler agnostic.</p>
<table>
<tr><th>Serializer</th> <th>Serialized size</th> <th>Time</th></tr>
<tr><td>yas</td> <td>17.416 gigabytes</td> <td>3.152 seconds</td></tr>
<tr><td>yas-compact</td> <td>13.321 gigabytes</td> <td>24.878 seconds</td></tr></table>
(implying about two terabytes per hour for yas-compact)<p>
A typical high speed connection is 1Gbps, one gigabit per second. (GBps is gigabytes per second, Gbps is gigabits per second.)</p><p>
yas-compact can handle 4Gbps, so storage and bandwidth are likely to be the bottlenecks, and we can probably throw more CPUs at the system more easily than more bandwidth and storage. So we make yas-compact, or a variant thereof with customization on index fields, our canonical form.</p>
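<p>The bandwidth figures follow from the yas-compact row of the table. A small sketch of the arithmetic, assuming a terabyte is 1000 gigabytes; the constant names are made up for illustration:<pre>
// Figures from the yas-compact row of the table.
constexpr double yas_compact_gigabytes = 13.321;
constexpr double yas_compact_seconds = 24.878;

// About 0.535 gigabytes per second.
constexpr double gigabytes_per_second = yas_compact_gigabytes / yas_compact_seconds;
// Eight bits per byte: about 4.28 gigabits per second.
constexpr double gigabits_per_second = gigabytes_per_second * 8;
// 3600 seconds per hour: about 1.93 terabytes per hour.
constexpr double terabytes_per_hour = gigabytes_per_second * 3600 / 1000;
</pre>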
<h2>libnop: C++ Native Object Protocols</h2><p>libnop only seems to support the Clang compiler. Visual Studio throws up.</p>
<h2><a href="./capnproto.html">Cap'n Proto</a></h2><p>Overkill. Too much stuff. But their zero compression is cool.</p>
<p style="background-color : #ccffcc; font-size:80%">These documents are licensed under the <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-Share Alike 3.0 License</a></p>
</body>
</html>