reaction.la 7674b879eb
2024-06-16 10:28:08 +08:00


---
title: Libraries
sidebar: true
notmine: false
...
A review of potentially useful libraries and utilities.
The material here is usually way out of date and frequently wrong.
It should be treated as a bunch of hints likely to point the reader
in the correct direction, so that the reader can do his homework
on the appropriate library. It should not be taken as gospel.

# Rust blockchain related libraries

Most important work in blockchains these days appears to be in Rust.
I am behind the times.

[Awesome Blockchain Rust](https://rustinblockchain.org/awesome-blockchain-rust/#layer2){target="_blank"}

Rust is in most ways the best system for large projects that can be worked on by many people and
installed easily by many people, but it has two huge defects: an almost endless learning curve,
and very long compile times on very large programs. With C++, if you change one
file, it recompiles very quickly. Since Rust linking basically does not work, it has to recompile
everything, which makes coding on large programs very slow. This is a killer if you are generating
an executable that does a lot of things. And anything with a GUI *does* do a lot of things.
So if you try to create a GUI program in Rust, you wind up using a Rust wrapper
around a GUI written in some other language, which results in a GUI that sucks.
Rust, however, is best for a daemon running in the background that does networking.

# Recursive snarks

A horde of libraries are rapidly appearing on GitHub,
most of which have stupendously slow performance,
can only generate proofs for absolutely trivial things,
and take a very long time to do so.

[Blog full of the latest hot stuff](https://giapppp.github.io/posts/){target="_blank"}

[An Analysis of Polynomial Commitment Schemes: KZG10, IPA, FRI, and DARKS](https://medium.com/@ola_zkzkvm/an-analysis-of-polynomial-commitment-schemes-kzg10-ipa-fri-and-darks-a8f806bd3e12){target="_blank"}

[Inner Product Arguments](https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html){target="_blank"}:
a basic explanation of polynomial commitments on ordinary curves.

Verification is linear in the length of the polynomial, and logarithmic in the number of polynomials, so you want a commitment to quite a lot of short, fixed length polynomials. All the polynomials are of the same fixed length
defined by the protocol, but the number of polynomials can
be variable. Halo 2 can reference relative fields: you can have a proof that a value committed by polynomial $N$ bears some relationship to the value in polynomial $N+d$.


[most efficient pairing curves still standing]:https://arxiv.org/pdf/2212.01855

A whole lot of pairing curves have fallen to recent attacks.
The [most efficient pairing curves still standing] at the time that paper was written were the BLS12-381 curves (126 bits of security).
Zcash, Ethereum, Chia Network, and Algorand have all gone with BLS12-381, so these curves probably have the best developed libraries.

[IETF pairing curve paper]:https://www.ietf.org/archive/id/draft-irtf-cfrg-pairing-friendly-curves-10.ht

The [IETF pairing curve paper] has a list of libraries.

## Dory

[192 byte polynomial commitments]:https://eprint.iacr.org/2020/1274.pdf

From the Dory paper on [192 byte polynomial commitments]:

> Dory is also concretely efficient: Using one core and
setting $n = 2^{20}$, commitments are 192 bytes.
Evaluation proofs are 18 kB, requiring
3 s to generate and 25 ms to verify.
For batches at $n = 2^{20}$, the marginal
cost per evaluation is <1 kB communication,
300 ms for the Prover and 1 ms for the Verifier.

Dory seems to generate a verkle trie of polynomial commitments (a verkle trie being
the polynomial commitment equivalent of a Merkle trie)
and prove something about the preimage of the root,
which sounds like exactly what the doctor ordered.
You prove the preimage of each vertex on the path,
and then prove things about the leaf part of the preimage.

## Nova

[Nova]:https://github.com/microsoft/Nova
{target="_blank"}

[Nova white paper](https://eprint.iacr.org/2021/370.pdf){target="_blank"}

The folded proof has to contain an additional proof that the folding was done correctly.
Plonk or Groth16 can be used as a proof system inside Nova. Recursion is very low cost,
but Nova's native language is inherently relaxed R1CS, and Plonkish and Groth16 circuits get
translated back and forth to relaxed R1CS.

[Nova] claims to be fast, is being frequently updated, needs no trusted setup,
and other people are writing toy programs using [Nova]. YouTube videos report some
real programs going into Nova, to reduce the horrific cost of snarks and recursive snarks.
[Nova] claims you can plug in other elliptic curves, though it sounds like you
might need alarmingly deep knowledge of elliptic curves in order to
do so.

Nova can be used, is intended to be used, and is being used as a preprocessing step to
give you the best possible snark, but it should also be considered as a standalone system,
much as Mimblewimble used polynomial commitments alone.

The standard usage is incrementally verifiable computation, a linear chain,
but to get full trie computation, you have many instances doing the heavy lifting,
which communicate by "proof carrying data".

A [YouTube video](https://www.youtube.com/watch?v=SwonTtOQzAk) describes Nova as
bulletproofs *modified from R1CS to relaxed R1CS*,
and says that we can have a trie of provers. Well, if we have a trie of provers,
why should not anyone who wants to inject a transaction be a prover?
And if everyone is a prover, we need no snarks.
Everyone shares, among thirty two or so neighbors, the proving that he alone has done
and that only he needs, encrypted in a form that only he can read,
and similarly stuff that only two of them can read,
or only four of them can read, in case he crashes, goes offline,
and loses his state.

Nova requires the VM to be standardized and repetitious.

Plonky had a special purpose hash, such that it was
easy to produce recursive proofs about Merkle trees.
I don't know if Nova can do hashes with useful speed, or hashes at all,
without which no recursive snark system is useful.
We need a hash that has a relatively small circuit,
and it appears that no such hash is known.

Nova is built out of commitments, which are about 256 times bigger than a hash.
A Nova proof is a proof about a Merkle tree of commitments.
If we build our blockchain out of Nova commitments, it will be about a couple of
hundred times larger than one built out of regular hashes,
but will still only occupy about ten or twenty gigabytes of storage.
Bandwidth limits restrict us to about twenty transactions a second,
which is still faster than the bitcoin blockchain.
Plus, when we hit ten or twenty transactions per second,
we can shard the blockchain, which we can do because each shard can prove
it is telling the truth about transactions, whereas with bitcoin,
every peer has to evaluate every transaction,
lest one shard conspire to cheat the others.

[Nova] does not appear to have a language.
Representing a proof system as a Turing machine just seems like a bad idea.
It is not a Turing machine.
You don't calculate $a=b*c$; you instead prove that
$a=b*c$, when you already somehow knew $a$, $b$, and $c$.
A Turing machine is a state machine. A proof system is not.

It is often said, and is in a sense true, that the prover produces a proof
that for a given computation he knows an input such that after a
correct execution of the computation he obtains a certain public output.
But no, that is not what he does. The proof system proves that relationships hold between values.
And because it can only prove certain rather arcane and special things about
relationships between values, you have to compute a very large number
of intermediate values such that the relationship you actually want to prove
between the input and the output corresponds to simple relationships between
these intermediate values. But computing those intermediate values belongs
in another language, such as C++ or Rust.

With Nova, we would get an algorithm such that you start out with your real input.
You create a bunch of intermediate values in a standard language like C++ or rust,
then you call the proof system to produce a data structure
that can be used to prove relationships between your input and those
intermediate values.
Then you produce the next set of intermediate values,
call your proof system to produce a data structure
that can be used to prove the next set of relationships,
fold those two proof generating data structures together,
rinse and repeat,
and at the end you generate a proof that the set of relationships
the fold represents is valid.
That is procedural, but expressing the relationships is not.
Since your fold is the size of the largest hamiltonian circuit so far,
you want the steps to be all of similar size.

## Halo 2

Halo 2 is a general purpose snark library created for Zcash,
replacing their earlier library and using a different curve.
It directly supports performing an SHA2 hash inside the proof and verification.
I don't know how fast that is, and I did not immediately find
any examples recursing over an SHA Merkle tree and Merkle chain.

This suggests a functional language (SQL). There are, in reality,
no purely functional languages for Turing machines.
Haskell has its monads, SQL has update, insert, and delete.
But the natural implementation for a proof system would be a truly purely functional language: an SQL without update, insert, or delete, without any operations that actually write anything to memory or disk, that simply defines relationships, without a state machine that changes state to write data into memory consistent with those changes.

The proof script has to be intelligible, and the same for prover and verifier,
the difference being that the prover interleaves the proof language with ordinary code
in ordinary language, to produce the values that are going to be proven. The prover
drives the script along with ordinary language code, and verifier drives it along
with different ordinary language code, but the proof definition that is common
to both of them has no concept of being sequential and driven along,
no concept that things are done in any particular order.
It is a graph of relationships.

The proof language, as is typical of purely functional languages,
should consist of assertions about relationships between immutable
data structures, without expressing the idea that some of these
data structures were created at one time, and destroyed at another.
Some of these values are defined recursively, which means that what
is actually going to happen in practice is that they are going to be
created by a loop, written in the ordinary procedural language
such as Rust or C++, but the proof language should have no concept of that.
If the proof language asserts that $1 \leq n \land n < 20 \implies f(n-1) = g(f(n))$,
the ordinary procedural language will likely need to
generate the values of $f(n)$ for $n = 1$ to $19$,
and will need to cause the proof language to generate proofs for each value
of $n$ from 1 to 19, but the resulting proof will be independent
of the order in which these proofs were generated.
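
A toy bash sketch of that division of labor, under loudly hypothetical assumptions: $g(x) = 2x \bmod 101$ and the seed $f(19) = 3$ are made up for illustration, and no proof system is involved. The procedural loop computes the intermediate values downward from $f(19)$; each assertion $f(n-1) = g(f(n))$ is then checked on its own, in ascending, descending, and arbitrary order, with the same verdict each time.

```bash
#!/usr/bin/env bash
# Toy illustration only: g and the seed value f(19)=3 are hypothetical,
# and no proof system is involved.
g() { echo $(( ($1 * 2) % 101 )); }

# Procedural part: generate the intermediate values, f(19) down to f(0).
declare -a f
f[19]=3
for (( n = 19; n >= 1; n-- )); do
  f[n-1]=$(g "${f[n]}")
done

# Declarative part: each assertion f(n-1) = g(f(n)), for 1 <= n < 20,
# stands on its own, so the order of checking does not matter.
check_all() {
  local n
  for n in "$@"; do
    [ "${f[n-1]}" -eq "$(g "${f[n]}")" ] || return 1
  done
}

check_all $(seq 1 19) && echo "ascending order: all assertions hold"
check_all $(seq 19 -1 1) && echo "descending order: all assertions hold"
check_all 7 13 2 19 5 11 1 17 9 3 15 8 18 4 12 6 16 10 14 \
  && echo "arbitrary order: all assertions hold"
```

Corrupting any single intermediate value makes the assertions that mention it fail, whichever order they are checked in.
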

Purely functional languages like SQL do not prescribe an algorithm, and need
a code generator that has to make guesses about what a good algorithm would
be, as exemplified by SQLite's `likelihood()`, `likely()`, and `unlikely()` no-ops.
With a proof system we will, at least at first, have the human
write the algorithm, but if he changes the algorithm
while leaving the proof language unchanged, the proof will still work.

The open source nature of Plonky is ... complicated.
The repository on GitHub has been frozen for two years, so it likely
does not represent the good stuff.

# Peer to Peer

## Freenet

Freenet has long intended to be, and perhaps is, the social
application that you have long intended to write,
and has an enormous cold start advantage over anything you could write,
no matter how great.
It also relies on UDP, to enable hole punching, and routinely does hole punching.
So the only way to go, to compete, is to write a better Freenet within
Freenet.

One big difference is that I think that we want to go after the visible net,
where network addresses are associated with public keys: the backbone should be
IPs that have a well known and stable relationship to public keys.
This backbone transports encrypted information authored by people whose
public keys are well known, but whose network addresses
cannot easily be found.

Freenet, by design, chronically loses data. We need reliable backup,
paid for in services or crypto currency.
Filecoin provides this, but is useless for frequent small incremental
backups.

## Bittorrent DHT library

This is a general purpose library, not all that married to BitTorrent.
It is available as an MSYS2 package, MSYS2 being the successor to the MSYS shell
from the semi abandoned original MinGW project. MSYS2 builds its native Windows
packages with the Mingw-w64 toolchain, with the result that the name Mingw-w64 is all over it.
Its pacman name is `mingw-w64-dht`, but it has repositories all over the place under its own name.
It is asynchronous: it is driven by being called on a timer, and by being called when
data arrives. It contains a simple example program that enables you to publish any data you like.

## libp2p
[p2p]:https://github.com/elenaf9/p2p
{target="_blank"}
[libp2p]:ipns://libp2p.io/
{target="_blank"}
The generically named [p2p] does exactly what you want to do.
It is a thin wrapper around [libp2p] to allow participants in the
Kademlia cloud to find each other and send each other private messages.

Unfortunately Kademlia is broken and extremely vulnerable
to hostile action.
And [libp2p] has only encryption operations around a broken name system,
which they are making even more user hostile than it already is
because they don't want anyone using it (because it is broken).
It uses encryption libraries
for which there is strong reason to suspect enemy activity,
and it does not support Schnorr keys, nor Ristretto,
which is an obstacle to scriptless scripts and joint signatures.
The reason they do not support Schnorr is that they are using non-prime order groups,
and doing a Schnorr signature safely in a non-prime order group is incomprehensibly hard
and very easy to get subtly wrong.
The great strength of Ristretto is that it provides a prime order group,
which makes a whole lot of clever cryptography available to do clever things,
Schnorr signatures, scriptless scripts, lightning locks,
and compact joint signatures among them.

[libp2p] has a small set of name systems and public key systems,
and it should not be too hard to add yet another to that set.
It appears to be extensible,
because it has to support no end of old obsolete stuff.
Obviously new stuff has been added from time to time,
so it should be possible to find the additions in git and follow their example.
[multiple transport schemes]:ipns://docs.libp2p.io/concepts/transports/listen-and-dial/
{target="_blank"}
[libp2p] supports [multiple transport schemes], and can support a set
of peers using heterogeneous transport.
So you just have to add another transport scheme,
and not everyone has to update simultaneously.
They use TCP and web transports, but there is a plugin point for new transport schemes,
so just plug in UDP under an encryption and reliability layer.

[libp2p] should make it possible to access IPFS,
both to write and to read stuff, though that might be far from trivial.
You could perhaps publish stuff on IPFS that looks like a normal HTML
document, but contains embedded cryptographic data giving it more forms
of interaction when viewed on your browser than when viewed on Brave.
Replacing Kademlia for finding peers in the face of
enemy entryist action is a big project, though libp2p seems to have
taken the essential step of identifying peers by their public key,
rather than IP and port address.

[implementations]:ipns://libp2p.io/implementations
{target="_blank"}
[distributed hash table]:https://github.com/libp2p/specs/blob/master/kad-dht/README.md
{target="_blank"}
Their [distributed hash table] (kad-dht) seems to be very much a work in progress. Checking their [implementations] page for the status of various libp2p components, *everything*
seems to be very much a work in progress. Some things are implemented
in some languages but not in others. Rust has
hole punching; C++ does not. But C++ has a whole lot of stuff that Rust
does not. And their documentation page on kad-dht is blank. On the other hand,
IPFS works, and Polkadot works, so something is usable. libp2p-peer is not implemented
in C++, but is implemented in Rust and in browser JavaScript.

Their rendezvous protocol presupposes a single central and known server
which simply records everyone's ID and network address. Unacceptable.
Anyone using this is fake and an enemy. It should be disabled in our fork.

libp2p is a pile of odds and ends and a framework for gluing them together.
But everything they do is crippled by the fact that you don't know the
likely uptime, downtime, or IP stability of an entity. These cannot
in fact be known, but you can form probability-of-a-probability estimates.
What is needed is that everyone forms their own probability of a probability,
compares what they know
(where they have a high probability of a probability)
with other parties' estimates, and rates the other parties' reliability accordingly.
If we add to that a probability-of-a-probability estimate of IP and port stability,
and use it to govern ping time and keep-around time, that goes a long way
to solving the problems with Kademlia.
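
One conventional way to hold a probability of a probability is a Beta distribution over the unknown probability. The bash sketch below assumes (these are my assumptions, not anything libp2p does) that each ping to a peer is recorded as answered or unanswered, that the prior is the uniform Beta(1,1), and that another party's claimed uptime "agrees" with ours if it lies within two standard deviations of our estimate. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: a "probability of a probability" as a Beta distribution.
# Assumes pings are recorded as answered/unanswered and a uniform
# Beta(1,1) prior. Neither assumption comes from libp2p.

# beta_estimate ANSWERED UNANSWERED -> prints "mean stddev"
beta_estimate() {
  awk -v s="$1" -v f="$2" 'BEGIN {
    a = s + 1; b = f + 1              # posterior is Beta(s+1, f+1)
    n = a + b
    mean = a / n
    var = (a * b) / (n * n * (n + 1))
    printf "%.3f %.3f\n", mean, sqrt(var)
  }'
}

# agrees ANSWERED UNANSWERED CLAIMED -> success if another party's claimed
# uptime lies within two standard deviations of our own estimate.
agrees() {
  local mean sd
  read -r mean sd < <(beta_estimate "$1" "$2")
  awk -v m="$mean" -v s="$sd" -v c="$3" \
    'BEGIN { d = c - m; if (d < 0) d = -d; exit !(d <= 2 * s) }'
}

beta_estimate 9 1      # same ratio, little evidence: wide spread
beta_estimate 90 10    # same ratio, more evidence: narrow spread
```

The second number is the point: nine answers out of ten and ninety out of a hundred give similar means, but the spread, the confidence in the probability, differs by a factor of about three.
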

We can adapt it to the problem
by having peers preferentially keep around the data for peers
that have a stable IP and a stable port, and,
somewhat less preferentially, keep around peers that have a
stable NAT penetration or tracking relationship with a peer that has a
stable IP and stable port. Selective pinging: you rarely ping peers
that have been around for a very long time with a stable IP, and you
ping a peer that has a NAT penetration relationship by not pinging it,
and instead asking the gateway peer how the relationship is going,
at infrequent intervals. Thus, a peer with a stable IP
or a stable relationship becomes very widely known.
Well, it becomes widely known, assuming shills do not register
one billion addresses that happen to be near him.
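
A hypothetical policy sketch for the selective pinging described above. The numbers (a 60 second base interval, doubling for each day of IP and port stability, capped at one day, and a daily query to the gateway peer for peers reached through a NAT relationship) are assumptions for illustration, not anything libp2p or Kademlia specifies.

```bash
#!/usr/bin/env bash
# Hypothetical selective-pinging policy, not taken from libp2p or Kademlia:
# a directly reachable peer is pinged less often the longer its IP and port
# have been stable; a peer reached through a NAT penetration relationship is
# never pinged directly, and its gateway is asked about it once a day.

# ping_interval KIND STABLE_DAYS -> seconds until the next check
ping_interval() {
  local kind=$1 days=$2 base=60 cap=86400 interval
  case $kind in
    direct)
      # Double the interval for each day of stability, up to one day.
      interval=$(( base << (days > 16 ? 16 : days) ))
      if (( interval > cap )); then interval=$cap; fi
      ;;
    via_gateway)
      interval=$cap
      ;;
    *)
      echo "unknown peer kind: $kind" >&2
      return 1
      ;;
  esac
  echo "$interval"
}

for days in 0 3 20; do
  echo "direct peer, stable for $days days: every $(ping_interval direct "$days") s"
done
echo "peer behind a gateway: ask the gateway every $(ping_interval via_gateway 0) s"
```

In a real implementation the stability input would come from the probability-of-a-probability estimate rather than a raw day count.
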

libp2p is something between actual code and a set of standards,
which standards you comply with so that you can use other people's code.
Someone writes something ad hoc for his use case, stuffs it into libp2p
somewhere somehow so that he can use other people's code.
Then other people use his code.
It is a pile of standards (many of them irritatingly stupid, incomplete, ad hoc,
or workarounds for using defective tools) that enable a
whole lot of people writing this sort of thing to copy a whole lot of
each other's code.

Their [NAT discovery algorithm](https://github.com/libp2p/specs/tree/master/autonat){target="_blank"}
is particularly idiotic and broken. It is not a NAT discovery algorithm,
but a closed port discovery algorithm, and a ludicrously laborious,
costly, indirect, error prone, and inefficient closed port discovery algorithm.
NAT means the other guy sees your network address as different from what you see.
In that case your port is probably closed, but could well be open.
If he sees the same network address as you, your port might be open,
but you don't know that,
and talking to the other guy might well temporarily open your port,
with the result that he might tell you that you are not behind a NAT,
when in fact you are, and your ports are normally closed.
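
The reasoning above, written out as a small bash function, as a sketch only. It assumes you already have two `ip:port` strings, the one you see locally and the one the other party reports seeing; the function name and messages are hypothetical, and it deliberately does not claim to know whether your port is open.

```bash
#!/usr/bin/env bash
# Sketch of the reasoning above. Addresses are "ip:port" strings:
# LOCAL is what you see on your own interface, OBSERVED is what the
# other party reports seeing. Nothing here opens or probes a port.

# nat_verdict LOCAL OBSERVED -> prints what can and cannot be concluded
nat_verdict() {
  local local_addr=$1 observed_addr=$2
  if [ "$local_addr" != "$observed_addr" ]; then
    echo "behind NAT: port probably closed, but could be open"
  else
    echo "no NAT seen: port might be open, but the exchange itself may have opened it"
  fi
}

nat_verdict 192.168.1.20:4000 203.0.113.7:61022
nat_verdict 198.51.100.5:4000 198.51.100.5:4000
```
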
The guys writing this stuff are dumb as posts,
and a whole lot of what they write is garbage.
But, nonetheless, a whole lot of people are using libp2p,
and a whole lot of people are doing a whole lot of work on it --
not all of which is ready for prime time.

# Wireguard, Tailwind, and identity

Wireguard is a secure VPN.
Tailwind is peer to peer built on OAuth 2 and Wireguard. It lacks facilities for peering
through NAT, though these are probably not hard to add. The
Tailwind server would just tell both peers to start pinging each other
simultaneously, and tell each peer when the other peer has acked a
meeting time. With most NATs, the first ping to arrive after a ping has been
sent will get through. Wireguard has a provision to keep on pinging (its `PersistentKeepalive` setting).
If a peer has a stable IP and port accessible from the internet, it does not
ping. The other guy has to ping. If he does not ping for a while, the
connection may stop working. If the peer with the stable IP and port finds
it cannot get through, it discards the connection information, but if it never
attempts to use that information, it may hang around for a very long time.

OAuth is a generic interface to identity protocols. Anyone can implement
OAuth in any way.

The Zooko identity model is that each party has his own mapping between
non unique Zooko human readable, typeable, and memorable names, and
globally unique non human memorable, non human typeable, public keys.
We put on top of that a consensus mapping, which is mutable. The end
user sees his local name for an identity, that identity's name for itself, and
a recent consensus human readable human typeable name for that identity.
For major and important widely known identities these should all be the
same, and the end user should see a single short human readable name. If
the end user sees two or three different human readable names for a
counterparty, there is likely to be an issue. If he sees three different
human readable names, plus the public key, there is definitely an issue.
The end user's mapping from local petnames to global keys is locally unique, and mutable at the end user's discretion.
The consensus mapping is mutable by consensus.
For friends who are well known to himself, but not well known to others,
the global consensus name may merely be a distraction, and he may turn it
off. If someone is on his buddy list of whitelisted people, the global
consensus name is turned off, unless it is the same as his petname, in which case it is
turned on, and if the consensus changes, the end user sees that change.
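
One possible way to encode these display rules, as a bash sketch. The function name, the argument order, and the `CHECK` and `WARNING` labels are hypothetical. The buddy list rule is modelled by hiding a disagreeing consensus name; noticing that the consensus has *changed* would need remembered state, which the sketch omits.

```bash
#!/usr/bin/env bash
# Sketch of one way to encode the display rules above. All names and keys
# are hypothetical strings; this is not taken from any existing wallet.

# display_name PETNAME SELF_NAME CONSENSUS_NAME PUBKEY WHITELISTED(yes|no)
display_name() {
  local pet=$1 self=$2 consensus=$3 key=$4 whitelisted=$5
  # Buddy list: a consensus name that differs from our petname is turned
  # off, which we model by treating it as agreeing with the petname.
  if [ "$whitelisted" = yes ] && [ "$consensus" != "$pet" ]; then
    consensus=$pet
  fi
  if [ "$pet" = "$self" ] && [ "$pet" = "$consensus" ]; then
    echo "$pet"                                       # all agree: one short name
  elif [ "$pet" != "$self" ] && [ "$pet" != "$consensus" ] && [ "$self" != "$consensus" ]; then
    echo "WARNING: $pet / $self / $consensus / $key"  # three names: show the key
  else
    echo "CHECK: $pet / $self / $consensus"           # two names: likely an issue
  fi
}

display_name alice alice alice KEY1 no
display_name alice alicia alice KEY1 no
display_name al bob carol KEY2 no
display_name mum mum "Jane Doe" KEY3 yes
```
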

# Existing cryptographic social software

Maverick says:
[Manyverse]:https://www.manyver.se/
{target="_blank"}
[Scuttlebutt]:https://staltz.com/an-off-grid-social-network.html
{target="_blank"}
If looking for something to try out that begins to have the right shape, see [Manyverse], which uses the [Scuttlebutt] protocol. Jim is fond of Bitmessage, and it is quite secure, but it has a big weakness in that it needs to flood every message to all nodes in the world so everyone can try their private key to see if it works for decryption. That won't scale. (You can't avoid passing every message on to all callers even if you know it's yours, because you don't want to let someone snooping see that you absorbed a message, which would be an information disclosure.)
Instead [Manyverse] and [Scuttlebutt] allow for publishing and reputation for a public key. The world can see what a key (its author really) publishes. Can publish public posts like a blog, signed by the private key for authenticity and verifiability. Also can publish private messages (DMs), visible to the world but whose contents are encrypted. Weakness in private messages is that the recipient public key is visible on the message - good for routing and avoiding testing every message, bad for privacy. Would be better to have a 3rd mode, private for "someone" but you have to test your private key to see if it's for you. That should not be hard to add to [Scuttlebutt] and [Manyverse].
Reputation is similar to what Jim has proposed: You can specify a list of primary keys/people (friends) you want to listen to and watch. You can also in the interface specify how many degrees of separation you want to see outward from your friends - the public messages from their friends and friends' friends. Presumably specifying 6 degrees gets you Kevin Bacon and the rest of the [Manyverse]. You can also block friends' friends so their connections are not visible to you - so if you don't like a friend's friend you at least don't have to listen any more.
Another advantage is that [Manyverse] works in a sometimes-connected universe: Turn off your computer or phone for days, turn back on, catch up to messages. You really don't even have to be on the public Internet; you could sneakernet or local/private-net messages, which is nice for, say, messaging in a disaster or SHTF scenario where you have a local wifi network while the main network connections are down. Bitmessage has a decay/lifetime for messages that means you need to be connected at least every 2-3 days.
Biggest weakness is hosting. Your service can be hosted by 3rd parties like any service, and you can host your own. Given the legal landscape as well as susceptibility to censorship via DDoS and hack attacks, you want to have your own server. There are some public servers but sensibly they don't want a rando or glowie from the net jumping on there to drop dank memes. But it is nontrivial to carve out your own hosted network bubble that can see the Internet (at least periodically) while being fully patched and DDoS resistant.
Of course missing from this from Jim's long list of plans are DDoS protection, a name service that provides name mapping to key hierarchies for messaging and direct communications, and a coin tie-in. But [Manyverse] at least has the right shape for passing someone a message with a payment inside, while using a distributed network and sometimes connection with store-and-forward to let you avoid censorship-as-network-damage. A sovereign corporation can also message publicly or privately using its own sovereign name and key hierarchy and private ledger-coin.
The net is vast and deep. Maybe we need to start cobbling these pieces together. The era of centralized censorship needs to end. Musk will likely lose either way, and he's only one man against the might of so many paper tigers that happen to be winning the information war.

## Lightning node

[`rust-lightning`]:https://github.com/lightningdevkit/rust-lightning
{target="_blank"}
[`rust-lightning`] is a general purpose library for writing lightning nodes, running under Tokio, that is used in one actual lightning node implementation.
It is intended to be integrated into on-chain wallets.
It provides the channel state as "a binary blob that you can store any way you want" -- which is to say, ready to be backed up onto the social net.

# Consensus

I have no end of smart ideas about how a blockchain should work, but no
actual blockchain. Smart ideas are worth two cents a bale, but only if
already baled. Need to port in someone else's blockchain code, with a
bridge to their blockchain.
Then I can make a start on implementing my bright ideas as part of
working code.
[Near]:https://near.org/papers/the-official-near-white-paper/#introduction
{target="_blank"}
[Near] is actually implementing no end of things that I have been thinking
about, so it seems like a good fit.

# Git submodules

Libraries are best dealt with as [Git submodules].

[Git submodules]: https://github.com/psi4/psi4/wiki/External-subprojects-using-Git-and-CMake
[build libraries]:https://git-scm.com/book/en/v2/Git-Tools-Submodules

Git submodules leak complexity and surprising and inconvenient
behaviour all over the place if one is trying to make a change that affects
multiple modules simultaneously. But having your libraries separate from
your git repository results in non portable surprises and complexity. It
makes it hard for anyone else to build your project, because they will have to, by
hand, tell your project where the libraries are on their system.
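
A minimal sketch of adding a library as a submodule, using throwaway local repositories so that it runs anywhere; the repository names and paths are hypothetical. Recent versions of git refuse to clone submodules from local paths by default, hence the `-c protocol.file.allow=always`, which you would not need with a normal remote URL.

```bash
#!/usr/bin/env bash
# Minimal sketch with throwaway local repositories; the names lib, app and
# external/lib are hypothetical. Needs git 2.28 or later (for "init -b").
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for the upstream library.
git init -q -b main lib
git -C lib commit -q --allow-empty -m "library v1"

# Record the library as a submodule of the superproject. Recent git refuses
# to clone submodules from local paths unless protocol.file.allow permits it.
git init -q -b main app
git -C app -c protocol.file.allow=always submodule add "$work/lib" external/lib
git -C app commit -q -m "Add lib as a submodule"
cat app/.gitmodules

# Anyone cloning the superproject gets the library with it, at the recorded
# commit, without having to say where the library lives on their system.
git -c protocol.file.allow=always clone -q --recurse-submodules "$work/app" app-clone
git -C app-clone/external/lib log --oneline
```
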

When you are developing code, you normally have a git branch. But the git
commit of the master project in which the submodule is contained does
not notice that its subproject has changed, unless the subproject head has
changed. And the subproject head will not change if it points to a
name, rather than to a particular commit. For your changes to a submodule
to be reflected in the master project in any consistent or predictable way,
the submodule has to be in detached head mode, with the head pointing
directly to a commit, rather than pointing to a branch that points to a
commit.
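
A sketch of the detached head workflow, again with throwaway local repositories and hypothetical names: the submodule's HEAD is detached, a commit is made there, and staging the submodule path in the superproject records that exact commit hash.

```bash
#!/usr/bin/env bash
# Minimal sketch with throwaway local repositories (names are hypothetical):
# commit to a submodule on a detached HEAD, then record that exact commit
# in the superproject.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q -b main lib
git -C lib commit -q --allow-empty -m "v1"
git init -q -b main app
git -C app -c protocol.file.allow=always submodule add "$work/lib" lib
git -C app commit -q -m "Add lib"

# Detach the submodule's HEAD, so that it points directly at a commit,
# and make a change there.
git -C app/lib switch -q --detach
git -C app/lib commit -q --allow-empty -m "local fix"

# The superproject sees that the submodule's commit moved; staging the
# submodule path records the new commit hash.
git -C app status --short
git -C app add lib
git -C app commit -q -m "Point lib at the local fix"

recorded=$(git -C app rev-parse HEAD:lib)
actual=$(git -C app/lib rev-parse HEAD)
echo "superproject records $recorded; submodule is at $actual"
```
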

You will import a submodule from someone else's project, but eventually you and your team are going to make minor changes to it to customize it to your project, in which case you will need your own remote team repo, in place of the original other team's repo.

Construct the copied remote repositories so that their default branch is your tracking branch, not the upstream branch.

``` bash
git init --bare -b «our_branch»
```

Then, in your local repository where you created the new branch, point
its remote at the new remote (for example with `git remote set-url`),
and push «our_branch» to the remote from the local repository in which you created your team's new
submodule branch.

Alternatively:

``` bash
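# Bare copy of their repository, to serve as our team's remote
git clone --bare -b «their_stable_branch_or_release_tag» «their_remote» «our_new_remote»
cd «our_new_remote»
# Create our branch, and make it the default branch that clones check out
git branch «our_new_branch» «their_stable_branch_or_release_tag»
git symbolic-ref HEAD refs/heads/«our_new_branch»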
```

If you fail to set your team remote's default branch to your team's branch, then your
local repositories will keep getting reset back to their team's branch, and
chaos ensues.
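
A sketch, with throwaway local repositories and hypothetical names, of setting up a team remote whose default branch is your branch, and of checking it: `git ls-remote --symref` reports which branch the remote's HEAD names, and a fresh clone should land on that branch.

```bash
#!/usr/bin/env bash
# Minimal sketch with throwaway local repositories (names are hypothetical):
# make a team remote whose default branch is our branch, then check what a
# fresh clone of it will check out.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for their upstream repository.
git init -q -b stable upstream
git -C upstream commit -q --allow-empty -m "upstream release"

# Our team remote: a bare copy whose HEAD names our branch.
git clone -q --bare upstream team.git
git -C team.git branch ourteam stable
git -C team.git symbolic-ref HEAD refs/heads/ourteam

# Ask the remote which branch its HEAD points to ...
git ls-remote --symref "$work/team.git" HEAD
# ... and confirm that a fresh clone lands on it.
git clone -q team.git checkout
default_branch=$(git -C checkout branch --show-current)
echo "fresh clones start on: $default_branch"
```
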

When moving the remote of a submodule in your local repository (usually from their team's remote to your team's remote), you update `.gitmodules` in your superproject, and in each submodule that has submodules of its own, and then run:

```bash
git submodule update --init --recursive --force
git submodule foreach --recursive 'git status && git switch «our_branch» && git status && git remote -v && git switch --detach'
```

Your submodule remotes propagate from the `.gitmodules` file of your superproject, and their branch propagates from the superproject *and* from the remote default branch.
And the branch will *also* propagate from `.gitmodules` if `branch = ...` is set.

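
A sketch of moving a submodule to your team's remote without editing files by hand, with throwaway local repositories and hypothetical names. `git submodule set-url` and `git submodule set-branch` rewrite `.gitmodules` (they need git 2.25 and 2.22 or later respectively), and `git submodule sync --recursive` copies the URLs into the local configuration, including for nested submodules.

```bash
#!/usr/bin/env bash
# Minimal sketch with throwaway local repositories (names are hypothetical):
# move a submodule from their remote to ours, recording the new URL and our
# branch in .gitmodules.
set -euo pipefail
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q -b main theirs
git -C theirs commit -q --allow-empty -m "upstream"
git clone -q --bare theirs ours.git            # stand-in for our team remote
git -C ours.git branch ourbranch main

git init -q -b main app
git -C app -c protocol.file.allow=always submodule add "$work/theirs" lib
git -C app commit -q -m "Add lib from their remote"

# Rewrite .gitmodules: the new URL, and the branch "update --remote" follows.
git -C app submodule set-url lib "$work/ours.git"
git -C app submodule set-branch --branch ourbranch lib
# set-url already updates lib's own remote; sync --recursive does the same
# for every submodule, nested ones included.
git -C app submodule sync --recursive
git -C app commit -q -am "Move lib to our remote"

cat app/.gitmodules
git -C app/lib remote get-url origin
```
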
Because git is a distributed archive, it is perfectly possible, and often
necessary, to work with all these set to different values, *provided* that
everyone is mindful that they are set to different values, and the
consequences and implications of them being set to different values. Which
consequences and implications get complicated, unobvious, difficult to
predict, and surprising when you are working with submodules.

Git commands in the master project do not look inside the subproject. They
just look at the subproject's head.
This means that signing off on changes to a submodule is irrelevant. One
signs off on the master project, which includes the hash of that submodule
commit.

When one is changing submodules for the use of a particular project,
making related changes in the master project and submodules, one should
not track the changes by creating and updating branch names in the
submodule, but by creating and updating branch names in the containing
module, so that the commits in the submodule have no name in the
submodule: the submodule is always in detached head state, albeit the
head may be tagged. Names in submodules are primarily of value for
amendments to the submodule as an independent module, intended to be
used by multiple projects, and for this purpose, tags are better than
branch names. wxWidgets releases are identified by tag, not by branch,
and the names of branches are only used to communicate a particular
project on the submodule to other people working on that project as
their master project.
Branch names within a submodule, though very useful when you are working
on a submodule, are not useful to the project as a whole, and except for the
primary fork name, should be temporary and local, not pushed to the
project repository. But when you are modifying the submodules in a
project as a single project, making related changes in the module and
submodule, the shared names that are common to all developers belong in
the primary project module, and when you are done with a submodule:
```bash
git switch --detach
```
From the point of view of the containing superproject, submodule
commits are nameless, with detached head, except while you are working
on them. A name in the primary module names a group of related commits
in several submodules, and those commits do not receive independent
names of their own, even though they have to be made within each
submodule, not in the containing module that names the complete set of
interrelated commits.
The submodule commits may well belong to different branches and tags in
the superproject, but the submodules know nothing of superproject
names, and the superproject knows nothing of submodule names.
The primary build should invoke the submodule build, which *will* check
each file in the submodule for changes, only when the submodule's
detached head has changed. Therefore you want it to change: you want
the submodule head to be nameless and detached whenever you modify a
submodule as part of a larger project, where you test your changes by
rebuilding the whole project to make sure all your related changes fit
together.
When tracking an upstream submodule that has submodules of its
own, which have their own upstreams, update your version with:
```bash
git pull upstream --recurse-submodules=on-demand «their-latest-release»
```
Make sure things still work. Get everything working. (You do have unit tests, right?)
When you are working on a submodule, your branch has to have a name, or
when you push it and pull it, strange things will happen. But the
superproject pushes and pulls by commit, not by name, so when you are
done, push each submodule under its name, and only then detach it
(`git push` with no arguments fails on a detached head):
```bash
git submodule foreach --recursive 'git push'
git submodule foreach --recursive 'git switch --detach'
```
As its own thing, a submodule has branches with names. As a component
of a superproject, it has nameless commits.
If you are in a submodule directory of the superproject, and you push and
pull, what you are pushing and pulling had better have a name, or else
unpleasant surprises will happen. If you are in the superproject directory
and pushing and pulling the whole thing, that commit had better be detached.
You pull a named release of the project that is a submodule of your project
from `upstream`, diddle with it to make it work with your project, then
push it to `origin` under its own name, then detach it from its name,
so that the superproject will know that the submodule has been changed.
All of which, of course, presupposes you have already set unit tests,
upstream, origin, and your tracking branch appropriately.
Even if your local modifications are nameless in your local submodule
repository, on your remote submodule repository they need to have a name
to be pushed to. Hence you need a tracking branch in each of your
remote images of each of your submodules, and that tracking branch
needs to point to a commit from which all the nameless commits
referenced by the names and commits in your superproject are reachable.
You want `.gitmodules` in your local image of the repository to
reflect the location and fork of your new remote repository, with
your remote as its `origin` and their remote as its `upstream`.
You need an enormous pile of source code, the work of many people over
a very long time, and Git submodules allow this to scale, because the
local great big pile of source code references many independent and
sovereign repositories in the cloud. If you have one enormous pile of
source code in one enormous git repository, things get very very slow. If
you rely on someone else's compiled code, things break and you get
accidental and deliberate backdoors, which is a big concern when you are
doing money and cryptography.
When your submodules are simply your copy of someone else's code, it gets
a little bit messy. When you change them, it gets messier.
And Visual Studio's handling of submodules is just broken and buggy. A
command that works in git-bash will produce unexpected, surprising, and
unpleasant results in Visual Studio's git. I really need to give up on
Visual Studio: it is closed source code, and turning bad.
When one developer makes minor changes in a submodule to make it work
with the whole project that several developers are working on, no
end of mysterious grief ensues, because strange and curiously difficult to
identify differences appear between builds that Git would normally ensure
are the same build. Submodules are a halfway house between completely
absorbing the other party's code into your code and using it as a prebuilt
library. Instead, we have walls dividing the project into pieces, which is a
lot less grief than one big pile of code, but managing those walls winds up
taking a lot of time, and mistakes get made because a git commit in a
project with submodules that have changed does not mean quite the same
thing, nor have quite the same behaviour, as a git commit in a project with
unchanging submodules. But then truly integrating a project that is the
product of a great deal of time by a great many people, and managing it
thereafter, is likely to take up a great deal more time.
Git submodules are hierarchical, but source code has strange loops. The
Bob module uses the Alice module and the Carol module, but Alice uses
Bob and Carol, and Carol uses Alice and Bob. How do you make sure that
all your modules are using the same commit of Alice?
Well, if modules have strange loops, you make one of them the master, and
the rest of them direct submodules of that master, brother subs to each
other, so that they are all using the same commit of Alice as the master.
And you should try to write or modify the source code so that they all call
their brother submodules through the one parent module above them in the
hierarchy: they use the source code of their brothers through the
source code of their master, rather than directly including the header
files of their brothers at compile time, albeit the header file of the master
that they include may well include the header of their brother, so that they
are indirectly, through the master header file, including the brother header
file.
# Git subtrees
Git subtrees are an alternative to submodules, and many people
recommend them because they do not break the git model the way
submodules do.
But subtrees do not scale. If you have an enormous pile of stuff in your
repository, Git has to check every file to see if it has changed every time,
which rather rapidly becomes painfully slow if one is incorporating a lot
of projects reflecting a lot of work by a lot of people. Git submodules
mean you can incorporate unlimited amounts of stuff, and Git only has to
check the particular module that you are actually working on.
Maybe subtrees would work better if one was working on a project where
several parts were being developed at once, thus a project small enough
that scaling is not an issue. But such projects, if successful, grow into
projects where scaling is an issue. And if you are a pure consumer of a
library, you don't care that you are breaking the git model, because you are
seldom making synchronized changes in module and submodule, except to absorb
updates written by the upstream party or to adjust build parameters that they
have set.
The submodule model works fine, provided the divisions between one
submodule and the next are such that one is only likely to make changes in
one module at a time.
# Passphrases
All wallets now use random words, but you cannot carry an eighteen word random phrase through an airport in your head.
Should use [grammatically correct passphrases](https://github.com/lungj/passphrase_generator).
That library does not contain a collection of words organized by part of speech. Instead it calls a python library (word.net) of English, which has the information you actually need.
Using those dictionaries, the phrase (adjective noun adverb verb adjective
noun) can encode sixty eight bits of entropy. Two such phrases suffice,
being stronger than the underlying elliptic curve. With password
strengthening, we can randomly leave out one of the adjectives or adverbs
from one of the passphrases.
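The arithmetic above, worked through as a sketch that assumes each slot's word is drawn uniformly and independently from its own word list $W_i$. The list sizes are not given here, so only their geometric mean can be recovered from the sixty eight bit figure:

$$
H = \sum_{i=1}^{6} \log_2 \lvert W_i \rvert = 68
\quad\Longrightarrow\quad
\left( \prod_{i=1}^{6} \lvert W_i \rvert \right)^{1/6} = 2^{68/6} \approx 2^{11.3} \approx 2.6 \times 10^{3}
$$

Two independent phrases give $2H = 136$ bits. Assuming a curve of about 256 bits, such as 25519, whose security level is about $2^{128}$, that is the margin that makes two phrases sufficient. Leaving out one word of about $11.3$ bits brings this down to roughly $125$ bits (slightly more, since an attacker does not know which word was dropped), and strengthening the password with a work factor of $2^{k}$ adds back $k$ bits against brute force.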
# Polkadot, Near, Substrate and Gitcoin
It has become painfully apparent that building a blockchain is a very large project.
Polkadot is a blockchain ecosystem, and Substrate a family of libraries for
constructing blockchains. It is a lot easier to refactor an existing
blockchain than to start entirely from scratch. [Near] is way ahead of me,
because it is not suffering from not invented here syndrome.
Polkadot is designed to make its ecosystem subordinate to the primary
blockchain, which I do not want - but it also connects its ecosystem to
bitcoin by De-Fi (or promises to do so, I don't know how well it works) so
accepting that subordination is a liquidity event. We can fix things so
that the tail will wag the dog once the tail gets big enough, as China licensed
from ARM, then formed a joint venture with ARM, then hijacked the joint
venture, once it felt it no longer needed to keep buying the latest ARM
intellectual property. Licensing was a fully subordinate relationship, the
joint venture was cooperation between unequal parties, and now ARM
China is a fully independent and competing technology, based on the old
ARM technology, but advancing it separately, independently, and in its
own direction. China forked the ARM architecture.
Accepting a fully subordinate relationship to get connected, and then
defecting on subordination when strong enough, is a sound strategy.
[Gitcoin]:https://gitcoin.co/
"Build and Fund the Open Web Together"
And talking about connections: [Gitcoin]
Gitcoin promises connection to money, and connection to a community of
open source developers. It is Polkadot's money funnel from VCs to
developers. The amount of cash in play is rather meagre, but it provides a
link to the real money, which is ICOs.
I suspect that its git hosting has been co-opted by the enemy, but that is
OK, provided our primary repo is not co-opted by the enemy.
# Installers
Looking at cmake, choco, deb, git, and rust crates, I see a development
environment being born, as people irregularly and ad hoc integrate with
each other's features.
Wine to run Windows 10 software under Linux is a bad idea, and
Windows Subsystem for Linux to run Linux software under Windows 10
is a much worse idea: it is the usual embrace and extend evil plot by
Microsoft against open source software, considerably less competently
executed than in the past.
## The standard gnu installer from source
```bash
./configure && make && make install
```
## The standard cmake installer from source
After long and arduous struggle with CMake, I concluded:
That it is the hardest path from MSVC to linux.
That no one uses it as their first choice to go from linux to windows, so it
is likely to be a hard journey in the other direction.
I also found that the CMake scripting language was one of those
accidental languages.
`CMakeLists.txt` was intended as a simple list of every file. And then one
feature after another was added, ad hoc, with no coherent plan and vision,
until it had so many features as to become Turing complete, but like
most accidental Turing complete languages, it is inconsistent and
unpredictable, and the code is entirely opaque, because the developers
never wanted their language to be used as a language.
CMake has gone down the wrong path. It should have started with a known
language whose first class types are strings, lists of strings, maps of
strings, maps of named maps of strings, and maps of maps. CMake should
then create a description of the build environment that it discovers, and a
description of the directory in which it was run, in the native types of that
language, and attempt to create a hello world program in that language
that invokes the compiler and the linker, which program the developer
modifies as needed.
That MSVC's embrace of cmake is one of those embrace and extend
weirdnesses, and will take you on a path to ever closer integration with
non free software, rather than off that path. Either that, or the people
integrating it were just responding to an ad hoc list of integration features.
That attempting a CMake build of the project using MSVC was a bad idea.
MinGW first, then MinGW integrated into vscode, in an all choco windows
environment without MSVC present.
```bat
choco install mingw pandoc git vscode gpg4win -y
```
Cmake does not really work all that well with the MSVC environment.\
If we eventually take the CMake path, it will be after wc and build on
MinGW, not before.
## vscode
Vscode has taken the correct path, for one always winds up with a full
language and full program running the build from source, and they went
with javascript. Javascript is an unworkable language that falls apart on
any large complex program, but one can use typescript which compiles to javascript.
A full language is needed to govern the compile from source of a large
complex program - and none of the ad hoc languages have proven very useful.
So, I now belatedly conclude the correct path is to build everything under vscode.
On the other hand, the central attribute of both the makefile language and
the cmake language is dependency scanning, and we shall have to see how
good vscode's toolset is at this big central job.
## The standard Linux installer
`*.deb`
`debhelper` and `dh-make` provide a somewhat user friendly tool for
making deb files.
`*.deb` files are commonly built from `*.dsc` files, which are also
available in the repository.
Which gives you the option, under debian, of building your entire
toolchain, something not possible in windows. It is half way to the
goal of building your own linux from scratch, without the elaborate
process where you type in a hundred commands, and if you
mistype a single one of them, everything goes to hell and you do
not know where in the process you went off the rails. But if you
want people to build from source, you probably want them to
develop, in which case git is better than `*.dsc` files.
The standard deb file builder integrated into debian is `git-buildpackage`.
Other systems use `*.rpm` packages, which are built by `git-buildpackage-rpm`.
But desktop integration is kind of random.
Under Mate and KDE Plasma, bitcoin implements run-on-login by generating a
`bitcoin.desktop` file and writing it into `~/.config/autostart`
It does not, however, place the `bitcoin.desktop` file in any of the
other expected places. It should be in `/usr/share/applications`.
Wasabi's desktop file, as shown by `cat /usr/share/applications/wassabee.desktop`, is:
```config
[Desktop Entry]
Type=Application
Name=Wasabi Wallet
StartupWMClass=Wasabi Wallet
GenericName=Bitcoin Wallet
Comment=Privacy focused Bitcoin wallet.
Icon=wassabee
Terminal=false
Exec=wassabee
Categories=Office;Finance;
Keywords=bitcoin;wallet;crypto;blockchain;wasabi;privacy;anon;awesome;qwe;asd;
```
To appear in the menus for all users, the file should be in
`/usr/share/applications`, with its `Categories=` entry set appropriately. Wasabi appears in the category `Office` on Mate.
But what about the menu for just one user?
The documentation says `~/.local/share/applications`, which I
do not entirely trust.
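A minimal sketch of the per-user route, assuming the freedesktop.org XDG Base Directory convention, under which per-user menu entries live in `$XDG_DATA_HOME/applications`, and `XDG_DATA_HOME` defaults to `~/.local/share` when unset. The function name `install_user_desktop_entry` and its arguments are hypothetical:

```bash
# Hypothetical helper: install a per-user menu entry.
# Per-user entries live in $XDG_DATA_HOME/applications, and
# XDG_DATA_HOME defaults to ~/.local/share when unset.
# Usage: install_user_desktop_entry <id> <name> <command>
install_user_desktop_entry() {
  local id=$1 name=$2 exec_cmd=$3
  local dir="${XDG_DATA_HOME:-$HOME/.local/share}/applications"
  mkdir -p "$dir"
  # Minimal entry; Categories= decides which menu it appears in.
  cat > "$dir/$id.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=$name
Exec=$exec_cmd
Terminal=false
Categories=Office;Finance;
EOF
  # Refresh the desktop database if desktop-file-utils is installed.
  if command -v update-desktop-database >/dev/null 2>&1; then
    update-desktop-database "$dir" || true
  fi
}
```

Run-on-login, as bitcoin does it under Mate and KDE Plasma, is the same kind of file placed in `~/.config/autostart` (more generally `${XDG_CONFIG_HOME:-$HOME/.config}/autostart`).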
### autotools
Has a poorly documented and unexplained pipeline to `*.deb` files.
Plausibly `cmake` also has a pipeline, but I have not found it.
autotools is linux standard, is said to have a straightforward pipeline
into making `*.deb` files, and everyone uses it, including most of your
libraries, but I hear it cursed as a complex mess, and no one wants to
get into it. They find the far from easy `cmake` easier. And `cmake`
runs on all systems, while autotools only runs on linux.
MSYS2, which runs on Windows, supports autotools. So, maybe it does run
on windows.
[autotools documentation]:https://thoughtbot.com/blog/the-magic-behind-configure-make-make-install
{target="_blank"}
Despite the complaints about autotools, there is [autotools documentation]
on the web that does not make it sound too bad.
I believe `cmake` has a straightforward pipeline into `*.deb` files,
but if it has, the autotools pipeline is far more common and widely used.
## The standard windows installer
Requires an `*.msi` file. If the install is something other than an msi
file, it is broken.
[Help Desk Geek reviews tools for creating `*.msi`]: https://helpdeskgeek.com/free-tools-review/4-tools-to-create-windows-installer-packages/
{target="_blank"}
[Help Desk Geek reviews tools for creating `*.msi`]
1. First and foremost, Nullsoft Scriptable Install System (NSIS): small, simple, and powerful.
1. Last and least Wix and Wax: it requires the biggest learning
curve. You can create some very complex installers with it, but you'll be coding quite a bit and using a command line often.\
And word on the internet is that complex installs created with
Wix and Wax create endless headaches and even if you get it
working in your unit test environment, it then breaks your
customer's machine irreversibly and no one can figure out why.
### [NSIS] Nullsoft Scriptable Install System
NSIS creates installers for windows, and is open source. Its installers are self-contained `*.exe` setup programs rather than `*.msi` packages.
[NSIS]:https://nsis.sourceforge.io/Download
{target="_blank"}
[NSIS Open Source repository]:https://sourceforge.net/projects/nsis/files/NSIS%203/3.08/RELEASE.html/view
{target="_blank"}
[NSIS Open Source repository]
NSIS is also available as an MSYS package.
People who know what they are doing seem to use this open
source install system, and they write nice installs with it.
Unlike `Wix`, I hear no whining that any attempt to use its power will
leave you buggered and hopeless.
When I most recently checked, the most recent release was thirty
five days previous, which is moderately impressive, given that their
release process is somewhat painful and arduous.
### Wix
`Wix` is suffering from bitrot. The wix toolset relies on a framework
that is no longer default installed on windows, and has not been for
a very very long time.
But no end of people say that sucky though it is, it is the standard
way to create install files.
[Hello World for Wix]:https://stackoverflow.com/questions/47970743/wix-installer-msi-not-installing-the-winform-app-created-with-visual-studio-2017/47972615#47972615
{target="_blank"}
[Hello World for Wix] is startlingly nontrivial. It does not by default
create a minimal useful install for you, so even if you get it working, it
still looks like it is broken.
[Common Design Flaws]:https://stackoverflow.com/questions/45840086/how-do-i-avoid-common-design-flaws-in-my-wix-msi-deployment-solution
{target="_blank"}
[Common Design Flaws] do not sound entirely like design flaws. It
sounds like it is easy to create `*.msi` files whose behaviour is
complex, unpredictable, unexpected, and apt to vary according to
circumstances on the target machine in incomprehensible and
unexpected ways. "Works great when we test it. Passes unit test."
[Some practical Wix advice]:https://stackoverflow.com/questions/6060281/windows-installer-and-the-creation-of-wix/12101548#12101548
{target="_blank"}
[Some practical Wix advice] advises that trying to do anything
complicated on Wix is hell on wheels, and will lead to unending
broken installs out in the field that fuck over the target systems.
While Wix in theory permits arbitrarily complex and powerful
installs, in practice, no one succeeds.
"certain things are still coded on a case by case basis. These ad hoc
solutions are implemented as 'custom actions' in Windows Installer,"
And custom actions that involve writing anything other than file
properties, die horribly.
Attempts to install Wix on Visual Studio repeatedly failed, and
sometimes trashed my Visual Studio installation.
After irreversibly destroying Visual Studio far too many times, I
attempted to install on a fresh clean virtual machine.
Clean install of Visual Studio on a vm worked, loaded my project,
compiled and built it almost as fast as my real machine. The
program it built ran fine and passed unit test. And then Visual
Studio crashed on close. Investigating the hung Visual Studio, it had
freed up almost all memory, and then just stopped running. Maybe
the problem is not Wix bitrot, but Visual Studio bitrot, since I did
not even get as far as trying to install Wix.
If the Wix installer is horribly broken, is it not likely that any install
created by Wix will be horribly broken?
The Wix Toolset requires .NET Framework 3.5 in order to install it
and use it, which is the cobbler's children going barefoot. You want
a banana, and have to install a banana tree, a monkey, and a jungle.
.NET Framework 3.5.1 can be installed with Control
Panel/Programs/Programs and Features.
You have to install the extension after the framework in that order,
or else everything breaks. Or maybe everything just breaks anyway
far too often and people develop superstitions about how to avoid
such cases.
## Choco
Choco, Chocolatey, is the Windows package manager system. It does not use `*.msi` as its packaging system. A chocolatey package is a `*.nupkg` (a NuGet package, built from a `*.nuspec` file) containing `chocolateyInstall.ps1`, `chocolateyUninstall.ps1`, and `chocolateyBeforeModify.ps1` (the latter script is run before upgrade or uninstall, and is to reverse stuff done by its accompanying
`chocolateyInstall.ps1`).
Interaction with stuff installed by `*.msi` is apt to be bad.
The community distribution redirects requests to particular servers,
which have to be maintained by particular people - which requires
an 8GB ram, 50GB disk Windows server. I could have `nginx` in the
cloud reverse proxying that to a physically local server over
wireguard, which solves the certificate problem, or I could use a
commercial service, which is cheap, but leaks identity all over the
place and is likely to be subject to hostile interdiction and state sponsored identity theft.
Getting on the `choco` list is largely automatic. Your package has to
install on their standard image, which is a deliberately obsolete
2012 windows server - and your install script may have to install
windows update packages. Your package is unlikely to successfully
install until you have first tested it on an imitation of their test
environment, which is a great deal of work and skill to set up.
Human curation exists, but is normally routine and superficial.
Installs, has license, done.
[whole lot more checks]:https://docs.chocolatey.org/en-us/information/security#chocolatey.org-packages
{target="_blank"}
[whole lot more rules]:https://docs.chocolatey.org/en-us/community-repository/moderation/package-validator/rules/
{target="_blank"}
Well, actually there are a [whole lot more checks], which enforce a [whole lot more rules], sixty eight rules and growing, but they are robotically checked and the outcome reported to a human. If the robot OKs it, it normally goes through automatically into the community distribution.
A Choco package is immutable. It can be superseded, but cannot
change. I could have the program check for a Zooko signature of its package file against a list, and look for indications of broad
approval, thus solving the identity problem and eating my own dogfood.
Choco packages would be very handy to automatically install my build environment.
### Cmake
It is now apparent that CMake is the new standard. Unix makefiles
(`./configure && make && make install`) have become unworkable, are dying
under bitrot, and no one is maintaining them any more.
Every ide has its own replacement for makefiles, most of them also broken
to a greater or lesser extent, and now ides are moving to CMake. If a folder has
a CMakeLists.txt file in its root, or the CMake build file, it is a project, and
the existing project files in the existing format are now obsolete, even though
they will continue to be used for a very long time.
`cmake` has a pipeline for building choco files.
[wxWidgets has instructions for building with Cmake]:https://docs.wxwidgets.org/trunk/overview_cmake.html
{target="_blank"}
[wxWidgets has instructions for building with Cmake]. My other
libraries do not, and require their own idiosyncratic build scripts,
and I doubt that I can do what the authors were disinclined to do.
Presumably I could fix this with `add_custom_target` and
`add_custom_command`, where the custom command is a bash script
that just invokes the author's scripts, but I just do not understand
the documentation for these commands, which documentation
presupposes knowledge of the incomprehensible domain specific language.
`Cmake` runs on both Windows and Linux, and is a replacement for autotools, which runs only on Linux.
Going with `cmake` means you have a defined standard cross platform development environment,
`vscode`, which is wholly open source, and a defined standard cross platform packaging system,
or rather four somewhat equivalent standard packaging systems, two for each platform.
`vscode` has the world's worst build system, but is now, like every ide, moving to cmake.
The build system used to be the most important part of an ide;
now it is, or soon will be, the quality of its integration with cmake.
Instead of
```bash
./configure
make
make install
```
We have
```bat
cmake ..
cmake --build .
cmake --install .
```
`cmake --install` installs from source, and `cpack` provides a pipeline
to Windows installers: `*.exe` setup programs through [NSIS], and
`*.msi` files through its WIX generator, which uses Wix and Wax. It also
has a pipeline to Choco, and, on linux, to `*.deb` and `*.rpm`.
No uninstall, which has to be hand written for your distribution.
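A common workaround for the missing uninstall relies on `install_manifest.txt`, which the install step writes into the build directory, listing every file it installed. A minimal sketch, with the hypothetical function name `cmake_uninstall`; it removes only the listed files, not any directories the install created, and needs the same privileges the install needed:

```bash
# Hypothetical helper: remove every file that `cmake --install` listed
# in install_manifest.txt. Run from the build directory, or pass the
# manifest path explicitly. Directories are left in place.
cmake_uninstall() {
  local manifest=${1:-install_manifest.txt}
  if [ ! -f "$manifest" ]; then
    echo "cmake_uninstall: $manifest not found; run cmake --install first" >&2
    return 1
  fi
  local f
  # The manifest may lack a trailing newline, so also accept a final partial line.
  while IFS= read -r f || [ -n "$f" ]; do
    if [ -n "$f" ]; then
      rm -f -- "$f"
    fi
  done < "$manifest"
}
```

The packaging side is driven from the same build directory, for example `cpack -G NSIS`, `cpack -G DEB` or `cpack -G RPM`, provided the corresponding tool is installed.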
`cmake` has the huge advantage that with certain compilers, far from
all of them, it integrates with the vscode ide, including a graphical
debugger that runs on both windows and linux. Which otherwise
you really do not have on linux.
It thus provides maximum cross platform portability. On the other
hand, all of my libraries rely on `./configure && make && make install`
on linux, and on visual studio on Windows. In my previous
encounter with `cmake`, I found mighty good reason for doing it that
way. The domain specific language of `CMakeLists.txt` is arcane,
unreadable, unwriteable, and subject to frequent, arbitrary,
inconsistent, and illogical ad hoc change. It inexplicably does
remarkably complicated things without obvious reason or purpose,
which strange complexity usually does things you do not want.
Glancing through their development blog, I keep seeing major
breaking changes being corrected by further major breaking
changes. Internals are undocumented, subject to surprising change,
and likely to change further, and you have to keep editing them,
without any clearly knowable boundary between what is internal
stuff that you should not need to look at and edit, and what is the
external language that you are supposed to use to define what
`cmake` is supposed to accomplish. It is not obvious how to tell `cmake` to do a certain thing, and looking at a `CMakeLists.txt` file, it is not at all obvious what `cmake` is going to do. And when the next
version comes out, it is probably going to do something different.
But allegedly the domain specific language of `./configure` has
grown a multitude of idiosyncrasies, making it even worse.
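`ccmake` is an interactive, terminal based tool (`cmake-gui` is its graphical
counterpart), but it edits the cache variables that a `CMakeLists.txt`
exposes, not the `CMakeLists.txt` itself, so it does not spare you the
mysterious, undocumented, arcane syntax of the nowhere explained or
documented domain specific language.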
# Library Package managers
Cmake is the new source code package manager, obsoleting all others.
# Multiprecision Arithmetic
I will need multiprecision arithmetic if I represent information in a base or
dictionary that is not a power of two.
[MPIR]:http://mpir.org/
{target="_blank"}
[GMP]:https://gmplib.org
{target="_blank"}
The best libraries are [GMP] for Linux and
[MPIR] for windows. These are reasonably
compatible, and generally only require very trivial changes to produce a Linux
version and a windows version. Boost attempts to make the changes invisible,
but adds needless complexity and overhead in doing so, and obstructs control.
MPIR has a Visual Studio repository on Github, and a separate Linux repository
on Github. GMP builds on a lot of obscure platforms, but is not really supported
on Windows.
For supporting Windows and Linux only, MPIR all the way is the way to go. For
compatibility with little used and obscure environments, you might want to
have your own custom thin layer that maps GMP integers and MPIR integers to
your integers, but that can wait till we have conquered the world.
My most immediate need for MPIR is the extended Euclidean algorithm
for modular multiplicative inverse, which it, of course, supports,
`mpz_gcdext`, greatest common divisor extended, but which is deeply
hidden in the [documentation](http://www.mpir.org/mpir-3.0.0.pdf).
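MPIR inherits GMP's interface here: `mpz_gcdext(g, s, t, a, b)` sets `g` to the greatest common divisor of `a` and `b`, and `s` and `t` to coefficients satisfying `a*s + b*t = g`, so when `g` is 1, `s` reduced modulo `b` is the inverse of `a` modulo `b`. GMP and MPIR also provide `mpz_invert`, which does that last step for you. To show what the function computes, a small-integer sketch of the same extended Euclidean algorithm in bash arithmetic; it only illustrates the algorithm, since bash integers are 64 bits, and the names `gcdext` and `modinv` are hypothetical:

```bash
# Extended Euclid: prints "g s t" with a*s + b*t = g = gcd(a, b).
# Small non-negative operands only; bash arithmetic is 64-bit signed.
gcdext() {
  local old_r=$1 r=$2 old_s=1 s=0 old_t=0 t=1 q tmp
  while (( r != 0 )); do
    q=$(( old_r / r ))
    tmp=$r; r=$(( old_r - q * r )); old_r=$tmp
    tmp=$s; s=$(( old_s - q * s )); old_s=$tmp
    tmp=$t; t=$(( old_t - q * t )); old_t=$tmp
  done
  echo "$old_r $old_s $old_t"
}

# Modular inverse of a modulo m, the job mpz_invert does in MPIR/GMP.
modinv() {
  local a=$1 m=$2 g s t
  read -r g s t <<< "$(gcdext "$a" "$m")"
  if (( g != 1 )); then
    echo "modinv: no inverse, gcd($a, $m) = $g" >&2
    return 1
  fi
  # s may be negative; bring it into the range [0, m).
  echo $(( (s % m + m) % m ))
}
```

For example, `modinv 17 3120` prints `2753`, because 17 × 2753 = 46801 = 15 × 3120 + 1.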
# [wxWidgets](./libraries/building_and_using_libraries.html#instructions-for-wxwidgets){target="_blank"}
# Secure compilation
I am currently using Visual Studio, the most powerful, convenient,
and useful code development system around. But increasingly owned by
enemies of increasing wickedness and diminishing competence. Also,
completely different, and not altogether compatible with, what is needed
to build code on linux.
I attempted to build wxWidgets using MinGW, which is open source, and failed.
Git is open source, and operated by good people, but its hash function is
insecure, and its signing system relies on Gpg, which is designed to be
part of the Web of Trust, which no longer exists, never entirely worked,
and was never designed for the use to which Git puts it.
After we get a signing and security system, which will not be for a while,
we should create a fork of Git that actually is secure.
[Build environment for Git for Windows]:https://github.com/git-for-windows/build-extra
{target="_blank"}
[Build environment for Git for Windows] is a package that builds a package
manager that installs packages that can build Git on windows. But it is a
package manager, not a pile of compilers and build tools, a package
manager that merely installs precompiled files from who knows where,
compiled by who knows who?
What we actually need is a full development environment that can build a
full development environment, and you can have multiple versions of the
tools on the same machine, and can select one or more, or all, of the newly
modified tools for your build environment, or build a full install package
from source, including compilers, make utilities, ide, and git. When a
source file for the ide or one of its components changes, the default full
build action being to build the new component, and switch to it, but the
release components are not overwritten, and you can switch back until you
explicitly overwrite the release version by running a newly built install
package or manually copy a newly built component over the release
component.
There should be a unit test, of course, and should unit test fail, the default
action should be to switch back to the release version, and open up the
source code hinted by the unit test failure.
It should be a development environment that provides special case
handling for development of the development environment.
All this is very far indeed from what [Build environment for Git for Windows]
provides, and creating it would be an enormous project, but the
only way to prevent toolchain attacks is to make toolchain development
readily available to everyone.
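The switch-to-new, fall-back-on-failure behaviour described above can be sketched as a small registry that only tracks which build of each tool is active; it does no building. `ToolRegistry`, `install_candidate`, `after_unit_test`, and `promote` are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model: each tool has a release build that is never overwritten
// implicitly, plus optionally a newly built candidate we can switch to and away from.
struct ToolBuilds {
    std::string release;
    std::optional<std::string> candidate;
    bool using_candidate = false;
};

class ToolRegistry {
public:
    void add_release(const std::string& tool, const std::string& build) {
        tools_[tool].release = build;
    }
    // Default action after a component is rebuilt: switch to the new build.
    // Throws std::out_of_range for a tool with no release build.
    void install_candidate(const std::string& tool, const std::string& build) {
        ToolBuilds& t = tools_.at(tool);
        t.candidate = build;
        t.using_candidate = true;
    }
    // A failed unit test switches back to the release build.
    void after_unit_test(const std::string& tool, bool passed) {
        if (!passed) tools_.at(tool).using_candidate = false;
    }
    // Only an explicit promotion overwrites the release build.
    void promote(const std::string& tool) {
        ToolBuilds& t = tools_.at(tool);
        if (t.candidate) {
            t.release = *t.candidate;
            t.candidate.reset();
        }
        t.using_candidate = false;
    }
    const std::string& active(const std::string& tool) const {
        const ToolBuilds& t = tools_.at(tool);
        return t.using_candidate ? *t.candidate : t.release;
    }

private:
    std::map<std::string, ToolBuilds> tools_;
};
```

The point of the design is that a bad rebuild of the toolchain can never strand you: the release build stays on disk until someone deliberately promotes its replacement.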
# Networking
## notbit client
A Bitmessage client written in C. Designed to run on a Linux mail server
and interface Bitmessage to mail. Has no UI; intended to be used with the Linux mail UI.
Unfortunately, setting up a Linux mail server is a pain in the ass. Needs the Zooko UI.
But its library contains everything you need to share data around a group of people, many of them behind NATs.
Does not implement NAT penetration. Participants behind a NAT are second class unless they implement port forwarding, but participants with unstable IPs are not second class.
## Game Networking Sockets
[Game Networking Sockets](https://github.com/ValveSoftware/GameNetworkingSockets)
A reliable UDP library with congestion control, which has had vastly more development work done on it than any other reliable UDP networking library, but which is largely used to work with Steam gaming and Steam's closed source code. It has no end of hooks to closed source code built into it, but works fine without those hooks.
Written in C++. Architecture overly specific and married to Steam. Would
have to be married to Tokio to have massive concurrency. But you don't
need to support hundreds of clients right away.
Well, perhaps I do, because in the face of DDoS attack, you need to keep
a lot of long lived inactive connections around for a long time, any of
which could receive a packet at any time. I need to look at the
GameNetworkingSockets code and see how it listens on lots and lots of
sockets. If it uses [overlapped IO], then it is golden. Get it up first, and put it inside a service later.
[Overlapped IO]:design/server.html#the-select-problem
{target="_blank"}
The nearest equivalent Rust application gave up on congestion control, having programmed themselves into a blind alley.
## Tokio
Tokio is a Rust framework for writing highly efficient highly scalable
services. Writing networking for a service with large numbers of clients is
very different between Windows and Linux, and I expect Tokio to take care
of the differences.
There really is not any good C or C++ environment for writing services
except Wt, which is completely specialized for the case of writing a web
service whose client is the browser, and which runs only on Linux.
## wxWidgets
wxWidgets has basic networking capability built in and integrated with its
event loop, but it is a bit basic, and is designed for a gui app, not for a
server though probably more than adequate for initial release. It only
supports http, but not https and websockets.
[LibSourcey](https://sourcey.com/libsourcey) is a far more powerful
networking library, which supports https and websockets, and is designed to
interoperate with nginx and node.js. But integrating it with wxWidgets is
likely to be nontrivial.
wxWidgets sample code for sockets is in %WXWIN%/samples/sockets. There is a
[recently updated version on github]. Their example code supports TCP and
UDP. But some people argue that the samples' approach is insufficiently responsive:
you really need a second thread that damned well sits on the socket, rather
than polling it. And that second thread cannot use wxSockets.
[recently updated version on github]:https://github.com/wxWidgets/wxWidgets/tree/master/samples/sockets
Programming sockets and networking in C is a mess. The [much praised guide
to sockets](https://beej.us/guide/bgnet/html/single/bgnet.html) goes on for
pages and pages describing a simple example client server. The trouble is that
C, and old-style Cish C++, expose all the dangly bits. The [Qt client server
example](https://stackoverflow.com/questions/5773390/c-network-programming),
on the other hand, is elegant, short, and self explanatory.
The Code Project has [example code written in C++](https://www.codeproject.com/Articles/13071/Programming-Windows-TCP-Sockets-in-C-for-the-Begin), but it is still mighty intimidating compared to the Qt client server example. I have yet to look at the wxWidgets client server examples, but looking for wxWidgets networking code has me worried that it is a casual afterthought, not adequately supported or adequately used.
ZeroMQ is Linux, C, and Cish C++.
Boost Asio is highly praised, but I tried it, and concluded its architecture
is broken, trying to make simplicity and elegance where they cannot be made,
resulting in leaky abstractions which leak incomprehensible complexity the
moment you stray off the beaten path. I feel they have lost control of their
design, and are just throwing crap at it trying to make something that
cannot work, work. I similarly found that the Boost time libraries failed, leaking
complexity that they tried to hide, with the hiding merely adding complexity.
[cpp-httplib](https://github.com/yhirose/cpp-httplib) is wonderful in its
elegance, simplicity, and ease of integration. You just include a single
header. Unfortunately, it is strictly http/https, and we need something that
can deal with the inherently messy lower levels.
[Poco](http://pocoproject.org/) does everything, and is C++, but hey, let us first see how far we can get with wxWidgets.
Further, the main reason for doing https is integration with the existing
browser web ecosystem, whose security is fundamentally broken, due to the
state's capacity to seize names, and the capacity of lots of entities to
intercept SSL. It might well be easier to fork Opera or embed Chromium. I
notice that Chromium has features supporting payment built into it, a bunch
of PaymentMethod\*\*\*\*\*Event events.
The best open source browser, and best privacy browser, is Opera, in that it comes from an entity less evil than Google.
[Opera](https://bit.ly/2UpSTFy) needs to be configured with [a bunch of privacy add-ons](https://gab.com/PatriotKracker80/posts/c3kvL3pBbE54NEFaRGVhK1ZiWCsxZz09): [HTTPS Everywhere Add-on](https://bit.ly/2ODbPeE),
[uBlock](https://bit.ly/2nUJLqd), [DisconnectMe](https://bit.ly/2HXEEks), [Privacy-Badger](https://bit.ly/2K5d7R1), [AdBlock Plus](https://bit.ly/2U81ddo), and [AdBlock for YouTube](https://bit.ly/2YBzqRh); that is, HTTPS Everywhere, two tracker blockers, and three ad blockers.
It would be great if we could make our software another addon, possibly chatting by websocket to the wallet.
The way it would work would be to add another protocol to the browser:
`ro://name1.name2.name3/directory/directory/endpoint`. When you connect to such
an endpoint, your wallet, possibly a wallet with no global name, connects to
the named wallet, and gets an IP, a port, a virtual server name, a cookie
unique for your wallet, and the hash of the valid SSL certificate for that
name, and then the browser makes a connection to that server, ignoring
the CA system and the DNS system. The name could be a DNS name and the
certificate a CA certificate, in which case the connection looks to the
server like any other, except for the cookie, which enables it to send
messages, typically a payment request, to the wallet.
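A hedged sketch of the client-side first step: splitting an `ro://` locator into its name components and path, plus a record type for what the named wallet would send back. `RoLocator`, `RoEndpointRecord`, `parse_ro`, and every field name are hypothetical; nothing here is an existing browser or wallet API, and the 32-byte certificate hash size is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parse of ro://name1.name2.name3/directory/directory/endpoint
struct RoLocator {
    std::vector<std::string> names;  // e.g. {"name1", "name2", "name3"}
    std::string path;                // e.g. "/directory/directory/endpoint"
};

// Hypothetical reply from the named wallet: where to connect, and which
// certificate to accept, bypassing both DNS and the CA system.
struct RoEndpointRecord {
    std::string ip;
    uint16_t port;
    std::string virtual_server_name;
    std::string cookie;                 // unique to the requesting wallet
    std::array<uint8_t, 32> cert_hash;  // assumed 256-bit certificate hash
};

std::optional<RoLocator> parse_ro(const std::string& url) {
    const std::string scheme = "ro://";
    if (url.compare(0, scheme.size(), scheme) != 0) return std::nullopt;
    const std::string rest = url.substr(scheme.size());
    const std::size_t slash = rest.find('/');
    const std::string host = rest.substr(0, slash);
    RoLocator loc;
    loc.path = (slash == std::string::npos) ? "/" : rest.substr(slash);
    std::size_t start = 0;
    for (;;) {
        const std::size_t dot = host.find('.', start);
        std::string label = host.substr(start, dot - start);
        if (label.empty()) return std::nullopt;  // rejects "", "a..b", "a."
        loc.names.push_back(std::move(label));
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    return loc;
}
```

The wallet would then look up `names` on its own network, obtain an `RoEndpointRecord`, and hand the browser the IP, port, and certificate hash to pin.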
# zk-snarks
The most advanced zk-snark technology, and the most useful for blockchains,
is Polygon's, which claims to have finally found the holy grail:
actually useful generation and verification of proofs of verification.
That is, Bob can not only verify that Ann's information is what she says
it is without knowing that information, but Carol can verify that Bob
verified, and Dave can verify that Carol verified it.
Which gives us scaling. Bob can verify that several people's
transactions are valid, Carol can verify several Bobs, and Dave
can verify several Carols.
I have seen no end of claims that zk-snark systems can do so and so, when,
though they can in principle do so and so, actually getting them to do
so and so is very hard and they have not quite managed to get it
working, or they have actually gotten it to work but there are a bunch of complicated gotchas that make it impractical, or unwise, or not very
useful to do so and so.
But I have also seen a great deal of real progress in solving these
problems, albeit the progress tends to be overpromised and underdelivered,
but for all that, the progress is real and substantial.
[Aurora]:https://eprint.iacr.org/2018/828.pdf
{target="_blank"}
Supposedly there is a language, R1CS, such that you can express a
program that gives a true false answer, such that [Aurora] can execute
the program and generate a prover and a verifier.
[starkware]:https://iacr.org/submit/files/slides/2021/rwc/rwc2021/1005/slides.pdf
{target="_blank"}
According to [starkware], they have the fastest proving time, but their
proofs are rather large, 138 KiB. Groth16 SNARKs have the most compact
proofs.
I do not yet see it as a library that I could actually use, but
more as a proof of principle that someone could build such a library.
To be actually useful, a zk-snark system needs to be a compiler that
compiles a program written in what Starkware calls R1CS, and other
people are calling script, and generates two programs, a prover and a
verifier.
The prover operates on two blobs, the public blob and the private blob, and
produces a boolean result, true or false, pass or fail, and a proof that it
did so.
The proof is approximately constant size, regardless of how much
computation is required and regardless of how large the private blob was,
but producing it takes a very long time.
The verifier operates on the public blob and the proof, takes a short and
approximately constant time to do so, regardless of how big the
computation was and regardless of how big the private data was, and
determines, with a one in $2^{126}$ likelihood of error, what result the prover got.
But at present I get the impression that neither script nor R1CS have any
real existence, though I have seen a script language that operates on a
stack, and, though it has no variables, can dupe any item on the stack to
the top of the stack. It seems to have only been ever used to generate one
prover and one verifier, because actually creating the prover and verifier
still required some coding by hand. Also lacked certain control structures.
At present, people seem to be writing the prover and the verifier by hand, a
very difficult operation with a very high likelihood of bugs. The prover and
the verifier do very simple tasks like proving the encoded inputs to a
transaction are greater than or equal to the encoded outputs and that no
numeric underflow or overflow occurred.
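For orientation, here is the statement such a hand-written prover and verifier pair attests to, written as an ordinary check in the clear (not zero knowledge): the inputs sum to at least the outputs, and no sum overflowed. `checked_add` and `transaction_balances` are hypothetical names, and representing amounts as plain unsigned 64-bit integers is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Adds b to acc, reporting overflow instead of silently wrapping around.
bool checked_add(uint64_t& acc, uint64_t b) {
    if (b > std::numeric_limits<uint64_t>::max() - acc) return false;
    acc += b;
    return true;
}

// The predicate a transaction-validity proof would attest to, evaluated openly:
// sum(inputs) >= sum(outputs), and neither sum overflowed 64 bits.
bool transaction_balances(const std::vector<uint64_t>& inputs,
                          const std::vector<uint64_t>& outputs) {
    uint64_t in_sum = 0, out_sum = 0;
    for (uint64_t v : inputs)
        if (!checked_add(in_sum, v)) return false;
    for (uint64_t v : outputs)
        if (!checked_add(out_sum, v)) return false;
    return in_sum >= out_sum;
}
```

The hard part the text describes is not this logic, which is trivial in the clear, but proving that it returned true without revealing the amounts.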
Another problem is that we would really like the public data to be the root
hash of a Merkle tree, and no one seems to have a script language that
contains a useful hash function. Starkware is built out of hash functions,
but last time I looked, you could not call a hash function from R1CS.
We need a script language that can not merely add and subtract, but can also
do hashes and elliptic point operations. zk-stark systems are built out of
hashes and elliptic point operations, but it seems to be uphill trying to
generate proofs that prove something about the results of hashes and
elliptic point operations, making it very difficult to produce a proof that a
pile of proofs in the pre-image of a merkle tree have been verified. I
suspect that a prover might take a very very long time to produce such a
proof.
The proofs are succinct, in that you can prove something about a gigantic
pile of data and the size of the proof and the time taken to verify scarcely
grow: about 128 KiB, from the smallest proofs that anyone would care about to
utterly gigantic proofs. But proof generation is not all that fast, and grows
with the matter to be proven, so to be useful for utterly gigantic proofs,
you would need to be able to distribute proof generation over an enormous
multitude of untrusting shards. Which you can obviously do by proving a
verification. Not sure how long it takes to produce a proof that a large
number of proofs were verified.
What you want is to be able to prove that a final hash is the root of an
enormous Merkle tree, some generalization of a Merkle-Patricia tree,
representing an immutable append-only data structure consisting of a
sequence of piles of transactions and the state generated by these
transactions; that this represents a valid branch of a chain of signatures; and that the
final state is correctly derived by applying the batch of transactions to
the previous state.
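A minimal sketch of computing such a root over a list of leaves. To stay self-contained it uses 64-bit FNV-1a as a stand-in hash; FNV-1a is not cryptographic, and a real implementation would use a cryptographic hash. Prefixing leaves and interior nodes differently, and promoting an unpaired node unchanged to the next level, are conventions assumed here; other Merkle designs differ.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// 64-bit FNV-1a. Placeholder only: NOT collision resistant.
uint64_t fnv1a(const std::string& bytes) {
    uint64_t h = 14695981039346656037ull;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Distinct prefixes keep leaf hashes and interior node hashes apart.
uint64_t leaf_hash(const std::string& data) { return fnv1a("L" + data); }

uint64_t node_hash(uint64_t left, uint64_t right) {
    std::string buf = "N";
    for (uint64_t v : {left, right})
        for (int i = 0; i < 8; ++i) buf.push_back(static_cast<char>(v >> (8 * i)));
    return fnv1a(buf);
}

uint64_t merkle_root(const std::vector<std::string>& leaves) {
    if (leaves.empty()) return fnv1a("");  // arbitrary convention for an empty tree
    std::vector<uint64_t> level;
    for (const auto& leaf : leaves) level.push_back(leaf_hash(leaf));
    while (level.size() > 1) {
        std::vector<uint64_t> next;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2)
            next.push_back(node_hash(level[i], level[i + 1]));
        if (level.size() % 2 == 1) next.push_back(level.back());  // promote unpaired node
        level = std::move(next);
    }
    return level[0];
}
```

Changing any leaf changes the root, which is what lets a single final hash stand for the whole append-only structure.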
And then you want to do this for states so enormous, and piles of
transactions so enormous, that no one person has all of them.
And then you still have the problem of resolving forks.
You would like to have a blockchain of blockchains of blockchains, such
that your state, and your transactions, are divided into a product of
substates, with consensus on each substate advancing a bit ahead of the
consensus on the combination of several substates, so that transactions
within a substate finalize fast, but transactions between substates take
longer (because the number of forks of the product state is the product of
the numbers of forks of the substates).
Each of the substates very quickly comes up with a proof that a transaction
within the substate is valid, and quickly comes up with consensus as to
which fork everyone is on. But a transaction between
substates is finalized quickly in the paying substate, and quickly affects
the paying substate, but does not get included in the state
that is a product of the receiving and paying substates for a while, does not
get proven valid in the product state for a while, and does not get
included in the receiving substate until a bit after it is included in the
product state, whereupon it is in due course quickly proven to be
a valid addition of value to the receiving substate. So that the consensus
problem remains manageable, we need insulation and delay between the
states, so that the product state has its own pile of state, representing the
delay between a transaction affecting the payer factor state and the
transaction affecting the payee factor state. A transaction has no immediate
effect. The payer mutable substate changes in a way reflecting the
transaction block at the next block boundary. That change then has an
effect on the product mutable state at a subsequent product state block
boundary, changing the shares possessed by the substate.
Which then has an effect on the payee mutable substate at its next
block boundary, when the payee substate links back to the previous
product state.
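The staged propagation described above can be modelled, very roughly, as three states that each move value only at their own block boundaries. This is a hypothetical toy (`Substate`, `ProductState`, `payer_block`, `product_block`, and `payee_block` are invented names) with one payer and one payee; it ignores consensus, forks, and proofs, and only shows that a transfer takes effect in three separate steps while conserving value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Substate {
    int64_t balance = 0;
    std::vector<int64_t> submitted;  // outgoing transfers not yet in a payer block
    std::vector<int64_t> exported;   // debited here, not yet absorbed by the product state
};

struct ProductState {
    std::vector<int64_t> in_transit;  // debited from payer, not yet credited to payee
};

// A transaction has no immediate effect: it is only queued.
void submit_transfer(Substate& payer, int64_t amount) { payer.submitted.push_back(amount); }

// Payer's block boundary: queued transfers are debited and exported.
void payer_block(Substate& payer) {
    for (int64_t a : payer.submitted) {
        payer.balance -= a;
        payer.exported.push_back(a);
    }
    payer.submitted.clear();
}

// Product state's block boundary: exported value becomes part of the product state.
void product_block(ProductState& product, Substate& payer) {
    product.in_transit.insert(product.in_transit.end(), payer.exported.begin(), payer.exported.end());
    payer.exported.clear();
}

// Payee's block boundary: links back to the product state and credits value in transit.
void payee_block(Substate& payee, ProductState& product) {
    for (int64_t a : product.in_transit) payee.balance += a;
    product.in_transit.clear();
}

// Total value, counting value held between states.
int64_t total_value(const Substate& payer, const ProductState& product, const Substate& payee) {
    int64_t sum = payer.balance + payee.balance;
    for (int64_t a : payer.exported) sum += a;
    for (int64_t a : product.in_transit) sum += a;
    return sum;
}
```

The in-transit pile in `ProductState` plays the role of the "insulation and delay" between the factor states: each state can finalize its own blocks without waiting on the others.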
# wxSqlite3
wxSqlite3 integrates a free open source third party encryption library that appears
to use libsodium encryption algorithms into SQLite to provide encrypted
databases, and integrates SQLite3 databases into one of the wxWidgets
tools, but not, however, the one that I actually want, wxGrid.
More layers lead to more attack surface, so it would be better to use
wxSqlite as a model for the integration, rather than using it directly, and
then use a fork of the third party library, rather than using it directly.
# Safe maths
[Safeint]:https://github.com/dcleblanc/SafeInt
{target="_blank"}
We could implement transaction outputs and inputs as a fixed amount of
fungible tokens, limited to $2^{64}-1$ tokens, using [Safeint]. That will be
future proof for a long time, but not forever.
Indeed, anything that does not use Zksnarks is not future proof for the
indefinite future.
Or we could implement decimal floating point with unlimited exponents
and mantissa implemented on top of [MPIR].
Or we could go ahead with the canonical representation being unlimited
decimal exponent and unlimited mantissa, but the wallet initially only
generates, and only can handle, transactions that can be represented by [Safeint], and always converts the mantissa plus decimal exponent to and
from a SafeInt.
If we rely on SafeInt, and our smallest unit is the microrho, that is room for
eighteen trillion rho. We can start actually using the unlimited precision of
the exponent and the mantissa in times to come. That is not urgent: we need merely
architect it into the canonical format.
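The eighteen trillion figure follows from dividing the largest unsigned 64-bit quantity by the million microrho in a rho:

$$\frac{2^{64}-1}{10^{6}}\ \text{rho} = \frac{18\,446\,744\,073\,709\,551\,615}{10^{6}}\ \text{rho} \approx 1.84 \times 10^{13}\ \text{rho}$$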
From the point of view of the end user, this will merely be an upgrade that
allows nanorho, picorho, femtorho, attorho, zeptorho, and yoctorho, and allows a decimal point in yoctorho quantities. And then we go to a new unit, the jim, with one thousand yottajim equal to one yoctorho, a billion yottajim equal to one attorho, and a trillion exajim equal to one zeptorho.
To go all the way to two-byte exponents, for testing purposes, we will
need some additional new units after the jim. (And we should impose a
minimum unit size of $10^{-195}$ rho or $10^{-6}$ rho, thereby ensuring
that transaction size is bounded while allowing compatibility for future expansion.)
Except in test and development code, any attempt to form a transaction
involving quantities with exponents less than $1000^{-2}$ will cause a
gracefully handled exception, and in all code any attempt to display
or perform calculations on transaction inputs and outputs for which no
display units exist will cause an ungracefully handled exception.
In the first release configuration parameters, the lowest allowed exponent
will be $1000^{-2}$, corresponding to microrho, and the highest allowed
exponent $1000^4$, corresponding to terarho, and machines will be
programmed to vote "incapable" and "no" on any proposal to change those
parameters. However they will correctly handle transactions beyond those
limits provided that when quantities are expressed in the smallest unit of
any of the inputs and outputs, the sum of all the inputs and of all the
outputs remains below $2^{64}$. To ensure that all releases are future
compatible, the blockchain should have some exajim transactions, and
unspent transaction outputs but the peers should refuse to form any more
of them. The documentation will say that arbitrarily small and large new
transaction outputs used to be allowed, but are currently not allowed, to
reduce the user interface attack surface that needs to be security checked
and to limit blockchain bloat, and since there is unlikely to be demand for
this, this will probably not be fixed for a very long time.
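The rule above, that with every input and output expressed in the smallest unit appearing in the transaction the totals stay below $2^{64}$, can be checked as follows. This is a sketch under assumptions: a quantity is represented here as a mantissa times $1000^{e}$ rho (hypothetical `Quantity` struct, with $e = -2$ for microrho and $e = 4$ for terarho, matching the powers of a thousand above); the real canonical format may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical representation: value = mantissa * 1000^exponent rho.
struct Quantity {
    uint64_t mantissa;
    int exponent;
};

// mantissa * 1000^steps, or nullopt if that does not fit in 64 bits.
std::optional<uint64_t> rescale(uint64_t mantissa, int steps) {
    for (int i = 0; i < steps; ++i) {
        if (mantissa > std::numeric_limits<uint64_t>::max() / 1000) return std::nullopt;
        mantissa *= 1000;
    }
    return mantissa;
}

// Sum of the quantities expressed in units of 1000^smallest rho,
// or nullopt if any rescaled value or the running total exceeds 2^64 - 1.
std::optional<uint64_t> sum_in_unit(const std::vector<Quantity>& qs, int smallest) {
    uint64_t sum = 0;
    for (const Quantity& q : qs) {
        std::optional<uint64_t> v = rescale(q.mantissa, q.exponent - smallest);
        if (!v || *v > std::numeric_limits<uint64_t>::max() - sum) return std::nullopt;
        sum += *v;
    }
    return sum;
}

// True if, in the smallest unit used by any input or output, both the inputs'
// total and the outputs' total fit below 2^64.
bool within_native_limits(const std::vector<Quantity>& inputs, const std::vector<Quantity>& outputs) {
    int smallest = std::numeric_limits<int>::max();
    for (const auto* side : {&inputs, &outputs})
        for (const Quantity& q : *side) smallest = std::min(smallest, q.exponent);
    return sum_in_unit(inputs, smallest) && sum_in_unit(outputs, smallest);
}
```

For instance, one terarho expressed in microrho is $10^{18}$, which fits, while twenty terarho is $2 \times 10^{19}$, which exceeds $2^{64} \approx 1.8 \times 10^{19}$.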
Or perhaps it would be less work to support humongous transactions from
the beginning, subject to some mighty large arbitrary limit to prevent
denial of service attack, and eventually to implement native integer
handling of normal sized transactions as an optimization, for transactions where all quantities fit within machine sized words, and rescaled intermediate outputs will be less than $64 - \lceil \log_2 n \rceil$ bits, where $n$ is the number of inputs and outputs.
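The bit budget there comes from the fact that adding $n$ numbers, each below $2^{b}$, gives a total below $n \cdot 2^{b} \le 2^{b + \lceil \log_2 n \rceil}$; so with $b = 64 - \lceil \log_2 n \rceil$ no sum can overflow 64 bits. A sketch of that fast-path test, assuming the quantities have already been rescaled to a common unit; `ceil_log2` and `fits_machine_words` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ceil(log2(n)) for n >= 1.
unsigned ceil_log2(uint64_t n) {
    unsigned bits = 0;
    while (bits < 64 && (uint64_t{1} << bits) < n) ++bits;
    return bits;
}

// True if every quantity fits within 64 - ceil(log2(n)) bits, where n is the
// number of inputs and outputs; then no sum of them can overflow 64 bits.
bool fits_machine_words(const std::vector<uint64_t>& quantities) {
    if (quantities.empty()) return true;
    const unsigned budget = 64 - ceil_log2(quantities.size());
    if (budget == 64) return true;  // a single quantity always fits
    for (uint64_t q : quantities)
        if ((q >> budget) != 0) return false;
    return true;
}
```

Transactions that fail this test would fall back to the slower unlimited-precision path rather than being rejected.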
Which leads me to digress on how we are going to handle protocol updates:
## Handling protocol updates
1. Distribute software capable of handling the update.
1. A proposed protocol update transaction is placed on the blockchain.
1. Peers indicate capability to handle the protocol update. Or ignore it,
or indicate that they cannot. If a significant number of peers
indicate capability, peers that lack capability push their owners for
an update.
1. A proposal to start emitting data that can only be handled by more
   recent peers is placed on the blockchain.
1. If a significant number of peers vote yes, older peers push more
vigorously for an update.
1. If a substantial supermajority votes yes by a date specified in the
proposal, then they start emitting data in the new format on a date
shortly afterwards. If no supermajority by the due date, the
proposal is dead.
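The steps above can be sketched as a small state machine for a single proposal. The thresholds (a "significant number" taken as 20% of peers, a "substantial supermajority" as 75%), the seven-day grace period, and all names are hypothetical placeholders, since the text does not fix them.

```cpp
#include <cassert>
#include <cstdint>

enum class Stage { Voting, Activated, Dead };

// Hypothetical parameters; the procedure above does not specify them.
constexpr double kSignificant = 0.20;    // share of peers that counts as "significant"
constexpr double kSupermajority = 0.75;  // share of yes votes needed by the due date
constexpr int64_t kGraceDays = 7;        // "a date shortly afterwards"

struct UpdateProposal {
    int64_t due_day = 0;  // deadline stated in the proposal
    Stage stage = Stage::Voting;

    // Apply the current share of yes votes as of `today`.
    void tally(int64_t today, double yes_share) {
        if (stage != Stage::Voting) return;
        if (today <= due_day && yes_share >= kSupermajority)
            stage = Stage::Activated;
        else if (today > due_day)
            stage = Stage::Dead;  // no supermajority by the due date
    }
    // Every peer switches on the same day, a fixed interval after the due date.
    bool emit_new_format(int64_t today) const {
        return stage == Stage::Activated && today >= due_day + kGraceDays;
    }
};

// How hard a peer that cannot handle the update pushes its owner:
// 0 = not at all, 1 = push, 2 = push more vigorously.
int nag_level(bool self_capable, double capable_share, double yes_share) {
    if (self_capable) return 0;
    if (yes_share >= kSignificant) return 2;
    if (capable_share >= kSignificant) return 1;
    return 0;
}
```

Tying the switch-over to the due date rather than to the moment a peer observes the supermajority means all honest peers start emitting the new format on the same day.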
# [Zlib compression libraries.](./libraries/zlib.html)
Built it: easy to use, easy to build, easy to link to. Useful for large amounts of text. Provides, but does not use, CRC32.
[Cap\'n Proto](./libraries/capnproto.html)
[Crypto libraries](./libraries/crypto_library.html)
[Memory Safety](./libraries/memory_safety.html).
[C++ Automatic Memory Management](./libraries/cpp_automatic_memory_management.html)
[C++ Multithreading](./libraries/cpp_multithreading.html)
[Catch testing library](https://github.com/catchorg/Catch2)
[Boost](https://github.com/boostorg/boost)
------------------------------------------------------------------------
## Boost
My experience with Boost is that it is no damned good: They have an over
elaborate pile of stuff on top of the underlying abstractions, which pile has high runtime cost, and specializes the underlying stuff in ways that only
work with boost example programs and are not easily generalized to do what
one actually wishes done.
Their abstractions leak.
[Boost high precision arithmetic `gmp_int`]:https://gmplib.org/
[Boost high precision arithmetic `gmp_int`]: a messy pile built on top of
GMP. Its primary benefit is that it makes `gmp` look like `mpir`. It is easier to use [MPIR] directly.
The major benefit of boost `gmp` is that it runs on some machines and
operating systems that `mpir` does not, and is for the most part source code
compatible with `mpir`.
A major difference is that boost `gmp` uses long integers, which on sixty
four bit Windows are `int32_t`, whereas `mpir` uses `mpir_ui` and `mpir_si`, which
on sixty four bit Windows are `uint64_t` and `int64_t`. This is apt to induce no
end of major porting issues between operating systems.
Boost `gmp` code running on Windows is apt to produce radically different
results from the same boost `gmp` code running on Linux. `long int` is just not
portable, and should never be used. This kind of issue is absolutely typical
of boost.
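Where an interface insists on `long`, one way to keep the problem visible is to convert through a checked helper, so that a value that fits in `long` on 64-bit Linux (LP64, where `long` is 64 bits) but not on 64-bit Windows (LLP64, where it is 32 bits) fails loudly instead of silently truncating. A minimal sketch; `to_long_checked` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrow a uint64_t into long, throwing rather than truncating when the value
// exceeds what long can hold on this platform.
long to_long_checked(uint64_t v) {
    if (v > static_cast<uint64_t>(std::numeric_limits<long>::max()))
        throw std::overflow_error("value does not fit in long on this platform");
    return static_cast<long>(v);
}
```

Better still, as above, is never to let `long` into the code at all, and to use `int64_t` and `uint64_t` throughout.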
In addition to the portability issue, it is also a typical example of boost
abstractions denying you access to the full capability of the thing being
abstracted away. It is silly to have a thirty two bit interface between sixty
four bit hardware and unlimited arithmetic precision software.
------------------------------------------------------------------------
## Database
The blockchain is a massively distributed database built on top of a pile of
single machine, single disk, databases communicating over the network. If you
want a single machine, single disk, database, go with SQLite, which in WAL
mode implements synch interaction on top of hidden asynch.
[SQLite](https://www.Sqlite.org/src/doc/trunk/README.md) has its own way of doing things, which does not play nicely with GitHub.
The efficient and simple way to handle interaction with the network is via
callbacks rather than multithreading, but you usually need to handle
databases, and under the hood, all databases are multithreaded and blocking.
If they implement callbacks, it is usually on top of a multithreaded layer,
and the abstraction is apt to leak, apt to result in unexpected blocking on a
supposedly asynchronous callback.
SQLite recommends at most one thread that writes to the database, and
preferably only one thread that interacts with the database.
## The Invisible Internet Project (I2P)
[Comes](https://geti2p.net/en/) with an I2P webserver, and the full api for streaming stuff. These
appear as local ports on your system. They are not tcp ports, but higher
level protocols, *and* UDP. (Sort of UDP - obviously you have to create a
durable tunnel, and one end is the server, the other the client.)
Inconveniently, written in Java.
## Internet Protocol
[QUIC] UDP with flow control and reliability. Intimately married to http/2,
https/2, and google chrome. Cannot call as library, have to analyze code,
extract their ideas, and rewrite. And, looking at their code, I think they
have written their way into a blind alley.
But QUIC is the transport for HTTP/3 (HTTP over QUIC), and there is a gigantic ecosystem supporting it.
We really have no alternative but to somehow interface to that ecosystem.
[QUIC]: https://github.com/private-octopus/picoquic
[QUIC] is UDP with flow control, reliability, and SSL/TLS encryption, but no
DDoS resistance, and total insecurity against CA attack.
## Boost Asynch
Boost implements event oriented multithreading in IO service, but I don't like
it because it fails to interface with Microsoft's implementation of asynch
internet protocol, `WSAAsyncSelect` and `WSAEventSelect`. Also because it is brittle and
incomprehensible, and its example programs do not easily generalize to
anything other than that particular example.
To the extent that you need to interact with a database, you need to process
connections from clients in many concurrent threads. Connection handlers are
run in the thread that called `io_service::run()`.
You can create a pool of threads processing connection handlers (and waiting
for database connections to finalize) by running `io_service::run()` from
multiple threads. See the Boost.Asio docs.
## Asio
I tried boost asio, and concluded it was broken, trying to do stuff that cannot be done,
and hide stuff that cannot be hidden in abstractions that leak horribly.
But Asio by itself (comes with MSYS2) might work.
## Asynch Database access
MySQL 5.7 supports [X Plugin / X Protocol](https://dev.mysql.com/doc/refman/5.7/en/document-store-setting-up.html), which allows asynchronous query execution and NoSQL. But the X DevAPI was created to support node.js and the like. The basic idea is that you send text messages to MySQL on a certain port, and asynchronously get text messages back, in Google protobufs, in PHP, JavaScript, or SQL. No one has bothered to create a C++ wrapper for this, it being primarily designed for PHP or node.js.
SQLite nominally has synchronous access, and the use of one read/write
thread, many read threads is recommended. But under the hood, if you enable
WAL mode, access is asynchronous. The nominal synchrony sometimes leaks into
the underlying asynchrony.
By default, each `INSERT` is its own transaction, and transactions are
excruciatingly slow. WAL mode with `synchronous=NORMAL` fixes this: all writes go to the
write-ahead log file, which gets cleaned up later.
The authors of SQLite recommend against multithreading writes, but we
do not want the network waiting on the disk, nor the disk waiting on the
network, therefore, one thread with asynch for the network, one purely
synchronous thread for the SQLite database, and a few number crunching
threads for encryption, decryption, and hashing. This implies shared
nothing message passing between threads.
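A minimal sketch of that arrangement: one thread owns the database and executes requests strictly in the order they were posted, while the network and number crunching threads hand it work as messages and never touch the database themselves. It uses only the standard library; `DbWorker` is a hypothetical name, the posted lambdas stand in for real SQLite calls (not shown), and a lockless queue could replace the mutex-protected one.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// One dedicated thread runs every database request, in submission order.
class DbWorker {
public:
    DbWorker() : thread_([this] { run(); }) {}
    ~DbWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        thread_.join();  // remaining requests are drained before the thread exits
    }
    // Callable from any thread; returns immediately without waiting on the disk.
    void post(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(request));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping, and nothing left to do
                request = std::move(queue_.front());
                queue_.pop_front();
            }
            request();  // the only place where database work would run
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last, so it starts after the members above exist
};
```

Results would travel back the same way, as messages posted to whichever thread asked, so the network thread never blocks on the disk.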
------------------------------------------------------------------------
[Facebook Folly library] provides many tools, with such documentation as
exists amounting to “read the f\*\*\*\*\*g header files”. They are reputed
to have the highest efficiency queuing for interthread communication, and it
is plausible that they do, because Facebook views efficiency as critical.
Their [queuing header file](https://github.com/facebook/folly/blob/master/folly/MPMCQueue.h) gives us
`MPMCQueue`.
[Facebook Folly library]:https://github.com/facebook/folly/blob/master/folly/
On the other hand, Boost gives us a lockless interthread queue, which should
be very efficient. Assuming each thread is an event handler, rather than
pseudo synchronous, we queue up events in the Boost queue, and handle all
unhandled exceptions from the event handler before getting the next item from
the queue. We keep enough threads going that we do not mind threads blocking
sometimes. The queue owns objects not currently being handled by a
particular thread. Objects are allocated in a particular thread, and freed in
a particular thread, a process which very likely blocks briefly. Graphic
events are passed to the master thread by the wxWidgets event code, but we
use our own multithreaded event code to handle everything else. Posting an
event to the GUI code will block briefly.
I was looking at Boost's queues and lockless mechanisms from the point of
view of implementing my own thread pool, but this is kind of stupid, since
boost already has a thread pool mechanism written to handle the asynch IO
problem. Thread pools are likely overkill. Node.js does not need them,
because its single thread does memory to memory operations.
Boost provides us with an [`io_service` and `boost::thread` group], used to
give effect to asynchronous IO with a thread pool. `io_service` was specifically written to perform io, but can be used for any
thread pool activity whatsoever. You can post tasks to the io_service,
which will get executed by one of the threads in the pool. Each such task has
to be a functor.
[`io_service` and `boost::thread` group]:http://thisthread.blogspot.com/2011/04/multithreading-with-asio.html
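For comparison, the same pattern, several threads all taking posted functors from one shared queue, much as several threads calling `io_service::run()` share one `io_service`, can be sketched with only the standard library. This is not Boost's API: `ThreadPool` and `post` are hypothetical names, it does no IO at all, and returning a `std::future` for each task is an addition for convenience.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Post any callable; the returned future delivers its result.
    template <class F>
    auto post(F f) -> std::future<decltype(f())> {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping, and the queue is drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // a job that blocks ties up only this one worker
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Because blocking is unavoidable, the pool would be sized at several times `std::thread::hardware_concurrency()`, so that a few blocked workers do not stall everything else.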
Since supposedly nonblocking operations always leak and block, all we can do
is try to have blocking minimal. For example nonblocking database operations
always block. Thus our threadpool needs to be many times larger than our set
of hardware threads, because we will always wind up doing blocking operations.
The C++11 multithreading model assumes you want to do some task in parallel,
for example you are multiplying two enormous matrices, so you spawn a bunch
of threads, then you wait for them all to complete using `join`, or all to
deliver their payload using futures and promises. This does not seem all that
useful, since the major practical issue is that you want your system to
continue to be responsive while it is waiting for some external hardware to
reply. When you are dealing with external events, rather than grinding a
matrix in parallel, event oriented architecture, rather than futures,
promises, and joins is what you need.
Futures, promises, and joins are useful in the rather artificial case that
responding to a remote procedure call requires you to make two or more
remote procedure calls, and wait for them to complete, so that you then have
the data to respond to a remote procedure call.
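That case looks like this with `std::async`: both outgoing calls are issued at once, and the handler waits on both futures before it can answer. `fetch_balance` and `fetch_pending` are hypothetical stand-ins for slow remote calls, simulated here with a sleep.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-ins for two remote procedure calls that each take a while.
int fetch_balance(const std::string& account) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return static_cast<int>(account.size()) * 10;
}
int fetch_pending(const std::string& account) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return static_cast<int>(account.size());
}

// Answering one request needs both results: issue both calls concurrently,
// then wait on both futures before replying.
int available_funds(const std::string& account) {
    auto balance = std::async(std::launch::async, fetch_balance, account);
    auto pending = std::async(std::launch::async, fetch_pending, account);
    return balance.get() - pending.get();
}
```

The two calls overlap, so the reply costs roughly one round trip rather than two; but the thread running `available_funds` still sits blocked, which is why this pattern suits a small number of such requests rather than the general event-driven case.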
Futures, promises, and joins are useful on a server that launches one thread
per client, which is often a sensible way to do things, but does not fit that
well to the request response pattern, where you don't have a great deal of
client state hanging around, and you may well have ten thousand clients. If
you can be pretty sure you are only going to have a reasonably small number
of clients at any one time, or significant interaction between clients,
one thread per client may well make a lot of sense.
I was planning to use boost asynch, but upon reading the boost user threads,
sounds fragile, a great pile of complicated unintelligible code that does
only one thing, and if you attempt to do something slightly different,
everything falls apart, and you have to understand a lot of arcane details,
and rewrite them.
[Nanomsg](http://nanomsg.org/) is a socket library that provides a layer on
top of everything that makes everything look like sockets, and provides
sockets specialized to various communication patterns, avoiding the roll-your-own
problem. In the ZeroMQ thread, people complained that [a simple hello
world TCP-IP program tended to be disturbingly large and complex].
It looks to me as though Nanomsg wraps a lot of that complexity.
[a simple hello world TCP-IP program tended to be disturbingly large and complex]:http://250bpm.com/blog
# Sockets
A simple hello world TCP-IP program tends to be disturbingly large and
complex, and Windows TCP-IP is significantly different from POSIX TCP-IP.
Waiting on network events is deadly, because they can take arbitrarily large
time, but multithreading always bites. People who succeed tend to go with
single thread asynch, similar to, [or part of, the window event handling
loop].
[or part of, the window event handling loop]:https://www.codeproject.com/Articles/13071/Programming-Windows-TCP-Sockets-in-C-for-the-Begin
Asynch code should take the form of calling a routine that returns
immediately, but passing it a lambda callback, which gets executed in the
most recently used thread.
Interthread communication bites: you don't want several threads accessing
one object, as synchronization will slow you down, so if you multithread, it is
better to have a specialist thread for any one object, with lockless queues
passing data between threads: one thread for all writes to SQLite, one thread
for waiting on select.
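A lockless queue between exactly two such specialist threads can be a single-producer, single-consumer ring buffer. This is a minimal sketch using `std::atomic` with acquire/release ordering; `SpscQueue` is a hypothetical name, and it is only correct if one thread pushes and one other thread pops.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Single-producer, single-consumer lock-free ring buffer.
// Holds at most N - 1 items: one slot stays empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer thread only. Returns false if the queue is full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        slots_[head] = std::move(value);
        head_.store(next, std::memory_order_release);  // publish slot to consumer
        return true;
    }
    // Consumer thread only. Returns std::nullopt if the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = std::move(slots_[tail]);
        tail_.store((tail + 1) % N, std::memory_order_release);  // hand slot back
        return value;
    }
private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot the producer writes
    std::atomic<std::size_t> tail_{0};  // next slot the consumer reads
};
```

One queue per direction per pair of threads, for example the select thread feeding the SQLite writer thread, keeps every object owned by a single thread.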
Boost Asio supposedly makes sockets all look alike, but I am frightened of
its work guard stuff, which looks to me fragile and incomprehensible. It looks
to me that no one understands the Boost Asio work guard, not even the man who
wrote it. And they should not be using `boost::bind`, which has been obsolete
since lambdas became available, indicating bitrot.
Because the work guard is incomprehensible and subject to change, I will just
keep the Boost io object busy with a polling timer.
And I am having trouble finding Boost Asio documented as a sockets library.
Maybe I am just looking in the wrong place.
[A nice clean tutorial depicting strictly synchronous TCP.](https://www.binarytides.com/winsock-socket-programming-tutorial/)
[Libpcap and Win10PCap](https://en.wikipedia.org/wiki/Pcap#Wrapper_libraries_for_libpcap) provide very low level, OS independent access to packets, OS independent because they are below the OS, rather than above it. [Example code for Visual Studio.](https://www.csie.nuk.edu.tw/~wuch/course/csc521/lab/ex1-winpcap/)
[Simple sequential procedural socket programming for Windows sockets.](https://www.binarytides.com/winsock-socket-programming-tutorial/)
If I program from the base upwards, the bottommost level would be a single
thread sitting on a select call. Whenever the select fired, it would
execute a corresponding functor transferring data between userspace and system
space.
One thread, and only one thread, responsible for timer events and
transferring network data between userspace and systemspace.
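A minimal sketch of that bottom layer, assuming POSIX `select()` (Winsock also has a `select`, but with different headers and socket types). `SelectLoop` and its methods are hypothetical names: one thread blocks in `select`, then runs the functor registered for each ready descriptor.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Single-threaded reactor. POSIX only; select() is limited to
// descriptors below FD_SETSIZE.
class SelectLoop {
public:
    using Handler = std::function<void(int fd)>;

    void watch_readable(int fd, Handler handler) { handlers_[fd] = std::move(handler); }
    void unwatch(int fd) { handlers_.erase(fd); }

    // Waits up to timeout_ms for input, runs the handler of each ready
    // descriptor, and returns how many ran (0 on timeout, -1 on error).
    int run_once(int timeout_ms) {
        fd_set readable;
        FD_ZERO(&readable);
        int max_fd = -1;
        for (const auto& entry : handlers_) {
            FD_SET(entry.first, &readable);
            if (entry.first > max_fd) max_fd = entry.first;
        }
        timeval tv{};
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        int ready = select(max_fd + 1, &readable, nullptr, nullptr, &tv);
        if (ready <= 0) return ready;
        // Collect ready descriptors first, since a handler may unwatch descriptors.
        std::vector<int> ready_fds;
        for (const auto& entry : handlers_)
            if (FD_ISSET(entry.first, &readable)) ready_fds.push_back(entry.first);
        int ran = 0;
        for (int fd : ready_fds) {
            auto it = handlers_.find(fd);
            if (it == handlers_.end()) continue;
            Handler handler = it->second;  // copy: the handler may erase its own entry
            handler(fd);
            ++ran;
        }
        return ran;
    }

private:
    std::map<int, Handler> handlers_;
};
```

The timeout doubles as the timer tick, which fits the idea of this one thread also being responsible for timer events.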
If further work is required in userspace that could take significant time
(disk operations, database operations, cryptographic operations), the functor
running under that thread would stuff another functor into a waitless stack,
and a bunch of threads would be waiting for that waitless stack to be signaled,
and one of those other threads would execute that functor.
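A minimal sketch of the worker side. For simplicity it substitutes a mutex and condition variable for the lock-free waitless stack described above, and serves jobs in FIFO rather than LIFO order; `WorkerPool` is a hypothetical name.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Worker threads that run slow functors handed over by the select thread.
class WorkerPool {
public:
    explicit WorkerPool(std::size_t threads) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : workers_) t.join();  // workers drain remaining jobs first
    }
    // Called from the select thread; queues the job and returns at once.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }
private:
    void work() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // disk, database or crypto work runs without the lock held
        }
    }
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest exist
};
```

When a job finishes and needs to touch the network again, it would hand a functor back to the select thread rather than writing to the socket itself.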
The reason we have a single userspace thread handling the select and transfers
between userspace and systemspace is that that is a very fast and very common
operation, and we don't want unnecessary thread switches, wherein
one thread does something, then immediately afterwards another thread does
almost the same thing. All quickie tasks should be handled sequentially by
one thread that works a state machine of functors.
The way to do asynch is to wrap sockets in classes that reflect the intended
use and function of the socket. Call each instance of such a class a
connection. Each connection has its own state machine state and its own
**message dispatcher, event handler, event pump, message pump**.
A single thread calls select or poll, and drives all connection instances in
all transfers of data between userspace and systemspace. Connections also
have access to a thread pool for operations (such as file, database and
cryptographic operations) that may involve waits.
The hello world program for this system is to create a derived server class
that does a trivial transformation on input, and has a path in server name
space, and a client class that sends a trivial input, and displays the result.
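A minimal sketch of that hello world, assuming POSIX sockets. `Connection`, `UpperCaseServer` and `HelloClient` are hypothetical names. The server name space path is left out, the state machine is reduced to open/closed, and writes block rather than being queued, so this only shows the shape of the class hierarchy.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cctype>
#include <cstddef>
#include <string>

// One instance per socket, holding that connection's own state. The select
// thread calls on_readable() when the descriptor fires. The caller owns fd.
class Connection {
public:
    explicit Connection(int fd) : fd_(fd) {}
    virtual ~Connection() = default;
    int fd() const { return fd_; }
    virtual void on_readable() = 0;
protected:
    enum class State { Open, Closed };
    State state_ = State::Open;
    std::string read_some() {
        char buf[512];
        ssize_t n = ::read(fd_, buf, sizeof buf);
        if (n <= 0) { state_ = State::Closed; return {}; }
        return std::string(buf, static_cast<std::size_t>(n));
    }
    void write_all(const std::string& data) {
        std::size_t sent = 0;
        while (sent < data.size()) {
            ssize_t n = ::write(fd_, data.data() + sent, data.size() - sent);
            if (n <= 0) { state_ = State::Closed; return; }
            sent += static_cast<std::size_t>(n);
        }
    }
private:
    int fd_;
};

// Derived server: the trivial transformation is conversion to upper case.
class UpperCaseServer : public Connection {
public:
    using Connection::Connection;
    void on_readable() override {
        std::string in = read_some();
        if (state_ == State::Closed) return;
        for (char& c : in) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        write_all(in);
    }
};

// Client: sends a trivial input and accumulates the reply for display.
class HelloClient : public Connection {
public:
    using Connection::Connection;
    void send(const std::string& text) { write_all(text); }
    void on_readable() override { reply += read_some(); }
    std::string reply;
};
```

In the full system the select thread, not the caller, would invoke `on_readable()` on whichever connection's descriptor became readable, and the client would then display `reply`.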
Microsoft WSAAsync\[Socketprocedure\] is a family of socket procedures
designed to operate with, and be driven by, the Windows UI system, wherein
sockets are linked to windows, and driven by the windows message loop. These
could benefit considerably from being wrapped in connection classes.
I am guessing that wxWidgets has a similar system for driving sockets,
wherein a wxSocket is plugged in to the wxWidgets message loop. On Windows,
wxWidgets wraps WSAAsyncSelect, which is the behavior we need.
Microsoft has written the asynch sockets you need, and wxWidgets has wrapped
them in an OS independent fashion.
`WSAAsyncSelect`\
`WSAEventSelect`\
`select`
Using wxSockets commits us to having a single thread managing everything. To
get around the capacity limit inherent in that, have multiple peers under
multiple names accessing the same database, and have a temporary and
permanent redirect facility, so that if you access `peername`, your
connection, and possibly your link, get rewritten to `p2.peername` by peers
trying to balance load.
Microsoft tells us:
> receiving, applications use the WSARecv or WSARecvFrom functions to supply
buffers into which data is to be received. If one or more buffers are posted
prior to the time when data has been received by the network, that data could
be placed in the users buffers immediately as it arrives. Thus, it can
avoid the copy operation that would otherwise occur at the time the recv or
recvfrom function is invoked.
Moral is, we should use the sockets that wrap WSA.
# Tcl
Tcl is a really great language, and I wish it would become the language of my new web, as JavaScript is the language of the existing web.
When I search for Tcl, I am apt to find a long out of date repository
preserved for historical reasons, but there is an active repository
obscured by the existence of the out of date repository.
JavaScript is a great language, and has a vast ecosystem of tools, but
it is controlled from top to bottom by our enemies, and using it is
inherently insecure.
Tcl consists of a string (which is implemented under the hood as a copy on
write rope, with some substrings of the rope actually being run time typed
C++ types that can be serialized and deserialized to strings) and a name
table, one name table per interpreter, and at least one interpreter per
thread. The entries in the name table can be strings, C++ functions, or run
time typed C++ types, which may or may not be serializable or deserializable,
but conceptually, it is all one big string, and the name table is used to
find C and C++ functions which interpret the string following the command.
Execution consists of executing commands found in the string, which transform
it into a new string, which in turn gets transformed into a new string,
until it gets transformed into the final result. All code is metacode. If
elements of the string need to be deserialized to and from a C++ run time
type (because the command does not expect that run time type) but cannot be,
because there is no deserialization for that run time type, you get a run
time error. But most of the time what you get, under the hood, is C++ code
operating on C++ types: it is only conceptually a string being continually
transformed into another string. The default integer is arbitrary precision,
because integers are conceptually arbitrary length strings of digits.
To sandbox third party code, including third party GUI code, just restrict
the name table to have no dangerous commands, and to be unable to load C++
modules that could provide dangerous commands.
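A toy model of the name table and of sandboxing by restriction. This is not Tcl's C API: `Command`, `NameTable`, `eval_line` and `make_sandbox` are hypothetical, words are split on whitespace with none of Tcl's quoting or substitution rules, and real Tcl provides this facility as safe interpreters (`interp create -safe`).

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// A name table maps command names to C++ functions; the first word of a
// line selects the function, which interprets the remaining words.
using Command = std::function<std::string(const std::vector<std::string>& args)>;
using NameTable = std::map<std::string, Command>;

std::string eval_line(const NameTable& table, const std::string& line) {
    std::istringstream words(line);
    std::vector<std::string> args;
    for (std::string w; words >> w;) args.push_back(w);
    if (args.empty()) return "";
    auto it = table.find(args[0]);
    if (it == table.end())
        throw std::runtime_error("invalid command name \"" + args[0] + "\"");
    return it->second(args);
}

// Sandbox: a child table holding only the commands judged safe. Code
// evaluated against it has no name by which to reach anything else.
NameTable make_sandbox(const NameTable& full, const std::vector<std::string>& safe) {
    NameTable child;
    for (const auto& name : safe) {
        auto it = full.find(name);
        if (it != full.end()) child.insert(*it);
    }
    return child;
}
```

The security property rests entirely on what is in the table, which is why the sandbox must also lack any command able to load further C++ modules.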
It is faster to bring up a UI in Tcl than in C. We get, for free, OS
independence.
Tcl used to be the best language for attaching C programs to, and for
testing C programs, or it would be if SWIG actually worked. The various C
components of Tcl provide an OS independent layer on top of both Linux and
Windows, and it has the best multithread and asynch system.
It is also a metaprogramming language. Every Tcl program is a metaprogram: you always write code that writes code.
The GUI is necessarily implemented as asynch, something like the JavaScript
DOM in HTML, but with explicit calls to the event/idle loop. Multithreading
is implemented as multiple interpreters, at least one interpreter per thread,
sending messages to each other.
# Time
After spending far too much time on this issue, which has sucked in far
too many engineers and far too much thought, and generated far too many
libraries, I found the solution was C++11 `std::chrono`. For short durations, we
use the steady time in milliseconds, where each machine has its own
epoch, and no two machines have exactly the same milliseconds. For
longer durations, we use the system time in seconds, where all machines
are expected to be within a couple of seconds of each other. For the human
readable system time in seconds to be displayed on a particular machine,
we use an ISO 8601 style format, 20120114_15:39:34+10:00 (a time zone with a
ten hour offset, equivalent to Greenwich time 20120114_05:39:34+00:00).
[For long durations, we use signed system time in seconds, for short durations unsigned steady time in milliseconds.](./libraries/rotime.cpp)
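A sketch of those three representations using `std::chrono` and the C time functions. It is not the contents of the linked `rotime.cpp`: the function names are hypothetical, it assumes `system_clock` counts from the Unix epoch (guaranteed from C++20, and true of mainstream implementations before that), and it uses `std::gmtime`, which is not thread safe.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Short durations: unsigned milliseconds on the steady clock. Each machine
// has its own arbitrary epoch, so values are only comparable locally.
std::uint64_t steady_ms() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

// Long durations: signed seconds on the system clock, comparable across machines.
std::int64_t system_seconds() {
    using namespace std::chrono;
    return duration_cast<seconds>(system_clock::now().time_since_epoch()).count();
}

// Human readable form such as 20120114_15:39:34+10:00, given seconds since
// the Unix epoch and the local offset from UTC in minutes.
std::string format_local(std::int64_t unix_seconds, int offset_minutes) {
    // Shift by the offset, then read the shifted instant as UTC: its fields
    // are the local wall-clock fields.
    std::time_t shifted = static_cast<std::time_t>(unix_seconds + std::int64_t{offset_minutes} * 60);
    const std::tm* utc = std::gmtime(&shifted);
    if (utc == nullptr) return {};
    char body[32];
    std::strftime(body, sizeof body, "%Y%m%d_%H:%M:%S", utc);
    int magnitude = offset_minutes < 0 ? -offset_minutes : offset_minutes;
    char zone[16];
    std::snprintf(zone, sizeof zone, "%c%02d:%02d", offset_minutes < 0 ? '-' : '+',
                  magnitude / 60, magnitude % 60);
    return std::string(body) + zone;
}
```

For example, `format_local(1326519574, 600)` gives the string used as the example above, and an offset of `0` gives the Greenwich form.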
Windows and Unix both use time in seconds, but accessed and manipulated in
incompatible ways.
Boost has numerous different and not altogether compatible time libraries,
all of them overly clever and all of them overly complicated.
wxWidgets has OS independent time based on milliseconds past the epoch
which however fails to compress under Cap\'n Proto.
I was favourably impressed by the approach to time taken in TCP packets,
that the time had to be approximately linear, and in milliseconds or larger,
but they were entirely relaxed about the two ends of a TCP connection
using different clocks with different, and variable, speeds.
It turns out you can go a mighty long way without a global time, and to the
extent that you do need a global time, it should be equivalent to that used in
email, which magically hides the leap seconds issue.
# UTF8 strings
UTF8 strings are supported by the wxWidgets wxString, which provides conversion
to and from wide character variants and locale variants. (We don't want locale
variants, they are obsolete. The whole world is switching to UTF, but
our software and operating environments lag.)
Locales still matter in case insensitive compare, collation order,
canonicalization of UTF-8 strings, and a rat's nest of other issues,
which Linux and SQLite avoid by doing binary compares and, where they cannot
avoid capitalization issues, by only considering A-Z to be capitals.
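A minimal sketch of the binary-plus-A-Z rule, as a locale independent comparison that every party computes identically. `ascii_icompare` is a hypothetical helper, not SQLite's implementation, though it follows the same idea of treating only A-Z as capitals.

```cpp
#include <cstddef>
#include <string_view>

// Case-insensitive comparison of UTF-8 strings that folds only A-Z to a-z
// and compares every other byte, including all multi-byte UTF-8 sequences,
// as raw binary. Returns <0, 0 or >0, like strcmp.
int ascii_icompare(std::string_view a, std::string_view b) {
    auto fold = [](unsigned char c) -> unsigned char {
        return (c >= 'A' && c <= 'Z') ? static_cast<unsigned char>(c - 'A' + 'a') : c;
    };
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char x = fold(static_cast<unsigned char>(a[i]));
        unsigned char y = fold(static_cast<unsigned char>(b[i]));
        if (x != y) return x < y ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;  // a proper prefix sorts first
}
```

Because it never consults a locale, "Hello" and "hELLO" compare equal everywhere, while "Été" and "été" stay distinct everywhere, which is the property a blockchain-wide collation order needs.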
If you tell SQLite to incorporate the ICU library, SQLite will attempt to
do case lowering and collation for all of UTF-8, which strikes me
as something that cannot really be done, and I am not at all sure how
it will interact with wxWidgets attempting to do the same thing.
What happens is that operations become locale dependent. It will
have a different view of what characters are equivalent in different
places. And changing the locale on a database will break an index or
table that has a non binary collation order. This probably will not
matter much, because we are likely to have few entries that only differ
in capitalization. The SQL results will be wrong, but the database will
not crash, and when we have a lot of entries that are affected by non Latin
capitalization rules, they are probably going to be viewed only in that
locale. But any collation order that is global to all parties on the blockchain
has to be Latin or binary.
wxWidgets does *not* include the full Unicode library, so cannot do this
stuff. But SQLite provides some C string functions that are guaranteed to
do whatever SQLite does, and if you include the ICU library it attempts
to handle capitalization on the entire Unicode set:\
`int sqlite3_stricmp(const char *, const char *);`\
`sqlite3_strlike(P,X,E)`\
The ICU library also provides a real regex function on Unicode
(`sqlite3_strlike` being the C equivalent of the SQL `LIKE`,
providing a rather truncated fragment of regex capability).
I am pretty sure the wxWidgets regex does something unwanted on Unicode.
`wxString::ToUTF8()` and `wxString::FromUTF8()` do what you would expect.
`wxString::c_str()` does something too clever by half.
On Visual Studio, you need to set your source files to have a BOM, so that Visual
Studio knows that they are UTF8, and you need to set the compiler environment in
Visual Studio to UTF8 with `/Zc:__cplusplus /utf-8 %(AdditionalOptions)`.
And you need to set the run time environment of the program to UTF8
with a manifest. I am not at all sure how CodeLite will handle manifests,
but there is a CodeLite build that does handle UTF-8, presumably with
a manifest. The standard build on Windows does not.
You will need to place all UTF8 string literals and string constants in a
resource file, which you will use for translated versions.
If you fail to set the compilation and run time environment to UTF8 then
for extra confusion, your debugger and compiler will *look* as if they are
handling UTF8 characters correctly as single byte characters, while at
least wxString alerts you that something bad is happening by run time
translating to the null string.
Automatic string conversion in wxWidgets is *not* UTF8, and if you have
any unusual symbols in your string, you get a run time error and the empty
string. So wxString automagic conversions will rape you in the ass at
runtime, and for double the confusion, your correctly translated UTF8
strings will look like errors. Hence the need to make sure that the whole
environment from source code to run time execution is consistently UTF8,
which has to be separately ensured in three separate places.
When wxWidgets is compiled using `#define wxUSE_UNICODE_UTF8 1`,
it provides UTF8 iterators and caches a character index, so that accessing
a character by index near a recently used character is fast. The usual
iterators `wx.begin()`, `wx.end()`, const and reverse iterators are available.
I assume something bad happens if you advance a reverse iterator after
writing to it.
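Why a cached character index helps: in UTF-8 a character is one to four bytes, so finding character number n means walking from some known position. A minimal sketch of the idea, with a hypothetical `Utf8Indexer` class; it is not how `wxUSE_UNICODE_UTF8` is actually implemented, and it only resumes forward from the cache.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Remembers the last (character index, byte offset) pair, so a lookup at or
// just after a recently used character only walks a few bytes.
class Utf8Indexer {
public:
    explicit Utf8Indexer(std::string s) : s_(std::move(s)) {}

    // Byte offset of character number n, or s_.size() if n is past the end.
    std::size_t offset_of(std::size_t n) {
        std::size_t index = 0, offset = 0;
        if (n >= cached_index_) { index = cached_index_; offset = cached_offset_; }
        while (index < n && offset < s_.size()) {
            offset += sequence_length(static_cast<unsigned char>(s_[offset]));
            ++index;
        }
        if (offset > s_.size()) offset = s_.size();  // truncated final sequence
        cached_index_ = index;
        cached_offset_ = offset;
        return offset;
    }

private:
    // Sequence length judged from the lead byte; invalid lead bytes count as 1.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return 1;
    }
    std::string s_;
    std::size_t cached_index_ = 0, cached_offset_ = 0;
};
```

For "aé€😀z" the characters start at byte offsets 0, 1, 3, 6 and 10, and the string is 11 bytes long; a lookup before the cached position falls back to walking from the start.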
wxWidgets compiled with `#define wxUSE_UNICODE_UTF8 1` is the
way of the future, but not the way of the present. It is still a work in
progress, and does not build under Windows. Windows now provides UTF8 entry
points to all its system functions, which should make it easy.
# [UTF8-CPP](http://utfcpp.sourceforge.net/ "UTF-8 with C++ in a Portable Way")
A powerful library for handling UTF8. This somewhat duplicates the
facilities provided by wxWidgets with `wxUSE_UNICODE_UTF8==1`.
For most purposes, wxString should suffice, when it actually works with
UTF8, which it does not yet do on Windows. We shall see. wxWidgets
recommends not using wxString except to communicate with wxWidgets,
and not using it as general UTF8 system. Which is certainly the current
state of play with wxWidgets.
For regex to work correctly, probably need to do it on wxString's native
UTF16 (windows) or UTF32 (unix), but it supposedly works on `UTF8`,
assuming you can successfully compile it, which you cannot.
# Cap\'n Proto
[Designed for a download from github and run cmake install.](https://capnproto.org/install.html) As all software should be.
But for mere serialization of data to a form invariant between machine
architectures, between different compilers, and between different compilers
on the same machine, it is overkill for our purposes. Too much capability.
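For that limited need, hand-written encoding may be enough. A minimal sketch, assuming little-endian byte order is chosen as the wire format; `encode_le64` and `decode_le64` are hypothetical helpers, and this is not Cap\'n Proto's encoding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Always eight bytes, least significant first, built with shifts so that the
// host's byte order, padding and compiler never affect the result.
std::array<std::uint8_t, 8> encode_le64(std::uint64_t v) {
    std::array<std::uint8_t, 8> out{};
    for (std::size_t i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>(v >> (8 * i));
    return out;
}

std::uint64_t decode_le64(const std::array<std::uint8_t, 8>& in) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < 8; ++i)
        v |= std::uint64_t{in[i]} << (8 * i);
    return v;
}
```

The same value encodes to the same eight bytes on any architecture and any compiler, which is the whole of the invariance requirement.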
# Awesome C++
[Awesome C++] A curated list of awesome C/C++ frameworks, libraries, resources, and shiny things
[Awesome C++]:https://cpp.libhunt.com
"A curated list of awesome C/C++ frameworks, libraries, resources, and shiny things"
{target="_blank"}
I encountered this when looking at the Wt C++ web framework, which seems to be mighty cool, except I don't think I have any use for a web framework. But [Awesome C++] has a very large pile of things that I might use.
Wt has the interesting design principle that every open web page maps to a
windows class, every widget on the web page maps to a windows class, and
every row in the SQL table maps to a windows class. Cool design.
# Opaque password protocol
[Opaque] is PAKE done right.
[Opaque]:https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/
"Lets talk about PAKE" {target="_blank"}
Server stores a per user salt, the user's public key, and the user's secret key
encrypted with a secret that only the user ever learns.
The secret is generated by the user from the salt and his password by
interaction with the server, without the user learning the salt or the hash of the salt, and without the server learning the password or the hash of the password.
User then strengthens the secret generated from salt and password
applying a large work factor to it, and decrypts the private key with it.
User and server then proceed with standard public key cryptography.
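The step where the user gets a secret from the salt and the password without either side learning the other's input is an oblivious pseudorandom function (OPRF), which OPAQUE builds on. A toy sketch of the blinded exponentiation idea, using the subgroup of squares modulo the safe prime 2039. It is insecure by construction (tiny numbers, `std::hash` as a stand-in hash), the function names are hypothetical, and real OPAQUE uses an elliptic curve group and a proper hash-to-group function.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Toy group: squares modulo the safe prime P = 2Q + 1, a subgroup of prime order Q.
constexpr std::uint64_t P = 2039, Q = 1019;

std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Toy hash-to-group: squaring x in [2, P-2] lands in the order-Q subgroup, never at 1.
std::uint64_t hash_to_group(const std::string& password) {
    std::uint64_t x = std::hash<std::string>{}(password) % (P - 3) + 2;
    return x * x % P;
}

// Client: blind H(pw) with a random r in [1, Q-1]. The server sees only H(pw)^r.
std::uint64_t blind(const std::string& password, std::uint64_t r) {
    return mod_pow(hash_to_group(password), r, P);
}

// Server: apply its per-user secret k (the "salt") to whatever it receives.
std::uint64_t server_evaluate(std::uint64_t blinded, std::uint64_t k) {
    return mod_pow(blinded, k, P);
}

// Client: raise to r^-1 mod Q to remove the blinding, leaving H(pw)^k.
// The client never sees k itself.
std::uint64_t unblind(std::uint64_t evaluated, std::uint64_t r) {
    std::uint64_t r_inverse = mod_pow(r, Q - 2, Q);  // Fermat inverse: Q is prime
    return mod_pow(evaluated, r_inverse, P);
}
```

The result depends only on the password and `k`, not on the blinding factor, so the user gets the same secret every login; in the real protocol that output is then hashed and strengthened as described above before it decrypts the private key.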
If the server is evil, or the bad guys seize the server, everything is still
encrypted and they have to run, not a hundred million trial passwords
against all users, but a hundred million passwords against *each* user. And
user can make the process of trying a password far more costly and slow than
just generating a hash. Opaque zero knowledge is designed to be as
unfriendly as possible to big organizations harvesting data on an industrial
scale. The essential design principle of this password protocol is that
breaking a hundred million passwords by password guessing should be a
hundred million times as costly as breaking one password by password
guessing. The protocol is primarily designed to obstruct the NSA's mass
harvesting.
It has the enormous advantage that if you have one strong password which
you use for many accounts, one evil server cannot easily attack your
accounts on other servers. To do that, it has to try every password - which
runs into your password strengthening.