diff --git a/.gitattributes b/.gitattributes index 5635875..d5d380e 100644 --- a/.gitattributes +++ b/.gitattributes @@ -35,6 +35,8 @@ Makefile text eol=lf encoding=utf-8 *.vcxproj text eol=crlf encoding=utf-8 whitespace=trailing-space,space-before-tab,tabwidth=4 *.vcxproj.filters text eol=crlf encoding=utf-8 whitespace=trailing-space,space-before-tab,tabwidth=4 *.vcxproj.user text eol=crlf encoding=utf-8 whitespace=trailing-space,space-before-tab,tabwidth=4 +*.props text eol=crlf encoding=utf-8 whitespace=trailing-space,space-before-tab,tabwidth=4 + # Force binary files to be binary diff --git a/.gitconfig b/.gitconfig index 0b6dbaa..806017c 100644 --- a/.gitconfig +++ b/.gitconfig @@ -11,4 +11,13 @@ alias = ! git config --get-regexp ^alias\\. | sed -e s/^alias\\.// -e s/\\ /\\ =\\ / | grep -v ^'alias ' | sort [commit] gpgSign = true - +[push] + signed=true +[merge] + verify-signatures = true +[pull] + verify-signatures = true +[submodule] + active = * +[diff] + submodule = log diff --git a/.gitignore b/.gitignore index 52149e2..4f016fd 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,16 @@ sqlite3/sqlite-doc/ -*.bat + +## ignore Microsoft Word stuff, as no one should use it in the project +## pandoc can translate it to markdown, the universal format. +*.doc +*.DOC +*.docx +*.DOCX +*.dot +*.DOT +*.rtf +*.RTF + ## Ignore Visual Studio temporary files, build results, and ## files generated by popular Visual Studio add-ons. 
*.bak diff --git a/.gitmodules b/.gitmodules index b632f5a..d9869ef 100644 --- a/.gitmodules +++ b/.gitmodules @@ -9,4 +9,4 @@ [submodule "wxWidgets"] path = wxWidgets url = git@rho.la:~/wxWidgets.git - branch = rho-fork \ No newline at end of file + branch = rho-fork diff --git a/docs/Efficient_Error-Propagating_Block_Chaining.pdf b/docs/Efficient_Error-Propagating_Block_Chaining.pdf deleted file mode 100644 index 3c1cd86..0000000 Binary files a/docs/Efficient_Error-Propagating_Block_Chaining.pdf and /dev/null differ diff --git a/docs/icon.pandoc b/docs/icon.pandoc new file mode 100644 index 0000000..141fc0c --- /dev/null +++ b/docs/icon.pandoc @@ -0,0 +1 @@ + diff --git a/docs/index.md b/docs/index.md index 62a1463..bbb3480 100644 --- a/docs/index.md +++ b/docs/index.md @@ -145,4 +145,4 @@ worth, probably several screens. - [How to do VPNs right](how_to_do_VPNs.html) - [How to prevent malware](safe_operating_system.html) - [The cypherpunk program](cypherpunk_program.html) -- [Replacing TCP and UDP](replacing_TCP.html) +- [Replacing TCP and UDP](names/TCP.html) diff --git a/docs/libraries.md b/docs/libraries.md index 6636e93..6a778e7 100644 --- a/docs/libraries.md +++ b/docs/libraries.md @@ -183,7 +183,7 @@ primary fork name, should be temporary and local., not pushed to the project repository, But when you are modifying the submodules in a project as a single project, making related changes in the module and submodule, the shared names that are common to all developers belong in -the primary project module,and when you have done with a submodule, +the primary project module, and when you are done with a submodule, ```bash git switch --detach @@ -249,7 +249,7 @@ you push it to `origin` under its own name, the you detach it from its name, so the superproject will know that the submodule has been changed. All of which, of course, presupposes you have already set unit tests, -upstream, origin, and your tracking branch appropriately. 
+upstream, origin, and your tracking branch appropriately. Even if your local modifications are nameless in your local submodule repository, on your remote submodule repository they need to have a name @@ -259,7 +259,7 @@ need to point to the root of a tree of all the nameless commits that the names and commits in your superproject that contains this submodules point to. You want `.gitmodules` in your local image of the repository to -reflect the location and fork of your new remote repository, with +reflect the location and fork of your new remote repository, with your remote as its `origin` and their remote as its `upstream`. You need an enormous pile of source code, the work of many people over @@ -386,6 +386,10 @@ OK, provided our primary repo is not co-opted by the enemy. # Installers +Looking at cmake, choco, deb, git, and rust crates, I see a development +environment being born, as people irregularly and ad hoc integrate with +each other's features. + Wine to run Windows 10 software under Linux is a bad idea, and Windows Subsystem for Linux to run Linux software under Windows 10 is a much worse idea – it is the usual “embrace and extend” evil plot by @@ -400,19 +404,46 @@ executed than in the past. ## The standard cmake installer from source -```bash -cmake .. && cmake --build && make && make install +After long and arduous struggle with CMake, I concluded: + +That it is the hardest path from MSVC to linux. + +That no one uses it as their first choice to go from linux to windows, so it +is likely to be a hard journey in the other direction. + +I also found that the CMake scripting language was one of those +accidental languages. + +CMakeLists.txt was intended as a simple list of every file. 
And then one +feature after another was added, ad hoc, with no coherent plan and vision, +and eventually so many features as to become Turing Complete, but like +most accidental Turing complete languages, inconsistent, unpredictable, and +the code entirely opaque, and all the while the developers did not +want their language to be used as a language. + +CMake has gone down the wrong path; it should have started with a known +language whose first class types are strings, lists of strings, maps of +strings, maps of named maps of strings, and maps of maps, and CMake should +create a description of the build environment that it discovers, and a +description of the directory in which it was run in the native types of that +language, and attempt to create a hello world program in that language +that invokes the compiler and the linker. Which program the developer +modifies as needed. + +That MSVC's embrace of cmake is one of those embrace and extend +weirdnesses, and will take you on a path to ever closer integration with +non free software, rather than off that path. Either that or the people +integrating it were just responding to an ad hoc list of integration features. + +That attempting a CMake build of the project using MSVC was a bad idea. +MinGW first, then MinGW integrated into vscode, in an all choco windows +environment without MSVC present. + +```bat +choco install mingw pandoc git vscode gpg4win -y ``` -To support this on linux, Cmakelists.txt needs to contain - -```default -project (Test) -add_executable(test main.cpp) -install(TARGETS test) -``` - -On linux, `install(TARGETS test)` is equivalent to `install(TARGETS test DESTINATION bin)` +That Cmake does not really work all that well with the MSVC environment. If we eventually take the CMake path, it will be after we can build on MinGW, not before. ## The standard Linux installer
Does not use `*.msi` as its packaging system. A chocolatey package consists of an `*.nuget`, `chocolateyInstall.ps1`, `chocolateyUninstall.ps1`, and `chocolateyBeforeModify.ps1` (the latter script is run before upgrade or uninstall, and is to reverse stuff done by is accompanying +Choco, Chocolatey, is the Windows Package manager system. Does not use `*.msi` as its packaging system. A chocolatey package consists of an `*.nuget`, `chocolateyInstall.ps1`, `chocolateyUninstall.ps1`, and `chocolateyBeforeModify.ps1` (the latter script is run before upgrade or uninstall, and is to reverse stuff done by its accompanying `chocolateyInstall.ps1 `) Interaction with stuff installed by `*.msi` is apt to be bad. @@ -833,7 +864,7 @@ which could receive a packet at any time. I need to look at the GameNetworkingSockets code and see how it listens on lots and lots of sockets. If it uses [overlapped IO], then it is golden. Get it up first, and it put inside a service later. -[Overlapped IO]:client_server.html#the-select-problem +[Overlapped IO]:server.html#the-select-problem {target="_blank"} The nearest equivalent Rust application gave up on congestion control, having programmed themselves into a blind alley. diff --git a/docs/pandoc_templates/icon.pandoc b/docs/libraries/icon.pandoc similarity index 100% rename from docs/pandoc_templates/icon.pandoc rename to docs/libraries/icon.pandoc diff --git a/docs/libraries/review_of_crypto_libraries.md b/docs/libraries/review_of_crypto_libraries.md index b865adc..f620cf8 100644 --- a/docs/libraries/review_of_crypto_libraries.md +++ b/docs/libraries/review_of_crypto_libraries.md @@ -109,7 +109,7 @@ mangle together then ends gracefully, and the next stream and the next concurrent process starts when there is something to do. While a stream lives, both ends maintain state, albeit in a request reply, the state lives only briefly. -1. A message. 
Representing all this as a single kind of port, and packets going between ports of a single kind, inherently leads to the mess that we diff --git a/docs/mkdocs.sh b/docs/mkdocs.sh index e2ed3ef..170015d 100644 --- a/docs/mkdocs.sh +++ b/docs/mkdocs.sh @@ -11,8 +11,9 @@ elif [[ "$OSTYPE" == "cygwin" ]]; then elif [[ "$OSTYPE" == "msys" ]]; then osoptions="--fail-if-warnings --eol=lf " fi -templates="./pandoc_templates" -options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=$templates/icon.pandoc --include-before-body=$templates/before.pandoc --css=$templates/style.css -o" +templates=$(pwd)"/pandoc_templates" +options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=icon.pandoc --include-before-body=$templates/before.pandoc --css=$templates/style.css -o" +pwd for f in *.md do len=${#f} @@ -41,8 +42,8 @@ if [[ $line =~ notmine$ ]]; fi done cd libraries -templates="../pandoc_templates" -options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=$templates/icondotdot.pandoc --include-before-body=$templates/beforedotdot.pandoc --css=$templates/style.css --include-after-body=$templates/after.pandoc -o" +options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=./icon.pandoc --include-before-body=$templates/before.pandoc --css=$templates/style.css --include-after-body=$templates/after.pandoc -o" +pwd for f in *.md do len=${#f} @@ -58,16 +59,16 @@ do katex=" --katex=./" fi done <$f - pandoc $katex $options $base.html $base.md - echo "$base.html from $f" + echo "generating $base.html from $f" + pandoc $katex $options $base.html $base.md #else # echo " $base.html up to date" fi done cd .. 
cd names -templates="../pandoc_templates" -options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=$templates/icondotdot.pandoc --include-before-body=$templates/beforedotdot.pandoc --css=$templates/style.css --include-after-body=$templates/after.pandoc -o" +options=$osoptions"--toc -N --toc-depth=5 --wrap=preserve --metadata=lang:en --include-in-header=./icon.pandoc --include-before-body=$templates/before.pandoc --css=$templates/style.css --include-after-body=$templates/after.pandoc -o" +pwd for f in *.md do len=${#f} @@ -83,24 +84,25 @@ do katex=" --katex=./" fi done <$f + echo "generating $base.html from $f" pandoc $katex $options $base.html $base.md - echo "$base.html from $f" #else # echo " $base.html up to date" fi done cd .. cd rootDocs -templates="../pandoc_templates" +pwd +katex="" for f in *.md do len=${#f} base=${f:0:($len-3)} if [ $f -nt ../../$base.html ]; then - pandoc $osoptions --wrap=preserve --from markdown --to html --metadata=lang:en --css=$templates/style.css --self-contained -o ../../$base.html $base.md + echo "generating $base.html from $f" + pandoc $katex $options ../../$base.html $base.md #--include-in-header=style.css - echo "../..$base.html from $f" #else # echo " $base.html up to date" fi diff --git a/docs/names/replacing_TCP.md b/docs/names/TCP.md similarity index 99% rename from docs/names/replacing_TCP.md rename to docs/names/TCP.md index 4cee798..8fc471f 100644 --- a/docs/names/replacing_TCP.md +++ b/docs/names/TCP.md @@ -91,7 +91,7 @@ ECN tagged packets are dropped Raw sockets provide greater control than UDP sockets, and allow you to do ICMP like things through ICMP. -I also have a discussion on NAT hole punching, [peering through nat](peering_through_nat.html), that +I also have a discussion on NAT hole punching, [peering through nat](nat.html), that summarizes various people's experience. 
To get an initial estimate of the path MTU, connect a datagram socket to diff --git a/docs/names/icon.pandoc b/docs/names/icon.pandoc new file mode 100644 index 0000000..f43d0b9 --- /dev/null +++ b/docs/names/icon.pandoc @@ -0,0 +1 @@ + diff --git a/docs/names/name_system.md b/docs/names/names.md similarity index 73% rename from docs/names/name_system.md rename to docs/names/names.md index 27a6960..59f1872 100644 --- a/docs/names/name_system.md +++ b/docs/names/names.md @@ -11,10 +11,10 @@ to its name server, which will henceforth direct people to that wallet. If the wallet has a network accessible tcp and/or UDP address it directs people to that address (one port only, protocol negotiation will occur once the connection is established, rather than protocols being defined by the port -number). If not, will direct them to a UDT4 rendevous server, probably itself. +number). If not, will direct them to a UDT4 rendezvous server, probably itself. We probably need to support [uTP for the background download of bulk data]. -This also supports rendevous routing, though perhaps in a different and +This also supports rendezvous routing, though perhaps in a different and incompatible way, excessively married to the bittorrent protocol.We might find it easier to construct our own throttling mechanism in QUIC, accumulating the round trip time and square of the round trip time excluding @@ -205,6 +205,129 @@ hosting your respectable username that you do not use much. We also need a state religion that makes pretty lies low status, but that is another post. +# True Names and TCP + +Vernor Vinge [made the point](http://www.amazon.com/True-Names-Opening-Cyberspace-Frontier/dp/0312862075) that true names are an instrument of +government oppression. If the government can associate your true name +with your actions, it can punish you for those actions. If it can find the true +names associated with a transaction, it is a lot easier to tax that transaction. 
+ +Recently there have been moves to make your cell phone into a wallet. A +big problem with this is that cell phone cryptography is broken. Another +problem is that cell phones are not necessarily associated with true names, +and as soon as the government hears that they might control money, it +starts insisting that cell phones *are* associated with true names. The phone +companies don’t like this, for if money is transferred from true name to +true name, rather than cell phone to cell phone, it will make them a servant +of the banking cartel, and the bankers will suck up all the gravy, but once +people start stealing money through flaws in the encryption, they will be +depressingly grateful that the government can track account holders down +and punish them – except, of course, the government probably will not be +much good at doing so. + +TCP is all about creating connections. It creates connections between +network addresses, but network addresses correspond to the way networks +are organized, not the way people are organized, so on top of networks we +have domain names. + +TCP therefore establishes a connection *to* a domain name rather than a +mere network address – but there is no concept of the connection coming +*from* anywhere humanly meaningful. + +Urns are “uniform resource names”, and uris are “uniform resource identifiers” and urls are “uniform resource locators”, and that is what the +web is built out of. + +There are several big problems with urls: + +1. They are uniform: Everyone is supposed to agree on one domain + name for one entity, but of course they don’t. There is honest and + reasonable disagreement as to which jim is the “real” jim, because + in truth there is no one real jim, and there is fraud, as in lots of + people pretending to be Paypal or the Bank of America, in order to + steal your money. + +2. They are resources: Each refers to only a single interaction, but of + course relationships are built out of many interactions. 
There is no + concept of a connection continuing throughout many pages, no + concept of logon. In building urls on top of TCP, we lost the + concept of a connection. And because urls are built out of TCP + there is no concept of the content depending on both ends of the + connection – that a page at the Bank might be different for Bob than + it is for Carol – that it does in reality depend on who is connected is + a kluge that breaks the architecture. + + Because security (ssl, https) is constructed below the level of a + connection, because it lacks a concept of connection extending + beyond a single page or a single url, a multitude of insecurities + result. We want https and ssl to secure a connection, but https and + ssl do not know there are such things as logons and connections. + +That domain names and hence urls presuppose agreement, agreement +which can never exist, we get cybersquatting and phishing and +suchlike. + +That connections and logons exist, but are not explicitly addressed by the +protocol leads to such attacks as cross site scripting and session fixation. + +A proposed fix for this problem is yurls, which apply Zooko’s triangle to +the web: One adds to the domain name a hash of a rule for validating the +public key, making it into Zooko’s globally unique identifier. The +nickname (non unique global identifier) is the web page title, and the +petname (locally unique identifier) is the title under which it appears in +your bookmark list, or the link text under which it appears in a web page. + +This, however, breaks normal form. The public key is an attribute of the +domain, while the nickname and petnames are attributes of particular web +pages – a breach of normal form related to the loss of the concept of +connection – a breach of normal form reflecting the fact that urls +provide no concept of a logon, a connection, or a user. + +OK, so much for “uniform”. 
Instead of uniform identifiers, we should +have zooko identifiers, and zooko identifiers organized in normal form. +But what about “resource”, for “resource” also breaks normal form. + +Instead of “resources”, we should have “capabilities”. A resource +corresponds to a special case of a capability, a resource is a capability +that resembles a read only file handle. But what exactly are “capabilities”? + +People with different concepts about what is best for computer security +tend to disagree passionately and at considerable length about what the +word “capability” means, and will undoubtedly tell me I am a complete +moron for using it in the manner that I intend to use it, but barging ahead anyway: + +A “capability” is an object that represents one end of a communication +channel, or information that enables an entity to obtain such a channel, or +the user interface representation of such a channel, or such a potential +channel. The channel enables the possessor of the capability to do stuff to +something, or get something. Capabilities are usually obtained by being +passed along the communication channel. Capabilities are usually +obtained from capabilities, or inherited by a running instance of a program + when the program is created, or read from storage after originally being + obtained by means of another capability. + +This definition leaves out the issue of security – to provide security, +capabilities need to be unforgeable or difficult to guess. Capabilities are +usually defined with the security characteristics central to them, but I am +defining capabilities so that what is central is connections and managing +lots of potential connections. Sometimes security and limiting access is a +very important part of management, and sometimes it is not. + +A file handle could be an example of a capability – it is a communication +channel between a process and the file management system. 
Suppose we +are focussing on security and access management to files: A file handle +could be used to control and manage permissions if a program that has the +privilege to access certain files could pass an unforgeable file handle to +one of those files to a program that lacks such access, and this is the only +way the less privileged program could get at those files. + +Often the server wants to make sure that the client at one end of a +connection is the user it thinks it is, which fits exactly into the usual +definitions of capabilities. But more often, the server does not care who +the client is, but the client wants to make sure that the server at the other +end of the connection is the server he thinks it is, which, since it is the +client that initiates the connection, does not fit well into many existing +definitions of security by capabilities. + # Mapping between globally unique human readable names and public keys The blockchain provides a Merkle-patricia dac of human readable names. Each diff --git a/docs/names/peering_through_nat.md b/docs/names/nat.md similarity index 98% rename from docs/names/peering_through_nat.md rename to docs/names/nat.md index 883dd11..b4d110a 100644 --- a/docs/names/peering_through_nat.md +++ b/docs/names/nat.md @@ -4,11 +4,11 @@ title: Peering through NAT ... A library to peer through NAT is a library to replace TCP, the domain name system, SSL, and email. This is covered at greater length in -[Replacing TCP](replacing_TCP.html) +[Replacing TCP](TCP.html) # Implementation issues -There is a great [pile of RFCs](./replacing_TCP.html) on issues that arise with using udp and icmp +There is a great [pile of RFCs](TCP.html) on issues that arise with using udp and icmp to communicate. 
## timeout diff --git a/docs/names/client_server.md b/docs/names/server.md similarity index 98% rename from docs/names/client_server.md rename to docs/names/server.md index 324d1e2..9356db0 100644 --- a/docs/names/client_server.md +++ b/docs/names/server.md @@ -1,24 +1,24 @@ --- -title: Client Server Data Representation +title: Server Data Representation ... # related -[Replacing TCP, SSL, DNS, CAs, and TLS](replacing_TCP.html){target="_blank"} +[Replacing TCP, SSL, DNS, CAs, and TLS](TCP.html){target="_blank"} -# clients and hosts, masters and slaves +# clients and hosts, masters and servers -A slave does the same things for a master as a host does for a client. +A server does the same things for a master as a host does for a client. -The difference is how identity is seen by third parties. The slaves identity -is granted by the master, and if the master switches slaves, third parties +The difference is how identity is seen by third parties. The server's identity +is granted by the master, and if the master switches servers, third parties scarcely notice. It the same identity. The client's identity is granted by the host, and if the client switches hosts, the client gets a new identity, as for example a new email address. If we use [Pake and Opaque](libraries.html#opaque-password-protocol) for client login, then all other functionality of the server is unchanged, regardless of whether the server is a host or a -slave. It is just that in the client case, changing servers is going to change +server. It is just that in the client case, changing servers is going to change your public key. Experience with bitcoin is that a division of responsibilities, as between Wasabi wallet and Bitcoin core, is the way to go - that the peer to peer networking functions belong in another process, possibly running on @@ -31,16 +31,18 @@ desires, and contradictory functions. Ideally one would be in a basement and generally turned off, the other in the cloud and always on. 
Plus, I have come to the conclusion that C and C++ just suck for -networking apps. Probably a good idea to go Rust for the slave or host. +networking apps. Probably a good idea to go Rust for the server or host. The wallet is event oriented, but only has a small number of concurrent -tasks. A host or slave is event oriented, but has a potentially very large +tasks. A host or server is event oriented, but has a potentially very large number of concurrent tasks. Rust has no good gui system, there is no wxWidgets framework for Rust. C++ has no good massive concurrency system, there is no Tokio for C++. -Where do we put the gui for controlling the slave? In the master, of +Where do we put the gui for controlling the server? In the master, of course. +Where do we put the networking stuff? In the server. + # the select problem To despatch an `io` event, the standard is `select()`. Which standard sucks @@ -102,7 +104,7 @@ Linux people recommended a small number of threads, reflecting real hardware thr I pray that that wxWidgets takes care of mapping windows asynchronous sockets to their near equivalent functionality on Linux. -But writing a server/host/slave for Linux is fundamentally different to +But writing a server/host for Linux is fundamentally different to writing one for windows. Maybe we can isolate the differences by having pure windows sockets, startup and shutdown code, pure Linux sockets, startup and shutdown code, having the sockets code stuff data to and from diff --git a/docs/names/true_names_and_TCP.md b/docs/names/true_names_and_TCP.md deleted file mode 100644 index c6d51c9..0000000 --- a/docs/names/true_names_and_TCP.md +++ /dev/null @@ -1,117 +0,0 @@ ---- -lang: en -title: True Names and TCP ---- -Vernor Vinge [made the point](http://www.amazon.com/True-Names-Opening-Cyberspace-Frontier/dp/0312862075) that true names are an instrument of -government oppression. 
If the government can associate -your true name with your actions, it can punish you for -those actions. If it can find the true names associated -with a transaction, it is a lot easier to tax that -transaction. - -Recently there have been moves to make your cell phone -into a wallet. A big problem with this is that cell -phone cryptography is broken. Another problem is that -cell phones are not necessarily -associated with true names, and as soon as the government hears -that they might control money, it starts insisting that cell phones -*are* associated with true names. The phone companies don’t like -this, for if money is transferred from true name to true name, rather -than cell phone to cell phone, it will make them a servant of the -banking cartel, and the bankers will suck up all the gravy, but once -people start stealing money through flaws in the encryption, they -will be depressingly grateful that the government can track account -holders down and punish them – except, of course, the government -probably will not be much good at doing so. - -TCP is all about creating connections. It creates connections between network addresses, but network addresses correspond to -the way networks are organized, not the way people are organized, -so on top of networks we have domain names. - -TCP therefore establishes a connection *to* a domain name rather -than a mere network address – but there is no concept of the -connection coming *from* anywhere humanly meaningful. - -Urns are “uniform resource names”, and uris are “uniform resource identifiers” and urls are “uniform resource locators”, and that is what the web is built out of. - -There are several big problems with urls: - -1. They are uniform: Everyone is supposed to agree on one domain name for one entity, but of course they don’t. 
There is honest and reasonable disagreement as to which jim is the “real” jim, becaŭse in truth there is no one real jim, and there is fraŭd, as in lots of people pretending to be Paypal or the Bank of America, in order to steal your money. - -2. They are resources: Each refers to only a single interaction, - but of course relationships are built out of many - interactions. There is no concept of a connection continuing - throughout many pages, no concept of logon. In building - urls on top of TCP, we lost the concept of a connection. And - because urls are built out of TCP there is no concept of the - content depending on both ends of the connection – that a - page at the Bank might be different for Bob than it is for - Carol – that it does in reality depend on who is connected is - a kluge that breaks the architecture. - - Because security (ssl, https) is constructed below the level of - a connection, because it lacks a concept of connection - extending beyond a single page or a single url, a multitude of - insecurities result. We want https and ssl to secure a - connection, but https and ssl do not know there are such - things as logons and connections. - -That domain names and hence urls presuppose agreement, agreement -which can never exist, we get cybersquatting and phishing and -suchlike. - -That connections and logons exist, but are not explicitly addressed -by the protocol leads to such attacks as cross site scripting and -session fixation. - -A proposed fix for this problem is yurls, which apply Zooko’s -triangle to the web: One adds to the domain name a hash of a rule -for validating the public key, making it into Zooko’s globally unique -identifier. The nickname (non unique global identifier) is the web -page title, and the petname (locally unique identifier) is the title -under which it appears in your bookmark list, or the link text under -which it appears in a web page. - -This, however, breaks normal form. 
The public key is an attribute of the domain, while the nickname and petnames are attributes of particular web pages – a breach of normal form related to the loss of the concept of connection – a breach of normal form reflecting the fact that that urls provide no concept of a logon, a connection, or a user. - -OK, so much for “uniform”. Instead of uniform identifiers, we -should have zooko identifiers, and zooko identifiers organized in -normal form. But what about “resource”, for “resource” also breaks -normal form. - -Instead of “resources”, we should have “capabilities”. A resource -corresponds to a special case of a capability, a resource is a -capability that that resembles a read only file handle. But what -exactly are “capabilities”? - -People with different concepts about what is best for computer security tend to disagree passionately and at considerable length about what the word “capability” means, and will undoubtedly tell me I am a complete moron for using it in the manner that I intend to use it, but barging ahead anyway: - -A “capability” is an object that represents one end of a -communication channel, or information that enables an entity to -obtain such a channel, or the user interface representation of such a -channel, or such a potential channel. The channel enables the -possessor of the capability to do stuff to something, or get -something. Capabilities are usually obtained by being passed along -the communication channel. Capabilities are usually obtained from -capabilities, or inherited by a running instance of a program when -the program is created, or read from storage after originally being -obtained by means of another capability. - -This definition leaves out the issue of security – to provide security, capabilities need to be unforgeable or difficult to guess. 
Capabilities are usually defined with the security characteristics central to them, but I am defining capabilities so that what is central is connections and managing lots of potential connection. Sometimes security and limiting access is a very important part of management, and sometimes it is not. - -A file handle could be an example of a capability – it is a -communication channel between a process and the file -management system. Suppose we are focussing on security and -access management to files: A file handle could be used to control -and manage permissions if a program that has the privilege to -access certain files could pass an unforgeable file handle to one of -those files to a program that lacks such access, and this is the only -way the less privileged program could get at those files. - -Often the server wants to make sure that the client at one end of a -connection is the user it thinks it is, which fits exactly into the usual -definitions of capabilities. But more often, the server does not care -who the client is, but the client wants to make sure that the server -at the other end of the connection is the server he thinks it is, -which, since it is the client that initiates the connection, does not fit -well into many existing definitions of security by capabilities. diff --git a/docs/pandoc_templates/beforedotdot.pandoc b/docs/pandoc_templates/beforedotdot.pandoc deleted file mode 100644 index 3509826..0000000 --- a/docs/pandoc_templates/beforedotdot.pandoc +++ /dev/null @@ -1 +0,0 @@ -
diff --git a/docs/pandoc_templates/icondotdot.pandoc b/docs/pandoc_templates/icondotdot.pandoc deleted file mode 100644 index 85695be..0000000 --- a/docs/pandoc_templates/icondotdot.pandoc +++ /dev/null @@ -1 +0,0 @@ - diff --git a/docs/parsers.md b/docs/parsers.md index d80e6d5..9f2098d 100644 --- a/docs/parsers.md +++ b/docs/parsers.md @@ -74,7 +74,7 @@ polish order, thus implicitly executing a stack of run time typed operands, which eventually get compiled and eventually executed as just-in-time typed or statically typed operands and operators. -For [identity](identity.html), we need Cryptographic Resource Identifiers, +For [identity](names/identity.html), we need Cryptographic Resource Identifiers, which cannot conform the “Universal” Resource Identifier syntax and semantics. Lexers are not powerful enough, and the fact that they are still used diff --git a/docs/recognizing_categories_and_instances.md b/docs/recognizing_categories_and_instances.md index 371b979..92ef18b 100644 --- a/docs/recognizing_categories_and_instances.md +++ b/docs/recognizing_categories_and_instances.md @@ -38,16 +38,47 @@ form $1$, with probability $\frac{1}{6}$, $0$ with probability $\frac{4}{6}$, $-1$ with probability $\frac{1}{6}$, though a sparse matrix is apt to distort a sparse vector -There exists a set of points of size $m$ that needs dimension -$$\displaystyle{O(\frac{\log(m)}{ε^2})}$$ +There exists a set of $m$ points that needs dimension +$$\displaystyle{\LARGE\bigcirc\normalsize\frac{\ln(m)}{ε^2}}$$ in order to preserve the distances between all pairs of points within a factor of $1±ε$ -The time to find the nearest neighbour is logarithmic in the number of points, -but exponential in the dimension of the space. So we do one pass with rather -large epsilon, and another pass, using an algorithm proportional to the small -number of candidate neighbours times the dimensionality with a small number -of candidate neighbours found in the first pass. +This is apt to be a lot. 
We might well have ten million points, and wish to
preserve distances within twenty five percent, in which case we need two
hundred and fifty six dimensions. So a dimensionally reduced point is not
necessarily reduced by a whole lot.

For spaces of dimension higher than fifteen or so, clever methods of
nearest neighbour search generally fail, and people generally wind up with
brute force search, comparing each point to each of the others, and then
they aggregate into groups by making near neighbours into groups, and
near groups into supergroups.

Wikipedia reports two open source methods of Locality Sensitive Hashing,
one of them used for exactly this problem: finding groups in emails.

The problem of finding near neighbours in social space, mapping the
[Kademlia]{#Kademlia} algorithm to social space, is similar but a little
different, since every vertex is already more likely to have connections to
its neighbours, and for an arbitrary vertex, whose connections we do not
know, we want to find a vertex among those we do know that is more
likely to have a connection to it, or to someone that has a connection to
it, one that is likely to be nearer in terms of number of vertex traversals.

In which case everyone reports a value that reflects his neighbours, and
their neighbours, and their neighbours’ neighbours, with a neighbourhood
smell that grows more similar as we approach a vertex likely to be the
nearest neighbour of our target. The problem is to construct this smell as
a moderately sized blob of data that can be widely shared, so that each
vertex has a unique smell of, say, 256 or 512 bits, that reflects who it
stably has the information to quickly connect with, so you look at who you
have a connection with to find who is likely to have a connection to your
target, and he looks up those he is connected with to find someone more
likely to have a connection.
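The dimension bound above, roughly $\ln(m)/ε^2$ dimensions to preserve pairwise distances within $1±ε$, can be checked empirically with a random projection. This sketch uses dense random $±1$ entries rather than the sparse $\frac{1}{6}, \frac{4}{6}, \frac{1}{6}$ scheme described earlier; the function names are just for illustration:

```python
import math
import random

def random_projection(points, k, seed=0):
    """Project points down to k dimensions using a matrix of random
    +1/-1 entries scaled by 1/sqrt(k), so expected squared lengths
    are preserved."""
    rng = random.Random(seed)
    d = len(points[0])
    rows = [[rng.choice((-1.0, 1.0)) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]
    return [[sum(r[i] * p[i] for i in range(d)) for r in rows]
            for p in points]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With a handful of random points in a thousand dimensions projected down to three hundred, pairwise distances typically survive within a few percent, consistent with the estimate.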
+ +Locality sensitive hashing, LSH, including the open source email +algorithm, Nilsimsa Hash, attempts to distribute all points that are near +each other into the same bin. So in a space of unenumerably large dimension, such as the set of substrings of an email, or perhaps substrings of bounded length with bounds at spaces, @@ -59,7 +90,7 @@ The optimal instance recognition algorithm, for normally distributed attributes, and for already existent, already known categories, is Mahalanobis distance -Is not the spam characteristic of an email just its $T.(S-G)$, where $T$ is +Is not the spam probability of an email just its $T.(S-G)$, where $T$ is the vector of the email, and $S$ and $G$ are the average vectors of good email and spam email? @@ -70,11 +101,20 @@ unenumerably large dimension, where distributions are necessarily non normal. But variance is, approximately, the log of probability, so Mahalanobis is -more or less Bayes filtering. +more or less Bayes filtering, or at least one can be derived in terms of the other. -So we can reasonably reduce each email into twenty questions space, or, just -to be on the safe side, forty questions space. (Will have to test how many -dimensions empirically retain angles and distances) +So we can reasonably reduce each email into twenty questions space, albeit in practice, a great deal more than twenty. Finding far from random +dimensions that reduce it to a mere twenty or so is an artificial intelligence +hard problem. If random dimensions, need $\bigcirc20\log{(n)}$ dimensions +where $n$ is the number of things. And $n$ is apt to be very large. + +Finding interesting and relevant dimensions, and ignoring irrelevant and +uninteresting dimensions, is the big problem. It is the tie between +categorizing the world into natural kinds and seeing what matters in the +perceptual data while ignoring what is trivial and irrelevant. 
This requires
non trivial, non local, and non linear combinations of data, for example
adjusting the perceived colour of the apple for shadow and light colour, to
see the apple, rather than merely the light scattered by the apple into the eye.

We then, in the reduced space, find natural groupings, a natural grouping being
an elliptic region in high dimensional space where the density is
diff --git a/docs/rootDocs/icon.pandoc b/docs/rootDocs/icon.pandoc
new file mode 100644
index 0000000..cb3cce0
--- /dev/null
+++ b/docs/rootDocs/icon.pandoc
@@ -0,0 +1 @@
+
diff --git a/docs/set_up_build_environments.md b/docs/set_up_build_environments.md
index 7a1d0d3..13877bb 100644
--- a/docs/set_up_build_environments.md
+++ b/docs/set_up_build_environments.md
@@ -172,6 +172,8 @@ cp -rv ~/.ssh /etc/skel
 # Actual server
+## disable password entry
+
 Setting up an actual server is similar to setting up the virtual machine
 modelling it, except you have to worry about the server getting overloaded
 and locking up.
@@ -204,9 +206,7 @@ but have enabled passwordless sudo for one special user, you can still get
 You can always undo the deliberate corruption by setting a new password,
 providing you can somehow get into root.
-```bash
-passwd -D cherry
-```
+## never enough memory
 If a server is configured with an [ample swap file] an overloaded server
 will lock up and have to be ungracefully powered down, which can corrupt the data
@@ -1428,7 +1428,7 @@ chown -R www-data:www-data /var/www/blog.reaction.la
 Replace the defines for `DB_NAME`, `DB_USER`, and `DB_PASSWORD` in
 `wp_config.php`, as described in [Wordpress on Lemp]
-#### To import datbase by command line
+#### To import database by command line
 ```bash
 systemctl stop nginx
@@ -1930,6 +1930,8 @@ postqueue -p
 You probably will not see any TLS activity.
 You want to configure Postfix to always attempt SSL, but not require it.
+Modify `/etc/postfix/main.cf` using the postconf command:
+
 ```bash
 # TLS parameters
 #
diff --git a/docs/set_upstream.sh b/docs/set_upstream.sh
index 50d1750..0d49cc9 100644
--- a/docs/set_upstream.sh
+++ b/docs/set_upstream.sh
@@ -3,50 +3,50 @@ set -e
 set -x
 echo intended to be run in the event of moving repositories
 git remote -v
-git remote set-url origin git@cpal.pw:~/wallet.git
+git remote set-url --push upstream git@rho.la:~/wallet.git
 git submodule foreach --recursive 'git remote -v'
 cd libsodium
-git remote set-url origin git@cpal.pw:~/libsodium.git
+git remote set-url --push upstream git@rho.la:~/libsodium.git
 git remote set-url upstream https://github.com/jedisct1/libsodium.git
 cd ..
 cd mpir
-git remote set-url origin git@cpal.pw:~/mpir.git
+git remote set-url --push upstream git@rho.la:~/mpir.git
 git remote set-url upstream https://github.com/BrianGladman/mpir.git
 cd ..
 cd wxWidgets
-git remote set-url origin git@cpal.pw:~/wxWidgets.git
+git remote set-url --push upstream git@rho.la:~/wxWidgets.git
 git remote set-url upstream https://github.com/wxWidgets/wxWidgets.git
 cd ..
 cd wxWidgets/3rdparty/catch
-git remote set-url origin git@cpal.pw:~/Catch.git
+git remote set-url --push upstream git@rho.la:~/Catch.git
 git remote set-url upstream https://github.com/wxWidgets/Catch.git
 cd ../../..
 cd wxWidgets/3rdparty/nanosvg
-git remote set-url origin git@cpal.pw:~/nanosvg
+git remote set-url --push upstream git@rho.la:~/nanosvg
 git remote set-url upstream https://github.com/wxWidgets/nanosvg
 cd ../../..
 cd wxWidgets/3rdparty/pcre
-git remote set-url origin git@cpal.pw:~/pcre
+git remote set-url --push upstream git@rho.la:~/pcre
 git remote set-url upstream https://github.com/wxWidgets/pcre
 cd ../../..
 cd wxWidgets/src/expat
-git remote set-url origin git@cpal.pw:~/libexpat.git
+git remote set-url --push upstream git@rho.la:~/libexpat.git
 git remote set-url upstream https://github.com/wxWidgets/libexpat.git
 cd ../../..
cd wxWidgets/src/jpeg -git remote set-url origin git@cpal.pw:~/libjpeg-turbo.git +git remote set-url --push upstream git@rho.la:~/libjpeg-turbo.git git remote set-url upstream https://github.com/wxWidgets/libjpeg-turbo.git cd ../../.. cd wxWidgets/src/png -git remote set-url origin git@cpal.pw:~/libpng.git +git remote set-url --push upstream git@rho.la:~/libpng.git git remote set-url upstream https://github.com/wxWidgets/libpng.git cd ../../.. cd wxWidgets/src/tiff -git remote set-url origin git@cpal.pw:~/libtiff.git +git remote set-url --push upstream git@rho.la:~/libtiff.git git remote set-url upstream https://github.com/wxWidgets/libtiff.git cd ../../.. cd wxWidgets/src/zlib -git remote set-url origin git@cpal.pw:~/zlib.git +git remote set-url --push upstream git@rho.la:~/zlib.git git remote set-url upstream https://github.com/wxWidgets/zlib.git cd ../../.. winConfigure.sh diff --git a/docs/social_networking.md b/docs/social_networking.md index 4fc98d8..084efcd 100644 --- a/docs/social_networking.md +++ b/docs/social_networking.md @@ -1,6 +1,7 @@ --- -title: - Social networking +# katex +title: >- + Social networking ... # the crisis of censorship @@ -263,48 +264,113 @@ of a million shills, scammers, and spammers. So, you can navigate to whole world’s public conversation through approved links and reply-to links – but not every spammer, scammer, and shill in the world can fill your feed with garbage. +## Algorithm and data structure for Zooko name network address - approved links and reply-to links – but not every spammer, scammer, and - shill in the world can fill your feed with garbage. +For this to work, the underlying structure needs to be something based on +the same principles as Git and git repositories, except that Git relies on +SSL and the Certificate Authority system to locate a repository, which +dangerous centralization would fail under the inevitable attack. 
It needs to
find the network addresses of remote repositories on the basis of the public
key of a Zooko identity of a person who pushed a tag or a branch to that
repository, a branch being a thread, and the branch head in this case being
the most recent response to a thread by a person you are following.

We want to support a zooko identity for a machine whose owner wants
anyone in the world to be able to connect to it, perhaps because he wants an
audience for his ideas, or for what he is selling. And we also want to
support machines for which the connection information is a shared
secret, distributed on a need to know basis.

And we do not want a central authority with the capability to decide
what that address is.

For the normal case, a zooko identity with public connect information, the
controller of the machine makes public a small number of other zooko
identities which have publicly accessible connect information, and very
long lived network addresses, so that lots of entities are going to have their
current network address cached, which identities he promises to regularly
inform of his current network address. And those entities make public
what they know of that network address, how recently they were informed
of it, the likelihood of that network address suddenly changing, and the
apparent uptime of the entity whose network address they are reporting on.

The final authority for each such item of information is the Zooko
signature of the entity that wishes to be found, and the Zooko signatures of
the other entities that he keeps informed. Thus, the authority over the
information is fully distributed.

But in order to collect, distribute, and efficiently obtain this potentially
very large pile of data, there has to be a fair bit of centralization in
practice. Git source code distribution provides a model of how to handle
this without dangerous or harmful centralization, or a central point of
failure.
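The rule above, that the final authority for an address is the signature of the identity that wishes to be found, can be sketched as a signed address record. This is a toy: a real implementation would put a public key signature (such as ed25519) over the record, whereas this sketch, to stay dependency free, stands in for the signature with an HMAC under a shared key; all names and fields are hypothetical:

```python
import hashlib
import hmac
import json
import time

def sign_address_record(secret_key, address, port):
    """Build a record announcing a network address, authenticated by
    the identity that wishes to be found. Other entities can cache
    and republish the record, but cannot forge or alter it."""
    record = {"address": address, "port": port, "time": int(time.time())}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return record

def verify_address_record(secret_key, record):
    body = {k: v for k, v in record.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["sig"], expected)
```

The timestamp lets a cache report how recently it was informed of the address, and any tampering by a republisher invalidates the signature.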
For any one library on Git, there are an enormous number of branches. But
in practice, everyone winds up following one branch of that library, and if
you make changes in it, and want your changes included, you have to get
the man (and it is always a man) who runs that branch to pull your branch
into his. And that is the centralization that is needed for efficient
distribution of data. But if he is not doing a very good job, some people,
and eventually everyone, wind up following a repository with a
different name, reflecting the fact that a different man is running it. And
that is the decentralization that is needed to prevent misconduct or a
single point of failure.

So, if someone is providing the service of making other people's network
addresses publicly available, he has to get that one man to pull his data.
Or get another man, whose data is pulled by that one man, to pull his data.

The reason git repositories scale is that the one man who in fact controls
the one repository that matters the most trusts several other men, each of
whom trust several others, and so information percolates through many
trusted people, eventually into the central repository, where everyone sees it.

Perhaps that one man might fail to include some zooko identity data for
wicked reasons, that he does not want people to hear what those people are
saying, that he wants to get between the man who wishes to speak, and the
man who wishes to hear. Then some people will start pulling an additional
branch that does include those people, and eventually nearly everyone
winds up pulling that branch, and after a while not pulling the old branch.
On the other hand, that one man might fail to include some zooko identity
data because he thinks it is a pile of sybils, shills, scammers, and
entryists, and the pile is too large, wasting too much disk space and
bandwidth, and if he is right, most people will not want that useless
misinformation cluttering up their disks. Or some people might disagree,
and pull a branch that does include those questionable identities.

Once our system starts to attract substantial use and attention, a vast pile
of sybil Zooko identities will appear, whose network addresses seem to be
located all over the world, but a very large proportion of them will in fact
be located on one computer in the bowels of the State Department or
Harvard, and to avoid collecting and distributing a hundred gigabytes of
this worthless and meaningless information will require human judgement
and human action, which judgement and action will have to be done by a
rather small number of humans, who will thus have rather too much
power, and their failures rather too great consequences. But, following the
way that git is designed and in practice used, we do not have to give them
unlimited power, nor allow them to be a central point of failure.

### running in schism, with many approximately equal branches

Under attack, the system may well schism, with no one source that lists all
or most Zooko identities that people are interested in contacting, but it
should, like git, be designed to schism, and work well enough while
schismed. That is what makes Git centralization truly decentralized.
Sometimes, often, there is no one authoritative branch, and things still work.
The messages of the people you are following are likely to be in a
relatively small number of repositories, even if the total number of
repositories out there is enormous and the number of hashes in each
repository is enormous, so this algorithm and data structure will scale.
The responses to that thread that they have approved, by people you are
not following, will also be commits in that repository: by pushing their
latest response to the thread to a public repository, those people have
done the equivalent of a git commit and push to that repository.

Each repository contains all the material the poster has approved, resulting
in considerable duplication, but not enormous duplication.
-
-## Algorithm and data structure.
-
-For this to work, the underlying structure needs to be something based on
-the same principles as Git and git repositories, except that Git relies on
-SSL and the Certificate Authority system to locate a repository, which
-dangerous centralization would fail under the inevitable attack. It needs to
-have instead for its repository name system a Kademlia distributed hash
-table within which local repositories find the network addresses of remote
-repositories on the basis of the public key of a Zooko identity of a person
-who pushed a tag or a branch to that repository, a branch being a thread,
-and the branch head in this case being the most recent response to a thread
-by a person you are following.
-
-So the hashes of identities are tracked by the distributed hash table, but the
-hashes of posts are not, because that would result in excessive numbers of
-lookups in a table that would very quickly hit its scaling limits. The hashes
-of posts are tracked by the repository of the feed that you are looking at, so
-require only local lookup, which is faster and less costly than a distributed
-lookup.
This is equivalent to a fully distributed hash table where the key is --not hash of post, but global name of area of interest, zooko nickname, --zooko public key followed by his human readable thread name (branch --name or tag name in git terminology) followed by hash of post, so that --items that are likely to be looked up together are likely to be located --physically close together on the same disk and will be sent along the same --network connection. -- --The messages of the people you are following are likely to be in a --relatively small number of repositories, even if the total number of --repositories out there is enormous and the number of hashes in each --repository is enormous, so this algorithm and data structure will scale, and --the responses to that thread that they have approved, by people you are not --following, will be commits in that repository, that, by pushing their latest --response to that thread to a public repository, they did the equivalent of a --git commit and push to that repository. -- --Each repository contains all the material the poster has approved, resulting --in considerable duplication, but not enormous duplication. -- The underlying protocol and mechanism is that when you are following Bob, you get a Bob feed from a machine controlled by Bob, or controlled by someone that Bob has chosen to act on his behalf, and that when Bob @@ -320,11 +386,203 @@ state of that tree, with the continually changing root of Bob’s Merkle-patrici tree signed by Bob using his secret key which is kept in a BIP39 style wallet. +When Dave replies to a text in Carol's feed, the Carol text and the reply by +default goes into his feed, and if it does there will be metadata in his feed +about his social network connection to Carol, which, if Bob is following +Dave's feed, can be used by Bob's client to navigate the distribute hash +table to Carol's feed. 
And if Carol approves Dave's reply, or is following Dave or has buddied
Dave, and Bob is following Carol, but not following Dave, then there will
be metadata in Carol's feed that can be used by Bob's client to navigate
the distributed hash table to Dave's feed.

The metadata in the feed sharing reveals what network addresses are
following a feed, but the keys are derived from user identity keys by a one
way hash, so are not easily linked to who is posting in the feed.

-This handles public posts.

### Replacing Kademlia

[social distance metric]:recognizing_categories_and_instances.html#Kademlia
{target="_blank"}

I will describe the Kademlia distributed hash table algorithm not in the
way that it is normally described and defined, but in such a way that we
can easily replace its metric by [social distance metric], assuming that we
can construct a suitable metric, which reflects what feeds a given host is
following, and what network addresses it knows and the feeds they are
following, a quantity over which a distance can be found that reflects how
close a peer is to an unstable network address, or to a peer that is likely
to know a peer that is likely to know that unstable network address.

A distributed hash table works by each peer on the network maintaining a
large number of live and active connections to computers such that the
distribution of connections to computers distant by the distributed hash
table metric is approximately uniform by distance, which distance is for
Kademlia the $\log_2$ of the exclusive-or between his hash and your hash.

And when you want to connect to an arbitrary computer, you ask the
computers that are nearest in the space to the target for their connections
that are closest to the target. And then you connect to those, and ask the
same question again.
In the course of this operation, you acquire more and more active
connections, which you purge from time to time to keep the total number
of connections reasonable, the distribution approximately uniform, the
connections preferentially to computers with long lived network addresses
and open ports, and connections that are distant from you distant from
each other.

The reason that the Kademlia distributed hash table cannot work in the
face of enemy action is that the shills who want to prevent something
from being found create a hundred entries with a hash close to their target
by Kademlia distance, and then when your search brings you close to
target, it brings you to a shill, who misdirects you. Using social network
distance resists this attack.

### Kademlia in social space

The vector of an identity is $+1$ for each one bit, and $-1$ for each zero bit.
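The mapping just described, from an identity's bits to a $±1$ vector, can be sketched as follows, truncating for illustration to 64 of the 256 bits (the function names are hypothetical):

```python
def identity_vector(identity, bits=64):
    """Map an identity (an integer hash) to a vector with +1 for each
    one bit and -1 for each zero bit, truncated to `bits` components."""
    return [1 if (identity >> i) & 1 else -1 for i in range(bits)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

A vector dotted with itself gives `bits`, while the dot product of two random identities is close to zero, which is the approximate orthogonality the scheme relies on.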
We don't use the entire two hundred fifty six dimensional vector, just
enough of it that the truncated vector of every identity that anyone might
be tracking has a very high probability of being approximately orthogonal
to the truncated vector of every other identity.

We do not have, and do not need, an exact consensus on how much of the
vector to actually use, but everyone needs to use roughly the same amount
as everyone else. The amount is adjusted according to what is, over time,
needed, by each identity adjusting according to circumstances, with the
result that over time the consensus adjusts to what is needed.

Each party indicates what entities he can provide a direct link to by
publishing the sum of the vectors of the parties he can link to – and also
the sum of their sums, and also the sum of their ... to as many levels deep
as turns out to be needed in practice, which is likely to be two or three
such vector sums, maybe four or five. What is needed will depend on the
pattern of tracking that people engage in in practice.

If everyone behind a firewall or with an unstable network address arranges
to notify a well known peer with a stable network address whenever his
address changes, and that peer, as part of the arrangement, includes him in
the peer's sum vector, then, since the number of well known peers with
stable network addresses offering this service is not enormously large,
they track each other, and everyone tracks some of them, we only need the
sum and the sum of sums.

When someone is looking to find how to connect to an identity, he goes
through the entities he can connect to, and looks at the dot product of
their sum vectors with the target identity vector.

He contacts the closest entity, or a close entity, and if that does not work
out, contacts another. The closest entity will likely be able to contact
the target, or contact an entity more likely to be able to contact the target.
* the identity vector represents the public key of a peer
* the sum vector represents what identities a peer thinks he has valid connection information for.
* the sum of sum vectors indicates what identities that he thinks he can connect to think that they can connect to.
* the sum of the sum of the sum vectors ...

A vector that provides the paths to connect to a billion entities, each of
them redundantly through a thousand different paths, is still sixty or so
thirty two bit signed integers, distributed in a normal distribution with a
variance of a million or so, but everyone has to store quite a lot of such
vectors. Small devices such as phones can get away with tracking a small
number of such integers, at the cost of needing more lookups, hence not being
very useful for other people to track for connection information.

To prevent hostile parties from jamming the network by registering
identities that closely approximate identities that they do not want people
to be able to look up, we need the system to work in such a way that
identities that lots of people want to look up tend to be heavily over
represented in sum of sums vectors relative to those that no one wants to
look up. If you repeatedly provide lookup services for a certain entity,
you should track the entity that had the last stable network address on the
path that proved successful to the target entity, so that peers that
provide useful tracking information are over represented, and entities that
provide useless tracking information are under represented.

If an entity makes publicly available network address information for an
identity whose vector is an improbably good approximation to an existing
widely looked up vector, a sybil attack is under way, and needs to be
ignored.
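A toy illustration of the lookup described above: each peer publishes the sum of the vectors of the identities it can reach, and a searcher scores peers by the dot product of their sum vector with the target's vector. Approximately orthogonal vectors mostly cancel, so the peer whose sum actually contains the target's vector stands out. All names and the 128 bit truncation are assumptions of the sketch:

```python
import random

BITS = 128  # truncated identity vector length, an assumed figure

def id_vec(identity):
    # +1 for each one bit, -1 for each zero bit
    return [1 if (identity >> i) & 1 else -1 for i in range(BITS)]

def vec_sum(vectors):
    # componentwise sum of the identity vectors a peer can reach
    return [sum(c) for c in zip(*vectors)]

def dot_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_next_hop(peer_sums, target_vec):
    """peer_sums maps a peer's name to the sum of the vectors of the
    identities that peer claims it can reach; return the peer whose
    sum correlates most strongly with the target's vector."""
    return max(peer_sums,
               key=lambda name: dot_product(peer_sums[name], target_vec))
```

The peer holding the target contributes a dot product of about `BITS`, while unrelated identities contribute noise on the order of $\sqrt{BITS}$ each, so the right next hop scores far above the rest.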
To be efficient at very large scale, the network should contain a relatively
small number of large well connected devices, each of which tracks the
tracking information of a large number of other such computers, and a large
number of smaller, less well connected devices, that track their friends and
acquaintances, and also track well connected devices. Big fanout on the
interior vertices, smaller fanout on the exterior vertices, stable identities
on all devices, moderately stable network addresses on the interior vertices,
possibly unstable network addresses on the exterior vertices.

If we have a thousand identities that are making public the information
needed to make connection to them, and everyone tracks all the peers that
provide third party look up service, we need only the first sum, and only
about twenty dimensions.

But if everyone attempts to track all the connection information network
for all peers that provide third party lookup services, there are soon going
to be a whole lot of shill, entryist, and spammer peers purporting to provide
such services, whereupon we will need white lists, grey lists, and human
judgement, and not everyone will track all peers who are providing third
party lookup services, whereupon we need the first two sums.

In that case a random peer searching for connection information to another
random peer first looks through those for which he has good connection
information, and does not find the target. Then he looks for someone
connected to the target, and may not find him; then he looks for someone
connected to someone connected to the target and, assuming that most
genuine peers providing tracking information are tracking most other
peers providing genuine tracking information, and the peer doing the
search has the information for a fair number of peers providing genuine
tracking information, will find him.

Suppose there are a billion peers for which tracking information exists.
In +that case, we need the first seventy or so dimensions, and possibly one +more level of indirection in the lookup (the sum of the sum of the sum of +vectors being tracked). Suppose a trillion peers, then about the first eighty +dimensions, and possibly one more level of indirection in the lookup. + +That is a quite large amount of data, but if who is tracking whom is stable, +even if the network addresses are unstable, updates are infrequent and small. + +If everyone tracks ten thousand identities, and we have a billion identities +whose network address is being made public, and million always up peers +with fairly stable network addresses, each of whom tracks one thousand +unstable network addresses and several thousand other peers who also +track large numbers of unstable addresses, then we need about fifty +dimensions and two sum vectors for each entity being tracked, about a +million integers, total -- too big to be downloaded in full every time, but +not a problem if downloaded in small updates, or downloaded in full +infrequently. + +But suppose no one specializes in tracking unstable network addresses. +If your network address is unstable, you only provide updates to those +following your feed, and if you have a lot of followers, you have to get a +stable network address with a stable open port so that you do not have to +update them all the time. Then our list of identities whose connection +information we track will be considerably smaller, but our level of +indirection considerably deeper - possibly needing six or so deep in sum of +the sum of ... sum of identity vectors. 
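The storage estimate in this section can be checked by back-of-envelope arithmetic, using only the figures given above (ten thousand tracked identities, about fifty dimensions, two sum vectors per tracked entity):

```python
# Assumed figures, all taken from the text above.
tracked_identities = 10_000
dimensions = 50
sum_vectors_each = 2

integers = tracked_identities * dimensions * sum_vectors_each
assert integers == 1_000_000  # "about a million integers, total"

# At thirty two bits (four bytes) per signed integer, that is about
# four megabytes: too big to download in full every time, but fine
# as small incremental updates.
megabytes = integers * 4 / 1_000_000
```

Four megabytes per full snapshot is consistent with the claim that the data is manageable so long as who-tracks-whom is stable and only deltas are exchanged.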
## Private messaging diff --git a/docs/writing_and_editing_documentation.md b/docs/writing_and_editing_documentation.md index 2e817f0..cab51ac 100644 --- a/docs/writing_and_editing_documentation.md +++ b/docs/writing_and_editing_documentation.md @@ -307,13 +307,15 @@ In this table, edited in a fixed font, you are using whitespace and blank lines ### Grid tables -Allows multiline, and alignment, but visual studio does not like it, and you still have to count those spacees +Allows multiline, and alignment, but visual studio does not like it, and you still have to count those spaces +---------------+---------------+--------------------+ | Fruit | Price | Advantages | +===============+==============:+====================+ -| Bananas | $1.34 | - built-in wrapper | -| | | - bright color | +| Bananas | $1.34 | Mary had a little lamb whose fleece was white as snow, and everywhere that | +| | | Mary went the lamb was sure to go | +| | | | +| | | bright color | +---------------+---------------+--------------------+ | Oranges | $2.10 | - cures scurvy | | | | - tasty | @@ -479,7 +481,16 @@ defined by very small source code. font-weight="400" stroke-width="2" style="text-decoration:underline; cursor:pointer;" > -