Deleted DHT design from social networking preparatory to writing up the new design

Added nfs to setup documentation
2024-02-12 00:12:58 +00:00 · 495f667c6f
commit 495f667c6f
parent a856d438e7
3 changed files with 69 additions and 178 deletions
--- a/docs/manifesto/social_networking.md
+++ b/docs/manifesto/social_networking.md
@@ -455,185 +455,9 @@ way hash, so are not easily linked to who is posting in the feed.
 ### Replacing Kademlia
- [social distance metric]:recognizing_categories_and_instances.html#Kademlia
+This design has been deleted, because its scaling properties turned out to be unexpectedly bad.
 {target="_blank"}
- I will describe the Kademlia distributed hash table algorithm not in the
+I am now writing up a better design.
 way that it is normally described and defined, but in such a way that we
 can easily replace its metric by [social distance metric], assuming that we
 can construct a suitable metric, which reflects what feeds a given host is
 following, and what network addresses it knows and the feeds they are
 following, a quantity over which a distance can be found that reflects how
 close a peer is to an unstable network address, or knows a peer that is
 likely to know a peer that is likely to know an unstable network address.
 A distributed hash table works by each peer on the network maintaining a
 large number of live and active connections to computers such that the
 distribution of connections to computers distant by the distributed hash
 table metric is approximately uniform by distance, which distance is for
 Kademlia the $\log_2$ of the exclusive-or between his hash and your hash.
 And when you want to connect to an arbitrary computer, you ask the
 computers that are nearest in the space to the target for their connections
 that are closest to the target.  And then you connect to those, and ask the
 same question again.
 This works if each computer has approximately the same number of connections 
 close to it by a metric as distant from it by some metric.  So it will be
 connected to almost all of the computers that are nearby to it by that metric.
 In the course of this operation, you acquire more and more active
 connections, which you purge from time to time to keep the total number
 of connections reasonable and the distribution approximately uniform by the
 metric of distance used.
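The lookup described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation the deleted design had in mind: `node_id`, `closest_known`, `iterative_lookup`, and the in-memory `network` dictionary standing in for live connections are hypothetical names, and SHA-256 is assumed as the hash.

```python
import hashlib


def node_id(name: str) -> int:
    """Hypothetical helper: derive a 256-bit identifier from a name."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Raw Kademlia distance: bitwise exclusive-or of the two identifiers."""
    return a ^ b


def log2_distance(a: int, b: int) -> int:
    """Floor of log2 of the XOR distance; -1 when the identifiers are equal."""
    return (a ^ b).bit_length() - 1


def closest_known(known: list[int], target: int, k: int = 3) -> list[int]:
    """One lookup step: the k identifiers in `known` nearest to the target."""
    return sorted(known, key=lambda peer: xor_distance(peer, target))[:k]


def iterative_lookup(network: dict[int, list[int]], start: int,
                     target: int, k: int = 3) -> bool:
    """Ask the nearest known peers for their contacts nearest the target,
    and repeat until the target is known or no unasked near peer remains."""
    known = set(network[start])
    asked: set[int] = set()
    while target not in known:
        candidates = [p for p in closest_known(sorted(known), target, k)
                      if p not in asked]
        if not candidates:
            return False  # the lookup has stalled
        for peer in candidates:
            asked.add(peer)
            known.update(closest_known(network.get(peer, []), target, k))
    return True
```

Here `network` maps each identifier to the identifiers it holds connections to. A real peer would query remote hosts and purge stale connections, which this sketch leaves out.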
 The reason that the Kademlia distributed hash table cannot work in the
 face of enemy action is that the shills who want to prevent something
 from being found create a hundred entries with a hash close to their target
 by Kademlia distance, and then when your search brings you close to
 target, it brings you to a shill, who misdirects you.  Using social network
 distance resists this attack.
 The messages of the people you are following are likely to be in a
 relatively small number of repositories, even if the total number of
 repositories out there is enormous and the number of hashes in each
 repository is enormous, so this algorithm and data structure will scale.
 The responses to a thread that they have approved, by people you are not
 following, will be commits in that repository: by pushing their latest
 response to that thread to a public repository, the responders did the
 equivalent of a git commit and push to that repository.
 Each repository contains all the material the poster has approved, including
 approved links and reply-to links, resulting in considerable but not
 enormous duplication.  But not every spammer, scammer, and
 shill in the world can fill your feed with garbage.
 ### Kademlia in social space
 The vector of an identity is $+1$ for each one bit, and $-1$ for each zero bit.
 We don't use the entire two hundred fifty six dimensional vector, just
 enough of it that the truncated vector of every identity that anyone might
 be tracking has a very high probability of being approximately orthogonal
 to the truncated vector of every other identity.
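A minimal sketch of the identity vector, assuming the identity is the SHA-256 hash of a public key (the text only says the vector has two hundred fifty six dimensions). The names `identity_vector` and `dot` are illustrative, not taken from the design.

```python
import hashlib


def identity_vector(public_key: bytes, dims: int = 64) -> list[int]:
    """Map the first `dims` bits of SHA-256(public_key) to +1 (one bit)
    or -1 (zero bit).  SHA-256 is an assumption made for this sketch."""
    if not 1 <= dims <= 256:
        raise ValueError("dims must be between 1 and 256")
    bits = int.from_bytes(hashlib.sha256(public_key).digest(), "big")
    # Bit 0 of the vector is the most significant bit of the hash.
    return [1 if (bits >> (255 - i)) & 1 else -1 for i in range(dims)]


def dot(u: list[int], v: list[int]) -> int:
    """Dot product; near zero for unrelated identities, len(u) for equal ones."""
    return sum(a * b for a, b in zip(u, v))
```

Truncation takes a prefix, so peers that use slightly different lengths can still compare vectors over the shorter of the two.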
 We do not have, and do not need, an exact consensus on how much of the
 vector to actually use, but everyone needs to use roughly the same amount
 as everyone else.  The amount is adjusted according to what is, over time,
 needed, by each identity adjusting according to circumstances, with the
 result that over time the consensus adjusts to what is needed.
 Each party indicates what entities he can provide a direct link to by
 publishing the sum of the vectors of the parties he can link to - and also
 the sum of their sums, and also the sum of their ...  to as many levels deep as
 turns out to be needed in practice, which is likely to be two or three such
 vector sums, maybe four or five.  What is needed will depend on the
 pattern of tracking that people engage in in practice.
 If everyone behind a firewall or with an unstable network address arranges
 to notify a well known peer with a stable network address whenever his
 address changes, and that peer, as part of the arrangement, includes him in
 that peer's sum vector, and if the number of well known peers with stable
 network addresses offering this service is not enormously large, they track
 each other, and everyone tracks some of them, then we only need the sum and
 the sum of sums.
 When someone is looking to find how to connect to an identity, he goes
 through the entities he can connect to, and looks at the dot product of
 their sum vectors with the target identity vector.
 He contacts the closest entity, or a close entity, and if that does not work
 out, contacts another.  The closest entity will likely be able to contact
 the target, or contact an entity more likely to be able to contact the target.
 * the identity vector represents the public key of a peer
 * the sum vector represents what identities a peer thinks he has valid connection information for.
 * the sum of sum vectors indicates which identities his contacts (those he thinks he can connect to) think that they can connect to.
 * the sum of the sum of the sum vectors ...
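The sum vectors and the dot-product lookup can be sketched as below. This is an illustration under assumptions: identities are taken to be SHA-256 hashes truncated to a hypothetical `DIMS = 64` components, and `vector_sum` and `rank_peers` are made-up names.

```python
import hashlib

DIMS = 64  # assumed truncation length, not a value from the design


def identity_vector(public_key: bytes) -> list[int]:
    """+1 for each one bit and -1 for each zero bit of the truncated hash."""
    bits = int.from_bytes(hashlib.sha256(public_key).digest(), "big")
    return [1 if (bits >> (255 - i)) & 1 else -1 for i in range(DIMS)]


def vector_sum(vectors: list[list[int]]) -> list[int]:
    """Component-wise sum.  Applied to identity vectors it gives a peer's
    sum vector; applied to sum vectors it gives the sum of sums."""
    return [sum(column) for column in zip(*vectors)]


def dot(u: list[int], v: list[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


def rank_peers(peer_sums: dict[str, list[int]], target: list[int]) -> list[str]:
    """Order peers by the dot product of their published sum vector with
    the target identity vector, most promising first."""
    return sorted(peer_sums, key=lambda p: dot(peer_sums[p], target),
                  reverse=True)
```

A searcher would contact the first peer returned by `rank_peers`, fall back to the next if that fails, and repeat the ranking with the sum of sums when no first-level sum stands out.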
 A vector that provides the paths to connect to a billion entities, each of
 them redundantly through a thousand different paths, is still sixty or so 
 thirty two bit signed integers, distributed in a normal distribution with a
 variance of a million or so, but everyone has to store quite a lot of such
 vectors.  Small devices such as phones can get away with tracking a small
 number of such integers, at the cost of needing more lookups, hence not being
 very useful for other people to track for connection information.
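As a rough check on the sizes involved, assume (this is an assumption, not something the text states) that identity bits behave like independent fair coin flips. Then for a sum of $n$ identity vectors $v_j$, each with $d$ components, and a target identity $t$ that is one of the $v_j$:

$$
S = \sum_{j=1}^{n} v_j, \qquad \mathbb{E}[S_i] = 0, \qquad \operatorname{Var}[S_i] = n, \qquad \sigma_{S_i} = \sqrt{n}
$$

$$
t \cdot S = \underbrace{t \cdot t}_{=\,d} + \sum_{v_j \neq t} t \cdot v_j, \qquad \mathbb{E}[t \cdot S] = d, \qquad \operatorname{Var}[t \cdot S] = d\,(n-1)
$$

Under that assumption a component summed over $10^{12}$ independent terms has a standard deviation of about $10^{6}$, which fits comfortably in a thirty two bit signed integer, and the dot product with a sum that contains the target stands above its noise by a factor of about $\sqrt{d/(n-1)}$.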
 To prevent hostile parties from jamming the network by registering
 identities that closely approximate identities that they do not want people
 to be able to look up, we need the system to work in such a way that
 identities that lots of people want to look up tend to be heavily over
 represented in sum of sums vectors relative to those that no one wants to
 look up.  If you repeatedly provide lookup services for a certain entity,
 you should track the entity that had the last stable network address on the
 path that proved successful to the target entity, so that peers that
 provide useful tracking information are over represented, and entities that
 provide useless tracking information are under represented.
 If an entity makes publicly available network address information for an
 identity whose vector is an improbably good approximation to an existing
 widely looked up vector, a sybil attack is under way, and needs to be
 ignored.
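One way to make "improbably good approximation" concrete is a binomial tail test, sketched below. This is a swapped-in illustration rather than a rule taken from the design: it assumes honest identity bits are independent and uniform, and the names `agreement`, `tail_probability`, `looks_like_sybil` and the threshold `max_probability` are hypothetical.

```python
from math import comb


def agreement(u: list[int], v: list[int]) -> int:
    """Number of components on which two identity vectors agree."""
    return sum(1 for a, b in zip(u, v) if a == b)


def tail_probability(d: int, k: int) -> float:
    """Probability that at least k of d independent fair bits agree."""
    return sum(comb(d, i) for i in range(k, d + 1)) / 2 ** d


def looks_like_sybil(candidate: list[int], popular: list[int],
                     max_probability: float = 1e-9) -> bool:
    """Flag a candidate whose agreement with a widely looked up identity is
    too good to be chance.  The caller must exclude the genuine identity
    itself, which agrees on every component."""
    d = len(popular)
    return tail_probability(d, agreement(candidate, popular)) < max_probability
```

The threshold has to account for how many identities exist, since with a billion honest identities some chance near-matches are expected on short truncated vectors.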
 To be efficient at very large scale, the network should contain a relatively
 small number of large well connected devices each of which tracks the
 tracking information of a large number of other such computers, and a large
 number of smaller, less well connected devices, that track their friends and
 acquaintances, and also track well connected devices.  Big fanout on the
 interior vertices, smaller fanout on the exterior vertices, stable identities
 on all devices, moderately stable network addresses on the interior vertices,
 possibly unstable network addresses on the exterior vertices.
 If we have a thousand identities that are making public the information
 needed to make connection to them, and everyone tracks all the peers that
 provide third party look up service, we need only the first sum, and only
 about twenty dimensions.
 But if everyone attempts to track all the connection information
 for all peers that provide third party lookup services, there are soon going
 to be a whole lot of shill, entryist, and spammer peers purporting to provide
 such services, whereupon we will need white lists, grey lists, and human
 judgement, and not everyone will track all peers who are providing third
 party lookup services, whereupon we need the first two sums.
 In that case a random peer searching for connection information to another
 random peer first looks through those for which he has good connection
 information, and does not find the target.  Then he looks for someone
 connected to the target, and may not find him.  Then he looks for someone
 connected to someone connected to the target and, assuming that most
 genuine peers providing tracking information are tracking most other
 peers providing genuine tracking information, and the peer doing the
 search has the information for a fair number of peers providing genuine
 tracking information, will find him.
 Suppose there are a billion peers for which tracking information exists.  In
 that case, we need the first seventy or so dimensions, and possibly one
 more level of indirection in the lookup (the sum of the sum of the sum of
 vectors being tracked).  Suppose a trillion peers, then about the first eighty
 dimensions, and possibly one more level of indirection in the lookup.
 That is a quite large amount of data, but if who is tracking whom is stable,
 even if the network addresses are unstable, updates are infrequent and small.
 If everyone tracks ten thousand identities, and we have a billion identities
 whose network address is being made public, and a million always-up peers
 with fairly stable network addresses, each of whom tracks one thousand
 unstable network addresses and several thousand other peers who also
 track large numbers of unstable addresses, then we need about fifty
 dimensions and two sum vectors for each entity being tracked, about a
 million integers, total -- too big to be downloaded in full every time, but
 not a problem if downloaded in small updates, or downloaded in full
 infrequently. 
 But suppose no one specializes in tracking unstable network addresses. 
 If your network address is unstable, you only provide updates to those
 following your feed, and if you have a lot of followers, you have to get a
 stable network address with a stable open port so that you do not have to
 update them all the time.  Then our list of identities whose connection
 information we track will be considerably smaller, but our level of
 indirection considerably deeper - possibly needing six or so deep in sum of
 the sum of ... sum of identity vectors.
 ##   Private messaging
--- a/docs/pandoc_templates/vscode.css
+++ b/docs/pandoc_templates/vscode.css
@@ -1,3 +1,10 @@
 body {
 	max-width: 30em;
 	margin-left: 1em;
 	font-family:"DejaVu Serif", "Georgia", serif;
 	font-style: normal;
 	font-variant: normal;
 	font-weight: normal;
 	font-stretch: normal;
 	font-size: 100%;
 	}
--- a/docs/setup/set_up_build_environments.md
+++ b/docs/setup/set_up_build_environments.md
@@ -3839,3 +3839,63 @@ Not much work has been done on this project recently, though development and mai
 ## Freenet
 See [libraries](../libraries.html#freenet)
 # Network file system
 This is most useful when you have a lot of real and
 virtual machines on your local network
 ## Server
 ```bash
 sudo apt update && sudo apt upgrade -qy
 sudo apt install -qy nfs-kernel-server nfs-common
 sudo nano /etc/default/nfs-common
 ```
 In the configuration file `nfs-common` change the parameter `NEED_STATD` to `no` and `NEED_IDMAPD` to `yes`. NFSv4 needs `NEED_IDMAPD`, which starts the ID mapping daemon that translates user and group names between the server and the client.
 ```terminal_image
 NEED_STATD="no"
 NEED_IDMAPD="yes"
 ```
 Then, to disable NFSv2 and NFSv3: `sudo nano /etc/default/nfs-kernel-server`
 ```terminal_image
 RPCNFSDOPTS="-N 2 -N 3"
 RPCMOUNTDOPTS="--manage-gids -N 2 -N 3"
 ```
 Then, to export the root of your NFS file system: `sudo nano /etc/exports`
 ```terminal_image
 /nfs     192.168.1.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,no_root_squash)
 ```
 ```bash
 sudo systemctl restart nfs-server
 sudo showmount -e
 ```
 ## Client
 ```bash
 sudo apt update && sudo apt upgrade -qy
 sudo apt install -qy nfs-common
 sudo mkdir «mydirectory»
 sudo nano /etc/fstab
 ```
 ```terminal_image
 # <file system>       <mount point> <type> <options> <dump> <pass>
 «mynfsserver».local:/ «mydirectory» nfs4   _netdev   0      0
 ```
 Where the «funny brackets», as always, indicate mutatis mutandis.
 ```bash
 sudo systemctl daemon-reload
 sudo mount -a
 sudo df -h
 ```