Deleted DHT design from social networking preparatory to writing up the
new design

Added nfs to setup documentation
This commit is contained in:
reaction.la 2024-02-12 00:12:58 +00:00
parent a856d438e7
commit 495f667c6f
No known key found for this signature in database
GPG Key ID: 99914792148C8388
3 changed files with 69 additions and 178 deletions


@ -455,185 +455,9 @@ way hash, so are not easily linked to who is posting in the feed.
### Replacing Kademlia
This design was deleted because its scaling properties turned out to be
unexpectedly bad. I am now writing up a better design.

[social distance metric]:recognizing_categories_and_instances.html#Kademlia
{target="_blank"}

I will describe the Kademlia distributed hash table algorithm not in the
way that it is normally described and defined, but in such a way that we
can easily replace its metric by a [social distance metric], assuming we
can construct a suitable metric: one that reflects what feeds a given
host is following, and what network addresses it knows and the feeds
they are following, a quantity over which a distance can be found that
reflects how close a peer is to an unstable network address, or how
likely it is to know a peer that is likely to know a peer that knows
that unstable network address.
A distributed hash table works by each peer on the network maintaining
a large number of live and active connections to other computers, such
that the distribution of those connections is approximately uniform by
distance under the distributed hash table's metric, which for Kademlia
is the $\log_2$ of the exclusive-or between his hash and your hash.
When you want to connect to an arbitrary computer, you ask the computers
that are nearest in the space to the target for their connections that
are closest to the target. Then you connect to those, and ask the same
question again.
This works if each computer has approximately the same number of
connections close to it by the metric as distant from it by that metric,
so that it is connected to almost all of the computers that are nearby
to it by that metric.
In the course of this operation, you acquire more and more active
connections, which you purge from time to time to keep the total number
of connections reasonable and the distribution approximately uniform by the
metric of distance used.
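The iterative lookup just described rests on the XOR metric. A minimal
sketch in Python; the function names and toy values are illustrative,
not from the design:

```python
# Minimal sketch of the Kademlia distance metric described above.
# Names and the toy examples are illustrative, not part of the design.

def kademlia_distance(a: int, b: int) -> int:
    """XOR distance between two node identities (taken as integers)."""
    return a ^ b

def bucket_index(a: int, b: int) -> int:
    """floor(log2(a XOR b)): which distance bucket peer b occupies from
    peer a's point of view. Returns -1 when a == b."""
    return kademlia_distance(a, b).bit_length() - 1
```

A peer keeps a few connections per bucket; since bucket $i$ covers $2^i$
identities, the spread of connections is approximately uniform in
log-distance, which is what makes the iterative lookup converge in a
logarithmic number of hops.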
The reason that the Kademlia distributed hash table cannot work in the
face of enemy action is that the shills who want to prevent something
from being found create a hundred entries with hashes close to their
target by Kademlia distance, and then when your search brings you close
to the target, it brings you to a shill, who misdirects you. Using
social network distance resists this attack.
The messages of the people you are following are likely to be in a
relatively small number of repositories, even if the total number of
repositories out there is enormous and the number of hashes in each
repository is enormous, so this algorithm and data structure will scale.
The responses to a thread that the poster has approved, by people you
are not following, will be commits in that repository: by pushing their
latest response to the thread to a public repository, those people do
the equivalent of a git commit and push to that repository.

Each repository contains all the material the poster has approved,
resulting in considerable duplication, but not enormous duplication:
approved links and reply-to links are duplicated, but not every spammer,
scammer, and shill in the world can fill your feed with garbage.
### Kademlia in social space
The vector of an identity is $+1$ for each one bit, and $-1$ for each zero bit.
We don't use the entire two hundred fifty six dimensional vector, just
enough of it that the truncated vector of every identity that anyone might
be tracking has a very high probability of being approximately orthogonal
to the truncated vector of every other identity.
We do not have, and do not need, an exact consensus on how much of the
vector to actually use, but everyone needs to use roughly the same amount
as everyone else. The amount is adjusted according to what is needed
over time, each identity adjusting according to circumstances, with the
result that over time the consensus tracks what is needed.
Each party indicates what entities he can provide a direct link to by
publishing the sum of the vectors of the parties he can link to - and
also the sum of their sums, and also the sum of those sums, ... to as
many levels deep as turns out to be needed in practice, which is likely
to be two or three such vector sums, maybe four or five. What is needed
will depend on the pattern of tracking that people engage in in
practice.
If everyone behind a firewall or with an unstable network address
arranges to notify a well known peer with a stable network address
whenever his address changes, and that peer, as part of the arrangement,
includes him in that peer's sum vector, then, since the number of well
known peers with stable network addresses offering this service is not
enormously large, they can track each other, everyone can track some of
them, and we only need the sum and the sum of sums.
When someone is looking to find how to connect to an identity, he goes
through the entities he can connect to, and looks at the dot product of
their sum vectors with the target identity vector.
He contacts the closest entity, or a close entity, and if that does not work
out, contacts another. The closest entity will likely be able to contact
the target, or contact an entity more likely to be able to contact the target.
* the identity vector represents the public key of a peer
* the sum vector represents what identities a peer thinks he has valid connection information for.
* the sum of sum vectors indicates what identities the peers that he thinks he can connect to think that they can connect to.
* the sum of the sum of the sum vectors ...
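The scheme above can be sketched concretely. A minimal illustration in
Python; the hash choice, the 64-bit truncation, and the peer names are
assumptions for illustration, not part of the design:

```python
import hashlib

DIMS = 64  # assumed truncation: enough bits that distinct identities'
           # truncated vectors are very probably near-orthogonal

def identity_vector(pubkey: bytes, dims: int = DIMS) -> list[int]:
    """+1 for each one bit, -1 for each zero bit of the identity hash,
    truncated to the first `dims` bits."""
    digest = hashlib.sha256(pubkey).digest()
    bits = ''.join(f'{byte:08b}' for byte in digest)[:dims]
    return [1 if b == '1' else -1 for b in bits]

def sum_vector(vectors: list[list[int]]) -> list[int]:
    """Componentwise sum of the identity vectors a peer can link to."""
    return [sum(components) for components in zip(*vectors)]

def dot(u: list[int], v: list[int]) -> int:
    return sum(a * b for a, b in zip(u, v))

def best_peer(peer_sums: dict[str, list[int]], target: list[int]) -> str:
    """Pick the known peer whose published sum vector has the largest
    dot product with the target's identity vector."""
    return max(peer_sums, key=lambda name: dot(peer_sums[name], target))
```

Because independent identity vectors are nearly orthogonal, the dot
product of a sum vector with the target's identity vector is close to
`DIMS` when the target is included in the sum and close to zero when it
is not, so the largest dot product points at the peer most likely to
hold a path to the target.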
A vector that provides the paths to connect to a billion entities, each of
them redundantly through a thousand different paths, is still sixty or so
thirty two bit signed integers, distributed in a normal distribution with a
variance of a million or so, but everyone has to store quite a lot of such
vectors. Small devices such as phones can get away with tracking a small
number of such integers, at the cost of needing more lookups, hence not being
very useful for other people to track for connection information.
To prevent hostile parties from jamming the network by registering
identities that closely approximate identities that they do not want
people to be able to look up, we need the system to work in such a way
that identities that lots of people want to look up tend to be heavily
over-represented in sum-of-sums vectors relative to those that no one
wants to look up. If you repeatedly provide lookup services for a
certain entity, you should track the entity with the last stable network
address on the path that proved successful to the target entity, so that
peers that provide useful tracking information are over-represented, and
entities that provide useless tracking information are under-represented.
If an entity makes publicly available network address information for an
identity whose vector is an improbably good approximation to an existing
widely looked up vector, a sybil attack is under way, and needs to be
ignored.
To be efficient at very large scale, the network should contain a relatively
small number of large well connected devices each of which tracks the
tracking information of large number of other such computers, and a large
number of smaller, less well connected devices, that track their friends and
acquaintances, and also track well connected devices. Big fanout on the
interior vertices, smaller fanout on the exterior vertices, stable
identities on all devices, moderately stable network addresses on the
interior vertices, possibly unstable network addresses on the exterior
vertices.
If we have a thousand identities that are making public the information
needed to make connection to them, and everyone tracks all the peers that
provide third party look up service, we need only the first sum, and only
about twenty dimensions.
But if everyone attempts to track all the connection information for all
peers that provide third party lookup services, there are soon going to
be a whole lot of shill, entryist, and spammer peers purporting to
provide such services, whereupon we will need white lists, grey lists,
and human judgement, and not everyone will track all peers who are
providing third party lookup services, whereupon we need the first two
sums.
In that case a random peer searching for connection information to
another random peer first looks through those peers for which he has
good connection information, and does not find the target. Then he looks
for someone connected to the target, and may not find him. Then he looks
for someone connected to someone connected to the target and, assuming
that most genuine peers providing tracking information are tracking most
other peers providing genuine tracking information, and that the peer
doing the search has the information for a fair number of peers
providing genuine tracking information, he will find him.
Suppose there are a billion peers for which tracking information exists. In
that case, we need the first seventy or so dimensions, and possibly one
more level of indirection in the lookup (the sum of the sum of the sum of
vectors being tracked). Suppose a trillion peers, then about the first eighty
dimensions, and possibly one more level of indirection in the lookup.
That is a quite large amount of data, but if who is tracking whom is stable,
even if the network addresses are unstable, updates are infrequent and small.
If everyone tracks ten thousand identities, and we have a billion
identities whose network address is being made public, and a million
always-up peers with fairly stable network addresses, each of whom
tracks one thousand unstable network addresses and several thousand
other peers who also track large numbers of unstable addresses, then we
need about fifty dimensions and two sum vectors for each entity being
tracked, about a million integers total -- too big to be downloaded in
full every time, but not a problem if downloaded in small updates, or
downloaded in full infrequently.
But suppose no one specializes in tracking unstable network addresses.
If your network address is unstable, you only provide updates to those
following your feed, and if you have a lot of followers, you have to get a
stable network address with a stable open port so that you do not have to
update them all the time. Then our list of identities whose connection
information we track will be considerably smaller, but our level of
indirection considerably deeper - possibly needing to go six or so
levels deep in the sum of the sum of ... sum of identity vectors.
## Private messaging


@ -1,3 +1,10 @@
body {
max-width: 30em;
margin-left: 1em;
font-family:"DejaVu Serif", "Georgia", serif;
font-style: normal;
font-variant: normal;
font-weight: normal;
font-stretch: normal;
font-size: 100%;
}


@ -3839,3 +3839,63 @@ Not much work has been done on this project recently, though development and mai
## Freenet
See [libraries](../libraries.html#freenet)
# Network file system
This is most useful when you have a lot of real and virtual machines on
your local network.
## Server
```bash
sudo apt update && sudo apt upgrade -qy
sudo apt install -qy nfs-kernel-server nfs-common
sudo nano /etc/default/nfs-common
```
In the configuration file `/etc/default/nfs-common` change the parameter NEED_STATD to no and NEED_IDMAPD to yes. NFSv4 requires NEED_IDMAPD, the ID mapping daemon, which maps user and group names between the server and client.
```terminal_image
NEED_STATD="no"
NEED_IDMAPD="yes"
```
Then, to disable NFS versions 2 and 3: `sudo nano /etc/default/nfs-kernel-server`
```terminal_image
RPCNFSDOPTS="-N 2 -N 3"
RPCMOUNTDOPTS="--manage-gids -N 2 -N 3"
```
Then, to export the root of your NFS file system: `sudo nano /etc/exports`
```terminal_image
/nfs 192.168.1.0/24(rw,async,fsid=0,crossmnt,no_subtree_check,no_root_squash)
```
```bash
sudo systemctl restart nfs-server
sudo showmount -e
```
## Client
```bash
sudo apt update && sudo apt upgrade -qy
sudo apt install -qy nfs-common
sudo mkdir «mydirectory»
sudo nano /etc/fstab
```
```terminal_image
# <file system> <mount point> <type> <options> <dump> <pass>
«mynfsserver».local:/ «mydirectory» nfs4 _netdev 0 0
```
Where the «funny brackets», as always, indicate mutatis mutandis.
```bash
sudo systemctl daemon-reload
sudo mount -a
sudo df -h
```