---
title: Recognizing categories, and recognizing particulars as forming a category
# katex
...

This is, of course, a deep unsolved problem in philosophy.

However, it seems to be soluble as a computer algorithm. Programs that do
this ought to look conscious.

There are a lot of programs solving things that I thought were AI hard, for
example recognizing pornography, recognizing faces in images, and predicting
what music, or what books, or what movies, a particular customer might like.

We have clustering algorithms that work on points in spaces of reasonably
small dimension. However, instances are sparse vectors in a space of
unenumerably large dimension.

Consider, for example, the problem of grouping like documents with like, for
spam filtering. Suppose the properties of a document are all substrings of
the document of twenty words or less and two hundred characters or less. In
that case, there are as many dimensions as there are two-hundred-character
strings.

# Dimensional reduction

The combinatorial explosion occurs because we have taken the wrong approach
to reducing problems that originate in the physical world, a world of very
large dimension, large because each quality of the objects involved or
potentially involved is a dimension.

The cool magic trick that makes this manageable is dimensional reduction.
Johnson and Lindenstrauss discovered in the early 1980s that if one has
$O(2^n)$ points in a space of very large dimension, a random projection onto a
space of dimension $O(n)$ does not much affect distances and angles.

Achlioptas found that this is true for the not very random mapping wherein
elements of the matrix mapping the large space to the smaller space take the
value $1$ with probability $\frac{1}{6}$, $0$ with probability $\frac{4}{6}$,
and $-1$ with probability $\frac{1}{6}$, though a sparse matrix is apt to
distort a sparse vector.
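
A minimal sketch in Python of such an Achlioptas-style projection, assuming
numpy; the $\sqrt{3/k}$ scaling follows Achlioptas's paper, and the sizes
here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def achlioptas_matrix(d, k):
    # Entries are +1 with probability 1/6, 0 with probability 4/6, and
    # -1 with probability 1/6; the sqrt(3/k) factor keeps expected
    # squared distances unchanged under projection.
    entries = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1 / 6, 4 / 6, 1 / 6])
    return entries * np.sqrt(3.0 / k)

d, k = 10_000, 256
R = achlioptas_matrix(d, k)
x, y = rng.standard_normal(d), rng.standard_normal(d)
print(np.linalg.norm(x - y))          # distance in the big space
print(np.linalg.norm(x @ R - y @ R))  # roughly the same after projection
```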

There exists a set of $m$ points that needs dimension

$$O\left(\frac{\ln m}{ε^2}\right)$$

in order to preserve the distances between all pairs of points within a
factor of $1±ε$.

This is apt to be a lot. We might well have ten million points, and wish to
preserve distances within twenty-five percent, in which case we need two
hundred and fifty-six dimensions. So a dimensionally reduced point is not
necessarily reduced by a whole lot.
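
The arithmetic, as a quick Python check, taking the constant in the bound
as one:

```python
import math

m, eps = 10_000_000, 0.25
# ln(10 000 000) / 0.25² ≈ 258, roughly the two hundred and fifty-six
# dimensions quoted above.
print(math.log(m) / eps ** 2)
```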

For spaces of dimension higher than fifteen or so, clever methods of nearest
neighbour search generally fail, and people generally wind up with brute
force search, comparing each point to each of the others, and then
aggregating into groups by making near neighbours into groups, and near
groups into supergroups.
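
A minimal sketch of that brute force aggregation, assuming numpy points and
an arbitrary distance threshold; union-find makes the merging of near
neighbours into groups cheap:

```python
import numpy as np

def group_by_distance(points, threshold):
    # Brute force: compare each point to each of the others, then merge
    # near neighbours into the same group with union-find.
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]  # group label per point
```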

Wikipedia reports two open source methods of Locality Sensitive Hashing, one
of them used for exactly this problem, finding groups in emails.

The problem of finding near neighbours in social space, mapping the
[Kademlia]{#Kademlia} algorithm to social space, is similar but a little
different, since every vertex already is more likely to have connections to
its neighbours; and for an arbitrary vertex, whose connections we do not
know, we want to find a vertex among those we do know that is more likely to
have a connection to it, or to someone that has a connection to it, that is,
likely to be nearer in terms of number of vertex traversals.

In which case everyone reports a value that reflects his neighbours, their
neighbours, and their neighbours' neighbours: a neighbourhood smell that
grows more similar as we approach the vertex likely to be the nearest
neighbour of our target. The problem is to construct this smell as a
moderately sized blob of data that can be widely shared, so that each vertex
has a unique smell of, say, 256 or 512 bits, reflecting who it stably has
the information to quickly connect with. You look at who you have a
connection with to find who is likely to have a connection to your target,
and he looks up those he is connected with to find someone more likely to
have a connection.

Locality sensitive hashing, LSH, including the open source email algorithm
Nilsimsa Hash, attempts to distribute all points that are near each other
into the same bin.

So in a space of unenumerably large dimension, such as the set of substrings
of an email, or perhaps substrings of bounded length with bounds at spaces,
carriage returns, and punctuation, we deterministically hash each substring,
and use the hash to deterministically assign a mapping between the vector
corresponding to this substring and a vector in the reduced space.
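
A minimal sketch of that hashing step in Python; splitting on whitespace
stands in for substrings bounded at spaces and punctuation, and $k = 256$
is an arbitrary reduced dimension:

```python
import hashlib

import numpy as np

def reduce_document(text, k=256):
    # Each substring's hash deterministically picks a coordinate and a
    # sign in the reduced space, so the same substring always maps to
    # the same vector.
    v = np.zeros(k)
    for token in text.split():
        h = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
        index, sign = h % k, 1.0 if (h >> 8) & 1 else -1.0
        v[index] += sign
    return v

a = reduce_document("cheap pills buy now buy now")
b = reduce_document("cheap pills buy here buy here")
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # near documents stay near
```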

The optimal instance recognition algorithm, for normally distributed
attributes, and for already existent, already known categories, is
Mahalanobis distance.
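
A sketch, assuming numpy, with `samples` holding known members of the
category as rows:

```python
import numpy as np

def mahalanobis(x, samples):
    # Distance from x to the category, measured in units of the
    # category's own spread along each direction.
    mu = samples.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))
    diff = x - mu
    return np.sqrt(diff @ cov_inv @ diff)
```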

Is not the spam probability of an email just its $T\cdot(S-G)$, where $T$ is
the vector of the email, and $S$ and $G$ are the average vectors of spam
email and good email?
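
That one-line score, as a Python sketch; numpy vectors are assumed, and
with $S$ the spam mean, a positive score leans spam:

```python
import numpy as np

def spam_score(T, S, G):
    # Project the email's vector onto the spam-minus-good direction.
    return T @ (S - G)
```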

Variance works, instead of probability, as in Mahalanobis distance, but this
is most reasonable for things that have reasonable dimension, like
attributing skulls to races, while dimensional reduction is most useful in
spaces of unenumerably large dimension, where distributions are necessarily
non-normal.

But variance is, approximately, the log of probability: for a normal
distribution, the log of the probability density is a constant minus half
the squared Mahalanobis distance. So Mahalanobis distance is more or less
Bayes filtering, or at least one can be derived in terms of the other.

So we can reasonably reduce each email into twenty-questions space, albeit
in practice a great deal more than twenty. Finding far from random
dimensions that reduce it to a mere twenty or so is an artificial
intelligence hard problem. If we use random dimensions, we need
$O(20\log n)$ dimensions, where $n$ is the number of things. And $n$ is apt
to be very large.

Finding interesting and relevant dimensions, and ignoring irrelevant and
uninteresting dimensions, is the big problem. It is the tie between
categorizing the world into natural kinds and seeing what matters in the
perceptual data while ignoring what is trivial and irrelevant. This requires
non-trivial, non-local, and non-linear combinations of data, for example
adjusting the perceived colour of the apple for shadow and light colour, so
as to see the apple, rather than merely the light scattered by the apple
into the eye.

We then, in the reduced space, find natural groupings, a natural grouping
being an elliptic region in high dimensional space where the density is
anomalously large, or rather a normal distribution in high dimensional space
such that assigning a particular email to a particular normal distribution
dramatically reduces the entropy.
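
One way to find such groupings, sketched with scikit-learn's mixture of
Gaussians; the placeholder data, the choice of ten components, and the
diagonal covariances are all assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 256))  # dimensionally reduced emails (placeholder)
# Each fitted mixture component is a candidate natural grouping: a normal
# distribution whose assignment of an email to it reduces the entropy.
gm = GaussianMixture(n_components=10, covariance_type="diag").fit(X)
labels = gm.predict(X)  # the grouping each email is assigned to
```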

We label each such natural grouping with the statistically improbable phrase
that best distinguishes members of the grouping from all other such
groupings.

The end user can then issue rules that mails belonging to certain groupings
be given particular attention, or lack of attention, such as being sent
directly to spam.

The success of face recognition, etc., suggests that this might be just a
problem of clever algorithms. Pile enough successful intelligence-like
algorithms together, integrate them well, and perhaps we will have
sentience. Analogously with autonomous cars: they had no new algorithms,
they just made the old algorithms actually do something useful.

# Robot movement

Finding movement paths is a problem full of singularities. It looks to me
that we force the problem down to two and a half dimensions, reduce the
obstacles to stick figures, and then find a path to the destination. Hence
the mental limit on complex knot problems.

Equivalently, we want to reduce the problem space to a collection of regions
in which pathfinding algorithms that assume continuity work, and then
construct a graph of such regions, where nodes correspond to convex regions
within which continuity works, and edges correspond to an overlap between
two such convex regions. Since the space is enormous, drastic reduction is
needed.
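
A sketch of pathfinding over that graph of regions; extracting the convex
regions is the hard part and is assumed already done, with `overlaps`
mapping each region to the regions it overlaps:

```python
from collections import deque

def region_path(overlaps, start, goal):
    # Breadth-first search over the graph whose nodes are convex regions
    # and whose edges are overlaps between regions.
    parent, frontier = {start: None}, deque([start])
    while frontier:
        region = frontier.popleft()
        if region == goal:
            path = []
            while region is not None:
                path.append(region)
                region = parent[region]
            return path[::-1]  # sequence of overlapping regions to traverse
        for neighbour in overlaps[region]:
            if neighbour not in parent:
                parent[neighbour] = region
                frontier.append(neighbour)
    return None  # no connected sequence of regions
```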

In the case of robot movement we are likely to wind up with a very large
graph of such convex regions within which the assumption of singularity-free
movement is correct. Because the graph is apt to be very large, finding an
efficient path through it is apt to be prohibitive, which is apt to cause
robot ground vehicles to crash because they cannot quickly figure out the
path to evade an unexpected object, and makes it impractical for a robot to
take a can of beer from the fridge.

We therefore use the [sybil guard algorithm] to reduce the graph by treating
groups of highly connected vertices as a single vertex.
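
The contraction step, sketched; finding the groups is the sybil guard
algorithm's business, so the vertex-to-group assignment is simply assumed
given here:

```python
def contract(edges, group_of):
    # Collapse each group of highly connected vertices to one vertex;
    # only edges that cross between groups survive in the reduced graph.
    reduced = set()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu != gv:
            reduced.add((min(gu, gv), max(gu, gv)))
    return reduced
```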

[sybil guard algorithm]:./sybil_attack.html

# Artificial Intelligence

[Gradient descent is not what makes Neural nets work] Comment by Bruce on
Echo State Networks.

[Gradient descent is not what makes Neural nets work]:https://scottlocklin.wordpress.com/2012/08/02/30-open-questions-in-physics-and-astronomy/

An Echo State Network is your random neural network, which mixes a great
pile of randomness into your actual data, to expand it into a much larger
pile of data that implicitly contains all the uncorrupted original
information in its very great redundancy, albeit in massively mangled form.
Then “You just fit the output layer using linear regression. You can fit it
with something more complicated, but why bother; it doesn’t help.”
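
A minimal echo state network sketch, assuming numpy; the reservoir size,
input scaling, and 0.9 spectral radius are arbitrary choices, and only the
readout is fitted, by least squares, as the comment says:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 300
W_in = rng.standard_normal((n_res, n_in)) * 0.5   # fixed random input weights
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))         # spectral radius < 1: echoes fade

def run_reservoir(u):
    # Mangle the input signal into a much larger, highly redundant state.
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(W_in @ np.atleast_1d(u_t) + W @ x)
        states.append(x.copy())
    return np.array(states)

u = np.sin(np.linspace(0, 20, 500))                 # toy input signal
target = np.roll(u, -1)                             # task: predict the next sample
X = run_reservoir(u)
W_out, *_ = np.linalg.lstsq(X, target, rcond=None)  # fit only the output layer
print(np.abs(X @ W_out - target).mean())
```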

A generalization of “fitting the output layer using linear regression” is
finding groupings, recognizing categories, in the space of very large
dimension that consists of the outputs of the output layer.

Fitting by linear regression assumes we already have a list of instances
that are known to be type specimens of the category; it assumes that the
category is already defined and we want an efficient way of recognizing new
instances as members of this category. But living creatures can form new
categories without having them handed to them on a platter. We want to be
able to discover that a group of instances belong together.

So we generate a random neural network, identify those outputs that provide
useful information identifying categories, and prune those elements of the
network that do not contribute useful information identifying useful
categories.

That fitting something more complicated does not help tells me that you are
doing a dimensional reduction on the outputs of an echo state network.

You are generating vectors in a space of uncountably large dimension, which
vectors describe probabilities, and probabilities of probabilities (Bayesian
regress: probability of a probability of a frequency, to correct priors, and
priors about priors), so that if two vectors are distant in your space, one
is uncorrelated with the other, and if two things are close, they are
correlated.

Because the space is of uncountably large dimension, the vectors are
impossible to manipulate directly, so you are going to perform a random
dimensional reduction on a set of such vectors to a space of manageably
large dimension.

At a higher level you eventually need to distinguish the direction of
causation in order to get an action network: a network that envisages action
to bring the external world through a causal path to an intended state,
which state has a causal correlation to *desire*, a network whose output
state is *intent*, and whose goal is desire. When the action network selects
one intended state of the external world rather than another, that selection
is *purpose*. When the action network selects one causal path rather than
another, that selection is *will*.

The colour red is not a wavelength, a phenomenon, but a qualia, an estimate
of the likelihood that an object has a reflectance spectrum in visible light
peaked at that wavelength, which estimate of probability can then be used as
if it were a phenomenon in forming concepts, such as blood, which in turn
can be used to form higher level concepts, as when the Old Testament says of
someone who needed killing “His blood is on his own head”.

Concepts are Hebbian neural networks: “Neurons that fire together, wire
together.”

This is related to random dimensional reduction. You have a collection of
vectors in a space of uncountably large dimension: documents, emails, what
you see when you look in a particular direction, what you experience at a
particular moment. You perform a random dimensional reduction to a space of
manageable dimension, but large enough to probabilistically preserve
distances and angles in the original space, typically twenty or a hundred
dimensions.

By this means, you are able to calculate distances and angles in your
dimensionally reduced space which approximately reflect the distances and
angles in the original space, which was probably of dimension
$10^{100^{100^{100}}}$, the original space being phenomena that occurred
together, and collections of phenomena that occurred together that you have
some reason for bundling into a collection, while your randomly reduced
space has a dimension of an order that a child can count to in a reasonably
short time.

And now you have vectors on which you can calculate inner products and cross
products, and perform matrix operations. This gives you qualia. Higher level
qualia are *awareness*.

And, using this, you can restructure the original vectors, for example
structuring experiences into events, structuring things in the visual field
into objects, and then you can do the same process on collections of events,
and collections of objects that have something in common.

Building a flying machine was very hard, until the Wright brothers said
“three axis control: pitch, yaw, and roll”.

Now I have said the words “dimensional reduction of vectors in a space of
uncountably large dimension, desire, purpose, intent, and will”.