---
title: Recognizing categories, and recognizing particulars as forming a category
# katex
---

This is, of course, a deep unsolved problem in philosophy.

However, it seems to be soluble as a computer algorithm. Programs that do this ought to look conscious.

There are a lot of programs solving problems that I thought were AI-hard, for example recognizing pornography, recognizing faces in images, and predicting what music, books, or movies a particular customer might like.

We have clustering algorithms that work on points in spaces of reasonably small dimension. However, our instances are sparse vectors in a space of unenumerably large dimension.

Consider, for example, the problem of grouping like documents with like for spam filtering. Suppose the properties of a document are all substrings of the document of twenty words or less and 200 characters or less. In that case, there are as many dimensions as there are two-hundred-character strings.

# Dimensional reduction

The combinatorial explosion occurs because we have taken the wrong approach to reducing problems that originate in the physical world, problems of very large dimension, large because each quality of the objects involved or potentially involved is a dimension.

The cool magic trick that makes this manageable is dimensional reduction. Johnson and Lindenstrauss discovered in the early 1980s that if one has $O(2^n)$ points in a space of very large dimension, a random projection onto a space of dimension $O(n)$ does not much affect distances and angles.

Achlioptas found that this is true even for the not very random mapping wherein elements of the matrix mapping the large space to the smaller space take the value $1$ with probability $\frac{1}{6}$, $0$ with probability $\frac{4}{6}$, and $-1$ with probability $\frac{1}{6}$, though a sparse matrix is apt to distort a sparse vector.

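A minimal sketch of such a projection, assuming numpy; the dimensions and point counts are illustrative, not tuned:

```python
import numpy as np

rng = np.random.default_rng(0)

def achlioptas_matrix(d_big, d_small):
    # Entries take values {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6},
    # scaled by sqrt(3/d_small) so expected lengths are preserved.
    signs = rng.choice([1.0, 0.0, -1.0], size=(d_big, d_small), p=[1 / 6, 2 / 3, 1 / 6])
    return signs * np.sqrt(3.0 / d_small)

d_big, d_small, n_points = 10_000, 40, 100
points = rng.normal(size=(n_points, d_big))
reduced = points @ achlioptas_matrix(d_big, d_small)

# Pairwise distances in the reduced space approximate those in the original.
print(np.linalg.norm(points[0] - points[1]),
      np.linalg.norm(reduced[0] - reduced[1]))
```
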
There exists a set of points of size $m$ that needs dimension

$$\displaystyle{O\left(\frac{\log(m)}{ε^2}\right)}$$

in order to preserve the distances between all pairs of points within a factor of $1±ε$.

The time to find the nearest neighbour is logarithmic in the number of points, but exponential in the dimension of the space. So we do one pass with a rather large epsilon, and then a second pass over the small number of candidate neighbours found in the first pass, using an algorithm whose cost is proportional to the number of candidates times the dimensionality.

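A minimal sketch of that two-pass search, assuming numpy arrays `points` (full dimension) and `reduced` (from a projection like the one above):

```python
import numpy as np

def nearest(query, query_reduced, points, reduced, n_candidates=10):
    # Pass 1: cheap, approximate distances in the reduced space.
    coarse = np.linalg.norm(reduced - query_reduced, axis=1)
    candidates = np.argsort(coarse)[:n_candidates]
    # Pass 2: exact distances over the few candidates, with cost proportional
    # to the number of candidates times the full dimensionality.
    exact = np.linalg.norm(points[candidates] - query, axis=1)
    return candidates[np.argmin(exact)]
```
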
So in a space of unenumerably large dimension, such as the set of substrings of an email, or perhaps substrings of bounded length with bounds at spaces, carriage returns, and punctuation, we deterministically hash each substring, and use the hash to deterministically assign a mapping between the vector corresponding to this substring and a vector in the reduced space.

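A minimal sketch of that deterministic hashing, where the hash of each substring picks its coordinate and sign in the reduced space, so the enormous sparse vector is never materialized:

```python
import hashlib

def reduce_email(substrings, d_small=40):
    v = [0.0] * d_small
    for s in substrings:
        h = int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16)
        index = h % d_small                     # coordinate this substring maps to
        sign = 1.0 if (h >> 64) & 1 else -1.0   # deterministic pseudo-random sign
        v[index] += sign
    return v
```
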
The optimal instance recognition algorithm, for normally distributed attributes and for already existent, already known categories, is Mahalanobis distance.

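A minimal sketch, assuming numpy and a matrix of known category members, one row per instance:

```python
import numpy as np

def mahalanobis(x, members):
    # Distance from x to the category, in units of the category's own spread.
    mean = members.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(members, rowvar=False))
    d = x - mean
    return np.sqrt(d @ cov_inv @ d)
```
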
Is not the spam characteristic of an email just its $T\cdot(S-G)$, where $T$ is the vector of the email, and $S$ and $G$ are the average vectors of spam email and good email?

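As a sketch, using reduced vectors from the hashing step above:

```python
import numpy as np

def spam_score(t, spam_vectors, good_vectors):
    s = spam_vectors.mean(axis=0)   # average spam vector S
    g = good_vectors.mean(axis=0)   # average good vector G
    return t @ (s - g)              # large and positive when the email resembles spam
```
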
Variance works instead of probability – Mahalanobis distance – but this is most reasonable for things that have reasonable dimension, like attributing skulls to races, while dimensional reduction is most useful in spaces of unenumerably large dimension, where distributions are necessarily non-normal.

But variance is, approximately, the log of probability, so Mahalanobis is more or less Bayes filtering.

So we can reasonably reduce each email into twenty-questions space, or, just to be on the safe side, forty-questions space. (We will have to test empirically how many dimensions suffice to retain angles and distances.)

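A minimal sketch of that empirical test, measuring the worst distortion of pairwise distances as the number of dimensions varies; sizes are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
points = rng.normal(size=(200, 5_000))
true = pdist(points)                      # all pairwise distances, full space

for k in (20, 40, 80):
    proj = rng.normal(size=(5_000, k)) / np.sqrt(k)
    ratio = pdist(points @ proj) / true   # 1.0 means perfectly preserved
    print(k, round(ratio.min(), 2), round(ratio.max(), 2))
```
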
We then, in the reduced space, find natural groupings, a natural grouping being an elliptic region in high dimensional space where the density is anomalously large, or rather a normal distribution in high dimensional space such that assigning a particular email to a particular normal distribution dramatically reduces the entropy.

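A minimal sketch of finding such groupings, assuming scikit-learn and the `reduced` array from earlier; the number of components is a guess to be tuned:

```python
from sklearn.mixture import GaussianMixture

# Each mixture component is a normal distribution in the reduced space;
# an email's component assignment is its natural grouping.
gmm = GaussianMixture(n_components=20, covariance_type="full").fit(reduced)
groupings = gmm.predict(reduced)
```
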
We label each such natural grouping with the statistically improbable phrase that best distinguishes members of the grouping from all other such groupings.

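A minimal sketch of choosing that label: score each phrase by how over-represented it is inside the grouping relative to the whole corpus:

```python
from collections import Counter

def label_grouping(group_phrases, all_phrases):
    inside = Counter(group_phrases)
    overall = Counter(all_phrases)
    # The +1 damps phrases too rare overall to be meaningful.
    return max(inside, key=lambda p: inside[p] / (overall[p] + 1))
```
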
The end user can then issue rules that mails belonging to certain groupings be given particular attention – or lack of attention, such as being sent direct to spam.

The success of face recognition, etc., suggests that this might be just a problem of clever algorithms. Pile enough successful intelligence-like algorithms together, integrate them well, and perhaps we will have sentience. Analogously with autonomous cars: they had no new algorithms, they just made the old algorithms actually do something useful.

# Robot movement

Finding movement paths is full of singularities. It looks to me that we force the problem down to two and a half dimensions, force the obstacles to stick figures, and then find a path to the destination. Hence the mental limit on complex knot problems.

Equivalently, we want to reduce the problem space to a collection of regions in which pathfinding algorithms that assume continuity work, and then construct a graph of such regions, where nodes correspond to convex regions within which continuity works, and edges correspond to an overlap between two such convex regions. Since the space is enormous, drastic reduction is needed.

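A minimal sketch of that region graph, assuming networkx; the region names are purely illustrative:

```python
import networkx as nx

g = nx.Graph()
# An edge means two convex regions overlap, so a continuous path
# exists between them.
g.add_edges_from([
    ("living_room", "hallway"),
    ("hallway", "kitchen"),
    ("kitchen", "fridge_nook"),
])
print(nx.shortest_path(g, "living_room", "fridge_nook"))
```
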
In the case of robot movement we are likely to wind up with a very large graph of such convex regions within which the assumption of singularity-free movement is correct, and because the graph is apt to be very large, finding an efficient path through the graph is apt to be prohibitively expensive, which is apt to cause robot ground vehicles to crash because they cannot quickly figure out the path to evade an unexpected object, and makes it impractical for a robot to take a can of beer from the fridge.

We therefore use the [sybil guard algorithm] to reduce the graph by treating groups of highly connected vertices as a single vertex.

[sybil guard algorithm]:./sybil_attack.html

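The sybil guard algorithm itself is described in the linked page; as a stand-in sketch of the same coarsening effect, networkx community detection can merge each highly connected cluster into a single super-vertex:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

g = nx.karate_club_graph()   # placeholder for the large region graph
communities = greedy_modularity_communities(g)
# Quotient graph: one node per community, edges where communities touch.
reduced = nx.quotient_graph(g, [set(c) for c in communities])
print(g.number_of_nodes(), "vertices reduced to", reduced.number_of_nodes())
```
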
# Artificial Intelligence

[Gradient descent is not what makes Neural nets work], a comment by Bruce on Echo State Networks.

[Gradient descent is not what makes Neural nets work]:https://scottlocklin.wordpress.com/2012/08/02/30-open-questions-in-physics-and-astronomy/

An Echo State Network is your random neural network, which mixes a great pile of randomness into your actual data, to expand it into a much larger pile of data that implicitly contains all the uncorrupted original information in its very great redundancy, albeit in massively mangled form. Then “You just fit the output layer using linear regression. You can fit it with something more complicated, but why bother; it doesn’t help.”

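A minimal sketch of an echo state network on a toy one-dimensional signal: the reservoir is fixed and random, and only the linear readout is fitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_reservoir = 200

w_in = rng.normal(size=(n_reservoir, 1))          # fixed random input weights
w = rng.normal(size=(n_reservoir, n_reservoir))
w *= 0.9 / np.max(np.abs(np.linalg.eigvals(w)))   # scale for the echo state property

def run_reservoir(u):
    # Drive the reservoir with the input sequence, collecting its states.
    x = np.zeros(n_reservoir)
    states = []
    for u_t in u:
        x = np.tanh(w_in[:, 0] * u_t + w @ x)
        states.append(x)
    return np.array(states)

u = np.sin(np.linspace(0, 20, 500))               # toy input signal
target = np.roll(u, -1)                           # task: predict the next value
states = run_reservoir(u)

# "Just fit the output layer using linear regression."
w_out, *_ = np.linalg.lstsq(states, target, rcond=None)
```
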
A generalization of “fitting the output layer using linear regression” is finding groupings, recognizing categories, in the space of very large dimension that consists of the outputs of the output layer.

Fitting by linear regression assumes we already have a list of instances that are known to be type specimens of the category, assumes that the category is already defined and we want an efficient way of recognizing new instances as members of this category. But living creatures can form new categories without having them handed to them on a platter. We want to be able to discover that a group of instances belong together.

So we generate a random neural network, identify those outputs that provide useful information identifying categories, and prune those elements of the network that do not contribute useful information identifying useful categories.

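A minimal sketch of that pruning, using variance across inputs as a crude stand-in for “provides useful information”; `states` is assumed to come from the reservoir sketched above:

```python
import numpy as np

def prune_outputs(states, keep=50):
    # Keep the reservoir outputs that vary most across inputs,
    # discarding the near-constant, uninformative ones.
    informative = np.argsort(states.var(axis=0))[-keep:]
    return states[:, informative]
```
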
That it does not help tells me you are doing a dimensional reduction on the outputs of an echo state network.

You are generating vectors in a space of uncountably large dimension, which vectors describe probabilities, and probabilities of probabilities (Bayesian regress, probability of a probability of a frequency, to correct priors, and priors about priors), so that if two vectors are distant in your space, one is uncorrelated with the other, and if two things are close, they are correlated.

Because the space is of uncountably large dimension, the vectors are impossible to manipulate directly, so you are going to perform a random dimensional reduction on a set of such vectors to a space of manageably large dimension.

At a higher level you eventually need to distinguish the direction of causation in order to get an action network, a network that envisages action to bring the external world through a causal path to an intended state, which state has a causal correlation to *desire*, a network whose output state is *intent*, and whose goal is desire. When the action network selects one intended state of the external world rather than another, that selection is *purpose*. When the action network selects one causal path rather than another, that selection is *will*.

The colour red is not a wavelength, a phenomenon, but a quale, an estimate of the likelihood that an object has a reflectance spectrum in visible light peaked at that wavelength, but which estimate of probability can then be used as if it were a phenomenon in forming concepts, such as blood, which in turn can be used to form higher level concepts, as when the Old Testament says of someone who needed killing “His blood is on his own head”.

Concepts are Hebbian neural networks: “Neurons that fire together, wire together.”

This is related to random dimensional reduction. You have a collection of vectors in a space of uncountably large dimension: documents, emails, what you see when you look in a particular direction, what you experience at a particular moment. You perform a random dimensional reduction to a space of manageable dimension, but one large enough to probabilistically preserve distances and angles in the original space – typically twenty or a hundred dimensions.

By this means, you are able to calculate distances and angles in your dimensionally reduced space which approximately reflect the distances and angles in the original space, which was probably of dimension $10^{100^{100^{100}}}$, the original space being phenomena that occurred together, and collections of phenomena that occurred together that you have some reason for bundling into a collection, and your randomly reduced space having a dimension of an order that a child can count to in a reasonably short time.

And now you have vectors on which you can calculate inner products and cross products, and perform matrix operations. This gives you qualia. Higher level qualia are *awareness*.

And, using this, you can restructure the original vectors, for example structuring experiences into events, structuring things in the visual field into objects, and then you can do the same process on collections of events, and collections of objects that have something in common.

Building a flying machine was very hard, until the Wright brothers said “three axis control: pitch, yaw, and roll”.

Now I have said the words “dimensional reduction of vectors in a space of uncountably large dimension, desire, purpose, intent, and will”.