---
title: Recognizing categories, and recognizing particulars as forming a category
# katex
---

This is, of course, a deep unsolved problem in philosophy.

However, it seems to be soluble as a computer algorithm. Programs that do this ought to look conscious.

There are a lot of programs solving things that I thought were AI hard, for example recognizing pornography, recognizing faces in images, and predicting what music, what books, or what movies a particular customer might like.

We have clustering algorithms that work on points in spaces of reasonably small dimension. However, instances are sparse vectors in a space of unenumerably large dimension.

Consider, for example, the problem of grouping like documents to like, for spam filtering. Suppose the properties of the document are all substrings of the document of twenty words or less and 200 characters or less. In that case, there are as many dimensions as there are strings of up to two hundred characters.

# Dimensional reduction

The combinatorial explosion occurs because we have taken the wrong approach to reducing problems that originate in the physical world of very large dimension, large because each quality of the objects involved or potentially involved is a dimension.

The cool magic trick that makes this manageable is dimensional reduction. Johnson and Lindenstrauss discovered in the early 1980s that if one has $O(2^n)$ points in a space of very large dimension, a random projection onto a space of dimension $O(n)$ does not much affect distances and angles.

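
The Johnson and Lindenstrauss claim is easy to check numerically. What follows is a minimal sketch, not a tuned implementation: it assumes a dense Gaussian projection matrix (one common choice; the text does not prescribe one), and the sizes `HIGH_DIM`, `LOW_DIM`, and `N_POINTS` and all function names are made up for illustration.

```python
import math
import random

def random_projection_matrix(high_dim, low_dim, rng):
    """Gaussian matrix, scaled so squared lengths are preserved on average."""
    scale = 1.0 / math.sqrt(low_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(high_dim)]
            for _ in range(low_dim)]

def project(matrix, vector):
    """Multiply the projection matrix by one vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def worst_distortion(points, projected):
    """Largest relative error in any pairwise distance after projection."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = distance(points[i], points[j])
            reduced = distance(projected[i], projected[j])
            worst = max(worst, abs(reduced / original - 1.0))
    return worst

# Illustrative sizes only (assumptions, not values from the text).
HIGH_DIM, LOW_DIM, N_POINTS = 1000, 200, 12

rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(HIGH_DIM)]
          for _ in range(N_POINTS)]
matrix = random_projection_matrix(HIGH_DIM, LOW_DIM, rng)
projected = [project(matrix, p) for p in points]
distortion = worst_distortion(points, projected)
print(f"worst pairwise distortion: {distortion:.3f}")
```

Shrinking `LOW_DIM` makes the worst distortion grow, which is the trade-off between dimension and $ε$ discussed in this section.
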
Achlioptas found that this is true for the not very random mapping wherein the elements of the matrix mapping the large space to the smaller space are, up to a constant scale factor, $1$ with probability $\frac{1}{6}$, $0$ with probability $\frac{4}{6}$, and $-1$ with probability $\frac{1}{6}$, though a sparse matrix is apt to distort a sparse vector.

There exists a set of points of size $m$ that needs dimension

$$\displaystyle{\Omega\left(\frac{\log(m)}{ε^2}\right)}$$

in order to preserve the distances between all pairs of points within a factor of $1±ε$.

The time to find the nearest neighbour is logarithmic in the number of points, but exponential in the dimension of the space. So we do one pass with rather large epsilon, and then another pass over only the small number of candidate neighbours found in the first pass, using an algorithm whose cost is proportional to the number of candidates times the dimensionality.

So in a space of unenumerably large dimension, such as the set of substrings of an email, or perhaps substrings of bounded length with bounds at spaces, carriage returns, and punctuation, we deterministically hash each substring, and use the hash to deterministically assign a mapping between the vector corresponding to this substring and a vector in the reduced space.

The optimal instance recognition algorithm, for normally distributed attributes, and for already existent, already known categories, is Mahalanobis distance.

Is not the spam characteristic of an email just its $T\cdot(S-G)$, where $T$ is the vector of the email, and $S$ and $G$ are the average vectors of spam email and good email?

Variance works instead of probability, which is Mahalanobis distance, but this is most reasonable for things that have reasonable dimension, like attributing skulls to races, while dimensional reduction is most useful in spaces of unenumerably large dimension, where distributions are necessarily non normal.

But for a normal distribution the squared Mahalanobis distance is, up to sign, scale, and an additive constant, the log of the probability density, so Mahalanobis is more or less Bayes filtering.

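
The hashing step and the $T\cdot(S-G)$ score can be sketched as follows, taking $S$ as the mean spam vector and $G$ as the mean good vector. This is an illustration under stated assumptions, not a real filter. The substring limit `max_words=3` (rather than twenty), the use of SHA-256 to seed a per-substring generator, `REDUCED_DIM = 40`, the toy messages, and every function name are assumptions made for the example. Only the $1, 0, -1$ entries with probabilities $\frac{1}{6}, \frac{4}{6}, \frac{1}{6}$ come from the text.

```python
import hashlib
import random

REDUCED_DIM = 40  # assumed size of the reduced space

def substrings(text, max_words=3):
    """Yield every run of 1..max_words consecutive words."""
    words = text.lower().split()
    for start in range(len(words)):
        for length in range(1, max_words + 1):
            if start + length <= len(words):
                yield " ".join(words[start:start + length])

def column_for(substring, dim=REDUCED_DIM):
    """Deterministic column of the projection matrix for one substring.

    The hash seeds a private generator, so the same substring always
    maps to the same sparse vector of 1, 0, -1 entries.
    """
    digest = hashlib.sha256(substring.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return rng.choices((1, 0, -1), weights=(1, 4, 1), k=dim)

def reduce_document(text, dim=REDUCED_DIM):
    """Sum the columns of every substring present in the document."""
    vector = [0] * dim
    for s in substrings(text):
        for i, entry in enumerate(column_for(s, dim)):
            vector[i] += entry
    return vector

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def spam_score(text, spam_mean, good_mean):
    """T . (S - G): projection of the email onto the spam direction."""
    t = reduce_document(text)
    return sum(ti * (si - gi) for ti, si, gi in zip(t, spam_mean, good_mean))

# Toy training data, invented for the example.
spam = ["cheap pills buy now", "buy cheap pills online now"]
good = ["meeting moved to tuesday", "notes from the tuesday meeting"]
spam_mean = mean([reduce_document(d) for d in spam])
good_mean = mean([reduce_document(d) for d in good])
avg_spam_score = sum(spam_score(d, spam_mean, good_mean) for d in spam) / len(spam)
avg_good_score = sum(spam_score(d, spam_mean, good_mean) for d in good) / len(good)
print(avg_spam_score, avg_good_score)
```

On the training messages themselves the average spam score exceeds the average good score by exactly $|S-G|^2$, so this shows only the mechanics. Whether the score separates unseen mail is the empirical question.
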
So we can reasonably reduce each email into twenty questions space, or, just to be on the safe side, forty questions space. (Will have to test empirically how many dimensions retain angles and distances.)

We then, in the reduced space, find natural groupings, a natural grouping being an elliptic region in high dimensional space where the density is anomalously large, or rather a normal distribution in high dimensional space such that assigning a particular email to a particular normal distribution dramatically reduces the entropy.

We label each such natural grouping with the statistically improbable phrase that best distinguishes members of the grouping from all other such groupings.

The end user can then issue rules that mails belonging to certain groupings be given particular attention, or lack of attention, such as being sent direct to spam.

The success of face recognition, etc, suggests that this might be just a problem of clever algorithms. Pile enough successful intelligence like algorithms together, integrate them well, and perhaps we will have sentience. Analogously with the autonomous cars. They had no new algorithms, they just made the old algorithms actually do something useful.

# Robot movement

Finding movement paths is full of singularities. It looks to me that we force it down to two and a half dimensions, force the obstacles to stick figures, and then find a path to the destination. Hence the mental limit on complex knot problems.

Equivalently, we want to reduce the problem space to a collection of regions in which pathfinding algorithms that assume continuity work, and then construct a graph of such regions, where nodes correspond to convex regions within which continuity works, and edges correspond to an overlap between two such convex regions.

Since the space is enormous, drastic reduction is needed.

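
The graph of overlapping regions can be shown with a toy example. It makes strong simplifying assumptions that are not in the text: the convex regions are axis-aligned rectangles in the plane, the region names and coordinates are invented, and the search is a plain breadth-first search over the region graph.

```python
from collections import deque

def overlaps(a, b):
    """True when two rectangles (x_min, y_min, x_max, y_max) share area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def build_region_graph(regions):
    """Nodes are regions; an edge joins every pair of overlapping regions."""
    graph = {name: [] for name in regions}
    names = list(regions)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if overlaps(regions[first], regions[second]):
                graph[first].append(second)
                graph[second].append(first)
    return graph

def shortest_region_path(graph, start, goal):
    """Breadth-first search; returns the list of regions to pass through."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is not reachable from start

# Hypothetical floor plan.
regions = {
    "hall": (0, 0, 4, 2),
    "doorway": (3, 1, 5, 3),
    "kitchen": (4, 2, 8, 6),
    "closet": (10, 10, 12, 12),
}
graph = build_region_graph(regions)
path = shortest_region_path(graph, "hall", "kitchen")
print(path)  # ['hall', 'doorway', 'kitchen']
```

Within each region a continuous planner can take over. The graph search only decides which regions to pass through, and it is this search that becomes expensive when the number of regions is very large.
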
In the case of robot movement we are likely to wind up with a very large graph of such convex regions within which the assumption of singularity free movement is correct. Because the graph is apt to be very large, finding an efficient path through the graph is apt to be prohibitive, which is apt to cause robot ground vehicles to crash because they cannot quickly figure out the path to evade an unexpected object, and makes it impractical for a robot to take a can of beer from the fridge.

We therefore use the [sybil guard algorithm] to reduce the graph by treating groups of highly connected vertices as a single vertex.

[sybil guard algorithm]:./sybil_attack.html

# Artificial Intelligence

[Gradient descent is not what makes Neural nets work] Comment by Bruce on Echo State Networks.

[Gradient descent is not what makes Neural nets work]:https://scottlocklin.wordpress.com/2012/08/02/30-open-questions-in-physics-and-astronomy/

An echo state network is your random neural network, which mixes a great pile of randomness into your actual data, to expand it into a much larger pile of data that implicitly contains all the uncorrupted original information in its very great redundancy, albeit in massively mangled form.

Then “You just fit the output layer using linear regression. You can fit it with something more complicated, but why bother; it doesn’t help.”

A generalization of “fitting the output layer using linear regression” is finding groupings, recognizing categories, in the space of very large dimension that consists of the output of the output layer.

Fitting by linear regression assumes we already have a list of instances that are known to be type specimens of the category. It assumes that the category is already defined, and that we want an efficient way of recognizing new instances as members of this category.

But living creatures can form new categories, without having them handed to them on a platter. We want to be able to discover that a group of instances belong together.

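
For concreteness, here is a small sketch of an echo state network whose output layer is fitted by linear regression. It covers only the supervised case, not the category discovery proposed above. The reservoir size, weight ranges, ridge term, washout length, and the sine wave task are all assumptions chosen to keep the example short, and the function names are hypothetical.

```python
import math
import random

def make_reservoir(n_units, rng, weight_scale=0.2):
    """Random, never trained, recurrent and input weights."""
    recurrent = [[rng.uniform(-weight_scale, weight_scale)
                  for _ in range(n_units)] for _ in range(n_units)]
    input_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    return recurrent, input_weights

def run_reservoir(recurrent, input_weights, inputs):
    """Drive the reservoir with a scalar sequence; return every state."""
    state = [0.0] * len(input_weights)
    states = []
    for u in inputs:
        state = [math.tanh(sum(w * s for w, s in zip(row, state)) + w_in * u)
                 for row, w_in in zip(recurrent, input_weights)]
        states.append(state + [1.0])  # trailing 1.0 is a bias feature
    return states

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def fit_readout(states, targets, ridge=1e-4):
    """Ridge-regularized linear regression from states to targets."""
    n = len(states[0])
    gram = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    moment = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(n)]
    return solve(gram, moment)

# Toy task (an assumption): predict the next sample of a sine wave.
rng = random.Random(1)
signal = [math.sin(0.2 * t) for t in range(401)]
recurrent, input_weights = make_reservoir(20, rng)
states = run_reservoir(recurrent, input_weights, signal[:-1])
WASHOUT = 50  # discard states that still depend on the zero initial state
train_states, train_targets = states[WASHOUT:], signal[WASHOUT + 1:]
readout = fit_readout(train_states, train_targets)
predictions = [sum(w * s for w, s in zip(readout, st)) for st in train_states]
mse = sum((p - t) ** 2
          for p, t in zip(predictions, train_targets)) / len(train_targets)
mean_t = sum(train_targets) / len(train_targets)
variance = sum((t - mean_t) ** 2 for t in train_targets) / len(train_targets)
print(f"training MSE {mse:.2e} versus target variance {variance:.2e}")
```

Only `readout` is fitted. The reservoir weights stay exactly as the random number generator produced them. The error reported is on the training sequence, so it says nothing about generalization.
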
So we generate a random neural network, identify those outputs that provide useful information identifying categories, and prune those elements of the network that do not contribute useful information identifying useful categories.

That it does not help tells me you are doing a dimensional reduction on the outputs of an echo state network.

You are generating vectors in a space of uncountably large dimension, which vectors describe probabilities, and probabilities of probabilities (Bayesian regress, probability of a probability of a frequency, to correct priors, and priors about priors), so that if two vectors are distant in your space, one is uncorrelated with the other, and if two things are close, they are correlated.

Because the space is of uncountably large dimension, the vectors are impossible to manipulate directly, so you are going to perform a random dimensional reduction on a set of such vectors to a space of manageably large dimension.

At a higher level you eventually need to distinguish the direction of causation in order to get an action network, a network that envisages action to bring the external world through a causal path to an intended state, which state has a causal correlation to *desire*, a network whose output state is *intent*, and whose goal is desire.

When the action network selects one intended state of the external world rather than another, that selection is *purpose*.

When the action network selects one causal path rather than another, that selection is *will*.

The colour red is not a wavelength, a phenomenon, but is a quale, an estimate of the likelihood that an object has a reflectance spectrum in visible light peaked in that wavelength. That estimate of probability can then be used as if it were a phenomenon in forming concepts, such as blood, which in turn can be used to form higher level concepts, as when the Old Testament says of someone who needed killing “His blood is on his own head”.

Concepts are Hegelian Neural Networks: “Neurons that fire together, wire together”

This is related to random dimensional reduction. You have a collection of vectors in a space of uncountably large dimension: documents, emails, what you see when you look in a particular direction, what you experience at a particular moment. You perform a random dimensional reduction to a space of manageable dimension, but large enough to probabilistically preserve distances and angles in the original space, typically twenty or a hundred dimensions.

By this means, you are able to calculate distances and angles in your dimensionally reduced space which approximately reflect the distances and angles in the original space, which was probably of dimension $10^{100^{100^{100}}}$. The original space consists of phenomena that occurred together, and collections of phenomena that occurred together that you have some reason for bundling into a collection, while your randomly reduced space has a dimension of the order that a child can count to in a reasonably short time.

And now you have vectors such that you can calculate the inner product and cross product on them, and perform matrix operations on them. This gives you qualia. Higher level qualia are *awareness*.

And, using this, you can restructure the original vectors, for example structuring experiences into events, structuring things in the visual field into objects, and then you can do the same process on collections of events, and collections of objects that have something in common.

Building a flying machine was very hard, until the Wright brothers said “three axis control, pitch, yaw, and roll”.

Now I have said the words “dimensional reduction of vectors in a space of uncountably large dimension, desire, purpose, intent, and will”.