Because protocols need to be changed, improved, and fixed from time to
time, it is essential to have a protocol negotiation step at the start of every networked interaction, and protocol requirements at the start of every store-and-forward communication.
But we also want anyone, anywhere, to be able to introduce new protocols without having to coordinate with everyone else, because attempts to coordinate the introduction of new protocols have ground to a halt as more and more people become involved in coordination and decision making.
The IETF is paralyzed and moribund.
So we need an address space large enough that anyone can give his protocol an identifier without fear of stepping on someone else’s identifier. But this means inefficiently long protocol identifiers, which can become painful if we do a lot of protocol negotiation, with one system asking another what protocols it supports: we may have many protocols, each in many variants, each with a long name.
So our system forms a guess as to the likelihood of a protocol, and then
sends or requests enough bits to reliably identify that protocol. But this
means it must estimate probabilities from limited data. If one’s data is
limited, priors matter, and thus a Bayesian approach is required.
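The idea of spending few bits on an expected protocol and many bits on a rare one can be sketched as a Shannon-style code length assignment. A minimal sketch in Python; the protocol names and probability estimates are hypothetical placeholders:

```python
import math

# Assign each protocol an identifier length of roughly -log2(p) bits,
# as in a Shannon code: protocols the system expects get short
# identifiers, unlikely ones get long identifiers.  The names and
# probabilities below are made-up placeholders.
estimates = {
    "common-proto/1.0": 0.5,   # expected half the time
    "rare-proto/0.3": 0.01,    # expected one time in a hundred
}

for name, p in estimates.items():
    bits = math.ceil(-math.log2(p))
    print(f"{name}: identifier of about {bits} bits")
```

The likely protocol costs a single bit on the wire, while the rare one costs seven, so negotiation is cheap exactly when the guess is good.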
# Bayesian Prior
The Bayesian prior is the probability of a probability, or, if this recursion
is philosophically troubling, the probability of a frequency. We have an
urn containing a very large number of samples, from which we have taken
few or no samples. What proportion of samples in the urn will be
discovered to have property X?
Let our prior estimate of the probability that the proportion of samples in the urn that are X is ρ be $Ρ_{prior}(ρ)$.
This is the probability of a probability. The probability that a given sample is X is then the sum, over all values of ρ, of ρ weighted by its prior probability.
Then our estimate of the chance $P_X$ that the first sample will be X is
$$P_X = \int_0^1 ρ\,Ρ_{prior}(ρ)\,dρ$$
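For instance, a uniform prior, which assumes nothing about the contents of the urn, gives

$$Ρ_{prior}(ρ)=1, \qquad P_X=\int_0^1 ρ\,dρ=\frac{1}{2}$$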
Suppose our prior gives half its weight to the possibility that every sample in the urn is X, and spreads the other half uniformly over [0, 1]:
$$Ρ_{prior}(ρ)=\frac{1}{2}+\frac{1}{2}δ(1−ρ)$$
so that our estimate that the first sample will be X is $\frac{3}{4}$.
Then if we take one sample out of the urn, and it is indeed X, Bayes’ rule multiplies the prior by ρ and renormalizes, giving the posterior
$$\frac{2}{3}ρ+\frac{2}{3}δ(1−ρ)$$
We see that with each sample found to be X, the beta distributed part of the probability distribution keeps getting smaller, and the delta distributed part keeps getting larger.
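These updates can be checked with exact arithmetic. The sketch below assumes a prior that is half uniform on [0, 1] and half a point mass at ρ = 1, the mixture that reproduces the estimates 8/9 and 15/16 in this section:

```python
from fractions import Fraction

# Posterior after n samples, all of them X, for the assumed prior
# (1/2) + (1/2)*delta(1 - rho).  The continuous part of the posterior
# is proportional to rho**n; we return (a, b) such that the posterior
# is a*rho**n + b*delta(1 - rho).
def posterior_after(n):
    cont = Fraction(1, 2 * (n + 1))   # integral of (1/2)*rho**n
    point = Fraction(1, 2)            # weight kept by the point mass
    z = cont + point                  # normalizing constant
    return Fraction(1, 2) / z, point / z

# Estimate that sample n+1 is also X: E[rho] under the posterior.
def p_next_is_x(n):
    a, b = posterior_after(n)
    return a * Fraction(1, n + 2) + b   # a * integral(rho**(n+1)) + b

print(p_next_is_x(0))  # 3/4
print(p_next_is_x(1))  # 8/9
print(p_next_is_x(2))  # 15/16
```

The delta weight returned by `posterior_after(n)` is (n+1)/(n+2), which is the probability that the run continues forever.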
And our estimate that the second sample will also be X is
$$\frac{8}{9}$$
After two samples, n = 2, our posterior distribution is
$$\frac{3}{4}ρ^2+\frac{3}{4}δ(1−ρ)$$
in which the beta distributed part carries total probability $\frac{1}{4}$ and the delta distributed part $\frac{3}{4}$.
And our estimate that the third sample will also be X is
$$\int_0^1 ρ\,\frac{3}{4}ρ^2\,dρ+\frac{3}{4}=\frac{3}{16}+\frac{12}{16}=\frac{15}{16}$$
By induction, after n samples, all of them members of category X, our posterior distribution is
$$\frac{n+1}{n+2}ρ^n+\frac{n+1}{n+2}δ(1−ρ)$$
and our new estimate for one more sample is
$$1-(n+2)^{-2}=\frac{(n+3)×(n+1)}{(n+2)^2}$$
Our estimate that the run will continue forever is
$$\frac{(n+1)}{n+2}$$
This corresponds to our intuition on the question “all men are mortal”: if we find no immortals in one hundred men, we think it highly improbable that we will encounter any immortals in a billion men.
In contrast, if we assume a pure beta distribution prior, with no delta part at ρ = 1, the estimated probability that the run continues forever is zero.
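The contrast can be made concrete with exact arithmetic. A sketch, again assuming the half-uniform, half-point-mass prior for the mixture case: after n straight X samples, the probability that the next m samples are also all X.

```python
from fractions import Fraction

# Pure uniform (beta) prior: after n successes the posterior is
# Beta(n+1, 1), and the chance of m further successes is
# (n+1)/(n+m+1), which goes to zero as m grows.
def run_uniform(n, m):
    return Fraction(n + 1, n + m + 1)

# Mixture prior (1/2) + (1/2)*delta(1 - rho): the chance of m further
# successes is P(n+m successes) / P(n successes) under the prior,
# which tends to (n+1)/(n+2) as m grows.
def run_mixture(n, m):
    num = Fraction(1, 2 * (n + m + 1)) + Fraction(1, 2)
    den = Fraction(1, 2 * (n + 1)) + Fraction(1, 2)
    return num / den

n = 100
print(float(run_uniform(n, 10**9)))   # essentially zero
print(float(run_mixture(n, 10**9)))   # close to 101/102
```

With no immortals among a hundred men, the beta prior says a billion more mortals are nearly certain to contain an immortal somewhere, while the mixture prior says the run continues with probability near 101/102.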