Satyadev Nandakumar
Wed Sep 2 08:08:40 2015
This is a brief summary of Shannon's view of a bit and Turing's view of a ban. Both were conceived as units of information, though the latter is now obscure.
The name "bit" was suggested by John Tukey to Shannon (see Poundstone Ch.2). Tukey thought of a bit as a "binary digit". This is what we normally think of a bit as. Even though it is equivalent to Shannon's notion, Shannon had a radically different definition.
For Shannon, a bit was the minimum amount of information needed to distinguish between two equally likely outcomes.
So if you have two independent events, each with two equally likely outcomes, you need two bits to distinguish among the four possible outcomes of the joint experiment. With three such events there are eight possible outcomes, requiring three bits, and so on.
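To make the counting concrete, here is a minimal Python sketch (the illustration is mine): n independent fair binary events yield 2^n equally likely outcomes, and the number of bits needed to tell them apart is log2 of that count.

    import math
    from itertools import product

    # Enumerate the joint outcomes of n independent fair binary events.
    # Distinguishing them requires log2(number of outcomes) bits.
    for n in range(1, 4):
        outcomes = list(product([0, 1], repeat=n))  # 2**n equally likely outcomes
        bits = math.log2(len(outcomes))
        print(f"{n} events: {len(outcomes)} outcomes, {bits:.0f} bits")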
It emerges that "the minimum amount of information" is the base-2 logarithm of the reciprocal of the probability of the combined event, now called the entropy of the event. The number of bits required is thus the entropy. This view of the definition of a bit also makes Shannon entropy less mysterious, and explains why low-probability events must have greater entropy.
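In the same spirit, a small sketch of the formula itself: log2(1/p) gives the bits needed to single out an event of probability p, and averaging that quantity over a distribution gives its Shannon entropy. The function names below are mine, for illustration.

    import math

    def info_content(p):
        """Bits needed to single out an outcome of probability p: log2(1/p)."""
        return math.log2(1 / p)

    def entropy(dist):
        """Shannon entropy: the expected information content of the distribution."""
        return sum(p * info_content(p) for p in dist if p > 0)

    # One outcome of three independent fair binary events has probability 1/8,
    # so singling it out takes log2(8) = 3 bits, matching the counting above.
    print(info_content(1 / 8))    # 3.0
    print(entropy([0.5, 0.5]))    # 1.0 bit for a fair coin
    print(entropy([0.9, 0.1]))    # roughly 0.47 bits for a heavily biased coin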
Thinking along these lines, it is clear that Shannon's bit is a unit of information grounded in frequentist probability.
It is less well known that Turing had a Bayesian unit of information, which he termed the ban. The concept is explained in his paper "The Applications of Probability to Cryptography".
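For comparison, a rough sketch of the idea behind the ban, under the standard reading of Turing's unit: the weight of evidence an observation gives a hypothesis is log10 of its Bayes factor, and one ban multiplies the prior odds by a factor of 10. The function below is mine, for illustration only.

    import math

    def bans(likelihood_h, likelihood_alt):
        """Weight of evidence in bans: log10 of the Bayes factor for H."""
        return math.log10(likelihood_h / likelihood_alt)

    # An observation ten times likelier under H than under the alternative
    # contributes one ban of evidence in favour of H.
    evidence = bans(0.5, 0.05)
    print(evidence)                         # 1.0

    # Evidence in bans adds up; equivalently, each ban scales the odds by 10.
    prior_odds = 1 / 100
    posterior_odds = prior_odds * 10 ** evidence
    print(posterior_odds)                   # 0.1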