Information Theory

- The sum of the faces of two normal dice (six sided) when thrown is 7.
How much information does this fact supply us with?

Note: Outcomes such as (6,1) and (1,6) are to be considered as being different. - Of a group of students, 25% are not suitable for the University of
Victoria.
As the result of a selection, however, only 75% of these unsuitable students
are rejected.
50% of all students are rejected.

(a) How much information does one receive if a student, who knows that he is not suitable for UVic, hears the results of the selection?

(b) Answer the previous question if the selection is determined by tossing a coin.

(c) Compare the above results and give an explanation for the difference. - Given a discrete random variable X with values x
_{1}, ..., x_{M}, define a random variable Y by Y = f(X), where f is an arbitrary function. Prove that H(Y) ≤ H(X). Interpret this result. Under what conditions on the function f will there be equality? - A roulette wheel is subdivided into 38 numbered compartments of
various colours. The distribution of the compartments according to colour
is 2 green, 18 red and 18 black.
The experiment consists of throwing a small ball onto the rotating roulette
wheel. The event, that the ball comes to rest in one of the 38 compartments,
is equally probable for each compartment.

(a) How much information is received if the colour is revealed?

(b) How much information is received if both the colour and number are revealed?

(c) If the color is known, what is the remaining uncertainty about the number? - Show that ln(x) ≥(x-1)/x with equality only when x=1.
- A discrete memoryless source has symbols
x
_{1}, ..., x_{9}with p(x_{1}) = 1/2. Determine the upper and lower bounds on the entropy of this source. - Consider two binary random variables X and Y with joint probability distribution

p(0,0) = 1/2, p(0,1) = 1/4, p(1,0) = 1/8, and p(1,1) = 1/8.

Find the values of H(X), H(Y), H(X|Y), H(Y|X), H(XY), and I(X;Y).

- Let XYZ be a discrete joint ensemble of three random variables. Prove the validity of the following
inequalities and if true find conditions for equality:

(a) I(XY;Z) ≥ I(X;Z).

(b) H(XY|Z) ≥ H(X|Z).

(c) I(X;Z|Y) ≥ I(Z;Y|X) - I(Z;Y) + I(X;Z).

(d) H(XYZ) - H(XY) ≤ H(XZ) - H(X). - Relative entropy

(a) Consider a random variable X with three possible outcomes {a, b, c}, and two distributions on X:

Calculate H(p), H(q), D(p||q), and D(q||p), and verify in this case that D(p||q) ≠ D(q||p).symbol p(x) q(x) a 3/4 1/3 b 1/8 1/3 c 1/8 1/3

(b) Part (a) shows that D(p||q) ≠ D(q||p) in general, but there could be distributions for which equality holds. Provide an example of distributions p(x) and q(x) on a binary alphabet such that D(p||q) = D(q||p) (other than the trivial case p(x) = q(x) for which D(p||q) = D(q||p) = 0). - Course Notes Problem 1.2.
- Course Notes Problem 1.5.
- Prove that H(X
_{1}X_{2}) + H(X_{2}X_{3}) + H(X_{1}X_{3}) ≥ 2H(X_{1}X_{2}X_{3}). - Consider the ensemble of sequences of N binary digits, x
_{1}, x_{2}, ..., x_{N}. The sequences with an even number of 1's are equiprobable with probability 2^{-N+1}. The sequences with an odd number of 1's have probability zero. Find the average mutual informations I(X_{2};X_{1}), I(X_{3};X_{2}|X_{1}), ..., I(X_{N};X_{N-1}|X_{1}... X_{N-2}). - The binary erasure channel is a channel
with a simple type of noise in that transmitted bits
can be erased but cannot be received in error.
The channel input is a binary random variable X and the channel output is
a ternary random variable Y.
Let the probability of x=0 be w so the probability of x=1 is 1-w.
The three possible channel outputs are {0,e,1} where e denotes an erasure.
The probability of an erasure is p so the probability of correct reception of a symbol is 1-p.

(a) Calculate (I(X;Y) in terms of w and p.

(b) Find the value of w that maximizes I(X;Y).

- A binary source produces statistically independent bits
with p(x
_{1}) = 0.005 and p(x_{2}) = 0.995, where x_{1}denotes a 1 and x_{2}denotes a 0. These bits are taken 100 at a time and a binary codeword is provided only for every sequence of 100 bits containing 3 or fewer 1's.

(a) If the codewords all have the same length, find the smallest length required for the set of codewords.

(b) What is the probability of obtaining a source sequence for which no codeword exists?

(c) Using the Tchebycheff Inequality, provide a bound on the probability of obtaining a sequence for which no codeword is available (Equation (2.6) in the course notes may be useful). Compare this result with that in part (b). - Determine which of the following codes is uniquely decodable

(a) {0,10,11}

(b) {0,01,11}

(c) {0,01,10}

(d) {0,01}

- The source coding theorem says that for a source X with entropy H(X), it is possible to assign a binary codeword to each source symbol so that a prefix code of average length L(C) < H(X) +1 is generated. Show by example that this theorem cannot be improved upon, i.e. for any ε>0, find a source for which L(C) ≥ H(X)+1-ε.
- Consider two discrete memoryless sources.
Source 1 has six symbols with probabilities 0.3, 0.2, 0.16, 0.14, 0.1 and 0.1.
Source 2 has seven symbols with probabilities 0.3, 0.25, 0.15, 0.1, 0.1, 0.07 and 0.03.

(a) Construct a binary Huffman code and a ternary Huffman code for each source. Find the efficiency of each code. Are these codes unique in terms of the codeword lengths? - A source has five symbols with the following probabilities p(a)=0.3, p(b)=0.2, p(c)=0.2, p(d)=0.15 and p(e)=0.15. These symbols are coded into binary codewords for use on a noiseless channel. It takes 1 second to transmit a 0 and 3 seconds to transmit a 1. Find a code satisfying the prefix condition that minimizes the average time required to transmit a source symbol and calculate the minimum average time.
- For Source 1 in Problem 5, construct an optimal binary code (with minimum average codeword length) that satisfies the suffix condition. The suffix condition is that no codeword is the suffix of another codeword. Show that a code which satisfies the suffix condition is always uniquely decodable. Show that the minimum average codeword length for a code satisfying the suffix condition for a given source is the same as the average codeword length for the corresponding Huffman code.
- Let h(p) = -plog
_{2}p - (1-p)log_{2}(1-p) be the binary entropy function. Calculate the average entropy when the probability p is uniformly distributed in the range 0≤ p ≤ 1.