Information Theory

- The sum of the faces of two normal dice (six sided) when thrown is 7.
How much information does this fact supply us with?

Note: Outcomes such as (6,1) and (1,6) are to be considered as being different. - Of a group of students, 25% are not suitable for the University of
Victoria.
As the result of a selection, however, only 75% of these unsuitable students
are rejected.
50% of all students are rejected.

(a) How much information does one receive if a student, who knows that he is not suitable for UVic, hears the results of the selection?

(b) Answer the previous question if the selection is determined by tossing a coin.

(c) Compare the above results and give an explanation for the difference. - Given a discrete random variable X with values x
_{1}, ..., x_{M}, define a random variable Y by Y = f(X), where f is an arbitrary function. Prove that H(Y) ≤ H(X). Interpret this result. Under what conditions on the function f will there be equality? - A roulette wheel is subdivided into 38 numbered compartments of
various colours. The distribution of the compartments according to colour
is 2 green, 18 red and 18 black.
The experiment consists of throwing a small ball onto the rotating roulette
wheel. The event, that the ball comes to rest in one of the 38 compartments,
is equally probable for each compartment.

(a) How much information is received if the colour is revealed?

(b) How much information is received if both the colour and number are revealed?

(c) If the color is known, what is the remaining uncertainty about the number? - Show that ln(x) ≥(x-1)/x with equality only when x=1.
- A discrete memoryless source has symbols
x
_{1}, ..., x_{9}with p(x_{1}) = 1/2. Determine the upper and lower bounds on the entropy of this source. - Consider two binary random variables X and Y with joint probability distribution

p(0,0) = 1/2, p(0,1) = 1/4, p(1,0) = 1/8, and p(1,1) = 1/8.

Find the values of H(X), H(Y), H(X|Y), H(Y|X), H(XY), and I(X;Y).

- Let XYZ be a discrete joint ensemble of three random variables. Prove the validity of the following
inequalities and if true find conditions for equality:

(a) I(XY;Z) ≥ I(X;Z).

(b) H(XY|Z) ≥ H(X|Z).

(c) I(X;Z|Y) ≥ I(Z;Y|X) - I(Z;Y) + I(X;Z).

(d) H(XYZ) - H(XY) ≤ H(XZ) - H(X). - Relative entropy

(a) Consider a random variable X with three possible outcomes {a, b, c}, and two distributions on X:

Calculate H(p), H(q), D(p||q), and D(q||p), and verify in this case that D(p||q) ≠ D(q||p).symbol p(x) q(x) a 3/4 1/3 b 1/8 1/3 c 1/8 1/3

(b) Part (a) shows that D(p||q) ≠ D(q||p) in general, but there could be distributions for which equality holds. Provide an example of distributions p(x) and q(x) on a binary alphabet such that D(p||q) = D(q||p) (other than the trivial case p(x) = q(x) for which D(p||q) = D(q||p) = 0). - Course Notes Problem 1.2.
- Course Notes Problem 1.5.
- Prove that H(X
_{1}X_{2}) + H(X_{2}X_{3}) + H(X_{1}X_{3}) ≥ 2H(X_{1}X_{2}X_{3}). - Consider the ensemble of sequences of N binary digits, x
_{1}, x_{2}, ..., x_{N}. The sequences with an even number of 1's are equiprobable with probability 2^{-N+1}. The sequences with an odd number of 1's have probability zero. Find the average mutual informations I(X_{2};X_{1}), I(X_{3};X_{2}|X_{1}), ..., I(X_{N};X_{N-1}|X_{1}... X_{N-2}). - The binary erasure channel is a channel
with a simple type of noise in that transmitted bits
can be erased but cannot be received in error.
The channel input is a binary random variable X and the channel output is
a ternary random variable Y.
Let the probability of x=0 be w so the probability of x=1 is 1-w.
The three possible channel outputs are {0,e,1} where e denotes an erasure.
The probability of an erasure is p so the probability of correct reception of a symbol is 1-p.

(a) Calculate (I(X;Y) in terms of w and p.

(b) Find the value of w that maximizes I(X;Y).

- A binary source produces statistically independent bits
with p(x
_{1}) = 0.005 and p(x_{2}) = 0.995, where x_{1}denotes a 1 and x_{2}denotes a 0. These bits are taken 100 at a time and a binary codeword is provided only for every sequence of 100 bits containing 3 or fewer 1's.

(a) If the codewords all have the same length, find the smallest length required for the set of codewords.

(b) What is the probability of obtaining a source sequence for which no codeword exists?

(c) Using the Tchebycheff Inequality, provide a bound on the probability of obtaining a sequence for which no codeword is available (Equation (2.6) in the course notes may be useful). Compare this result with that in part (b). - Determine which of the following codes is uniquely decodable

(a) {0,10,11}

(b) {0,01,11}

(c) {0,01,10}

(d) {0,01}

- The source coding theorem says that for a source X with entropy H(X), it is possible to assign a binary codeword to each source symbol so that a prefix code of average length L(C) < H(X) +1 is generated. Show by example that this theorem cannot be improved upon, i.e. for any ε>0, find a source for which L(C) ≥ H(X)+1-ε.
- Consider two discrete memoryless sources.
Source 1 has six symbols with probabilities 0.3, 0.2, 0.16, 0.14, 0.1 and 0.1.
Source 2 has seven symbols with probabilities 0.3, 0.25, 0.15, 0.1, 0.1, 0.07 and 0.03.

(a) Construct a binary Huffman code and a ternary Huffman code for each source. Find the efficiency of each code. Are these codes unique in terms of the codeword lengths? - A source has five symbols with the following probabilities p(a)=0.3, p(b)=0.2, p(c)=0.2, p(d)=0.15 and p(e)=0.15. These symbols are coded into binary codewords for use on a noiseless channel. It takes 1 second to transmit a 0 and 3 seconds to transmit a 1. Find a code satisfying the prefix condition that minimizes the average time required to transmit a source symbol and calculate the minimum average time.
- For Source 1 in Problem 5, construct an optimal binary code (with minimum average codeword length) that satisfies the suffix condition. The suffix condition is that no codeword is the suffix of another codeword. Show that a code which satisfies the suffix condition is always uniquely decodable. Show that the minimum average codeword length for a code satisfying the suffix condition for a given source is the same as the average codeword length for the corresponding Huffman code.
- Let h(p) = -plog
_{2}p - (1-p)log_{2}(1-p) be the binary entropy function. Calculate the average entropy when the probability p is uniformly distributed in the range 0≤ p ≤ 1.

- Consider a random variable X with five possible outcomes {a, b, c, d, e},
and two distributions on X:
symbol p(x) q(x) a 1/2 1/2 b 1/4 1/8 c 1/8 1/8 d 1/16 1/8 e 1/16 1/8

(a) Determine the average codeword lengths for the Huffman codes corresponding to these distributions.

(b) Show that the relative entropy D(p||q) is the increased redundancy if the code for q(X) is used when the true distribution is p(X). - Consider a source with four symbols.
The symbol probabilities and two binary codes for this source are given below:

Symbol Probability Code 1 Code 2 x _{1}0.4 1 1 x _{2}0.3 01 10 x _{3}0.2 001 100 x _{4}0.1 000 1000

For each code, answer the following questions

(a) Is it a prefix code?

(b) Is it a uniquely decodable code?

(c) What is the mutual information provided about the source symbol being x_{1}given that the first bit of the codeword is 1?

(d) What is the average mutual information provided about the source symbol given the first bit of the codeword? Explain the role of the first bit in the codewords of Code 2. - A source has nine symbols x
_{1}to x_{9}with probabilities 1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32 and 1/32. This source is to be coded using three symbols a, b and c.

(a) Determine the code and the efficiency of the code using the methods of Fano, Huffman and Tunstall. Can you suggest an improvement to the Fano algorithm to improve the efficiency?

(b) Determine a uniquely decodable code and the efficiency of the code for the case when two consecutive symbols in the stream of codewords cannot be aa, bb or cc. This code should have the highest possible code efficiency. Compare with the results in part (a). - Consider a source with probabilities p
_{1}≥ p_{2}≥ p_{3}≥ ... ≥ p_{N}.

(a) Prove that if p_{1}> 2/5, the corresponding codeword in a binary Huffman code has length 1.

(b) Prove that if p_{1}< 1/3, the corresponding codeword in a binary Huffman code has length ≥ 2. - A source has symbols x
_{1}, x_{2}, x_{3}and x_{4}with probabilities 0.5, 0.3, 0.15 and 0.05, respectively. This source is to be encoded using an arithmetic code with the intervals assigned so that the larger probabilities are on the left. Find the shortest codeword corresponding to x_{2}x_{1}x_{1}x_{4}x_{3}x_{1}x_{3}Compare the efficiency of this code with that of the corresponding Huffman code. - A source has symbols x
_{1}, x_{2}and x_{3}with probabilities 0.8, 0.02 and 0.18, respectively. Decode the arithmetic code sequence 110001 for these probabilities. Assume the symbols are arranged in numeric order (x_{1},x_{2},x_{3}) in the range [0,1) and the number of encoded symbols is 7. Compare the length of this codeword with the expected codeword length. - Using the Lempel-Ziv algorithm in the course notes, encode the sequence 01101110011010101111000111

- A DMC has transition probability matrix

| .985 .005 |

| .010 .010 |

| .005 .985 |

(a) Determine the capacity of this channel.

(b) The middle row of the transition probability matrix can be considered the probabilities of an erasure. Suppose an erasure is equally likely to be correct or in error. What is the capacity of the corresponding channel without erasures?

(c) Suppose that the errors are all erasures. What is the capacity of the corresponding channel?

(d) compare the capacities of the three channels. - Course Notes Problem 3.3, but use N=15.
- Course Notes Problem 3.8
- Course Notes Problem 3.9
- Find the channel capacity of the additive noise DMC defined by y=x+z where X and Y are the input and output random variables, respectively, Assume p(z=0) = p(z=a) = 1/2, and Z is independent of X. The input is binary (X={0,1}) and a can be any real number.
- Consider a keyboard with 26 keys A,B, ..., Z.

(a) If pushing a key results in printing the associated letter, what is the capacity C in bits?

(b) Suppose that pushing a key results in printing the associated letter or the next letter with equal probability so that A -> A or B, ..., Z -> Z or A. What is the capacity?

(c) What is the highest rate code with blocklength one that you can find that achieves zero probability of error for the channel in part (b)?