Information Theory

- Consider two binary random variables X and Y with joint probability distribution p(x,y) given by

p(0,0) = 1/2, p(0,1) = 1/4, p(1,0) = 0, and p(1,1) = 1/4.

Find the values of H(X), H(Y), H(X|Y), H(Y|X), H(XY), and I(X;Y).

- A discrete memoryless source has symbols x_{1}, ..., x_{6} with p(x_{1}) = 1/2 and p(x_{2}) = 1/4. Determine the upper and lower bounds on the entropy of this source.

- A jar contains 5 black balls and 10 white balls.
Experiment X involves randomly drawing a ball out of the jar.
Experiment Y involves randomly drawing a ball with the ball drawn in experiment X not replaced in the jar.
One is interested in the colour of the drawn ball.

(a) How much uncertainty does experiment X contain?

(b) What is the uncertainty in experiment Y given that the first ball is black?

(c) What is the uncertainty in experiment Y given that the first ball is white?

(d) How much uncertainty does experiment Y contain?

- Let X be a random variable whose entropy H(X) is 8 bits. Suppose that Y(X) is a deterministic function that takes on a different value for each value of X.

(a) What is H(Y)?

(b) What is H(Y|X)?

(c) What is H(X|Y)?

(d) What is I(X;Y)?

(e) Suppose now that the deterministic function Y(X) is not invertible, so that different values of X may correspond to the same value of Y(X). In this case, what can be said about H(Y), and also about H(X|Y)?

- A Division Series is a five-game baseball series held in October that ends as soon as a team wins 3 games. Define a random variable X that represents the outcome of the series between two teams A and B. Three possible values of X are BBB, ABAA, and BBAAA (many more outcomes can occur). Let Y be the number of games played, which ranges from 3 to 5. Assuming teams A and B are equally matched and that the games are independent, calculate H(X), H(Y), H(Y|X) and H(X|Y).
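A brute-force way to obtain the distributions involved here is to enumerate every possible series outcome; a minimal sketch in Python (assuming, as stated, fair and independent games):

```python
from math import log2
from collections import Counter

def series_outcomes(wins_needed=3):
    # Enumerate every way the series can unfold, game by game,
    # stopping as soon as one team reaches `wins_needed` wins.
    outcomes = {}
    def extend(seq, a, b, prob):
        if a == wins_needed or b == wins_needed:
            outcomes[seq] = prob
            return
        extend(seq + "A", a + 1, b, prob / 2)
        extend(seq + "B", a, b + 1, prob / 2)
    extend("", 0, 0, 1.0)
    return outcomes

X = series_outcomes()
H_X = -sum(p * log2(p) for p in X.values())

# Y (the number of games) is a deterministic function of X, so H(Y|X) = 0
pY = Counter()
for seq, p in X.items():
    pY[len(seq)] += p
H_Y = -sum(p * log2(p) for p in pY.values())
```

The enumeration also makes the support explicit: 2 outcomes of length 3, 6 of length 4, and 12 of length 5.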
- In a country, 25% of the people are blond and 75% of all blond people have blue eyes.
In addition, 50% of the people have blue eyes.
How much information is received in each of the following cases:

(a) if we know that a person is blond and we are told the colour (blue/not blue) of their eyes,

(b) if we know that a person has blue eyes and we are told the colour (blond/not blond) of their hair,

(c) if we are told both the colour of their hair and that of their eyes.
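The quantities asked for are entropies of the joint hair/eye distribution; a minimal sketch of the computation, with the joint table derived from the stated percentages:

```python
from math import log2

# Joint distribution over (hair, eyes), built from the problem:
# P(blond) = 0.25, P(blue | blond) = 0.75, P(blue) = 0.5
p = {("blond", "blue"):     0.25 * 0.75,          # 0.1875
     ("blond", "not blue"): 0.25 * 0.25,          # 0.0625
     ("other", "blue"):     0.50 - 0.25 * 0.75,   # 0.3125
     ("other", "not blue"): 0.50 - 0.25 * 0.25}   # 0.4375

def H(dist):
    # Entropy in bits of a list of probabilities
    return -sum(q * log2(q) for q in dist if q > 0)

# (a) uncertainty of eye colour given the person is blond
p_blond = sum(q for (h, e), q in p.items() if h == "blond")
H_eyes_given_blond = H([q / p_blond for (h, e), q in p.items() if h == "blond"])

# (b) uncertainty of hair colour given the eyes are blue
p_blue = sum(q for (h, e), q in p.items() if e == "blue")
H_hair_given_blue = H([q / p_blue for (h, e), q in p.items() if e == "blue"])

# (c) uncertainty of the joint outcome
H_joint = H(list(p.values()))
```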

- Let X and Y be random variables which are identically distributed but
not necessarily independent.
Define the correlation between X and Y as ρ = 1 - H(Y|X)/H(X).

(a) Show that ρ = I(X;Y)/H(X).

(b) Show that 0 ≤ ρ ≤ 1.

(c) When is ρ = 0?

(d) When is ρ = 1?

- Let XYZ be a discrete joint ensemble of three random variables. Prove the validity of the following inequalities and, if true, find the conditions for equality:

(a) I(XY;Z) ≥ I(Y;Z).

(b) H(XY|Z) ≥ H(Y|Z).

(c) I(X;Z|Y) ≥ I(Y;Z|X) + I(Z;X) - I(Y;Z).

(d) H(XYZ) - H(YZ) ≤ H(XY) - H(Y).

- The following equiprobable messages are used to send data over a binary symmetric channel with crossover probability p:

M_{1} = 0000  M_{2} = 0011  M_{3} = 0101  M_{4} = 0110

M_{5} = 1001  M_{6} = 1010  M_{7} = 1100  M_{8} = 1111

If the sequence Y=0000 is received,

(a) how much information is provided about M_{1} by the first digit received?

(b) how much additional information about M_{1} is provided by the second, then the third, and finally the fourth received digit?

- Course Notes Problem 1.3
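A sketch of the Bayesian bookkeeping behind the message-set problem above: each received digit multiplies P(M_{1}) by a likelihood ratio, and the information it provides is the log of that ratio. The crossover probability is left symbolic in the problem, so p = 0.1 below is an assumed value purely for illustration.

```python
from math import log2

p = 0.1  # assumed crossover probability, for illustration only
msgs = ["0000", "0011", "0101", "0110", "1001", "1010", "1100", "1111"]
received = "0000"

prior = {m: 1 / 8 for m in msgs}
info_steps = []
for i, y in enumerate(received):
    # Likelihood of receiving digit y given each message's i-th digit
    like = {m: (1 - p) if m[i] == y else p for m in msgs}
    z = sum(prior[m] * like[m] for m in msgs)
    post = {m: prior[m] * like[m] / z for m in msgs}
    # Information about M1 provided by this digit (in bits)
    info_steps.append(log2(post["0000"] / prior["0000"]))
    prior = post
```

For the first digit this reproduces the closed form log₂(2(1-p)), since half the messages begin with 0.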
- Course Notes Problem 1.7
- Relative entropy

(a) Consider a random variable X with three possible outcomes {a, b, c}, and two distributions on X:

symbol  p(x)  q(x)
a       1/2   1/3
b       1/4   1/3
c       1/4   1/3

Calculate H(p), H(q), D(p||q), and D(q||p), and verify in this case that D(p||q) ≠ D(q||p).

(b) Part (a) shows that D(p||q) ≠ D(q||p) in general, but there could be distributions for which equality holds. Provide an example of distributions p(x) and q(x) on a binary alphabet such that D(p||q) = D(q||p) (other than the trivial case p(x) = q(x) for which D(p||q) = D(q||p) = 0).
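Part (a) can be sanity-checked numerically; a minimal sketch:

```python
from math import log2

# Distributions from part (a)
p = {"a": 1/2, "b": 1/4, "c": 1/4}
q = {"a": 1/3, "b": 1/3, "c": 1/3}

def H(d):
    # Entropy in bits
    return -sum(v * log2(v) for v in d.values() if v > 0)

def D(p, q):
    # Relative entropy D(p||q) in bits
    return sum(p[s] * log2(p[s] / q[s]) for s in p if p[s] > 0)

Hp, Hq = H(p), H(q)
Dpq, Dqp = D(p, q), D(q, p)
```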

- A source S has six symbols with probabilities p_{1} to p_{6}. The probabilities p_{i} are ordered such that p_{1} = p_{2} = p_{3} > p_{4} > p_{5} = p_{6}. Construct a binary Huffman code for this source.

- A source has 9 different symbols u_{1} to u_{9} with respective probabilities 1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32 and 1/32. This source must be coded using an alphabet of three symbols a, b and c.

(a) Determine the code and the efficiency of the code using the methods of Fano, Shannon, Huffman and Tunstall. Compare these efficiencies.

(b) Determine a code and the efficiency of the code for the case when the symbol c may never be followed by another c in the encoder output. This must be a prefix code, and have the highest possible efficiency.

- A source outputs symbols x_{1} to x_{7} with probabilities p(x_{1}) = p(x_{2}) = 1/3, p(x_{3}) = p(x_{4}) = 1/9, and p(x_{5}) = p(x_{6}) = p(x_{7}) = 1/27.

(a) Determine a binary Huffman code for this source and find the efficiency.

(b) Determine a ternary Huffman code for this source and find the efficiency.

(c) Which code is preferred based on the efficiency?

(d) If it costs $1.80 to send a symbol over a binary channel and $2.70 to send a symbol over a ternary channel, which code is preferred based on cost?

- Is the length of the binary Huffman codeword for a symbol x_{k} with probability p(x_{k}) always less than or equal to l_{k} = ⌈-log_{2} p(x_{k})⌉?

- A source generates zeros and ones with probabilities p(0) = 0.8 and p(1) = 0.2. The sequence of symbols is encoded with the code symbols a, b, c, d and e according to the following table.

message  code symbol
1        a
01       b
001      c
0001     d
0000     e

(a) Is this code uniquely decodable? Is it a prefix code?

(b) What is the average amount of information obtained per code symbol?

(c) What is the efficiency of this code?

- Which of the following codes cannot be a binary Huffman code for any probability distribution?

(a) C_{1} = {0, 10, 11}

(b) C_{2} = {00, 01, 10, 110}

(c) C_{3} = {01, 10}

(d) C_{4} = {1, 10, 100, 1000, 0000}

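Several of the preceding problems ask for Huffman codes; a minimal binary construction sketch (it tracks only the codeword lengths, which are all that the average length and efficiency depend on; the probabilities used are those of the seven-symbol source above):

```python
import heapq
from math import log2

def huffman_lengths(probs):
    # Binary Huffman construction: repeatedly merge the two least
    # probable subtrees; every merge adds one bit to the lengths of
    # all symbols inside the merged subtrees.
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = [1/3, 1/3, 1/9, 1/9, 1/27, 1/27, 1/27]
L = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, L))
H_source = -sum(p * log2(p) for p in probs)
```

The resulting lengths satisfy the Kraft inequality with equality (the code tree is full), and the average length lies between H and H + 1.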
- For a codeword alphabet J, let p(X) be the true pdf with compact code C and q(X) be the estimated pdf with compact code Ĉ. Show that

(H(p(X)) + D(p(X)||q(X))) / log_{b} J ≤ L(Ĉ) < (H(p(X)) + D(p(X)||q(X))) / log_{b} J + 1.

- A source has symbols x_{1}, x_{2} and x_{3} with probabilities 0.7, 0.2 and 0.1, respectively. Encode this source using an arithmetic code with the intervals assigned so that the larger probabilities are on the left. Find the smallest possible binary codeword corresponding to the symbol sequence x_{1}x_{1}x_{2}x_{3}x_{2}x_{1}x_{3}x_{1}. Compare this with the expected codeword length based on the entropy.

- A source has symbols x_{1}, x_{2} and x_{3} with probabilities 0.2, 0.3 and 0.5, respectively. Decode the arithmetic code sequence .63215699 for these probabilities. Assume the symbols are arranged in numeric order (x_{1}, x_{2}, x_{3}) in the range [0,1), and the number of encoded symbols is 16.

- Using the Lempel-Ziv algorithm in the course notes, encode the sequence 000101110010100101
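The exact output depends on the course-notes variant of Lempel-Ziv, but the first stage of any LZ78-style encoder is parsing the sequence into previously unseen phrases; a sketch of that stage only (the phrase-index encoding step is omitted):

```python
def lz_parse(s):
    # Parse s into phrases, each being the shortest prefix of the
    # remaining input that has not appeared as a phrase before.
    # (The final phrase may repeat an earlier one if input runs out.)
    phrases, seen = [], set()
    i = 0
    while i < len(s):
        j = i + 1
        while j < len(s) and s[i:j] in seen:
            j += 1
        phrases.append(s[i:j])
        seen.add(s[i:j])
        i = j
    return phrases

parse = lz_parse("000101110010100101")
```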
- A discrete memoryless channel (DMC) has transition probability matrix

| .978 .002 |
| .020 .020 |
| .002 .978 |

(a) Determine the capacity of this channel.

(b) The middle row of the transition probability matrix can be considered the probabilities of an erasure. Suppose an erasure is equally likely to be correct or in error. What is the capacity of the corresponding channel without erasures?

(c) Suppose that the errors are all erasures. What is the capacity of the corresponding channel?

(d) Compare the capacities of the three channels.

- Course Notes Problem 3.2
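For the first DMC problem above, the capacity can be evaluated directly: the channel is symmetric in its two inputs, so the uniform input distribution achieves capacity and C = I(X;Y) under p(x) = (1/2, 1/2). A sketch (note the problem lists outputs as rows, so the matrix is transposed here into p(y|x) form):

```python
from math import log2

# W[x][y] = p(y|x): 2 inputs, 3 outputs
W = [[0.978, 0.020, 0.002],
     [0.002, 0.020, 0.978]]

def mutual_information(px, W):
    # I(X;Y) = H(Y) - H(Y|X), in bits per channel use
    q = [sum(px[x] * W[x][y] for x in range(len(px)))
         for y in range(len(W[0]))]
    H_Y = -sum(t * log2(t) for t in q if t > 0)
    H_Y_given_X = -sum(px[x] * W[x][y] * log2(W[x][y])
                       for x in range(len(px))
                       for y in range(len(W[0])) if W[x][y] > 0)
    return H_Y - H_Y_given_X

# Uniform input achieves capacity for this input-symmetric channel
C = mutual_information([0.5, 0.5], W)
```

For channels without such symmetry, the same I(X;Y) routine can be driven by a Blahut-Arimoto iteration over input distributions.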

- Consider the following channel.

(a) Find the channel capacity.

(b) If p = 1/2, determine an input distribution that allows error-free communication at capacity.

- A discrete memoryless channel is additive modulo L if it has input and output alphabets 0, 1, ..., L - 1, and the output y is given by (x + z) modulo L.
The noise variable z also takes the values 0, 1, ..., L - 1,
and is statistically independent of the input.

(a) Show that I(X;Y) = H(Y) - H(Z).

(b) Find the capacity in terms of H(Z) and the corresponding input probabilities.

- Consider the construction of a binary code of length n = 6 with M codewords which can correct all single errors.

(a) How many binary vectors have a distance less than or equal to 1 from a given codeword c_{m}?

(b) What is the maximum possible number of codewords M?

(c) Assuming that this maximum value of M can be achieved, what is the corresponding code rate R?

(d) Show that a code with M = 8 exists.

- Consider the binary code C composed of the following four code words:

C = {(00100), (10010), (01001), (11111)}

(a) Is this code linear? Prove your answer.

(b) What is the minimum distance of this code?

(c) What is the maximum number of errors this code can correct?

(d) What is the maximum number of errors this code can detect?

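Parts (a) and (b) of the preceding problem lend themselves to a mechanical check; a minimal sketch:

```python
# Codewords from the problem
C = ["00100", "10010", "01001", "11111"]

def xor(a, b):
    # Componentwise addition over GF(2)
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

# A binary linear code must contain the all-zero word and be
# closed under componentwise XOR.
closed = all(xor(a, b) in C for a in C for b in C)

# For a (possibly nonlinear) code, d_min is the minimum pairwise distance
dmin = min(hamming(a, b) for a in C for b in C if a != b)
```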
- Find the length, dimension, rate, and minimum distance for the code defined by the following generator matrix

|111000|
|001110|
|100101|

Determine a systematic generator matrix for this code and give a parity check matrix for this systematic generator matrix.

- The parity check matrix of a code with d_{min} = 3 is given by

|011100|

|101010|

|110001|

(a) Determine a syndrome decoding table for this code.

(b) Use the table to decode the received vector r = 110011.

(c) Use the table to decode the received vector r = 111000.
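A sketch of the syndrome-decoding machinery behind part (a), assuming only single-bit error patterns are tabulated (consistent with d_{min} = 3):

```python
# Parity check matrix H from the problem, one row per parity check
H = [[0, 1, 1, 1, 0, 0],
     [1, 0, 1, 0, 1, 0],
     [1, 1, 0, 0, 0, 1]]
n = 6

def syndrome(r):
    # s = H r^T over GF(2)
    return tuple(sum(h[i] * r[i] for i in range(n)) % 2 for h in H)

# Syndrome table for the zero pattern and all single-bit errors
table = {(0, 0, 0): [0] * n}
for i in range(n):
    e = [0] * n
    e[i] = 1
    table[syndrome(e)] = e

def decode(r):
    # Look up the coset leader and subtract (= add, over GF(2))
    e = table.get(syndrome(r))
    return [(ri + ei) % 2 for ri, ei in zip(r, e)] if e else None
```

Each nonzero syndrome equals the column of H at the error position, which is why all columns of H must be distinct and nonzero for single-error correction.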