ECE 515
Information Theory

Assignment Submission

Assignments should be submitted on Brightspace https://bright.uvic.ca/d2l/home

Assignments

Assignment 1 - Due September 26, 2025 Solutions

Consider two binary random variables X and Y with joint probability distribution p(x,y) given by
p(0,0) = 1/2, p(0,1) = 1/4, p(1,0) = 0, and p(1,1) = 1/4.
Find the value of
(a) H(X)
(b) H(Y)
(c) H(X|Y)
(d) H(Y|X)
(e) H(XY)
(f) I(X;Y)
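Every quantity in this problem follows from the marginals and the chain rule, so a short script is a convenient way to check hand calculations. The Python sketch below assumes the joint pmf is stored as a dictionary keyed by `(x, y)` pairs (a representation chosen only for illustration). It uses H(X|Y) = H(XY) - H(Y), H(Y|X) = H(XY) - H(X) and I(X;Y) = H(X) + H(Y) - H(XY).

```python
from math import log2

def entropy(probs):
    """Entropy in bits of an iterable of probabilities (zero terms contribute 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def joint_measures(joint):
    """Return H(X), H(Y), H(X|Y), H(Y|X), H(XY), I(X;Y) for a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal of X
        py[y] = py.get(y, 0.0) + p  # marginal of Y
    hx, hy, hxy = entropy(px.values()), entropy(py.values()), entropy(joint.values())
    return {
        "H(X)": hx,
        "H(Y)": hy,
        "H(X|Y)": hxy - hy,  # chain rule: H(XY) = H(Y) + H(X|Y)
        "H(Y|X)": hxy - hx,  # chain rule: H(XY) = H(X) + H(Y|X)
        "H(XY)": hxy,
        "I(X;Y)": hx + hy - hxy,
    }

joint = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 0.0, (1, 1): 1/4}
for name, value in joint_measures(joint).items():
    print(f"{name} = {value:.4f} bits")
```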
Consider a discrete random variable X with 2ⁿ+1 symbols x_i, i = 1, 2, …, 2ⁿ+1. Determine the upper and lower bounds on the entropy and the corresponding symbol probabilities when
(a) p(x₁)=0
(b) p(x₁)=1/2
A jar contains 5 black balls and 10 white balls. Experiment X involves randomly drawing a ball out of the jar. Next, experiment Y involves randomly drawing a second ball, without replacing the ball drawn in experiment X. In both experiments, only the colour of the drawn ball is of interest.
(a) How much uncertainty does experiment X contain?
(b) What is the uncertainty in experiment Y given that the first ball is black?
(c) What is the uncertainty in experiment Y given that the first ball is white?
(d) How much uncertainty does experiment Y contain?
Let X be a random variable whose entropy H(X) is 8 bits. Suppose that Y(X) is a deterministic function that takes on a different value for each value of X.
(a) What is H(Y)?
(b) What is H(Y|X)?
(c) What is H(X|Y)?
(d) What is I(X;Y)?
(e) Suppose now that the deterministic function Y(X) is not invertible, so that different values of X may correspond to the same value of Y(X). In this case, what can be said about H(Y), and also about H(X|Y)?
The Stanley Cup Final is a seven-game hockey series that ends as soon as either team wins 4 games. Let X be the random variable that represents the outcome of the Stanley Cup Final between teams A and B. For example, some possible values of X are AAAA, BABABAB, and BBBAAAA. Let Y be the random variable for the number of games played, which ranges from 4 to 7. Assuming A and B are equally matched and that the games are independent, determine
(a) H(X)
(b) H(Y)
(c) H(Y|X)
(d) H(X|Y)
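Because each game is effectively a fair coin flip, a series that lasts n games has probability 2^(-n), so the distributions of X and Y can be checked by brute-force enumeration. A minimal Python sketch, assuming exactly the model stated in the problem (equally matched teams, independent games); the helper name `series_outcomes` is hypothetical:

```python
from collections import defaultdict
from math import log2

def series_outcomes(wins_needed=4):
    """Return every game-by-game outcome string, e.g. 'AAAA' or 'BABABAB'."""
    outcomes = []

    def extend(seq, a_wins, b_wins):
        if a_wins == wins_needed or b_wins == wins_needed:
            outcomes.append(seq)  # series is over: record this outcome
            return
        extend(seq + "A", a_wins + 1, b_wins)
        extend(seq + "B", a_wins, b_wins + 1)

    extend("", 0, 0)
    return outcomes

outcomes = series_outcomes()
# Equally matched, independent games: an outcome of length n has probability 2**-n.
p_x = {seq: 2.0 ** -len(seq) for seq in outcomes}
p_y = defaultdict(float)
for seq, p in p_x.items():
    p_y[len(seq)] += p  # Y = number of games played

h_x = -sum(p * log2(p) for p in p_x.values())
h_y = -sum(p * log2(p) for p in p_y.values())
# Y is a function of X, so H(Y|X) = 0 and H(X|Y) = H(X) - H(Y).
print(len(outcomes), h_x, h_y, h_x - h_y)
```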
In a country, 25% of the people are blond and 75% of all blond people have blue eyes. In addition, 50% of the people have blue eyes. How much information is received in each of the following cases:
(a) if we know that a person is blond and we are told the colour (blue/not blue) of their eyes
(b) if we know that a person has blue eyes and we are told the colour (blond/not blond) of their hair
(c) if we are told both the colour of their hair and that of their eyes.
A roulette wheel is subdivided into 38 numbered compartments of various colours. The distribution of the compartments according to colour is 2 green, 18 red, and 18 black.

The experiment consists of throwing a small ball onto the rotating roulette wheel. The event that the ball comes to rest in one of the 38 compartments is equally probable for each compartment.
(a) How much information is received if the colour is revealed?
(b) How much information is received if both the colour and number are revealed?
(c) If the colour is known, what is the remaining uncertainty about the number?
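A quick numerical check of the roulette quantities, assuming only the 2/18/18 colour split and the equally likely compartments stated above. The number determines the colour, so revealing both is equivalent to identifying one of 38 equally likely compartments, and the conditional uncertainty about the number averages log₂ n_c over the colour classes (n_c compartments of colour c):

```python
from math import log2

compartments = {"green": 2, "red": 18, "black": 18}
total = sum(compartments.values())  # 38 equally likely compartments

# Entropy of the colour alone.
h_colour = -sum(n / total * log2(n / total) for n in compartments.values())
# Colour and number together identify one of the 38 compartments.
h_number = log2(total)
# Given colour c, the number is uniform over the n_c compartments of that colour.
h_number_given_colour = sum(n / total * log2(n) for n in compartments.values())
print(h_colour, h_number, h_number_given_colour)
```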

Assignment 2 - Due October 12, 2025 Solutions

Let X and Y be random variables which are identically distributed but not necessarily independent. Define the correlation between X and Y as ρ = 1 - H(Y|X)/H(X).
(a) Show that ρ = I(X;Y)/H(X).
(b) Show that 0 ≤ ρ ≤ 1.
(c) When is ρ = 0?
(d) When is ρ = 1?
Let F, G and D be three discrete random variables. Determine whether each of the following inequalities is valid, and for those that are, prove them and find the conditions for equality. Venn diagrams are not sufficient.
(a) I(GD;F) ≥ I(D;F).
(b) H(GD|F) ≥ H(D|F).
(c) I(G;F|D) ≥ I(D;F|G) + I(F;G) - I(D;F).
(d) H(FDG) - H(DF) ≤ H(GD) - H(D).
The following equiprobable messages are used to send data over a binary symmetric channel with crossover probability p:
M₁ = 0000 M₂ = 0011 M₃ = 0101 M₄ = 0110
M₅ = 1001 M₆ = 1010 M₇ = 1100 M₈ = 1111
If the sequence Y=0000 is received
(a) how much information is provided about M₁ by the first digit received?
(b) how much additional information about M₁ is provided by the second, then the third, and finally the fourth received digit?
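The information provided about M₁ by the first k received digits is the mutual information I(M₁; y₁…y_k) = log₂ [P(y₁…y_k | M₁) / P(y₁…y_k)], and the additional information from each new digit is the difference between successive values. The Python sketch below evaluates these numerically for a hypothetical crossover probability p = 0.1 (the problem leaves p symbolic) with equiprobable messages as stated; the function names are illustrative.

```python
from math import log2

MESSAGES = ["0000", "0011", "0101", "0110", "1001", "1010", "1100", "1111"]

def likelihood(received, codeword, p):
    """P(received prefix | codeword sent) over a BSC with crossover probability p."""
    prob = 1.0
    for r, c in zip(received, codeword):  # zip stops at the end of the received prefix
        prob *= p if r != c else 1 - p
    return prob

def info_about(message, received, p):
    """I(message; received prefix) in bits, assuming equiprobable messages."""
    p_y = sum(likelihood(received, m, p) for m in MESSAGES) / len(MESSAGES)
    return log2(likelihood(received, message, p) / p_y)

p = 0.1  # hypothetical crossover probability, for illustration only
y = "0000"
cumulative = [info_about(MESSAGES[0], y[:k], p) for k in range(1, 5)]
increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
print(cumulative, increments)
```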

(a) Construct a decision tree for job acceptance prediction using the following training data.

Salary: {High, Medium, Low}
Commute: {Short, Medium, Long}
Worklife Balance: {Good, Average, Poor}
Remote Working: {Yes, No}

The output is Accept Job: {Yes, No}.

Instance	Salary	Commute	Worklife Balance	Remote Working	Accept Job
1	High	Short	Good	Yes	Yes
2	High	Medium	Good	No	Yes
3	High	Long	Good	Yes	Yes
4	High	Long	Poor	No	No
5	High	Short	Poor	Yes	Yes
6	High	Medium	Average	No	Yes
7	High	Long	Average	Yes	Yes
8	High	Short	Average	No	Yes
9	Medium	Short	Good	Yes	Yes
10	Medium	Medium	Good	Yes	Yes
11	Medium	Long	Average	Yes	No
12	Medium	Medium	Poor	No	No
13	Medium	Short	Average	No	Yes
14	Medium	Long	Good	No	No
15	Medium	Short	Poor	Yes	No
16	Medium	Medium	Average	Yes	Yes
17	Low	Short	Good	Yes	Yes
18	Low	Medium	Good	No	No
19	Low	Medium	Poor	Yes	No
20	Low	Long	Good	Yes	No
21	Low	Short	Average	No	No
22	Low	Long	Poor	No	No

(b) Predict the job acceptance for the following instances

Instance	Salary	Commute	Worklife Balance	Remote Working	Accept Job
23	High	Medium	Average	Yes	?
24	Medium	Medium	Average	Yes	?
25	Medium	Long	Poor	No	?
26	Low	Short	Poor	No	?
27	Low	Medium	Average	Yes	?
28	High	Medium	Poor	No	?
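The problem does not fix a splitting rule; assuming the common ID3 information-gain criterion, the root split can be chosen by comparing the gain of each attribute on the 22 training instances. A minimal Python sketch (the helper names are illustrative, and the rows are transcribed from the training table above):

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Salary", "Commute", "Worklife Balance", "Remote Working"]
# Instances 1-22: Salary, Commute, Worklife Balance, Remote Working, Accept Job
TABLE = """
High Short Good Yes Yes
High Medium Good No Yes
High Long Good Yes Yes
High Long Poor No No
High Short Poor Yes Yes
High Medium Average No Yes
High Long Average Yes Yes
High Short Average No Yes
Medium Short Good Yes Yes
Medium Medium Good Yes Yes
Medium Long Average Yes No
Medium Medium Poor No No
Medium Short Average No Yes
Medium Long Good No No
Medium Short Poor Yes No
Medium Medium Average Yes Yes
Low Short Good Yes Yes
Low Medium Good No No
Low Medium Poor Yes No
Low Long Good Yes No
Low Short Average No No
Low Long Poor No No
"""
DATA = [tuple(line.split()) for line in TABLE.strip().splitlines()]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder

gains = {name: information_gain(DATA, i) for i, name in enumerate(ATTRIBUTES)}
best = max(gains, key=gains.get)
print(gains, best)
```

Applying the same function recursively to each branch subset, with the chosen attribute removed, would grow the rest of the tree under this criterion.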

Consider the binary entropy function h(p) = -p log₂ p - (1-p) log₂(1-p). Calculate the average entropy when the probability p is uniformly distributed in the range 0 ≤ p ≤ 1.
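The average entropy is the integral of h(p) over [0, 1]. Using ∫₀¹ -p ln p dp = 1/4 and the symmetry h(p) = h(1-p), the integral evaluates to 1/(2 ln 2) ≈ 0.721 bits. The Python sketch below checks this with a simple midpoint rule; the step count is an arbitrary choice.

```python
from math import log, log2

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def average_entropy(n=100_000):
    """Midpoint-rule estimate of the integral of h(p) over 0 <= p <= 1."""
    step = 1.0 / n
    return step * sum(h((i + 0.5) * step) for i in range(n))

estimate = average_entropy()
closed_form = 1 / (2 * log(2))  # integral of h(p) from 0 to 1, in bits
print(estimate, closed_form)
```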
The binary erasure channel is a channel with a simple type of noise in that transmitted bits can be erased but cannot be received in error. The channel input is a binary random variable X and the channel output is a ternary random variable Y. Let the probability of x=0 be w so the probability of x=1 is 1-w. The three possible channel outputs are {0,e,1} where e denotes an erasure. The probability of an erasure is p so the probability of correct reception of a symbol is 1-p.
(a) Calculate I(X;Y) in terms of w and p.
(b) Find the value of w that maximizes I(X;Y).
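A numerical sketch for the erasure channel: it computes I(X;Y) = Σ p(x) p(y|x) log₂[p(y|x)/p(y)] directly from the transition probabilities and scans w on a grid, which can be used to check a closed-form answer derived by hand. The erasure probability p = 0.3 is a hypothetical value chosen only for illustration, and the helper names are illustrative.

```python
from math import log2

def mutual_information(px, channel):
    """I(X;Y) in bits from an input pmf px[x] and transition probabilities channel[x][y]."""
    py = {}
    for x, row in channel.items():
        for y, q in row.items():
            py[y] = py.get(y, 0.0) + px[x] * q  # output distribution p(y)
    total = 0.0
    for x, row in channel.items():
        for y, q in row.items():
            if px[x] > 0 and q > 0:
                total += px[x] * q * log2(q / py[y])
    return total

def bec(p):
    """Binary erasure channel: each input bit is erased (output 'e') with probability p."""
    return {0: {0: 1 - p, "e": p}, 1: {1: 1 - p, "e": p}}

p = 0.3  # hypothetical erasure probability
grid = [i / 1000 for i in range(1001)]
values = [mutual_information({0: w, 1: 1 - w}, bec(p)) for w in grid]
best_w = grid[values.index(max(values))]
print(best_w, max(values))
```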
Relative entropy
(a) Consider a random variable X with four possible outcomes {a, b, c, d}, and two distributions on X:

x p(X) q(X)
a 5/8 3/8
b 1/8 1/4
c 1/8 1/4
d 1/8 1/8

Calculate H(p), H(q), D(p||q), and D(q||p), and verify in this case that D(p||q) ≠ D(q||p).
(b) Part (a) shows that D(p||q) ≠ D(q||p) in general, but there could be distributions for which equality holds. Provide an example of distributions p(X) and q(X) on a ternary alphabet {a, b, c}, such that D(p||q) = D(q||p) (other than the trivial case p(X) = q(X) for which D(p||q) = D(q||p) = 0).
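The four quantities in part (a) can be checked with a few lines of Python, using the distributions from the table above (base-2 logarithms, so results are in bits). The sketch assumes q(x) > 0 wherever p(x) > 0, which holds here; the same `kl_divergence` helper can be used to test candidate distributions for part (b).

```python
from math import log2

p = {"a": 5/8, "b": 1/8, "c": 1/8, "d": 1/8}
q = {"a": 3/8, "b": 1/4, "c": 1/4, "d": 1/8}

def entropy(dist):
    """H(dist) in bits."""
    return -sum(v * log2(v) for v in dist.values() if v > 0)

def kl_divergence(p, q):
    """D(p||q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)

h_p, h_q = entropy(p), entropy(q)
d_pq, d_qp = kl_divergence(p, q), kl_divergence(q, p)
print(h_p, h_q, d_pq, d_qp)
```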


Assignment 3 - Due November 2, 2025 Solutions

Consider a random variable X with four possible outcomes {x₁, x₂, x₃, x₄} and two probability distributions on X:
p(x₁) = .5, p(x₂) = .2, p(x₃) = .2, p(x₄) = .1
q(x₁) = a, q(x₂) = 2a, q(x₃) = b, q(x₄) = 2b
(a) Find the values of a and b that minimize the cross entropy H(p,q) and the corresponding value of H(p,q).
(b) Show that this value is equal to H(p(X)) + D(p||q).
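Since q must sum to one, 3a + 3b = 1, so H(p,q) is a function of a alone on 0 < a < 1/3. The Python sketch below minimizes it numerically with a ternary search (valid here because H(p,q) is convex in a on that interval) and compares the minimum with H(p(X)) + D(p||q). The search method, iteration count and helper names are illustrative choices.

```python
from math import log2

P = [0.5, 0.2, 0.2, 0.1]

def q_of(a):
    """q = (a, 2a, b, 2b) with b fixed by normalisation: 3a + 3b = 1."""
    b = 1 / 3 - a
    return [a, 2 * a, b, 2 * b]

def cross_entropy(p, q):
    """H(p,q) = -sum p log2 q, in bits."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    """D(p||q) in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Ternary search for the minimising a on the open interval (0, 1/3).
lo, hi = 1e-9, 1 / 3 - 1e-9
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if cross_entropy(P, q_of(m1)) < cross_entropy(P, q_of(m2)):
        hi = m2
    else:
        lo = m1
a_best = (lo + hi) / 2
q_best = q_of(a_best)
h_pq = cross_entropy(P, q_best)
print(a_best, 1 / 3 - a_best, h_pq, cross_entropy(P, P) + kl(P, q_best))
```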
Prove that H(p,q) ≥ 0 and give conditions for equality. Do not use the identity H(p,q) = H(p(X)) + D(p||q).
A source S has seven symbols with probabilities p₁ to p₇. The probabilities p_i are ordered such that p₁ > p₂ > p₃ > p₄ > p₅ > p₆ > p₇, and p₃ > p₄ + p₅ + p₆ + p₇.
Construct a binary Huffman code for this source considering the distribution of the codeword lengths.
For each of the following probability distributions for a source X, construct a binary and a ternary Huffman code. Find the corresponding average codeword length L(C) in each case and determine the code efficiencies ζ.
(a) p(x₁) = .33, p(x₂) = .23, p(x₃) = .12, p(x₄) = .12, p(x₅) = .10 and p(x₆) = .10.
(b) p(x₁) = .35, p(x₂) = .20, p(x₃) = .15, p(x₄) = .15, p(x₅) = .10, p(x₆) = .03 and p(x₇) = .02.
(c) For the source in (b), construct a different binary Huffman code. Which code is preferable in practice and why?
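A compact Python sketch of D-ary Huffman coding for checking these constructions. It pads the source with zero-probability dummy symbols so that every merge combines exactly D nodes, then repeatedly merges the D least probable nodes. It returns only codeword lengths, which is enough for L(C) and ζ = H(X)/(L(C) log₂ D). Ties are broken by insertion order, which is one of several valid choices, so the lengths may differ from another valid Huffman code (as in part (c)) while L(C) stays the same.

```python
import heapq
from itertools import count
from math import log2

def huffman_lengths(probs, d=2):
    """Codeword lengths of a d-ary Huffman code for the given probabilities."""
    n = len(probs)
    # Pad with zero-probability dummies so (n + pad - 1) is divisible by (d - 1).
    pad = (-(n - 1)) % (d - 1)
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heap += [(0.0, next(tiebreak), []) for _ in range(pad)]
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        total, members = 0.0, []
        for _ in range(d):
            p, _tie, syms = heapq.heappop(heap)
            total += p
            members += syms
        for s in members:
            lengths[s] += 1  # every symbol under the merged node gains one code symbol
        heapq.heappush(heap, (total, next(tiebreak), members))
    return lengths

def average_length_and_efficiency(probs, lengths, d=2):
    """Return L(C) and the efficiency H(X) / (L(C) log2 d)."""
    h = -sum(p * log2(p) for p in probs if p > 0)
    avg = sum(p * l for p, l in zip(probs, lengths))
    return avg, h / (avg * log2(d))

A = [0.33, 0.23, 0.12, 0.12, 0.10, 0.10]
B = [0.35, 0.20, 0.15, 0.15, 0.10, 0.03, 0.02]
for name, probs in (("(a)", A), ("(b)", B)):
    for d in (2, 3):
        lengths = huffman_lengths(probs, d)
        avg, zeta = average_length_and_efficiency(probs, lengths, d)
        print(name, f"D={d}", lengths, f"L={avg:.2f}", f"zeta={zeta:.4f}")
```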
Consider a source Y with five possible outcomes having probabilities (1/3, 1/5, 1/5, 2/15, 2/15).
(a) Find a binary Huffman code for this source.
(b) Show that this code is also optimal for the probabilities (1/5, 1/5, 1/5, 1/5, 1/5).
Determine which of the following codes are uniquely decodable and which are prefix codes.
(a) {1,00,01}
(b) {00,1,10}
(c) {1,10,01}
(d) {1,10,1000}
(e) {00,01,10,001}
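Prefix-freeness can be checked directly, and unique decodability can be tested with the Sardinas-Patterson algorithm. That algorithm repeatedly forms "dangling suffixes" and declares the code not uniquely decodable if a dangling suffix is ever itself a codeword. A minimal Python sketch, assuming the codewords are given as distinct strings (the function names are illustrative):

```python
def is_prefix_code(code):
    """True if no codeword is a proper prefix of another codeword."""
    words = set(code)
    return not any(a != b and b.startswith(a) for a in words for b in words)

def is_uniquely_decodable(code):
    """Sardinas-Patterson test for unique decodability."""
    words = set(code)

    def dangling(xs, ys):
        # Nonempty w such that x + w == y for some x in xs and y in ys.
        return {y[len(x):] for x in xs for y in ys if len(y) > len(x) and y.startswith(x)}

    current = dangling(words, words)  # S1
    seen = set()
    while current:
        if current & words:
            return False  # a dangling suffix is a codeword: ambiguous parse exists
        key = frozenset(current)
        if key in seen:
            return True  # the suffix sets have started to repeat
        seen.add(key)
        current = dangling(words, current) | dangling(current, words)
    return True  # no dangling suffixes remain

for code in (["1", "00", "01"], ["00", "1", "10"], ["1", "10", "01"],
             ["1", "10", "1000"], ["00", "01", "10", "001"]):
    print(code, "UD:", is_uniquely_decodable(code), "prefix:", is_prefix_code(code))
```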
The source coding theorem says that for a source X with entropy H(X), it is possible to assign a binary codeword to each source symbol so that a prefix code of average length L(C) < H(X) + 1 is generated. Show by example that this theorem cannot be improved upon, i.e., for any ε > 0, find a source for which L(C) ≥ H(X) + 1 - ε.

Assignment 4 - Due November 14, 2025

Consider the construction of a prefix code where the first codeword symbol is binary and the remaining symbols are ternary.
(a) Devise a construction algorithm which provides an optimal prefix code.
(b) Use this algorithm to construct a prefix code for the source X with symbols x₁ to x₇ having probabilities p(x₁) = .23, p(x₂) = .22, p(x₃) = .20, p(x₄) = .15, p(x₅) = .10, p(x₆) = p(x₇) = .05.
A source has nine symbols x₁ to x₉ with probabilities 1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/32, and 1/32. This source is to be coded using J = 3 symbols a, b and c.
(a) Determine the code and the efficiency of the code using the methods of Shannon, Fano, Huffman and Tunstall. Suggest an improvement to the Fano algorithm to improve the efficiency.
(b) Determine a uniquely decodable code and the efficiency of the code for the case when two consecutive symbols in the stream of codewords cannot be aa, bb, or cc. This code should have the highest possible code efficiency. Compare with the results in part (a).
A source has symbols x₁, x₂ and x₃ with probabilities 0.7, 0.2, and 0.1, respectively. Encode this source using an arithmetic code with the intervals assigned so that the larger probabilities are on the left.
(a) Find the smallest possible binary codeword corresponding to the symbol sequence x₁ x₁ x₂ x₃ x₂ x₁ x₃ x₁.
(b) Compare the length of this codeword with the theoretical codeword length based on the symbol sequence probabilities and the length of the corresponding Huffman codewords.
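A Python sketch of the encoding side using exact rational arithmetic. It assumes the decoder is told the number of symbols, so that any binary fraction inside the final interval identifies the sequence, and it reads "smallest possible binary codeword" as the shortest bit string b₁…b_k with 0.b₁…b_k in the final interval. Conventions that instead require the whole dyadic interval [0.b₁…b_k, 0.b₁…b_k + 2^(-k)) to fit inside would give a longer codeword. The function names are illustrative.

```python
import math
from fractions import Fraction

PROBS = {"x1": Fraction(7, 10), "x2": Fraction(2, 10), "x3": Fraction(1, 10)}
ORDER = ["x1", "x2", "x3"]  # larger probabilities on the left of [0, 1)

def final_interval(sequence):
    """Return [low, high) after narrowing [0, 1) once per symbol."""
    low, width = Fraction(0), Fraction(1)
    for sym in sequence:
        left = sum((PROBS[s] for s in ORDER[:ORDER.index(sym)]), Fraction(0))
        low += left * width  # move to the left edge of this symbol's subinterval
        width *= PROBS[sym]
    return low, low + width

def shortest_codeword(low, high):
    """Shortest bit string b1..bk whose binary fraction 0.b1...bk lies in [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)  # smallest multiple of 2**-k that is >= low
        if Fraction(m, 2 ** k) < high:
            return format(m, f"0{k}b")
        k += 1

seq = ["x1", "x1", "x2", "x3", "x2", "x1", "x3", "x1"]
low, high = final_interval(seq)
code = shortest_codeword(low, high)
ideal = -math.log2(float(high - low))  # self-information of the sequence in bits
print(code, len(code), ideal)
```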
A source has symbols x₁, x₂ and x₃ with probabilities 0.8, 0.02, and 0.18, respectively. Decode the arithmetic code sequence 110001 for these probabilities. Assume the symbols are arranged in numeric order (x₁,x₂,x₃) in the range [0,1) and the number of encoded symbols is 7. Compare the length of this codeword with the theoretical codeword length and the expected codeword length based on the entropy.
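A decoder sketch for this problem, using exact fractions and the interval order x₁, x₂, x₃ stated above. As in the problem, it assumes the number of encoded symbols (7) is known, and it reads the codeword as the binary fraction 0.110001. The variable `ideal_bits` is the self-information of the decoded sequence, -log₂ of its probability; the function name is illustrative.

```python
import math
from fractions import Fraction

def arithmetic_decode(bits, probs, n_symbols):
    """Decode n_symbols symbol indices from a binary codeword, probabilities in interval order."""
    value = Fraction(int(bits, 2), 2 ** len(bits))  # codeword read as 0.b1b2...bk
    cum = [Fraction(0)]
    for p in probs:
        cum.append(cum[-1] + p)  # cum[i] = left edge of symbol i's subinterval
    low, width = Fraction(0), Fraction(1)
    decoded = []
    for _ in range(n_symbols):
        target = (value - low) / width  # relative position inside the current interval
        i = next(j for j in range(len(probs)) if cum[j] <= target < cum[j + 1])
        decoded.append(i)
        low += cum[i] * width
        width *= probs[i]
    return decoded

probs = [Fraction("0.8"), Fraction("0.02"), Fraction("0.18")]  # x1, x2, x3
indices = arithmetic_decode("110001", probs, 7)
symbols = [f"x{i + 1}" for i in indices]
ideal_bits = -math.log2(float(math.prod(probs[i] for i in indices)))
print(symbols, ideal_bits)
```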
The suffix condition is that no codeword is the suffix of another codeword.
(a) Show that a code which satisfies the suffix condition is always uniquely decodable.
(b) Show that the minimum average codeword length for a code satisfying the suffix condition for a given source is the same as the average codeword length for the corresponding Huffman code.
(c) Construct an optimal binary suffix code for the source X with symbols x₁ to x₆ having probabilities p(x₁) = .30, p(x₂) = .20, p(x₃) = p(x₄) = .15, p(x₅) = p(x₆) = .10.
A binary source produces statistically independent bits with p(x₁) = 0.9 and p(x₂) = 0.1. Design a fixed-length source compaction code for this source with codeword length L = 6 bits. The representations of atypical sequences should have lengths that are multiples of L. This code should have the smallest actual code rate.
For a codeword alphabet of size J, let p(X) be the true pdf with optimal code C and q(X) be the estimated pdf with optimal code Ĉ. Show that

$$\frac{H(p(X)) + D(p(X)\|q(X))}{\log_b J} \le L(\hat{C}) < \frac{H(p(X)) + D(p(X)\|q(X))}{\log_b J} + 1.$$

Assignment 5 - Due

Aaron Gulliver
2025-11-3
