
Student Number
Semester 2 Assessment, 2019
School of Mathematics and Statistics
MAST90083 Computational Statistics and Data Mining
Writing time: 3 hours
Reading time: 15 minutes
This is NOT an open book exam
This paper consists of 3 pages (including this page)
Authorised Materials
Mobile phones, smart watches and internet or communication devices are forbidden.
No handwritten or print materials may be brought into the exam venue.
This is a closed book exam.
No calculators of any kind may be brought into the examination.
Instructions to Students
You must NOT remove this question paper at the conclusion of the examination.
Instructions to Invigilators
Students must NOT remove this question paper at the conclusion of the examination.
This paper must NOT be held in the Baillieu Library
Question 1 Suppose we have a model $p(x, z \mid \theta)$, where $x$ is the observed dataset and $z$ are the latent variables.
(a) Suppose that $q(z)$ is a distribution over $z$. Explain why
$$\mathcal{F}(q, \theta) = \mathbb{E}_q\big[\log p(x, z \mid \theta) - \log q(z)\big]$$
is a lower bound on $\log p(x \mid \theta)$.
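One way to see this (a sketch using Jensen's inequality; replace the sum by an integral for continuous $z$):
$$\log p(x \mid \theta) = \log \sum_z q(z)\, \frac{p(x, z \mid \theta)}{q(z)} \;\ge\; \sum_z q(z) \log \frac{p(x, z \mid \theta)}{q(z)} = \mathcal{F}(q, \theta),$$
where the inequality is Jensen's, using the concavity of $\log$.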
(b) Show that $\mathcal{F}(q, \theta)$ can be decomposed as follows
$$\mathcal{F}(q, \theta) = -\mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big) + \log p(x \mid \theta),$$
where for any two distributions $p$ and $q$, $\mathrm{KL}(q \,\|\, p) = \mathbb{E}_q\!\left[\log \frac{q(z)}{p(z)}\right]$ is the Kullback-Leibler (KL) divergence.
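A sketch of the decomposition: factorising $p(x, z \mid \theta) = p(z \mid x, \theta)\, p(x \mid \theta)$ inside the expectation gives
$$\mathcal{F}(q, \theta) = \mathbb{E}_q\!\left[\log \frac{p(z \mid x, \theta)}{q(z)}\right] + \log p(x \mid \theta) = -\mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big) + \log p(x \mid \theta),$$
since $\log p(x \mid \theta)$ does not depend on $z$.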
(c) Describe the EM algorithm in terms of $\mathcal{F}(q, \theta)$.
(d) Note that the KL divergence is always non-negative. Furthermore, it is zero if and only if $p = q$. Conclude that the optimal $q$ maximising $\mathcal{F}$ is $p(z \mid x, \theta)$.
[10 + 10 + 5 + 5 = 30 marks]
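To make (c) and (d) concrete: the E-step maximises $\mathcal{F}$ over $q$ by setting $q(z) = p(z \mid x, \theta)$, and the M-step maximises $\mathcal{F}$ over $\theta$ with $q$ fixed. Below is a minimal Python sketch for a two-component Gaussian mixture with unit variances; the data, initial values, and iteration count are illustrative assumptions, not part of the paper.

import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: a mixture of two unit-variance Gaussians (assumed).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 300)])

# theta = (pi, mu1, mu2); variances fixed at 1 to keep the sketch short.
pi, mu = 0.5, np.array([-1.0, 1.0])

for _ in range(50):
    # E-step: set q(z) = p(z | x, theta), the posterior responsibilities.
    w1 = pi * np.exp(-0.5 * (x - mu[0]) ** 2)
    w2 = (1.0 - pi) * np.exp(-0.5 * (x - mu[1]) ** 2)
    r = w1 / (w1 + w2)                      # q(z_i = 1)
    # M-step: maximise E_q[log p(x, z | theta)] over theta in closed form.
    pi = r.mean()
    mu = np.array([(r * x).sum() / r.sum(),
                   ((1.0 - r) * x).sum() / (1.0 - r).sum()])

print(pi, mu)   # approaches the generating values (0.4, -2, 3)

With equal unit variances the Gaussian normalising constants cancel in the responsibilities, which keeps the E-step short.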
Question 2 Let $\{(x_i, y_i)\}_{i=1}^{n}$ be our dataset, with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$. Classic linear regression can be posed as empirical risk minimisation, where the model predicts $y$ using the class of functions $f(x) = w^T x$, parametrised by a vector $w \in \mathbb{R}^p$, with the squared loss; i.e. we minimise
$$\sum_{i=1}^{n} \big(y_i - w^T x_i\big)^2.$$
(a) Show that the optimal parameter vector is
$$\hat{w}_n = (X^T X)^{-1} X^T Y,$$
where $X$ is the $n \times p$ matrix with $i$-th row given by $x_i^T$, and $Y$ is the $n \times 1$ column vector with $i$-th entry $y_i$.
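A quick numerical check of this closed form (a sketch; the simulated design, coefficients, and noise level are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))                 # assumed design matrix
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# w_n = (X^T X)^{-1} X^T Y via the normal equations.
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Agrees with numpy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(w_hat, w_lstsq))          # True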
(b) Consider regularising the empirical risk by incorporating an $\ell_2$ penalty; that is, find the $w$ minimising
$$\sum_{i=1}^{n} \big(y_i - w^T x_i\big)^2 + \lambda \|w\|_2^2.$$
Show that the optimal parameter is given by the ridge regression estimator
$$\hat{w}_n^{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T Y.$$
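A numerical sketch of the ridge estimator (the data and the penalty weight $\lambda = 0.5$ are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 100, 3, 0.5                     # lam is an illustrative choice
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

# Ridge estimator: (X^T X + lam I)^{-1} X^T Y.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
print(w_ridge)                              # shrunk towards zero relative to OLS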
(c) Suppose we now wish to introduce nonlinearities into the model by transforming $x$ to $\phi(x)$. Let $\Phi$ be the matrix with $i$-th row given by $\phi(x_i)^T$.
(i) Show that the optimal parameters would be given by
$$\hat{w}_n^{\text{kernel}} = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T Y.$$
(ii) Express the predicted $y$ values on the training set, $\Phi \hat{w}_n^{\text{kernel}}$, only in terms of $Y$ and the Gram matrix $K = \Phi \Phi^T$, with $K_{ij} = \phi(x_i)^T \phi(x_j) = k(x_i, x_j)$, where $k$ is some kernel function. (This is known as the kernel trick.) Hint: You will find the following matrix inversion formula useful:
$$(\Phi^T \Phi + \lambda I)^{-1} \Phi^T = \Phi^T (\Phi \Phi^T + \lambda I)^{-1}.$$
(iii) Compute an expression for the value of $y$ predicted by the model at an unseen test vector $x$.
[5 + 5 + 5 + 10 + 5 = 30 marks]
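A numerical sketch of part (c): with an explicit quadratic feature map one can verify that the primal ridge fit and the kernel-trick fit coincide, and compare the two forms of the test-point prediction in (iii). The data, feature map, and penalty value are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(2)
n, lam = 50, 0.1                            # illustrative size and penalty
x = rng.uniform(-1.0, 1.0, n)
Y = np.sin(3.0 * x) + 0.1 * rng.normal(size=n)

# Explicit quadratic feature map phi(x) = (1, x, x^2) (an assumed example).
Phi = np.stack([np.ones(n), x, x ** 2], axis=1)

# Primal fitted values: Phi (Phi^T Phi + lam I)^{-1} Phi^T Y.
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ Y)
yhat_primal = Phi @ w

# Kernel-trick fitted values: K (K + lam I)^{-1} Y with K = Phi Phi^T.
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(n), Y)
yhat_kernel = K @ alpha
print(np.allclose(yhat_primal, yhat_kernel))   # True

# (iii) Prediction at an unseen test point x0: k(x0)^T alpha,
# where k(x0)_i = phi(x0)^T phi(x_i).
x0 = 0.3
phi0 = np.array([1.0, x0, x0 ** 2])
k0 = Phi @ phi0
print(k0 @ alpha, phi0 @ w)                 # the two predictions agree

The agreement at the test point follows from the matrix inversion formula in the hint, which moves the inverse from feature space ($p \times p$) to sample space ($n \times n$).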
Total marks = 60
End of Exam
