CSE 150A. Assignment 3
Summer 2020, Session 1
Out: Thu Jul 16.  Due: Thu Jul 23.

3.1 Maximum likelihood estimation

[Figure: belief network over the nodes A, B, C, and D]

(a) Complete data

Consider a complete data set of i.i.d. examples {(a_t, b_t, c_t, d_t)}_{t=1}^T drawn from the joint distribution of the above belief network. Compute the maximum likelihood estimates of the conditional probability tables (CPTs) shown below for this data set. Express your answers in terms of equality-testing functions, such as:

    I(a, a_t) = 1 if a = a_t,  0 if a ≠ a_t.

For example, in terms of this function, the maximum likelihood estimate for the CPT at node A is given by

    P(A=a) = (1/T) Σ_{t=1}^T I(a, a_t).

Complete the numerators and denominators in the expressions below.

    P(B=b | A=a) =

    P(C=c | A=a, B=b) =

    P(D=d | A=a, C=c) =

(b) Posterior probability

Consider the belief network shown above, with observed nodes B and D and hidden nodes A and C. Compute the posterior probability P(a, c | b, d) in terms of the CPTs of the belief network, that is, in terms of P(a), P(b|a), P(c|a, b), and P(d|a, c).

(c) Posterior probability

Compute the posterior probabilities P(a | b, d) and P(c | b, d) in terms of your answer from part (b). In other words, in this problem you may assume that P(a, c | b, d) is given, since you showed how to compute it in part (b).

(d) Log-likelihood

Consider a partially complete data set of i.i.d. examples {(b_t, d_t)}_{t=1}^T drawn from the joint distribution of the above belief network. The log-likelihood of the data set is given by:

    L = Σ_t log P(B=b_t, D=d_t).

Compute this log-likelihood in terms of the CPTs of the belief network. You may re-use work from earlier parts of the problem.

(e) EM algorithm

The posterior probabilities from parts (b) and (c) can be used by an EM algorithm to estimate CPTs that maximize the log-likelihood from part (d). Complete the numerators and denominators in the expressions below for the EM update rules. Simplify your answers as much as possible, expressing them in terms of the posterior probabilities P(a, c | b_t, d_t), P(a | b_t, d_t), and P(c | b_t, d_t), as well as the functions I(b, b_t) and I(d, d_t).

    P(A=a) ←

    P(B=b | A=a) ←

    P(C=c | A=a, B=b) ←

    P(D=d | A=a, C=c) ←

3.2 EM algorithm for noisy-OR

In this problem, we will see how the EM algorithm can be used to learn the parameters of a noisy-OR model. Part (a) is related to the derivation of the EM update rules for the noisy-OR model. Parts (b) through (d) will ask you to write code to learn the parameters of a noisy-OR model based on a particular data set.

(a) Equivalence of models

The following diagrams depict two variations of the noisy-OR model:

[Figure: Model A has inputs X_1, X_2, ..., X_n with a single child Y. Model B has inputs X_1, X_2, ..., X_n, each with its own child Z_1, Z_2, ..., Z_n, and the Z_i are the parents of Y.]

For this part of the problem, we will use the notation P_A(·) and P_B(·) to distinguish between probabilities computed using Model A and Model B, respectively.

Model A is a standard noisy-OR model in which the CPT for Y is

    P_A(Y=1 | X=x) = 1 − Π_{i=1}^n (1 − p_i)^{x_i}.

Model B is a variation of the noisy-OR model in which we've inserted a layer of hidden variables Z_1, ..., Z_n, with the following CPTs:

    P_B(Z_i=1 | X_i=x_i) = p_i if x_i = 1,  0 if x_i = 0,
    P_B(Y=1 | Z=z) = 1 if Z_i = 1 for any i,  0 if Z_i = 0 for all i.

In contrast to Model A, the Y node in Model B is a deterministic OR, since it will be 1 if and only if any of the Z_i are 1. We can view the Z_i nodes as being a sort of noisy copy of the corresponding X_i nodes: if X_i = 0, then Z_i is guaranteed to be 0 as well, but if X_i = 1, then Z_i has probability p_i of being 1. Note also that both models are defined in terms of parameters p_i for i ∈ {1, ..., n}.
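Before attempting the proof below, it can help to check the claimed equivalence numerically. The following sketch (Python with NumPy; the parameter values, function names, and brute-force check are illustrative assumptions, not part of the assignment) marginalizes out the hidden layer Z of Model B by enumeration and confirms that the result matches Model A's CPT for every input vector x:

    # Numerical sanity check (illustration only): marginalize out the hidden layer Z
    # of Model B by brute force and compare with Model A's noisy-OR CPT.
    import itertools
    import numpy as np

    n = 4
    rng = np.random.default_rng(0)
    p = rng.uniform(0.1, 0.9, size=n)            # arbitrary noisy-OR parameters p_i

    def prob_A_y1(x, p):
        # Model A: P_A(Y=1 | X=x) = 1 - prod_i (1 - p_i)^{x_i}.
        return 1.0 - np.prod((1.0 - p) ** x)

    def prob_B_y1(x, p):
        # Model B: sum over all joint settings z of the hidden nodes Z_1, ..., Z_n.
        total = 0.0
        for z in itertools.product([0, 1], repeat=len(x)):
            pz = 1.0
            for zi, xi, pi in zip(z, x, p):
                # P(Z_i=z_i | X_i=x_i): Z_i can be 1 only if X_i = 1.
                pz *= (pi if zi == 1 else 1.0 - pi) if xi == 1 else (1.0 if zi == 0 else 0.0)
            total += pz * (1.0 if any(z) else 0.0)   # deterministic OR at the Y node
        return total

    for x in itertools.product([0, 1], repeat=n):
        assert abs(prob_A_y1(np.array(x), p) - prob_B_y1(x, p)) < 1e-12
    print("Model A and Model B agree on P(Y | X=x) for every x.")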
To prove that the Y nodes in both models are equivalent, it is enough to show that

    P_A(Y=0 | X=x) = P_B(Y=0 | X=x).

The following equations are an incomplete proof of this fact, which differs slightly from the one seen in lecture. To complete this proof, provide a brief justification for each of the labeled steps.

(Because of this result, we can simply use the notation P(·) for probabilities in the remaining parts of this problem, since we have shown that the two models P_A(·) and P_B(·) agree.)

(b) EM Implementation: Per-iteration statistics

Suppose we are given a data set of T samples {(x_t, y_t)}_{t=1}^T from Model A.

If we try to apply standard maximum-likelihood techniques to estimate the values of the parameters p_i based on Model A, we will find that there is no closed-form solution. However, since part (a) showed that Model A and Model B are equivalent, we can instead view the data as an incomplete data set corresponding to Model B. This allows us to estimate the values of p_i using the EM algorithm.

Noting that p_i corresponds to P(Z_i=1 | X_i=1), which is one of the CPT entries for Model B, we can write the EM update rule as:

    p_i ← count(Z_i=1, X_i=1) / count(X_i=1)
        = Σ_t P(Z_i=1, X_i=1 | X=x_t, Y=y_t) / Σ_t P(X_i=1 | X=x_t, Y=y_t),

where the counts in the numerator and denominator are expected counts under the current model.

Then, as shown in class, the EM update rule for noisy-OR can be simplified to the following:

    p_i ← (1/T_i) Σ_t [ y_t x_it p_i / (1 − Π_j (1 − p_j)^{x_jt}) ],

where T_i = Σ_t x_it is the number of observations where X_i = 1.

Next, you will use the EM algorithm for estimating the noisy-OR parameters p_i. Download the data files hw3 noisyOr x.txt and hw3 noisyOr y.txt for this homework assignment. The data set has T = 267 examples over n = 23 inputs.[1]

For a data set {(x_t, y_t)}_{t=1}^T, the normalized log (conditional) likelihood is given by:

    L = (1/T) Σ_t log P(Y=y_t | X=x_t).

In your code, you should initialize all p_i = 0.05 and perform 512 iterations of the EM algorithm using the update rule shown above. An easy-to-miss reason for getting different results on this assignment is failing to use the initialization specified here! At each iteration, compute the log conditional likelihood shown above. (If you have implemented the EM algorithm correctly, this log conditional likelihood will always increase from one iteration to the next.)

Also compute the number of mistakes made by the model at each iteration; a mistake occurs either when y_t = 0 and P(Y=1 | X=x_t) ≥ 0.5 (indicating a false positive), or when y_t = 1 and P(Y=1 | X=x_t) ≤ 0.5 (indicating a false negative). In other words, use the current parameter values of the iteration to try to predict Y given the values for X, and count how many of those predictions are wrong. The number of mistakes should generally decrease as the model is trained, though it is not guaranteed to do so at each iteration.

For this part, turn in a completed version of the following table:

    iteration    number of mistakes M    log conditional likelihood L
        0                175                      -0.9581
       ...               ...                        ...

You should use the already completed entries of this table to check your work. As always, you may program in the language of your choice.

(c) EM Implementation: Estimated values for p_i

Produce a table or bar plot of your final estimated values for p_i.

(d) EM Implementation: Source code

Include your source code for this problem as part of your Gradescope submission.

[1] For those interested, more information about this data set is available here: https://archive.ics.uci.edu/ml/machine-learning-databases/spect/SPECT.names

Debugging hints: If you are having trouble getting your results to match starting at iteration 1, try printing out the values of p_1 and p_23 after the first few iterations. After iteration 1, you should find that p_1 ≈ 0.11655 and p_23 ≈ 0.15310. If you find that your value of p_1 matches but p_23 does not, this is likely a sign that you are using the updated values for p_i too early. To solve this, you should save the new values for p_i in temporary variables until p_1, p_2, ..., p_23 have all been computed; then, overwrite the old values of p_i all at once.
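A minimal sketch of how this loop might be organized in Python with NumPy is shown below. Everything other than the quantities defined in the problem statement (the data files, T = 267, n = 23, the initialization p_i = 0.05, and the update rule above) is an assumption, including the variable names and file paths:

    # Sketch of the EM loop for parts (b)-(d).  The underscores in the file names
    # are an assumption; adjust the paths to match the files you downloaded.
    import numpy as np

    X = np.loadtxt("hw3_noisyOr_x.txt")     # shape (T, n) = (267, 23), entries 0/1
    y = np.loadtxt("hw3_noisyOr_y.txt")     # shape (T,), entries 0/1
    T, n = X.shape

    p = np.full(n, 0.05)                    # required initialization: all p_i = 0.05
    Ti = X.sum(axis=0)                      # T_i = number of examples with x_it = 1

    for it in range(513):                   # iteration 0 reports the initial parameters
        prob_y1 = 1.0 - np.prod((1.0 - p) ** X, axis=1)   # P(Y=1 | X=x_t) for every t

        # Per-iteration statistics requested in part (b).
        L = np.mean(np.log(np.where(y == 1, prob_y1, 1.0 - prob_y1)))
        M = np.sum((y == 0) & (prob_y1 >= 0.5)) + np.sum((y == 1) & (prob_y1 <= 0.5))
        print(it, M, L)

        # EM update: p_i <- (1/T_i) sum_t y_t x_it p_i / (1 - prod_j (1-p_j)^{x_jt}).
        # All p_i are replaced at once from the old values (see the debugging hints).
        if it < 512:
            w = np.zeros(T)
            w[y == 1] = 1.0 / prob_y1[y == 1]
            p = (w[:, None] * X * p[None, :]).sum(axis=0) / Ti

Printing the statistics before each update is one way to make row 0 of the table correspond to the initial parameters.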
3.3 EM algorithm for binary matrix completion

In this problem you will use the EM algorithm to build a simple movie recommendation system. Download the files hw3 movieTitles su20.txt and hw3 movieRatings su20.txt. The first file provides the titles of 60 movies. The second file contains a 150×60 matrix of zeros, ones, and missing elements denoted by question marks. The (i, j)th element in this matrix contains the ith user's rating of the jth movie (corresponding to the jth line of the first file), according to the following key:

    1  recommended,
    0  not recommended,
    ?  not seen.

Note: In past versions of this course, this data came from a survey given to you and your peers. In the interest of time for the summer session, I have generated synthetic data to only loosely match the distribution seen from past students, so as to give them some privacy.

(a) Sanity check

Compute the mean popularity rating of each movie, given by the simple ratio

    (number of users who recommended the movie) / (number of users who saw the movie),

and sort the movies by this ratio. Print out the movie titles from most popular (Doctor Strange) to least popular (The Last Airbender).

(b) Likelihood

Now you will learn a naive Bayes model of these movie ratings, represented by the belief network shown below, with hidden variable Y ∈ {1, 2, ..., k} and partially observed binary variables R_1, R_2, ..., R_60 (corresponding to movie ratings).

[Figure: belief network with root Y and children R_1, R_2, R_3, ..., R_60]

This model assumes that there are k different types of movie-goers, and that the ith type of movie-goer, who represents a fraction P(Y=i) of the overall population, likes the jth movie with conditional probability P(R_j=1 | Y=i).

This model is an interesting extension of the hidden variable models studied in lecture: for this problem, each sampled set of user reviews contains its own mix of observed and unobserved variables. Some users have watched many movies, some users have watched only a few. A variable that is hidden for one user might be observed for another user. This is what we mean by partially observed. Note that the type of movie-goer (variable Y) is always hidden for all users.

Let Ω_t denote the set of movies seen (and hence rated) by the tth user. Show that the likelihood of the tth user's ratings is given by:

    P({R_j = r_jt}_{j ∈ Ω_t}) = Σ_{i=1}^k P(Y=i) Π_{j ∈ Ω_t} P(R_j = r_jt | Y=i).

(c) E-step

The E-step of this model is to compute, for each user, the posterior probability that they correspond to a particular type of movie-goer:

    ρ_it = P(Y=i | {R_j = r_jt}_{j ∈ Ω_t}).

Show how to compute this posterior probability in terms of the CPTs of the belief network.

(d) M-step

The M-step of the model is to re-estimate the probabilities P(Y=i) and P(R_j=1 | Y=i) that define the CPTs of the belief network. As shorthand, let

    ρ_it = P(Y=i | {R_j = r_jt}_{j ∈ Ω_t})

denote the probabilities computed in the E-step of the algorithm. Also, let T = 150 denote the number of users. Show that the EM updates are given by:

    P(Y=i) ← (1/T) Σ_t ρ_it,
    P(R_j=1 | Y=i) ← [ Σ_{t: j ∈ Ω_t} ρ_it I(r_jt, 1) + Σ_{t: j ∉ Ω_t} ρ_it P(R_j=1 | Y=i) ] / Σ_t ρ_it,

where Σ_{t: j ∈ Ω_t} denotes a sum over all users t that rated movie j, and Σ_{t: j ∉ Ω_t} denotes a sum over all users t that did not rate movie j.
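To make the E-step and M-step concrete, here is a minimal sketch of one EM iteration in Python with NumPy. The array names, shapes, and the use of np.nan to encode a "?" rating are assumptions, not part of the assignment:

    # One EM iteration for the movie-rating model.  R is a (T, 60) array whose
    # entries are 1, 0, or np.nan ("not seen"); pY holds P(Y=i) for i = 1..k;
    # pR[i, j] holds P(R_j=1 | Y=i).
    import numpy as np

    def em_step(R, pY, pR):
        T, m = R.shape
        k = pY.shape[0]
        seen = ~np.isnan(R)                      # seen[t, j] is True iff j is in Omega_t

        # E-step: rho[t, i] = P(Y=i | ratings of user t), via Bayes' rule on part (b).
        rho = np.empty((T, k))
        for t in range(T):
            lik = np.where(R[t, seen[t]] == 1, pR[:, seen[t]], 1.0 - pR[:, seen[t]])
            rho[t] = pY * np.prod(lik, axis=1)   # P(Y=i) * prod_{j in Omega_t} P(R_j=r_jt | Y=i)
        loglik = np.mean(np.log(rho.sum(axis=1)))   # normalized log-likelihood for part (e)
        rho /= rho.sum(axis=1, keepdims=True)

        # M-step: re-estimate the CPTs using the updates derived in part (d).
        new_pY = rho.mean(axis=0)                            # (1/T) sum_t rho_it
        num = rho.T @ np.nan_to_num(R)                       # users who rated movie j and said 1
        num += (rho.T @ (~seen).astype(float)) * pR          # users who did not rate movie j
        new_pR = num / rho.sum(axis=0)[:, None]
        return new_pY, new_pR, loglik

For part (e), the initial pY and pR would be read from hw3 probType init su20.txt and hw3 probRatingGivenType init su20.txt, and em_step would be applied 128 times, recording loglik at each iteration.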
(e) Implementation

Download the files hw3 probType init su20.txt and hw3 probRatingGivenType init su20.txt, and use them to initialize the probabilities P(Y=i) and P(R_j=1 | Y=i) for a model with k = 4 types of movie-goers. Run 128 iterations of the EM algorithm, computing the (normalized) log-likelihood at each iteration. Your log-likelihood should increase (i.e., become less negative) at each iteration. Fill in the following table, using the already provided entries to check your work:

    iteration    log-likelihood L
       ...              ...

Note: Choosing k = 4 is somewhat arbitrary, but it is based on assuming that there are a small number of fundamentally different movie-goer types. Moreover, the initial values for P(R_j=1 | Y=i) in hw3 probRatingGivenType init su20.txt have been chosen randomly. We provide these to you so that this HW unfolds in a certain manner and you'll get the same results we got. If you are interested in exploring further after completing this assignment, feel free to try out other initializations and other values of k.

(f) User categorization

Using the formula from part (c) along with the parameters you estimated in part (e), evaluate the probability distribution over movie-goer types (i.e., over Y) given the following ratings:

    Tron                                                 recommended
    Frozen                                               recommended
    Star Trek Beyond                                     NOT recommended
    Jumanji: Welcome to the Jungle                       NOT recommended
    It                                                   NOT recommended
    Batman v Superman: Dawn of Justice                   NOT recommended
    The Dark Knight                                      NOT recommended
    The Lord of the Rings: The Fellowship of the Ring    recommended
    Black Panther                                        recommended
    The Avengers                                         recommended
    Avengers: Infinity War                               recommended
    all others                                           not seen

Include either a table or a plot of P(Y=i | those ratings) for i ∈ {1, ..., k}. What value of i ∈ {1, ..., k} maximizes the value of P(Y=i | those ratings)? (In other words, which of the k types of movie-goers does this user most closely match?)

(g) User movie recommendations

Next, compute this user's expected ratings on the movies they haven't yet seen. You can do this by applying the following formula for each movie ℓ that they haven't seen:

    P(R_ℓ=1 | their ratings) = Σ_{i=1}^k P(Y=i | their ratings) P(R_ℓ=1 | Y=i).

What are the top ten recommended movies? (Print out the first ten movies in the list of these (unseen) movies sorted by their expected ratings.) Does this list seem different from the list in part (a)? Hopefully it does (although our data set is obviously far smaller and more incomplete than the data sets at actual movie streaming companies, not to mention the fact that this particular data set was randomly generated).

Note: If you are interested in exploring further after completing this assignment, feel free to try placing your own ratings in the data and re-doing parts (f) and (g).

(h) Source code

Include your source code for this problem as part of your Gradescope submission. As usual, you may program in the language of your choice.
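For reference, a small sketch of how parts (f) and (g) might be computed, reusing the pY / pR arrays and the np.nan coding from the EM sketch after part (d). The function name, the ratings encoding, and the titles list are assumptions, not part of the assignment:

    import numpy as np

    def categorize_and_recommend(r, pY, pR, titles):
        # r is a length-60 vector with entries 1, 0, or np.nan ("not seen") describing
        # one user's ratings; titles is the list of movie names, in the same order.
        seen = ~np.isnan(r)

        # Part (f): posterior over movie-goer types, P(Y=i | this user's ratings).
        lik = np.where(r[seen] == 1, pR[:, seen], 1.0 - pR[:, seen])
        post = pY * np.prod(lik, axis=1)
        post /= post.sum()

        # Part (g): expected ratings for unseen movies,
        #   P(R_l=1 | ratings) = sum_i P(Y=i | ratings) * P(R_l=1 | Y=i).
        expected = post @ pR
        unseen = np.flatnonzero(~seen)
        top_ten = unseen[np.argsort(expected[unseen])[::-1][:10]]
        return post, [titles[j] for j in top_ten]

The vector r would be built by matching the titles listed in part (f) against hw3 movieTitles su20.txt and filling every other entry with np.nan.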