CSE 142 Machine Learning                                   Due: November 7th
Homework 2

Directions: This homework is to be done individually. Typeset (e.g. TeX) solutions are preferred, but scans or photographs of hand-written solutions are acceptable provided that they are neat and legible. The TA may deduct points for poorly organized or illegible solutions.

Question:        1    2    3    4    Total
Points:          20   30   20   30   100
Bonus Points:    0    0    0    0    0
Score:

Questions:

1. Logistic Regression. Logistic regression treats a binary classification (e.g. is/is-not a dog) probabilistically, where the probability for any example x to be assigned the class y = 1 is given in terms of the logistic sigmoid function g and a weight vector w that must be learned:

$$g(w \cdot x) = \frac{\exp(w \cdot x)}{1 + \exp(w \cdot x)}$$
$$p(1 \mid x, w) = g(w \cdot x)$$
$$p(0 \mid x, w) = 1 - g(w \cdot x)$$

(a) (10 points) Let $g(w \cdot x) = q$. From the definitions above, prove that $w \cdot x$ is a logit or log-odds function when $0 < q < 1$:

$$w \cdot x = \ln \frac{q}{1 - q}$$

(b) (10 points) Just as $p(1 \mid w, x) = g(w \cdot x)$ is the probability for a positive classification conditioned on a set of weights, it is also the likelihood of weights w given a classification y = 1. Show that the gradient of this log likelihood function (in parameter space: derivatives should be taken with respect to the components of w) reduces to

$$\nabla_w \log g(w \cdot x) = x \, (1 - g(w \cdot x))$$

which is a vector quantity parallel to x.

2. (30 points) Naive Bayes. Use Naive Bayes to estimate whether a student will be an honor student (H) or normal student (N) in college based on their high school performance. Each instance has two features: the student's high school GPA (a real number) and whether or not the student took any AP courses (a Boolean value, yes=1, no=0).
Based on the following training data, create (by hand or with computational tools) a Naive Bayes prediction rule using normal (Gaussian) distributions to estimate the conditional probability density of high school GPAs given honors status (H or N) and a Bernoulli distribution for the AP feature. (The Gaussian assigns nonzero probability to negative or greater-than-four GPA values, but that is fine for our purposes.)

label    AP    GPA

Recall that Naive Bayes makes the simplifying assumption that the features are conditionally independent given a class / label:

$$p(\text{gpa}, \text{ap} \mid \text{honors}) = p(\text{gpa} \mid \text{honors}) \, p(\text{ap} \mid \text{honors})$$

Use maximum likelihood estimation (not the unbiased or Laplace estimates) for the distributions of the two features conditioned on the two classes. Give the mean and variance for each distribution over GPA values. For the variance here, you only need to calculate the biased sample variance estimator (divided by $n$), not the unbiased one (divided by $n - 1$).

Describe your prediction rule in the following form:

If AP courses are taken, predict H if the GPA is in $a \subseteq \mathbb{R}$;
if AP courses are not taken, predict H if the GPA is in $b \subseteq \mathbb{R}$,

where a and b must be found.

Hint: It is probably easier to get this description if you take logarithms. 3 digits of precision should be sufficient. Also, the logarithms of the Gaussian densities are quadratic, possibly yielding two distinct zeros that correspond to the boundaries of the intervals a or b.

3. Nearest Neighbors. Assume that examples are drawn uniformly from the unit square. Independent of example features (i.e., location $(x_1, x_2)$ in the unit square), labels are generated at random such that a proportion $q$, where $0.5 < q < 1$, are assigned label y = 1 and the remainder are assigned label y = 0. The Bayes-optimal hypothesis will minimize the error rate of its predictions given $(x_1, x_2)$ by always predicting y = 1 and suffering an error rate of $1 - q$.

How should we expect the 3-Nearest-Neighbors algorithm to perform? Assume that the algorithm is trained on a large set of known labels.
For each new sample, the algorithm finds the three closest (according to any metric!) points in its training set, finds the label shared by the majority of these three, and assigns this label to the new example.

(a) (10 points) What is the expected error rate of the 3-Nearest-Neighbor algorithm, in terms of $q$?

(b) (10 points) When is this better or worse than the Naive Bayes solution (recall, $0.5 < q < 1$)?

4. Decision Tree. Sammy the Slug owns a car dealership that sells two types of cars: Honda (H) and BMW (B). He collected the following data on his customers, recording their Gender, Annual Income, and the type of car they purchased:

Gender    AI (Annual Income in thousands)    Preference

This question is about building decision trees to predict the car type that a new customer will prefer. Note that one of the attributes, Annual Income (AI), is a continuous variable. Let's assume that we will only allow binary splits for this attribute of the form $AI \le a$ and $AI > a$, where $a$ lies in the dataset. However, there can be multiple such splits in one path from root to leaf.

For all your calculations, use log base 2.

(a) (3 points) For this part of the question, let's assume that for some unknown reason, Sammy insists on keeping the Annual Income at the root node. How many possible values of $a$ does he need to consider? What are they?

(b) (3 points) What is the entropy of labels (car type) in the training dataset?

(c) (9 points) What is the optimal root node for this dataset? Show your calculations.

(d) (15 points) Draw the Decision Tree that would be learned by ID3 on this dataset. Only binary splits are allowed, i.e. a node can have only two children. Your tree should contain all the information necessary to read it. At each node, indicate:

1. the attribute you are splitting on (when splitting on AI, also include the $a$ used);
2. the label distribution of the instances at that node before the split (e.g. if there are 3 instances at a node and 2 belong to the H class and 1 belongs to the B class, then label the node as $\langle 2, 1 \rangle$).
3. Additionally, for each non-leaf node indicate the gain attained by the corresponding split, and label each leaf node by its class label (H or B).
4. All edges between a parent and a child should be labeled with the value of the attribute.
5. It is okay to draw the tree by hand and include a clear picture in your pdf.
6. Don't forget to include your calculations (6 points here).
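The entropy and gain calculations in Question 4 can be checked with a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names (`entropy`, `information_gain`, `best_threshold`) are made up for this example, the income split is assumed to be AI <= a versus AI > a, and the four data points are hypothetical stand-ins, not the homework's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (log base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, mask):
    """Gain of a binary split; mask[i] is True when example i goes left."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_threshold(values, labels):
    """Try every observed value a as a split 'value <= a' and keep the best gain."""
    best_a, best_gain = None, -1.0
    for a in sorted(set(values)):
        gain = information_gain(labels, [v <= a for v in values])
        if gain > best_gain:
            best_a, best_gain = a, gain
    return best_a, best_gain


# Hypothetical data for illustration only (NOT the homework table).
incomes = [20, 35, 50, 80]
prefs = ["H", "H", "B", "B"]
print(entropy(prefs))
print(best_threshold(incomes, prefs))
```

Applying `information_gain` to a mask built from the Gender column gives the gain of the categorical split, so the two attributes can be compared at each node.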

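For Question 1, a numerical sanity check can catch sign or algebra slips before writing up the proofs. The sketch below is not a proof and not part of the assignment: it assumes the reconstructed forms of the identities (logit as the inverse of the sigmoid, and the gradient x(1 - g(w·x))), and the helper names and test vectors are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic sigmoid exp(z) / (1 + exp(z)), arranged to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(q):
    """Log-odds ln(q / (1 - q)) for 0 < q < 1."""
    return math.log(q / (1.0 - q))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def log_likelihood(w, x):
    """log g(w . x), the log likelihood of w for a single example with y = 1."""
    return math.log(sigmoid(dot(w, x)))


def numeric_gradient(w, x, h=1e-6):
    """Central finite-difference gradient of log_likelihood with respect to w."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((log_likelihood(w_plus, x) - log_likelihood(w_minus, x)) / (2 * h))
    return grad


def analytic_gradient(w, x):
    """Claimed closed form: x * (1 - g(w . x))."""
    scale = 1.0 - sigmoid(dot(w, x))
    return [xi * scale for xi in x]


# Arbitrary illustrative vectors (assumptions, not from the homework).
w = [0.3, -1.2, 0.5]
x = [1.0, 2.0, -0.5]
print(logit(sigmoid(dot(w, x))), dot(w, x))
print(numeric_gradient(w, x))
print(analytic_gradient(w, x))
```

If the two printed gradients agree to several decimal places and the logit of the sigmoid returns w·x, the algebra in parts (a) and (b) is at least numerically consistent.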