STATS 302

Questions 1-3 consider a set of data from a crowd-sourced lending service. It has attributes of 5425 loans that were Charged Off (not paid in full) and an equal number that were paid in full. There are 10 additional variables:
Funded_amnt: the amount of money lent.
Loan_amnt: the amount of money requested.
Dti: debt-to-income ratio, excluding mortgage and the proposed loan.
Emp_length: the number of years the borrower has been employed.
Installment: the monthly payment.
Annual_inc: the borrower's annual income.
Revol_bal: the balance on all of the borrower's revolving credit accounts.
Earlyear: the year in which the borrower first borrowed money.
Proputil: the proportion of the borrower's maximum revolving credit being utilized (this is actually given as a percentage, a number between 0 and 100).
Open_acc: the number of credit accounts the borrower has opened over the years.

Q1 First we perform a principal components analysis of the 10 predictor variables, after scaling.
a) 2 marks The eigenvalues of the correlation matrix are:
3.55 1.53 1.26 1.00 0.94 0.67 0.48 0.47 0.08 0.02
Sketch the scree plot.
b) 3 marks How many principal components do you suggest using? Explain your reasoning. What proportion of total variability do they account for?
c) 2 marks The loadings of two different three-component solutions, varimax-rotated and unrotated, are given below. Loadings below 0.2 have been suppressed. What is the purpose of rotation? Has that been achieved here?
[Unrotated loadings table]
[Varimax-rotated loadings table]
d) 1 mark In both sets of loadings, whenever earlyear appears, it has the opposite sign to emp_length. Explain why this makes sense.
e) 2 marks Suppose we decide to go with a two-component rather than a three-component solution. Will the loadings of the first two components change for either the rotated or the unrotated solution?
Explain why.

Q2 We now wish to perform MANOVA to test whether there are differences between the charged-off and fully paid groups.
a) 4 marks Consider the following diagnostics designed to evaluate the MANOVA assumptions. State these assumptions and your conclusion about how well they are satisfied, referencing specific portions of the output.
b) 4 marks Below are two p-values, one obtained by comparing Pillai's trace to the appropriate F distribution and one obtained by comparing it to a permutation distribution. Is either of them adequate for summarizing a test for a difference between the means of the two groups? Explain.

Observed Pillai's trace   P-value from F distribution   P-value from permutation
0.045973                  2.2e-16 ***                   0.001

Q3 A linear discriminant analysis is performed using the 10 variables used in the MANOVA.
a) 4 marks Below is a table of the predicted classifications from a leave-one-out cross-validation, compared to the true classifications. What is the error rate? What is the purpose of performing cross-validation? What are the advantages and disadvantages of leave-one-out vs 10-fold cross-validation for this dataset?
b) 4 marks A table showing the loadings (correlations) of the original variables with the LDA score is given below. Do any variables appear to be more important than the others in predicting whether a loan will be charged off? Explain your reasoning. Based on these loadings, do you expect charged-off (unpaid) loans to have higher or lower LDA scores? Explain.
c) 2 marks Consider an individual with LDA score 0.10. The height of the implied density of the LDA score is 0.38 for the charged-off category and 0.40 for the fully paid category. In the relevant population, the frequency of charged-off loans is 14%. What is the posterior probability that this individual's loan will be charged off?

Questions 4-6 concern bird sightings at 36 locations in Borneo. Counts of X species are recorded at each location.
The locations are of three different types: P, pristine (never logged); Q, logged 8 years previously; and R, logged 4 years previously.

Q4
a) 2 marks The counts have been fourth-root transformed, and then the Bray-Curtis distance has been computed between them. In what circumstances do we prefer the Bray-Curtis distance? What is the purpose of taking the quarter root of the counts?
b) 2 marks Below is a plot showing the stress for non-metric multidimensional scaling solutions on 2 to 11 axes. Which number of axes do you prefer? Explain your reasoning.
c) 2 marks Explain the difference between metric and non-metric multidimensional scaling. In what case is metric scaling equivalent to principal components analysis?
d) 4 marks A PERMANOVA has been performed to compare the sites. Explain how this analysis works. Are there any important assumptions or caveats? Output is given below; what is your conclusion?

Q5 We now compute two distances between each pair of bird species: one based on the bird sightings, using the Bray-Curtis distance on the quarter-root counts, and one based on the birds' divergent characteristics (food source, nesting preference, etc.), using the Manhattan metric.
a) 2 marks Compute the Manhattan distance between the two bird species below, based on the characteristics given.

        Canopy  Bark  Gleaning  Insectivore  Foliage  Frugivore  Nectivore  Sallying  Raptor
Sp. 1   1       1     1         1            0        0          0          0         0
Sp. 2   0       0     1         1            1        0          0          0         0

b) 3 marks Output from a Mantel test is given below. Explain briefly what this procedure tests and how it works. What is the conclusion here?

Q6 Using the Bray-Curtis distances between the occupancy profiles of bird species that are understory foliage-gleaning insectivores, a dendrogram has been created for the understory foliage-gleaning insectivores using complete linkage clustering.
a) 2 marks Describe what complete linkage clustering is.
b) 2 marks Which two bird species have the most similar occupancy profiles, according to the dendrogram?
Would these two still be the most similar under single linkage clustering? Explain.
c) 3 marks Two genera, Malacopteron and Phaenicophaeus, make up the majority of these birds. Cut the dendrogram so that three clusters are created. Tabulate the number of Malacopteron and Phaenicophaeus species in each cluster. Do you think bird species tend to occupy the same sites as other species of the same genus? Explain.

END QUESTION PAPER
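For Q1(a) and (b), the arithmetic behind a scree plot and the variance-explained figure can be sketched in plain Python. This is a minimal illustration that uses only the eigenvalues listed in Q1(a). The text bars stand in for a drawn scree plot. The Kaiser rule (keep eigenvalues greater than 1) is shown as one common heuristic, not as the intended answer. The fourth eigenvalue is exactly 1.00, so that rule is borderline here.

```python
# Eigenvalues of the 10 x 10 correlation matrix, as listed in Q1(a).
eigenvalues = [3.55, 1.53, 1.26, 1.00, 0.94, 0.67, 0.48, 0.47, 0.08, 0.02]

# For a correlation matrix the eigenvalues sum to the number of variables (10 here).
total = sum(eigenvalues)

# Running share of total variance explained by the first k components.
cumulative = []
running = 0.0
for value in eigenvalues:
    running += value
    cumulative.append(running / total)

# Kaiser heuristic: keep components whose eigenvalue exceeds 1, the variance of a
# single standardized variable. The strict ">" excludes the eigenvalue of exactly 1.00.
kaiser_k = sum(1 for value in eigenvalues if value > 1)

# Text stand-in for a scree plot: one bar per component, in decreasing order.
for k, (value, share) in enumerate(zip(eigenvalues, cumulative), start=1):
    bar = "#" * round(value * 10)
    print(f"PC{k:<3}{value:5.2f}  {bar:<36} cumulative {share:.3f}")

print(f"Components with eigenvalue > 1: {kaiser_k}; "
      f"they explain {cumulative[kaiser_k - 1]:.1%} of the total variance")
```

The elbow of the bar lengths and the cumulative column are the two quantities a scree-plot argument usually rests on.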
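For Q1(e), a small numpy experiment shows the mechanics. It runs on simulated data, because the loan data are not reproduced here.

- Unrotated loadings all come from one eigendecomposition of the correlation matrix. Their first two columns are therefore identical whether two or three components are retained.
- Varimax rotates whichever set of columns is kept. The rotated solution therefore generally depends on how many components are retained.

The `varimax` function is a hand-written textbook formulation (no Kaiser normalization). Treat it as an assumption that it matches the software behind the exam output in every detail.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Loadings (eigenvector * sqrt(eigenvalue)) from the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending order
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loadings matrix L (p variables x k components)."""
    p, k = L.shape
    R = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion, mapped back to an orthogonal matrix via SVD.
        G = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        previous, criterion = criterion, s.sum()
        if previous > 0 and criterion < previous * (1 + tol):
            break
    return L @ R

# Hypothetical correlated data: 500 observations of 6 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))

unrot2, unrot3 = pca_loadings(X, 2), pca_loadings(X, 3)
rot2, rot3 = varimax(unrot2), varimax(unrot3)

print("unrotated first two columns equal:", np.allclose(unrot2, unrot3[:, :2]))
print("rotated first two columns equal:  ", np.allclose(rot2, rot3[:, :2]))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged by it. Only how that variance is shared among components changes.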
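For Q2(b), it may help to see how the reported Pillai's trace converts to an F statistic. With two groups there is a single nonzero eigenvalue, and the usual F transformation of Pillai's trace (reported by R's `summary.manova` as "approx F") is exact under the usual normality assumptions. Its degrees of freedom are p and N - p - 1.

The sketch assumes N = 2 x 5425 = 10850 loans and p = 10 predictors, as described in the question. It also computes the smallest p-value a permutation test can report with the common (count + 1)/(n_perm + 1) estimator, which is 0.001 for 999 permutations. Whether 999 permutations were used for the exam output is an assumption. If the output came from R, note that R prints p-values below machine epsilon as "< 2.2e-16", so that figure is a display floor rather than an exact probability.

```python
# Figures from the question; N assumes 5425 loans in each of the two groups.
pillai = 0.045973
n_total = 2 * 5425
n_vars = 10

# With two groups, Pillai's trace V = lam / (1 + lam) for the single eigenvalue lam,
# and F = lam * (N - p - 1) / p on (p, N - p - 1) degrees of freedom.
lam = pillai / (1 - pillai)
df1, df2 = n_vars, n_total - n_vars - 1
f_stat = lam * df2 / df1

print(f"F = {f_stat:.2f} on ({df1}, {df2}) df")

def min_permutation_p(n_perm):
    """Smallest p-value reportable with the (count + 1) / (n_perm + 1) estimator."""
    return 1 / (n_perm + 1)

print("minimum permutation p-value with 999 permutations:", min_permutation_p(999))
```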
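For Q3(a), the sketch below implements two-class linear discriminant analysis in numpy, using a pooled within-class covariance and class priors estimated from the training data. It estimates the error rate by k-fold cross-validation, with leave-one-out as the special case k = n. The data are simulated, not the loan data.

The sketch makes the cost difference concrete. Leave-one-out refits the model once per observation, which would be 10,850 fits for the loan data; 10-fold refits it ten times. With a confusion table, the error rate is the off-diagonal count divided by the total.

```python
import numpy as np

def lda_fit(X, y):
    """Class means, priors and inverse pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    centered = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    pooled = centered.T @ centered / (len(y) - len(classes))
    return classes, means, priors, np.linalg.inv(pooled)

def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, priors, S_inv = model
    scores = (X @ S_inv @ means.T
              - 0.5 * np.sum(means @ S_inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

def cv_error(X, y, n_folds, seed=0):
    """Misclassification rate from n_folds-fold CV; n_folds == len(y) is leave-one-out."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    wrong = 0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        model = lda_fit(X[train], y[train])
        wrong += np.sum(lda_predict(model, X[test_idx]) != y[test_idx])
    return wrong / n

# Hypothetical data: two classes of 100 observations on 3 variables, with shifted means.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(1.5, 1.0, size=(100, 3))])
y = np.repeat([0, 1], 100)

loo = cv_error(X, y, n_folds=len(y))   # 200 model fits
ten_fold = cv_error(X, y, n_folds=10)  # 10 model fits
print(f"leave-one-out error: {loo:.3f}, 10-fold error: {ten_fold:.3f}")
```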
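For Q3(c), Bayes' rule combines the 14% population prior with the two density heights. The sketch below performs that arithmetic in Python, using only the numbers given in the question.

```python
def posterior(prior, density, other_density):
    """Two-class Bayes' rule: P(class | score) from the prior and the two density heights."""
    numerator = prior * density
    return numerator / (numerator + (1 - prior) * other_density)

# Values from Q3(c): density heights at LDA score 0.10 and a 14% charged-off rate.
p_charged_off = posterior(prior=0.14, density=0.38, other_density=0.40)
print(f"P(charged off | score = 0.10) = {p_charged_off:.4f}")
```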
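For Q4(a), the sketch below computes the Bray-Curtis dissimilarity on hypothetical counts, before and after a fourth-root transform. It illustrates two properties often cited for species-abundance data:

- Species absent from both sites (double zeros) add nothing to either sum, so they do not change the distance.
- The fourth root shrinks large counts much more than small ones, damping the influence of very abundant species.

The counts are invented for illustration.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

def fourth_root(counts):
    """Quarter-root transform applied to each count."""
    return [c ** 0.25 for c in counts]

# Hypothetical counts of four species at two sites; the first species dominates.
site_a = [120, 5, 0, 3]
site_b = [80, 0, 2, 6]

raw = bray_curtis(site_a, site_b)
transformed = bray_curtis(fourth_root(site_a), fourth_root(site_b))

# Appending species absent from both sites leaves both sums, and hence the distance, unchanged.
padded = bray_curtis(fourth_root(site_a + [0, 0]), fourth_root(site_b + [0, 0]))
print(f"raw: {raw:.3f}, fourth-root: {transformed:.3f}, with double zeros: {padded:.3f}")
```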
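For Q4(d), here is a minimal sketch of how a one-way PERMANOVA can be computed from a distance matrix:

1. Split the sum of squared inter-site distances into within-group and between-group parts.
2. Form a pseudo-F ratio from them.
3. Estimate its p-value by repeatedly shuffling the group labels.

Under the null hypothesis the labels are assumed exchangeable. The method is also known to be sensitive to differences in within-group dispersion, so a significant result can reflect spread as well as location. The data are simulated. The 12 sites per type (P, Q, R) and the 15 species are hypothetical choices; the question only states 36 locations of three types.

```python
import numpy as np

def bray_curtis_matrix(X):
    """Pairwise Bray-Curtis dissimilarities between the rows of X."""
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return diff / total

def pseudo_f(D, labels):
    """PERMANOVA pseudo-F from a distance matrix, via sums of squared distances."""
    n = len(labels)
    groups = np.unique(labels)
    D2 = D ** 2
    ss_total = D2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        block = D2[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(D, labels, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value from shuffling the group labels."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(D, labels)
    exceed = sum(pseudo_f(D, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)

# Hypothetical data: 12 sites of each type and 15 species with type-specific mean counts.
rng = np.random.default_rng(42)
base = rng.uniform(1, 10, size=15)
means = {"P": base, "Q": base[::-1], "R": np.sort(base)}
labels = np.repeat(["P", "Q", "R"], 12)
counts = np.vstack([rng.poisson(means[t]) for t in labels])

D = bray_curtis_matrix(counts ** 0.25)
f_obs, p_value = permanova(D, labels)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p_value:.3f}")
```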
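For Q5(a), with 0/1 characteristics the Manhattan distance counts the characteristics on which the two species disagree. The sketch reads the values row by row from the Q5(a) table. Treating Gleaning and Insectivore as separate columns is an inference from the nine values given per species.

```python
# Binary characteristics from the Q5(a) table, in column order:
# Canopy, Bark, Gleaning, Insectivore, Foliage, Frugivore, Nectivore, Sallying, Raptor
sp1 = [1, 1, 1, 1, 0, 0, 0, 0, 0]
sp2 = [0, 0, 1, 1, 1, 0, 0, 0, 0]

def manhattan(a, b):
    """City-block distance: the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

print("Manhattan distance:", manhattan(sp1, sp2))
```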
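For Q5(b), a Mantel test correlates the corresponding off-diagonal entries of two distance matrices. It judges the size of that correlation against a null distribution built by permuting the objects of one matrix, reordering its rows and columns together. The sketch below uses a Pearson correlation and a one-sided p-value. The two matrices are hypothetical: Euclidean distances between random points and between slightly perturbed copies of them.

```python
import numpy as np

def mantel(D1, D2, n_perm=999, seed=0):
    """Pearson Mantel statistic between two distance matrices and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    n = D1.shape[0]
    upper = np.triu_indices(n, k=1)
    x = D1[upper]
    r_obs = np.corrcoef(x, D2[upper])[0, 1]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of D2 together, which relabels the objects.
        r_perm = np.corrcoef(x, D2[np.ix_(perm, perm)][upper])[0, 1]
        exceed += r_perm >= r_obs
    return r_obs, (exceed + 1) / (n_perm + 1)

def euclidean_matrix(points):
    """Pairwise Euclidean distances between the rows of points."""
    return np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2))

# Hypothetical example: 20 objects; the second matrix comes from a noisy copy of the points.
rng = np.random.default_rng(3)
pts = rng.normal(size=(20, 2))
D_a = euclidean_matrix(pts)
D_b = euclidean_matrix(pts + rng.normal(scale=0.1, size=pts.shape))
r, p = mantel(D_a, D_b)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```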
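For Q6, a naive agglomerative clustering in plain Python shows how the two linkages differ:

- Complete linkage measures the distance between two clusters as the largest pairwise distance between their members.
- Single linkage uses the smallest pairwise distance.

For two singletons both rules give the same value, so the first merge, the globally closest pair, is the same under either linkage. Stopping when three clusters remain corresponds to cutting the dendrogram into three clusters. The five-species distance matrix is hypothetical, not the exam's data.

```python
def agglomerate(D, linkage="complete", n_clusters=1):
    """Naive agglomerative clustering on a square distance matrix (list of lists).

    linkage="complete" uses the largest pairwise distance between two clusters,
    linkage="single" the smallest. Stops when n_clusters clusters remain.
    """
    pick = max if linkage == "complete" else min

    def cluster_distance(a, b):
        return pick(D[i][j] for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []  # (members of the new cluster, merge height)
    while len(clusters) > n_clusters:
        pairs = [(a, b) for k, a in enumerate(clusters) for b in clusters[k + 1:]]
        a, b = min(pairs, key=lambda ab: cluster_distance(*ab))
        merges.append((sorted(a | b), cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, merges

# Hypothetical dissimilarities between five species.
D = [
    [0.00, 0.20, 0.60, 0.70, 0.90],
    [0.20, 0.00, 0.50, 0.65, 0.85],
    [0.60, 0.50, 0.00, 0.30, 0.80],
    [0.70, 0.65, 0.30, 0.00, 0.75],
    [0.90, 0.85, 0.80, 0.75, 0.00],
]

for method in ("complete", "single"):
    _, merges = agglomerate(D, method)
    print(method, [height for _, height in merges])

three, _ = agglomerate(D, "complete", n_clusters=3)
print("three clusters:", [sorted(c) for c in three])
```

The later merge heights differ between the two linkages even though the first merge coincides, which is why the shapes of the dendrograms can diverge.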