2020/11/8
EE516 Take Home Mid-Term Exam
Your Name
Due: November 9

The due date for this is firm, with no exceptions. Because this is an exam, no collaboration is allowed, aside from discussions regarding clarification of questions. You can discuss what is being asked in any question among yourselves, but you may not collaborate on the answers. Your mid-term should clearly address the questions being asked, and provide any relevant results, scripts, or plots. As always, I will be available for questions, but I may be less forthcoming in terms of specific questions related to solutions. I'm happy to provide guidance regarding what is being asked, but I will be more guarded in my responses regarding how to solve the questions. Good luck!

1: Probability and PDFs

Probability theory underlies all inferential statistics. For example, in all hypothesis tests, we either specify a significance level, or more commonly in the era of computers, a p-value is provided from the analysis. Within this framework, answer the following questions:

A. Explain the basic probability theory that provides the basis for univariate t-tests. Specifically, what is a t statistic and what does the associated p-value represent?

Explain.

B. Explain what confidence intervals on the sample mean represent in terms of the underlying probability theory for a data set drawn from a univariate normal distribution. How do the size (i.e., n) and variance of the sample affect confidence intervals on the sample mean?

Explain.

C. Write a short R program to illustrate your answers.

# Insert code here.

2: Linear Regression

Many researchers have attempted to relate estimates of vegetation leaf area index measured on the ground to satellite measurements of surface reflectance collected at the same locations. In the data file LAI_NDVI.txt, I have provided a data set for you to develop a statistical model to do this. The file consists of 50 rows, where each row contains the following fields:

Station ID, LAI1, LAI2, LAI3, LAI4, LAI5, NDVI

For each station there are five randomly located LAI measurements that were collected within a 15-m radius of the station (where LAI = the total one-sided area of leaves per unit ground area), with the station located at the center of a 30-m remotely sensed NDVI measurement (pixel). Having more than one measurement in each pixel is useful because LAI can be highly variable. For this problem, your first task is to average the LAI measurements at each site to provide a single representative LAI value that corresponds to the NDVI measurement centered over each station.

Your main task is to estimate an appropriate and valid statistical model to predict LAI from NDVI. Carefully describe how you go about doing this, including the justification and rationale for your approach and an assessment of your final model in terms of its quality and the degree to which it meets the required assumptions. Explain your results. Can you exploit the fact that there are multiple LAI measurements at each station to improve your estimated model?

# Insert code here.

Explain.

3: Multivariate Normal Distribution

The file plainspcp.txt contains monthly precipitation data for 250 stations in the Great Plains, where the 1st column provides the precipitation data for each station in January, the 2nd column provides the precipitation data for February, and so on. Using these data:

A. Perform an analysis where you assess the univariate normality for precipitation in June, July and August (i.e., each month individually). That is, assess whether the precipitation data in each month is univariate normal. As part of this analysis, examine the data from each of these months and identify any potential outliers.

B. Now do the same for these three months in a multivariate context. Is the precipitation data for June, July and August multivariate normal? Are there any multivariate outliers? To answer this question, you should compute the standardized distance of each point relative to the mean vector, and follow the basic procedure outlined in the tutorial covering this material.

4: Tests of mean vectors

Using the precipitation data (again), but this time using data from February through November, test the following hypothesis:

$H_0: \mu = (1.19, 0.97, 1.06, 2.07, 2.67, 4.08, 3.87, 3.35, 2.81, 2.94)$

Explain your method and your results.

# Insert code here.

Explain.

5: Analysis of Variance

In this question you will use data from a data set called LAI.txt. This file consists of 6 columns where the first two columns correspond to leaf area index measurements collected in May and July, respectively, at a grassland site in Kansas. The next two columns correspond to the greenness vegetation index (GVI; i.e., a measure of how green the surface is) from remote sensing for pixels corresponding to sites on the ground where LAI was measured on the same dates. The final two columns correspond to codes for each ground location indicating burning treatment (1=burned in early spring, 2=unburned) and hillslope position (1=lowland, 2=slope, 3=upland). Using these data:

A. Write an R program to manually compute the univariate between-sample sum of squares and the within-sample sum of squares for LAI in May and in July as a function of burning treatment (i.e., 2 distinct ANOVAs, not using built-in functions). Also, compute the F-statistic in each case, and then use the built-in probability model in R for the F distribution to compute the corresponding p-values for $\alpha = 0.05$. Explain your results. You can use the built-in aov function in R to check your results, but your answer must be implemented manually in R.

# Insert code here.

Explain.

B. Repeat part (A), but perform a MANOVA using burning treatment as a grouping variable for LAI in both May and July. That is, write a program in R to compute the Wilks statistic manually (i.e., perform a MANOVA without using the built-in manova function in R). To do this use the getH and getE functions that I provided in lecture (available on Blackboard) to compute the H and E matrices, and then use the built-in function det to compute determinants, as appropriate. Explain and interpret your results. Note, you do not need to compute the p-value for the Wilks statistic.

# Insert code here.

Explain.
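
For reference, the Wilks statistic named in question 5B can be written in terms of the H and E matrices. This is the standard textbook form, not taken from the lecture material: it assumes that getH and getE (whose code is not shown here) return the usual between-group and within-group sums-of-squares-and-cross-products matrices. The symbols $k$, $n_i$, $\mathbf{y}_{ij}$, $\bar{\mathbf{y}}_i$ and $\bar{\mathbf{y}}$ are introduced here only for illustration.

$$
\mathbf{H} = \sum_{i=1}^{k} n_i (\bar{\mathbf{y}}_i - \bar{\mathbf{y}})(\bar{\mathbf{y}}_i - \bar{\mathbf{y}})^{\top},
\qquad
\mathbf{E} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\mathbf{y}_{ij} - \bar{\mathbf{y}}_i)(\mathbf{y}_{ij} - \bar{\mathbf{y}}_i)^{\top}
$$

$$
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{E} + \mathbf{H})}
$$

Here $k$ is the number of groups (2 for the burning treatment), $n_i$ is the number of observations in group $i$, $\mathbf{y}_{ij}$ is the vector of May and July LAI for observation $j$ in group $i$, $\bar{\mathbf{y}}_i$ is the mean vector of group $i$, and $\bar{\mathbf{y}}$ is the overall mean vector. Under these definitions $\Lambda$ lies between 0 and 1, and smaller values indicate stronger evidence that the group mean vectors differ.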