
Final Project
STA302H1F: LEC5101 / STA1001HF: LEC0201
Due on 25th June, 2020 11:59 PM sharp in Quercus

All relevant work must be shown for credit.

Final Project: The final project is due on June 25, 2020 by 11:59 PM EST and consists of a data analysis on a novel dataset. The deadline will be strictly applied; under no circumstances can students submit late. Please start the submission process early so that your project is graded.

Students will be required to demonstrate their understanding of the methods based on course materials by developing a reasonable regression model using the techniques taught in class. The students will be responsible for choosing the correct methods to apply and providing appropriate justifications defending their choices.

The final project will be done individually and must be typed and submitted by the stated deadline. The project needs to fulfill the following criteria:
- Font: 12-point font in a style similar to Times New Roman
- Spacing: single-spaced
- The word limit for the final project is 1500. This excludes the title page, table/figure captions and appendix.
- A maximum of 5 tables/figures will be allowed in the project report. The tables and figures should be relevant and should convey the purpose of the project. All tables and figures should have captions. You may use any combination of tables and figures.
- Up to 3 additional tables/figures may be included, but only if they are relevant to the analysis and are referred to in the main text.
- You must submit the report in a standard file format (e.g., .doc, .docx or .pdf).
- Please submit your R code file. This can be a .r or a .rmd file.
No other file format for the code will be accepted.

In order to pass the course, you must submit the final project.

For this problem, you need to load the NHANES dataset using the following commands:

    ## If the packages are not already installed, then use ##
    ## install.packages("NHANES"); install.packages("tidyverse") ##
    library(tidyverse)
    library(NHANES)
    small.nhanes <- na.omit(NHANES[NHANES$SurveyYr == "2011_12" & NHANES$Age > 17,
                                   c(1,3,4,8:11,13,17,20,21,25,46,50,51,52,61)])
    small.nhanes <- as.data.frame(small.nhanes %>%
                                    group_by(ID) %>%
                                    filter(row_number() == 1))
    nrow(small.nhanes)

    ## Check whether any ID is repeated. If not, ##
    ## length(unique(small.nhanes$ID)) and nrow(small.nhanes) are the same ##
    length(unique(small.nhanes$ID))

This is data collected by the US National Center for Health Statistics (NCHS). To check the variable descriptions, type ?NHANES in R. The preceding code creates a small subset of the original NHANES dataset: the original dataset has 76 variables, while the small.nhanes dataset has 17. We have only selected data from people older than 17 years.

With this dataset, answer the following questions. Randomly select 400 observations from the data, using your student ID as the seed (you can follow the next chunk of code for this). This is the training set. The rest of the data will be used as the test set. The test set should not be used for model fitting or validation at any point during the analysis of the project.

    ## Create training and test set ##
    set.seed(1002656486)
    train <- small.nhanes[sample(seq_len(nrow(small.nhanes)), size = 400), ]
    nrow(train)
    length(which(small.nhanes$ID %in% train$ID))
    test <- small.nhanes[!small.nhanes$ID %in% train$ID, ]
    nrow(test)

The combined systolic blood pressure reading (BPSysAve) is our outcome of interest. Every variable other than ID can be considered a predictor. We are mainly interested in the effect of smoking (SmokeNow) on the combined systolic blood pressure reading.
However, we are also interested in the prediction of the combined systolic blood pressure reading and in identifying which variables are best for prediction. Based on the data analysis techniques you learned in this course, perform a complete analysis of the dataset. Your analysis should include (but is not limited to):
- Model diagnostics
- Checking the variance inflation factor (VIF)
- Variable selection
- Shrinkage methods
- Model validation
- Checking the prediction error on the test set after applying various model selection techniques
- After selecting the best model, interpreting and explaining the parameter estimates
- Concluding on the effect of the predictors on the combined systolic blood pressure reading

You have to justify the aforementioned methods and use them accurately.

The final project will be submitted as a project report, which consists of:
- Introduction section: introduce the purpose and relevance of the project. You can also include some literature review on the NHANES dataset if applicable.
- Methods section: describe and explain the methods, tools and techniques used to arrive at your final model. You need to show some exploratory data analysis.
- Results section: present a description of your study sample, the important results that led you to make crucial decisions in building your model, the final model, and any other important results.
- Discussion section: interpret your final model, describe why it answers the research question and why it is important, and discuss any limitations that still exist based on your results.

ALL THE BEST!
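Three of the analysis steps listed above (checking VIFs, comparing candidate models by prediction error on a held-out test set, and applying a shrinkage method) can be illustrated with a short base-R sketch. To keep it self-contained, it uses the built-in mtcars data as a stand-in for the NHANES training and test sets, with mpg playing the role of BPSysAve; the predictor set, the 22-row split, the seed and the ridge penalty lambda = 1 are arbitrary assumptions, not part of the project specification. The VIF is computed by hand from auxiliary regressions, and ridge regression (one possible shrinkage method) is written in closed form, so its penalty may not be on the same scale as the lambda used by packages such as glmnet.

```r
# Sketch only: mtcars stands in for the NHANES data and mpg for BPSysAve.
# The predictors, split size, seed and lambda below are hypothetical choices.
set.seed(302)
idx        <- sample(seq_len(nrow(mtcars)), size = 22)   # training rows
train_demo <- mtcars[idx, ]
test_demo  <- mtcars[-idx, ]                              # held-out rows
predictors <- c("wt", "hp", "disp", "drat")

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 is the
# R-squared from regressing predictor j on all the other predictors.
vif_manual <- function(data, vars) {
  sapply(vars, function(v) {
    aux <- lm(reformulate(setdiff(vars, v), response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}
vifs <- vif_manual(train_demo, predictors)
print(round(vifs, 2))   # a common rule of thumb flags VIFs above about 5

# Two candidate linear models, both fitted on the training rows only.
full  <- lm(reformulate(predictors, response = "mpg"), data = train_demo)
small <- lm(mpg ~ wt + hp, data = train_demo)

# Mean squared prediction error on the held-out rows.
mspe <- function(fit, newdata) {
  mean((newdata$mpg - predict(fit, newdata = newdata))^2)
}
errors <- c(full = mspe(full, test_demo), small = mspe(small, test_demo))

# Ridge regression in closed form on standardized predictors:
# beta = (X'X + lambda * I)^(-1) X'(y - mean(y)); the intercept is not penalized.
X_tr   <- scale(as.matrix(train_demo[, predictors]))
y_bar  <- mean(train_demo$mpg)
lambda <- 1   # hypothetical; in practice choose it by cross-validation
beta   <- solve(crossprod(X_tr) + lambda * diag(ncol(X_tr)),
                crossprod(X_tr, train_demo$mpg - y_bar))

# Standardize the test predictors with the training means and SDs.
X_te <- scale(as.matrix(test_demo[, predictors]),
              center = attr(X_tr, "scaled:center"),
              scale  = attr(X_tr, "scaled:scale"))
ridge_pred      <- y_bar + drop(X_te %*% beta)
errors["ridge"] <- mean((test_demo$mpg - ridge_pred)^2)
print(round(errors, 3))
```

The test rows are touched only to compute the final prediction errors, which mirrors the requirement that the test set not be used for fitting. Any selection among the candidates should still be justified on the training data, for example with cross-validation.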
