SIT114 2020.T1: Task 6.4HD

Contents
1 To Do
2 Hint
3 Further Reading
4 Artefacts
5 Intended Learning Outcomes

1 To Do

Create a single RMarkdown report where you perform what follows.

1. Just as in task 6.3D, load the Wine Quality dataset:

   wines <- read.csv("winequality-all.csv", comment.char = "#", stringsAsFactors = FALSE)

2. Then, add a new 0/1 column named quality again (quality equal to 1 if and only if a wine is ranked 7 or higher).

3. Perform a random train-test split of size 60-40%: create the matrices X_train and X_test and the corresponding label vectors Y_train and Y_test that provide the information on the wines' quality.

4. Your task is to determine the best (see below) parameter setting for the K-nearest neighbour classification of the quality variable based on the 11 physicochemical features. Perform the so-called grid (exhaustive) search over all the possible combinations of the following parameters:

   a. K: 1, 3, 5, 7 or 9
   b. preprocessing: none (raw input data), standardised variables or robustly standardised variables
   c. metric: L2 (Euclidean) or L1 (Manhattan)

   In other words, there are 5*3*2 = 30 combinations of parameters in total, and hence 30 different scenarios to consider.

   By robust standardisation we mean: from each column, subtract its median and then divide by the median absolute deviation (MAD, i.e., median(abs(x - median(x)))). This data preprocessing scheme is less sensitive to outliers than the classic standardisation.

   Note that the L1 metric-based K-nearest neighbour method is not implemented in the FNN package. You need to implement it on your own (see Chapter 3 of LMLCR).

   By the best classifier we mean the one that maximises the F-measure obtained by the so-called 5-fold cross-validation.

   In Chapter 3 we discussed that it would not be fair to use the test set for choosing the optimal parameters (we would be overfitting to the test set).
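Before turning to cross-validation, note that two of the ingredients above have to be written by hand: the robust (median/MAD) standardisation and an L1-metric K-nearest-neighbour classifier. A minimal sketch in R follows; the function names robust_standardise and knn_l1 are illustrative choices of mine, not part of any package:

```r
# Robust standardisation: subtract the median and divide by the
# median absolute deviation, MAD = median(abs(x - median(x))).
robust_standardise <- function(x) {
  (x - median(x)) / median(abs(x - median(x)))
}

# A naive K-nearest-neighbour classifier using the L1 (Manhattan) metric.
# X_train: numeric matrix; Y_train: 0/1 label vector; X_test: query matrix.
knn_l1 <- function(X_train, Y_train, X_test, K) {
  apply(X_test, 1, function(q) {
    # L1 distance from the query point q to every training observation:
    d <- rowSums(abs(sweep(X_train, 2, q)))
    # Indices of the K nearest neighbours:
    nn <- order(d)[seq_len(K)]
    # Majority vote among the neighbours' 0/1 labels
    # (with odd K there can be no ties):
    as.integer(mean(Y_train[nn]) > 0.5)
  })
}
```

This brute-force version recomputes all pairwise distances for each query, so it is much slower than FNN's tree-based search, but for a dataset of this size it is perfectly workable.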
We know that one possible way to assure the transparent evaluation of a classifier is to perform a train-validate-test split and use the validation set for parameter tuning. Here we will use a different technique, one that estimates the method's true predictive performance more accurately, yet at the cost of significantly increased run-time. Namely, in 5-fold cross-validation, we split the original train set randomly into 5 disjoint parts: A, B, C, D, E (more or less of the same number of observations). We use each combination of 4 chunks as the training set and the remaining part as the validation set, on which we compute the F-measure:

   train set      validation set   F-measure
   B, C, D, E     A                F_A
   A, C, D, E     B                F_B
   A, B, D, E     C                F_C
   A, B, C, E     D                F_D
   A, B, C, D     E                F_E

Finally, we report the average F-measure, (F_A + F_B + F_C + F_D + F_E)/5.

5. Report the best scenario (out of 30) together with the corresponding classifier's accuracy, precision, recall and F-measure on the test set.

Make sure that the report has a readable structure. Divide the document into sections.
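The 5-fold scheme above can be sketched as follows. This is only an illustration under assumed names: f_measure and cv_f_measure are my own, fit_predict stands for whichever of the 30 scenarios is being evaluated, and the F-measure computation assumes at least one positive prediction (otherwise precision is undefined):

```r
# F-measure: the harmonic mean of precision and recall, for 0/1 labels.
f_measure <- function(y_true, y_pred) {
  tp <- sum(y_true == 1 & y_pred == 1)   # true positives
  fp <- sum(y_true == 0 & y_pred == 1)   # false positives
  fn <- sum(y_true == 1 & y_pred == 0)   # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Average F-measure over 5 folds; fit_predict(X_tr, Y_tr, X_va) should
# return predicted 0/1 labels for the validation matrix X_va.
cv_f_measure <- function(X, Y, fit_predict, nfolds = 5) {
  n <- nrow(X)
  # Assign each observation to one of the 5 (roughly equal) chunks at random:
  fold <- sample(rep(seq_len(nfolds), length.out = n))
  fs <- sapply(seq_len(nfolds), function(k) {
    va <- which(fold == k)   # the held-out validation chunk
    pred <- fit_predict(X[-va, , drop = FALSE], Y[-va], X[va, , drop = FALSE])
    f_measure(Y[va], pred)
  })
  mean(fs)   # (F_A + F_B + F_C + F_D + F_E) / 5
}
```

Running cv_f_measure once per parameter combination, on the training set only, yields the 30 scores from which the best scenario is picked; the test set is touched only once, at the very end.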
Before each code chunk, explain what purpose it serves.

Side note: If you want a real challenge (this is definitely not obligatory), you can add another level of complexity: select the best combination of the input variables, e.g., amongst all the possible pairs or triples of columns in the dataset.

2 Hint

A grid search can be implemented based on a triply-nested for loop:

   Ks <- c(1, 3, 5, 7, 9)
   Ps <- c("none", "standardised", "robstandardised")
   Ms <- c("l2", "l1")
   for (K in Ks) {
       for (preprocessing in Ps) {
           for (metric in Ms) {
               if (preprocessing == "standardised") {
                   # ...
               } else if (preprocessing == "robstandardised") {
                   # ...
               } else {
                   # ...
               }
               if (metric == "l2") {
                   # ...
               } else {
                   # ...
               }
           }
       }
   }

Alternatively, you can go through every row in the following matrix and process each thus defined scenario:

   expand.grid(Ks, Ps, Ms)
   ##    Var1            Var2 Var3
   ## 1     1            none   l2
   ## 2     3            none   l2
   ## 3     5            none   l2
   ## 4     7            none   l2
   ## 5     9            none   l2
   ## 6     1    standardised   l2
   ## 7     3    standardised   l2
   ## 8     5    standardised   l2
   ## 9     7    standardised   l2
   ## 10    9    standardised   l2
   ## 11    1 robstandardised   l2
   ## 12    3 robstandardised   l2
   ## 13    5 robstandardised   l2
   ## 14    7 robstandardised   l2
   ## 15    9 robstandardised   l2
   ## 16    1            none   l1
   ## 17    3            none   l1
   ## 18    5            none   l1
   ## 19    7            none   l1
   ## 20    9            none   l1
   ## 21    1    standardised   l1
   ## 22    3    standardised   l1
   ## 23    5    standardised   l1
   ## 24    7    standardised   l1
   ## 25    9    standardised   l1
   ## 26    1 robstandardised   l1
   ## 27    3 robstandardised   l1
   ## 28    5 robstandardised   l1
   ## 29    7 robstandardised   l1
   ## 30    9 robstandardised   l1

3 Further Reading

See Section 5.1 of the book by James, G. et al. 2017. An Introduction to Statistical Learning with Applications in R. Springer-Verlag. https://faculty.marshall.usc.edu/gareth-james/ISL

4 Artefacts

Submit two files via OnTrack:

1. the Rmd file (RMarkdown report),
2.
the resulting PDF file that is generated by clicking Knit Document to PDF in RStudio; if you are unable to generate the PDF file directly, convert the report to HTML or Word, and manually export the resulting file to PDF.

5 Intended Learning Outcomes

   ULO                                    Related
   ULO1 (Methods)                         YES
   ULO2 (Problems)                        YES
   ULO3 (Implementation and Evaluation)   YES
   ULO4 (Communication)                   YES
   ULO5 (Impact)                          YES