FIT3152 Data analytics: Assignment 2

This assignment is worth 20% of your final marks in FIT3152.

Due: Sunday 7th June 2020 at Midnight GMT+10

Note: Students are expected to work individually on this assignment.

How to submit: Submit your written report as a pdf file (.pdf) and your R working as an R script (.R), or submit your report comprising both written answers and script as an RMarkdown file in HTML format (.html). Use the naming convention: Firstname.Lastname.studentID.{pdf, R, html}. Upload the one or two files to Moodle. Do not zip. Do not submit the data file.

Objective:

The objective of this assignment is to gain familiarity with classification models using R. You will be using a modified version of the Kaggle competition data: Predict rain tomorrow in Australia. https://www.kaggle.com/jsphyg/weather-dataset-rattle-package The data contains a number of meteorological observations as attributes, and the class attribute RainTomorrow. Details of the decision attributes follow the assignment description.

You are expected to use R for your analysis, and may use any R package. Clear your workspace, set the number of significant digits to a sensible value, and use WAUS as the default data frame name for the whole data set. Read your data into R using the following code:

    rm(list = ls())
    WAUS <- read.csv("WAUS2020.csv")
    L <- as.data.frame(c(1:49))
    set.seed(88888888) # Your Student ID is the random seed
    L <- L[sample(nrow(L), 10, replace = FALSE),] # sample 10 locations
    WAUS <- WAUS[(WAUS$Location %in% L),]
    WAUS <- WAUS[sample(nrow(WAUS), 2000, replace = FALSE),] # sample 2000 rows

We want to obtain a model that may be used to predict whether it is going to rain tomorrow for 10 locations in Australia.

Assignment questions:

1. Explore the data: What is the proportion of rainy days to fine days? Obtain descriptions of the predictor (independent) variables (mean, standard deviations, etc.) for real-valued attributes. Is there anything noteworthy in the data?
Are there any attributes you need to consider omitting from your analysis? (1 Mark)

2. Document any pre-processing required to make the data set suitable for the model fitting that follows. (1 Mark)

3. Divide your data into a 70% training and 30% test set by adapting the following code (written for the iris data). Use your student ID as the random seed.

    set.seed(XXXXXXXX) # Student ID as random seed
    train.row = sample(1:nrow(iris), 0.7*nrow(iris))
    iris.train = iris[train.row,]
    iris.test = iris[-train.row,]

4. Implement a classification model using each of the following techniques. For this question you may use each of the R functions at their default settings, or with minor adjustments to set factors etc. (5 Marks)
   - Decision Tree
   - Naïve Bayes
   - Bagging
   - Boosting
   - Random Forest

5. Using the test data, classify each of the test cases as "will rain tomorrow" or "will not rain tomorrow". Create a confusion matrix and report the accuracy of each model. (1 Mark)

6. Using the test data, calculate the confidence of predicting "will rain tomorrow" for each case and construct an ROC curve for each classifier. You should be able to plot all the curves on the same axis. Use a different colour for each classifier. Calculate the AUC for each classifier. (1 Mark)

7. Create a table comparing the results in parts 5 and 6 for all classifiers. Is there a single best classifier? (1 Mark)

8. Examining each of the models, determine the most important variables in predicting whether or not it will rain tomorrow. Which variables could be omitted from the data with very little effect on performance? Give reasons. (2 Marks)

9. Create the best tree-based classifier you can. You may do this by adjusting the parameters, and/or cross-validation of the basic models in Part 4, or using an alternative tree-based learning algorithm. Show that your model is better than the others using appropriate measures. Describe how you created your improved model, and why you chose that model. What factors were important in your decision?
State why you chose the attributes you used. (4 Marks)

10. Using the insights from your analysis so far, implement an Artificial Neural Network classifier and report its performance. Comment on the attributes used and the data pre-processing required. How does this classifier compare with the others? Can you give any reasons? (2 Marks)

11. Write a brief report (suggested length 6 pages) summarizing your results in parts 1 to 10. Use commenting (# ----) in your R script, where appropriate, to help a reader understand your code. Alternatively combine working, comments and reporting in R Markdown. (2 Marks)

Description of the data:

Attributes 1:3, Day, Month, Year of the observation
Attribute 4, Location: the location of the observation
Attribute 5, MinTemp: the daily minimum temperature in degrees Celsius
Attribute 6, MaxTemp: the daily maximum temperature in degrees Celsius
Attribute 7, Rainfall: the rainfall recorded for the day in mm
Attribute 8, Evaporation: the evaporation (mm) in the 24 hours to 9am
Attribute 9, Sunshine: hours of bright sunshine over the day
Attribute 10, WindGust: direction of the strongest wind gust over the day
Attribute 11, WindGustSpeed: speed (km/h) of the strongest wind gust over the day
Attribute 12, WindDir9am: direction of the wind at 9am
Attribute 13, WindDir3pm: direction of the wind at 3pm
Attribute 14, WindSpeed9am: speed (km/h) averaged over 10 minutes prior to 9am
Attribute 15, WindSpeed3pm: speed (km/h) averaged over 10 minutes prior to 3pm
Attribute 16, Humidity9am: humidity (percent) at 9am
Attribute 17, Humidity3pm: humidity (percent) at 3pm
Attribute 18, Pressure9am: atmospheric pressure (hPa) reduced to mean sea level at 9am
Attribute 19, Pressure3pm: atmospheric pressure (hPa) reduced to mean sea level at 3pm
Attribute 20, Cloud9am: fraction of sky obscured by cloud at 9am. This is measured in oktas, which are a unit of eighths. It records how many eighths of the sky are obscured by cloud.
A 0 measure indicates completely clear sky whilst an 8 indicates that it is completely overcast.
Attribute 21, Cloud3pm: fraction of sky obscured by cloud at 3pm
Attribute 22, Temp9am: temperature (degrees C) at 9am
Attribute 23, Temp3pm: temperature (degrees C) at 3pm
Attribute 24, RainToday: boolean: 1 if precipitation (mm) in the 24 hours to 9am exceeds 1mm, otherwise 0
Attribute 25, RainTomorrow: the target variable. Did it rain tomorrow?
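
As an illustration of the measures asked for in questions 5 and 6, the following is a minimal base-R sketch, not a prescribed solution. It assumes you already have, for each test case, the actual class and a classifier's confidence of "will rain tomorrow". The vectors actual and conf_yes hold made-up toy values (not taken from WAUS), the "Yes"/"No" coding of the class is an assumption, and auc() is a hypothetical helper that computes the area under the ROC curve from the rank-sum (Mann-Whitney) identity rather than from any particular package.

```r
# Toy stand-in for one classifier's test-set output (hypothetical values).
actual   <- factor(c("Yes", "No", "No", "Yes", "No", "Yes", "No", "No"),
                   levels = c("No", "Yes"))
conf_yes <- c(0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5)  # confidence of "Yes"

# Question 5: classify at a 0.5 threshold (an assumed cut-off), then
# cross-tabulate actual against predicted classes.
predicted <- factor(ifelse(conf_yes >= 0.5, "Yes", "No"),
                    levels = c("No", "Yes"))
cm <- table(actual = actual, predicted = predicted)

# Accuracy is the share of cases on the diagonal of the confusion matrix.
accuracy <- sum(diag(cm)) / sum(cm)

# Question 6: AUC as the probability that a randomly chosen rainy case
# receives a higher confidence than a randomly chosen fine case.
# rank() assigns average ranks, so tied confidences count as one half.
auc <- function(scores, labels, positive = "Yes") {
  pos   <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(scores)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(cm)
print(accuracy)
print(auc(conf_yes, actual))
```

Running the same steps once per classifier gives one accuracy and one AUC per model, which is the material needed for the comparison table in question 7. Plotting the ROC curves themselves is left out here; a package such as ROCR can draw them from the same two vectors.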