
ELEC0033 – 2020/2021

5 Data Analytics Task – Climate Data Analysis using Python

5.1 General Overview

The assignment comprises individual code writing, data analysis and inference. You are allowed to discuss ideas with peers, but your code, experiments and report must be based solely on your own work.

The assignment leverages elements covered in class (data analytics lecture). You will work with two meteorological datasets: you will be required to crunch the data, clean the datasets and infer hidden patterns. Specifically, you will be asked to solve three tasks.

The goals of the assignment are the following:
● To further develop your programming skills
● To further develop your skills and understanding of the principles of data analytics and machine learning
● To acquire experience in dealing with real-world data

5.2 Assignment description

1. Dataset description

You will find two pickle files named Weather-denmark-resampled.pkl and df_perth.pkl, respectively.

For TASKS 1 and 2, which cover the main aspects of preliminary data analysis, missing data and outlier detection, you must use the first dataset.

For TASK 3, which covers correlation and pattern inference, you will use the second, smaller dataset to find correlations and infer patterns.

2. Tasks to be solved

Read the three task descriptions carefully and address them using the pre-compiled Jupyter notebook named Coursework_weather_data.ipynb.

TASK 1 – PRELIMINARY ANALYSIS

In this first task, you will explore the dataset. Follow the instructions below:
a. Import the weather-denmark-resampled.pkl dataset provided in the folder and explore the dataset by answering the following questions.
   i. How many cities are there in the dataset?
   ii. How many observations and features are there in this dataset?
   iii. What are the names of the different features?
b. Now that you are familiar with the dataset, evaluate whether it contains any missing values.
If so, remove them using the pandas built-in function.
c. Extract the general statistical properties, summarising the minimum, maximum, median, mean and standard deviation values for all the features in the dataset. Spot any anomalies in these properties and clearly explain why you classify them as anomalies.

TASK 2 – OUTLIERS

The second task focuses on spotting and handling outliers. Follow the instructions below:
d. Store the temperature measurements in May 2006 for the city of Odense. Then produce a simple plot of the temperature versus time.
   HINT: In this dataset, the cities are vertically stacked. Therefore, we have a multi-column dataset, which basically works as a nested dictionary.
e. Find the outliers in this set of measurements (if any) and replace them using linear interpolation.

TASK 3 – CORRELATION AND INFERENCE

In this last task, you will look for correlations between features of the data and infer hidden patterns. For this task, you will work with a smaller dataset. Follow the instructions below:

3.1 CORRELATION
f. We now take a new dataset (df_perth.pkl), which collects climate data for a city in Australia. Here we have just one year of measurements, but more features.
g. Find any significant correlations between features.
   HINT: you might find it useful to look for trends and recurrent patterns within the data.
h. We now focus on the correlation between precipitation and cloud cover. We want to infer the probability of having moderate to heavy rain (threshold: 1 mm/h) as a function of the cloud cover index.
   HINT: you might find it useful to create a new column that holds 0 if precipitation is below 1 mm/h and 1 otherwise.

3.2 INFERENCE
i. Let us now assume that we want to predict the photovoltaic production (PV production) using multiple linear regression. Explain which features are statistically significant in modelling the target variable.
j. Create a multivariate model using the predictors chosen in the previous question.

5.3 Deliverable

Report

The report should be written in the form of an academic paper using the ICML format [1]. The report should be at most 10 pages long, excluding references and appendices. The report must include the following sections:
● Abstract. This section should be a short paragraph (4-5 sentences) that provides a brief overview of the methodology and results presented in the report.
● Preliminary Analysis. This section describes the study carried out during Task 1 and should be organized in the following subsections:
   ○ Data Understanding. This subsection should detail the data that was used for this study, clearly describing the content, size and format of the data: how many cities are described in the dataset, how many observations there are, and how many (and which) features are considered. Further information can be provided.
   ○ Data Cleaning. This subsection should describe the missing data processing. It is important to describe the methodology that you used to search for the missing data and how you addressed them in the best way (for example, how you ensured that the dataset preserves the same statistics/properties). Clearly motivate your answers.
   ○ Data Statistics. This subsection should describe the general statistical properties of the dataset with numerical or graphical visualization. Provide reflections on anomalies (with clear motivation/supporting evidence for anomalies).
● Outliers. This section should describe all the steps that were applied to the data to find and handle outliers. A justification for each step should also be provided. If no or very little pre-processing was done, this section should clearly justify why.
● Data inference. This section should describe the exploration and inference process.
The following subsections should be provided:
   ○ Data Correlation. This subsection should describe the different feature correlations that you have investigated in the current dataset. Even if you discover few patterns, it is important that you clearly explain and justify the methodologies that you adopted. Clearly show results that can support your statements.
   ○ Data Inference. This subsection should describe the final step of data inference. Again, clearly motivate your solutions, approaches and conclusions/results.
● Conclusion. This last section summarises the findings, highlights any challenges or limitations that were encountered during the study and provides directions for potential improvements.

Please make sure you complement your discussion in each section with relevant equations, diagrams or figures as you see fit. Most importantly, be sure that all your answers and solutions are well motivated.

[1] https://icml.cc/Conferences/2020/StyleAuthorInstructions

Marking Criteria

The marking criteria are listed below.

| Criteria | Description | Mark Weight |
| --- | --- | --- |
| Abstract/Conclusions | The purpose of the executive summary is to outline the data analytics project, its inputs, envisioned outputs and key findings. | 5% |
| Task 1 – Preliminary Analysis | Dataset Understanding. Provide a clear description of the dataset answering the following questions: i) How many cities are there in the dataset? ii) How many observations and features are there in this dataset? iii) What are the names of the different features? | 10% |
| | Data Cleaning – Missing data. Provide a clear description of the results from your missing data analysis and key outcomes. | 15% |
| | Data Statistics. Describe the general statistical properties of the dataset with numerical or graphical visualization. Provide reflections on anomalies (with clear motivation/supporting evidence for anomalies). | 10% |
| Task 2 – Outliers | Show the visualization of the temperature measurements, together with some comments on the behaviour depicted in the plots. Provide summaries of the outliers in terms of the number of outliers detected as well as the techniques adopted to replace outliers (motivate your answers). | 20% |
| Task 3 – Inference | Data Correlation. Comment on the significant correlations you found between features and assess rain probability as a function of the cloud cover index. Support the text with visualization of results and key insights on the considered approach. | 15% |
| | Data Inference. Good understanding of data inference. Comment on the multivariate model using the predictors chosen in the previous question. | 20% |
| Report Style | The report needs a clean and clear structure and layout. The quality of images, tables, citations and references will also be taken into account. | 5% |
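
The Task 2 hint describes the Denmark dataset as a multi-column table that behaves like a nested dictionary. The sketch below illustrates that idea on a small synthetic frame rather than on the real pickle file. The city names, the feature names ("Temp", "Pressure"), the injected spike and the 1.5 × IQR outlier rule are all assumptions made for illustration; the brief does not prescribe an outlier rule, and the real column labels should be read from the dataset itself.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real pickle: hourly rows, (city, feature) columns.
# City and feature names are assumptions for illustration only.
index = pd.date_range("2006-04-30", "2006-06-01", freq=pd.Timedelta(hours=1))
columns = pd.MultiIndex.from_product([["Aalborg", "Odense"], ["Pressure", "Temp"]])
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(12.0, 1.0, size=(len(index), len(columns))),
    index=index,
    columns=columns,
)

# Nested-dictionary style access: first the city, then the feature.
odense_temp = df["Odense"]["Temp"]

# Partial-string indexing on the DatetimeIndex keeps only May 2006.
may_temp = odense_temp.loc["2006-05"].copy()

# Inject one artificial spike so there is something to repair.
spike_time = pd.Timestamp("2006-05-10 12:00")
may_temp.loc[spike_time] = 60.0

# Hypothetical outlier rule (1.5 * IQR fences); other rules are equally valid.
q1, q3 = may_temp.quantile([0.25, 0.75])
fence = 1.5 * (q3 - q1)
is_outlier = (may_temp < q1 - fence) | (may_temp > q3 + fence)

# Mask the flagged points as NaN and fill them by linear interpolation.
repaired = may_temp.mask(is_outlier).interpolate(method="linear")
```

Masking first and interpolating afterwards keeps the time index intact, so the repaired series can be plotted against time in the same way as the raw one.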

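
The Task 3 hint about a 0/1 rain column can be sketched in the same spirit. The example below uses a tiny hand-made table, not df_perth.pkl, and the column names ("cloud_cover", "precipitation") are hypothetical. Treating exactly 1 mm/h as rain is also an assumption, since the brief only gives the 1 mm/h threshold. Averaging a 0/1 column within each cloud cover value gives the empirical fraction of rainy observations, which is one simple estimate of the rain probability.

```python
import pandas as pd

# Tiny hand-made stand-in for df_perth; column names and values are assumptions.
df = pd.DataFrame({
    "cloud_cover":   [0, 0, 4, 4, 4, 8, 8, 8, 8, 8],
    "precipitation": [0.0, 0.2, 0.0, 1.5, 0.4, 2.0, 0.1, 3.2, 1.0, 0.0],
})

RAIN_THRESHOLD = 1.0  # mm/h, taken from the task description

# Indicator column: 0 below the threshold, 1 otherwise.
df["rain"] = (df["precipitation"] >= RAIN_THRESHOLD).astype(int)

# The mean of a 0/1 column per group is the fraction of rainy observations,
# i.e. an empirical estimate of P(rain | cloud cover).
rain_prob = df.groupby("cloud_cover")["rain"].mean()
```

On real data, some cloud cover values may have few observations, so the group sizes are worth reporting next to the estimated probabilities.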