School of Computer Science
Uwe Roehm
DATA2001/DATA2901: Data Science, Big Data, and Data Diversity    1.Sem./2020

Practical Assignment: Viral Vulnerability Analysis
Group Assignment (20%)    06.05.2020

Introduction

In this practical assignment of DATA2001/DATA2901 you are asked to gather and integrate several datasets to perform a data analysis of the viral vulnerability of different neighbourhoods in Sydney. You will find links to online documentation, data, and hints on tools and schema needed for this assignment in the Assignments section in Canvas.

Disclaimer: This assignment is mainly about data integration. Note that the age and varying quality of the provided data do not allow a reliable assessment of the actual COVID-19 risk.

Data Set Description and Preparation

Your task in this assignment is to calculate a vulnerability score with regard to infectious diseases for different neighbourhoods in Sydney. The neighbourhood vulnerability is expressed as a measure of several factors which we assume to affect the spread of a virus within a community: population density, age distribution, pre-existing health conditions, and access to healthcare services.

In order to calculate this score, you will need to integrate different data sources.
As a starting point, we provide you with a few census-based datasets which give you input on at least three factors: population density, age distribution, and locations of health services (hospitals and GPs). We leave it up to you to integrate further data and to refine the suggested vulnerability score. Some ideas would be the percentage of the population with pre-existing health conditions such as asthma or diabetes, the presence of meeting hotspots such as large shopping centres or sports venues, the intensity of international travel (either by locals or by tourists in an area), or public transport usage.

Based on your computed vulnerability scores, then perform a correlation analysis with the official COVID-19 data per neighbourhood as provided by NSW Health (also provided, resp. linked).

Your submission should consist of the Jupyter notebook that you used for integrating the datasets and for performing and visualising your analysis.

Milestone 1: Load and integrate the provided datasets into PostgreSQL by the tutorials in Week 11.

Provided datasets: We provide in Canvas several CSV files with Statistical Area 2 (SA2) data from the Australian Bureau of Statistics (ABS), as well as some health service location data from Sydney (keep checking Canvas for any later additions or updates):

StatisticalAreas.csv: area id, area name, parent area id
Neighbourhoods.csv: area id, area name, land area, population, dwellings, businesses, median income, avg monthly rent, bounding box
PopulationStats2016.csv: area id, area name, age distribution, total persons, females, males
HealthServices.csv: id, name, category, num beds, address, …, longitude, latitude, comment
NSW Postcodes.csv: id, postcode, locality, longitude, latitude
COVID-19 Statistics: recent daily data can be accessed from data.gov.au, e.g.: https://data.gov.au/dataset/ds-nsw-5424aa3b-550d-4637-ae50-7f458ce327f4

Task 1: Data Integration and Database Generation

Build a database using
PostgreSQL that integrates data from the following sources:

1. Sydney neighbourhood dataset (based on provided CSV files with SA2 data from ABS).
2. Census data for the given neighbourhoods including population count and age distributions.
3. Health services in NSW; todo: spatial join with neighbourhoods.
4. You are encouraged to extend and refine both the scoring function and the source data. Full points require integrating at least one additional data set.

Milestone 1: Load and integrate the provided datasets into PostgreSQL by the tutorials in Week 11.

Task 2: Viral Vulnerability Analysis

1. Compute the vulnerability score for all given neighbourhoods according to the following formula and definitions (adjust as needed if you integrated any additional datasets):

$$\text{vulnerability} = S\big(z(\text{population density}) + z(\text{population age}) - z(\text{healthservice density}) - z(\text{hospitalbed density})\big)$$

with S being the logistic function (sigmoid function), and z the z-score (standard score) of a measure, i.e. the number of standard deviations from the mean (assuming a normal distribution):

$$z(\text{measure}, x) = \frac{x - \text{avg}_{\text{measure}}}{\text{stddev}_{\text{measure}}}$$

Measure                 Definition                                              Risk   Data Source
population density      population divided by neighbourhood's land area         +      Neighbourhoods.csv
population age          percentage of a neighbourhood's population aged 70+     +      PopulationStats2016.csv
healthservice density   number of health services per suburb per 1000 people    -      HealthServices.csv
hospitalbed density     number of hospital beds per suburb per 1000 people      -      HealthServices.csv

2. Store the computed measures and scores of each neighbourhood in your database. Create at least one index which is helpful for data integration or the vulnerability score computation.
3.
Determine whether there is a correlation between your viral vulnerability score and the number of COVID-19 tests or COVID-19 cases (positive tests) per neighbourhood.

Task 3: Documentation of your Viral Vulnerability Analysis

Write a document (Jupyter notebook, Word document, or PDF file, no more than 5 pages plus optional appendix) in which you document your data integration steps and the main outcomes of your vulnerability data analysis, including the correlation study with the COVID-19 statistics. Your document should contain the following:

1. Dataset Description
What are your data sources and how did you obtain and pre-process the data?
2. Database Description
Into which database schema did you integrate your data (preferably shown with a diagram)? Which index(es) did you create, and why?
3. Vulnerability Score Analysis
Show which formula you applied to compute the vulnerability score per neighbourhood, and give an overview of the vulnerability results. This can be done either in text by highlighting some representative results, or with a graphical representation on a map (preferred).
4. Correlation Analysis
How well does your score correlate with the number of COVID-19 cases in the given suburbs? Is there any correlation with the number of COVID-19 tests in the neighbourhoods?

Task 4: DATA2901 Task for Advanced Class Only

1. For teams in the advanced class, integration of at least one additional data set is compulsory.
2. One of the additional data sources must come from a web source, such as by web scraping or using a Web API, rather than just a downloadable additional CSV data set.
3. Include in the vulnerability analysis some data that was inferred using a machine learning or natural language processing step. For example, you could retrieve and count named entities from the scraped content of a website about international visitors or travel infrastructure in different neighbourhoods in Sydney, or you could try to train a neighbourhood classifier.

General Coding Requirements

1.
Solve this assignment with a Jupyter notebook in Python and SQL (Adv: also Unix).
2. Use the provided Jupyter and PostgreSQL servers from the tutorials.
3. If you use any extra libraries which are not installed in the labs, disclose in your documentation which library and what version.

Deliverables and Submission Details

There are four deliverables:
1. source code of the data integration and analysis tasks,
2. a brief report/documentation (up to 5 pages, as per the content description above), and
3. a short demo in the labs of Week 12 with the whole team present.
4. Please also provide access to your database with the schema and the processed data.

All deliverables are due in Week 12, no later than 8pm, Friday 22 May 2020. Late submission penalty: -20% of the awarded marks per day late. See also the published marking rubric in Canvas.

Please submit the source code and a soft copy of your documentation as a zip or tar file electronically in Canvas, one per group. Name your zip archive after your UniKey: abcd1234.zip

Students must retain electronic copies of their submitted assignment files and databases, as the unit coordinator may request to inspect these files before marking of an assignment is completed. If these assignment files are not made available to the unit coordinator when requested, the marking of this assignment may not proceed.

All the best!

Group member participation

This is a group assignment. The mark awarded for your assignment is conditional on you being able to explain any of your answers to your tutor or the lecturers if asked.

If members of your group do not contribute sufficiently, you should alert your tutor as soon as possible. The tutor has the discretion to scale the group's mark for each member as follows, based on the outcome of the group's demo in Week 12:

Level of contribution                                            Proportion of final grade received
No participation or no demo.                                     0%
Passive member, but full understanding of the submitted work.    50%
Minor contributor to the group's submission.                     75%
Major contributor to the group's submission.                     100%
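
Hint: the vulnerability formula from Task 2 can be sketched in plain Python as shown below. This is only a minimal illustration, not a reference solution. The function names, the use of the population standard deviation, and the three sample neighbourhoods are all assumptions made for this sketch; in the assignment the measures would come from your PostgreSQL tables.

```python
import math
import statistics


def z_scores(values):
    """Return the z-score of each value: (x - mean) / stddev."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation; the handout does not
    # say whether the sample or population variant is intended.
    stddev = statistics.pstdev(values)
    return [(x - mean) / stddev for x in values]


def sigmoid(x):
    """Logistic function S(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def vulnerability(pop_density, pop_age, health_density, bed_density):
    """Combine the four measures per neighbourhood as in Task 2."""
    zs = [z_scores(col) for col in
          (pop_density, pop_age, health_density, bed_density)]
    # Risk-increasing measures are added, health-service measures subtracted.
    return [sigmoid(zd + za - zh - zb) for zd, za, zh, zb in zip(*zs)]


# Hypothetical values for three neighbourhoods (not real data).
scores = vulnerability(
    pop_density=[1200.0, 4500.0, 800.0],
    pop_age=[8.0, 15.0, 11.0],
    health_density=[1.2, 0.4, 0.9],
    bed_density=[2.0, 0.0, 1.0],
)
print(scores)
```

Because the sigmoid maps any real number into the open interval (0, 1), the resulting scores can be compared directly across neighbourhoods.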
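
Hint: for the correlation step in Task 2, one possible starting point is the Pearson correlation coefficient, sketched below in plain Python. The handout does not prescribe a particular correlation measure, so Pearson is an assumption here (a rank-based measure may be just as appropriate), and the sample scores and case counts are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vulnerability scores and case counts per neighbourhood.
sample_scores = [0.2, 0.5, 0.7, 0.9]
sample_cases = [3, 10, 12, 20]
r = pearson(sample_scores, sample_cases)
print(r)
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none.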