
COMP SCI 4094/4194/7094 – Distributed Databases and Data Mining
Assignment 2

Important Notes

- Handins:
  - You must do this assignment individually and make individual submissions.
  - Your program should be coded in C++ and pass test runs on 3 test files. The sample input and output files are downloadable from Assignments on the course home page (https://myuni.adelaide.edu.au/courses/54718/assignments/176864/).
  - You need to use svn to upload and run your source code in the web submission system, following the Web-submission instructions at the end of this sheet. You should attach your name and student number in your submission.
  - Late submissions will attract a penalty: the maximum mark you can obtain will be reduced by 25% per day (or part thereof) past the due date or any extension you are granted.
- Marking scheme:
  - 12 marks for testing on 3 standard tests: 4 marks per test.
  - 3 marks for the code structure.
  - Note: If it is found that your code did not implement the required computation tasks in this assignment, you will receive zero marks regardless of the correctness of the testing output.

If you have any questions, please send them to the student discussion forum. This way you can all help each other and everyone gets to see the answers.

The assignment

In this assignment you are required to code a traffic packet clustering engine that clusters raw network packets into different applications, such as https and smtp. To accomplish this, a data preprocessing module and a clustering module should be implemented.

[Figure: structure of the engine, a data preprocessing module followed by a clustering module]

You have two input files, and you should print two output files.

Input file 1 contains a distance threshold and the raw network packet information, that is, seven attributes of each packet: source address, source port, destination address, destination port, protocol, arrival time, and packet length.
Input file1.txt holds the sample traffic flow information. Input file2.txt holds a number K and, on the next line, K integers that give the initial set of K medoids.

In the data preprocessing module, your program should prepare the flow data for clustering from the raw packet data. Two steps are involved:

1. Merge the packets into flows by the rule: a network flow includes at least TWO packets with the same source address, source port, destination address, destination port, and protocol.
2. Calculate two clustering features for each flow: the average transferring time and the average packet length.

In the clustering module, you need to apply the k-medoids algorithm (course slides Chapter 10, not the book's random method) to find the minimum number of clusters for which the sum of the distances of each flow to its medoid is less than the given threshold. Note: the clustering features come from the data preprocessing module, and the distance measure is Manhattan distance.

For your convenience, the framework of the k-medoids algorithm, which you should follow, is given in a figure.

[Figure: framework of the k-medoids algorithm]

Example

Sample traffic flow information

src addr         src port  dst addr       dst port  protocol  arrival time  packet length
202.234.224.254  49880     31.65.181.210  80        6         115258        52
202.234.224.254  49880     31.65.181.210  80        6         115307        52
202.234.35.144   55256     74.39.124.220  443       6         115310        46
119.188.179.82   50592     150.79.7.129   80        6         115314        40
202.234.224.254  49880     31.65.181.210  80        6         115341        52
119.188.179.82   50592     150.79.7.129   80        6         115350        40
119.188.179.82   50592     150.79.7.129   80        6         115363        40

Data preprocessing module

In the above traffic flow information there are two flows: the first, second, and fifth packets belong to the first flow (index 0); the fourth, sixth, and seventh packets belong to the second flow (index 1). The third packet matches no other packet, so it does not form a flow.

Average transferring time of the first flow
  = ((arrival time of fifth packet - arrival time of second packet) + (arrival time of second packet - arrival time of first packet)) / (3 - 1)
  = ((115341 - 115307) + (115307 - 115258)) / 2
  = 41.5

Average length of the first flow
  = (Σ packet length) / 3
  = (52 + 52 + 52) / 3
  = 52

Similarly, the average transferring time of the second flow = 24.5 and the average length of the second flow = 40. (Arrival times are in microseconds.)

Clustering module

We use Manhattan distance to measure the distance between flows. In our sample, the distance between the two flows is |41.5 - 24.5| + |52 - 40| = 17 + 12 = 29.

Example input: initial medoids.txt (initial k medoids)

1    (k = 1)
0    (start from index 0 as the initial medoid)

Example output

First, output the flows produced by the data preprocessing module. Each line holds the flow index, the average transferring time (x value), and the average length (y value):

ID X Y

In this case, flow.txt should contain:

0 41.50 52.00
1 24.50 40.00

Round the numbers (X, Y) to 2 decimal places. You can use:

  cout << fixed << setprecision(2) << 3.1415926;

or

  printf("%0.2f", 3.1415926);

After running k-medoids you will get K clusters. The output has K+2 lines. The first line is the absolute-error criterion. The next line lists the K medoid indices. Each of the following lines lists the flow indices that belong to one medoid's cluster.

29.00    (absolute error of the clustering, 2 decimal places)
0        (medoid is 0)
0 1      (this cluster includes 2 flows, index 0 and index 1)

Web-submission instructions

- First, type the following command, all on one line (replacing xxxxxxx with your student ID):
    svn mkdir --parents -m "DDDM" https://version-control.adelaide.edu.au/svn/axxxxxxx/2020/s2/dddm/assignment2
- Then, check out this directory and add your files:
    svn co https://version-control.adelaide.edu.au/svn/axxxxxxx/2020/s2/dddm/assignment2
    cd assignment2
    svn add KMedoids.cpp
    svn commit -m "assignment2 solution"
- Next, go to the web submission system at: https://cs.adelaide.edu.au/services/websubmission/
  Navigate to 2020, Semester 2, Distributed Databases and Data Mining, Assignment 2. Then click the tab "Make Submission" for this assignment and indicate that you agree to the declaration. The automark script will then check whether your code compiles.
  You can make as many resubmissions as you like. If your final solution does not compile, you won't get any marks for this solution.

Note:

1. Please follow the formats in the sample output files.
2. Your local file path will not work with our web-submission system.
3. We prepared ten test files in the web-submission system; when you submit your program, random test files will be allocated to you.
4. The auto-marker script compiles and runs KMedoids.cpp using the following commands:
     g++ -std=c++11 KMedoids.cpp -o runKMedoids
     ./runKMedoids network packets.txt initial medoids.txt

In this assignment, you need to read two files, network packets.txt (network packet traffic information) and initial medoids.txt (initial medoids), which are generated randomly by the system. You should print two output files named med Flow.txt (flow data after preprocessing) and KMedoidsClusters.txt (k-medoids clustering results), as shown in the following two samples.

Example 1

input: File1.txt

src addr         src port  dst addr       dst port  protocol  arrival time  packet length
202.234.224.254  49880     31.65.181.210  80        6         115258        52
202.234.224.254  49880     31.65.181.210  80        6         115307        52
202.234.35.144   55256     74.39.124.220  443       6         115310        46
119.188.179.82   50592     150.79.7.129   80        6         115314        40
202.234.224.254  49880     31.65.181.210  80        6         115341        52
119.188.179.82   50592     150.79.7.129   80        6         115350        40
119.188.179.82   50592     150.79.7.129   80        6         115363        40

input: File2.txt
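The preprocessing rule and the distance measure described in this sheet can be sketched in C++ as follows. This is only an illustrative sketch, not the required solution. The names `Packet`, `Flow`, `build_flows`, `manhattan`, and `absolute_error` are hypothetical. It assumes that packets are listed in arrival order and that flows are indexed by the first appearance of their first packet, which agrees with the sample but is not stated in the sheet. The medoid swap search from the course slides is not reproduced here; `absolute_error` only evaluates a given set of medoids.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// One raw packet: the seven attributes listed in the sheet.
struct Packet {
    std::string src_addr;
    int src_port;
    std::string dst_addr;
    int dst_port;
    int protocol;
    long arrival_time;  // microseconds
    int length;
};

// One flow after preprocessing: x = average transferring time, y = average length.
struct Flow {
    double x;
    double y;
};

// Packets with the same five-field key belong to the same flow.
using FlowKey = std::tuple<std::string, int, std::string, int, int>;

// Group packets by key, drop groups with fewer than two packets, and compute
// the two features. Assumes packets are already sorted by arrival time.
std::vector<Flow> build_flows(const std::vector<Packet>& packets) {
    std::map<FlowKey, std::size_t> slot;             // key -> position in groups
    std::vector<std::vector<const Packet*>> groups;  // kept in first-appearance order
    for (const Packet& p : packets) {
        FlowKey key{p.src_addr, p.src_port, p.dst_addr, p.dst_port, p.protocol};
        auto it = slot.find(key);
        if (it == slot.end()) {
            it = slot.emplace(key, groups.size()).first;
            groups.emplace_back();
        }
        groups[it->second].push_back(&p);
    }
    std::vector<Flow> flows;
    for (const auto& g : groups) {
        if (g.size() < 2) continue;  // a flow needs at least TWO packets
        double gap_sum = 0.0;
        double length_sum = g[0]->length;
        for (std::size_t i = 1; i < g.size(); ++i) {
            gap_sum += static_cast<double>(g[i]->arrival_time - g[i - 1]->arrival_time);
            length_sum += g[i]->length;
        }
        flows.push_back({gap_sum / (g.size() - 1), length_sum / g.size()});
    }
    return flows;
}

// Manhattan distance between two flows in the (x, y) feature space.
double manhattan(const Flow& a, const Flow& b) {
    return std::fabs(a.x - b.x) + std::fabs(a.y - b.y);
}

// Absolute-error criterion: sum over all flows of the distance to the nearest medoid.
double absolute_error(const std::vector<Flow>& flows,
                      const std::vector<std::size_t>& medoids) {
    assert(!medoids.empty());
    double total = 0.0;
    for (const Flow& f : flows) {
        double best = manhattan(f, flows[medoids[0]]);
        for (std::size_t m : medoids) best = std::min(best, manhattan(f, flows[m]));
        total += best;
    }
    return total;
}
```

Applied to the seven sample packets, `build_flows` gives two flows with features (41.5, 52) and (24.5, 40), and `absolute_error` with the single medoid 0 gives 29, matching the values worked out in the example.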
