CSCI-1200 Data Structures, Spring 2021
Homework 10: Performance and Big O Notation

In this final assignment for Data Structures, you will carry out a series of tests on the fundamental data structures in the Standard Template Library to first hypothesize and then measure the relative performance (running time & memory usage) of these data structures and solidify your understanding of algorithm complexity analysis using Big O Notation. The five fundamental data structures we will study are: vector, list, binary search tree (set / map), priority queue, and hash table (unordered_set / unordered_map). Be sure to read the entire handout before beginning your implementation.

Overview of Operations

We will consider the following six simple, but moderately compute-intensive, operations that are common subtasks of many interesting real-world algorithms and applications. We will test these operations using integers and/or STL strings as noted below.

• Sort – we'll use the default operator< for the specific data type.
• Remove duplicates from a sequence – without otherwise changing the overall order (keeping the first occurrence of the element).
• Determine the mode – the most frequently occurring element. If there is a tie, you may return any of the most frequently occurring elements.
• Identify the closest pair of items within the dataset – integer data only. We'll use operator- to measure distance. If there is a tie in distance, you may return any of the pairs of closest distance.
• Output the first/smallest f items – a portion of the complete sorted output.
• Determine the longest matching substring between any two elements – STL string data only. For example, if the input contains the words antelope, buffalo, and elephant, the longest substring match is ant (found within both antelope and elephant).
  If there is a tie, you may return any of the longest matching substrings.

See also the provided sample output for each of these operations.

Rules for comparison: For each operation, we will analyze the cost of a program/function that reads the input from an STL input stream object (e.g., std::cin or std::ifstream) and writes the answer to an STL output stream (e.g., std::cout or std::ofstream). The function should read through the input only once and construct and use a single instance of the specified STL data structure to compute the output. The function may not use any other data structure to help with the computation (e.g., storing data in a C-style array).

Your Initial Predictions of Complexity Analysis

Before doing any implementation or testing, think about which data structures are better, similarly good, or useless for tackling each operation.

Fill in the table on the next page with the big O notation for both the runtime and memory usage to complete each operation using that data structure. If it is not feasible/sensible to use a particular data structure to complete the specified operation, put an X in the box. Hint: In the first 3 columns there should only be 2 X's! If two or more data structures have the same big O notation for one operation, predict and rank the data structures by faster running time for large data. We combine set & map (and unordered_set & unordered_map) in this analysis, but be sure to specify which data type of the two makes the most sense for each operation.

For your answers, n is the number of elements in the input, f is the requested number of values in the output (only relevant for the first f sorted operation), and l is the maximum length of each string (only use this variable for the longest substring match operation).
Type your answers into your README.txt file. You'll also paste these answers into Submitty for autograding.

                                             sort   remove       mode   closest   first f   longest
                                                    duplicates          pair      sorted    substring match
vector
list
BST (set/map)
priority queue / binary heap
hash table (unordered_set / unordered_map)

Provided Framework

We provide a framework to implement and test these operations with each data structure and measure the runtime and overall memory usage. The input will come from a file, redirected to std::cin on the command line. Similarly, the program will write to std::cout and we can redirect that to write to a file. Some basic statistics will be printed to std::cerr to help with complexity analysis. Here are examples of how to compile and run the provided code:

    clang++ -g -Wall -Wextra performance*.cpp -o perf.out
    ./perf.out vector mode string < small_string_input.txt
    ./perf.out vector remove_duplicates string < small_string_input.txt > my_out.txt
    diff my_out.txt small_string_output_remove_duplicates.txt
    ./perf.out vector closest_pair integer < small_integer_output_remove_duplicates.txt
    ./perf.out vector first_sorted string 3 < small_string_input.txt
    ./perf.out vector longest_substring string < small_string_output_remove_duplicates.txt
    ./perf.out vector sort string < medium_string_input.txt > vec_out.txt 2> vec_stats.txt
    ./perf.out list sort string < medium_string_input.txt > list_out.txt 2> list_stats.txt
    diff vec_out.txt list_out.txt

The first example reads string input from small_string_input.txt, uses an STL vector to find the most frequently occurring value (implemented by first sorting the data), and then outputs that string (the mode) to std::cout.

The second example uses an STL vector to remove the duplicate values (without otherwise changing the order) from small_string_input.txt, storing the answer in my_out.txt, and then uses diff to compare that file to the provided answer.

The next 3 command lines show examples of how to run the closest_pair, first_sorted, and longest_substring operations.
Note that the first_sorted operation takes an additional argument, the number of elements to output from the sorted order. Also note that the closest_pair and longest_substring operations are more interesting when the input does not contain duplicate values.

The final example sorts a larger input of random strings first using an STL vector, and then using an STL list, and confirms that the answers match.

Generating Random Input

We provide a small standalone program to generate input data files with random strings. Here's how you compile and use this program to generate a file named medium_string_input.txt with 10,000 strings, each with 5 random letters (a-z), and also a file named medium_integer_input.txt with 10,000 integers, each with 3-5 digits (ranging in value from 100-99999).

    clang++ -g -Wall -Wextra generate_input.cpp -o generate_input.out
    ./generate_input.out string 10000 5 5 > medium_string_input.txt
    ./generate_input.out integer 10000 3 5 > medium_integer_input.txt

Measuring Performance

First, create and save several large randomly generated input files with different numbers of elements. Test the vector code for each operation with each of your input files. The provided code uses the clock() function to measure the processing time of the computation. The resolution and accuracy of the timing mechanism is system and hardware dependent and may be in seconds, milliseconds, or something else. Make sure you use large enough inputs so that your running time for the largest test is about a second or more (to ensure the measurement isn't just noise). Record the results in a table like this:

    Sorting random 5 letter strings using STL vector
    # of strings    vector sort operation time (sec)
    10000           0.031
    20000           0.067
    50000           0.180
    100000          0.402

As the dataset grows, does your predicted big O notation match the raw performance numbers?
We know that the running time for sorting with the STL vector sorting algorithm is O(n log2 n), and we can estimate the coefficient k in front of the dominant term from the collected numbers.

    vector sort operation time(n) = k_vector_sort * n log2 n

Thus, on the machine which produced these numbers, the coefficient k_vector_sort ≈ 2.3 x 10^-7 sec. Of course these constants will be different on different operating systems, different compilers, and different hardware! These constants will allow us to compare data structures / algorithms with the same big O notation. The STL list sorting algorithm is also O(n log2 n), but what is the estimate for k_list_sort?

Be sure to try different random string lengths because this number will impact the number of repeated/duplicate values in the input. The ratio of the number of input strings to the number of output strings is reported to std::cerr with the operation running time. Which operations are impacted by the number of repeated/duplicate values? What is the relative impact?

Operation Implementation using Different Data Structures

The provided code includes the implementation of each operation (except longest substring) for the vector datatype. Your implementation task for this assignment is to extend the program to the other data structures in the table. You should carefully consider the most efficient way (minimizing the running time) to use each data structure to complete the operation.

Focus on the first three operations from the table first (sort, remove duplicates, and mode). Once those are debugged and tested, and you've analyzed the relative performance, you can proceed to implement the other operations.

Estimate of Total Memory Usage

When you upload your code to Submitty, the autograder will measure not only the running time, but also the total memory usage. Compare the memory used by the different data structures to perform the same operation on the same input dataset.
Does the total memory usage match your understanding of the relative memory requirements for the internal representation of each data structure?

You can also run this tool on your local GNU/Linux machine (it may not work on other systems):

    clang runstats.c -o runstats.out
    ./runstats.out ./perf.out vector sort string < medium_string_input.txt > my_out.txt

Results and Discussion

For each data type and each operation, run several sufficiently large tests and collect the operation time output by the program. Organize these timing measurements in your README.txt file and estimate the coefficients for the dominant term of your Big O Notation. Do these measurements and the overall performance match your predicted Big O Notation for the data type and operation? Did you update your initial answers for the Big O Notation of any cell in the table?

Compare the relative coefficients for different data types that have the same Big O Notation for a specific operation. Do these match your intuition? Are you surprised by any of the results? Will these results impact your data structure choices for future programming projects?

Submission

You must do this assignment on your own, as described in the Collaboration Policy & Academic Integrity. If you did discuss the problem or error messages, etc. with anyone, please list their names in your README.txt file. Important Note: Do not include any large test datasets with your submission, because this may easily exceed the submission size. Instead, describe any datasets you created, citing the original source of the data as appropriate.