Oregon State University    Assignment 4    CS 540, Winter 2021

The assignment is to be turned in before midnight (by 11:59 pm) on March 2nd. Turn in the solutions to the written part of this assignment (Questions 1 and 2) as a PDF file through Canvas. These solutions must be produced using editing software, such as LaTeX or Word; otherwise they will not be graded. Turn in the source code for each programming question (Questions 3 and 4) separately through Canvas. Thus, each group will have three distinct submissions in Canvas for this assignment. The assignment should be done in groups of two students.

1: Query Processing Algorithms (1.5 points)

Consider the natural join of the relations R(A,B) and S(A,C) on attribute A. Neither relation has any indexes built on it. Assume that R and S have 80,000 and 20,000 blocks, respectively. The cost of a join is the number of block I/O accesses it performs. If the algorithms need to sort the relations, they must use two-pass multi-way merge sort. You may choose the join algorithms in your answers from the ones taught in the class.

(a) Assume that there are 10 blocks available in main memory. What is the fastest join algorithm for computing the join of R and S? What is the cost of this algorithm? (0.5 point)

(b) Assume that there are 350 blocks available in main memory. What is the fastest join algorithm for computing the join of R and S? What is the cost of this algorithm? (0.5 point)

(c) Assume that there are 200 blocks available in main memory. What is the fastest join algorithm for computing the join of R and S? What is the cost of this algorithm? (0.5 point)

2: Query Processing (2 points)

(a) Assume that the entire relation R(A,B) fits in the available main memory but relation S(A,C) is too large to fit in main memory. Find a fast join algorithm, i.e., an algorithm with the lowest number of I/O accesses, for the natural join of R and S.
Justify that your proposed algorithm is the fastest possible join algorithm for computing the natural join of R and S. Next, assume that there is a clustered index on attribute A of relation S. Explain whether, and how, this changes your answer. (1 point)

(b) Consider relations R(A,B) and S(A,C) that each have 1 million tuples and are too large to fit in main memory. A data scientist wants to compute a sample of 10,000 tuples of the natural join of R and S very fast. Since it is too time-consuming to compute the full natural join of R and S, the data scientist selects 1% of relation R and 1% of relation S and computes their join. Explain whether this algorithm returns the desired result. If it does not, propose an efficient algorithm that returns the desired result without computing the full natural join of R and S. (1 point)

3: Sort-Merge Join Algorithms (5.5 points)

(a) Consider the following relations:

Dept (did (integer), dname (string), budget (double), managerid (integer))
Emp (eid (integer), ename (string), age (integer), salary (double))

Fields of types integer, double, and string occupy 4, 8, and 40 bytes, respectively. Each block can fit at most one tuple of an input relation. There are at most 22 blocks available to the join algorithm in main memory. Implement the optimized sort-merge join algorithm for Dept ⋈ Emp with the join condition Dept.managerid = Emp.eid in C++.

- Each input relation is stored in a separate CSV file, i.e., each tuple is on a separate line and the fields of each record are separated by commas.
- The result of the join must be stored in a new CSV file. The files that store relations Dept and Emp are Dept.csv and Emp.csv, respectively.
- Your program must assume that the input files are in the current working directory, i.e., the one from which your program is running.
- The program must store the result in a new CSV file named join.csv in the current working directory.
- Your program must run on hadoop-master.engr.oregonstate.edu. Submissions should also include the g++ command (including arguments) that was used to compile the program.
- Each student has an account on the hadoop-master.engr.oregonstate.edu server, which is a Linux machine. You can use the following bash command to connect to it:
    ssh your_onid_username@hadoop-master.engr.oregonstate.edu
  It will prompt you for your ONID password. You must be connected to the VPN in order to access the server.
- You must name the file that contains the source code of the main() function main3.cpp. If you place your source code in multiple files, you may submit all of them in a single zip file.
- You may use the following commands to compile and run C++ code:
    g++ main3.cpp -o main3.out
    ./main3.out

4: External Memory Sorting (6 points)

(a) Consider the following relation:

Emp (eid (integer), ename (string), age (integer), salary (double))

Fields of types integer, double, and string occupy 4, 8, and 40 bytes, respectively. Each block can fit at most one tuple of an input relation. There are at most 22 blocks available to the sort algorithm in main memory. Implement multi-pass multi-way sorting for the relation Emp in C++. The relation should be sorted by eid.

- The input relation is stored in a CSV file, i.e., each tuple is on a separate line and the fields of each record are separated by commas.
- The result of the sort must be stored in a new CSV file. The file that stores the relation Emp is Emp.csv.
- Your program must assume that the input file is in the current working directory, i.e., the one from which your program is running.
- The program must store the result in a new CSV file named EmpSorted.csv in the current working directory.
- Your program must run on hadoop-master.engr.oregonstate.edu.
  Submissions should also include the g++ command (including arguments) that was used to compile the program.
- Each student has an account on the hadoop-master.engr.oregonstate.edu server, which is a Linux machine. You can use the following bash command to connect to it:
    ssh your_onid_username@hadoop-master.engr.oregonstate.edu
  It will prompt you for your ONID password. You must be connected to the VPN in order to access the server.
- You must name the file that contains the source code of the main() function main4.cpp. If you place your source code in multiple files, you may submit all of them in a single zip file.
- You may use the following commands to compile and run C++ code:
    g++ main4.cpp -o main4.out
    ./main4.out