COMP9313 (20T2) ASSIGNMENT 1

Q1. HDFS (30 Marks)

Let N be the number of DataNodes and R be the total number of blocks in the DataNodes. Assume the replication factor is 5, and k out of N DataNodes have failed simultaneously.

1. Write down the formula of $L_i(k, N)$ for $i \in \{1, \ldots, 5\}$, where $L_i(k, N)$ is the number of blocks that have lost i replicas.
2. Let N = 500, R = 20,000,000, and k = 200. Compute the number of blocks that cannot be recovered under this scenario. You need to show both the steps and the final result to get full credit.

Q2. Spark (35 Marks)

Consider the following PySpark code snippet:

    raw_data = [("Joseph", "Maths", 83), ("Joseph", "Physics", 74),
                ("Joseph", "Chemistry", 91), ("Joseph", "Biology", 82),
                ("Jimmy", "Maths", 69), ("Jimmy", "Physics", 62),
                ("Jimmy", "Chemistry", 97), ("Jimmy", "Biology", 80),
                ("Tina", "Maths", 78), ("Tina", "Physics", 73),
                ("Tina", "Chemistry", 68), ("Tina", "Biology", 87),
                ("Thomas", "Maths", 87), ("Thomas", "Physics", 93),
                ("Thomas", "Chemistry", 91), ("Thomas", "Biology", 74)]
    rdd_1 = sc.parallelize(raw_data)
    rdd_2 = rdd_1.map(lambda x: (x[0], x[2]))
    rdd_3 = rdd_2.reduceByKey(lambda x, y: max(x, y))
    rdd_4 = rdd_2.reduceByKey(lambda x, y: min(x, y))
    rdd_5 = rdd_3.join(rdd_4)
    rdd_6 = rdd_5.map(lambda x: (x[0], x[1][0] + x[1][1]))
    rdd_6.collect()

1. Write down the expected output of the above code snippet.
2. List all the stages in the above code snippet.
3. What makes the above implementation inefficient? How would you modify the code and improve the performance?

Q3. LSH (35 Marks)

Consider a database of N = 1,000,000 images. Each image in the database is pre-processed and represented as a vector $o \in \mathbb{R}^d$. When a new image comes as a query, it is also processed to form a vector $q \in \mathbb{R}^d$. We now want to check if there are any duplicates or near duplicates of q in the database. Specifically, an image o is a near duplicate to q if $\cos(\theta(o, q)) \geq 0.9$. We want to find any near duplicate with probability no less than 99%.

We now design an LSH scheme using SimHash to generate candidate near duplicates. Assume that for query q, there are 100 images that are near duplicates of q.

1. Assume k = 5. How many tables does the LSH scheme require (i.e., L) to ensure that we can find any near duplicate with probability no less than 99%?
2. Consider an image o with $\cos(\theta(o, q)) \leq 0.8$, k = 5 and L = 10. What is the maximum value of the probability of o becoming a false positive of query q?

You need to show the intermediate steps along with the final result to get full credit.

Submission

Please write down your answers in a file named ass1.pdf. You must write down your name and student ID on the first page. You should typeset your answers in LaTeX or MS Word. We do not accept handwritten answers.

You can submit your file using the command:

    give cs9313 ass1 ass1.pdf

Late Penalty. We DO NOT ACCEPT LATE SUBMISSIONS (0 marks if you do not submit on time).
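For Q1, one way to explore the quantities numerically is a small helper built on a hypergeometric count. This is only a sketch under stated assumptions that the assignment does not spell out: every block has its 5 replicas on 5 distinct DataNodes chosen uniformly at random, and the result is an expected count. The function name and the `distinct_blocks` parameter are hypothetical; whether R counts replicas or distinct blocks has to be decided from the course material before plugging numbers in.

```python
from math import comb


def expected_blocks_losing(i, k, n, distinct_blocks, replication=5):
    """Expected number of blocks that lose exactly i replicas when k of n nodes fail.

    Assumption (not from the assignment text): each block's replicas live on
    `replication` distinct nodes picked uniformly at random, so the number of
    lost replicas per block follows a hypergeometric distribution.
    """
    if not 0 <= i <= replication:
        raise ValueError("i must be between 0 and the replication factor")
    if not 0 <= k <= n or n < replication:
        raise ValueError("need 0 <= k <= n and n >= replication")
    # Ways to place i replicas on failed nodes and the rest on healthy nodes,
    # divided by all ways to place the replicas.
    p = comb(k, i) * comb(n - k, replication - i) / comb(n, replication)
    return distinct_blocks * p
```

Under this model, a block is unrecoverable only when it loses all of its replicas, which corresponds to calling the helper with `i` equal to the replication factor.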
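For Q2, the snippet reduces the same pairs twice and then joins the two results. One possible alternative is to carry a (max, min) pair per key through a single reduction. The sketch below shows that idea in plain Python so it runs without a Spark installation; `reduce_by_key` is a hypothetical local stand-in for `RDD.reduceByKey`, not part of PySpark.

```python
def reduce_by_key(pairs, func):
    """Local stand-in for RDD.reduceByKey: fold the values of each key with func."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


raw_data = [("Joseph", "Maths", 83), ("Joseph", "Physics", 74),
            ("Joseph", "Chemistry", 91), ("Joseph", "Biology", 82),
            ("Jimmy", "Maths", 69), ("Jimmy", "Physics", 62),
            ("Jimmy", "Chemistry", 97), ("Jimmy", "Biology", 80),
            ("Tina", "Maths", 78), ("Tina", "Physics", 73),
            ("Tina", "Chemistry", 68), ("Tina", "Biology", 87),
            ("Thomas", "Maths", 87), ("Thomas", "Physics", 93),
            ("Thomas", "Chemistry", 91), ("Thomas", "Biology", 74)]

# Seed every record with (score, score) so one reduction can track max and min together.
pairs = [(name, (score, score)) for name, _subject, score in raw_data]
max_min = reduce_by_key(pairs, lambda a, b: (max(a[0], b[0]), min(a[1], b[1])))

# Sum the per-student maximum and minimum, as rdd_6 does in the original snippet.
result = sorted((name, high + low) for name, (high, low) in max_min.items())
```

In PySpark the same shape would be `rdd_1.map(lambda x: (x[0], (x[2], x[2])))` followed by one `reduceByKey` with the same lambda and a final `map`, which replaces the two reductions and the join with a single reduction.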
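For Q3, the standard SimHash analysis can be sketched as follows: a single random-hyperplane bit agrees for o and q with probability 1 - θ/π, a table of k concatenated bits collides with that probability raised to the power k, and with L independent tables o becomes a candidate with probability 1 - (1 - p^k)^L. The function names below are hypothetical, and the code handles one object at a time; how the "100 near duplicates" in the question enter the argument is left to the assignment's intended interpretation.

```python
import math


def simhash_collision_prob(cos_sim):
    """Probability that one SimHash bit agrees for two vectors with the given cosine."""
    theta = math.acos(max(-1.0, min(1.0, cos_sim)))  # clamp against rounding error
    return 1.0 - theta / math.pi


def candidate_prob(cos_sim, k, num_tables):
    """Probability of colliding with the query in at least one of num_tables tables."""
    p = simhash_collision_prob(cos_sim)
    return 1.0 - (1.0 - p ** k) ** num_tables


def min_tables(cos_sim, k, target):
    """Smallest number of tables whose candidate probability reaches target."""
    p_k = simhash_collision_prob(cos_sim) ** k
    if p_k >= 1.0:
        return 1
    if p_k <= 0.0:
        raise ValueError("collision probability is zero; no number of tables suffices")
    return max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p_k)))
```

Because the collision probability increases with the cosine, the worst case for a near duplicate is the threshold itself, and the worst case for a non-duplicate is the largest cosine it is allowed to have. A call such as `min_tables(0.9, 5, 0.99)` or `candidate_prob(0.8, 5, 10)` evaluates the formulas at those boundary values.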