CISC372-Parallel Project 6

Overview:

In the last project, you implemented an image filter program using pthreads. A special case of the filter program is called a box blur. Basically, this type of filter involves setting all of the values in the matrix to 1, then dividing the result by the number of values in the matrix. For a 3×3 filter, the matrix of ones is:

    [1 1 1]
    [1 1 1]
    [1 1 1]

The issue we had in the last program is that when an image is high resolution, a 3×3 filter does very little to change the appearance of the image. We would like a bigger filter (the radius here is 1; we might want a radius of 20 or 40), but this would make the problem somewhat intractable, because the direct filter reads (2*radius+1)^2 values for every pixel.

A fast way to do this is to simply keep a running sum for each row of the last 2*radius+1 elements, then take the resultant image and do the same for each column. If we divide each of these by the width of the kernel (2*radius+1), then we end up computing exactly what the filter computes (the average around a radius), with exactly one pass through the columns and one pass through the rows. Now that each row and each column is independent, we have a hope of parallelizing this algorithm.

Project Details:

For this project, you may either work alone or in pairs. You will have until the final Friday (5/14) to complete this assignment. If you work in pairs, make sure that the header of all files that you generate contains the names of both people who worked on the project so that you both get credit. Both people should hand in the final project via Canvas. You may run this code anywhere you like (on PSC, on cisc372 using srun, or on your own machine configured for CUDA). You should hand in your final .cu file and any other files you produce.

Part 1: Fast Blur

You can retrieve my fast blur code, along with a sample image (gauss.jpg), from github at: gsilber/CISC372_HW6 (github.com)

Use the included makefile to build the program.
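Before running the provided code, it may help to see the running-sum idea from the Overview in isolation. The following is a minimal C sketch for a single row of float values, not the code from the repository. The function names are made up for illustration, and the edge handling is an assumption: samples outside the row count as 0 and every output is still divided by the full kernel width, which may differ from what fastblur.c does.

```c
#include <assert.h>
#include <stddef.h>

/* Box-blur one row of n values with a running sum.
   Assumption: samples outside the row contribute 0, and every output is
   divided by the full kernel width 2*radius+1. */
void blur_row_running_sum(const float *src, float *dst, size_t n, size_t radius)
{
    assert(src != dst); /* the sliding window still needs the original values */
    float width = (float)(2 * radius + 1);
    float sum = 0.0f;

    /* Prime the window for output 0: it covers indices 0..radius. */
    for (size_t i = 0; i <= radius && i < n; i++)
        sum += src[i];

    for (size_t i = 0; i < n; i++) {
        dst[i] = sum / width;
        /* Slide the window one step: add the sample entering on the right,
           then drop the sample leaving on the left. */
        if (i + radius + 1 < n)
            sum += src[i + radius + 1];
        if (i >= radius)
            sum -= src[i - radius];
    }
}

/* Reference version: re-add the whole window for every output.
   Same edge assumption as above, but O(n * radius) instead of O(n). */
void blur_row_naive(const float *src, float *dst, size_t n, size_t radius)
{
    float width = (float)(2 * radius + 1);
    for (size_t i = 0; i < n; i++) {
        size_t lo = (i >= radius) ? i - radius : 0;
        size_t hi = (i + radius < n) ? i + radius : n - 1;
        float sum = 0.0f;
        for (size_t j = lo; j <= hi; j++)
            sum += src[j];
        dst[i] = sum / width;
    }
}
```

For example, with src = {1, 2, 3, 4, 5} and radius 1, both functions produce {1, 2, 3, 4, 3}. Running the same pass over each column of the row-blurred result gives the two-dimensional box blur.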
You can run the program as is by executing

    ./fastblur gauss.jpg 40

where 40 is the desired radius (this is a big image). You can play with different values of radius to see how it behaves. The effect of a given radius depends on the image resolution: on different resolutions, the same radius is a different percentage of the entire image, and thus has a different blurring effect.

Part 2: Simple CUDA

In this part of the project, you should modify the fastblur.c file to create cudablur.cu (CUDA code must have a .cu extension to work). You will need to change the makefile to use nvcc instead of gcc to compile for CUDA.

Rewrite the program so that each column runs in its own thread. I suggest a thread block size of 256. This means turning the computeColumn function into a kernel and figuring out the col parameter from the threadIdx, blockIdx, and blockDim variables.

Then you must sync up the threads with a call to cudaDeviceSynchronize and repeat the process for each row. Finally, convert back to a uint8_t array and save the image.

I suggest for this part you use cudaMallocManaged and cudaFree for all the arrays to simplify the code.

If you have a block size of 256, then you would have a block count of (width+255)/256 for the columns. Make sure to check in your kernel function for unused threads, where the computed column is >= pWidth. Do the same for the rows, with a block count of (height+255)/256, and check the computed row against height. If the height or width is not divisible by the block size, then we will have some extra threads that need to just return immediately.

Part 3: More advanced CUDA

Part 2 is kind of slow. This is because of the managed memory. To speed it up, we want to allocate the memory we need on the device where possible with cudaMalloc and move the data up to the device with cudaMemcpy for calculation. Then, when complete, copy that memory back to the host in order to save it to the output file.
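Once the block size becomes a tunable, the fixed (width+255)/256 from Part 2 generalizes to (n + blockSize - 1) / blockSize. The following host-only C sketch (no CUDA required) simulates the index each thread would compute from blockIdx.x * blockDim.x + threadIdx.x and applies the same bounds guard, so the arithmetic can be checked for any block size before it goes into a kernel. The names block_count and simulate_launch are made up for illustration and are not part of the provided code.

```c
#include <assert.h>

/* Number of blocks needed to cover n items: the ceiling of n / block_size. */
int block_count(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* Host-side stand-in for a one-dimensional kernel launch.
   Each (block, thread) pair computes the index a CUDA thread would get from
   blockIdx.x * blockDim.x + threadIdx.x and skips the work when that index
   is >= n. visits must hold n zero-initialized counters; visits[i] ends up
   as the number of threads that processed item i. Returns the number of
   threads that hit the guard and did nothing. */
int simulate_launch(int n, int block_size, int *visits)
{
    int idle = 0;
    int blocks = block_count(n, block_size);
    for (int block = 0; block < blocks; block++) {
        for (int thread = 0; thread < block_size; thread++) {
            int idx = block * block_size + thread;
            if (idx >= n) {
                idle++; /* a real kernel would simply return here */
                continue;
            }
            visits[idx]++;
        }
    }
    return idle;
}
```

With n = 1000 and a block size of 256, this gives 4 blocks and 24 idle threads, and every item is visited exactly once. When n is an exact multiple of the block size, no thread is idle.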
Play with the values for the block size to try to maximize performance. See how fast you can get the computation to run.

What to hand in:

Hand in your cudablur.cu file from Part 2 and from Part 3, along with makefiles for each and any other files you added which are required to build your program. Make sure your program compiles and runs, and note the system where you ran it in the comments to avoid any confusion.

Grading

This is a hard project. My intent is that most people will be able to do Part 2, so Part 2 is worth 75% of the grade on this project. Part 3 is worth the remaining 25%.
