辅导KIT308/408程序、 写作Programming程序、c/c++程序

” 辅导KIT308/408程序、 写作Programming程序、c/c++程序KIT308/408 Multicore Architecture and ProgrammingAssignment 2 SIMDAims of the assignmentThe purpose of this assignment is to give you experience at writing a program using SIMD programming techniques. This assignmentwill give you an opportunity to demonstrate your understanding of:the where there is one there is many approach (WTIOTIM);structures of arrays;the use of SIMD for calculation; andthe translation of conditional Statements into SIMD.Due Date11:55pm Friday 25th of September (Week 10 of semester)Late assignments will only be accepted in exceptional circumstances and provided that the proper procedures have beenfollowed (see the School Office or this link for details) assignments which are submitted late without good reason will besubject to mark penalties if they are accepted at all (see the School office or this link for details on this as well).Forms to request extensions of time to submit assignments are available from the Discipline of ICT office. Requests must beaccompanied by suitable documentation and should be submitted before the assignment due date.Assignment SubmissionYour assignment is to be submitted electronically via MyLO and should contain:An assignment cover sheet;A .zip (or .rar) containing a Visual Studio Solution containing a project for each attempted stage of the assignment (in theformat provided in the downloadable materials provided below).A document containing:A table of timing information comparing the original single-threaded code (from assignment 1), the provided base codetimes, and the Stage 1, 2, 3, 4, and 5 SIMD implementation on each scene file (all running with the maximum threadsnatively supported by your CPU).An analysis of the above timing data. 辅导KIT308/408作业、 写作Programming作业、You do not need to (and shouldnt) submit executables, temporary object files, or images. In particular, you mustdelete the .vs diretory before submission as it just Visual Studio temporary files and 100s of MBs. Do not however delete theScenes folder or the Outputs folder (but do delete the images within this one).Task/Topic Marks1. Boxes Where there is One there is Many (WTIOTIM) 10%Correct implementation of WTIOTIM approach to isBoxIntersected function to calculate distance to nearest object and use in the objectIntersection function (in Intersection.cpp)5%Correct implementation of WTIOTIM approach to create a short-circuiting isBoxIntersected function to return (assoon as possible) whether Any box intersects and use in the isInShadow function (in Lighting.cpp)5%2. Boxes Structures of Arrays 15%Correct declarations of data structures for SoA SIMD form of Box container data 5%Correct code to convert AoS container to dynamically declared SoA structures (conversion should happen after thescene has been loaded)5%Correct rewriting of isBoxIntersected functions and the calls to it to use SoA code 5%3. Boxes SIMD Conversion of Intersection Test 35%A. SIMD Conversion of Short-circuiting isBoxIntersected function 25%Correct and efficient (e.g. no use of if-statements, for-loops, or scalar code) intersection point (and exit point)calculation (i.e. tmin and tmax)5%Correct and efficient (e.g. no use of if-statements) handling of cases where no intersection occurs due to pointbeing outside of box (i.e. first if-statement)5%Correct and efficient (e.g. no use of if-statements) handling of cases where intersection does occur (i.e. second ifstatement)5%Correct declaration of loop constants before loop 5%Correct handling of box list length not being divisible by 8 5%B. SIMD Conversion of non-short-circuiting isBoxIntersected function 10%Correct and efficient (e.g. no use of if-statements) SIMD calculation of closest intersection (i.e. t) 5%Correct and efficient (e.g. no use of if-statements) SIMD calculation of corresponding box index (i.e. index) 5%4. SIMD Lighting 20%(NOTE: this section will require SoA declarations and conversion from AoS for Light container data)Correct SIMD conversion of applyLighting function 5%Correct SIMD conversion of applySpecular function (i.e. to make it handles 8 lights) 5%Correct and efficient SIMD conversion (e.g. no use of switch-statements) of applyDiffuse function withapplyCheckboard, applyCircles, and applyWood functions manually inlined (i.e. to make it handle 8 lights)5%Splitting of applyDiffuse function into two functions, one that can be done before the loop in applyLighting, andone with the part of the calculation that must be done within the loop5%5. SIMD Render Loop and Ray Tracing 10%Correct SIMD conversion of renderSection function (including call to traceRay function and output of pixels) 5%Correct SIMD conversion of traceRay, calculateIntersectionResponse, calculateReflection, andcalculateRefractionOR a well reasoned explanation Of why this conversion is not worth completing (in terms of likely efficiencyincrease)5%Documentation 10%Outputs showing timing information for the base assignment code and Stages 15 (with the maximum threadsnatively supported by your CPU) on all applicable scene files5%Analysis of data (comparisons across the base code and Stages 15) 5%PenaltiesFailure to comply with submission instructions (eg. no cover sheet, incorrect submission of files, abnormalsolution/project structure, etc.)-10%Poor programming style (eg. insufficient / poor comments, poor variable names, poor indenting, obfuscated codewithout documentation, compiler warnings, etc.)up to -20%Lateness (-20% for up to 24 hours, -50% for up to 7 days, -100% after 7 days) up to -100%MarkingThis assignment will be marked out of 100. NOTE: if your code for a particular stage does not correctly produce the expected outputimages, you will only be able to receive a maximum of half Marks for that stage see below for more details.The following is the breakdown of marks:Marking and Correct ImagesAs SIMD code is very very difficult to debug as grows exponentially harder as the amount of it grows there will be limitedopportunity in the marking process to determine where exactly mistakes have been made, and even less chance of being able toprovide fixes to those mistakes.As a result, if your code for a stage does not Produce the expected image, then you will only be eligible for half marksfor that stage of the assignment.In order to work within this constraint, you should attempt translations into SIMD in very small steps (i.e. a single line at a time, oreven a partial line at a time). The lectures will demonstrate this approach to SIMD translation. You will likely receive more marks fora partially complete SIMD translation than a fully complete translation that doesnt work 100%.Stage 4 and Stage 5As these stages may require the use of less accurate approximations to some mathematical functions, the concept of correct imagewill be somewhat relaxed from having to be an exact match as specified by ImageMagick an acceptable visual match will likely beenough.Programming StyleThis assignment is not focussed on programming style (although it is concerned with efficient code), but you should endeavour tofollow good programming practices. You should, for example:comment your code;use sensible variables names;use correct and consistent indenting; andinternally document (with comments) any notable design decisions.[NOTE: any examples in the provided assignment materials that dont live up to the above criteria, should be considered to bedeliberate examples of what not to do and are provided to aid your learning ;P]The Assignment TaskYou are to modify the (square-based) multithreaded implementation of a simple raytracer from the first assignment to takeadvantage of SIMD instructions. This Rrequires changes across multiple files.To aid you in this conversion, the sphere/ray intersection code has already been translated into SIMD code (by following thetechniques required to do stages 13 of the assignment).From the provided (square-based) multithreaded raytracer implementation, for Stages 13 of the assignment you will createmultiple subsequent versions that modify the box implementation as follows:1. Rewritten functions for the box/ray intersection tests in a where-there-is-one-there-is-many approach.2. Translation of array-of-structures (AoS) with structures-of-arrays (SoA) for the box container.3. Optimisation of the box/ray intersection test to take advantage of SIMD.For Stage 4, you will follow a similar pattern, but for lights and the lighting calculations.For Stage 5, you will attempt to rewrite the main rendering loop and subsequent functions.Implementation SIMD Boxes (Stages 13)The following section describes in detail the steps required to complete Stages 13 of the assignment. If you complete this step, onlythen should you attempt Stage 4 and Do similar steps for lights. If you complete this step, only then should you attempt Stage 5.1. Where there is One there is ManyThis stage involves replacing the isBoxIntersected function with two new versions that use the where there is one there is manyparadigm.In order to complete this step you will need to:Write two functions that take the entire box container (or at least a pointer/reference) as an argument and will performwhatever action took place at the calling site. There are two versions as the isInShadow function (in Lighting.cpp) uses ashort-circuited approach, exiting as soon as any box collision is discovered, and the objectIntersection function (inIntersection.cpp) finds the closest box intersected with (which means it must examine them all).A version of isBoxIntersected that returns a boolean depending on whether or not a box intersects with the ray (i.e.stopping as soon as one is found).A version of isBoxIntersected that returns void and updates its t parameter to be the closest box that intersects withthe ray.2. Structures of ArraysThis stage involves modifying the Scene struct (Scene.h) to store a duplicate of the box data via structure of arrays rather than thecurrently existing array of structures (currently in the boxContainer variable).In order to complete this step you will need to:Create a SoA copy of the data that is stored in boxContainer in the Scene struct.NOTE: this means the program has two copies of the box data, the original one in AoS form and a second one in SoAform. As this assignment predominately only requires changes to isBoxIntersected (and the sites of the calls to thisfunction) the rest of the code will still use the original AoS version of the data.Fill this SoA copy of the box data after the Scene struct has been loaded (dynamically allocating memory for the SoArepresentation).Rewrite the isBoxIntersected function to make use of the SoA.At times this code update may require conversions to and/or from Vector structs to the equivalently stored data in the SoA.At the end of this stage the Program should still produce the same results as the base code. Be thorough in your testing (i.e. test allthe scenes) to ensure that everything works correctly before progressing.3A. SIMD Conversion of isBoxIntersected functionThis stage involves converting the first of the two new version of the isBoxIntersected function from Stage 2 (the bool returningone) into a SIMD implementation.In order to complete this step you will need to:Convert many of the input parameters (or parts of them) into SIMD-ready values (e.g. the ray starting location and directionneed to be converted into a SIMD format).Step through the SoA in chunks proportional to the number of value stored in each SIMD variable (i.e. 8).Convert the calculation to use SIMD. Care must be taken to correctly deal with the various conditional expressions in the loop.There are two if-statements that must be converted to SIMD code via the use of appropriate masking statements.Ensure your approach deals with the situation where the box count isnt equally divisible by the number in each SIMDcalculation (e.g. 9 boxes with 8 in each SIMD value).3B. SIMD Conversion of isBoxIntersected functionThis stage involves converting the latter of the two new versions of the isBoxIntersected function from Stage 2 (the void returningone) into a SIMD implementation.Most of this conversion is identical to step 3, and the code can just be copied from that function. It does however require theaddition of:New variables to keep track of the closest intersection found, and which box index it was found at.Consolidation of the values calculated using SIMD into a single scalar return value. This should be done after the loop finishes.Hints / TipsThe techniques required to complete each stage rely heavily on work done in the SIMD tutorials and the step-by-stepapproach shown in lectures refer to them often.When implementing the SIMD stage, its best to SIMD-ify small portions of code at a time (e.g. even a single line is often alot) and then perform the rest of the calculation through the existing scalar code, or even compare the result of the SIMDversion to the existing scalar code.Again for SIMD, its helpful to Render scenes at a greatly reduced size for testing, with an equally small block size, one sampleper pixel, and with one thread (e.g. try using allMaterials -size 8 8 -blockSize 8 -samples 1 -threads 1). It may even behelpful to fix the number of rays cast (MAX_RAYS_CAST) to be 1, so that reflection and refraction dont occur. This doesntproduce a very nice image, but images can be easily compared visually for problems (the example is only 64 pixels) andprintf statements can be used to verify the correctness of calculations without outputting a huge amount of data (the use ofdebuggers is somewhat perilous in a multithreaded environment, but that shouldnt be an issue here).When converting conditional statements to masks it can be helpful to do this with scalars first (although C/C++implementations often use 1 rather than 0xFFFFFFFF for true, so the calculations may be different).Write functions to output all the elements in a SIMD value for easier printf debugging (I am not a fan of how these types areshown in the debugger).Implementation SIMD Lighting (Stage 4)The following section describes in detail the steps required to complete Stage 4 of the assignment. You should only attempt this stepif you have fully completed (and tested) Stages 13. If you complete this step, only then should you attempt Stage 5.This stage involves rewriting the applyLighting function and a number of functions it calls. In order to complete this step you willneed to:Create SoA declarations and conversions from AoS for Light container data (in a similar way that this was done for spheresand boxes).Convert the applyLighting function to SIMD (i.e. handle multiple lights simultaneously).NOTE: the call to isInShadow is not suitable for SIMD conversion, so a simple for loop that repeatedly calls the functionand stores each result in turn in a SIMD vector is acceptable.Convert the applySpecular function to SIMD (i.e. perform this for 8 lights and produce 8 colours).Create a normal non-SIMD applyDiffuse function that has applyCheckboard, applyCircles, and applyWood functions manuallyinlined before attempting the conversion to SIMD.Convert this new applyDiffuse function to SIMD (i.e. perform this for 8 lights and produce 8 colours).Determine what parts of applyDiffuse are the same regardless of where the light is, and separate the function into two parts,and then integrate this into applyLighting.Missing Mathematical FunctionsThere are no definitions Of a few needed mathematical functions in the AVX/AVX2 instruction sets:The powf function, as used in the applySpecular function in Lighting.cpp.The sinf and cosf functions, as used in the applyWood function in Texture.cpp.The expf function, as used in the convertToPixel method in Colour.h.Definitions for these functions can either be sourced from:The SVML api if your processor / compiler supports them (e.g. _mm256_sin_ps).From the included MathSIMD.h file (e.g. _sin256_ps). NOTE: when using this library, function precision will likely bereduced, resulting in output images diverging somewhat from the reference images).By writing your own function that just calls the required function 8 times for each slot in the vector (e.g. sinf).Hints / TipsAll the hints/tips from the stages 13 apply here (i.e. complete this in small steps, use 8×8 images as tests, use a lot ofprintfs, etc.).You may find it helpful to define some helper classes similar to Vector8 to aid with the completion of this step, e.g. Colour8(and maybe even Ray8). Also in a similar fashion to Vector8, defining useful operators can help with translation and codereadability.Implementation SIMD Main Rendering Loop (Stage 5)The following section describes in Detail the steps required to complete Stage 5 of the assignment. You should only attempt this stepif you have fully completed (and tested) Stages 14.This stage involves rewriting the renderSection function and a number of functions it calls. In order to complete this step you willneed to:Determine how to introduce SIMD into the main loop within renderSection with the aim of generating and writing 8 pixels ofthe image at a time.Determine how to convert traceRay, calculateIntersectionResponse, calculateReflection, and calculateRefraction to SIMD.At this point you can either complete this conversion, or explain why this conversion is not likely to yield significant effeciencyincreases to be worth the effort if you determine this to be the case.Timing TestAverage Time of 5 Runs (Milliseconds)1. -input Scenes/cornell.txt -size 1024 1024 -samples 42. -inputScenes/allmaterials.txt -size1000 1000 -samples 43. -inputScenes/5000spheres.txt -size 1280 720 -samples 14. -input Scenes/dudes.txt -size 1024 1024 -samples 15. -input Scenes/cornell-199lights.txt -size 10241024 -samples 1Test Image Result1. -blockSize 8 -size 8 8 -samples 1 -threads 12. -blockSize 16 -size 256 256 -samples 13. -blockSize 16 -size 256 256 -samples 2DocumentationWhen completing either stage of the assignment you should provide:timing information for each scene file for the average time (to 1 decimal place) taken over 5 runs for a render using a threadcount supported by your the number of logical processors in your system and a block size of 16. See a later section for anexample format for this timing table; andan explanation of the results (e.g. why theres no difference between the performance of stages 1, 2, and 3 (NOTE: this is amade up example and isnt necessarily what to expect), or why a particular type of implementation works well (or poorly) ona particular scene, etc.). This explanation should be with respect to the CPU on the system on which you ran the tests, andyou should discuss how the architectural features of the CPU explain the results.Tests / TimingThe following table lists all the tests that your code needs to generate correctly at each stage. It also shows the timing tests thatneed to be performed in order to fully complete the documentation section of the assignment. Fully completing this tests may takeup to an hour (with the 5 required runs) on some hardware, so plan your time accordingly.In order to confirm your images match the images created by the base version of the assignment code, its strongly recommendedyou use a image comparison tool. For part of the marking for this, Image Magick will be used (as it was in Assignment 1).The following tests will be run on your code for each scene file. You also might find the 8×8 tests useful for your SIMD codeconversion:Provided MaterialsThe materials provided with this assignment contain:The source code of the base multi-threaded version of the raytracer (i.e. a solution to Assignment 1).The source code of the base multi-threaded version of the raytracer with a SIMD implementation of the Ray-Sphere test.A set of scene files to be supplied to the program.A set of reference images for testing.Some batch files for testing purposes.Download the materials as an ZIP file.Source CodeThe provided MSVC solution, Contains 7 projects.RayTracerAss2The provided code consists of 19 source files.Raytracing logic:Raytrace.cpp: this file contains the main function which reads the supplied scene file, begins the raytracing, and writesthe output BMP file. The main render loop, ray trace function, and handling of reflection and refraction is also in thisfile.Intersection.h and Intersection.cpp: these files define a datastructure for capturing relevant information at the point ofintersection between a ray and a scene object and functions for testing for individual ray-object collisions and ray-scenecollisions.Lighting.h and Lighting.cpp: these files provide functions to apply a lighting calculation at a single intersection point.Texturing.h and Texturing.cpp: these files provide functions for the reading points from 3D procedural textures.Constants.h: this header provide constant definitions used in the raytracing.Basic types:Primitives.h: this header contains definitions for points, vector, and rays. It also provides functions and overloadedoperators for performing calculations with vectors and points.SceneObjects.h: this header file provides definitions for scene objects (ie. materials, lights, spheres, and boxes).Colour.h: this header defines a datastructure for representing colours (with each colour component represented as afloat) and simple operations on colours, including conversions to/from the standard BGR pixel format.Scene definition and I/O:Scene.h and Scene.cpp: the header file contains the datastructure to represent a scene and a single function thatinitialises this datastructure from a file. The scene datastructure itself consists of properties of the scene and lists of thevarious scene objects as described above. The implementation file contains many functions to aide in the scene loadingprocess. Scene loading relies upon the functionality provided by the Config class.Config.h and Config.cpp: this class provide facilities for parsing the scene file.SimpleString.h: this is helper string class used by the Config class.Image I/O:ImageIO.h and ImageIO.cpp: these files contain the definitions of functions to read and write BMP files.Miscellaneous:Timer.h: this class provides a simple timer that makes use of different system functions depending on whetherTARGET_WINDOWS, TARGET_PPU, or TARGET_SPU is defined (we dont use the latter two, but I left this file unchanged in caseanyone wanted to see how such cross-platform stuff can be handled).Stage0_SIMDSpheresThe provided code consists of 21 source files.There are a number of modified files:Scene definition and I/O:Scene.h: SIMD SoA Datastructures for representing spheres.Raytracing logic:Raytrace.cpp: simdifySceneContainers makes a copy of the SoA sphere data in AoS SIMD form.Intersection.h: prototypes of WTIOTIM short-circuiting and non-short-circuiting versions of isSphereIntersected.Intersection.cpp:selectMinimumAndIndex helper function.WTIOTIM short-circuiting version of isSphereIntersected.WTIOTIM non-short-circuiting version of isSphereIntersected.A commented out version that of isSphereIntersected that doesnt use helper functions/classes fromPrimitivesSIMD.h (just for reference).A call to non-short-circuiting version of isSphereIntersected in the objectIntersection function.Lighting.cpp: call to short-circuiting version of isSphereIntersected function in isInShadow function.There are two new files (not included in the base project):SIMD Helpers:PrimitivesSIMD.h: this header contains operators for many common SIMD functions as well as a definition for Vector8(as a representation for 8 vectors). It also provides functions and overloaded operators for performing calculations withVector8s.MathSIMD.h: this header file provides definitions for some math/trigonometry functions not provided on most systems(e.g. sin, cos, pow, exp, etc.). NOTE: this is not needed until Stage 4.Stage1 Stage5These projects are empty.To begin work on the assignment you should (in Windows Explorer) copy all of the 21 .h and .cpp files from Stage0_SIMDSphere intothe Stage1 folder and Then right-click on the Stage 1 in Visual Studio and choose Add / Exiting Item… and add those 21 files.ExecutingThe program has the following functionality:By default it will attempt to load the scene Scenes/cornell.txt and render it at 1024×1024 with 1×1 samples (using themaximum number of threads supported by the CPU natively, with a block size of 16).By default it will output a file named Outputs/[scenefile-name]_[width]x[height]x[sample-level]_[executable-filename].bmp(e.g. with all the default options, Outputs/cornell.txt_1024x1024x1_RayTracerAss2.exe.bmp)It takes command line arguments that allow the user to specify the width and height, the anti-aliasing level (must be a powerof two), the name of the source scene file, the name of the destination BMP file, and the number of times to perform therender (to improve the timing information).Additionally it accepts some arguments for setting the number of threads, whether each thread will tint the area that itrenders, whether each thread will instead colour the area rendered by the thread as a solid grey, and the size of the block torender.It loads the specified scene.It renders the scene (as many times as requested).It produces timing information for the average time taken to produce a render ignoring all file IO.It outputs the rendered scene as a BMP file.For example, running the program at the command line with no arguments would perform the first test (as described in the scenefile section):On execution this would produce output similar to the following (as well as writing the resultant BMP file toOutputs/cornell.txt_1024x1024x1_RayTracerAss2.exe.bmp):rendered 1048576 samples using 8 threads, average time taken (1 run(s)): 578.0msTesting Batch FilesA number of batch files are provided that are intended to be executed from the command line, e.g.For timing:baseTiming.bat will perform all the timing tests required for Stage 1 (i.e. 5 runs with the appropriate amount of threadsfor each Test scene).stage0Timing.bat will perform all the timing tests required for Stage 0 (SIMD Spheres).stage1Timing.bat will perform all the timing tests required for Stage 1.stage2Timing.bat will perform all the timing tests required for Stage 2.stage3Timing.bat will perform all the timing tests required for Stage 3.stage4Timing.bat will perform All the timing tests required for Stage 4.stage5Timing.bat will perform all the timing tests required for Stage 5.For testing (requires Image Magick installation), e.g.:stage1Tests.bat will perform all the comparisons required for Stage 1 Tests.stage2Tests.bat will perform all the comparisons required for Stage 2 Tests.stage3Tests.bat will perform all the comparisons required for Stage 3 Tests.stage4Tests.bat will perform all the comparisons required for Stage 4 Tests.stage4Tests_MathSIMD.bat will perform all the comparisons required for Stage 4 Tests if you use theMathSIMD.hfunctions.stage5Tests.bat will perform all The comparisons required for Stage 5 Tests.stage5Tests_MathSIMD.bat will perform all the comparisons required for Stage 5 Tests if you use theMathSIMD.hfunctions.如有需要,请加QQ:99515681 或邮箱:99515681@qq.com

添加老师微信回复‘’官网 辅导‘’获取专业老师帮助,或点击联系老师1对1在线指导