Lab 9: Multithreading and Thread Pools
CSCI 2122 - Fall 2020

1 Introduction

This lab is designed to introduce you to the basics of multithreading. By the end of this lab, you will be expected to understand how to create threads, handle critical sections using semaphores, and create thread pools.

In this lab you are expected to perform the basics of cloning your Lab 9 repository from the GitLab course group. A link to the course group can be found here and your repository can be found in the Lab9 subgroup. See the Lab Technical Document for more information on using git. You will notice that your repository has a file in the Lab9 directory named delete this file. Due to the limitations of GitLab, we are not able to push completely empty directories. Before you push your work to your repository (which should be in the Lab9 directory anyway), make sure to first use the git rm command to remove the extra file. If you do not, your pipeline could fail.

Be sure to read this entire document before starting!

2 Multithreading

Multithreading is a common practice in performance computing and is the primary way to get the most out of your modern CPUs. As the size of components gets smaller and smaller, they are now very close to the smallest component we can pass electricity through without causing interference with other components. Due to this limitation, manufacturers are now (instead of making things smaller) putting multiple processing units on their CPUs, called cores. These cores can act as independent processing units from the point of view of the operating system (and thus your software), allowing you to create lightweight processes called threads, which can be processed in parallel.

Often you will hear about limitations in software design which keep CPUs from reaching their maximum potential on a single process.
Most software, especially older software, tends to be single-threaded from the point of view of the operating system, meaning very little of the software is capable of utilizing more than one core of a CPU at a time. This is a leftover practice from times when CPUs weren't capable of the kind of multiprocessing they're able to achieve now. Video games are a huge culprit, as many of the popular engines still run on a single catch-all frame calculation cycle which acts as a single monolithic main loop for all of the game's logic and graphics handling.

In this lab we will discuss some basic principles for handling multithreaded software, then code some simple examples before moving on to more heavy-duty applications.

2.1 Heavyweight vs. Lightweight Processes

The first thing to understand is the difference between heavyweight and lightweight processes, which is a very straightforward distinction: programs are heavyweight processes and threads are lightweight processes.

When it comes to heavyweight processes, the operating system treats them as distinct entities in the memory space. They're cloned from a previous heavyweight process, and then the cloned data is overwritten with the new program so that it is distinct relative to the code which it was loaded from. Once the heavyweight process is created by the operating system, it has its own memory, its own address space, and its memory is protected so that no other programs are able to access it without permission of the system or the process itself. These are the programs you see if you open your Task Manager in Windows, or htop on Timberlea. This is also how every one of your C programs starts when you execute it.

A lightweight process is a member of a heavyweight process. In general, they're viewed through the lens of a thread. Threads are sub-programs of a heavyweight program and do not have their own memory assigned by the operating system.
Instead, they share the existing memory already allocated to the heavyweight program they belong to. This means that threads are able to see any global variables, functions, and definitions held within a program and are able to communicate through those mechanisms freely, as long as the code designer has allowed that kind of interaction. Even though threads are considered a part of a heavyweight process, they are still scheduled independently to ensure maximum resource utilization. You will learn more about process scheduling in the Operating Systems class.

When a heavyweight program is allowed to run by the operating system, the system also decides which of the program's threads it will place on the CPU to execute. Threads will run freely until their subroutine is completed, at which point their return value is stored and the thread is destroyed. Since it is up to the operating system which threads are allowed to execute on a CPU at any given time, there's no guarantee which order the threads will be executed in, or for how long, which is important to note when we're trying to work with shared memory later.

2.2 The POSIX Thread Library

When working with threads in this course, we will be using the POSIX Thread Library, which is normally referred to as pthreads. The pthreads library is available on Timberlea and will supply you with all of the functionality you need to take advantage of multiprocessing in C. You can view the man page for the pthreads library by entering man pthreads into the terminal on Timberlea. You will also find additional man pages for all of the functions specific to the pthreads library by entering them into the man program accordingly.
Note that we will not be using any of the advanced features of pthreads, mostly because they're not necessary under most use cases, but also because we don't want to make the concept of threads too complicated.

2.3 Importing and Compiling POSIX Threads

In order to import the pthreads library into your code, there are a few things to consider. The first thing to understand is that pthreads do not like to be compiled without also being linked. This means that you will not be able to easily apply the -c option in your gcc commands in your Makefile. When you compile a C source file (.c) which includes pthreads, you will have to do it directly. For this reason, in this lab, you will not be required to make any object files (.o) for any source file which imports the pthreads library.

To import the pthreads library into your code, you will need to use #include <pthread.h>. This will give you access to all of the pthread functions outlined below. You will also need to include the library import option on your gcc command, which is -lpthread. This should be the last thing attached to any gcc command which requires it to ensure maximum compatibility with your files and other gcc options.

2.4 Creating a POSIX Thread

The basics of a pthread revolve around creating functions which your threads will execute to completion. These functions need to be constructed in a very particular fashion before being passed to the pthread library to have the thread created and executed.

In order to create and execute a pthread, you will need to use the pthread_create function. The pthread_create function takes four parameters, which are outlined as follows:

1. A pointer to a thread ID, represented by the pthread_t data type.
2. A pointer to a struct of attributes which you wish to modify, represented by the pthread_attr_t data type. For our purposes, this will always be set to NULL.
3. A function pointer to a function which the thread will execute when it is created.
4. A void* to any data you wish to pass to the function held by this thread when it begins execution. In this lab, we will create structs for this purpose.

When the pthread is created successfully, it is also immediately executed. The pthread itself is held internally, in the systems created by the pthread library, and thus you do not have direct access to it. However, when your pthread_create function ends, it assigns an ID value to the provided pthread_t, which can be used later to designate which thread you would like to reference in further function calls to the pthread library. In practice, the pthread_t type is often an integer value (on Linux it is an unsigned long), although its exact representation is implementation-defined, and thus it is not appropriate to use integer types directly. You can see an example of how to create a few simple pthreads here:

    // Compile with: gcc -std=c18 fileName.c -lpthread
    #include <stdio.h>
    #include <pthread.h>

    void* example(void* args)
    {
        pthread_t me = pthread_self();
        // The cast assumes pthread_t is an integer type, as it is on Linux.
        printf("This is inside thread %lu.\n", (unsigned long) me);
        return NULL;
    }

    int main(int argc, char** argv)
    {
        pthread_t thread;

        // Each call overwrites the ID stored by the previous call.
        pthread_create(&thread, NULL, example, NULL);
        pthread_create(&thread, NULL, example, NULL);
        pthread_create(&thread, NULL, example, NULL);
        pthread_create(&thread, NULL, example, NULL);
        pthread_create(&thread, NULL, example, NULL);

        return 0;
    }

When this program executes, it will create five threads, each executing a single print statement which prints its thread ID. Each thread in this case is being created using a single pthread_t variable, so each time a thread is created the previous thread's ID is lost. For the purposes of the example, this is sufficient. In a real execution scenario, creating multiple threads is better done with an array of pthread_t values. Creating an array of pthread_t values is no different than creating any other array.
Such an array can be iterated through with pthread_create calls to initialize and execute all of your threads.

You may notice that we used the pthread_self() function. This function, when used inside a thread, will return the ID that thread has been assigned. This can be useful for determining which thread is which in situations where it may be important to have their execution monitored or synchronized with other threads.

2.5 Passing Arguments to a POSIX Thread

To pass arguments to a pthread, you can convert a pointer to any type of data into a void* and pass it into the pthread_create function as the final argument. This can be any data you choose, although we recommend using structs for anything beyond a simple value, as they're the easiest way to manage a variety of different data types that would normally be associated with a single function.

An example of passing data to a pthread can be seen here:

    #include <stdio.h>
    #include <pthread.h>

    typedef struct _Args
    {
        char* this;
        int that;
        float other;
    } Args;

    void* example(void* args)
    {
        // Convert the void* back into the struct type we passed in.
        Args* arg = args;
        printf("My arguments are: %s %d %f\n", arg->this, arg->that, arg->other);
        return NULL;
    }

    int main(int argc, char** argv)
    {
        pthread_t thread;
        Args arg;
        arg.this = "Hello!";
        arg.that = 13;
        arg.other = 815.0f;
        // Pass the address of the struct as the thread's argument.
        pthread_create(&thread, NULL, example, &arg);
        return 0;
    }

As you can see, if you have a firm understanding of how void pointers work (and you should by now, after all of the lists and collections you've had to set up with void pointers!), it should be fairly easy to pass arguments into your thread's function during creation. We can simply cast the incoming void pointer to whatever struct type we're using and have access to all of the fields, assuming it was properly allocated. In the above example you may notice that I did not manually allocate the Args struct.
You can allocate it manually if you so desire, but I did not for this example.

When you run this code, you may notice that sometimes it doesn't print anything at all. What's going on? It turns out that, by default, your program will not wait for the individual threads to finish. If the program creates a thread and then exits too quickly, the thread may not have time to properly execute and will be cancelled by the operating system when the heavyweight process ends. How can we stop that from happening?

2.6 Joining a POSIX Thread

To ensure your threads finish their execution, you can perform a pthread_join on them. Joining a thread to your program has two benefits. First, joining a thread stops the main program logic from continuing until the thread in question stops. If you have multiple threads currently executing and you want them to be guaranteed to finish, you can join each one in your code, one after the other. This can be done manually, with a series of individual lines, or via a loop if you have to iterate through an array of thread IDs.

The second benefit of using a join is that you're able to receive a return value from the function the thread is running. You may have noticed in the previous code snippets that the example function has a very specific signature: it must return a void*, and it must also accept a void* as a function parameter. We saw in the previous example that we can pass a void* into the function via the pthread_create function. In order to retrieve data from the function via a return statement, we must do so with a pthread_join function call.

A pthread_join takes in two parameters:

1. A thread ID value, represented by the pthread_t data type. Note that unlike pthread_create, this is not a pointer.
2. A void** value for holding the returning value after the join has completed.

The second parameter can be a little strange at first.
The reason it is a void** and not a void* is because the pthread_join function has to be able to give you the pointer to the data inside it. If you only gave it a void*, it would only be able to affect the data the pointer is pointing to. What the void** allows the function to do is not just change the data in the pointed memory location, but change the whole void* to a totally different pointer location. While this seems complicated, all you really need to do is create a pointer for the data type you'd like to store the returned value in, then pass in the address of that variable. You can see an example of a return value with a join here:

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    typedef struct _Args
    {
        char* this;
        int that;
        float other;
    } Args;

    void* example(void* args)
    {
        Args* arg = args;
        printf("My arguments are: %s %d %f\n", arg->this, arg->that, arg->other);

        // Allocate the return value so it outlives this function.
        int* value = malloc(sizeof(int));
        *value = 15;

        return value;
    }

    int main(int argc, char** argv)
    {
        pthread_t thread;

        Args arg;
        arg.this = "Hello!";
        arg.that = 13;
        arg.other = 815.0f;

        int* result = NULL;

        pthread_create(&thread, NULL, example, &arg);

        // Wait for the thread, then store its returned pointer in result.
        pthread_join(thread, (void**) &result);

        printf("Returned Value = %d\n", *result);

        // The thread allocated this memory, so we free it once we are done.
        free(result);

        return 0;
    }

You will notice a few things in this code. First, we don't allocate the result variable. This is not necessary, as the thing being returned is being allocated. It's important to allocate the data you plan on returning and store it in a pointer. Failure to do so could lead to your values being deallocated when your function ends. Always remember that if you don't allocate something yourself inside a function, C will automatically deallocate it when the function ends. Normally this isn't a problem because C will pass-by-copy, but there are situations where you can run into bad copies.
For example, if you create int value = 15 and then try to return its address, the compiler will warn you that you will lose the value of 15, because int value is local to the function and will be freed automatically when the function ends.

You should also notice that we specifically have to convert the result's address (being passed into the join function) to a void** in order for this to work. If you don't include that type cast, C will complain that the address types do not match.

2.7 Returning Values without Join

While the above section says you can receive values from the thread by calling pthread_join and giving it a correct double pointer to store the return value in, it's also possible to return values in other (less clean) ways. Since you are handing in a pointer to an argument struct, there's nothing stopping you from creating a field in that struct which is capable of storing an output value (or a value for determining ongoing status). Since you still have access to that argument struct on the outside of the thread, having the thread update that struct while you periodically check it from outside the thread could prove useful, easier, and more convenient (depending on the situation) than using a join. Remember that using a join forces your code to stop and wait for the thread, and you may not necessarily want to do that to see what's happening inside!

2.8 Critical Sections and Race Conditions

Since your threads simply execute and are not directly controlled by you after they're created, you can run into problems with certain types of code where it's possible for two threads to operate on the same piece of data simultaneously, possibly creating instability in your data structures. Consider the following example:

You create an array list and decide that you want to add 100,000,000 integer values to it. Since it would take a single thread a very long time to read in and add all of those values to the array list, you decide to create ten threads to split the job up.
That way each thread can add 10,000,000 values for you, and since they're running in parallel they should take about 1/10 of the time.

However, because the operating system doesn't know what the threads are doing and is likely to want to let every thread have at least some execution time, it will let each thread run for a limited slice of time (say, 10 seconds for the sake of this example). Your first thread starts running (along with a few others) and everything seems fine, until it gets very close to the moment when your thread will be moved off of the CPU to give another thread some time to execute. Your thread gets the value 27 and tries to add it to the end of your array list. It manages to get the memory allocated, stores the 27 inside it, and just before it manages to increase the size of your array list by 1, the operating system swaps it for another thread on the CPU.

That new thread then tries to add something to the array list, but because the first thread wasn't able to increase the size in time, this new thread adds something to the end of the array list, which it sees as the same index as the last thread. It allocates new memory to the last index, overwriting the value 27 and leaking that memory (since we no longer have a pointer to it), and then increments the size of the array. It eventually is switched out by the operating system and the original thread resumes running from the same place it left off, where it increments the size and moves on as though nothing happened.

So from this situation, we've lost one of our values (27), and the array list thinks that it has one more element inside it than it actually does, meaning the stability of the array list is now broken. It's possible that this situation could happen multiple times and you could end up with some serious errors down the road.

This is referred to as a race condition, where each thread is attempting to change some shared data before the others are able to do so.
The place where this fight for control of shared data takes place is referred to as a critical section in your code, and it is important to protect your critical sections from the impact of multiple threads fighting for data control.

To avoid this problem, we will implement a type of code locking structure called a semaphore. A semaphore is a simple piece of code which acts as a check-in or waiting area for your threads. In practice, a semaphore is a very simple piece of code which acts as a number. The number is initially set equal to the number of threads you want to allow access to your critical section. In this lab, that number will be 1, to ensure that the critical section is entirely mutually exclusive, sometimes shortened to mutex. Mutually exclusive things, by definition, cannot happen together, so when you see people talk about something mutually exclusive, it means that only one thing in the list can happen at a time. In this case, threads in the critical section will be considered mutually exclusive (only one thread can process at a time) if the semaphore is working correctly.

Every time a thread reaches a semaphore wait point, it checks to see if the semaphore is greater than 0. If it is, it will automatically reduce the value of the semaphore by 1 and then proceed past the wait point. If the semaphore is 0 or less, the thread will block. A block is what occurs when the operating system is waiting for some kind of feedback from the user, but it can also be used to temporarily put a process to sleep. This forces a new thread to be loaded while the previous thread waits for the semaphore to go back to positive. This can happen to multiple threads, making them all stop and wait at the wait point.

When a thread enters a critical section, it is able to perform any calculations on the critical section it desires. When it is done, it will move through a semaphore post point.
A post point is where the thread lets the semaphore know that it has completed its work inside the critical section and thus the next thread is free to move inside. When it reaches the post point, it tells the semaphore to increase its value by 1. If this makes the semaphore positive, the next thread will proceed past the wait point (decreasing the semaphore value by 1) and the process will repeat.

We can create a semaphore with the pthread library. This is done by importing the semaphore.h library (which is included in the pthread library). This gives us access to the sem_t data type, the sem_init function, the sem_wait function, and the sem_post function.

2.9 Using POSIX Semaphores

Similar to pthreads, semaphores are created using their own data type, sem_t. These are best used in a global scope (outside of any function) and can be declared below your includes and defines. Once a semaphore is created, you will need to initialize it before you start creating threads. This can be done with the sem_init function. This function accepts three parameters:

1. A pointer to a sem_t value. The value of a semaphore is assigned by the operating system, but is generally a semaphore value plus a waiting queue.
2. An integer flag for determining whether or not this semaphore should be shared by sub-processes. Leave this set to 0.
3. An integer for setting the initial semaphore value. In this lab, setting this to 1 should suffice.

Once a semaphore is initialized, it can be freely used in your code. Once you have identified a critical section, you can place a sem_wait function call before it. The sem_wait function accepts a pointer to a sem_t type, which determines which semaphore the threads should be waiting in. If you have more than one critical section, you should also have more than one semaphore, as each should be filtering threads into different blocks of code.

At the end of your critical section, you should include a sem_post function call, which accepts a single pointer to a sem_t value.
The pointer passed in should match the pointer passed into the original sem_wait call. Don't mix these up, and if you have multiple semaphores nested together, make sure you are posting them in the correct order.

An example of a semaphore can be seen here:

    #include <stdio.h>
    #include <pthread.h>
    #include <semaphore.h>
    #include <unistd.h>

    sem_t wait_here;

    void* example(void* args)
    {
        // Block here until the critical section is free.
        sem_wait(&wait_here);

        printf("Sleeping for 2 seconds...\n");
        sleep(2);
        printf("Woke up! Leaving the critical section.\n");

        // Let the next waiting thread into the critical section.
        sem_post(&wait_here);
        return NULL;
    }

    int main(int argc, char** argv)
    {
        // Not shared between processes (0), initial value of 1.
        sem_init(&wait_here, 0, 1);

        pthread_t threads[5];

        for (int i = 0; i < 5; i++)
            pthread_create(&threads[i], NULL, example, NULL);

        for (int i = 0; i < 5; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }

When you run this program, you should find that a thread sleeps, and then wakes up, always in that order. Each thread waits its turn to enter the critical section and thus there should never be a mixing of sleeps or a mixing of wakes. Every thread should sleep, then wake, and thus do it five times in sequence. If you comment out the sem_wait call, you might find a different behaviour.

3 Thread Pools

A thread pool is a specific type of program which allocates a specific number of threads to a given data task. Normally thread pools are created to ensure that only a certain number of threads are created and running at any given time. This is especially useful in shared resource systems (like Timberlea) or systems where stability is incredibly important.
It turns out that creating too many threads in rapid succession has the possibility of overwhelming any system, and thus enforcing some restraint gives you the benefit of increasing the speed at which tasks are performed without sacrificing system stability.

There are many ways to create thread pools, but we will perform a very simple pool where we create a queue of Operations and as each thread finishes execution, we will dequeue another Operation and create a new thread in place of the old one. This is a very simple model which still suffers from the overhead of creating many threads, but still provides us the ability to manage the number of concurrent threads very easily. Other types of thread pools can be more efficient by never finishing execution of a thread while waiting for more tasks to be given. This has the additional benefit of not having to constantly recreate threads, at the cost of being more complicated to implement.

The thread pool requires a queue and an array. You will be given an array size for managing a certain number of threads. You should never have more threads running than the given integer value. Since you have no direct means of knowing whether or not a thread has completed processing, you will need to create an argument struct capable of reporting when a thread is complete. Under normal circumstances you could join the thread immediately, but in the case of a thread pool it would be inefficient to do so. Your goal is to loop through your currently running threads and any time you find one which has completed processing, you join it and retrieve its return value before dequeuing the next Operation and creating a new thread. Every time a thread completes and its value is returned, you must store the value in an array list to accumulate all of your data.
When all of your Operations have completed, you will return the array list.

Since the threads are not managed and the order of thread execution is outside of our control (controlled by the operating system), the array list's values will be in a somewhat random order. You will need to sort these values and print them. You should already have the programs necessary to sort these values. We recommend looking back through previous lab pipeline results to find a means by which you can sort the values in your array list.

4 Lab 9 Function Contracts

In this lab you will be responsible for fulfilling two lab contracts: the Threads contract and the Pool contract. Each contract is designed to test you on some of the things you've learned throughout the instruction portion of this lab.

All contracts must be completed exactly as the requirements are laid out. Be very careful to thoroughly read the contract instructions before proceeding. This does not, however, preclude you from writing more functions than you are asked for. You may write as many additional functions as you wish in your C source files.

All contracts are designed to be submitted without a main function, but that does not mean you cannot write a main function in order to test your code yourself. It may be more convenient for you to write a C source file with a main function by itself and take advantage of the compiler's ability to link files together by accepting multiple source files as inputs.
When you push your code to GitLab, you don't need to git add any of your extra main function source files.

For those of you who are concerned, when deciding which naming conventions you want to use in your code, favour consistency in style, not dedication to a style that doesn't work.

The contracts in this document are worth the following points values, for a total of 10.

    Contract    Points
    Threads     3
    Pool        7
    Total       10

4.1 Threads

4.1.1 Problem

You will create three programs for testing various types of thread features.

4.1.2 Preconditions

You are required to write three programs for creating and testing threads:

1. threads: You will create a program which accepts an array and squares each value in the array using threads.
2. unsafe: You will create a program which attempts to increment and print a variable without the use of semaphores.
3. safe: You will create a program which attempts to increment and print a variable by protecting your critical section with semaphores.