CS3103: Operating Systems
Spring 2021
Programming Assignment 2

1 Goals

The purpose of this assignment is to help you:
- get familiar with multi-threaded programming using pthread;
- get familiar with mutual exclusion using mutexes;
- get familiar with synchronization using semaphores.

2 Background

Sentiment analysis, a powerful technique based on natural language processing, has a wide range of applications, including consumer review analysis, recommender systems, political campaigning and stock speculation. A sentiment analysis model requires a large text corpus, which consists of classified articles grabbed from the internet by web crawlers.

In the simplest scenario, a text corpus can be built by two components: a web crawler and a classifier. The crawler browses web pages and grabs articles from websites. The grabbed articles are stored in a buffer, from which the classifier takes articles and classifies them.

Given the complexity of modern websites, it usually takes a crawler a long time to locate and grab an article from a web page, so a single crawler is usually too slow to keep up with the classifier. Multiple crawlers are therefore a better choice.

3 Components and Requirements

You are required to design and implement three crawlers, a buffer and a classifier in C/C++ on Linux (other languages are not allowed). Mutual exclusion and synchronization must be done with the mutexes and semaphores provided by the libraries pthread.h and semaphore.h.

3.1 crawler

Each crawler thread is created to grab articles from websites and load them into the buffer. It keeps grabbing and loading articles, where grabbing and loading one article takes time interval_A, until the buffer is full. It then waits until the classifier deletes an article from the buffer.

A function char* str_generator(void) is provided to generate articles for the crawler to grab; each article is represented by a string of 50 characters.

3.2 buffer

The buffer structure is a first-in-first-out (FIFO) queue.
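The crawler and buffer behaviour in 3.1 and 3.2 is the classic bounded-buffer (producer-consumer) problem. Below is a minimal sketch of one possible design, not a required one: a 12-slot ring buffer built from a plain array (no STL containers), two counting semaphores (free slots and stored articles) and one mutex that only serialises crawlers against each other. It assumes exactly one classifier thread. The names ArticleBuffer, buffer_put, buffer_peek_head, buffer_pop_head and the log_event hook are hypothetical.

```cpp
// Minimal sketch of the shared article buffer: a fixed-size ring buffer
// (no STL containers) with two counting semaphores and one mutex.
// Assumes several crawler threads and exactly one classifier thread.
#include <pthread.h>
#include <semaphore.h>
#include <cassert>
#include <cerrno>
#include <cstring>

const int kCapacity = 12;    // the buffer holds at most 12 articles
const int kArticleLen = 50;  // each article is a 50-character string

struct ArticleBuffer {
    char slots[kCapacity][kArticleLen + 1];
    int head = 0;               // oldest article; used only by the classifier
    int tail = 0;               // next free slot; guarded by tail_lock
    sem_t empty_slots;          // number of free slots (starts at kCapacity)
    sem_t full_slots;           // number of stored articles (starts at 0)
    pthread_mutex_t tail_lock;  // keeps crawlers from writing the same slot
};

// Retry sem_wait if a signal interrupts it.
static void sem_wait_retry(sem_t* s) {
    while (sem_wait(s) != 0 && errno == EINTR) {
    }
}

void buffer_init(ArticleBuffer* b) {
    int rc = sem_init(&b->empty_slots, 0, kCapacity);
    rc |= sem_init(&b->full_slots, 0, 0);
    rc |= pthread_mutex_init(&b->tail_lock, nullptr);
    assert(rc == 0 && "buffer_init failed");
    (void)rc;  // keeps the compiler quiet when NDEBUG disables assert
}

void buffer_destroy(ArticleBuffer* b) {
    sem_destroy(&b->empty_slots);
    sem_destroy(&b->full_slots);
    pthread_mutex_destroy(&b->tail_lock);
}

// Crawler side: append one article at the tail. If the buffer is full, the
// optional (hypothetical) log_event hook is called with "wait" before
// blocking and with "s-wait" after a slot becomes free.
void buffer_put(ArticleBuffer* b, const char* article,
                void (*log_event)(const char* event)) {
    if (sem_trywait(&b->empty_slots) != 0) {  // no free slot right now
        if (log_event) log_event("wait");
        sem_wait_retry(&b->empty_slots);
        if (log_event) log_event("s-wait");
    }
    pthread_mutex_lock(&b->tail_lock);
    std::strncpy(b->slots[b->tail], article, kArticleLen);
    b->slots[b->tail][kArticleLen] = '\0';
    b->tail = (b->tail + 1) % kCapacity;
    pthread_mutex_unlock(&b->tail_lock);
    sem_post(&b->full_slots);  // one more article for the classifier
}

// Classifier side, step 1: wait for an article and copy the head article
// into out (at least kArticleLen + 1 bytes). The slot stays occupied, so
// no crawler can overwrite it while the copy is being classified.
void buffer_peek_head(ArticleBuffer* b, char* out) {
    sem_wait_retry(&b->full_slots);
    std::memcpy(out, b->slots[b->head], kArticleLen + 1);
}

// Classifier side, step 2: delete the head article and free its slot.
// Call exactly once after each buffer_peek_head.
void buffer_pop_head(ArticleBuffer* b) {
    b->head = (b->head + 1) % kCapacity;
    sem_post(&b->empty_slots);  // a waiting crawler (if any) can proceed
}
```

Because the classifier is assumed to be the only consumer, it reads and advances head without taking the mutex; POSIX semaphore operations synchronize memory, so its reads are ordered after the crawlers' writes. Crawlers can therefore keep loading new articles while the classifier is still working on the head article, which is one way to aim for the higher-concurrency case in the marking scheme. Compile with g++ and the -pthread flag.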
The buffer temporarily stores the articles grabbed by the crawlers until they are taken by the classifier. It can store up to 12 articles at the same time. You need to implement your own queue. You are not allowed to use the standard C++ library (e.g., queue or other containers provided by the Standard Template Library) or third-party libraries.

3.3 classifier

A classifier thread is created to classify the articles grabbed by the crawlers in FIFO order. The procedure has two steps:
1. Pre-processing: the classifier makes a copy of the article at the head of the buffer, changes all uppercase letters (A-Z) to lowercase letters (a-z) and deletes any symbol that is not a letter.
2. Classification: the classifier classifies the article into one of the 13 classes based on the first letter, x, of the processed article, as follows:

   Class label = int(x - 'a') % 13 + 1

Next, an auto-increasing key starting from 1 is given to the classified article (so the keys of classified articles are 1, 2, 3, ...). Finally, the key, the class label and the original article are stored in the text corpus, a text file. Then the classifier deletes the classified article from the buffer. The whole procedure takes time interval_B.

3.4 termination

The articles are divided into 13 classes. Denote the number of articles in each class by C1, C2, ..., C13, and let p = min{C1, C2, ..., C13}. When p ≥ 5, the classifier notifies all crawlers to quit after finishing the current job at hand, and then the program terminates.

3.5 input arguments

Your program has to accept the following two arguments, in this order:
interval_A, interval_B: integer, unit: microsecond.

3.6 sample outputs

The outputs of your program are:
- A table with multiple columns shown on the screen. Each column shows the activities of a single thread in time order, and each row shows only one activity of a thread.
- The text corpus, in which each line consists of a key, a class label and an article, separated by spaces.

All activities that need to be recorded for each thread are listed below, together with their abbreviations.

Crawler:
- start: crawler starts.
- grab: crawler starts to grab an article.
- f-grab: an article has been grabbed and loaded into the buffer.
- wait: crawler starts waiting for available space in the buffer.
- s-wait: crawler stops waiting.
- quit: crawler has finished all jobs and is about to quit.

Classifier:
- start: classifier starts.
- clfy: classifier starts to classify an article.
- f-clfy: the article has been classified and deleted from the buffer.
- k-enough: k articles have been classified and the classifier notifies all threads to quit.
- n-stored: a total of n articles have been stored in the text corpus.
- quit: classifier has finished all jobs and is about to quit.

Below are sample outputs of the table on the screen and of the text corpus. For example, in the table, crawler1 started at t1, then crawler2 started at t2 and grabbed at t3, and so on.

[Sample output images: beginning of the table; end of the table; text corpus]

4 Challenge

This challenge is for students who wish to get an A+ grade in this programming assignment and to take one more step toward a real-world application.

Most modern websites are protected against crawlers. Thus, crawlers should be updated with new IP addresses and cookies periodically to get through the barrier.

A strategy manager thread is created to update the crawlers with a new IP address and cookies. Each crawler notifies the strategy manager to update its IP address and cookies after every M articles are grabbed. The update takes time interval_C.
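One possible way to wire up the strategy manager is sketched below under stated assumptions: there are three crawlers, a crawler rests (blocks) until its own update is finished, and usleep stands in for the interval_C microseconds of update work. Crawlers queue their ids in a small ring protected by a mutex and signal a counting semaphore; the manager waits on that semaphore, performs the update, and wakes the requesting crawler through a per-crawler semaphore. The names Manager, crawler_request_update, manager_main, manager_stop and the -1 stop sentinel are hypothetical.

```cpp
// Minimal sketch of crawler -> strategy-manager notification.
// Assumes three crawlers, one manager thread, and that a crawler rests
// (blocks) until its own IP/cookie update is finished.
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdio>

const int kCrawlers = 3;
const int kQueueLen = kCrawlers + 1;  // one pending request per crawler + stop

struct Manager {
    int pending[kQueueLen];            // FIFO of crawler ids; -1 means "stop"
    int head = 0, tail = 0;            // guarded by lock
    pthread_mutex_t lock;
    sem_t requests;                    // number of queued requests
    sem_t updated[kCrawlers];          // posted when crawler x is updated
    int updates_done[kCrawlers] = {};  // bookkeeping for illustration only
    useconds_t interval_c = 0;         // duration of one update (microseconds)
};

static void sem_wait_retry(sem_t* s) {
    while (sem_wait(s) != 0 && errno == EINTR) {
    }
}

void manager_init(Manager* m, useconds_t interval_c) {
    m->interval_c = interval_c;
    int rc = pthread_mutex_init(&m->lock, nullptr);
    rc |= sem_init(&m->requests, 0, 0);
    for (int i = 0; i < kCrawlers; ++i) rc |= sem_init(&m->updated[i], 0, 0);
    assert(rc == 0 && "manager_init failed");
    (void)rc;
}

void manager_destroy(Manager* m) {
    pthread_mutex_destroy(&m->lock);
    sem_destroy(&m->requests);
    for (int i = 0; i < kCrawlers; ++i) sem_destroy(&m->updated[i]);
}

static void enqueue(Manager* m, int id) {
    pthread_mutex_lock(&m->lock);
    m->pending[m->tail] = id;
    m->tail = (m->tail + 1) % kQueueLen;
    pthread_mutex_unlock(&m->lock);
    sem_post(&m->requests);
}

// Called by crawler x (0-based) after every M grabbed articles, e.g.
//   if (++grabbed % M == 0) crawler_request_update(&mgr, x);
void crawler_request_update(Manager* m, int x) {
    std::printf("crawler%d rest\n", x + 1);
    enqueue(m, x);
    sem_wait_retry(&m->updated[x]);  // rest until the manager is done
    std::printf("crawler%d s-rest\n", x + 1);
}

// Called once after all crawlers have quit, so the manager can quit too.
void manager_stop(Manager* m) { enqueue(m, -1); }

// Thread body of the strategy manager.
void* manager_main(void* arg) {
    Manager* m = static_cast<Manager*>(arg);
    std::printf("manager start\n");
    for (;;) {
        sem_wait_retry(&m->requests);
        pthread_mutex_lock(&m->lock);
        int x = m->pending[m->head];
        m->head = (m->head + 1) % kQueueLen;
        pthread_mutex_unlock(&m->lock);
        if (x < 0) break;  // stop sentinel
        std::printf("manager get-cr%d\n", x + 1);
        usleep(m->interval_c);  // stands in for the interval_C update work
        m->updates_done[x]++;
        std::printf("manager up-cr%d\n", x + 1);
        sem_post(&m->updated[x]);  // crawler x may stop resting
    }
    std::printf("manager quit\n");
    return nullptr;
}
```

Since each crawler blocks until its own update is done, at most one request per crawler can be pending, so a ring of kCrawlers + 1 entries (including the stop sentinel) never overflows. While one crawler rests, the other crawlers keep grabbing. The printf lines only illustrate the event names; a real program would place them in the thread's table column.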
The input and extra output are listed below.

Your program has to accept the following arguments, in this order:
interval_A, interval_B, interval_C: integer, unit: microsecond; M: integer.

Crawler: two more activities have to be recorded:
- rest: crawler starts resting.
- s-rest: crawler stops resting.

Strategy-Manager:
- start: manager starts.
- get-crx: manager gets a notification from crawler x.
- up-crx: manager has updated crawler x with a new IP address and cookies.
- quit: manager has finished all jobs and is about to quit.

5 Helper Program and Hint

5.1 generator.cpp

The function char* str_generator(void) is provided in the file generator.cpp. It returns a string (char array) of length 50. Use it by declaring its prototype in your code and compiling generator.cpp along with your source code.

5.2 hint

Multi-threading needs careful handling. A subtly wrong program may appear correct in the first few tests but fail in later ones. Thus, test your program multiple times, and preferably with different arguments.

6 Marking Scheme

Your program will be tested on our CSLab Linux servers (cs3103-01, cs3103-02, cs3103-03). You should describe clearly how to compile and run your program in comments in your source program file. If an executable file cannot be generated and run successfully on our Linux servers, the program will be considered unsuccessful.

A. Design and use of multi-threading (15%)
- Thread-safe multithreaded design and correct use of thread-management functions
- Non-multithreaded implementation (0%)

B. Design and use of mutexes (15%)
- Complete, correct and non-excessive use of mutexes
- Useless/unnecessary use of mutexes (0%)

C. Design and use of semaphores (30%)
- Complete, correct and non-excessive use of semaphores
- Useless/unnecessary use of semaphores (0%)

D. Degree of concurrency (15%)
- A design with higher concurrency is preferable to one with lower concurrency.
  o An example of lower concurrency: only one thread can access the buffer at a time.
  o An example of higher concurrency: various threads can access the buffer but work on different articles at a time.
- No concurrency (0%)

E. Program correctness (15%)
- Complete and correct implementation of other features, including:
  o correct logic and coding of thread functions
  o correct coding of the queue and related operations
  o passing parameters to the program on the command line
  o program output conforming to the format of the sample output
  o successful program termination
- Failure to pass the g++ compiler on our Linux servers to generate a runnable executable file (0%)

F. Programming style and documentation (10%)
- Good programming style
- Clear comments in the program to describe the design and logic
- Unreadable program without any comments (0%)

7 Submission

- This assignment is to be done individually or by a group of two students. You are encouraged to discuss the high-level design of your solution with your classmates, but you must implement the program on your own. Academic dishonesty, such as copying another student's work or allowing another student to copy your work, is regarded as a serious academic offence.
- Each submission consists of two files: a source program file (.cpp file) and a text file (.txt file) containing the table output by your program and the text corpus.
- Write down your name(s), eid(s), student ID(s), and the command line to compile and run your program at the beginning of your program as comments.
- Use your student ID(s) to name your submitted files, such as 5xxxxxxx.cpp and 5xxxxxxx.txt for individual submission, or 5xxxxxxx_5yyyyyyy.cpp and 5xxxxxxx_5yyyyyyy.txt for group submission. You may ignore the version number appended by Canvas to your files. Only one submission is required for each group.
- Submit the files to Canvas.
  As long as you follow the above submission procedure, there is no need to add a comment on Canvas repeating your information.
- The deadline is 11:00am, 11-MAR-2021 (Thu). No late submission will be accepted.

8 Questions?

- This is not a programming course. You are encouraged to debug the program on your own first.
- If you have any questions, please post them to Mr Wu Wei via the Discussion board "Programming Assignment #2" on Canvas.
- To avoid possible plagiarism, do not post your source code on the Discussion board.
- If necessary, you may also contact Mr Wu Wei at weiwu56-c@my.cityu.edu.hk.