University of Leeds
SWJTU-Leeds Joint School
Southwest Jiaotong University

Web Services and Web Data XJCO3011
Semester 2, 2020-2021
Coursework 2 – Building a Search Tool
30 marks = 30% of total module marks

Important Note: This is an individual project, NOT a team project. Each student must implement their own search tool.

Submission Deadline: 07/05/2021 at 11:00 pm (UK time)

In this project, you will develop a search tool that can:
1) Crawl the pages of a website.
2) Create an inverted index of all word occurrences in the pages of a website.
3) Allow the user to find pages containing certain search terms.

The website you will use for this project is https://example.python-scraping.com/ . This website contains brief information about each country in the world, such as its capital, area, population, currency, etc. The website was purpose-built to allow students to learn web scraping. I have obtained permission from the website's owner for us to crawl and download the pages of the website. However, you must observe a politeness window of at least 5 seconds between successive requests to the website. An inverted index that stores the frequency of occurrence of each word in each page must be created by the tool as it crawls the pages of the website.

Using the search tool, the user should be able to find pages containing individual words such as Mariehamn, or a combination of two or more words such as Capital Aland Islands.

The search tool is to be command-driven and must provide the following commands:

build
This command instructs the search tool to crawl the website, build the index, and save the resulting index into the file system. For simplicity you can save the entire index in one file.

load
This command loads the index from the file system.
Obviously, this command will only work if the index has previously been created using the build command.

print
This command prints the inverted index for a particular word, for example:
print Peso
will print the inverted index for the word Peso.

find
This command is used to find a certain query phrase in the inverted index and returns a list of all pages containing this phrase, for example:
find Dinar
will return a list of all pages containing the word Dinar, while
find Area Afghanistan
will return all pages containing the words Area and Afghanistan.

For simplicity, assume that the search is case sensitive, so Euro is not the same word as euro.

You should use Python 3 to implement the search tool. It is also strongly recommended that you use the Requests library ( https://docs.python-requests.org/en/master/ ) for composing requests, and the Beautiful Soup library ( https://www.crummy.com/software/BeautifulSoup/bs4/doc/ ) to parse HTML pages.

To submit the source code of your search tool to Minerva, put your Python source file(s) and the inverted index file that was created by your tool in one directory, compress the directory with ZIP, and upload it to Minerva. As part of your submission, you should also submit a brief report (2-3 pages excluding the title page) that clearly, yet briefly, describes how you implemented each aspect of the tool, for example the data structures, methods and algorithms you have used in 1) crawling the website, 2) creating the inverted index, and 3) computing the scores of pages when processing a search query.
The report should also include brief instructions on how to invoke and use the tool. Please do NOT fill your report by copying text from online resources, such as tutorials or lecture slides, as I am only interested in understanding what you have done yourself in this coursework.

Marking Scheme
The tool successfully crawls all the pages of the website (6 marks)
The tool successfully creates the inverted index for the whole website (6 marks)
The tool can store and then load the inverted index to/from the file system (6 marks)
The tool prints the inverted list for a certain word (4 marks)
The tool can correctly find pages containing search terms (8 marks)

The clarity of your report will affect the marks you are awarded for the relevant aspects of the mark scheme.
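
For illustration only, the inverted index and the find behaviour described in this brief could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the function names, the JSON file format, the tokenisation rule, the frequency-sum ranking, and the sample page texts are all hypothetical and do not come from the brief. Crawling is left out entirely; a real tool would fetch the pages itself and wait at least 5 seconds between requests.

```python
import json
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to {url: frequency} for the given {url: text} pages."""
    index = defaultdict(dict)
    for url, text in pages.items():
        # Case-sensitive tokenisation: "Euro" and "euro" stay distinct words.
        for word in re.findall(r"[A-Za-z0-9]+", text):
            index[word][url] = index[word].get(url, 0) + 1
    return dict(index)


def find(index, query):
    """Return URLs containing every query word, highest total frequency first."""
    words = query.split()
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only the pages that appear in every word's posting list.
    common = set(postings[0]).intersection(*postings[1:])
    # Assumed scoring: sum of the query words' frequencies on each page,
    # with the URL as a tie-breaker so the order is deterministic.
    return sorted(common, key=lambda url: (-sum(p[url] for p in postings), url))


def save_index(index, path):
    """Write the whole index to a single JSON file (assumed format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)


def load_index(path):
    """Read an index previously written by save_index."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical page texts standing in for crawled pages.
pages = {
    "/afghanistan": "Country Afghanistan Area 647500 Currency Afghani",
    "/aland-islands": "Country Aland Islands Capital Mariehamn Currency Euro",
    "/algeria": "Country Algeria Area 2381740 Currency Dinar",
}
index = build_index(pages)
print(index["Dinar"])                    # {'/algeria': 1}
print(find(index, "Area Afghanistan"))   # ['/afghanistan']
```

In this sketch a multi-word query is treated as "all words must appear on the page", which matches the find Area Afghanistan example; how pages are scored and ordered is a design choice the brief leaves to the student.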