ITI 1121程序 写作、 辅导java程序

” ITI 1121程序 写作、 辅导java程序ITI 1121. Introduction to Computing IIWinter 2021Assignment 1(Last modified on January 13, 2021)Deadline: February 5, 2021, 11:30 pmLearning objectives Edit, compile and run Java programs Utilize arrays to store information Apply basic object-oriented programming concepts Understand the university policies for academic integrityIntroductionThis year, we are going to implement, through a succession of assignments, a simplified version of a useful machinelearning technique, called decision tree classification. If you are not familiar with decision trees and are curious toknow what they are, You may wish to have a quick look at the following Wikipedia page: httpss://en.wikipedia.org/wiki/Decision_tree_learning.For Assignment 1, however, you are not going to do anything that is specific to decision trees; you can completeAssignment 1 without any knowledge of decision trees! We will get to decision trees only in Assignments 2 and 3.If you find the above Wikipedia page overwhelming, fear not! As we go along, we will provide you with simpleand accessible material to read on decision tree classification. Ultimately, the version of decision tree classificationthat you implement, despite still Being extremely useful, has many of the complexities of the more advancedimplementations removed (for example, handling unknown values in your training data).As far as the current assignment Assignment 1 is concerned, we have modest goals: we would like to read aninput file, which will (in future assignments) constitute the training data for our learning algorithm, and performsome basic tasks that are Prerequisites to virtually any type of machine learning.Specifically, you will be implementing the following tasks in Assignment 1: Task 1. Parsing comma-separated values (CSV) from a given data file and populating appropriate data structuresin memory Task 2. Extracting certain summary data (metadata) about the characteristics of the input data; this metadatawill come handy for the construction of decision trees in future assignments.These two tasks are best illustrated with a simple example. Suppose we have a CSV file named weather.csvwith the content shown in Figure 1.1 The data is simply a table. The first (non-empty) row in the file providesthe names of the table columns in a comma-separated format. Each column represents an attribute (also called afeature). The remaining (non-empty) rows are the datapoints.In our example, each datapoint is a historical observation about weather conditions (in terms of outlook, temperaturein fahrenheit, humidity and wind), and whether it has been possible to play a certain tournament (forexample, cricket) outside. What a Machine learning algorithm can do here is to learn from examples and helpdecide / predict whether one can play a tournament on a given day according to the weather conditions on that day.Now, going backing to Task 1 and Task 2, below is what each of these tasks would do with the data in Figure 1.1This example is borrowed from Data Mining: Practical Machine Learning Tools and Techniques 3rd Ed. (2011) by Ian H. Witten, EibeFrank and Mark A. Hall.1outlook,temperature,humidity,windy,playsunny,85,85,FALSE,nosunny,80,90,TRUE,noovercast,83,86,FALSE,yesrainy,70,96,FALSE,yesrainy,68,80,FALSE,yesrainy,65,70,TRUE,noovercast,64,65,TRUE,yessunny,72,95,FALSE,nosunny,69,70,FALSE,yesrainy,75,80,FALSE,yessunny,75,70,TRUE,yesovercast,72,90,TRUE,yesovercast,81,75,FALSE,yesrainy,71,91,TRUE,noFigure 1: Example CSV file (weather.csv)String[5]String[14][5]outlook temperature humidity windy playsunny 85 85 FALSE nosunny 80 90 TRUE noovercast 83 86 FALSE yesrainy 70 96 FALSE yesrainy 68 80 FALSE yesrainy 65 70 TRUE noovercast 64 65 TRUE yessunny 72 95 FALSE nosunny 69 70 FALSE yesrainy 75 80 FALSE yessunny 75 70 TRUE yesovercast 72 90 TRUE yesovercast 81 75 FALSE yesrainy 71 91 TRUE noattributeNamesmatrixFigure 2: Results of parsing our example input file Task 1 will parse the input data and build the conceptual memory representation shown in Figure 2. Moreprecisely, we get (1) an instance variable, attributeNames (discussed later), instantiated with a String arrayof length 5 and containing the column names, and (2) an instance variable, matrix (also discussed later),instantiated with a two-dimensional String array (of size 145) and populated with the datapoints in the file. Task 2 will identify the Unique values observed in each column. If all the values in a column happen to benumeric, then the column is found to be of numeric type. Otherwise, the column will be of nominal type,meaning that the values in the column are to be treated as labels without any quantitative value associatedwith them. For our example file, the column types and the set of unique values for each column would be asshown in Figure 3. Note that, for this assignment, you do not need to sort the numeric value sets in eitherascending or descending order. This becomes necessary only in the future assignments.21) outlook (nominal): [sunny, overcast, rainy]2) temperature (numeric): [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 81, 71]3) humidity (numeric): [85, 90, 86, 96, 80, 70, 65, 95, 75, 91]4) windy (nominal): [FALSE, TRUE]5) play (nominal): [no, yes]Figure 3: Derived column types and unique values sets for our example input fileoutlook,temperature,humidity,windy,playsunny, 85, 85,FALSE,nosunny,80,90,TRUE, noovercast,83,86,FALSE, yesrainy,70,96, FALSE,yesrainy,68,80,FALSE, yesrainy,65,70,TRUE,noovercast,64,65, TRUE,yessunny,72,95,FALSE,nosunny, 69,70,FALSE,yesrainy,75, 80,FALSE,yessunny,75,70, TRUE,yesovercast,72,90, TRUE,yesovercast,81,75,FALSE,yesrainy, 71,91,TRUE,noFigure 4: Example with surrounding spaces and empty lines that need to be ignoredImportant Considerations (Read Carefully!)While the Assignment is conceptually simple, there are some important consideration that you need to carefullypay attention to in your implementation.Determining the size of the arrays to instantiate: You will be storing the attribute names and datapoints usingtwo instance variables that are respectively declared as follows:private String[] attributeNames;private String[][] matrix;One problem that you have to deal with is how to instantiate these variables. To do so, you need to know thenumber of attributes (columns) and the number of datapoints. You can know the former number only aftercounting the attributes names on the first (non-empty) line of the file. As for the latter (number of datapoints),you can only know this once you have traversed the entire file. Later on in the course, we will see expandibledata structure like linked lists, which do not have a fixed size, allowing elements to be added to them as yougo along. For this assignment, you are expressly forbidden from using lists or similar data structures withnon-fixed sizes. Instead, you are expected to work with fixed-size arrays. For this assignment, the easiest wayto instantiate the arrays is through a two-pass strategy. This means that you will go over the input file twice.In the first pass, you merely count the number of columns and datapoints. With these numbers known, youcan instantiate attributeNames and matrix. Then, in a second pass, you can populate (the now-instantiated)attributeNames and matrix. Note that, as illustrated in Figure 2, you are expected to instantiate matrix asa row column array, as Oppposed to a column row array. While this latter strategy is correct too, you areasked to use the former (that is, row column) as a convention throughout this assignment.Removing blank spaces and empty lines: The blank spaces surrounding attribute names and values should bediscarded. For example, consider the input file in Figure 4. This file is the same as the one in Figure 1, only3outlook,temperature, in fahrenheit,humidity,windy,playsunny,85,85,FALSE,nosunny,80,90,TRUE,noovercast,83,86,FALSE,yesrainy, mild,70,96,FALSE,yesrainy, mild,68,80,FALSE,yesrainy, heavy,65,70,TRUE,noovercast,64,65,TRUE,yessunny,72,95,FALSE,nosunny,69,70,FALSE,yesrainy, mild,75,80,FALSE,yessunny,75,70,TRUE,yesovercast,72,90,TRUE,yesovercast,81,75,FALSE,yesrainy, heavy,71,91,TRUE,noFigure 5: Example usage of commas in attribute names and values1) outlook (nominal): [sunny, overcast, rainy, mild, rainy, heavy]2) temperature, in fahrenheit (numeric): [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 81, 71]3) humidity (numeric): [85, 90, 86, 96, 80, 70, 65, 95, 75, 91]4) windy (nominal): [FALSE, TRUE]5) play (nominal): [no, yes]Figure 6: Metadata derived from the input shown in Figure 5with some surrounding spaces added. These surrounding spaces need to be trimmed and ignored. The samegoes with empty lines. Empty lines can be treated as non-existent.Supporting commas in attribute names and values: Since commas are used as separators (delimiters), it is importantto provide a way to support commas within attribute names and attribute values. To do so, you will needto implement an escape sequence mechanism. You will do so using single quotes (). More precisely, commasare to be treated as regular characters if a text segment is embraced with single quotes. To illustrate, considerthe example input in Figure 5. The metadata information derived from this input is shown in Figure 6. Whilenot explicitly shown by Figure 6, the values to store in attributeNames and matrix are obviously affectedwhen escape sequences with single quotes are present in the input file.Missing attribute values: There may be situations where not all attribute values are known (for example, due toincomplete data collection). In such cases, the attribute values in question may be left empty. Your implementationneeds to be able empty (missing) attribute values. You can choose to represent missing values witha special value, for Example MISSING. Alternatively, you can choose to represent missing values with anempty string (). To illustrate, consider the input file in Figure 7, where some values are missing. The metadataderived from this input file is shown in Figure 8. Here, we have chosen to represent missing values withthe empty string. For this assignment, you should designate any column that has missing values as nominal.For example, in Figure 7, some of the values for the humidity attribute are missing. This has resulted inhumidity to no longer be a numeric attribute but rather a nominal one, as shown in Figure 8.Efficiency in identifying unique attribute values: You need to be prepared for the possibility that your input filewould be large. One particular place you need to be careful with is when you are determining the uniqueset of values that a given attribute can assume. Your implementation should be efficient (hint: should not dofutile search) in order to avoid a quadratic runtime.4outlook,temperature,humidity,windy,playsunny,85,85,FALSE,sunny,80,90,TRUE,noovercast,83,,FALSE,rainy,70,96,FALSE,yesrainy,68,80,FALSE,yesrainy,65,70,TRUE,noovercast,64,65,TRUE,yessunny,72,,FALSE,nosunny,69,70,FALSE,yesrainy,75,80,FALSE,yessunny,75,70,TRUE,yesovercast,72,90,TRUE,yesovercast,81,75,FALSE,yesrainy,71, ,TRUE,Figure 7: Example input file with missing values1) outlook (nominal): [sunny, overcast, rainy]2) temperature (numeric): [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 81, 71]3) humidity (nominal): [85, 90, , 96, 80, 70, 65, 75]4) windy (nominal): [FALSE, TRUE]5) play (nominal): [, no, yes]Figure 8: Metadata derived from the input file shown in Figure 7ImplementationWe are now ready to program our solution. We will only need two classes for this. For the assignment, you need tofollow the patterns that we provide. You cannot change any of the signatures of the methods (that is you cannotmodify the methods at all). You cannot add new public methods or variables. You can, however, add new privatemethods to improve the readability or the organization of your code.DataSetThere are a number of methods in DataSet that you need to either complete or implement from scratch. Guidanceis provided in the Template code in the form of comments. The locations where you need to write code have beenclearly indicated with an inline comment that reads as follows:// WRITE YOUR CODE HERE!The easiest way to navigate the template code is to start from the main method of the DataSet class, shown inFigure 9. Intuitively, this method works as follows: First, it reads from the standard input the name of the CSV fileto process. Next, it creates an instance of DataSet; the constructor of DataSet will process the input file and populateattributeNames and matrix (explained earlier). Finally, the main method prints to the standard output themetadata of the instance of the DataSet that was just created. Doing so requires calling the metadataToString() instancemethod. To better illustrate metadataToString(), Figure 10 shows the return value of metadataToString()by our reference implementation for the input file of Figure 1 (the instructors reference implementation will bereleased to you after the assignment due date).Please note that there are a couple of technicalities which you will learn about only later in the course. One ishow to process files. The other (and much more important technicality, for that matter) is the notion of exceptions inJava. For this assignment, the template code for file processing is provided wherever it is needed. As for exceptions,you do not have to deal with them in this assignment, but you will see that some methods in the code are declaredas throwing exceptions. You can ignore These exception declarations for now.5public static void main(String[] args) throws Exception {System.out.print(Please enter the name of the CSV file to read: );Scanner scanner = new Scanner(System.in);String strFilename = scanner.nextLine();DataSet dataset = new DataSet(strFilename);System.out.print(dataset.metadataToString());}Figure 9: The main method in DataSetNumber of attributes: 5Number of datapoints: 14* * * Attribute value sets * * *1) outlook (nominal): [sunny, overcast, rainy]2) temperature (numeric): [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 81, 71]3) humidity (numeric): [85, 90, 86, 96, 80, 70, 65, 95, 75, 91]4) windy (nominal): [FALSE, TRUE]5) play (nominal): [no, yes]Figure 10: String returned by metadataToString() for the input file of Figure 1UtilThe Util class is provided in order to faciliate the implementation of the metadataToString() method, which youwill be implementing in the DataSet class (see the template code). The Util class provides four static methods: public static boolean isNumeric(String str): Checks if str represents a numeric value isArrayNumeric(String[] array): Checks an array of strings, array, and returns true if and only if arrayis non-empty and all its elements represent numeric values public static String nominalArrayToString(String[] array): Produces a string representation of anarray of nominals. Note that all nominal labels are embraced with single quotes. public static String numericArrayToString(String[] array): Produces a string representation of anarray of numerics. Unlike nominals, numeric values are not embraced with single quotes in the representation.JUnit TestsWe are providing a set of JUnit tests for the class DataSet. These tests should help make sure that your implementationis correct. They can further help clarify the expected behaviour of this class, if need be. Please note that theDataSetTest class assumes that the following CSV files are located in the current directly: credit-info-with-commas.csv,weather-nominal.csv, credit-info.csv, weather-numeric.csv, large.csv, weather-with-spaces.csv, and missing-values.csv.You can find all these CSV files in the datasets directory of the template zip file you have been provided with.Academic IntegrityThis part of the assignment is meant to raise awareness concerning plagiarism and academic integrity. Please readthe following documents.6 httpss://www.uottawa.ca/administration-and-governance/academic-regulation-14-other-important-information httpss://www.uottawa.ca/vice-president-academic/academic-integrityCases of plagiarism will be dealt with according to the university regulations. By submitting this assignment, youacknowledge:1. I have read the academic regulations regarding academic fraud.2. I understand the consequences of plagiarism.3. With the exception of the source code provided by the instructors for this course, all the source code is mine.4. I did not collaborate With any other person, with the exception of my partner in the case of team work. If you did collaborate with others or obtained source code from the Web, then please list the names ofyour collaborators or the source of the information, as well as the nature of the collaboration. Put thisinformation in the submitted README.txt file. Marks will be deducted proportional to the level of helpprovided (from 0 to 100%).Rules and regulation Follow all the directives available on the assignment directives web page. Submit your assignment through the on-line submission system virtual campus. You must preferably do the assignment in teams of two, but you can also do the assignment individually. You must use the provided template classes below. If you do not follow the instructions, your program will make the automated tests fail and consequently yourassignment will not be graded. We will be using an automated tool to compare all the assignments against each other (this includes both, theFrench and English sections). Submissions that are flagged by this tool will receive the grade of 0. It is your responsibility to make sure that BrightSpace has received your assignment. Late submissions willnot be graded.FilesYou must hand in a zip file (no other file format will be accepted). The name of the top directory has to have thefollowing form: a1_3000000_3000001, where 3000000 and 3000001 are the student numbers of the team memberssubmitting the assignment (simply repeat the same number if your team has one member). The name of thefolder starts with The letter a (lowercase), followed by the number of the assignment, here 1. The parts areseparated by the underscore (not the hyphen). There are no spaces in the name of the directory. The archivea1_3000000_3000001.zip contains the files that you can use as a starting point. Your submission must contain thefollowing files. README.txt A text file that contains the names of the two partners for the assignments, their student ids, section, anda short description of the assignment (one or two lines). DataSet.java Util.java2 StudentInfo.java (Make sure to update the file, so that the display() method shows your personal information).2You are not supposed to change Util.java; you simply resubmit the file given to you.如有需要,请加QQ:99515681 或WX:codehelp

添加老师微信回复‘’官网 辅导‘’获取专业老师帮助,或点击联系老师1对1在线指导