COMP9414: Artificial Intelligence
Assignment 2: Sentiment Analysis
Value: 25%

This assignment is inspired by a typical real-life scenario. Imagine you have been hired as a Data Scientist by a major airline company. Your job is to analyse the Twitter feed to determine customer sentiment towards your company and its competitors.

In this assignment, you will be given a collection of tweets about US airlines. The tweets have been manually labelled for sentiment. Sentiment is categorized as either positive, negative or neutral.

Important: Do not distribute these tweets on the Internet, as this breaches Twitter's Terms of Service.

You are expected to assess various supervised machine learning methods using a variety of features and settings to determine what methods work best for sentiment classification in this domain. The assignment has two components: programming to produce a collection of models for sentiment analysis, and a report to evaluate the effectiveness of the models. The programming part involves development of Python code for data preprocessing of tweets and experimentation with methods using NLP and machine learning toolkits. The report involves evaluating and comparing the models using various metrics, and comparing the machine learning models to a baseline method.

You will use the NLTK toolkit for basic language preprocessing, and scikit-learn for feature construction and for evaluating the machine learning models. You will be given an example of how to use NLTK and scikit-learn for this assignment (example.py).
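The supplied example.py is not reproduced here, but the scikit-learn workflow it demonstrates can be sketched as follows. This is a minimal illustration on toy in-memory data; the variable names and the toy tweets are ours, not part of the assignment.

```python
# Minimal sketch of the scikit-learn train/evaluate workflow (illustrative
# only; the toy data and names below are our own, not from example.py).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score

train_texts = ["love this airline", "flight delayed again", "on time today"]
train_labels = ["positive", "negative", "neutral"]
test_texts = ["delayed flight", "love it"]
test_labels = ["negative", "positive"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary on training data only
X_test = vectorizer.transform(test_texts)        # reuse that vocabulary for the test set

clf = MultinomialNB().fit(X_train, train_labels)
predictions = clf.predict(X_test)

print(accuracy_score(test_labels, predictions))
print(classification_report(test_labels, predictions, zero_division=0))
```

The key point, which carries over to the real programs, is that the vectorizer is fitted on the training set only and then applied unchanged to the test set.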
For the sentiment analysis baseline, NLTK includes a hand-crafted (crowdsourced) sentiment analyser, VADER,[1] which may perform well in this domain because of the way it uses emojis and other features of social media text to intensify sentiment. However, the accuracy of VADER is difficult to anticipate because: (i) crowdsourcing is in general highly unreliable, and (ii) this dataset might not include much use of emojis and other markers of sentiment.

Data and Methods

A training dataset is a tsv (tab separated values) file containing a number of tweets, with one tweet per line, and linebreaks within tweets removed. Each line of the tsv file has three fields: instance number, tweet text and sentiment (positive, negative or neutral). A test dataset is a tsv file in the same format as the training dataset, except that your code should ignore the sentiment field. Training and test datasets can be drawn from the supplied file dataset.tsv (see below).

For all models except VADER, consider a tweet to be a collection of words, where a word is a string of at least two letters, numbers or the symbols #, @, _, $ or %, delimited by a space, after removing all other characters (two characters is the default minimum word length for CountVectorizer in scikit-learn). URLs should be treated as a space, so they delimit words. Note that deleting junk characters may create longer words that were previously separated by those characters.

Use the supervised learning methods discussed in the lectures: Decision Trees (DT), Bernoulli Naive Bayes (BNB) and Multinomial Naive Bayes (MNB). Do not code these methods yourself: instead use the implementations from scikit-learn. Read the scikit-learn documentation on Decision Trees[2] and Naive Bayes,[3] and the linked pages describing the parameters of the methods.

[1] https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8109
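As a concrete illustration, the VADER baseline can be driven as sketched below. The function name is ours, and the thresholds that map VADER's compound score onto the three class labels (plus or minus 0.05) are an assumed convention, not something the assignment prescribes.

```python
# Sketch of a VADER baseline. The +/-0.05 compound-score thresholds are an
# assumed convention, not specified by the assignment.

def vader_label(text, analyzer=None):
    """Map VADER's compound score onto positive/negative/neutral."""
    if analyzer is None:
        # Import lazily so the helper can be used (e.g. tested with a stub
        # analyzer) without the NLTK vader_lexicon data being present.
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

Note that VADER is applied to the raw tweet text: the junk-character preprocessing described above is for the machine learning models only.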
Look at example.py to see how to use CountVectorizer and how to train and test the machine learning algorithms, including how to generate metrics for the models developed.

The programming part of the assignment is to produce DT, BNB and MNB models and your own model for sentiment analysis, as Python programs that can be called from the command line to train on and classify tweets read from correctly formatted tsv files. The report part of the assignment is to analyse these models using a variety of parameters, preprocessing tools, scenarios and baselines.

Programming

You will produce and submit four Python programs: (i) DT_sentiment.py, (ii) BNB_sentiment.py, (iii) MNB_sentiment.py and (iv) sentiment.py. The first three are standard models as defined below. The last is a model that you develop following experimentation with the data. Use the given dataset (dataset.tsv) containing 5000 labelled tweets to develop the models.

These programs, when called from the command line with two file names as arguments, the first a training dataset and the second a test dataset, should print (to standard output) the instance number and sentiment produced by the classifier for each tweet in the test set when trained on the training set (one per line with a space between them), each sentiment being the string positive, negative or neutral. For example:

python3 DT_sentiment.py training.tsv test.tsv > output.txt

should write to the file output.txt the instance number and sentiment of each tweet in test.tsv, as determined by the Decision Tree classifier trained on training.tsv.

When reading in training and test datasets, make sure your code reads all the instances (some Python readers assume Excel format, which uses double quotes as separators).

Standard Models

Train the three standard models on the supplied dataset of 5000 tweets (the whole of dataset.tsv). For Decision Trees, use scikit-learn's Decision Tree method with criterion set to entropy and with random_state=0.
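Putting these requirements together, a program in the style of DT_sentiment.py might be sketched as below. This is our reading of the spec, not a reference solution: the junk-character regex, the use of pandas with quoting disabled, and the choice of min_samples_split to implement the stopping rule described in the next paragraph are all assumptions, and the helper names are ours.

```python
# Sketch of a DT_sentiment.py-style pipeline (our reading of the spec, not a
# reference solution). Usage: python3 DT_sentiment.py training.tsv test.tsv
import csv
import re
import sys

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

def preprocess(text):
    # Treat URLs as a space, so they delimit words.
    text = re.sub(r"https?://\S+", " ", text)
    # Delete every other junk character (which may join word fragments, as
    # the spec notes); letters, digits, # @ _ $ % and spaces survive.
    # Case is left untouched here; lowercasing is explored in the report.
    return re.sub(r"[^A-Za-z0-9#@_$% ]", "", text)

def read_tsv(path):
    # csv.QUOTE_NONE stops pandas treating double quotes as field
    # delimiters, so every instance is read.
    return pd.read_csv(path, sep="\t", header=None, quoting=csv.QUOTE_NONE,
                       names=["instance", "text", "sentiment"])

def main(train_path, test_path):
    train, test = read_tsv(train_path), read_tsv(test_path)
    vectorizer = CountVectorizer(preprocessor=preprocess,
                                 token_pattern=r"[A-Za-z0-9#@_$%]{2,}",
                                 max_features=1000)  # 1000 most frequent words
    X_train = vectorizer.fit_transform(train["text"])
    # min_samples_split=50 is one way to stop growth once a node covers
    # fewer than 50 examples; other readings of the spec are possible.
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0,
                                 min_samples_split=50)
    clf.fit(X_train, train["sentiment"])
    for instance, label in zip(test["instance"],
                               clf.predict(vectorizer.transform(test["text"]))):
        print(instance, label)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Passing a custom preprocessor to CountVectorizer replaces its default preprocessing (including lowercasing), while the token_pattern still controls tokenisation, which is why the two-character minimum word length is encoded in the pattern.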
Scikit-learn's Decision Tree method does not implement pruning; instead, you should make sure Decision Tree construction stops when a node covers fewer than 50 examples (1% of the training set). Decision Trees are likely to lead to fragmentation, so to avoid overfitting and reduce computation time, for all Decision Tree models use as features only the 1000 most frequent words from the vocabulary (after preprocessing to remove junk characters as described above). Write code to train and test a Decision Tree model in DT_sentiment.py.

For both BNB and MNB, use scikit-learn's implementations, but use all of the words in the vocabulary as features. Write two Python programs for training and testing Naive Bayes models, one a BNB model and one an MNB model, in BNB_sentiment.py and MNB_sentiment.py.

Your Model

Develop your best model for sentiment classification by varying the number and type of input features for the learners, the parameters of the learners, and the training/test set split, as described in your report (see below). Submit one program, sentiment.py, that trains and tests a model.

[2] https://scikit-learn.org/stable/modules/tree.html
[3] https://scikit-learn.org/stable/modules/naive_bayes.html

Report

In the report, you will first evaluate the standard models, then present your own model. For evaluating all models, report the results of training on the first 4000 tweets in dataset.tsv (the training set) and testing on the remaining 1000 tweets (the test set), rather than using the full dataset of 5000 tweets for training, and so stopping the Decision Tree classifiers when nodes cover fewer than 40 tweets rather than 50. Use the metrics (micro- and macro-accuracy, precision, recall and F1) and classification reports from scikit-learn. Show the results in either tables or plots, and write a short paragraph in your response to each item below. The answer to each question should be self-contained. Your report should be at most 10 pages. Do not include appendices.

1. (1 mark) Give simple descriptive statistics showing the frequency distribution of the sentiment classes for the whole dataset of 5000 tweets. What do you notice about the distribution?

2. (2 marks) Develop BNB and MNB models from the training set using (a) the whole vocabulary, and (b) the most frequent 1000 words from the vocabulary (as defined using CountVectorizer, after preprocessing by removing junk characters). Show all metrics on the test set comparing the two approaches for each method. Explain any similarities and differences in the results.

3. (2 marks) Evaluate the three standard models with respect to the VADER baseline. Show all metrics on the test set and comment on the performance of the baseline and of the models relative to the baseline.

4. (2 marks) Evaluate the effect of preprocessing the input features by applying NLTK English stop-word removal then NLTK Porter stemming on classifier performance for the three standard models. Show all metrics with and without preprocessing on the test set and explain the results.

5. (2 marks) Evaluate the effect that converting all letters to lower case has on classifier performance for the three standard models. Show all metrics with and without conversion to lower case on the test set and explain the results.

6. (6 marks) Describe your best method for sentiment analysis and justify your decision. Give some experimental results for your method trained on the training set of 4000 tweets and tested on the test set of 1000 tweets.
Provide a brief comparison of your model to the standard models and the baseline (use the results from the previous questions).

Submission

Submit all your files (Python code and report) using a command such as:

give cs9414 ass2 DT*.py BNB*.py MNB*.py sentiment.py report.pdf

Your submission should include:
- Your .py files for the specified models and your model, plus any .py helper files
- A .pdf file containing your report

When your files are submitted, a test will be done to ensure that one of your Python files runs on the CSE machine (take note of any error messages printed out).

When running your code on CSE machines:
- Set SKLEARN_SITE_JOBLIB=TRUE to avoid warning messages
- Do not download NLTK in your program: CSE machines have NLTK installed

Check that your submission has been received using the command:

9414 classrun -check ass2

Assessment

Marks for this assignment are allocated as follows:
- Programming (auto-marked): 10 marks
- Report: 15 marks

Late penalty: 5 marks per day or part-day late off the mark obtainable, for up to 3 (calendar) days after the due date.

Assessment Criteria

Correctness: Assessed on standard input tests, using calls such as:

python3 DT_sentiment.py training.tsv test.tsv > output.txt

Each such test will give two files, a training dataset and a test dataset, which contain any number of tweets (one on each line) in the correct format. The training and test datasets can have any names, not just training.tsv and test.tsv, so read the file names from sys.argv. The output should be a sequence of lines (one line for each tweet) giving the instance number and classified sentiment, separated by a space and with no extra spaces or lines. There are 2 marks allocated for correctness of each of the three standard models. For your own method, 4 marks are allocated for correctness of your method on test sets of tweets that include unseen examples.
Report: Assessed on correctness and thoroughness of experimental analysis, and clarity and succinctness of explanations. There are 9 marks allocated to items 1-5 above, and 6 marks for item 6. Of these 6 marks, 2 marks are for the explanation of your choice of model, 2 marks are for the experimental analysis of your model, and 2 marks are for the evaluation of your model in comparison to the standard models and baseline.

Plagiarism

Remember that ALL work submitted for this assignment must be your own work and no code sharing or copying is allowed. You may use code from the Internet only with suitable attribution of the source in your program. Do not use public code repositories. All submitted assignments will be run through plagiarism detection software to detect similarities to other submissions, including from past years. You should carefully read the UNSW policy on academic integrity and plagiarism (linked from the course web page), noting, in particular, that collusion (working together on an assignment, or sharing parts of assignment solutions) is a form of plagiarism. There is also a new plagiarism policy starting this term, with more severe penalties.