ECMM422 Programming Assignment: Python Programming and Design

ECMM422 Machine Learning
Course Assessment 1

This course assessment (CA1) represents 40% of the overall module assessment.
This is an individual exercise and your attention is drawn to the College and University guidelines on collaboration and plagiarism, which are available from the College website.

Note:
- do not change the name of this notebook, i.e. the notebook file has to be named: ca1.ipynb
- do not remove/delete any cell
- do not add any cell (you can work on a draft notebook and only copy the function implementations here)
- do not add your name or student code in the notebook or in the file name

Evaluation criteria:
- Each question asks for one or more functions to be implemented.
- Each question is awarded a number of marks.
- A (hidden) unit test is going to evaluate if all desired properties of the required function(s) are met.
- If the test passes, all the associated marks are awarded; if it fails, 0 marks are awarded. The large number of questions allows a fine grading.

Notes:
- In the rest of the notebook, the term data matrix refers to a two-dimensional numpy array where instances are encoded as rows, e.g. a data matrix with 100 rows and 4 columns is to be interpreted as a collection of 100 instances each with four features.
- When a required function can be implemented directly by a library function, it is intended that the candidate should write her own implementation of the function, e.g. a function to compute the accuracy or the cross validation.
- Some questions are just a check-point, i.e. it is for you to see that you are correctly implementing all functions. Since those check-points use functions that you have already implemented and that have already been marked, those questions are not going to be marked (i.e. they appear as having marks 0).

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp

# unit test utilities: you can ignore these functions
def is_approximately_equal(test, target, eps=1e-2):
    return np.mean(np.fabs(np.array(test) - np.array(target))) < eps

def assert_test_equality(test, target):
    assert is_approximately_equal(test, target), 'Expected:\n %s \nbut got:\n %s' % (target, test)

2021/3/2 1 localhost:8888/nbconvert/html/Downloads/1.ipynb?download=false 2/16

Question 1 [marks 6]
a) Make a function data_matrix = make_data_classification(mean, std, n_centres, inner_std, n_samples, random_seed=42) to create a data matrix according to the following rules:
- mean is an n-dimensional vector (say [1,1], but the function should allow vectors of any dimension)
- n_centres is the number of centres (say 3)
- std is the standard deviation (say 1)
- the centres are sampled from a Normal distribution with mean mean and standard deviation std
- from each centre sample n_samples from a Normal distribution with the centre as the mean and standard deviation inner_std
So if mean=[1,1], n_centres=3 and n_samples=10 then the data matrix will be a 30 rows x 2 columns numpy array.

b) Make a function data_matrix, targets = make_data_regression(mean, std, n_centres, inner_std, n_samples_list, random_seed=42) to create a data matrix and a target vector according to the following rules:
- the data matrix is constructed in the same way as in make_data_classification
- the targets are the Euclidean distance between the sample and the centre of the generating Normal distribution
See Question 3 for a graphical example of the expected output.

In [ ]:
def make_data_classification(mean, std, n_centres, inner_std, n_samples, random_
    # YOUR CODE HERE
    raise NotImplementedError()

def make_data_regression(mean, std, n_centres, inner_std, n_samples, random_seed
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 2 [marks 2]
a) Make a function data_matrix, targets = get_dataset_classification(n_samples, std, inner_std) to create a data matrix and a target vector for a binary classification problem according to the following rules:
- the instances from the positive class are generated according to the same rules provided for make_data_classification; so are the instances from the negative class
- instances from the positive class have as mean the vector [10,10] and those from the negative class the vector [-10,-10]
- the number of centres is fixed to 3
- the random seed is fixed to 42
- n_samples indicates the total number of instances finally available in the output data_matrix

b) Make a function data_matrix, targets = get_dataset_regression(n_samples, std, inner_std) to create a data matrix according to the following rules:
- the instances are generated according to the same rules provided for make_data_regression
- the targets are generated according to the same rules provided for make_data_regression
- instances have as mean the vector [10,10]
- the number of centres is fixed to 3
- the random seed is fixed to 42
- n_samples indicates the total number of instances finally available in the output data_matrix

Question 3 [marks 1]
Make a function plot(X,y) to display the scatter plot of a data matrix of two-dimensional instances using the array y to assign the colour to the instances.
When running
X, y = get_dataset_regression(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like

In [ ]:
def get_dataset_classification(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()

def get_dataset_regression(n_samples, std, inner_std):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

and when running
X, y = get_dataset_classification(n_samples=600, std=30, inner_std=5)
plot(X,y)
you should get something like

Question 4 [marks 1]
Make a function classification_error(targets, preds) to compute the fraction of times that the entries in targets do not agree with the corresponding entries in preds.
Note: do not use library functions to compute the result directly but implement your own version.

Question 5 [marks 2]
Make a function regression_error(targets, preds) to compute the mean squared error between targets and preds:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(T_i - P_i)^2.$$
Note: do not use library functions to compute the result directly but implement your own version.

Question 6 [marks 7]
Make a function make_bootstrap(data_matrix, targets) to extract a bootstrapped replicate of an input dataset.

In [ ]:
def plot(X,y):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def classification_error(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def regression_error(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

The function should return the following 6 elements (in this order): bootstrap_data_matrix, bootstrap_targets, bootstrap_sample_ids, oob_data_matrix, oob_targets, oob_samples_ids, where:
- bootstrap_data_matrix: is a data matrix encoding the bootstrapped replicate of the data matrix
- bootstrap_targets: is the corresponding bootstrapped replicate of the target vector
- bootstrap_sample_ids: is an array containing the instance indices of the bootstrapped replicate of the data matrix
- oob_data_matrix: is a data matrix encoding the out of bag instances
- oob_targets: is the corresponding out of bag instances of the target vector
- oob_samples_ids: is an array containing the instance indices of the out of bag instances

In [ ]:
def make_bootstrap(data_matrix, targets):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 7 [marks 10]
Consider the following functional blueprints estimator = train(X_train, y_train, param) and test(X_test, estimator). A function of type train takes in input a data matrix X_train, a target vector y_train and a single value param (not a list of parameters). A function of type train outputs an object that represents an estimator. A function of type test takes in input a data matrix X_test and the fit object estimator, and outputs the predicted targets.
Using this blueprint, write the specialised train and test functions for the following classifiers and regressors (use the function signature provided in the next cell, e.g. train_ab for training an adaboost classifier):

Classifiers:
a) k-nearest-neighbor: the parameter controls the number of neighbors (you may use KNeighborsClassifier from scikit) [train_knn, test_knn]
b) adaboost: the parameter controls the maximal depth of the decision tree used as weak classifier (you may use the DecisionTreeClassifier from scikit but you should provide your own implementation of the boosting algorithm) [train_ab, test_ab]
c) random forest: the parameter controls the maximal depth of the tree (you may use the DecisionTreeClassifier from scikit but you should provide your own implementation of the bagging algorithm) [train_rfc, test_rfc]

Regressors:
d) decision tree: the parameter controls the maximal depth of the tree (you may use the DecisionTreeRegressor from scikit) [train_dt, test_dt]
e) svm linear: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_1, test_svm]
f) svm with a polynomial kernel of degree 2: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_2, test_svm]
g) svm with a polynomial kernel of degree 3: the parameter controls the regularization constant C (you may use SVR from scikit) [train_svm_3, test_svm]
h) random forest: the parameter controls the maximal depth of the tree (you may use the DecisionTreeRegressor from scikit but you should provide your own implementation of the bagging algorithm) [train_rf, test_rf]

For the algorithms adaboost and random forest, the size of the ensemble should be fixed to 100.

In [ ]:
# classifiers
from sklearn.neighbors import KNeighborsClassifier
def train_knn(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_knn(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeClassifier
def train_ab(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_ab(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeClassifier
def train_rfc(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_rfc(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()

# regressors
from sklearn.tree import DecisionTreeRegressor
def train_dt(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_dt(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.svm import SVR
def train_svm_1(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def train_svm_2(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def train_svm_3(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

# Note: you do not need to specialise the svm test function for each degree
def test_svm(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

from sklearn.tree import DecisionTreeRegressor
def train_rf(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_rf(X_test, models):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 8 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all functions. Since this cell uses functions that you have already implemented and that have already been marked, this question is not going to be marked.
Make a dataset using
X, y = get_dataset_classification(n_samples=240, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
and check that the classification error for
- k-nearest-neighbor
- random forest classifier
- adaboost
is approximately comparable.

In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_classification(n_samples=240, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
param=3
e_knn = classification_error(y_test, test_knn(X_test, train_knn(X_train, y_train
e_rfc = classification_error(y_test, test_rfc(X_test, train_rfc(X_train, y_train
e_ab = classification_error(y_test, test_ab(X_test, train_ab(X_train, y_train, p
print(e_knn, e_rfc, e_ab)

Question 9 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all functions. Since this cell uses functions that you have already implemented and that have already been marked, this question is not going to be marked.
Make a dataset using
X, y = get_dataset_regression(n_samples=120, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
and check that the regression error for these regressors
- decision tree
- svm with polynomial kernel of degree 2
- svm with polynomial kernel of degree 3
is approximately comparable.

In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_regression(n_samples=120, std=30, inner_std=10)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
param=3
e_dt = regression_error(y_test, test_dt(X_test, train_dt(X_train, y_train, param
e_svm2 = regression_error(y_test, test_svm(X_test, train_svm_2(X_train, y_train,
e_svm3 = regression_error(y_test, test_svm(X_test, train_svm_3(X_train, y_train,
print(e_dt, e_svm2, e_svm3)

Question 10 [marks 10]
Make a function sizes, train_errors, test_errors = compute_learning_curve(train_func, test_func, param, X, y, test_size, n_steps, n_repetitions) to compute the train and test errors as mandated in the learning curve approach.
The regressor will be trained via train_func on the problem data_matrix, targets with parameter param. The estimate will be done averaging a number of replicates equal to n_repetitions, i.e. the code needs to repeat the process n_repetitions times (say 10) and average the error.
Note that a fraction of the data as indicated by test_size (say 0.3 for 30%) is going to be reserved for testing purposes. The remaining amount of data can be used in the training phase.
The learning curve should be computed for an amount of training material that varies from a minimum of 2 instances up to all the instances available for training.
You should use the function regression_error to compute the error.
Note: do not use library functions (e.g. learning_curve in scikit) to compute the result directly but implement your own version.

Question 11 [marks 1]
Make a function plot_learning_curve(sizes, train_errors, test_errors) to display the train and test error as a function of the size of the training set.
You should get something like:

Question 12 [marks 3]
Make a function estimate_asymptotic_error(sizes, train_errors, test_errors) that returns an estimate of the asymptotic error, i.e. the error made in the limit of an infinitely large training set.

In [ ]:
def compute_learning_curve(train_func, test_func, param, X, y, test_size, n_step
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def plot_learning_curve(sizes, train_errors, test_errors):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 13 [marks 0]
This is just a check-point, i.e. it is for you to see that you are correctly implementing all functions. Since this cell uses functions that you have already implemented and that have already been marked, this question is not going to be marked.
When you run:
X, y = get_dataset_regression(n_samples=800, std=30, inner_std=10)
train_func, test_func = train_dt, test_dt
param=5
sizes, train_errors, test_errors = compute_learning_curve(train_func, test_func, param, X, y, test_size=.3, n_steps=10, n_repetitions=100)
e = estimate_asymptotic_error(sizes, train_errors, test_errors)
print('Asymptotic error: %.1f'%e)
plot_learning_curve(sizes, train_errors, test_errors)
you should get something like

Question 14 [marks 6]
Make a function bias2, variance = compute_bias_variance(predictions_dict, targets) that takes in input a dictionary of lists of predictions indexed by the instance index, and the target vector. The function should compute the squared bias component of the error and the variance component of the error for each instance.
As a toy example consider: predictions_dict={0:[1,1,1], 1:[1,-1], 2:[-1,-1,-1,1]} and targets=[1,1,-1], that is, for the instance with index 0 there are 3 predictions available [1,1,1], while for the instance with index 1 there are only 2 predictions available [1,-1], etc. In this case, you should get bias2=[0. , 1. , 0.25] and variance=[0. , 1. , 0.75].

In [ ]:
def estimate_asymptotic_error(sizes, train_errors, test_errors):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

In [ ]:
# Just run the following code, do not modify it
X, y = get_dataset_regression(n_samples=800, std=30, inner_std=10)
train_func, test_func = train_dt, test_dt
param=5
sizes, train_errors, test_errors = compute_learning_curve(train_func, test_func,
e = estimate_asymptotic_error(sizes, train_errors, test_errors)
print('Asymptotic error: %.1f'%e)
plot_learning_curve(sizes, train_errors, test_errors)

In [ ]:
def compute_bias_variance(predictions_dict, targets):
    # YOUR CODE HERE
    raise NotImplementedError()

Question 15 [marks 10]
Make a function bias2, variance = bias_variance_decomposition(train_func, test_func, param, data_matrix, targets, n_bootstraps) to compute the bias variance decomposition of the error of a regressor on a given problem. The regressor will be trained via train_func on the problem data_matrix, targets with parameter param. The estimate will be done using a number of replicates equal to n_bootstraps.

Question 16 [marks 2]
Consider the following regression problem (it does not matter that the target is only 1 and -1):
from sklearn.datasets import load_iris
def make_iris_data():
    X,y = load_iris(return_X_y=True)
    X=X[:,[0,2]]
    y[y==2]=0
    y[y==0]=-1
    return X,y
Estimate the squared bias and variance component for each instance.
Consider as regressor a linear svm and a polynomial svm with degree 3.
What is the class of the instances that have the highest bias error on average?

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def bias_variance_decomposition(train_func, test_func, param, data_matrix, targe
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

In [ ]:
# Just run the following code, do not modify it
from sklearn.datasets import load_iris
def make_iris_data():
    X,y = load_iris(return_X_y=True)
    X=X[:,[0,2]]
    y[y==2]=0
    y[y==0]=-1
    return X,y
X,y = make_iris_data()
bias2, variance = bias_variance_decomposition(train_svm_1, test_svm, param=2, da
print(np.mean(bias2[y==1]) , np.mean(bias2[y==-1]))
bias2, variance = bias_variance_decomposition(train_svm_3, test_svm, param=2, da
print(np.mean(bias2[y==1]) , np.mean(bias2[y==-1]))

Question 17 [marks 6]
Make a function bs,vs = compute_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps) to compute the average squared bias error component and the average variance component of the error for each parameter setting in the vector params. The regressor will be trained via train_func on the problem data_matrix, targets with parameter param. The estimate will be done using a number of replicates equal to n_bootstraps. To be clear, the vector bs contains the average squared bias error for each parameter in params and the vector vs contains the average variance error for each parameter in params.

Question 18 [marks 1]
Make a function plot_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps, logscale=False).
You should plot the individual components: the squared bias, the variance and the total error.
You should allow the possibility to employ a logarithmic scale for the horizontal axis via the logscale flag.
You should get something like:

Question 19 [marks 2]
Make a function find_best_param_with_bias_variance_decomposition(train_func, test_func, params, data_matrix, targets, n_bootstraps) that uses the bias variance decomposition analysis to determine which parameter among params achieves the smallest estimated predictive error.

In [ ]:
# This cell is reserved for the unit tests.
# Do not consider this cell.

In [ ]:
def compute_bias_variance_decomposition(train_func, test_func, params, data_matr
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def plot_bias_variance_decomposition(train_func, test_func, params, data_matrix,
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def find_best_param_with_bias_variance_decomposition(train_func, test_func, para
    # YOUR CODE HERE
    raise NotImplementedError()

Question 20 [marks 6]
When you execute the following code
X, y = get_dataset_regression(n_samples=400, std=10, inner_std=7)
params = np.linspace(1,30,30).astype(int)
train_func, test_func = train_dt, test_dt
p = find_best_param_with_bias_variance_decomposition(train_func, test_func, params, X, y, n_bootstraps=60)
print('Best parameter:%s'%p)
plot_bias_variance_decomposition(train_func, test_func, params, X, y, n_bootstraps=50, logscale=False)
you should get something like:

The next unit tests will run your function find_best_param_with_bias_variance_decomposition on an undisclosed dataset using as regressors:
- decision tree
- svm degree 3
and 3 marks will be awarded for each correct optimal parameter identified.

Question 21 [marks 5]
Make a function conf_mtx = confusion_table(targets, preds) to output the confusion matrix as a 2 x 2 Numpy array.
Rows indicate the prediction and columns the target. The cell element with index [0,0] should report the true positive count.
Running the following code:
from sklearn.datasets import load_iris
X,y = load_iris(return_X_y=True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.3)
models = train_knn(X_train, y_train, param=3)
preds = test_knn(X_test, models)
conf_mtx = confusion_table(y_test, preds)
print(conf_mtx)
you should obtain something similar to
[[16.  1.]
 [ 0. 28.]]
Note: the exact values can differ in your run.
Note: do not use library functions to compute the result directly but implement your own version.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 22 [marks 1]
Make a function error_from_confusion_table(confusion_table_func, targets, preds) that takes in input the previous confusion_table function and returns the error, i.e. the fraction of predictions that do not agree with the targets.

Question 23 [marks 12]
Make a function predictions, out_targets = cross_validation_prediction(train_func, test_func, param, data_matrix, targets, kfold) that estimates the predictions of a classifier trained via the function train_func with parameter param on the problem data_matrix, targets using a k-fold cross validation strategy with the number of folds indicated by kfold.
Since the order of the instances associated to the predictions can be different from the original order, the function is required to output also the corresponding target values in the array out_targets (i.e. the value in position 10 in predictions corresponds to the target value in position 10 in out_targets).
Note: do not use library functions (such as KFold or StratifiedKFold) but implement your own version of the cross validation.

In [ ]:
def confusion_table(targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def error_from_confusion_table(confusion_table_func, targets, preds):
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

In [ ]:
def cross_validation_prediction(train_func, test_func, param, data_matrix, targe
    # YOUR CODE HERE
    raise NotImplementedError()

In [ ]:
# This cell is reserved for the unit tests. Do not consider this cell.

Question 24 [marks 5]
Make a function mean_errors = compute_errors_with_crossvalidation(train_func, test_func, params, data_matrix, targets, kfold, n_repetitions) that returns the estimated average error for each parameter in params. The classifier is trained via the function train_func with parameters taken from params on the problem data_matrix, targets using a k-fold cross validation strategy with the number of folds indicated by kfold. The error estimate is repeated a number of times indicated in n_repetitions. The error should be computed using the fu
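
Worked check for the toy example in Question 14. The sketch below is one possible reading of that question, not the reference solution: it assumes the squared bias of an instance is the squared difference between its mean prediction and its target, and that the variance is the population variance of its predictions (dividing by the number of predictions). The helper name toy_bias_variance is hypothetical and it uses plain Python lists rather than numpy arrays. Under these assumptions it gives the values quoted in Question 14.

```python
def toy_bias_variance(predictions_dict, targets):
    """Per-instance squared bias and variance (hypothetical helper).

    Assumes population variance, i.e. division by the number of predictions.
    """
    bias2, variance = [], []
    for idx in sorted(predictions_dict):
        preds = predictions_dict[idx]
        mean_pred = sum(preds) / len(preds)
        # squared bias: squared gap between the mean prediction and the target
        bias2.append((mean_pred - targets[idx]) ** 2)
        # variance: mean squared deviation of the predictions from their mean
        variance.append(sum((p - mean_pred) ** 2 for p in preds) / len(preds))
    return bias2, variance


# toy data quoted in Question 14
predictions_dict = {0: [1, 1, 1], 1: [1, -1], 2: [-1, -1, -1, 1]}
targets = [1, 1, -1]
bias2, variance = toy_bias_variance(predictions_dict, targets)
print(bias2, variance)
```

For instance 2 the mean prediction is -0.5, so the squared bias is (-0.5 + 1)^2 = 0.25 and the variance is (3 x 0.25 + 2.25) / 4 = 0.75, in line with the expected output stated in the question.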