MS6221课程序辅导、Modeling程序写作

” MS6221课程序辅导、Modeling程序写作MS6221 Predictive Modeling in MarketingIndividual Take-Home Final ProjectGuidance1. Please upload your pdf submission to Cavnas before May. 10th 11pm.2. Please have your name, student number, and CityU email on top of the first page.3. If you have any general question, please post it on Canvas discussion before May 8th 11pm.4. If you have any personal concern, please also email me as early as possible.Project QuestionA large company that uses catalogs as part of its targeting strategy plans to send out a spring tabloidfeaturing womens clothes and shoes. Management needs to decide who among all potential customers in thehouse file should receive a spring tabloid.In order to make a mailing decision, the company has extracted a sub-sample of previous customers fromits database. The data are in file catalog_data.csv. A data description is given below. The data containsinformation on whether the customer bought from the spring tabloid in the year before, and information onsome attributes of the customer at the time right before the spring tabloid was sent out last year.As always, familiarize yourself with the data first before you start your statistical analysis.0. Report Writing (30 points)You are going to prepare a consulting report to the management, most of whom only expect to see marketinganlaysis with figures and tables without any coding. In your report, please follow the question roadmap below. Please limit your report in 10 pages. There is no need to show us your code. You can use any software to prepare for the report. If you are really good in Excel, I dont mind.1Variable Descriptioncustomer_no Customer id, can be linked to addressbuytabw 1 = bought, 0 = did not buy from catalogtabordrs Total orders from tabloidsMS6221课程作业辅导、Modeling作业写作、Java，Python/c++程序语言作业辅导divsords Total Orders with shoe divisiondivwords Total Orders with womens divisionspgtabord Total spring tabloid orderstabordrs_year Orders from tabloids in the last yeardivsords_year Orders with shoe division in the last yeardivwords_year Orders with womens division in the last yeartabordrs_quarter Orders from tabloids in the last quarterdivsords_quarter Orders with shoe division in the last quarterdivwords_quarter Orders with womens division in the last quartermoslsdvs Months since last shoe ordermoslsdvw Months since last womens ordermoslstab Months since last tab orderorders Total ordersage Age of the customerfamily_income Family Income of the customermarried 1 = married of the customerfulltime_work Fulltime work status of the customerfamily_size Family size of the customer1. Estimation (20 points)Randomly split the data into an estimation sample and a validation sample:import numpy as npimport pandas as pdimport randomnp.random.seed(2020)catalog_DF = pd.read_csv(./catalog_data.csv)L = catalog_DF.shape[0]train_index = random.sample(range(0,L),10000)train_index.sort()DF_estimation = catalog_DF.loc[train_index,:]DF_prediction = catalog_DF.drop(index=train_index)(1) Using information from the estimation sample only, estimate a logistic regression model of the purchasedecision (buytabw), using all customer attributes in the data file (except customer_no) as independentvariables.(2) Try the linear probability model, a.k.a., regression. The purchase decision (buytabw) is a binaryoutcome. Using regression should restrict the outcome to [0, 1]. You can simply change all predictedvalues below 0 to zero, and all predicted value above 1 to 1.(3) Try recommend one machine learning method. Do it similar as the regerssion, change all predictedvalues below 0 to zero, and all predicted value above 1 to 1.For all analysis below, you should compare results for the logistic regression, linear probability model, andyour chosen method.22. Predicted purchase probability in the validation sample (10 points)Predict the purchase probability for all customers in the validation sample. Verify that the predicted purchaseprobability variable was created and that it has reasonable values.From now on, you should only work with observations in the validation sample.3. Plot predicted purchase probabilities (10 points)Try present useful plots of the predicted purchase probabilities, separately for customers who made a purchaseafter receiving the catalog and those who did not respond:Do your plots indicate that the model has some power to predict who is likely to purchase in the validationsample?4. Scoring and segmentation (10 points)Score the customers and segment the customers into ten deciles, where score = 1 corresponds to thecustomers with the lowest predicted purchase probabilities and score = 10 corresponds to the customerswith the highest predicted purchase probabilities. Employ the createBins function for this task.# define function createBins ——————————————–# Inputs: x, data# N, the number of bins (groups) to createdef createBins(x,N):cut = [i/N for i in range(1,N)]df = pd.DataFrame(x)cut_points = df.quantile(cut).T.valuescut_points = np.unique(cut_points)cut_points = np.insert(cut_points,0,values=float(-inf))cut_points = np.append(cut_points,values=float(inf))labels = [{0}.format(i) for i in range(1,len(cut_points))]bins = pd.cut(x, cut_points, labels=labels)bins = pd.DataFrame(bins).astype(int)return binsNow create a summary data set, score_DF, that contains some key summary statistics separately for eachsegment (score). Include these summary statistics: Number of observations in segment Number of buyers in segment Mean predicted purchase probability Mean observed purchase rate (based on buytabw)5. Lift and gains (10 points)Create a table indicating the lift, cumulative lift, and cumulative gains from the predictive model. Plot thelift, cumulative lift, and cumulative gains chart.3Interpret and discuss the lifts and gains: Is the predictive model useful for targeting purposes?6. Profitability analysis (20 points)From now on work again with the customer-level data in catalog_DF. Use the following data: Based on past data, the average dollar margin per customer is $ 26.90 The cost of printing and mailing one tabloid is $ 1.40Using the predicted purchase probability, calculate expected profits. Try provide useful figures of the expectedprofits variable and discuss. Calculate the fraction of customers who are expected to be profitable, i.e. have positive expected profits. Now rank customers according to their expected profitability. Then calculate realized profits, based onthe observed purchase decision of each customer. Calculate the cumulative sum of realized profits for a targeting strategy where customers are targetedin descending order of expected profits. Plot the cumulative realized profits on the y-axis versus the percent of customers mailed on the x-axis.Discuss your findings.7. Recommended targeting strategy (20 points)What mailing strategy do you recommend? Compare the actual profitability from your proposed strategy to1. The expected profitability based on your model,2. A mass mailing strategy where each customer receives a catalog.What is the percent improvement in profits from your recommended strategy relative to a mass mailingstrategy?4如有需要，请加QQ：99515681 或邮箱：99515681@qq.com

“