Predict the Bankruptcy Situation of Polish Companies
MSBA7002: Business Statistics
Nov 13, 2019
Due Date: 11:55pm Dec 1, 2019
Project Deliverables
(1) A PowerPoint or PDF file containing clear steps for model selection and five
interesting visualizations that help answer analytical questions;
(2) An R Markdown file that embeds R code within the analysis narrative, or
(3) Python code with comments, written in a Jupyter notebook.
(Please make sure that running the code reproduces all the reported results.)
Content
Bankrupt file
is about bankruptcy prediction of Polish companies. The data contains financial rates
from the 2nd year of the forecasting period and the corresponding class label, which
indicates bankruptcy status after 4 years. The bankrupt companies were analysed over the
period 2000-2012, while the still-operating companies were evaluated from 2007 to 2013.
In this task, you need to use the 64 features from the financial reports to predict
which companies will go bankrupt in the next four years.
Lookup file
contains the text description of all variable codes in the data file.
Training and testing datasets
Training data: 6000 rows;
Testing data: 3000 rows.
Tasks
1. Data Pre-processing
This dataset contains plenty of missing values. You need to report how you handle
these missing values.
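One simple way to handle and report missing values is to measure the share of missing entries per feature and then fill the gaps with each feature's median. The sketch below is only an illustration under stated assumptions: it uses a tiny made-up table rather than the real data file, treats None and NaN as missing, and picks median imputation as one possible strategy among many. The medians are computed on the training rows and reused for the test rows so that no information leaks from the test set.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_share(rows, n_features):
    """Fraction of missing entries in each column."""
    return [
        sum(is_missing(row[j]) for row in rows) / len(rows)
        for j in range(n_features)
    ]


def fit_medians(rows, n_features):
    """Column medians computed from the observed (non-missing) values only."""
    medians = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # A column with no observed values falls back to 0.0 (an arbitrary choice).
        medians.append(median(observed) if observed else 0.0)
    return medians


def impute(rows, medians):
    """Replace each missing entry with the median of its column."""
    return [
        [medians[j] if is_missing(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Toy data: 4 companies, 3 hypothetical features (not taken from the real file).
train = [
    [0.20, None, 1.5],
    [0.10, 3.0, float("nan")],
    [0.40, 5.0, 2.5],
    [None, 4.0, 0.5],
]
shares = missing_share(train, 3)
medians = fit_medians(train, 3)                     # fit on training rows only
train_filled = impute(train, medians)
test_filled = impute([[None, 2.0, None]], medians)  # reuse the training medians

print("missing share per feature:", shares)
print("training medians:", medians)
```

The per-feature missing shares are also a natural thing to include in the report, since they justify whether a feature is imputed or dropped.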
2. Model Selection
Report how you build your model or model ensemble, and the criterion you use to select it.
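A common selection criterion is a cross-validated score, and because bankrupt companies are usually a small minority, it helps to keep the class ratio similar in every fold. The following is a minimal sketch of a stratified k-fold split with a fixed random seed, so that the split is reproducible. The function names, the label encoding (1 = bankrupt, 0 = still operating) and the toy labels are assumptions made for illustration, not part of the assignment.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds, keeping the class ratio similar in each."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)   # deal indices round-robin across folds
    return folds


def cv_splits(labels, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for k-fold CV."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Toy labels: 1 = bankrupt (rare), 0 = still operating (assumed encoding).
labels = [1] * 6 + [0] * 24
splits = list(cv_splits(labels, k=3, seed=42))
for train_idx, val_idx in splits:
    n_pos = sum(labels[i] for i in val_idx)
    print(len(train_idx), len(val_idx), n_pos)
```

Each candidate model or hyperparameter setting can then be fitted on the training indices and scored on the validation indices, with the average validation score used to compare candidates.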
3. Visualizations
Report the 5 most interesting visualizations that help answer analytical questions.
For example: are any predictors highly correlated, which two dimensions give good
classification performance, etc.
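Before drawing a correlation heatmap, it can be useful to list the feature pairs whose correlation is strongest, since those are the pairs worth plotting. The sketch below computes the Pearson correlation for every pair of columns and keeps the pairs above a threshold. The feature names X1 to X3, their values, and the 0.9 threshold are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / math.sqrt(vx * vy)


def high_correlation_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Hypothetical feature columns (values invented for the example).
columns = {
    "X1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X2": [2.1, 3.9, 6.2, 8.0, 9.8],
    "X3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = high_correlation_pairs(columns, threshold=0.9)
print(pairs)
```

With 64 features there are 2016 pairs, so a filtered list like this keeps the visualization focused on the relationships that matter.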
4. Classification
Based on the features available, develop a model that predicts the bankruptcy
situation of the companies. The classification results will be evaluated automatically
on Kaggle.
The evaluation metric is the F score, computed from TP, FP and FN, the numbers of
true positives, false positives, and false negatives, respectively.
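Assuming the F score here is the standard F1 score (the harmonic mean of precision and recall), it can be written in terms of the three counts as F1 = 2TP / (2TP + FP + FN). The sketch below computes it from true and predicted labels, treating bankrupt companies as the positive class with label 1. Both the F1 interpretation and the label encoding are assumptions, and the exact metric should be checked against the Kaggle competition page.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels: 1 = bankrupt, 0 = still operating (assumed encoding).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1)
print(f1_score(y_true, y_pred))          # 0.75
```

Because true negatives do not enter the formula, a model that predicts "not bankrupt" for every company scores 0 on this metric, however high its accuracy.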
5. Report results
The slides should include at least three parts: visualization, methodology, and results.
The methodology section should be precise and should justify your decisions, for
example, how you chose the hyperparameters and why you prefer a particular method over
the others.
The code needs to demonstrate that the classification results are reproducible and that
the adopted method is consistent with the one described in the submission file.
Notice
1. Project deadline: 11:55 pm, Dec 1, 2019.
2. You can use either R or Python for the classification task.
3. The composition of marks is given in the table below:

                  Total Score   Criterion
Visualization     5             Innovation
                                Aesthetic
                                Information complexity
                                Insightfulness of conclusion
Classification    1
Analysis          10            Clear idea about model selection
Presentation                    Good interpretation about model
                                Well-structured report
