SDGB-7847
Final Exam
The data we are working with is in longitudinal format. Each column represents a patient, and each row represents a gene expression reading for genes 1-5913. Each patient's disease status is marked in the column header. The first 20 patients are marked with "meta", meaning these patients have a form of metastatic cancer (disease=1). The last 20 patients do not have the disease (disease=0).
You will need to transform this data into a model-ready format in order to predict metastatic disease from each patient's expression of each gene.
Set your R seed to 1234.
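One way to read "model-ready" here is one row per patient, one column per gene, plus a disease label taken from the column header. The sketch below illustrates that reshaping on a small synthetic matrix. The object names (expr, model_df), the "ctrl_" prefix for non-metastatic patients, and the 50-gene size are assumptions made for illustration; the real file layout is not shown in this handout.

```r
# Toy stand-in for the exam data: rows are genes, columns are patients.
# The real data has 5913 genes; 50 are used here to keep the example small.
set.seed(1234)
n_genes <- 50
expr <- matrix(
  rnorm(n_genes * 40),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    # Assumed header pattern: "meta_*" for disease, "ctrl_*" (hypothetical) otherwise.
    c(paste0("meta_", 1:20), paste0("ctrl_", 1:20))
  )
)

# Transpose so each row is a patient and each column is a gene.
model_df <- as.data.frame(t(expr))

# Derive the outcome from the patient names: 1 if the name starts with "meta", else 0.
model_df$disease <- factor(
  as.integer(grepl("^meta", rownames(model_df))),
  levels = c(0, 1)
)

dim(model_df)            # 40 patients by (50 genes + 1 label)
table(model_df$disease)  # 20 patients in each class
```

With the real data, the same two steps (transpose, then derive the label from the header) would apply, with the pattern in grepl() adjusted to the actual column names.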
Once your data is ready to model, separate it into training and test sets.
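A minimal sketch of seeding and splitting follows. The 70/30 ratio and the toy model_df table are assumptions; the assignment does not state a split ratio, so adjust as needed.

```r
set.seed(1234)  # seed required by the assignment

# Hypothetical model-ready table: 40 patients, a disease label, two toy gene columns.
model_df <- data.frame(
  disease = factor(rep(c(1, 0), each = 20), levels = c(0, 1)),
  gene1 = rnorm(40),
  gene2 = rnorm(40)
)

# Assumed 70/30 split: sample row indices for training, keep the rest for testing.
train_idx <- sample(seq_len(nrow(model_df)), size = round(0.7 * nrow(model_df)))
train_df <- model_df[train_idx, ]
test_df <- model_df[-train_idx, ]

nrow(train_df)  # 28
nrow(test_df)   # 12
```

With only 40 patients, a purely random split can leave the classes unbalanced in the test set; sampling separately within each disease group is one alternative.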
Apply the following algorithms, training on your training data and testing on your test data, to predict disease based on gene expression. From your test data predictions, report your accuracy, sensitivity and specificity.
RF (random forest; RF on the full dataset may take a long time to run due to the number of genes being used as predictor variables)
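The three test-set metrics can be computed directly from the predicted and actual labels. The helper below is a sketch in base R; the function name classification_metrics is hypothetical, and it treats disease=1 as the positive class.

```r
# Accuracy, sensitivity and specificity from two label vectors.
# `positive` is the label treated as "disease present".
classification_metrics <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  tn <- sum(predicted != positive & actual != positive)  # true negatives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  c(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),  # share of diseased patients correctly flagged
    specificity = tn / (tn + fp)   # share of healthy patients correctly cleared
  )
}

# Small made-up example: 3 diseased and 3 healthy patients.
actual    <- c(1, 1, 1, 0, 0, 0)
predicted <- c(1, 1, 0, 0, 0, 1)
metrics <- classification_metrics(actual, predicted)
print(metrics)
```

In this made-up example there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, so all three metrics equal 2/3.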
RF+PCA
KNN + PCA (use iteration to find the optimal value of K)
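As an illustration of the KNN + PCA item, the sketch below fits PCA on the training rows only, projects the test rows onto the same components, and loops over candidate values of K. It uses synthetic data, prcomp() from base R and knn() from the class package; the number of components (5), the candidate K values and the 28/12 split are all assumptions, not values prescribed by the assignment.

```r
library(class)  # provides knn()

set.seed(1234)

# Synthetic patients-by-genes matrix; the first 20 patients (disease = 1)
# get a shifted mean in 5 genes so there is some signal to find.
n_genes <- 50
x <- matrix(rnorm(40 * n_genes), nrow = 40)
x[1:20, 1:5] <- x[1:20, 1:5] + 2
y <- factor(rep(c(1, 0), each = 20), levels = c(0, 1))

train_idx <- sample(seq_len(40), 28)

# Fit PCA on the training data only, then project the test data with the same rotation.
pca <- prcomp(x[train_idx, ], center = TRUE, scale. = TRUE)
n_pc <- 5  # assumed number of principal components
train_pc <- pca$x[, 1:n_pc]
test_pc <- predict(pca, newdata = x[-train_idx, ])[, 1:n_pc]

# Iterate over odd values of K and record test accuracy for each.
k_values <- seq(1, 15, by = 2)
acc <- sapply(k_values, function(k) {
  pred <- knn(train_pc, test_pc, cl = y[train_idx], k = k)
  mean(pred == y[-train_idx])
})

best_k <- k_values[which.max(acc)]
best_k
```

Choosing K by test accuracy, as done here for brevity, gives an optimistic estimate; cross-validating within the training data is a common alternative. The same principal component scores could also be passed to a random forest for the RF+PCA item.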
In an external document, write a discussion on which algorithm you would choose and why. Discuss what the variable importance plot showed for RF and RF + PCA, the number of principal components you chose and what you chose as your optimal value of K.
Upload your code and your external explanation document by Thursday, April 30th at 8pm.
Thank you for a wonderful class and have a great summer! Stay in touch!