MAST90044 Thinking and Reasoning with Data
Semester 1 2020
Assignment 1
Due: 8am, Monday 27 April
Instructions
Assignments are to be submitted (uploaded) via Canvas.
Please label your assignment with the following information:
your name;
your student number;
your lab class;
your tutor's name.
You must sign the plagiarism declaration. The link is available on the subject's Canvas website.
Your assignment should show all working and reasoning, as marks will be given for method as well as
for correct answers. Please spell check your document.
Paste any R code and output into the appropriate places so that it can be seen easily along with your
other work. Graphics from R can be resized within your document; make them smaller as necessary.
Assignments count for 50% of the assessment in this subject. This one is worth 15%, and covers the
work done in weeks 1 to 4.
Tutors will not help you directly with assignment questions. However, they may give some help with
R.
Solutions to the assignment questions will be made available later.
When constructing a panel of graphs with multiple plots, it is good to use the R command
par(mfrow = c(nrows,ncols)) where nrows is the number of rows and ncols the number of columns
in the panel. The default is (1,1).
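As a minimal sketch of the panel layout described above, the following places two plots side by side. The data are simulated purely for illustration and are not part of the assignment.

```r
# Simulated values, for illustration only
set.seed(1)
x <- rnorm(100)

# 1 row, 2 columns; par() returns the previous settings so they can be restored
old_par <- par(mfrow = c(1, 2))
hist(x, main = "Histogram")
boxplot(x, main = "Boxplot")
par(old_par)  # back to the previous layout, c(1, 1) by default
```

Saving and restoring the old settings is optional, but it avoids later plots being squeezed into the same panel.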
Q.1. The data set unesco.csv, available on the LMS, contains demographic and economic information from
the 1990 UNESCO yearbook on about half the world's countries. Definitions of the variables in the
data set are as follows:
Birth rate per 1,000 of population
Death rate per 1,000 of population
Infant deaths per 1,000 of population
Life expectancy at birth for males
Life expectancy at birth for females
Gross National Product (GNP) per capita
Geopolitical group
1 Eastern Europe (former Soviet Satellite)
2 South America and Mexico
3 Western Europe, North America, Japan
4 Middle East
5 Asia
6 Africa
Country
Ignoring geopolitical group:
(a) Summarise the GNP values using summary statistics and two graphical tools. Briefly describe any
obvious features of the distribution.
(b) Use two graphical tools to compare the observed distribution of infant deaths with a normal
distribution. Briefly comment.
(c) Graphically examine the relationship between the infant death rate and GNP. Calculate the correlation
coefficient between the two variables. Comment on how useful it is in this situation.
(d) Graphically examine the relationship between life expectancy at birth for females and the birth
rate. Comment on the strength or otherwise of the relationship. Formulate a statistical model to
describe the relationship. Graphically fit the model.
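The kinds of tools parts (a) to (d) ask for can be sketched as follows. This is only an illustration on simulated data: the data frame demo and its column names (gnp, infant, lifef, birth) are hypothetical, since the variable names in unesco.csv are not listed here, and a straight-line model is just one possible choice of model.

```r
# Hypothetical stand-in data; with the real file one would start from
# something like: unesco <- read.csv("unesco.csv")
set.seed(1)
n <- 60
demo <- data.frame(birth = runif(n, 10, 50))
demo$lifef  <- 85 - 0.6 * demo$birth + rnorm(n, sd = 3)
demo$gnp    <- exp(rnorm(n, mean = 7, sd = 1.2))          # right-skewed
demo$infant <- pmax(0, 120 - 12 * log(demo$gnp) + rnorm(n, sd = 10))

# Summary statistics plus two graphical summaries of one variable
print(summary(demo$gnp))
old_par <- par(mfrow = c(2, 2))
hist(demo$gnp, main = "GNP", xlab = "GNP per capita")
boxplot(demo$gnp, main = "GNP")

# Comparing a variable with a normal distribution: normal Q-Q plot
qqnorm(demo$infant, main = "Infant deaths: normal Q-Q")
qqline(demo$infant)

# Scatterplot with a fitted least-squares line
fit <- lm(lifef ~ birth, data = demo)
plot(lifef ~ birth, data = demo, main = "Female life expectancy vs birth rate")
abline(fit)
par(old_par)

# Pearson correlation; it only measures linear association
r <- cor(demo$gnp, demo$infant)
print(r)
```

The correlation coefficient is computed on the raw scale here; when a scatterplot is clearly curved, a single value of r can understate how strongly the two variables are related.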
Taking geopolitical group into account:
(e) Use two graphical tools to examine the relationship between life expectancy at birth for males and
geopolitical group. Use suitable R functions to calculate the mean and standard deviation for each
group, and the number of countries in each group. Comment on any obvious differences between
the groups and identify any clear outliers.
(f) Calculate the net population growth rate per 1000 of population (we will call this net growth).
Type library(lattice) in R to ensure that the xyplot() function is available. Use xyplot
to examine the relationship between net growth and GNP for each geopolitical group separately.
Note that in the matrix of plots, group 1 will be placed in the bottom left hand corner, and you
proceed across the row of plots. Comment on what the plots show in regard to the relationship,
and any limitations of this type of plot here.
(g) Create a plot of net growth vs GNP for group 2 on its own. Calculate the correlation coefficient,
and comment on the strength and direction of the relationship.
[4 + 3 + 4 + 5 + 8 + 7 + 5 = 36 marks]
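For the grouped parts of Q.1, a rough sketch of per-group summaries and a conditioned lattice plot might look like the following. The data frame toy and its columns are hypothetical, and net growth is assumed here to mean birth rate minus death rate; check both against the real data set and the lecture notes.

```r
library(lattice)

# Hypothetical data: 6 groups of 15 countries each
set.seed(2)
n <- 90
toy <- data.frame(
  group = factor(rep(1:6, each = 15)),
  birth = runif(n, 10, 50),
  death = runif(n, 5, 25),
  gnp   = exp(rnorm(n, mean = 7, sd = 1.2)),
  lifem = rnorm(n, mean = 62, sd = 8)
)

# Assumption: net growth per 1000 = births per 1000 - deaths per 1000
toy$netgrowth <- toy$birth - toy$death

# Mean, standard deviation and count for each group
group_means <- tapply(toy$lifem, toy$group, mean)
group_sds   <- tapply(toy$lifem, toy$group, sd)
group_n     <- table(toy$group)
print(cbind(mean = group_means, sd = group_sds, n = as.vector(group_n)))

# Side-by-side boxplots, one per group
boxplot(lifem ~ group, data = toy, xlab = "Geopolitical group",
        ylab = "Male life expectancy")

# One scatterplot panel per group; group 1 is drawn bottom left by default
p <- xyplot(netgrowth ~ gnp | group, data = toy, layout = c(3, 2))
print(p)

# A single group on its own, with its correlation coefficient
sub2 <- subset(toy, group == "2")
plot(netgrowth ~ gnp, data = sub2, main = "Group 2")
r2 <- cor(sub2$netgrowth, sub2$gnp)
print(r2)
```

By default xyplot gives all panels the same axis limits, which is worth keeping in mind when commenting on the limitations of this type of plot.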
Q.2. It is well known that quitting smoking is difficult. Many people who are trying to quit use nicotine
replacement methods like nicotine patches or nicotine gum to ease nicotine withdrawal symptoms. As
an alternative, medical researchers investigated whether the use of an antidepressant medication might
be a more effective aid to those attempting to give up cigarettes. In a study reported in the March 4,
1999 issue of the New England Journal of Medicine, researchers compared the effectiveness
of nicotine patches to the effectiveness of the antidepressant bupropion, which is marketed under the
brand name Zyban. The study consisted of 893 participants who were randomly allocated to four
(i = 1, 2, 3, 4) treatment groups, listed in the table below. They did not know to which treatment
they were allocated, i.e. this was a single-blind study. The table shows the number of people not
smoking 6 months following the study, for each treatment.
Treatment                   Subjects not smoking (x_i)   Total subjects (n_i)
Placebo only                30                           160
Nicotine patch              52                           244
Zyban                       85                           244
Zyban and nicotine patch    95                           245
(a) Calculate the Wald, Agresti-Coull and Jeffreys prior 95% confidence intervals for each treatment
group separately. Draw the confidence intervals.
Comment briefly on your findings.
(b) Comment on the validity or otherwise of the assumptions made in these calculations.
(c) Find a point and an interval estimate of the difference in proportions of those not smoking after
6 months between the Zyban + patch group and the group that used Zyban alone.
Give an interpretation of the confidence interval. Make one comment, with supporting evidence
from above, on the claim that using a patch in addition to Zyban is effective for quitting.
(d) Construct a Wald confidence interval to test the claim that using a nicotine patch is no more
effective than using nothing at all. Interpret the confidence interval and give a reason for your
choice of confidence interval method.
(e) Provide a single Wald confidence interval to test the claim that Zyban, with or without a patch,
is better than doing nothing or using a patch. Interpret the confidence interval and give a reason
for your choice of confidence interval method.
[5 + 4 + 4 + 5 + 6 = 24 marks]
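The three single-proportion intervals named in Q.2, and a Wald interval for a difference of two proportions, can be sketched in base R as below. The function names are made up for this illustration. The Agresti-Coull version shown adds z^2/2 successes and z^2/2 failures; some texts use the simpler "add 2 successes and 2 failures" rule at the 95% level, so check which form the subject uses.

```r
# Wald interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)
wald_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p <- x / n
  p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
}

# Agresti-Coull interval: Wald formula applied to adjusted counts
agresti_coull_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  n_adj <- n + z^2
  p_adj <- (x + z^2 / 2) / n_adj
  p_adj + c(-1, 1) * z * sqrt(p_adj * (1 - p_adj) / n_adj)
}

# Jeffreys interval: equal-tailed quantiles of Beta(x + 1/2, n - x + 1/2)
jeffreys_ci <- function(x, n, level = 0.95) {
  a <- (1 - level) / 2
  qbeta(c(a, 1 - a), x + 0.5, n - x + 0.5)
}

# Wald interval for p1 - p2 from two independent samples
wald_diff_ci <- function(x1, n1, x2, n2, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p1 <- x1 / n1
  p2 <- x2 / n2
  est <- p1 - p2
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Example: the placebo row of the table (30 not smoking out of 160)
placebo <- rbind(
  wald          = wald_ci(30, 160),
  agresti_coull = agresti_coull_ci(30, 160),
  jeffreys      = jeffreys_ci(30, 160)
)
colnames(placebo) <- c("lower", "upper")
print(round(placebo, 3))

# Example: Zyban + patch (95/245) minus Zyban alone (85/244)
print(round(wald_diff_ci(95, 245, 85, 244), 3))
```

Intervals for several groups can be drawn with base graphics, for example by plotting the point estimates and adding segments() between each lower and upper limit.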
Q.3. The chi-squared distribution, denoted by $\chi^2$, is used a great deal in statistics and science, and we
will meet it again later. The exact shape of the distribution depends on the degrees of freedom ($\nu$): at
larger values the chi-squared distribution approaches a normal distribution, while smaller values give a
stronger departure from the normal distribution. Here we will examine how quickly the sampling
distribution of the sample mean taken from a $\chi^2_2$ distribution converges to normality (or at least
to symmetry).
(a) Take a large sample from the $\chi^2_2$ distribution and test its departure from normality using two
graphical tools. You will need the R function rchisq. Comment on the result.
(b) Examine the sampling distribution of the sample mean from samples of size 5, by generating 1000
such samples and looking at a plot of the density (make a comment about the distribution).
(c) Compare the sampling distribution of the sample mean for a range of sample sizes (e.g. 1, 5, 10,
20, 40, 80), and use your results to suggest how large the sample size needs to be for adequate
convergence. The mean of a $\chi^2_\nu$ distribution is $\nu$.
[5 + 3 + 5 = 13 marks]
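One way to set up the simulations in Q.3 is sketched below. The helper sample_means, the seed, and the choice of 10000 draws for the "large sample" are all arbitrary choices made for this illustration, not requirements of the question.

```r
set.seed(2020)
nu <- 2  # degrees of freedom used in Q.3

# A large sample from the chi-squared distribution with 2 degrees of freedom
big <- rchisq(10000, df = nu)
old_par <- par(mfrow = c(1, 2))
hist(big, breaks = 50, freq = FALSE, main = "Chi-squared, 2 df", xlab = "x")
qqnorm(big)
qqline(big)
par(old_par)

# Hypothetical helper: means of `reps` independent samples, each of size n
sample_means <- function(n, reps = 1000, df = 2) {
  replicate(reps, mean(rchisq(n, df = df)))
}

# Density of the sample mean for a range of sample sizes
sizes <- c(1, 5, 10, 20, 40, 80)
old_par <- par(mfrow = c(2, 3))
for (n in sizes) {
  m <- sample_means(n)
  plot(density(m), main = paste("n =", n), xlab = "Sample mean")
  abline(v = nu, lty = 2)  # population mean, equal to the degrees of freedom
}
par(old_par)
```

Each panel uses its own axis limits here; fixing a common xlim across panels is another reasonable choice when comparing how the spread shrinks as n grows.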
Total marks = 73