dataset留学生程序辅导写作Python程序语言

” STAT 3701 Homework 3
Show all work. Submit your solutions in a pdf document on Canvas. Include your R code (which must be
commented and properly indented) in the pdf file. Also submit one text file with all your R code (comments
and all) clearly labeled with the problem it goes with. This must be properly indented. Before every solution
with random sampling use set.seed(3701).
Question 1 (15 points).
Lets consider a slightly different data generating model for paired data. Suppose we want to compare
students scores of a standardized test before and after taking a prep course. There are n students that will be
taking the test. Let X be the yet-to-be observed random variable of the score before the course and Y be the
yet-to-be observed random variable of the score after the course. Moreover, we let X and Y be the linear
combinations of two common latent variable, i.e., the model for the ith student is Xi = 1 + a1Ai + b1Bi
Yi = 2 + a2Ai + b2Bi,
where j , aj , bj , j = 1, 2 are constants. Ai
, Bi are independent both within the same subject and across
different subjects. Ais are iid N(0, 2A), Bi
is are iid N(0, 2B).
(a) (4 points) What are the distributions of Xi and Yi
, respectively? If we let Zi = Xi Yi
, does this data
generating model still satisfy the assumptions for paired t-test? Why?
(b) (5 points) Let 1 = 68, 2 = 70, 2
A = 2, 2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2. It can
be show that E[X] = 68, E[Y ] = 70. Use simulation to produce a 99% confidence interval for
Var(X) = E[(X68)2
] and a 99% confidence interval for Var(Y ) = E[(Y 70)2
]. Set reps = 105
.
(c) (3 points) Now we are interested in paired t-test for
H0 : 1 = 2
Ha : 1 = 2,
with data assumed to be coming from our data generating model. Create a function named mypaired.pval
that generate realizations of the random p-value of this test. Your function should have the following
arguments
mu1, mean of X
mu2, mean of Y
a1,b1,a2,b2 as defined in our data generating model
sigmaA, standard deviation of A
sigmaAB, standard deviation of B
1
n, sample size (number of pairs)
reps, number of replications.
The function should output a vector containing the realizations of the random p-value.
(d) (4 points) Use 1 = 68, 2 = 68.5, 2
A = 2, 2
B = 1, a1 = 2, b1 = 1, a2 = 1, b2 = 2, = 0.05. Set
reps = 1000. Test your function by creating a simulation based estimation of the power curve with
the sample size n {20, 25, . . . , 200}. Produce a pointwise 95% confidence interval for the power
curve with the Clopper-Peason CI and include the curves for the lower bound and upper bound of CI
in the same plot of the estimated power curve.
Question 2 (15 points)
For two-independent-samples model, we assume that Xis are iid N(1, 2
) for i = 1, . . . , n1, Yj s are iid
N(2, 2
) for j = 1, . . . , n2 and all observations are independent. We provided the 100(1)% confidence
interval for 1 2 as
X Y t1/2,n1+n22Sp
In this question, we are interested in simultaneous confidence intervals for both 1 and 2, i.e., instead of
a CI for 1 2, we want one CI [L1, U1] for 1 and another CI [L2, U2] for 2 such that [L1, U1] and
[L2, U2] should capture the corresponding means simultaneously with probability larger than 1 . Here
L1, L2, U1, U2 are some lower bounds and upper bounds calculated using the data and 1 is our significance
level.
(a) (3 points) A very natural (but could be wrong potentially) idea would be
[L1, U1] = X t1/2,n11
is the sample variance for {Xi} and S
is the sample variance for {Yi}. Calculate
the coverage probability for these simultaneous CIs, that is, calculate
(b) (4 points) Let 1 = 68, 2 = 70, = 3, n1 = n2 = 20, = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability. Is 1 in this CI?
(c) (4 points) Now consider a Bonferroni corrected simultaneous confidence interval,
Show that this set of simultaneous CIs has a coverage probability larger than 1 . You may want
apply the Bonfferoni inequality P(A B) 1 (1 P(A)) (1 P(B)).
2
(d) (4 points) Let 1 = 68, 2 = 70, = 3, n1 = n2 = 20, = 0.05. Use simulation to estimate the
coverage probability defined in (a). Set reps = 105
. Create a 95% conservative CI for this coverage
probability.
Question 3 (20 points)
We have explored two sample procedures for the means, but in this problem we will write a function to do
these procedures for proportions. Consider two binomial random variables V and W. More specifically,
V Binom(n1, 1) and W Binom(n2, 2), where 1, 2 (0, 1). We are interested in testing
H0 : 1 = 2
Ha : 1 = 2.
Let b1 = V /n1, b2 = W/n2 and b = (V + W)/(n1 + n2). Then the test statistic for the test is
Under the null hypothesis, T N(0, 1) approximately.
(a) (5 points) Let 1 = 2 = 0.4, use simulation to evaluate the distribution of T for the cases
n1 = n2 = 15
n1 = n2 = 100
In each case, create a QQ-plot to compare the distribution of T with N(0, 1) and comment on the plot.
Set reps = 10000. You may use qnorm function when calculating the percentiles of the standard
normal distribution. Comment on the plots. You are not allowed to use rbinom to generate from
binomial distribution.
(b) (5 points) Besides the test, we can also provide an approximate 100(1 )% CI for 1 2 using the
formula
Create an R function myprop.test that generates the data, calculate the test statistic T and then
output the p-value of the test and the CI for 1 2. The function should take the following arguments
n1, number of trials for V
n2, number of trials for W
theta1, success probability for V
theta2, success probability for W
alpha, significance level
reps, number of replications
and the function outputs a list with three components:
3
pval, vector of p-values of each replication
upper, vector of upper bounds of CIs in each replication
lower, vector of lower bounds of CIs in each replication
(c) (5 points) Test your function by estimating the coverage probability of the CI when 1 = 0.6, 2 =
0.4, n1 = n2 = 20, = 0.05. Use reps=105
. Create a 95% conservative CI for the coverage
probability. Does it contain 1 ?
(d) (5 points) Use your function to create a power curve when we increase the distance between 1
and 2. More specifically, let n1 = n2 = 100, 2 = 0.52, = 0.05 and 1 takes values in
{0.52, 0.525, . . . , 0.680}. Set reps = 1000. Make a plot of the estimated power curve and also
include the lower and upper bounds of a 95% Clopper-Pearson confidence interval using dashed lines.