Stat-8003 McAlinn/Fall-20

problem set no. 1

learning objectives. compute likelihoods, both for a generic sample, i.e., $(x_1, \ldots, x_n)$, and for a specific sample, i.e., $(2, 3, 6, 4, 8, 5, 6, 2, 3, 6, 5)$; write some short programs in R to generate fake data sets from a given model and plot the corresponding likelihoods.

problem 1. set-up: you are interested in studying the writing style of a popular Time Magazine contributor, FZ. you collect a simple random sample of his articles and count how many times he uses the word "however" in each of the articles in your sample, $(x_1, \ldots, x_n)$. In this set-up, $x_i$ is the number of times the word "however" appeared in the $i$-th article.

question 1.1. (10 points) Define the population of interest, the population quantity of interest, and the sampling units.

question 1.2. (10 points) what are potentially useful estimands for studying writing style? (hint: you are interested in comparing FZ's writing style to that of other contributors.)

question 1.3. (10 points) model: let $X_i$ denote the quantity that captures the number of times the word "however" appears in the $i$-th article. let's assume that the quantities $X_1, \ldots, X_n$ are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter $\lambda$,

$$p(X_i = x_i \mid \lambda) = \text{Poisson}(x_i \mid \lambda) \quad \text{for } i = 1, \ldots, n.$$

using the 2-by-2 table of what's variable/constant versus what's observed/unknown, declare what's the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: $X_1, \ldots, X_n$, $x_1, \ldots, x_n$, $\lambda$ and $n$.

question 1.4. (10 points) write the data generating process for the model above.

question 1.5. (10 points) define the likelihood $L(\lambda) = p(\,\cdot \mid \cdot\,)$ for this model and set-up at the highest level of abstraction.

question 1.6. (10 points) write the likelihood $L(\lambda)$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$.

question 1.7.
(10 points) write the log-likelihood $\ell(\lambda)$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$.

question 1.8. (10 points) write the log-likelihood $\ell(\lambda)$ for the following specific sample of 7 articles $(12, 4, 5, 3, 7, 5, 6)$.

question 1.9. (10 points) plot the log-likelihood $\ell(\lambda)$ in R for the same specific sample of 7 articles $(12, 4, 5, 3, 7, 5, 6)$. What is the value of $\lambda$ at the maximum (approximately)?

question 1.10. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

Extra credit. mmmh … something is amiss. the articles FZ writes have different lengths. if we model the word occurrences in each article as IID Poisson random variables with rate $\lambda$, we are implicitly assuming that the articles have the same length. why? (10 points; extra credit) and if that is true, what is the implied common length? (10 points; extra credit)

problem 2. set-up: you collect another random sample of articles penned by FZ and count how many times he uses the word "however" in each of the articles in your sample, $(x_1, \ldots, x_n)$. you also count the length of each article in your sample, $(y_1, \ldots, y_n)$. In this set-up, $x_i$ is the number of times the word "however" appeared in the $i$-th article, as before, and $y_i$ is the total number of words in the $i$-th article.

question 2.1. (10 points) model: let $X_i$ denote the quantity that captures the number of times the word "however" appears in the $i$-th article. let's assume that the quantities $X_1, \ldots, X_n$ are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter $\lambda y_i / 1000$,

$$p(X_i = x_i \mid y_i, \lambda, 1000) = \text{Poisson}(x_i \mid \lambda y_i / 1000) \quad \text{for } i = 1, \ldots, n.$$

using the 2-by-2 table of what's variable/constant versus what's observed/unknown, declare what's the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: $X_1, \ldots, X_n$, $x_1, \ldots, x_n$, $y_1, \ldots, y_n$, $\lambda$ and $n$.

question 2.2. (10 points) what is the interpretation of $\lambda y_i / 1000$ in this model?
explain.

question 2.3. (10 points) what is the interpretation of $\lambda$ in this model? explain.

question 2.4. (10 points) write the data generating process for the model above.

question 2.5. (10 points) define the likelihood $L(\lambda) = p(\,\cdot \mid \cdot\,)$ for this model and set-up at the highest level of abstraction.

question 2.6. (10 points) write the likelihood $L(\lambda)$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$, and $n$ article lengths, $(y_1, \ldots, y_n)$.

question 2.7. (10 points) Write the log-likelihood $\ell(\lambda)$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$, and $n$ article lengths, $(y_1, \ldots, y_n)$.

question 2.8. (10 points) Simulate the number of occurrences of the word "however" for 5 articles using the data generating process. Assume $\lambda = 10$ and corresponding article lengths $y = (1730, 947, 1830, 1210, 1100)$. Record the number of occurrences of "however" in each article.

question 2.9. (10 points) write the log-likelihood $\ell(\lambda)$ for the specific sample of occurrences you generated in the previous question and their corresponding 5 article lengths $(1730, 947, 1830, 1210, 1100)$.

question 2.10. (10 points) Plot the log-likelihood from the previous question in R. Does the maximum occur near $\lambda = 10$?

question 2.11. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

OK, that was a more reasonable model. but FZ writes about different topics. our model is not capturing that. is FZ more prone to offering his own opinions when he writes about politics than when he writes about other topics? let's investigate.

problem 3. set-up: you collect a random sample of articles penned by FZ and count how many times he uses the word "I" in each of the articles in your sample, $(x_1, \ldots, x_n)$. In this set-up, $x_i$ is the number of times the word "I" appeared in the $i$-th article.

question 3.1. (10 points) model: let $X_i$ denote the quantity that captures the number of times the word "I" appears in the $i$-th article.
let $Z_i$ indicate whether the $i$-th article is about politics, denoted by $Z_i = 1$, or not, denoted by $Z_i = 0$. let's assume that the quantities $X_1, \ldots, X_n$ are independent of one another conditionally on the corresponding values of $Z_1, \ldots, Z_n$. let's assume that the quantities $Z_1, \ldots, Z_n$ are independent and identically distributed (IID) according to a Bernoulli distribution with parameter $\pi$,

$$p(Z_i = z_i \mid \pi) = \text{Bernoulli}(z_i \mid \pi) \quad \text{for } i = 1, \ldots, n.$$

let's further assume that the number of occurrences of the word "I" in an article about politics follows a Poisson distribution with unknown parameter $\lambda_{\text{Politics}}$,

$$p(X_i = x_i \mid Z_i = 1, \lambda_{\text{Politics}}) = \text{Poisson}(x_i \mid \lambda_{\text{Politics}}) \quad \text{for } i = 1, \ldots, n,$$

and that the number of occurrences of the word "I" in an article about any other topic follows a Binomial distribution with size 1000 and unknown parameter $\theta_{\text{Other}}$,

$$p(X_i = x_i \mid Z_i = 0, 1000, \theta_{\text{Other}}) = \text{Binomial}(x_i \mid 1000, \theta_{\text{Other}}) \quad \text{for } i = 1, \ldots, n.$$

using the 2-by-2 table of what's variable/constant versus what's observed/unknown, declare what's the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: $X_1, \ldots, X_n$, $x_1, \ldots, x_n$, $Z_1, \ldots, Z_n$, $z_1, \ldots, z_n$, $\pi$, $\lambda_{\text{Politics}}$, $\theta_{\text{Other}}$ and $n$.

question 3.2. (10 points) write the data generating process for the model above.

question 3.3. (10 points) simulate 1000 values of $X_i$ in R from the data generating process assuming $\pi = 0.3$, $\lambda_{\text{Politics}} = 30$ and $\theta_{\text{Other}} = 0.02$. Plot the values of $X_i \mid Z_i = 1$ and $X_i \mid Z_i = 0$ as two histograms on the same plot. Color the histograms by the value of $Z_i$ so the two populations can be distinguished.

question 3.4. (10 points) write the likelihood for 1 article, $L_i(\lambda_{\text{Politics}}, \theta_{\text{Other}}) = p(X_i = x_i \mid \lambda_{\text{Politics}}, \theta_{\text{Other}})$.

question 3.5. (10 points) write the likelihood $L(\lambda_{\text{Politics}}, \theta_{\text{Other}})$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$.

question 3.6. (10 points) write the log-likelihood $\ell(\lambda_{\text{Politics}}, \theta_{\text{Other}})$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$.

question 3.7.
(10 points) write the log-likelihood $\ell(\lambda_{\text{Politics}}, \theta_{\text{Other}})$ for the following specific sample of 8 articles $(12, 4, 8, 3, 3, 10, 1, 9)$.

question 3.8. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

Extra credit. wait, but is it reasonable to assume that the rate is an unknown constant in all of our models? it seems like a stretch. (10 points; if you agree)

problem 4. set-up: let's go back to the simplest possible set-up for this exercise. you collect a random sample of articles penned by FZ and count how many times he uses the word "and" in each of the articles in your sample, $(x_1, \ldots, x_n)$. In this set-up, $x_i$ is the number of times the word "and" appeared in the $i$-th article, as before.

question 4.1. (10 points) model: let $X_i$ denote the quantity that captures the number of times the word "and" appears in the $i$-th article. let's assume that the quantities $X_1, \ldots, X_n$ are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter $\Lambda$,

$$p(X_i = x_i \mid \Lambda = \lambda_i) = \text{Poisson}(x_i \mid \lambda_i) \quad \text{for } i = 1, \ldots, n.$$

in addition, let's assume that the rate $\Lambda$ is distributed according to a Gamma distribution with unknown parameters $\alpha$ and $\beta$,

$$f(\Lambda = \lambda_i \mid \alpha, \beta) = \text{Gamma}(\lambda_i \mid \alpha, \beta).$$

using the 2-by-2 table of what's variable/constant versus what's observed/unknown, declare what's the technical nature (random variable, latent variable, known constant or unknown constant) of the quantities involved in the set-up/model above: $X_1, \ldots, X_n$, $x_1, \ldots, x_n$, $\Lambda$, $\lambda_1, \ldots, \lambda_n$, $\alpha$, $\beta$ and $n$.

question 4.2. (10 points) write the data generating process for the model above.

question 4.3. (10 points) in R simulate 1000 values from the data generating process. Assume $\alpha = 10$ and $\beta = 1$. Compute the mean and variance of the $X_i$.

question 4.4. (10 points) in R simulate 1000 values assuming $\lambda_i = 10$ for all $i$ (ignore the Gamma distribution). Compute the mean and variance of the $X_i$ now. How do they compare to the mean and variance you calculated in question 4.3?

question 4.5.
(10 points) write the likelihood for 1 article, $L_i(\alpha, \beta) = p(X_i = x_i \mid \alpha, \beta)$.

question 4.6. (10 points) Write the log-likelihood $\ell(\alpha, \beta)$ for a generic sample of $n$ articles, $(x_1, \ldots, x_n)$.

question 4.7. (10 points) write the log-likelihood $\ell(\alpha, \beta)$ for the following specific sample of 8 articles $(64, 61, 89, 55, 57, 76, 47, 55)$.

question 4.8. (10 points) Draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

Extra credit. do you recognize the very special probability mass function you just obtained for $p(X_i = x_i \mid \alpha, \beta) = L_i(\alpha, \beta)$? (10 points; extra credit) excellent! you just proved a useful result: Gamma mixture of Poisson is a … .

Generate samples from this distribution and verify graphically that the distribution you get looks the same as that in 4.3 (you must use the appropriate parameters you identified above). (10 points; extra credit)
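As a starting point for the R portions of this problem set (simulating fake data from a model and plotting the corresponding log-likelihood), here is a minimal sketch for the plain IID Poisson model. It is not a solution to any specific question. The sample size of 50, the rate of 4, the seed, the grid limits and the helper name `pois_loglik` are all assumptions chosen for illustration.

```r
# Minimal sketch: simulate IID Poisson counts and plot their log-likelihood.
# Everything below is an illustrative assumption, not part of the problem set:
# n = 50, true_rate = 4, the seed, the grid, and the name pois_loglik.
set.seed(8003)
n <- 50
true_rate <- 4
x <- rpois(n, lambda = true_rate)  # fake data set of word counts

# Log-likelihood of an IID Poisson sample, evaluated at each candidate rate
pois_loglik <- function(lambda, x) {
  sapply(lambda, function(l) sum(dpois(x, lambda = l, log = TRUE)))
}

grid <- seq(0.5, 10, by = 0.01)    # candidate values of the rate
ll <- pois_loglik(grid, x)

# The grid point with the largest log-likelihood
lambda_hat <- grid[which.max(ll)]

plot(grid, ll, type = "l", xlab = "lambda", ylab = "log-likelihood")
abline(v = lambda_hat, lty = 2)    # mark the grid maximizer
```

A quick sanity check for this particular model is that `lambda_hat` should sit within one grid step of `mean(x)`. The same pattern (simulate, evaluate the log-density on a grid, sum, plot) should carry over to the other models by swapping the simulation call and the density function.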