## Module 11: Statistical inference for simple linear regression: practice problems

Author: NEAS (Group: Administrators)

(The attached PDF file has better formatting.)

**Exercise 11.1: Key assumptions of classical regression analysis**

Classical regression analysis is based on the statistical model yj = α + β1 xj1 + β2 xj2 + … + βk xjk + εj.

Explain the following key assumptions of classical regression analysis:

A. Linearity of the relation between the explanatory variables and the response variable
B. Constant variance of the error term
C. Normality of the error term
D. Independence of the error terms
E. Explanatory variables are fixed or have no measurement error
F. X values are not invariant: they have at least two values

Part A: The linearity assumption is that the expected error term is zero: E(εj) = 0.

Jacob: Do you mean that the average y-value is a linear function of the average x-values? We show this by taking expectations:

E(yj) = E(α + β1 xj1 + β2 xj2 + … + βk xjk + εj) ⇒ ȳ = α + β1 x̄1 + β2 x̄2 + … + βk x̄k

Rachel: Your explanation shows that the means of the observed X values and the mean of the observed Y values lie on the fitted regression line. The value of A (the least squares estimator of α) is chosen to force this relation. Least squares estimators have the property that

ȳ = A + B1 x̄1 + B2 x̄2 + … + Bk x̄k

This relation is true even if the true relation between Y and the X values is not linear. It reflects the estimation method, not the attributes of the relation.

The linearity assumption is that the expected error term is zero at each point: E(εj) = 0 for all j.

Jacob: Are you saying that the expected value of each observed residual is zero?

Rachel: Distinguish between the observed residual and the error term.

- The observed residual is the observed response variable Y minus the linear function of the observed explanatory variables, using the ordinary least squares estimators (A and B) for the parameters.
- The error term is the expected response variable Y at any point minus the linear function of the observed explanatory variables, using the population regression parameters α and β.

The population regression parameters α and β specify the relation between the explanatory variables and the response variable. The ordinary least squares estimators A and B are linear functions of the sample values. A and B are not the same as α and β. The linearity assumption says the expected value of the error term at each point is zero, not that the expected value of the residual using the ordinary least squares estimators is zero at each point.

Jacob: Can you show how the linearity assumption relates to the expected value of the error term?

Rachel: Suppose the true relation is Y = 1 + X² + ε, which is not linear. We fit a regression line through three points: X = (0, 1, 2), for which the expected Y values are (1, 2, 5).

The ordinary least squares estimators A and B depend on the sample values. They are unbiased estimators of α and β. They are linear functions of the observed y-values, so the expected values of A and B give fitted values using the expected values of Y at each point. This fitted regression line is Y = 0.66667 + 2X + ε.

The expected residuals are computed in the table below:

| X value | Expected Y value | Fitted Y value | Expected residual |
|---------|------------------|----------------|-------------------|
| 0       | 1                | 0.66667        | 0.33333           |
| 1       | 2                | 2.66667        | –0.66667          |
| 2       | 5                | 4.66667        | 0.33333           |

Intuition: The linearity assumption says that the expected error term at each X value is zero, not that the mean residual for the sample points is zero.
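Rachel's worked example can be reproduced numerically. A minimal sketch (Python with numpy; it uses only the three points from the example):

```python
import numpy as np

# Expected Y values for the true (nonlinear) relation Y = 1 + X^2
x = np.array([0.0, 1.0, 2.0])
y = 1.0 + x ** 2          # (1, 2, 5)

# Ordinary least squares fit of a straight line through the three points
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

print(intercept, slope)    # approximately 0.66667 and 2.0
print(residuals)           # approximately [ 0.33333, -0.66667,  0.33333]
print(residuals.mean())    # essentially 0: least squares forces the mean residual to zero
```

The mean residual is zero by construction, even though the expected residual at each individual X value is not zero — exactly the distinction Rachel draws.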
The second statement is an attribute of least squares estimators; the first statement is an assumption of classical regression analysis.

Jacob: Is the expected error term over all X values zero even if the true relation is not linear?

Rachel: The expected residual over all X values is zero whether or not the true relation is linear.

But the X values are fixed in experimental studies and are measured without error in observational studies. They do not have an error term; the only random variables with an error term in the regression equation are ε and Y.

Jacob: GLM stands for generalized linear model, implying that the relation is linear. But GLMs are used when the response variable is not a linear function of the explanatory variables.

Rachel: The term generalized linear means that the response variable is a function of a linear combination of the explanatory variables. This function is the inverse of the link function. That is, the link function of the fitted response variable is a linear function of the explanatory variables.

Part B: The variance of the error term is the same at each X value: var(εj) = σε² = constant for all xj.

Illustration: If σε² = 2 at x = 1, then σε² = 2 at x = 3.

Jacob: Another textbook says that the variance of the error term is the same at each expected Y value.

Rachel: E(Y) = α + β × X, so the expected Y values map one to one onto the X values. The variance of the error term is the same for all observations.

Intuition: Generalized linear models differ from classical regression analysis in several ways, including the relation of the variance of the response variable to its mean.

- For generalized linear models, the variance of yj is often a function of ŷj.
- For classical regression analysis, the variance of yj is independent of ŷj.

Jacob: Is constant variance a reasonable assumption?

Rachel: The assumption of constant variance may not reflect actual conditions, but it underlies the formulas for ordinary least squares estimators.

Illustration: Suppose we regress annual income of actuaries on the number of exams they have passed, with α = 30,000 and β = 10,000.

- Students with no exams have average annual incomes of 30,000.
- Average income increases 10,000 per exam, so actuaries with nine exams have average incomes of 120,000.

This assumed relation is given by the population regression parameters: the relation of income to exams is linear. Actual salaries are not the same for all actuaries at a given exam level.

- Students with no exams get about the same starting salary. The actual range is narrow, perhaps 28,000 to 32,000.
- Experienced actuaries vary: some are gifted statisticians or managers and earn 160,000, and some are poor workers and earn 80,000.

Even if the response variable is a linear function of the explanatory variable, the variance is rarely constant.

Jacob: This illustration shows that omitted variables may be more important for actuaries with more exams. It does not relate the variance to the size of the explanatory variable.

Rachel: GLMs show the relation of the variance of the response variable to its mean (fitted) value. The relation of the variance to the mean depends on the conditional distribution of the response variable.

- General insurance claim counts have Poisson distributions or negative binomial distributions: the variance is proportional to the mean.
- General insurance claim severities have lognormal or Gamma distributions: the standard deviation is proportional to the mean.
- Life insurance mortality rates have binomial distributions: the variance is proportional to π × (1 – π).

Jacob: If the variance of the response variable depends on its mean, why not use that as our assumption?

Rachel: In real statistical applications, we often do. We use weighted least squares estimators or GLMs (which use iteratively reweighted least squares estimators). The assumption of constant variance underlies the formulas for ordinary least squares estimators.

Part C: Normality: the error terms have a normal distribution with a mean of zero and a variance of σε²: εj ~ N(0, σε²).

Jacob: Why not state these first three assumptions in the reverse order?

- First, the error terms have a normal distribution. This seems most important.
  - If the error terms have a Poisson distribution, Gamma distribution, or binomial distribution, their variance depends on the expected value of the response variable.
  - If the error terms have a normal distribution, their variance may or may not be constant.
- Second, the variance of the error term is the same at all points. This assumption underlies the formulas for the ordinary least squares estimators.
  - If the error terms have a normal distribution with constant variance, the least squares estimators are also maximum likelihood estimators.
- Third, the mean of the error term is zero. If it equals k ≠ 0, add k to α to get a mean error term of zero.

Rachel: On the contrary: the first three assumptions are listed in order of importance.

- The first assumption is critical: if the expected error term is not zero at each point, the response variable is not a linear function of the explanatory variables. The model is not correctly specified, so it is biased.

Jacob: Is the model biased only if it is structural, not if it is empirical?

Rachel: Fox says: omission of explanatory variables causes bias only for structural relations, not for empirical relations. Model specification error causes bias for both types of relation.
- The second assumption is also important: if the variance of the error term is not constant, we should give greater weight to points with lower variance when fitting the regression line. The simple functional forms for A and B (the ordinary least squares estimators for α and β) would not be correct.
- The third assumption is not that important. If the mean of the error term is zero and its variance is constant, the regression line is still the most efficient estimator among all unbiased linear estimators even if the distribution of the error terms is not normal.

Jacob: If the assumption about a normal distribution is not important, why do we include it?

Rachel: This assumption underlies the formulas for the standard errors of the regression coefficients, the t-values, the p-values, and the confidence intervals.

- If the error terms have a normal distribution with constant variance, we get exact standard errors, t-values, p-values, and confidence intervals.
- If the error terms do not have a normal distribution, these values are not exact. They are asymptotically correct for large samples (under reasonable conditions), but they are not exact.
- The standard errors, t-values, p-values, and confidence intervals are not exact, but they are reasonably good if the variance of the error terms is constant. If this variance is not constant, we use weighted least squares or generalized linear models.

Jacob: Is this assumption of normally distributed error terms generally true?

Rachel: The central limit theorem implies that it is a good approximation if the error terms are the sum of independent random variables of similar size.

Jacob: Another textbook says that the Y values have a conditional normal distribution.

Rachel: yj = α + β1 xj1 + β2 xj2 + … + βk xjk + εj. The only random variable on the right side of the equation is εj, so if εj has a normal distribution, yj has a normal distribution.
Fox says: “Equivalently, the conditional distribution of the response variable is normal: Yj ~ N(α + βxj, σε²).”

Jacob: How does a conditional distribution differ from a distribution?

Rachel: The term distribution (or probability distribution) has two meanings. The regression analysis formulas use both meanings, which is confusing to some candidates.

The sample distribution is the distribution of the observed values in the sample. The values of σ²(x) and σ²(y) are the sample variances of the observed x- and y-values.

The conditional distribution is the distribution of the random variable about its expected value. The explanatory variable is chosen by the statistician (in experimental studies) or observed without error (in observational studies), so it has no conditional distribution. The response variable is a random variable. It has a conditional distribution at each point.

Jacob: How do these distributions differ for the error term?

Rachel: The error terms are independent and identically distributed. Each error term is normally distributed with a mean of zero and a variance of σε², so the distribution of all error terms is also normal with a mean of zero and a variance of σε².

Take heed: This is the distribution of the population of error terms. The sample distribution of the observed values does not have a mean of zero and a variance of σε² (unless N → ∞). The sample distribution of the residuals has a mean of zero (by construction) and a variance whose expected value is σε² but whose actual value may differ.

Jacob: How do these distributions differ for the response variable?

Rachel: The expected value of yj is α + β × xj. The sample distribution of the y-values does not have a normal distribution. Each yj is α + β × xj + εj, so each response variable has a normal distribution with a mean of its expected value and a variance of σε².

Independence of error terms vs time series

Part D: Independence: the error terms are independent: εj and εk are independent for j ≠ k.

Jacob: Independence seems generally true. If we regress home prices on home sizes, why would the error term for observation j+1 depend on the error term for observation j?

Rachel: We have two statistics on-line courses: one for regression analysis and one for time series. In a time series, random fluctuations persist at least one more period. The formulas for confidence intervals of forecasts differ from the formulas in the regression analysis course.

Jacob: Why does a regression where the response variable is a time series create problems?

Rachel: If the response variable is a time series, the expected value of the error term is not zero at each point, and the error term at point j+1 depends on the error term at point j. Suppose we regress the daily temperature on explanatory variables like cloud cover, amount of smog, carbon dioxide emissions, and humidity. These explanatory variables may affect the daily temperature, but we have omitted the effect of the previous day's temperature. The least squares estimate for the variance of the error term is too large.

Jacob: A time series looks at values over time. What if all data points are from the same time, such as daily temperatures from the same day at different places?

Rachel: Dependence can be temporal or spatial. The daily temperature is similar in adjacent cities and less similar as the distance between the cities increases.

Illustration: Home prices may be a linear function of plot size and the number of rooms. But home prices have both spatial and temporal relations as well.

- Homes in adjacent locations have more similar prices than homes in different states or countries.
- Home prices for all properties fluctuate over time with interest rates, recessions, and tax rates.

Illustration: Suppose we measure daily temperature in a city on the equator. The expected daily temperature is 80° every day of the year, with no seasonal fluctuations, but random fluctuations cause the temperature to vary from 60° to 100°.
Daily temperature is a time series: if the temperature is high one day, it will probably be high the next day as well, since the high temperature reflects weather patterns that persist for several days.

The Fox textbook says: “The assumption of independence needs to be justified by the procedures of data collection. For example, if the data constitute a simple random sample drawn from a large population, then the assumption of independence will be met to a close approximation. In contrast, if the data comprise a time series, then the assumption of independence may be very wrong.”

Jacob: Another textbook says that any two Y values are independent: each yj is normally distributed and independent. Does this mean

- that Yj – Yk is independent of j – k (that is, Yj+1 is independent of Yj), or
- that Yj – ŷj is independent of Yk – ŷk?

Rachel: yj = α + β1 xj1 + β2 xj2 + … + βk xjk + εj. The only random variable on the right side of the equation is εj, and ŷj = α + β1 xj1 + β2 xj2 + … + βk xjk, so if εj and εk are independent, yj – ŷj and yk – ŷk are independent.

The Fox textbook says: “Any pair of errors εi and εj (or, equivalently, conditional response variables Yi and Yj) are independent for i ≠ j.”

Explanatory variables: fixed or measured without error

Part E: The explanatory variables (X values) are fixed.

- In experimental studies, the statistician picks explanatory variables and observes the response variable.
- In observational studies, the explanatory variables are measured without error.

Jacob: Are actuarial pricing studies experimental studies?

- To see how sex affects claim frequency, one picks 100 male drivers and 100 female drivers.
- To see how age affects claim frequency, one picks 100 youthful drivers and 100 adult drivers.

Rachel: Actuarial pricing studies are observational studies, not experimental studies. The actuary observes male drivers and female drivers; the actuary does not create male drivers and female drivers.
Sex and age are attributes of the driver, not interventions.

Jacob: What is an intervention?

Rachel: In experimental studies, the researcher creates the explanatory variable. A clinical trial for a new drug uses 100 randomly selected subjects who are given the drug and 100 randomly selected subjects who are not given the drug (or are given a placebo). Any subject may be shifted between the two groups (control group vs treatment group). In contrast, a driver cannot be shifted between male and female drivers by changing sex.

Jacob: Why is the difference between attributes and interventions important?

Rachel: Attributes are often correlated with other (omitted) explanatory variables. Attributes may create biases in structural relations; interventions are less likely to create biases if they are truly random.

Illustration: An actuary regresses motor insurance claim frequency on urban vs rural territories and concludes that urban drivers have higher claim frequency than rural drivers. But territory is not the cause, so the study is biased. The actual causes of the territorial difference are the attributes of residents of cities vs rural areas.

Illustration: Duncan’s Canadian occupational prestige study is an observational study, as are most social science and actuarial studies.

Observational studies

In observational studies, the explanatory variable values are observed, not fixed by design.

Jacob: Are the observed x-values a random sample?

Rachel: The term random sample has several meanings. An actuary comparing urban vs rural drivers may use all drivers in the insurer’s data base or a random sample of those drivers. But this sample (or population) is not random. The insurer may write business in states or countries where urban drivers are poorer than rural or suburban drivers. The regression analysis may not apply to places where urban drivers are richer than rural or suburban drivers.
In some European and Asian countries, urban residents are wealthier than suburban or rural residents.

The statistical term randomized means that an intervention is applied randomly to subjects. The subjects have no differences other than the random application of the intervention.

Illustration: Patients are randomly assigned the new vs the old medication. Even the doctors and nurses do not know which patients receive which medication.

Accurate measurement

In observational studies, explanatory variables are measured without error and are independent of the errors. If the explanatory variables are random, or if we do not measure them precisely, our estimates have another source of random fluctuation.

Illustration: A statistician regresses home values on the area of the house or the lot. The regression depends on accurate measurement of the area. We assume the area is measured without errors.

Jacob: Is this assumption reasonable?

Rachel: It is often reasonable. Life insurance mortality rates and personal auto claim frequencies depend on the age and sex of the policyholder (or driver). Age and sex are not random variables, and we measure them without random errors. Age and sex are attributes, not interventions, so they are not independent of the causal variables affecting mortality rates and claim frequency, but they are measured without error.

Part F: X is not invariant: the data points have at least two X values. If the X value is the same for all data points, the least squares estimator of α is A = ȳ (the mean of the y-values).

Jacob: Do we infer that β = 0 if all the x-values are the same?

Rachel: We cannot infer anything about β, since no data show how a change in x affects y.

Jacob: Does invariance mean that no X values should be the same?

Rachel: On the contrary, many X values are the same in regression analyses. Suppose we regress personal auto claim frequency on sex and age of the driver and the territory in which the car is garaged.
Sex has two values, age may be one of three values (youthful, adult, retired), and the state may have ten territories. If the regression uses 100,000 cars, many sets of X values are identical.

**Exercise 11.2: Expected values**

Which of the following statements stem from the assumptions of classical regression analysis?

- xj is the value of the explanatory variable at point j.
- εj is the value of the error term at point j.
- yj is the observed value of the response variable at point j.
- ŷj is the fitted value of the response variable at point j.

A. The correlation of xj with εj is zero.
B. The correlation of ŷj with εj is zero.
C. The correlation of yj with εj is zero.

Part A: Classical regression analysis assumes the error term is independent of the explanatory variables.

Part B: ŷ = α + β × x, so the correlation of ŷ with ε is ρ(ε, α + β × x) = ρ(ε, α) + β × ρ(ε, x) = 0 + β × 0 = 0.

Part C: y = ŷ + ε, so ρ(y, ε) ≠ 0. When ε > 0, y > ŷ, and when ε < 0, y < ŷ. We state two implications:

- For any given observation j, εj and yj are random variables that are perfectly correlated.
- For all observations combined, εj and yj are random variables that are positively correlated.

**Exercise 11.3: Expected values**

Which of the following statements stem from the assumptions of classical regression analysis?

- xj is the value of the explanatory variable at point j.
- εj is the value of the error term at point j.
- yj is the observed value of the response variable at point j.
- ŷj is the fitted value of the response variable at point j.

A. The expected value of xj × εj is zero.
B. The expected value of yj × εj is zero.
C. The expected value of ŷj × εj is zero.

Solution 11.3: The solution to this exercise follows from the previous exercise.

Part A: ρ(xj, εj) = 0, so E(xj × εj) = 0.

Part B: ρ(yj, εj) > 0, so E(yj × εj) ≠ 0.

Part C: ρ(ŷj, εj) = 0, so E(ŷj × εj) = 0.

**Exercise 11.4: Least-squares coefficients**

A regression model with N observations and k explanatory variables is

yj = α + β1 xj1 + β2 xj2 + … + βk xjk + εj.

Under the assumptions of classical regression analysis, the least-squares coefficients have five attributes. Explain each of these attributes.

A. Linear functions of the data
B. Unbiased estimators of the population regression coefficients
C. The most efficient unbiased linear estimators of the population regression coefficients
D. The same as the maximum likelihood estimators
E. Normally distributed

Part A: The least squares coefficients are linear functions of the observed data.

Jacob: How are they linear functions? B = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)². This has terms of xi yi in the numerator and xi² in the denominator. A = ȳ – B × x̄, so A is also a function of both the x and y values.

Rachel: The observed data are the response variable (the Y values), not the explanatory variables (the X values).

- The explanatory variable is fixed, not a random variable.
- The error term is a random variable with a fixed (but usually unknown) variance σε².
- α and β are unknown population regression parameters, not random variables; we estimate them.
- The response variable yj is a random variable with
  - a mean of α + β × xj
  - a variance of σε²

The least squares estimator B is a linear function of the yj.

- If we repeat the regression analysis, B will differ from the first analysis, since the y-values differ.
- The x-values do not differ in the two experimental studies.
- If we repeat the regression analysis many times, the mean of the B values approaches β.

The xj determine the coefficients of the linear function of the y-values.

Jacob: Must we know these coefficients for the on-line course?

Rachel: The fitted values ŷj are A + B xj. A and B are linear functions of the observed y-values, so each ŷ value is also a linear function of all the observed y-values.
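As a quick numerical check of Rachel's point, the slope can be written as B = Σ wi yi with weights wi = (xi – x̄) / Σ(xi – x̄)². A minimal sketch (Python with numpy; the sample data are made up for illustration):

```python
import numpy as np

# Arbitrary made-up sample data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Weights depend only on the x-values, so B = sum(w_i * y_i)
# is a linear function of the observed y-values
w = (x - x.mean()) / ((x - x.mean()) ** 2).sum()
B_linear = w @ y

# Compare with the ordinary least squares slope
B_ols, A_ols = np.polyfit(x, y, 1)
print(B_linear, B_ols)   # the two slope computations agree
```

Repeating the regression changes the y-values (and hence B), but the weights wi stay fixed, which is why B is called a linear estimator.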
The module on hat-values gives the coefficients of this linear function.

Jacob: Most regression analyses in the social sciences (and in actuarial work) are observational studies, not experimental studies, where the x-values are chosen from observed values. Aren’t they random variables?

Rachel: Even for observational studies, where X values are observed (not fixed by the statistician), the values are measured without error. They are not random variables, since they have no standard error.

Illustration: An actuary regresses life insurance mortality rates on sex and age. The observed mortality rate is a random variable (with a binomial distribution). Sex and age may be sampled from the insurer’s data, but they are fixed quantities with no measurement error.

Illustration: A real estate broker regresses home prices on the number of rooms and the size of the home. The home price is a random variable, which may fluctuate from day to day, depending on the offers by potential buyers. The number of rooms and the size of the home depend on the homes available for sale, but they are fixed quantities with no measurement error.

Jacob: What difference does measurement error make?

Rachel: Classical regression analysis assumes Yj = α + β Xj + εj. If the Xj have measurement error, we are regressing Yj on (Xj + the measurement error). The inferences of the regression analysis, such as the variance of the error terms, no longer hold.

Part B: The least squares estimators are unbiased estimators of the population regression coefficients.

Jacob: β is a fixed but unknown population regression parameter; B is a random variable.

- If B = β, then B is correct.
- If B > β, B is too high.
- If B < β, B is too low.

If B > β, doesn’t that mean that B is biased upward? If B < β, doesn’t that mean that B is biased downward? We want the correct figures, not figures that are too high or too low.

Rachel: B is a random variable, with errors that are normally distributed.
B is never exactly equal to β because of random fluctuations, but the expected value of B is β.

Don’t confuse an over-estimate or under-estimate with a biased estimator.

- An estimate may be too high or too low even if the estimator is unbiased.
- An estimator can be biased or unbiased.

Illustration: Suppose the true linear relation is

motor insurance accident frequency = 0.0005% × distance driven each year (in miles or kilometers).

A regression gives B = 0.0007% on one sample and 0.0004% on another sample. If we repeat the regression an unlimited number of times, the average B should be 0.0005% (if the assumptions of classical regression analysis hold).

Jacob: Aren’t most estimators unbiased? Why would anyone use a biased estimator?

Rachel: Some estimators are unbiased, some are biased. Actuaries use biased estimators for many studies, since they may reduce the mean squared error.

Illustration: An actuary examines claim sizes, where the size-of-loss distribution is lognormal. Some actuaries use the median or a mean excluding the highest and lowest values to avoid distortion by random large losses. These estimators are useful, especially if the sample is small, but they are biased. The median and the ex-high-low mean understate the true mean of the lognormal distribution.

Part C: Suppose we have a random sample of ten numbers from a normal distribution with a mean of μ. We estimate the mean of the normal distribution three ways:

- Method #1: The average of the highest and lowest numbers.
- Method #2: The average of all numbers except the highest and lowest numbers.
- Method #3: The average of all the numbers.

If the numbers are normally distributed, all three methods are unbiased. We repeat the random sample 10,000 times, getting three estimates for each sample. Each method gives 10,000 estimates.

Let kBj be estimate #j for method #k. For example, 2B1,000 is the 1,000th estimate for method #2.
If all methods are unbiased, Σ 1Bj / 10,000 ≈ Σ 2Bj / 10,000 ≈ Σ 3Bj / 10,000 ≈ μ.

The squared error is (kBj – μ)², and the mean squared error is Σ (kBj – μ)² / 10,000. The three estimators do not have the same mean squared error. Rather

Σ(1Bj – μ)² / 10,000 > Σ(2Bj – μ)² / 10,000 > Σ(3Bj – μ)² / 10,000.

An estimator with a lower mean squared error is more efficient. The least squares estimators are the most efficient linear estimators.

Jacob: Perhaps one of the samples has an unusually high figure that occurs rarely. In this case, Method #2 might have a lower mean squared error than Method #3.

Rachel: You are right. Instead of 10,000, we should say N estimates. In the limit, as N → ∞,

Σ(1Bj – μ)² / N > Σ(2Bj – μ)² / N > Σ(3Bj – μ)² / N.

Jacob: Why is the word linear in the line “the least squares estimators are the most efficient linear estimators”?

Rachel: Other estimators, such as maximum likelihood estimators, may be more efficient, but they are not linear functions of the observed values. (But see the answer to Part D.)

Part D: The maximum likelihood estimators are the best estimators. Maximum likelihood is covered in later modules, not here. If the classical regression assumptions are true, the least squares estimators are also the maximum likelihood estimators.

Jacob: Part C says that non-linear estimators (like maximum likelihood estimators) may be more efficient than ordinary least squares estimators. Part D says that the maximum likelihood estimators are the best, and that ordinary least squares estimators are the same as the maximum likelihood estimators. Is this contradictory?

Rachel: Part C assumes linearity, constant variance, and independence; it does not assume that the error terms have a normal distribution. Part D assumes the error terms have a normal distribution.
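The mean-squared-error ordering in Part C can be checked by simulation. A minimal sketch (Python with numpy): the sample size of ten comes from the example, while μ = 100, the unit scale, the seed, and the 100,000 replications are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n_reps = 100.0, 100_000

# Many samples of ten numbers from a normal distribution with mean mu
samples = rng.normal(loc=mu, scale=1.0, size=(n_reps, 10))
samples.sort(axis=1)

est1 = (samples[:, 0] + samples[:, -1]) / 2   # Method 1: average of highest and lowest
est2 = samples[:, 1:-1].mean(axis=1)          # Method 2: drop highest and lowest
est3 = samples.mean(axis=1)                   # Method 3: average of all ten

mse = [((e - mu) ** 2).mean() for e in (est1, est2, est3)]
print(mse)   # MSE(method 1) > MSE(method 2) > MSE(method 3)
```

All three estimators average close to μ (unbiased), but the plain mean has the lowest mean squared error, matching the inequality in the dialogue.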
Fox says that when the error distribution is heavier-tailed than normal, the least squares estimators may be much less efficient than certain robust-regression estimators, which are not linear functions of the data.

Part E: The least squares estimators are normally distributed.

Jacob: An estimate is a scalar, like B = 1.183 or B = 25,000. B doesn’t have a distribution.

Rachel: Suppose the true β is 100. We take 10,000 samples of 10 data points each, and for each sample we compute B. The 10,000 B’s have a normal distribution with a mean of β.

Jacob: Why do we care about the distribution of the ordinary least squares estimators?

Rachel: The standard errors of the estimators, the t-values, the p-values, and the confidence intervals assume a normal distribution for the estimators. If the true distribution is not normal, the estimator may still be unbiased, but the formulas to calculate standard errors, t-values, p-values, and confidence intervals are not correct.

**Exercise 11.5: Empirical association vs causal relation**

Fox distinguishes between empirical association and causal relation.

If an explanatory variable X2 is omitted from a regression equation, the incomplete regression equation (using only X1) is biased if the following three conditions are true. Explain each of them.

A. The regression equation represents a causal relation (a structural model), not an empirical association.
B. The omitted explanatory variable X2 is a cause of the response variable Y.
C. The omitted explanatory variable X2 is correlated with the explanatory variable X1.

Part A: Suppose the response variable is the annual trend in workers’ compensation loss costs, and the two explanatory variables are X1 = wage inflation and X2 = medical inflation. Workers’ compensation has two pieces: indemnity benefits, which increase with wage inflation, and medical benefits, which increase with medical inflation.
For simplicity, suppose indemnity and medical benefits are each 50% of total benefits, and no other items besides wage and medical inflation affect workers' compensation loss costs.

The proper regression equation is Y = 0.500 X1 + 0.500 X2 + ε.

Wage inflation and medical inflation are correlated, since both reflect monetary inflation. Suppose their average values are 5% per annum, their standard deviations are equal, and their correlation is 80%.

An actuary uses the regression equation Y = α + β1 X1 + ε.

The least squares estimators are (approximately) A = 0.5% and B1 = 0.900.

Question: Is this regression equation biased?

Answer: If the regression equation represents an empirical association, it is correct. Wage inflation (X1) picks up a coefficient of 0.9 on loss costs: 0.5 through indemnity benefits plus 0.5 × 80% = 0.4 through its association with medical benefits. If the regression equation represents a causal relation, it is biased. If wage inflation increases 1% but medical inflation does not change, workers' compensation loss costs increase 0.5%, not 0.9%.

Jacob: The population regression parameter α is zero, so shouldn't A be zero as well?

Rachel: Only 80% of medical inflation's effect is picked up through the coefficient on X1, and the regression has no β2 parameter. The leftover average effect of medical inflation, 50% × 20% × 5% = 0.5%, is absorbed into A. This is consistent with the relation ȳ = A + B1 x̄1: A = 5% – 0.900 × 5% = 0.5%.

Part B: The second and third conditions are that

- The omitted explanatory variable X2 is a cause of the response variable Y.
- The omitted explanatory variable X2 is correlated with another explanatory variable X1.

Question: Why does this say that X2 is a cause of the response variable Y but is just correlated with X1?

Answer: Suppose we regress workers' compensation loss cost trends on inflation. Assume (for simplicity) that inflation causes the loss cost trend and also determines the nominal interest rate. That is, 1% higher inflation causes a 1% higher trend and a 1% higher nominal interest rate. In this example, X1 = inflation, X2 = interest rate, and Y = loss cost trend.
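The workers' compensation numbers in Part A can be checked by simulation. This is a hedged sketch with assumed inputs (equal standard deviations for the two inflation rates, an arbitrary noise level); the fitted slope should come out near 0.5 + 0.5 × 0.8 = 0.9, and the intercept near 5% – 0.9 × 5% = 0.5%, since ȳ = A + B1 x̄1.

```python
import numpy as np

# Hedged sketch: omitted-variable bias in the workers' compensation example.
# Assumed for the demo: X1 (wage inflation) and X2 (medical inflation) have
# mean 5%, equal standard deviations, and correlation 0.8; the true model is
# Y = 0.5*X1 + 0.5*X2 + error.
rng = np.random.default_rng(1)
n = 100_000
mu, sd, rho = 0.05, 0.01, 0.8
cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
x1, x2 = rng.multivariate_normal([mu, mu], cov, size=n).T
y = 0.5 * x1 + 0.5 * x2 + rng.normal(0.0, 0.002, size=n)

# Regress Y on X1 alone (np.polyfit returns [slope, intercept]).
b1, a = np.polyfit(x1, y, 1)
print(b1, a)   # roughly 0.9 and 0.005 (i.e., 0.5%)
```

The slope 0.9 is correct as an empirical association but overstates the causal effect of wage inflation, which is 0.5.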
X2 is omitted from the regression equation, and it is correlated with Y and X1, but no bias exists, since interest rates do not cause loss cost trends.

** Exercise 11.6: Bias

A statistician regresses Y on explanatory variable X1 but does not use a second explanatory variable X2. The regression line is Y = α + β X1 + ε.

Under what combination of the following conditions is the estimate of β biased?

A. ρ(Y, X1) = 0
B. ρ(Y, X2) = 0
C. ρ(Y, X1) ≠ 0
D. ρ(Y, X2) ≠ 0
E. ρ(X1, X2) = 0
F. ρ(X1, X2) ≠ 0
G. X1 has a causal effect on Y
H. X2 has a causal effect on Y
I. X1 does not have a causal effect on Y
J. X2 does not have a causal effect on Y
K. The regression equation is structural (the explanatory variables cause the response variable)
L. The regression equation is associative (the explanatory variables are correlated with the response variable)

Solution 11.6: F, H, and K

Suppose higher money supply growth causes higher inflation, which causes higher nominal interest rates.

Type of relation: An associative relation gives the empirical relation of X1 and Y; ignoring other variables is not relevant. Nominal interest rates are empirically related to the money supply growth rate, so a regression of interest rates on money supply growth is fine as an association. But inflation is the cause of higher nominal interest rates. The regression of nominal interest rates on the money supply growth rate is not a proper causal relation. It is biased, since it ignores the effect of inflation.

Causal effect: The nominal interest rate and the inflation rate are correlated. A structural regression of interest rates on the money supply growth rate is biased, since the omitted variable (inflation) is a cause of higher interest rates.
A structural regression of inflation rates on the money supply growth rate is not biased, since the omitted variable (interest rates) is not a cause of higher inflation rates.

Correlation: A structural regression of interest rates on the money supply growth rate is biased, since the omitted variable (inflation) is correlated with the money supply growth rate.

** Question 11.7: Observational studies

In observational studies, the X values are sampled, not fixed by design. In these studies, which of the following is an assumption of classical regression analysis?

A. The explanatory variable is measured without error and is independent of the error.
B. The explanatory variable and the error are independent and unbiased.
C. The explanatory variable and the error are independent and efficient.
D. The error is a linear function of the explanatory variable.
E. The regression assumptions are not satisfied in observational studies.

Answer 11.7: A

Jacob: What are observational studies vs experimental studies?

Rachel: Suppose we want to assess the effects of diet and exercise on weight gain or loss.

In an experimental study, we specify the diet and exercise regime for N persons. The first person might be given a diet of 1,000 calories each of carbohydrates, fat, and proteins, with a 30-minute exercise regime. The second might be given a diet of 1,500 calories each of carbohydrates and proteins (with no fats), with a 60-minute exercise regime. For each person, the explanatory variables are fixed by the research study. We then observe the weight gain or loss during the period.

In an observational study, we observe N persons. For each person, we record the diet and the exercise each day. We then observe the weight gain or loss during the period.

Jacob: What is the advantage of observational studies?

Rachel: It is relatively easy to record the diet and exercise of each person. Each person records all food and exercise in a diary.
It is not easy to specify what each person will eat every day, or what exercise each person will do.

Jacob: What are the drawbacks of observational studies?

Rachel: The drawbacks are measurement error, low spread of the explanatory variables, and bias.

Measurement error: People tend to mis-estimate or omit personal data. In a study of weight gain, a person may say he had a small slice of pizza instead of a large one, or may leave out an ice cream sundae.

Spread: Most people have similar percentages of carbohydrates, fats, and protein. Observational studies with low variances of the explanatory variables have high standard errors of the regression coefficients.

Bias: People who eat more are often larger, so they may gain more weight. To hold other explanatory variables constant, experimental studies randomly assign diets to a sample of people.

Jacob: For experimental studies, are the X values random variables?

Rachel: For experimental studies, the X values are fixed by design.
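Rachel's point about spread can be made concrete: the standard deviation of the least squares slope is σ / √Σ(xj – x̄)², so a narrow range of observed X values inflates it. A hedged sketch with illustrative designs and noise level:

```python
import numpy as np

# Hedged sketch: the sampling standard deviation of the OLS slope shrinks as
# the spread of the x-values grows. Designs and noise level are illustrative.
rng = np.random.default_rng(2)
beta, sigma, n, n_samples = 2.0, 1.0, 20, 5_000

def slope_sd(x):
    """Empirical standard deviation of the OLS slope over repeated samples."""
    sxx = np.sum((x - x.mean()) ** 2)
    slopes = [
        np.sum((x - x.mean()) * (beta * x + rng.normal(0.0, sigma, n))) / sxx
        for _ in range(n_samples)
    ]
    return float(np.std(slopes))

sd_wide = slope_sd(np.linspace(0.0, 10.0, n))    # experiment: wide spread
sd_narrow = slope_sd(np.linspace(4.5, 5.5, n))   # observation: narrow spread
print(sd_wide, sd_narrow)   # the narrow design gives a much larger sd
```

With these inputs the narrow design's slope is roughly ten times noisier, matching the σ / √Sxx formula, since Sxx shrinks by a factor of 100.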