## RA module 21: Structure of GLMs practice problems

(The attached PDF file has better formatting.)

Fox, Regression analysis, Chapter 15: Structure of generalized linear models

** Exercise 21.1: Components of generalized linear models

A generalized linear model has three components: a linear predictor, a link function, and a random component (a conditional distribution of the response variable).

A. What is a linear predictor?
B. What is a link function?
C. What is the random component?
D. For classical regression analysis, what are these three elements?

Part A: The linear predictor is a linear function of the regressors, ηj = α + β1 X1j + β2 X2j + … + βk Xkj. ηj is a function of the fitted value, not necessarily the fitted value itself.

Part B: The link function is a smooth, invertible linearizing function g(⋅), which transforms the expectation of the response variable, μj = E(Yj), to the linear predictor: g(μj) = ηj.

Illustration: For a log link function, if the linear predictor ηj = α + β1 X1j + β2 X2j + … + βk Xkj = 2, the fitted value is e² = 7.389, since ln(7.389) = 2.

Part C: The random component specifies the conditional distribution of the response variable Yj (for the jth of n independently sampled observations), given the values of the explanatory variables in the model.

Illustration: For a Poisson GLM with a log link function, if the linear predictor ηj = 2, Yj has a Poisson distribution with a mean of e² = 7.389.

Part D: For classical regression, the linear predictor is the same. The link function is the identity function: ηj = μj = E(Yj). The random component is a normal distribution with the same variance at every point.

Jacob: What types of link functions should we know?
Rachel: Know the log link and logit link functions.

Jacob: What types of conditional distributions should we know?
Rachel: Know the Poisson, Gamma, and binomial distributions.
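The log-link illustration above can be sketched in a few lines of Python. This is a minimal sketch, not part of the exercise: the coefficient and covariate values are made-up numbers chosen so that the linear predictor equals 2.

```python
import math

# Linear predictor for one observation: eta_j = alpha + beta1*x1 + beta2*x2
# (alpha, beta1, beta2, x1, x2 are made-up illustration values)
alpha, beta1, beta2 = 0.5, 1.0, 0.25
x1, x2 = 1.0, 2.0
eta = alpha + beta1 * x1 + beta2 * x2   # eta = 2.0

# Log link: g(mu) = ln(mu) = eta, so the fitted mean is the inverse link, exp(eta)
mu = math.exp(eta)
print(round(mu, 3))                     # 7.389, since ln(7.389) ≈ 2

# Random component: Y_j is Poisson with mean mu; e.g. the probability that Y_j = 7
p7 = mu**7 * math.exp(-mu) / math.factorial(7)
print(round(p7, 4))
```

The fitted value μj is obtained by applying the inverse link to the linear predictor; the Poisson distribution with that mean is the random component.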
Jacob: The textbook does not give formulas for solving GLMs. Do we have to solve GLMs for the final exam?
Rachel: One can't solve GLMs by pencil and paper. The final exam tests the GLM concepts in the practice problems on the discussion forum; it does not give data and ask for the GLM coefficients.

** Exercise 21.2: Fitting generalized linear models

A. How are generalized linear models fit to observed data?
B. How does this differ from classical regression analysis?

Part A: Fitting a distribution to observed values has two parts:

- Choose the distribution, such as normal, Poisson, Gamma, or binomial. This is the conditional distribution of the response variable.
- Choose the parameters of the distribution to maximize the likelihood (the probability) of observing the empirical data.

Jacob: For classical regression analysis, do we choose a conditional distribution of the response variable?
Rachel: Yes, we choose a normal distribution with a constant variance.

Jacob: Are there statistical methods to choose the conditional distribution?
Rachel: We use intuition and the relation of the variance to the mean. Intuition: for probabilities, we use a binomial distribution. For counts, we might use a Poisson distribution or a negative binomial distribution. For stock prices or claim severities, we might use a lognormal distribution or a Gamma distribution.

Part B: Classical regression analysis assumes the distribution is a normal distribution with the same variance at every point. With this assumption, maximizing the likelihood is the same as minimizing the squared error of the residuals. Ordinary least squares estimation for a normal distribution is maximum likelihood estimation.

Jacob: Do we maximize the likelihood or the loglikelihood?
Rachel: The loglikelihood is a monotonic function of the likelihood. If we have points f(xj), where f is a function of x, the value xj which maximizes f(xj) is also the value which maximizes ln( f(xj) ).
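Rachel's point that the likelihood and the loglikelihood are maximized at the same parameter value can be checked numerically. A minimal sketch with made-up count data and a grid search over the Poisson mean μ (the MLE for a Poisson mean is the sample average):

```python
import math

# Made-up observed counts for illustration
data = [3, 5, 4, 6, 2]

def loglik(mu):
    # Poisson loglikelihood: sum over observations of ln( mu^x e^-mu / x! )
    return sum(x * math.log(mu) - mu - math.log(math.factorial(x)) for x in data)

def lik(mu):
    # Likelihood = exp(loglikelihood)
    return math.exp(loglik(mu))

# Grid search over candidate means from 1.00 to 8.00
grid = [m / 100 for m in range(100, 801)]
best_lik = max(grid, key=lik)
best_ll = max(grid, key=loglik)
print(best_lik, best_ll)   # both equal the sample mean, 4.0
```

Because ln(⋅) is monotonic, the same μ maximizes both functions; in practice one works with the loglikelihood, which turns products into sums and avoids numeric underflow.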
Jacob: Is this the same as minimizing the residual deviance?
Rachel: The residual deviance is 2 × (K – loglikelihood of the fitted model), where K is the loglikelihood of the saturated model, a constant for a given data set. The parameter values which maximize the likelihood also maximize the loglikelihood and minimize the residual deviance.

** Exercise 21.3: Link function

An actuary uses a generalized linear model to relate the response variable Yj to two explanatory variables (covariates), X1 and X2.

- Let μj be the expected value of the response variable at observation j.
- Let ηj be the linear predictor at observation j.

The intercept of the GLM is α, and the coefficients of X1 and X2 are β1 and β2.

A. What is the linear predictor ηj?
B. For a log link function g(), what is the relation between μj and the independent variables?
C. For a logit link function g(), what is the relation between μj and the independent variables?

Part A: The linear predictor at observation j is ηj = α + β1 X1j + β2 X2j.

Jacob: GLMs differ from classical regression in that they are used for multiplicative models, probability models with dichotomous random variables, and models of skewed distributions. Yet this linear predictor is the same as the formula in classical regression analysis.
Rachel: GLMs have three parts: a linear predictor, a link function, and a conditional distribution of the response variable. The linear predictor is the same as for classical regression analysis.

Part B: g(μj) = ln(μj) = ηj = α + β1 X1j + β2 X2j

Jacob: For classical regression analysis, do we say Yj = α + β1 X1j + β2 X2j?
Rachel: The observed value Yj is a random variable; it is not equal to an expression of scalars. For classical regression analysis, we write Yj = α + β1 X1j + β2 X2j + εj, where εj has a normal distribution with a mean of zero and the same variance for all values of the explanatory variables.
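The classical-regression special case Rachel describes can be simulated directly: generate Yj as the linear predictor plus a constant-variance normal error, then recover the coefficients by ordinary least squares (which, as Exercise 21.2 notes, is maximum likelihood under the normal assumption). A minimal sketch with one explanatory variable and made-up parameter values:

```python
import random

random.seed(0)

# Classical regression as a GLM: identity link, normal random component.
# Simulate Y_j = alpha + beta1 * X_j + eps_j, eps_j ~ Normal(0, sigma^2).
# (alpha, beta1, and sigma are made-up illustration values.)
alpha, beta1, sigma = 1.0, 2.0, 0.5
xs = [i / 10 for i in range(100)]
ys = [alpha + beta1 * x + random.gauss(0.0, sigma) for x in xs]

# Closed-form OLS estimates (= MLE under the normal assumption)
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
a = ybar - b1 * xbar
print(a, b1)   # close to the true alpha = 1.0 and beta1 = 2.0
```

The fitted coefficients approximate the true α and β1; the random component εj is why Yj itself is not equal to the linear predictor.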
GLMs use an identity link function for classical regression analysis, where g(x) = x: g(μj) = μj = α + β1 X1j + β2 X2j.

Jacob: Why is the log link function so often used in GLMs?
Rachel: Many relations are multiplicative. For example, personal auto insurance premiums depend on driver characteristics (like male vs female) and territory (like urban vs rural). The insurance rates form a multiplicative model: the male rate may be twice the female rate, and the urban rate may be three times the rural rate. The log link function gives a multiplicative model:

ln(μj) = ηj = α + β1 X1j + β2 X2j
⇒ μj = exp(α + β1 X1j + β2 X2j)
⇒ μj = exp(α) × exp(β1 X1j) × exp(β2 X2j)

Define new parameters:

- α′ = exp(α) = the base rate
- β1′ = exp(β1) = the male/female relativity
- β2′ = exp(β2) = the urban/rural relativity

Part C: g(μj) = ln[ μj / (1 – μj) ] = ηj = α + β1 X1j + β2 X2j

Jacob: What is the rationale for the logit link function? Log odds may be relevant to horse racing or Las Vegas casinos, but they have no intuitive relation to actuarial distributions.
Rachel: That is true, and the logit link function is not appropriate for all actuarial distributions. But the logit link function has the proper form: it converts a range from 0 to 1 to a range from –∞ to +∞.

** Exercise 21.4: Link function

An actuary uses a generalized linear model with a log link function to relate the response variable Yj to two explanatory variables, X1 and X2. Let μj be the expected value of the response variable at observation j. The intercept of the GLM is α, and the coefficients of X1 and X2 are β1 and β2.

A. What is the relation of the explanatory variables to the response variable using the link function?
B. What is the relation of the explanatory variables to the response variable using the inverse of the link function?
Part A: ln(μj) = α + β1 X1 + β2 X2

Part B: μj = exp(α + β1 X1 + β2 X2)

Jacob: Why don't we use ln(Yj) = α + β1 X1 + β2 X2 and Yj = exp(α + β1 X1 + β2 X2)?
Rachel: Yj is a random variable: it is the linear predictor transformed by the inverse of the link function plus a random component.

** Exercise 21.5: Likelihoods

A. What is the range of a likelihood?
B. What is the range of a loglikelihood?
C. What is meant by a saturated model?
D. What is the relation of the likelihood for the saturated model vs any other model?
E. What is the relation of the loglikelihood for the saturated model vs any other model?

Part A: Suppose Y is a function of X. As an example, let Y be the Poisson probability for X:

- pdf(x | μ) = μ^x e^(–μ) / x!
- likelihood(μ | x) = μ^x e^(–μ) / x!

The pdf (probability density function) and the likelihood have a range of [0, 1].

Part B: The logarithm of 0 is –∞ and the logarithm of 1 is 0, so the range of the loglikelihood is (–∞, 0].

Part C: A saturated model has the fitted value equal to the observed value at every point. If a regression equation or a GLM has N points, the saturated model has N parameters and 0 degrees of freedom.

Part D: The likelihood is greatest when μ = the observed value. For a given x, the value of μ^x e^(–μ) / x! is maximized at μ = x. If Ls is the likelihood for the saturated model and Lm is the likelihood for any other model (with the same type of conditional distribution for the response variable but different parameters), then 0 ≤ Lm ≤ Ls ≤ 1.

Part E: If LLs is the loglikelihood for the saturated model and LLm is the loglikelihood for any other model (with the same type of conditional distribution for the response variable but different parameters), then –∞ ≤ LLm ≤ LLs ≤ 0.

Note: This exercise uses ≤ (less than or equal to). Some exam problems use < (less than). Both relations are correct, as long as the model in question is not the saturated model.
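The inequality 0 ≤ Lm ≤ Ls ≤ 1 can be verified numerically for the Poisson example: set each fitted value equal to the observed value (the saturated model) and compare against a model with a single common mean. A minimal sketch with made-up counts:

```python
import math

def poisson_lik(mu, x):
    # Poisson probability of observing x when the mean is mu
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Made-up observed counts for illustration
data = [2, 5, 3, 7]

# Saturated model: fitted value = observed value at every point
L_s = math.prod(poisson_lik(x, x) for x in data)

# Any other model, e.g. one common mean for all points (the sample mean here)
mu_hat = sum(data) / len(data)
L_m = math.prod(poisson_lik(mu_hat, x) for x in data)

print(0 <= L_m <= L_s <= 1)   # True: the saturated model has the greatest likelihood
```

Each factor μ^x e^(–μ)/x! is maximized at μ = x, so the saturated model maximizes every factor simultaneously and no other choice of means can exceed its likelihood.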
Attachment: Fox Regression analysis Chapter 15 Structure of Glms df.pdf