Generalized Linear Models
The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution.
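The definition above can be written compactly. The notation below is a sketch using conventional symbols that this tutorial does not itself introduce: $y_i$ is the response, $\mathbf{x}_i$ the vector of predictors, $\boldsymbol{\beta}$ the coefficients, and $g$ the link function.

```latex
\begin{aligned}
\mu_i  &= \mathrm{E}(y_i)                         && \text{mean of the response} \\
\eta_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta}   && \text{linear predictor} \\
g(\mu_i) &= \eta_i                                && \text{link function ties the two together}
\end{aligned}
% Common links:
%   identity: g(\mu) = \mu
%   logit:    g(\mu) = \log\bigl(\mu / (1 - \mu)\bigr)
%   log:      g(\mu) = \log \mu
```

With the identity link and a normal response, this reduces to the classical linear model $\mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}$.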
Examples of Generalized Linear Models
Traditional Linear Model
- response variable: a continuous variable
- distribution: normal
- link function: identity
Logistic Regression
- response variable: a proportion
- distribution: binomial
- link function: logit
Poisson Regression in Log-Linear Model
- response variable: a count
- distribution: Poisson
- link function: log
Gamma Model with Log Link
- response variable: a positive, continuous variable
- distribution: gamma
- link function: log
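Each of the four examples above corresponds to a choice of DIST= and LINK= on the MODEL statement. The following is a minimal sketch, not part of the original tutorial: the data set work.glm_demo, all of its variable names, and all of its values are invented purely for illustration.

```sas
/* Toy data, invented for illustration only */
DATA work.glm_demo;
   INPUT x y_cont events trials n_count y_pos;
   DATALINES;
1  2.3 1 10  1  1.2
2  3.9 2 10  2  1.9
3  6.2 3 10  2  2.4
4  7.8 4 10  4  3.8
5 10.1 5 10  5  4.6
6 12.2 6 10  7  6.9
7 13.8 8 10  9  9.5
8 16.1 9 10 13 12.8
;
RUN;

/* Traditional linear model: normal distribution, identity link */
PROC GENMOD DATA=work.glm_demo;
   MODEL y_cont = x / DIST=NORMAL LINK=IDENTITY;
RUN;

/* Logistic regression: the proportion is given as events/trials */
PROC GENMOD DATA=work.glm_demo;
   MODEL events/trials = x / DIST=BINOMIAL LINK=LOGIT;
RUN;

/* Poisson regression: count response, log link */
PROC GENMOD DATA=work.glm_demo;
   MODEL n_count = x / DIST=POISSON LINK=LOG;
RUN;

/* Gamma model: positive continuous response, log link */
PROC GENMOD DATA=work.glm_demo;
   MODEL y_pos = x / DIST=GAMMA LINK=LOG;
RUN;
```

Only the response variable and the two options change between the four calls; the rest of the procedure syntax stays the same.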
Example: Gas Mileage
Description: Gas mileage, horsepower, and other information for 392 vehicles.
mpg: Miles per gallon.
cylinders: Number of cylinders (between 4 and 8).
displacement: Engine displacement (cu. inches).
horsepower: Engine horsepower.
weight: Vehicle weight (lbs.).
acceleration: Time to accelerate from 0 to 60 mph (sec.).
year: Model year.
origin: Origin of car (1 = American, 2 = European, 3 = Japanese).
Goal: Develop a regression model to estimate mpg.
Download the data from here
Task: Which variables affect mpg?
Let's start with the simplest PROC GENMOD statements and include every variable except origin as a predictor of mpg.
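PROC GENMOD DATA=tutorial.auto;
   /* No DIST= or LINK= option is given, so GENMOD uses its defaults: normal distribution, identity link */
   MODEL mpg = cylinders displacement horsepower weight acceleration year;
RUN;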
Model Information | |
---|---|
Data Set | TUTORIAL.AUTO |
Distribution | Normal |
Link Function | Identity |
Dependent Variable | mpg |

Number of Observations | |
---|---|
Number of Observations Read | 392 |
Number of Observations Used | 392 |
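The Model Information table reports a normal distribution with an identity link even though neither was requested, because these are the GENMOD defaults. As a sketch, the same model can be written with both options stated explicitly, which should reproduce the output shown here:

```sas
/* Same model as above, with the default distribution and link spelled out */
PROC GENMOD DATA=tutorial.auto;
   MODEL mpg = cylinders displacement horsepower weight acceleration year
         / DIST=NORMAL LINK=IDENTITY;
RUN;
```

Stating the options explicitly makes it easier to switch later to another distribution or link by changing a single line.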
Criteria For Assessing Goodness Of Fit | | | |
---|---|---|---|
Criterion | DF | Value | Value/DF |
Deviance | 385 | 4543.3470 | 11.8009 |
Scaled Deviance | 385 | 392.0000 | 1.0182 |
Pearson Chi-Square | 385 | 4543.3470 | 11.8009 |
Scaled Pearson X2 | 385 | 392.0000 | 1.0182 |
Log Likelihood | | -1036.4548 | |
Full Log Likelihood | | -1036.4548 | |
AIC (smaller is better) | | 2088.9095 | |
AICC (smaller is better) | | 2089.2855 | |
BIC (smaller is better) | | 2120.6796 | |

Algorithm converged.
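The fit criteria can be checked by hand from the log likelihood. The arithmetic below is a sketch that assumes the standard definitions of AIC, AICC, and BIC, which this tutorial does not state, with $p = 8$ estimated parameters (the intercept, six slopes, and the scale), $n = 392$ observations, and log likelihood $\ell = -1036.4548$. Small differences in the last digit come from rounding in the printed output.

```latex
\begin{aligned}
\text{Deviance}/\text{DF} &= \frac{4543.3470}{392 - 7} = \frac{4543.3470}{385} \approx 11.80 \\[4pt]
\text{AIC}  &= -2\ell + 2p = 2072.9096 + 16 \approx 2088.91 \\[4pt]
\text{AICC} &= \text{AIC} + \frac{2p(p+1)}{n - p - 1} = 2088.9096 + \frac{144}{383} \approx 2089.29 \\[4pt]
\text{BIC}  &= -2\ell + p \ln n = 2072.9096 + 8 \ln 392 \approx 2120.68
\end{aligned}
```

The scaled deviance of 392 follows the same way: dividing the deviance by the squared scale estimate gives $4543.3470 / 3.4044^2 \approx 392$.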
Analysis Of Maximum Likelihood Parameter Estimates | | | | | | | |
---|---|---|---|---|---|---|---|
Parameter | DF | Estimate | Standard Error | Wald 95% Lower Confidence Limit | Wald 95% Upper Confidence Limit | Wald Chi-Square | Pr > ChiSq |
Intercept | 1 | -14.5353 | 4.7212 | -23.7885 | -5.2820 | 9.48 | 0.0021 |
cylinders | 1 | -0.3299 | 0.3291 | -0.9749 | 0.3152 | 1.00 | 0.3162 |
displacement | 1 | 0.0077 | 0.0073 | -0.0066 | 0.0220 | 1.11 | 0.2923 |
horsepower | 1 | -0.0004 | 0.0137 | -0.0273 | 0.0265 | 0.00 | 0.9772 |
weight | 1 | -0.0068 | 0.0007 | -0.0081 | -0.0055 | 104.71 | <.0001 |
acceleration | 1 | 0.0853 | 0.1011 | -0.1129 | 0.2835 | 0.71 | 0.3991 |
year | 1 | 0.7534 | 0.0521 | 0.6512 | 0.8556 | 208.72 | <.0001 |
Scale | 1 | 3.4044 | 0.1216 | 3.1743 | 3.6513 | | |

Note: The scale parameter was estimated by maximum likelihood.
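The Wald columns for the regression coefficients follow directly from each estimate and its standard error. The worked example below uses the Intercept row and assumes the usual Wald formulas, which the tutorial does not spell out; results are rounded to two decimals because the printed inputs are themselves rounded.

```latex
\begin{aligned}
\text{Wald } \chi^2 &= \left(\frac{\hat\beta}{\mathrm{SE}(\hat\beta)}\right)^{2}
  = \left(\frac{-14.5353}{4.7212}\right)^{2} \approx 9.48 \\[4pt]
\text{95\% limits} &= \hat\beta \pm 1.96 \,\mathrm{SE}(\hat\beta)
  = -14.5353 \pm 1.96 \times 4.7212 \approx (-23.79,\; -5.28)
\end{aligned}
```

Read this way, only weight and year (and the intercept) have confidence limits that exclude zero; the intervals for cylinders, displacement, horsepower, and acceleration all contain zero, which matches their large p-values.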