Generalized Linear Models

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution.
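The three ingredients named above (linear predictor, link function, and response distribution) can be written compactly. The notation below is an assumption introduced for illustration and does not come from the original text: $y_i$ is the response for observation $i$, $x_i$ its covariate vector, $\beta$ the coefficient vector, $g$ the link function, $V$ the variance function, and $\phi$ the dispersion parameter.

```latex
\begin{aligned}
\eta_i &= x_i^{\top}\beta
  && \text{(linear predictor)} \\
g(\mu_i) &= \eta_i, \qquad \mu_i = \mathrm{E}[y_i]
  && \text{(link function)} \\
\mathrm{Var}(y_i) &= \phi\, V(\mu_i)
  && \text{(variance implied by the exponential-family distribution)}
\end{aligned}
```

With the identity link $g(\mu) = \mu$ and a normal response, where $V(\mu) = 1$ and $\phi = \sigma^2$, this reduces to the classical linear model.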

Examples of Generalized Linear Models
Traditional Linear Model
  • response variable: a continuous variable
  • distribution: normal
  • link function: identity
Logistic Regression
  • response variable: a proportion
  • distribution: binomial
  • link function: logit
Poisson Regression in Log-Linear Model
  • response variable: a count
  • distribution: Poisson
  • link function: log
Gamma Model with Log Link
  • response variable: a positive, continuous variable
  • distribution: gamma
  • link function: log
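Each of the four examples above corresponds to a choice of the DIST= and LINK= options on the MODEL statement. The following is a minimal sketch rather than a definitive recipe: the data set WORK.MYDATA and the variables y, y_pos, events, trials, n_count, x1, and x2 are hypothetical names used only for illustration.

```sas
/* Hypothetical data set WORK.MYDATA and variable names; adjust to your data. */

/* Traditional linear model: normal distribution, identity link */
PROC GENMOD DATA=work.mydata;
  MODEL y = x1 x2 / DIST=NORMAL LINK=IDENTITY;
RUN;

/* Logistic regression: the response is a proportion (events out of trials) */
PROC GENMOD DATA=work.mydata;
  MODEL events/trials = x1 x2 / DIST=BINOMIAL LINK=LOGIT;
RUN;

/* Poisson regression: the response is a count */
PROC GENMOD DATA=work.mydata;
  MODEL n_count = x1 x2 / DIST=POISSON LINK=LOG;
RUN;

/* Gamma model with log link: the response is positive and continuous */
PROC GENMOD DATA=work.mydata;
  MODEL y_pos = x1 x2 / DIST=GAMMA LINK=LOG;
RUN;
```

Only the options after the slash change between the four calls; the rest of the procedure syntax stays the same.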
Example: Gas Mileage

Description: Gas mileage, horsepower, and other information for 392 vehicles.

mpg: Miles per gallon.
cylinders: Number of cylinders, between 4 and 8.
displacement: Engine displacement (cu. inches).
horsepower: Engine horsepower.
weight: Vehicle weight (lbs.).
acceleration: Time to accelerate from 0 to 60 mph (sec.).
year: Model year.
origin: Origin of car (1 = American, 2 = European, 3 = Japanese).

Download the data from here

Task: Develop a regression model to estimate mpg. Which variables affect gas mileage?

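Let's start with the simplest PROC GENMOD call: a single MODEL statement that includes every variable except origin. Because no DIST= or LINK= option is specified, PROC GENMOD uses its defaults, a normal distribution with the identity link.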

PROC GENMOD DATA=tutorial.auto;
  MODEL mpg = cylinders displacement horsepower weight acceleration year;
RUN;
The GENMOD Procedure

Model Information
Data Set                        TUTORIAL.AUTO
Distribution                    Normal
Link Function                   Identity
Dependent Variable              mpg

Number of Observations Read     392
Number of Observations Used     392

Criteria For Assessing Goodness Of Fit
Criterion                    DF         Value    Value/DF
Deviance                    385     4543.3470     11.8009
Scaled Deviance             385      392.0000      1.0182
Pearson Chi-Square          385     4543.3470     11.8009
Scaled Pearson X2           385      392.0000      1.0182
Log Likelihood                     -1036.4548
Full Log Likelihood                -1036.4548
AIC (smaller is better)             2088.9095
AICC (smaller is better)            2089.2855
BIC (smaller is better)             2120.6796

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates
Parameter      DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept       1   -14.5353           4.7212      -23.7885        -5.2820              9.48       0.0021
cylinders       1    -0.3299           0.3291       -0.9749         0.3152              1.00       0.3162
displacement    1     0.0077           0.0073       -0.0066         0.0220              1.11       0.2923
horsepower      1    -0.0004           0.0137       -0.0273         0.0265              0.00       0.9772
weight          1    -0.0068           0.0007       -0.0081        -0.0055            104.71       <.0001
acceleration    1     0.0853           0.1011       -0.1129         0.2835              0.71       0.3991
year            1     0.7534           0.0521        0.6512         0.8556            208.72       <.0001
Scale           1     3.4044           0.1216        3.1743         3.6513

Note: The scale parameter was estimated by maximum likelihood.
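
The numbers in the output above can be cross-checked by hand. The worked arithmetic below is a sketch, assuming that for the normal distribution the maximum likelihood scale estimate is the square root of the deviance divided by the number of observations, and that the Wald chi-square is the squared ratio of an estimate to its standard error. The symbols $D$, $n$, and $\hat{\sigma}$ are notation introduced here, not part of the original output.

```latex
\begin{aligned}
\hat{\sigma} &= \sqrt{D / n} = \sqrt{4543.3470 / 392} \approx 3.4044
  && \text{(Scale estimate)} \\
D / \hat{\sigma}^{2} &= n = 392
  && \text{(Scaled Deviance)} \\
392 / 385 &\approx 1.0182
  && \text{(Scaled Deviance, Value/DF)} \\
4543.3470 / 385 &\approx 11.8009
  && \text{(Deviance, Value/DF)} \\
\left( \frac{-14.5353}{4.7212} \right)^{2} &\approx 9.48
  && \text{(Wald chi-square for Intercept)}
\end{aligned}
```

Repeating the Wald calculation with the printed values for other rows gives slightly different results (for example, weight), because the displayed estimates and standard errors are rounded to four decimal places.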