Structural Equations

Structural equation modeling is an important statistical tool in the social and behavioral sciences. Structural equations express relationships among a system of variables, which can be either observed (manifest variables) or unobserved and hypothetical (latent variables).

Infection risk after a surgery

Description: This imaginary dataset covers patients admitted to a hospital for inpatient or outpatient surgery and records whether their wound became infected during treatment.

infected: Yes/No.
size: Size of the incision in cm.
admission: O (outpatient) or I (inpatient).
surcomp: Whether there was a surgical complication or not (Yes/No).
antibio: Whether patient received antibiotics (Yes/No).

Source: Download the data from here

Task: Develop a model that explains the relationship between infection and other variables.

While this looks like a typical logistic regression problem, a few details make the dataset difficult to model accurately. Here is the problem: we know in advance that being an outpatient increases the risk of infection, and so do larger incision sizes, the presence of surgical complications, and the lack of antibiotic treatment. However, we also know that larger incisions and surgical complications cause patients to be admitted as inpatients instead. Furthermore, most of the patients who receive antibiotic treatment are inpatients. So how do we separate these effects? One option is to use structural equations, in particular path modeling. Take a look at the diagram below. size, antibio and surcomp affect infection both directly and indirectly (through admission).
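
The direct and indirect routes described above can be sketched as a pair of linear equations. This is only an illustration that assumes a linear path model; the coefficient names a, b, c and the error terms e are notation introduced here, not part of the dataset or of the SAS output.

```latex
\begin{align*}
\text{admission} &= a_1\,\text{size} + a_2\,\text{antibio} + a_3\,\text{surcomp} + e_1 \\
\text{infection} &= b\,\text{admission} + c_1\,\text{size} + c_2\,\text{antibio} + c_3\,\text{surcomp} + e_2 \\
\text{indirect}_k &= a_k\,b, \qquad \text{total}_k = c_k + a_k\,b \qquad (k = 1, 2, 3)
\end{align*}
```

Under these assumptions, c_k is the direct effect of the k-th predictor on infection, and the product a_k b is the part of its effect that travels through admission.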

To estimate the size of these effects we use PROC CALIS. The code looks like this:

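PROC CALIS DATA = tutorial.infection;
PATH
     /* direct effects on infection */
     infection <- admission ,
     infection <- size ,
     infection <- antibio ,
     infection <- surcomp ,
     /* effects on admission (the indirect route to infection) */
     admission <- size ,
     admission <- antibio ,
     admission <- surcomp ;
/* split each effect on infection into total, direct and indirect parts */
EFFPART
     infection <- size antibio surcomp ;
RUN;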

Standardized Results for PATH List

Path                      Parameter  Estimate  Standard Error  t Value  Pr > |t|
infection <=== admission  _Parm1     0.39587   0.0000479       8265.6   <.0001
infection <=== size       _Parm2     0.07147   0.00256         27.9619  <.0001
infection <=== antibio    _Parm3     0.32031   0.0001332       2405.1   <.0001
infection <=== surcomp    _Parm4     0.32031   0.0001332       2405.1   <.0001
admission <=== size       _Parm5     0.09027   0.00324         27.8676  <.0001
admission <=== antibio    _Parm6     0.40456   0.0001193       3392.1   <.0001
admission <=== surcomp    _Parm7     0.40456   0.0001193       3392.1   <.0001

Standardized Results for Variance Parameters

Variance Type  Variable   Parameter  Estimate  Standard Error  t Value  Pr > |t|
Exogenous      size       _Add1      1.00000
               antibio    _Add2      1.00000
               surcomp    _Add3      1.00000
Error          infection  _Add4      0.41038   0.0003413       1202.5   <.0001
               admission  _Add5      0.65469   0.0003860       1696.0   <.0001

Standardized Results for Covariances Among Exogenous Variables

Var1     Var2     Parameter  Estimate  Standard Error  t Value  Pr > |t|
size     antibio  _Add6      0.04482   0.00162         27.6405  <.0001
size     surcomp  _Add7      0.04482   0.00162         27.6405  <.0001
surcomp  antibio  _Add8      0.01000   0               Infty    .

Standardized Effects on infection

Variable  Effect Type  Effect  Std Error  t Value  p Value
size      Total        0.1072  0.003834   27.9619  <.0001
size      Direct       0.0715  0.002556   27.9619  <.0001
size      Indirect     0.0357  0.001278   27.9619  <.0001
antibio   Total        0.4805  0.000200   2405     <.0001
antibio   Direct       0.3203  0.000133   2405     <.0001
antibio   Indirect     0.1602  0.0000666  2405     <.0001
surcomp   Total        0.4805  0.000200   2405     <.0001
surcomp   Direct       0.3203  0.000133   2405     <.0001
surcomp   Indirect     0.1602  0.0000666  2405     <.0001
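
The indirect and total effects reported above can be reproduced by hand from the standardized path estimates. The arithmetic below is a sketch that assumes the usual path-tracing rule for a linear path model: an indirect effect is the product of the coefficients along the path through admission, and the total effect is the direct effect plus the indirect effect.

```latex
\begin{align*}
\text{indirect}(\text{size}) &= (\text{admission} \leftarrow \text{size}) \times (\text{infection} \leftarrow \text{admission})
  = 0.09027 \times 0.39587 \approx 0.0357 \\
\text{total}(\text{size}) &= 0.0715 + 0.0357 = 0.1072 \\[4pt]
\text{indirect}(\text{antibio}) &= 0.40456 \times 0.39587 \approx 0.1602 \\
\text{total}(\text{antibio}) &= 0.3203 + 0.1602 = 0.4805
\end{align*}
```

The estimates for surcomp are identical to those for antibio in this output, so the same arithmetic gives its indirect effect of 0.1602 and total effect of 0.4805.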

Now we can fill out our path diagram: