Neural Networks

Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains.[1] A neural network is not itself an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs.[2] Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge about cats, such as the fact that cats have fur, tails, whiskers, and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material they process. In SAS, neural networks are run with the HPNEURAL procedure, a high-performance procedure that trains a multilayer perceptron neural network.
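Before the worked example below, here is the bare-bones shape of an HPNEURAL call (a minimal sketch; the library, data set, and variable names are placeholders):

```sas
/* Minimal HPNEURAL sketch; mylib.mydata, y, x1, x2 are placeholders */
PROC HPNEURAL DATA=mylib.mydata;
   ARCHITECTURE MLP;        /* multilayer perceptron (the default) */
   TARGET y / LEVEL=NOM;    /* nominal target: classification */
   INPUT x1 x2 / LEVEL=INT; /* interval (numeric) inputs */
   HIDDEN 5;                /* one hidden layer with 5 neurons */
   TRAIN;                   /* fit the network */
RUN;
```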

Example: Customer Churn

Description: Customer retention is a challenge in the ultracompetitive mobile phone industry. A mobile phone company is studying factors related to customer churn, a term used for customers who have moved to an alternative service provider. In particular, the company would like to build a model to predict which customers are most likely to move their service to a competitor. This knowledge will be used to identify customers for targeted interventions, with the ultimate goal of reducing churn.

customerid: Customer ID.
gender: Male or Female.
senior: Senior citizen or not (1/0).
partner: Living with a partner or not (Yes/No).
dependents: Has dependents living with customer (Yes/No).
tenure: Account length of the customer in months.
phoneservice: Customer has phone service or not.
multiplelines: No phone service/No/Yes.
internetservice: Internet service type: DSL/Fiber optic/No.
onlinesecurity: No internet/No/Yes.
onlinebackup: No internet/No/Yes.
deviceprotection: No internet/No/Yes.
techsupport: No internet/No/Yes.
streamingtv: No internet/No/Yes.
streamingmovies: No internet/No/Yes.
contract: Month-to-month/One year/Two year.
paperlessbilling: Yes/No.
paymentmethod: Electronic check/Mailed check/Bank transfer/Credit card.
monthlycharges: Average monthly charge in dollars.
totalcharges: Cumulative charges in dollars.
churn: Yes/No.

Source: https://www.ibm.com/communities/analytics/watson-analytics-blog/predictive-insights-in-the-telco-customer-churn-data-set/
Download the data from here

Task: Design a model for estimating customer churn.

Here's the code required to run a neural network on SAS along with the output:

PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT gender senior partner dependents phoneservice multiplelines
         internetservice onlinesecurity onlinebackup deviceprotection
         techsupport streamingtv streamingmovies contract
         paperlessbilling paymentmethod / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   HIDDEN 2;
   TRAIN;
RUN;
Performance Information
  Execution Mode: Single-Machine
  Number of Threads: 4

Data Access Information
  Data: TUTORIAL.TELCO   Engine: V9   Role: Input   Path: On Client

Model Information
  Data Source: TUTORIAL.TELCO
  Architecture: MLP Direct
  Number of Input Variables: 19
  Number of Hidden Layers: 1
  Number of Hidden Neurons: 2
  Number of Target Variables: 1
  Number of Weights: 125
  Optimization Technique: Limited Memory BFGS
  Number of Observations Read: 7043
  Number of Observations Used: 7032
  Number Used for Training: 5275
  Number Used for Validation: 1757
Fit Statistics Table (target: Churn)
  Number of Observations (Train / Valid): 5275 / 1757
  L1 Norm of Weights: 19.114913
  Average Error Function (Train / Valid): 0.411453 / 0.421091
  Average Absolute Error (Train / Valid): 0.269520 / 0.268620
  Maximum Absolute Error (Train / Valid): 0.997001 / 0.998081
  Number of Wrong Classifications (Train / Valid): 1020 / 346
  Misclassification Rate (Train / Valid): 0.1934 / 0.1969
Misclassification Table
for Churn
Class: YES NO
YES 256 203
NO 143 1155
Training Table
Try Iterations Avg Training Error Avg Validation Error Reason for Stopping Best?
1 50 0.409130 0.422848 MAXITER  
2 50 0.411453 0.421091 MAXITER Y
3 50 0.412209 0.422835 MAXITER  
4 50 0.410603 0.421930 MAXITER  
5 50 0.412224 0.422865 MAXITER  
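As a quick sanity check, the validation misclassification rate reported in the fit statistics can be recomputed from the misclassification table: the off-diagonal counts (203 + 143 = 346) divided by the 1757 validation observations give 0.1969.

```sas
/* Recompute the validation misclassification rate from the table above */
data _null_;
   rate = (203 + 143) / 1757;   /* wrong predictions / validation rows */
   put 'Validation misclassification rate: ' rate 6.4;   /* 0.1969 */
run;
```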

Let's analyze each statement one by one:

  • ARCHITECTURE statement specifies the architecture of the neural network to be trained. Here are the options:
    • LOGISTIC specifies a multilayer perceptron with no hidden units, which is equivalent to logistic regression. The HIDDEN statement cannot be used with this option. Here is the misclassification result if we use this option for the current data set.

      Misclassification Table
      for Churn (Rate=0.192)
      Class: YES NO
      YES 260 199
      NO 139 1159
    • MLP specifies a multilayer perceptron that has one or more hidden layers. This is the default. Here is the misclassification result if we use this option with HIDDEN 2 (one hidden layer with two neurons) for the current data set.

      Misclassification Table
      for Churn (Rate=0.193)
      Class: YES NO
      YES 243 216
      NO 123 1175
    • MLP DIRECT specifies a multilayer perceptron that has one or more hidden layers plus direct connections between each input neuron and each target neuron. Here is the misclassification result if we use this option with HIDDEN 2 for the current data set.

      Misclassification Table
      for Churn (Rate=0.197)
      Class: YES NO
      YES 256 203
      NO 143 1155
    Clearly architecture selection does not have a significant effect on the outcome for this case.
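    To reproduce these variants, only the ARCHITECTURE statement changes, and for LOGISTIC the HIDDEN statement must be dropped. A sketch of the LOGISTIC run, with the input lists abbreviated for brevity:

    ```sas
    PROC HPNEURAL DATA=tutorial.telco;
       ARCHITECTURE LOGISTIC;       /* no hidden units: HIDDEN not allowed */
       TARGET churn / LEVEL=NOM;
       INPUT contract paymentmethod / LEVEL=NOM;             /* abbreviated */
       INPUT tenure monthlycharges totalcharges / LEVEL=INT; /* abbreviated */
       TRAIN;
    RUN;
    ```

    For MLP DIRECT, replace the ARCHITECTURE line with ARCHITECTURE MLP DIRECT; and keep the HIDDEN statement.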

  • TARGET and INPUT statements specify the variable roles and measurement levels. Unlike HPFOREST (for no apparent reason), the level must be given in the short forms NOM or INT (versus NOMINAL and INTERVAL in HPFOREST).

  • HIDDEN statement specifies the number of hidden layers in the network and the number of neurons in each hidden layer. The first HIDDEN statement sets the number of neurons in the first hidden layer, the second HIDDEN statement sets the number in the second hidden layer, and so on; a maximum of 100 HIDDEN statements is allowed. The number is required and must be an integer greater than or equal to 1. Here is the effect of the number of neurons in the first hidden layer on the misclassification rate (for this case):

    The optimum appears to be around 20 neurons, though the effect is not very significant.
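    Because HIDDEN statements stack (one per layer), a deeper network is just more HIDDEN statements. A two-hidden-layer sketch, with the nominal inputs omitted for brevity and neuron counts that are illustrative rather than tuned:

    ```sas
    PROC HPNEURAL DATA=tutorial.telco;
       ARCHITECTURE MLP;
       TARGET churn / LEVEL=NOM;
       INPUT tenure monthlycharges totalcharges / LEVEL=INT;
       HIDDEN 20;   /* first hidden layer: 20 neurons */
       HIDDEN 5;    /* second hidden layer: 5 neurons */
       TRAIN;
    RUN;
    ```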

  • TRAIN statement causes the HPNEURAL procedure to use the training data specified in the PROC HPNEURAL statement to train a neural network whose structure is specified in the ARCHITECTURE, INPUT, TARGET, and HIDDEN statements. The following options are available:

    TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
    NUMTRIES specifies the number of times the network is trained using different starting points. Specifying this option helps ensure that the optimizer finds the set of weights that truly minimizes the objective function and does not return a local minimum. The value must be an integer between 1 and 99,999. The default is 5.
    MAXITER specifies the maximum number of iterations (weight adjustments) for the optimizer to make before terminating. When you are training using large data sets, you can do a training run with MAXITER=1 to determine approximately how long each iteration will take. The default is 50. (You can see from the output table "Training Table" that iterations stopped at 50 without necessarily reaching the minimum. For this case, I tried MAXITER=1000 but it stopped at 104 and there wasn't a noticeable improvement.)
    OUTMODEL specifies the data set in which to save the model parameters of the trained network. These parameters include the network architecture, input and target variable names and types, and trained weights. You can use the model data set later to score a different input data set, as long as the variable names and types in the new input data set match those of the training data set. For example, if new rows become available for the telco data set, we can score them with the model we saved earlier:

    PROC HPNEURAL DATA=telco_new;
    SCORE MODEL=model_telco OUT=score_telconew;
    RUN;
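Putting OUTMODEL and SCORE together, the full train-then-score workflow looks like this (a sketch: telco_new stands in for a data set of new customers whose variable names and types match the training data, and the input list is abbreviated):

```sas
/* Step 1: train and save the model parameters */
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   HIDDEN 20;
   TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
RUN;

/* Step 2: score new rows with the saved model */
PROC HPNEURAL DATA=telco_new;
   SCORE MODEL=model_telco OUT=score_telconew;
RUN;
```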