
Neural Networks
Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains.[1] The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs.[2] Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example, that they have fur, tails, whiskers, and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process.

In SAS, neural network models are trained with the HPNEURAL procedure, a high-performance procedure that trains a multilayer perceptron neural network.
Example: Customer Churn
Description: Customer retention is a challenge in the ultracompetitive mobile phone industry. A mobile phone company is studying factors related to customer churn, a term used for customers who have moved to an alternative service provider. In particular, the company would like to build a model to predict which customers are most likely to move their service to a competitor. This knowledge will be used to identify customers for targeted interventions, with the ultimate goal of reducing churn.
customerid: Customer ID.
gender: Male or Female.
senior: Senior citizen or not (1/0).
partner: Living with a partner or not (Yes/No).
dependents: Has dependents living with customer (Yes/No).
tenure: Account length of the customer in months.
phoneservice: Customer has phone service or not.
multiplelines: No phone service/No/Yes.
internetservice: Internet service type: DSL/Fiber optic/No.
onlinesecurity: No internet/No/Yes.
onlinebackup: No internet/No/Yes.
deviceprotection: No internet/No/Yes.
techsupport: No internet/No/Yes.
streamingtv: No internet/No/Yes.
streamingmovies: No internet/No/Yes.
contract: Month-to-month/One year/Two year.
paperlessbilling: Yes/No.
paymentmethod: Electronic check/Mailed check/Bank transfer/Credit card.
monthlycharges: Average monthly charge in dollars.
totalcharges: Cumulative charges in dollars.
churn: Yes/No.
Source: https://www.ibm.com/communities/analytics/watson-analytics-blog/predictive-insights-in-the-telco-customer-churn-data-set/
Download the data from here
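Assuming the file was saved as telco.csv (the library path and file name below are placeholders; adjust them to your environment), it can be loaded into the tutorial library used in this example with PROC IMPORT:

```sas
/* Paths and file names are placeholders; point them at your own folders */
LIBNAME tutorial "/path/to/tutorial";

PROC IMPORT DATAFILE="/path/to/telco.csv"
   OUT=tutorial.telco
   DBMS=CSV
   REPLACE;
   GETNAMES=YES;   /* read variable names from the first row of the CSV */
RUN;
```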
Task: Design a model for estimating customer churn.
Here's the code required to run a neural network on SAS along with the output:
PROC HPNEURAL DATA=tutorial.telco;
ARCHITECTURE MLP;
TARGET churn / LEVEL=NOM;
INPUT gender senior partner dependents phoneservice multiplelines internetservice onlinesecurity onlinebackup deviceprotection techsupport streamingtv streamingmovies contract paperlessbilling paymentmethod / LEVEL=NOM;
INPUT tenure monthlycharges totalcharges / LEVEL=INT;
HIDDEN 2;
TRAIN;
RUN;
Performance Information | |
---|---|
Execution Mode | Single-Machine |
Number of Threads | 4 |
Data Access Information

Data | Engine | Role | Path
---|---|---|---
TUTORIAL.TELCO | V9 | Input | On Client
Model Information | |
---|---|
Data Source | TUTORIAL.TELCO |
Architecture | MLP Direct |
Number of Input Variables | 19 |
Number of Hidden Layers | 1 |
Number of Hidden Neurons | 2 |
Number of Target Variables | 1 |
Number of Weights | 125 |
Optimization Technique | Limited Memory BFGS |
Number of Observations | |
---|---|
Number of Observations Read | 7043 |
Number of Observations Used | 7032 |
Number Used for Training | 5275 |
Number Used for Validation | 1757 |
Fit Statistics Table (target: Churn; L1 Norm of Weights = 19.114913)

Statistic | Train | Valid
---|---|---
Number of Observations | 5275 | 1757
Average Error Function | 0.411453 | 0.421091
Average Absolute Error | 0.269520 | 0.268620
Maximum Absolute Error | 0.997001 | 0.998081
Number of Wrong Classifications | 1020 | 346
Misclassification Rate | 0.1934 | 0.1969
Misclassification Table for Churn

Class: | YES | NO
---|---|---
YES | 256 | 203
NO | 143 | 1155
Training Table

Try | Iterations | Avg Training Error | Avg Validation Error | Reason for Stopping | Best?
---|---|---|---|---|---
1 | 50 | 0.409130 | 0.422848 | MAXITER | |
2 | 50 | 0.411453 | 0.421091 | MAXITER | Y |
3 | 50 | 0.412209 | 0.422835 | MAXITER | |
4 | 50 | 0.410603 | 0.421930 | MAXITER | |
5 | 50 | 0.412224 | 0.422865 | MAXITER |
Let's analyze each statement one by one:

- The ARCHITECTURE statement specifies the architecture of the neural network to be trained. Here are the options:

  - LOGISTIC specifies a multilayer perceptron that has no hidden units (which is equivalent to a logistic regression). We cannot specify HIDDEN layers if we specify this option. Here is the misclassification result if we use this option for the current data set:
    Misclassification Table for Churn (Rate = 0.192)

    Class: | YES | NO
    ---|---|---
    YES | 260 | 199
    NO | 139 | 1159

  - MLP specifies a multilayer perceptron that has one or more hidden layers. This is the default. Here is the misclassification result if we use this option with HIDDEN 2 (one hidden layer with two neurons) for the current data set:
    Misclassification Table for Churn (Rate = 0.193)

    Class: | YES | NO
    ---|---|---
    YES | 243 | 216
    NO | 123 | 1175

  - MLP DIRECT specifies a multilayer perceptron that has one or more hidden layers and direct connections between each input and each target neuron. Here is the misclassification result if we use this option with HIDDEN 2 (one hidden layer with two neurons) for the current data set:
    Misclassification Table for Churn (Rate = 0.197)

    Class: | YES | NO
    ---|---|---
    YES | 256 | 203
    NO | 143 | 1155
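Only the ARCHITECTURE statement changes between these variants (and, for LOGISTIC, the HIDDEN statement is dropped). A minimal sketch, with the INPUT list abbreviated to the interval variables for brevity:

```sas
/* LOGISTIC: no hidden layer, equivalent to logistic regression */
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE LOGISTIC;     /* a HIDDEN statement is not allowed here */
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   TRAIN;
RUN;

/* MLP (the default) and MLP DIRECT: same skeleton plus a hidden layer */
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP DIRECT;   /* or: ARCHITECTURE MLP; */
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   HIDDEN 2;                  /* one hidden layer with two neurons */
   TRAIN;
RUN;
```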
- The TARGET and INPUT statements specify the variable types. Unlike HPFOREST (and I don't know why), you need to specify the level of a variable as NOM or INT in short form (versus NOMINAL and INTERVAL in HPFOREST).
- The HIDDEN statement specifies the number of neurons in a hidden layer. The first HIDDEN statement sets the number of neurons in the first hidden layer, the second HIDDEN statement sets the number in the second hidden layer, and so on. A maximum of 100 HIDDEN statements is allowed. You must specify number, and it must be an integer greater than or equal to 1. Here's the effect of the number of neurons in the first hidden layer on the misclassification rate (for this case): the optimum seems to be around 20, though the effect is not very significant.
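To add depth rather than width, stack HIDDEN statements, since each one defines the next hidden layer. A sketch with two hidden layers (the sizes 20 and 5 are illustrative choices, not tuned values):

```sas
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT; /* abbreviated input list */
   HIDDEN 20;   /* first hidden layer: 20 neurons */
   HIDDEN 5;    /* second hidden layer: 5 neurons */
   TRAIN;
RUN;
```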
- The TRAIN statement causes the HPNEURAL procedure to use the training data specified in the PROC HPNEURAL statement to train a neural network model whose structure is specified in the ARCHITECTURE, INPUT, TARGET, and HIDDEN statements. The following options are available:
TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
NUMTRIES specifies how many times the network is trained, each time starting from a different set of random initial weights; the try with the lowest validation error is kept (the five tries are listed in the "Training Table" output, with the best one flagged Y).

MAXITER specifies the maximum number of iterations (weight adjustments) the optimizer makes before terminating. When you are training on large data sets, you can do a training run with MAXITER=1 to determine approximately how long each iteration will take. The default is 50. (You can see from the "Training Table" output that the iterations stopped at 50 without necessarily reaching the minimum. For this case, I tried MAXITER=1000; it stopped at 104, and there wasn't a noticeable improvement.)
OUTMODEL specifies the data set in which to save the parameters of the trained network. These parameters include the network architecture, input and target variable names and types, and trained weights. You can use the model data set later to score a different input data set, as long as the variable names and types in the new input data set match those of the training data set. For example, if new rows become available for the telco data set, we can score them with the model we developed earlier:
PROC HPNEURAL DATA=telco_new;
SCORE MODEL=model_telco OUT=score_telconew;
RUN;
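Putting it together, the training step with OUTMODEL= and the later scoring step look like this (telco_new, model_telco, and score_telconew are the placeholder data set names from the examples above):

```sas
/* Train the network and save its parameters */
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT; /* abbreviated input list */
   HIDDEN 2;
   TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
RUN;

/* Later: score new observations with the saved model */
PROC HPNEURAL DATA=telco_new;
   SCORE MODEL=model_telco OUT=score_telconew;
RUN;
```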