Neural Networks

Artificial neural networks (ANNs), or connectionist systems, are computing systems vaguely inspired by the biological neural networks that constitute animal brains.[1] A neural network is not itself an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs.[2] Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge about cats, such as the fact that cats have fur, tails, whiskers, and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material they process. In SAS, neural networks are run with the HPNEURAL procedure, a high-performance procedure that trains a multilayer perceptron neural network.
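Before the worked example below, here is the bare-bones shape of an HPNEURAL call (a minimal sketch; the library, data set, and variable names are placeholders):

```sas
/* Minimal HPNEURAL sketch; mylib.mydata, y, x1, x2 are placeholders */
PROC HPNEURAL DATA=mylib.mydata;
   ARCHITECTURE MLP;        /* multilayer perceptron (the default) */
   TARGET y / LEVEL=NOM;    /* nominal target: classification */
   INPUT x1 x2 / LEVEL=INT; /* interval (numeric) inputs */
   HIDDEN 5;                /* one hidden layer with 5 neurons */
   TRAIN;                   /* fit the network */
RUN;
```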

Example: Customer Churn

Description: Customer retention is a challenge in the ultracompetitive mobile phone industry. A mobile phone company is studying factors related to customer churn, a term used for customers who have moved to an alternative service provider. In particular, the company would like to build a model to predict which customers are most likely to move their service to a competitor. This knowledge will be used to identify customers for targeted interventions, with the ultimate goal of reducing churn.

customerid: Customer ID.
gender: Male or Female.
senior: Senior citizen or not (1/0).
partner: Living with a partner or not (Yes/No).
dependents: Has dependents living with customer (Yes/No).
tenure: Account length of the customer in months.
phoneservice: Customer has phone service or not.
multiplelines: No phone service/No/Yes.
internetservice: Internet service type: DSL/Fiber optic/No.
onlinesecurity: No internet/No/Yes.
onlinebackup: No internet/No/Yes.
deviceprotection: No internet/No/Yes.
techsupport: No internet/No/Yes.
streamingtv: No internet/No/Yes.
streamingmovies: No internet/No/Yes.
contract: Month-to-month/One year/Two year.
paperlessbilling: Yes/No.
paymentmethod: Electronic check/Mailed check/Bank transfer/Credit card.
monthlycharges: Average monthly charge in dollars.
totalcharges: Cumulative charges in dollars.
churn: Yes/No.

Source: https://www.ibm.com/communities/analytics/watson-analytics-blog/predictive-insights-in-the-telco-customer-churn-data-set/
Download the data from here

Task: Design a model for estimating customer churn.

Here's the code required to run a neural network on SAS along with the output:

PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT gender senior partner dependents phoneservice multiplelines
         internetservice onlinesecurity onlinebackup deviceprotection
         techsupport streamingtv streamingmovies contract
         paperlessbilling paymentmethod / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   HIDDEN 2;
   TRAIN;
RUN;
Performance Information
  Execution Mode: Single-Machine
  Number of Threads: 4

Data Access Information
  Data: TUTORIAL.TELCO   Engine: V9   Role: Input   Path: On Client

Model Information
  Data Source: TUTORIAL.TELCO
  Architecture: MLP Direct
  Number of Input Variables: 19
  Number of Hidden Layers: 1
  Number of Hidden Neurons: 2
  Number of Target Variables: 1
  Number of Weights: 125
  Optimization Technique: Limited Memory BFGS
  Number of Observations Read: 7043
  Number of Observations Used: 7032
  Number Used for Training: 5275
  Number Used for Validation: 1757
Fit Statistics Table (target: Churn)
  Number of Observations (Train / Valid): 5275 / 1757
  L1 Norm of Weights: 19.114913
  Average Error Function (Train / Valid): 0.411453 / 0.421091
  Average Absolute Error (Train / Valid): 0.269520 / 0.268620
  Maximum Absolute Error (Train / Valid): 0.997001 / 0.998081
  Number of Wrong Classifications (Train / Valid): 1020 / 346
  Misclassification Rate (Train / Valid): 0.1934 / 0.1969
Misclassification Table
for Churn
Class: YES NO
YES 256 203
NO 143 1155
Training Table
Try Iterations Avg Training Error Avg Validation Error Reason for Stopping Best?
1 50 0.409130 0.422848 MAXITER  
2 50 0.411453 0.421091 MAXITER Y
3 50 0.412209 0.422835 MAXITER  
4 50 0.410603 0.421930 MAXITER  
5 50 0.412224 0.422865 MAXITER  
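As a quick sanity check, the validation misclassification rate reported in the fit statistics can be recomputed from the misclassification table: the off-diagonal counts (203 + 143 = 346) divided by the 1757 validation observations give 0.1969.

```sas
/* Recompute the validation misclassification rate from the table above */
data _null_;
   rate = (203 + 143) / 1757;   /* wrong predictions / validation rows */
   put 'Validation misclassification rate: ' rate 6.4;   /* 0.1969 */
run;
```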

Let's analyze each statement one by one:

  • ARCHITECTURE statement specifies the architecture of the neural network to be trained. Here are the options:
    • LOGISTIC specifies a multilayer perceptron with no hidden units, which is equivalent to logistic regression. The HIDDEN statement cannot be used with this option. Here is the misclassification result if we use this option for the current data set.

      Misclassification Table
      for Churn (Rate=0.192)
      Class: YES NO
      YES 260 199
      NO 139 1159
    • MLP specifies a multilayer perceptron that has one or more hidden layers. This is the default. Here is the misclassification result if we use this option with HIDDEN 2 (one hidden layer with two neurons) for the current data set.

      Misclassification Table
      for Churn (Rate=0.193)
      Class: YES NO
      YES 243 216
      NO 123 1175
    • MLP DIRECT specifies a multilayer perceptron that has one or more hidden layers plus direct connections between each input neuron and each target neuron. Here is the misclassification result if we use this option with HIDDEN 2 for the current data set.

      Misclassification Table
      for Churn (Rate=0.197)
      Class: YES NO
      YES 256 203
      NO 143 1155
    Clearly architecture selection does not have a significant effect on the outcome for this case.
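    To reproduce these variants, only the ARCHITECTURE statement changes, and for LOGISTIC the HIDDEN statement must be dropped. A sketch of the LOGISTIC run, with the input lists abbreviated for brevity:

    ```sas
    PROC HPNEURAL DATA=tutorial.telco;
       ARCHITECTURE LOGISTIC;       /* no hidden units: HIDDEN not allowed */
       TARGET churn / LEVEL=NOM;
       INPUT contract paymentmethod / LEVEL=NOM;             /* abbreviated */
       INPUT tenure monthlycharges totalcharges / LEVEL=INT; /* abbreviated */
       TRAIN;
    RUN;
    ```

    For MLP DIRECT, replace the ARCHITECTURE line with ARCHITECTURE MLP DIRECT; and keep the HIDDEN statement.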

  • TARGET and INPUT statements specify the variable roles and measurement levels. Unlike HPFOREST (for no apparent reason), the level must be given in the short forms NOM or INT (versus NOMINAL and INTERVAL in HPFOREST).

  • HIDDEN statement specifies the number of hidden layers in the network and the number of neurons in each hidden layer. The first HIDDEN statement sets the number of neurons in the first hidden layer, the second HIDDEN statement sets the number in the second hidden layer, and so on; a maximum of 100 HIDDEN statements is allowed. The number is required and must be an integer greater than or equal to 1. Here is the effect of the number of neurons in the first hidden layer on the misclassification rate (for this case):

    The optimum appears to be around 20 neurons, though the effect is not very significant.
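    Because HIDDEN statements stack (one per layer), a deeper network is just more HIDDEN statements. A two-hidden-layer sketch, with the nominal inputs omitted for brevity and neuron counts that are illustrative rather than tuned:

    ```sas
    PROC HPNEURAL DATA=tutorial.telco;
       ARCHITECTURE MLP;
       TARGET churn / LEVEL=NOM;
       INPUT tenure monthlycharges totalcharges / LEVEL=INT;
       HIDDEN 20;   /* first hidden layer: 20 neurons */
       HIDDEN 5;    /* second hidden layer: 5 neurons */
       TRAIN;
    RUN;
    ```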

  • TRAIN statement causes the HPNEURAL procedure to use the training data specified in the PROC HPNEURAL statement to train a neural network whose structure is specified in the ARCHITECTURE, INPUT, TARGET, and HIDDEN statements. The following options are available:

    TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
    NUMTRIES specifies the number of times the network is trained using different starting points. Specifying this option helps ensure that the optimizer finds the set of weights that truly minimizes the objective function and does not return a local minimum. The value must be an integer between 1 and 99,999. The default is 5.
    MAXITER specifies the maximum number of iterations (weight adjustments) for the optimizer to make before terminating. When you are training using large data sets, you can do a training run with MAXITER=1 to determine approximately how long each iteration will take. The default is 50. (You can see from the output table "Training Table" that iterations stopped at 50 without necessarily reaching the minimum. For this case, I tried MAXITER=1000 but it stopped at 104 and there wasn't a noticeable improvement.)
    OUTMODEL specifies the data set in which to save the model parameters of the trained network. These parameters include the network architecture, input and target variable names and types, and trained weights. You can use the model data set later to score a different input data set, as long as the variable names and types in the new input data set match those of the training data set. For example, if new rows become available for the telco data set, we can score them with the model we saved earlier:

    PROC HPNEURAL DATA=telco_new;
    SCORE MODEL=model_telco OUT=score_telconew;
    RUN;
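Putting OUTMODEL and SCORE together, the full train-then-score workflow looks like this (a sketch: telco_new stands in for a data set of new customers whose variable names and types match the training data, and the input list is abbreviated):

```sas
/* Step 1: train and save the model parameters */
PROC HPNEURAL DATA=tutorial.telco;
   ARCHITECTURE MLP;
   TARGET churn / LEVEL=NOM;
   INPUT tenure monthlycharges totalcharges / LEVEL=INT;
   HIDDEN 20;
   TRAIN NUMTRIES=5 MAXITER=50 OUTMODEL=model_telco;
RUN;

/* Step 2: score new rows with the saved model */
PROC HPNEURAL DATA=telco_new;
   SCORE MODEL=model_telco OUT=score_telconew;
RUN;
```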