Importing Data Into SAS
We have to import the data into SAS before analyzing it. This is usually the first step. One of the best ways to import data into SAS is PROC IMPORT (PROC short for procedure).
First, open SAS. Here is a typical workspace:
Top-left window is the Log window. The Log window enables you to view messages about your SAS session and your SAS programs. If the program that you submit has unexpected results, then the log helps you to identify the error. If you cannot see the Log window, you can reopen it from the main menu View -> Log.
Bottom-left window is the Editor window. The Program Editor window enables you to enter, edit, submit, and save SAS programs. We will type our programs here. If you cannot see the Log window, you can reopen it from the main menu View -> Enhanced Editor.
Let's import our sample data. You can download it from here. We will import this file into SAS so that we can further process it. But before that let's create a SAS library where we will keep our converted files. This will help us access these files later within SAS without importing them again. You can create a library from the menu, Tools -> New Library.
Now we can import our file into the new library we created with IMPORT procedure:
If import is successful, you will see a bunch of code in your Log window. Before further digging into PROC IMPORT, you may want to see the imported data. To do this we use PROC PRINT:
After you hit the F3 key on your keyboard or click the Run from the toolbar, you will see the imported data in the output window.
Now let's dig in to the details.
PROC IMPORT FILE="file location"
calls IMPORT procedure and FILE argument specifies the complete path and filename for the input PC file, spreadsheet, or delimited external file. In our case it's FILE="D:\sample_xy.csv".
OUT=saslibraryname.sasimportedfilename
identifies the output SAS data set with the library. In our case it's OUT="tutorial.lecture1". tutorial is the name of the SAS library we created earlier. If we don't denote a library, SAS will write this to WORK library which is cleared after each SAS session. To make the data set permanent we need to state the library so that we can access to it later. We will see shortly how we can access this data within SAS.
DBMS=identifier
specifies the type of data to import. Default value is DBMS=CSV so we didn't actually have to state it in our case. Other options are TAB for tab-delimited file and DLM='char' to specify the delimiting character.
REPLACE;
overwrites an existing SAS data set. If you do not specify REPLACE, the IMPORT procedure does not overwrite an existing data set. Note that we end every statement in SAS with a semicolon.
Now that we wrote our PROC IMPORT statement with its arguments, we can move on to the separate statements within IMPORT procedure but outside PROC IMPORT statement.
GETNAMES=YES;
Specify whether SAS variable names should be generated from the first record in the input file. In our case this will help SAS to read the column names from the first row.
DATAROW=2;
Specifies the first row for reading the data. In our case first row is 2 because row 1 has the column titles. If GETNAMES=YES, the default value is 2 else it's 1.
GUESSINGROWS=20;
Specifies the number of rows to scan the appropriate data type for the column. The default value is 20. Let's see where stating this number can be useful. Below is a list of patients with different medical operation codes applied. By default SAS will scan down to patient 20 to determine data type of Operation Code. Since data is numeric until patient 27, SAS will assign numeric data type to this column and will ignore any non-numeric data it will encounter and enter them as empty cells. Therefore it is sometimes very helpful to state this number to avoid errors of such.
Patient | Operation Code |
---|---|
1 | 99348 |
2 | 99347 |
3 | 99347 |
4 | 99347 |
5 | 99309 |
6 | 99308 |
7 | 99308 |
8 | 99345 |
9 | 99312 |
10 | 99317 |
11 | 99307 |
12 | 99213 |
13 | 99304 |
14 | 99245 |
15 | 99245 |
16 | 99213 |
17 | 99213 |
18 | 99304 |
19 | 99307 |
20 | 99345 |
21 | 99245 |
22 | 99308 |
23 | 99343 |
24 | 99345 |
25 | 99245 |
26 | 99213 |
27 | J2357 |
28 | 99308 |
29 | J2357 |
Importing Tab-Delimited Files
Above method works well with CSV files and in it will be sufficient in majority of the cases. There are however tab-delimited files that you will encounter from time to time. Following code achieves importing of such files to SAS.
PROC IMPORT FILE = 'D:\location.file' OUT = tutorial.newtable DBMS = CSV REPLACE;
DELIMITER '09'x;
RUN;
Note that while IMPORT does a great job in identifying column types, an additional DATA step is usually required with complex datasets.
Leave a Comment