Exploratory Statistics: psych package
The first step after data preparation is exploratory statistics. One of the most popular R packages for this is psych. We will use the cake sample data from the SAS website. You can download the data from here. The dataset comes from a cake-baking contest and records each participant's last name, age, presentation score, taste score, cake flavor, and number of cake layers. The number of cake layers is missing for two observations, and the cake flavor is missing for another.
First, start RStudio and load the data into a data frame named cake (don't forget to replace the file location with your own file location). Then type the following.
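The loading command itself is not shown here, so the following is only a minimal sketch of how cake might be created. It assumes the data has been saved as a CSV file with a header row. The LastName column name, the temporary file, and the rows are made up for illustration and are not the SAS data; the other column names follow the describe call in this post.

```r
# Toy stand-in for the downloaded file: a few made-up rows, not the SAS data.
csv_lines <- c(
  "LastName,Age,PresentScore,TasteScore,Flavor,Layers",
  "Smith,34,80,91,Vanilla,2",
  "Jones,51,72,84,Chocolate,",
  "Lee,27,93,88,,3"
)
cake_file <- tempfile(fileext = ".csv")  # replace with your own file location
writeLines(csv_lines, cake_file)

# Treat empty fields as missing so that Flavor and Layers get NA.
cake <- read.csv(cake_file, na.strings = "", stringsAsFactors = FALSE)

str(cake)             # column names and types
colSums(is.na(cake))  # number of missing values per column
```

With a data frame named cake in place, the psych commands are: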
> library(psych)
> describe(data.frame(cake$Age, cake$PresentScore, cake$TasteScore, cake$Layers))
Note that the data.frame function is used because the input to the describe function has to be either a matrix or a data frame. Individual columns of a data frame are selected with dataframe$column; for example, cake$Age selects the Age column from the cake data frame.
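As a side note, base R can also select several columns by name in one step, which keeps the original column names. The sketch below uses a made-up toy data frame, not the SAS cake data, and the names by_dollar and by_name are purely illustrative.

```r
# Made-up toy rows for illustration only, not the SAS cake data.
cake <- data.frame(
  Age = c(34, 51, 27),
  PresentScore = c(80, 72, 93),
  TasteScore = c(91, 84, 88),
  Layers = c(2, NA, 3)
)

by_dollar <- data.frame(cake$Age, cake$TasteScore)  # the dataframe$column form
by_name <- cake[, c("Age", "TasteScore")]           # index by column name

names(by_dollar)  # "cake.Age" "cake.TasteScore"
names(by_name)    # "Age" "TasteScore"
```

Both results are data frames, so either can be passed to describe; the second form simply gives shorter row labels in the output.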
It is also possible to simply specify the cake data frame without specifying column names and let the describe function calculate descriptive statistics for all variables instead:
> describe(cake)
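By default, describe converts non-numeric variables to numeric codes and flags them with an asterisk. We can use the omit argument to leave the categorical variables out of the calculation instead: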
> describe(cake, omit=TRUE)
We can also compute statistics by group with the describeBy function. Let's say we want to find the average taste score for each flavor. The following achieves that:
> describeBy(cake$TasteScore, group = cake$Flavor, omit=TRUE, mat=TRUE, na.rm=TRUE)
The mat=TRUE argument returns the output as a single table with one row per group. If it is not specified, describeBy by default returns a list with a separate table of statistics for each group. na.rm=TRUE removes missing values before calculation.
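To see what removing missing values does, and to cross-check the group means, the same averages can be computed in base R. This is only a rough sketch on made-up toy rows, not the SAS cake data, and taste_by_flavor is an illustrative name.

```r
# Made-up toy rows for illustration only, not the SAS cake data.
cake <- data.frame(
  TasteScore = c(91, 84, NA, 88, 76),
  Flavor = c("Vanilla", "Chocolate", "Vanilla", "Chocolate", NA)
)

mean(cake$TasteScore)                # NA, because one score is missing
mean(cake$TasteScore, na.rm = TRUE)  # 84.75, the missing score is dropped

# Mean taste score per flavor; the row with a missing Flavor falls into no group.
taste_by_flavor <- tapply(cake$TasteScore, cake$Flavor, mean, na.rm = TRUE)
taste_by_flavor
```

On real data, these values should agree with the mean column reported by describeBy for each flavor.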