In the first part of this series I started the analysis of an interesting dataset drawn from a sample of users of the crowd-sourced review service Yelp. This post presents the second part of the analysis.
Continues from part I.
The following is the final set of features used for the analysis:
##  "fans"                    "friendcount_log"
##  "votes.funny_log"         "votes.useful_log"
##  "votes.cool_log"          "votes.total_log"
##  "compliments.profile_log" "compliments.cute_log"
##  "compliments.funny_log"   "compliments.plain_log"
##  "compliments.writer_log"  "compliments.note_log"
##  "compliments.photos_log"  "compliments.hot_log"
##  "compliments.cool_log"    "compliments.more_log"
##  "compliments.list_log"    "compliments.total_log"
##  "review_count_log"        "starclass"
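The `_log` suffix denotes log-transformed counts (as prepared in part I). A minimal sketch of how such features could be derived, assuming a data frame `users` holding the raw count columns (column names here are illustrative):

```r
# Log-transform the heavily skewed count variables.
# log1p (i.e. log(1 + x)) handles users with zero counts gracefully.
users$friendcount_log  <- log1p(users$friendcount)
users$review_count_log <- log1p(users$review_count)
users$votes.total_log  <- log1p(users$votes.total)
```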
My classification algorithms evolved as follows:
a) Decision Trees
b) Random Forests
c) Stochastic Gradient Boosting
After several initial tests using Decision Trees, I tried Random Forests and eventually settled on Stochastic Gradient Boosting because of its performance-to-computational-cost ratio. Since it trains faster than Random Forests, prototyping and parameter search could be carried out more efficiently.
For the experiments I defined the following four test cases:
1- One star users vs 5 star users (2 class target)
2- One star users vs all the rest (2 class target)
3- Five star users vs all the rest (2 class target)
4- One star users vs 5 star users vs all the rest (3 class target)
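Each of these targets can be derived from the `starclass` variable in the feature set. A possible construction (a sketch; the label values and the `users` data frame are assumptions, not taken from the original code):

```r
# Case 1: keep only 1-star and 5-star users (2-class target)
case1 <- subset(users, starclass %in% c("1", "5"))
case1$target <- factor(case1$starclass)

# Case 2: 1-star users vs all the rest (2-class target)
users$target2 <- factor(ifelse(users$starclass == "1", "one_star", "rest"))

# Case 3: 5-star users vs all the rest (2-class target)
users$target3 <- factor(ifelse(users$starclass == "5", "five_star", "rest"))

# Case 4: 1-star vs 5-star vs all the rest (3-class target)
users$target4 <- factor(ifelse(users$starclass == "1", "one_star",
                        ifelse(users$starclass == "5", "five_star", "rest")))
```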
As a general rule, all the experiments involved 5-fold cross-validation, a feature analysis to check for near-zero-variance predictors, and feature centering.
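In caret this preprocessing pipeline could look like the following sketch (the `features` data frame is an assumption standing in for the feature set listed above):

```r
library(caret)

# Flag and drop near-zero-variance predictors
nzv <- nearZeroVar(features)   # returns indices of near-zero-variance columns
if (length(nzv) > 0) features <- features[, -nzv]

# 5-fold cross-validation; centering can later be applied
# inside train() via its preProcess argument
fitControl <- trainControl(method = "cv", number = 5)
```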
The parameter tuning of the Stochastic Gradient Boosting model was performed by means of a carefully defined multi-parametric search grid. The tunable parameters of the SGB model are:
n.trees (# Boosting Iterations),
interaction.depth (Max Tree Depth),
shrinkage (Shrinkage) and
n.minobsinnode (Min. Terminal Node Size).
myTuneGrid <- expand.grid(n.trees = seq(1, 201, 5),
                          interaction.depth = 2:8,
                          shrinkage = 0.1,
                          n.minobsinnode = 10)
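With the grid defined, the model can be fit through caret's train() interface. The following is a sketch, assuming a training data frame `training` containing the target `starclass` and a `fitControl` object configured for 5-fold cross-validation:

```r
library(caret)

fitControl <- trainControl(method = "cv", number = 5)

gbmFit <- train(starclass ~ ., data = training,
                method     = "gbm",        # Stochastic Gradient Boosting
                trControl  = fitControl,   # 5-fold cross-validation
                preProcess = "center",     # feature centering
                tuneGrid   = myTuneGrid,   # the search grid defined above
                verbose    = FALSE)
```

train() evaluates every combination in the grid under cross-validation and retains the best-performing parameter set.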
The general results, evaluated on the testing data for each of the four aforementioned cases, are presented below.
Case 1. One star users vs 5 star users (2 class target)
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##      0.7700045      0.2463451      0.7665594      0.7734225      0.9046364 
## AccuracyPValue  McnemarPValue 
##      1.0000000      0.0000000
Case 2. One star users vs all the rest (2 class target)
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##      0.9455470      0.1630522      0.9447197      0.9463653      0.9897775 
## AccuracyPValue  McnemarPValue 
##      1.0000000      0.0000000
Case 3. Five star users vs all the rest (2 class target)
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##      0.8529848      0.4137592      0.8516982      0.8542644      0.8748240 
## AccuracyPValue  McnemarPValue 
##      1.0000000      0.0000000
Case 4. One star users vs 5 star users vs all the rest (3 class target)
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##      0.8245047      0.4625911      0.8232026      0.8258011      0.8398815 
## AccuracyPValue  McnemarPValue 
##      1.0000000      0.0000000
The complete, detailed confusion matrices are not included here. However, from these matrices it can be observed that the most difficult class to classify correctly is the “Unhappy” one, that is, the users who give only extremely negative reviews.
In the following section the results are discussed in their appropriate context.
The first thing we notice in the classification results from the four experimental cases is that the scores are surprisingly positive, especially taking into consideration the high level of noise and uncertainty inherent in a human-based endeavor such as Yelp. It is worth remembering that a user's experience and perception of a business is a very subjective phenomenon, dependent on a large number of variables and factors, including the user's internal subjective mental states.
The classification results show that the “unhappy” class is always the most difficult to classify; this is explained by the lack of distinct features characterizing this class. It would seem that terribly dissatisfied people lack the followers, compliments and other attributes that make the “extremely happy” users more easily classifiable.
However, from the results we can conclude that even in the presence of highly subjective phenomena, good quality data can guide a successful analysis and provide objective hints for evidence-based decisions. In this case the four experimental cases could serve as a basis for designing new variables that, once collected, would strengthen the classification of these two important groups of users.