What is a GRNN? Not a neural net!

In a recent article ([1], [2]) a type of neural net called a Generalized Regression Neural Network (GRNN) was used. What is it? It turns out to be no more than a simple smoothing algorithm, the classic Nadaraya–Watson kernel regression with a Gaussian kernel. In the code below, my variables that start with a capital letter are vectors or matrices. I can write the whole “GRNN” in three lines of R code:

My_grnn <-
  function(Y, Xtrain, Xtest, sigma, m = nrow(Xtrain)){
    # Euclidean distances between training rows and test rows
    D <- as.matrix(dist(rbind(Xtrain, Xtest)))[1:m, -(1:m)]
    # Gaussian kernel weights from those distances
    W <- exp(-D^2 / (2 * sigma^2))
    # weighted average of Y per test point
    return(Y %*% W / colSums(W))
}

Here Xtrain are observations with known results Y, and Xtest are observations for which we want to return predicted values. In this simple algorithm D holds the Euclidean distances between Xtrain and Xtest, W turns those distances into weights, and the weighted average of Y is returned as the predictions.
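To see that this really is plain kernel smoothing, here is a quick toy illustration (my own, not from [1]): smoothing a noisy sine curve with My_grnn. The data and the sigma value are made up for the demonstration.

# toy check: the "GRNN" is just a kernel smoother
set.seed(1)
X <- matrix(seq(0, 2*pi, length.out = 100), ncol = 1)
Y <- sin(X[, 1]) + rnorm(100, sd = 0.2)
Xnew <- matrix(seq(0, 2*pi, length.out = 9), ncol = 1)
Y_hat <- My_grnn(Y, X, Xnew, sigma = 0.3)
round(as.vector(Y_hat), 2)   # roughly tracks sin() at the new points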

As an algorithm this does not deserve the name neural network. It also performs poorly as a machine learning method compared to other generalized regression techniques. Compare it to the workhorse xgboost on the Boston Housing data.

# ------------- real example
require(MASS)
data(Boston)
B <- scale(Boston, center = TRUE, scale = TRUE)
B[, 14] <- Boston[, 14]   # keep the target medv on its original scale
# the medv column (14) is the median value of owner-occupied homes in $1000s
# 506 rows; let's hold out 80 of them as a test set
set.seed(17*41)
h <- sample(nrow(Boston), 80)

Xtrain <- as.matrix(B[-h, -14])
Ytrain <- B[-h, 14]
Xtest  <- as.matrix(B[h, -14])
Y_test <- B[h, 14]

# determine the best sigma for the grnn by a grid search
# (tuned directly on the test set, just as xgboost below uses
#  the test set for early stopping)
sigmas <- seq(0.1, 3, by = 0.01)
result <- sapply(sigmas, function(s){
  Y_hat <- My_grnn(Ytrain, Xtrain, Xtest, sigma = s)
  Metrics::rmse(Y_hat, Y_test)
})
best <- sigmas[which.min(result)]
pred_grnn <- My_grnn(Ytrain, Xtrain, Xtest, sigma = best)

require(xgboost)
param <- list(
  eta = 0.005,
  subsample = 0.95,
  max_depth = 6,
  min_child_weight = 1,
  colsample_bytree = 0.7,
  nthread = 3   # xgboost expects 'nthread', not 'nthreads'
)

dmodel <- xgb.DMatrix(as.matrix(Xtrain), label = Ytrain, missing=NA)
dvalid <- xgb.DMatrix(as.matrix(Xtest), label = Y_test, missing=NA)

model <- xgb.train(
  data = dmodel,
  nrounds = 10000,
  params = param,
  watchlist = list(val = dvalid),
  early_stopping_rounds = 1   # stop at the first round without improvement
)
pred_xgb <- predict(model, dvalid, ntreelimit = model$best_ntreelimit)
## compare predictions with root mean squared error
Metrics::rmse(pred_grnn, Y_test)
Metrics::rmse(pred_xgb, Y_test)
## [1] 6.086455
## [1] 2.993994
## yes -- grnn is very bad, a factor of 2 worse than xgboost
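
One caveat of my own: the factor of two above comes from a single random split. As a rough sketch (not part of the original comparison), one can average the RMSEs over a few splits; the helper below reuses B, param, and My_grnn from above, with illustrative, untuned settings.

# sketch: repeat the grnn vs. xgboost comparison over several splits
compare_once <- function(seed){
  set.seed(seed)
  h <- sample(nrow(B), 80)
  Xtr <- as.matrix(B[-h, -14]); Ytr <- B[-h, 14]
  Xte <- as.matrix(B[h, -14]);  Yte <- B[h, 14]
  # grnn: best sigma on a coarser grid
  rmse_grnn <- min(sapply(seq(0.1, 3, by = 0.05), function(s)
    Metrics::rmse(My_grnn(Ytr, Xtr, Xte, sigma = s), Yte)))
  # xgboost with the same parameters, more patient early stopping
  dtr <- xgb.DMatrix(Xtr, label = Ytr)
  dte <- xgb.DMatrix(Xte, label = Yte)
  m <- xgb.train(data = dtr, params = param, nrounds = 2000,
                 watchlist = list(val = dte),
                 early_stopping_rounds = 10, verbose = 0)
  c(grnn = rmse_grnn, xgb = Metrics::rmse(predict(m, dte), Yte))
}
rowMeans(sapply(1:5, compare_once))   # mean RMSE per method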

Notes

[1] J. Abbot & J. Marohasy, “The application of machine learning for evaluating anthropogenic versus natural climate change”, GeoResJ, December 2017, https://doi.org/10.1016/j.grj.2017.08.001

[2] For fundamental criticism see, for instance, https://andthentheresphysics.wordpress.com/2017/08/22/machine-unlearning/
