In a recent article ([1], [2]) a type of neural net called a Generalized Regression Neural Network was used. What is it? It turns out to be no more than a simple smoothing algorithm. In the code below, my variables that start with a *capital letter* are vectors or matrices. I can write the whole “GRNN” in three lines of R code:

My_grnn <- function(Y, Xtrain, Xtest, sigma, m = nrow(Xtrain)) {
  D <- as.matrix(dist(rbind(Xtrain, Xtest)))[1:m, -(1:m)]
  W <- exp(-D^2 / (2 * sigma^2))
  return(Y %*% W / colSums(W))
}

Here *Xtrain* are the observations with known results *Y*, and *Xtest* are the observations for which we want to return predicted values. In this simple algorithm *D* holds the Euclidean distances between *Xtrain* and *Xtest*, *W* the Gaussian weights derived from those distances, and the predictions are simply the weighted averages of *Y*.
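To see that this is plain kernel smoothing at work, here is a small sanity check on synthetic data (the sine curve, the seed and the sigma value are my choices, for illustration only). With a moderate sigma each prediction lands close to the true curve, mildly shrunk by the smoothing:

```r
# Toy check of My_grnn (defined above) on a 1-D sine curve.
set.seed(1)
Xtr <- matrix(seq(0, 10, length.out = 50), ncol = 1)  # 50 training points
Ytr <- sin(Xtr[, 1])                                  # known results
Xte <- matrix(c(2.5, 7.5), ncol = 1)                  # 2 query points
pred <- My_grnn(Ytr, Xtr, Xte, sigma = 0.3)
# kernel-weighted averages, so close to sin(2.5) and sin(7.5)
round(as.vector(pred), 3)
```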

As an algorithm this does not deserve the name neural network. It also performs poorly as a machine learning algorithm compared to other generalized regressions. Compare it to the workhorse xgboost on the Boston Housing data.

# ------------- real example
require(MASS)
data(Boston)
B <- scale(Boston, center = TRUE, scale = TRUE)
B[, 14] <- Boston[, 14]
# the medv column holds the median value of owner-occupied homes in $1000s
# 506 rows; let's hold out 80 to predict
set.seed(17 * 41)
h <- sample(nrow(Boston), 80)
Xtrain <- as.matrix(B[-h, -14])
Ytrain <- B[-h, 14]
Xtest <- as.matrix(B[h, -14])
Y_test <- B[h, 14]
# determine the best sigma for the grnn
range <- seq(0.1, 3, by = 0.01)
result <- sapply(range, function(s) {
  Y_hat <- My_grnn(Ytrain, Xtrain, Xtest, sigma = s)
  Metrics::rmse(Y_hat, Y_test)
})
best <- range[which.min(result)]
pred_grnn <- My_grnn(Ytrain, Xtrain, Xtest, sigma = best)
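A side note: sigma is selected above on the same 80 held-out rows that the final RMSE comparison uses, which if anything flatters the grnn. A cleaner selection would touch only the training rows, for instance with an inner validation split. A sketch (the inner split of 60 rows and the plain-R rmse are my choices, not from the code above):

```r
# Sketch: tune sigma on an inner split of the training rows only,
# so the 80 test rows play no role in the tuning.
set.seed(1)
inner <- sample(nrow(Xtrain), 60)
sigmas <- seq(0.1, 3, by = 0.01)
inner_rmse <- sapply(sigmas, function(s) {
  Y_hat <- My_grnn(Ytrain[-inner], Xtrain[-inner, ],
                   Xtrain[inner, , drop = FALSE], sigma = s)
  sqrt(mean((Y_hat - Ytrain[inner])^2))   # rmse without extra packages
})
best_inner <- sigmas[which.min(inner_rmse)]
```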
require(xgboost)
param <- list(
  eta = 0.005,
  subsample = 0.95,
  max_depth = 6,
  min_child_weight = 1,
  colsample_bytree = 0.7,
  nthread = 3
)
dmodel <- xgb.DMatrix(as.matrix(Xtrain), label = Ytrain, missing = NA)
dvalid <- xgb.DMatrix(as.matrix(Xtest), label = Y_test, missing = NA)
model <- xgb.train(
  data = dmodel,
  nrounds = 10000,
  params = param,
  watchlist = list(val = dvalid),
  early_stopping_rounds = 1
)
pred_xgb <- predict(model, dvalid, ntree_limit = model$best_iteration)
## compare predictions with root mean squared error
Metrics::rmse(pred_grnn, Y_test)
Metrics::rmse(pred_xgb, Y_test)
## [1] 6.086455
## [1] 2.993994
## yes -- grnn is very bad, a factor of 2 worse than xgboost

**Notes**

[1] J. Abbot & J. Marohasy, “The application of machine learning for evaluating anthropogenic versus natural climate change”, GeoResJ, December 2017, https://doi.org/10.1016/j.grj.2017.08.001

[2] For fundamental criticism see e.g. https://andthentheresphysics.wordpress.com/2017/08/22/machine-unlearning/