Why validation in Principal Components Analysis?

I have written an analysis and replication of the following figure in M&M05:


It is known, in some circles, that this figure is misleading. What I have not yet seen pointed out is that the figure should have looked like this:


What I also have not seen is that, had M&M done a proper validation of the choice of principal components, they would also have found a hockey stick in their version of the scaling of the data:


The full report is here (warning, PDF):

Draft: Why validation in Principle Components Analysis?

Reactions that are directly related to the statistics and R code in this draft are welcome. If you have something else to say, please wait until after the first revision.

[M&M05a]: McIntyre, S. and McKitrick, R.: 2005, ‘Hockey sticks, principal components, and spurious significance’, Geophys. Res. Lett. 32, L03710

Posted in Uncategorized | Leave a comment

Where was that “hiatus”?

Some people still believe that the global temperature somehow has not increased, or has increased less than expected, in the last 18 years. Nothing is further from the truth: taking the average of the four most commonly used surface temperature series (GISS, NCEI, Hadcrut, JMA) until mid-2015, we get the following picture:


Yes, over the last 18 years the linear increase was smaller than the increase in the 30 years before, but that does not mean the overall rise in the global temperature anomaly was smaller: there is also a jump between the two periods that needs to be taken into account. If we do that, the total increase is very consistently 0.14 degrees per decade since 1967.
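To make the role of the jump concrete, here is a minimal sketch on synthetic data (not the temperature series used in the picture). It assumes a hypothetical series with 30 years of steady rise, a level jump, and 18 years of slower rise, and fits a separate line to each period. All numbers and names (`anom`, `late`, `total_per_decade`) are made up for illustration.

```r
# Synthetic monthly anomalies (NOT real data): 30 years of rise,
# a level jump, then 18 years with a flatter slope.
set.seed(1)
n      <- 12 * 48                       # 48 years of months
decade <- (seq_len(n) - 1) / 120        # time in decades since the start
late   <- decade >= 3                   # TRUE for the last 18 years
signal <- 0.15 * pmin(decade, 3) + 0.15 * late + 0.05 * pmax(decade - 3, 0)
anom   <- signal + rnorm(n, sd = 0.1)

# Separate intercept and slope for each period; the intercept change is the jump.
fit <- lm(anom ~ decade * late)

# Slope within the last 18 years only.
late_slope <- coef(fit)[["decade"]] + coef(fit)[["decade:lateTRUE"]]

# Total rise per decade over the whole period, jump included.
fitted_line      <- unname(fitted(fit))
total_per_decade <- (fitted_line[n] - fitted_line[1]) / (decade[n] - decade[1])

round(c(late_slope = late_slope, total_per_decade = total_per_decade), 3)
```

In this toy series the slope inside the last period is much smaller than the total rise per decade, simply because the late-period slope cannot see the jump that precedes it.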

Let’s zoom in on the upper right corner; it contains an extra message:

The median of all anomalies since mid-1997 lies 0.16 degrees above the chosen reference temperature (the average of 1986-2006), but what is astonishing is that half of the months in the last 18 years have been warmer than any month before (with two exceptions).


Takens Theorem

A famous dynamical system is the Lorenz butterfly. It is a curve that moves in time through the (X,Y,Z)-space and that is attracted by a strange attractor.

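# Discrete (forward Euler) simulation of the Lorenz system
N  <- 10000   # number of steps: "some large number" (value assumed here)
dt <- 0.01    # time step (value assumed here; not given in the original)

s <- 10; r <- 28; b <- 8/3    # Lorenz parameters
x <- 1; y <- -1; z <- 10      # initial state

X <- rep(x, N); Y <- rep(y, N); Z <- rep(z, N)
for (t in 2:N) {
  x <- X[t-1]; y <- Y[t-1]; z <- Z[t-1]
  dx <- -s*x + s*y
  dy <- r*x - y - x*z
  dz <- -b*z + x*y
  X[t] <- x + dx*dt; Y[t] <- y + dy*dt; Z[t] <- z + dz*dt
}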

This can be visualized quite easily:

lorenz002 from MrOoijer on Vimeo.

This seems rather chaotic, but there is order in the chaos. There is a theorem by Floris Takens from 1981 that says we can reconstruct the dynamical system from the X-values alone. Choose a delay parameter tau, and the curve (X(t), X(t-tau), X(t-2*tau)) is a kind of projection of the original system. For instance, for tau=25 it looks like this:


So let’s assume we have a time series with observations from a system that might be approximated by some dynamical system. Using Takens' theorem we can try to understand some of the characteristics of the underlying dynamical system. Or, if we have observed more than one variable, this analysis might help us understand the relationship between the variables.
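The delay-coordinate construction (X(t), X(t-tau), X(t-2*tau)) is easy to write down for any series. The following is a minimal sketch; the helper name `delay_embed` and its arguments are my own for this illustration, not part of any package.

```r
# Build delay-coordinate vectors (x(t), x(t - tau), ..., x(t - (m - 1) * tau)).
# Returns a matrix with one row per usable time point and m columns.
delay_embed <- function(x, tau, m = 3) {
  start <- (m - 1) * tau + 1                 # first t for which all lags exist
  stopifnot(tau >= 1, m >= 1, length(x) >= start)
  idx <- start:length(x)
  do.call(cbind, lapply(0:(m - 1), function(k) x[idx - k * tau]))
}

# Tiny example on the series 1, 2, ..., 10 with tau = 2.
E <- delay_embed(1:10, tau = 2, m = 3)
E[1, ]    # x(5), x(3), x(1)
dim(E)    # six usable time points, three coordinates
```

Applied to the X series of the Lorenz simulation, `delay_embed(X, tau = 25)` gives the three coordinates of the reconstructed curve, which can then be plotted pairwise, for example with `plot(E[, 1], E[, 2], type = "l")`.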

There are technical difficulties. First, Takens' theorem is about smooth differential equations, whereas in the real world we observe discrete points in time, i.e. we have difference equations. Second, we have a lot of noise in the observations, so we need some sort of stochastic version of the theorem.

Fortunately most of these points have been addressed in the mathematical literature since 1981. See part II.


18 years hiatus?

Regression is:

Anomaly= a+ b*(months since 1-1-1997)/120 + 
         c*ENSO(lag=3) + d*TSI(ssp-proxy with lag=3) 
         + e*volcanic_aerosols(lag=6)
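A regression of this shape can be sketched with `lm` as follows. This is only an illustration on synthetic data: the regressors `enso`, `tsi` and `aerosols` are made-up stand-ins, the coefficients are arbitrary, and the helper `lag_by` is a name introduced here. It is not the code behind the applet.

```r
set.seed(42)
n      <- 240                                   # 20 years of synthetic months
months <- seq_len(n) - 1

# Synthetic stand-ins for the regressors (NOT real ENSO, TSI or aerosol data).
enso     <- as.numeric(arima.sim(list(ar = 0.9), n = n))
tsi      <- sin(2 * pi * months / 132)          # rough 11-year cycle
aerosols <- numeric(n); aerosols[100:130] <- exp(-(0:30) / 10)

# Shift a series k months: lag_by(x, k)[t] equals x[t - k].
lag_by <- function(x, k) c(rep(NA, k), head(x, -k))

d <- data.frame(
  decades = months / 120,                       # months divided by 120
  enso3   = lag_by(enso, 3),
  tsi3    = lag_by(tsi, 3),
  aer6    = lag_by(aerosols, 6)
)
d$anomaly <- 0.1 + 0.15 * d$decades + 0.1 * d$enso3 + 0.05 * d$tsi3 -
  0.3 * d$aer6 + rnorm(n, sd = 0.05)

# Rows with NA (the first six months, which lack lagged values) are dropped by lm.
fit <- lm(anomaly ~ decades + enso3 + tsi3 + aer6, data = d)
round(coef(fit), 3)
```

Dividing the month count by 120 means the coefficient on `decades` (b in the formula above) reads directly as a trend per decade.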

The confidence intervals are the standard CIs corrected for autocorrelation. Notice the larger uncertainty in the coefficients for the satellite data, caused by higher autocorrelation and bigger swings in the data.
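The exact correction is not spelled out here. One common approach, shown below as an assumption and not necessarily the one used for these figures, treats the residuals as an AR(1) process and widens the standard errors by sqrt((1 + r1) / (1 - r1)), where r1 is the lag-1 autocorrelation of the residuals. The function name `ar1_corrected_ci` and the synthetic data are made up for this sketch.

```r
# Confidence intervals for lm coefficients, widened for AR(1) residuals.
# Assumption: an effective-sample-size correction, n_eff = n * (1 - r1) / (1 + r1).
ar1_corrected_ci <- function(fit, level = 0.95) {
  res <- residuals(fit)
  n   <- length(res)
  r1  <- acf(res, lag.max = 1, plot = FALSE)$acf[2]   # lag-1 autocorrelation
  r1  <- max(r1, 0)                                   # never narrow the interval
  est <- coef(fit)
  se  <- sqrt(diag(vcov(fit))) * sqrt((1 + r1) / (1 - r1))
  n_eff <- n * (1 - r1) / (1 + r1)
  q   <- qt(1 - (1 - level) / 2, df = max(n_eff - length(est), 1))
  cbind(estimate = est, lower = est - q * se, upper = est + q * se)
}

# Synthetic example (NOT real data): a trend plus autocorrelated noise.
set.seed(7)
n       <- 240
decades <- (seq_len(n) - 1) / 120
noise   <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.1))
anomaly <- 0.15 * decades + noise

fit       <- lm(anomaly ~ decades)
naive     <- confint(fit)            # standard intervals, assume independent errors
corrected <- ar1_corrected_ci(fit)   # wider when residuals are autocorrelated
```

The higher the autocorrelation in the residuals, the fewer effectively independent observations there are, and the wider the corrected interval becomes compared with `confint(fit)`.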

The preRS is similar to the normal RS but penalizes variates that do not contribute enough to the regression.

Pictures and processing from https://mrooijer.shinyapps.io/graphic/











Global Temperature Applet

I made an applet in R with Shiny: explore the global temperature trends and the various factors that influence the variations.  Click the picture below to go to the applet.


This is a work in progress. Comments, questions, etc. can be given in the reply area for this blog item.

Posted in Statistics | Leave a comment

It has not stopped

[Figures: Cowtan & Way, Hadcrut4, JMA, NASA-GISS, NOAA]

And finally the series until end 2013 from Berkeley:




GISS Global temperatures: reporting history

GISS Global Sea and Land Surface Temperatures as they were reported in 1981, 1995, 2003 and today. The lines are 5-year averages or, more accurately, 60-month averages.



Baselines: for 1981 the baseline is the period 1880-1980. For the others it is 1951-1980. These can be found in the documents.

Conclusion: this diagram shows that Steven Goddard erred here.

The “hiatus”


In this picture we have plotted the 60-month moving average against a background of the monthly averages.

If we want to view this moving average as an indication of the current anomaly, then we have to move it to the right, otherwise the moving averages of the “current anomaly” would contain future values.
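The difference between the two placements can be sketched with `stats::filter` on a synthetic series. The helper names `trailing_mean` and `centered_mean` are introduced here for illustration only.

```r
# Moving averages over k months, placed in two different ways.
# sides = 1: the value at month t averages months t - k + 1 ... t (no future values).
# sides = 2: the window is centred on t, so about half of it lies in the future.
trailing_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
centered_mean <- function(x, k) as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))

set.seed(3)
x  <- cumsum(rnorm(240, sd = 0.05))   # synthetic series (NOT real data)
k  <- 60
tr <- trailing_mean(x, k)
ce <- centered_mean(x, k)

# The trailing average is the centred one moved k/2 months to the right.
shift <- k %/% 2
same  <- isTRUE(all.equal(tr[k:length(x)], ce[(k - shift):(length(x) - shift)]))
same
```

So the two curves are identical apart from a shift of 30 months; only the trailing version can be read as "the anomaly as known at that moment".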

Normally the hiatus is illustrated by drawing a straight line from around 1997 and, as this line is almost horizontal, it is used as proof that the warming has stalled. However, it is not so simple. In 1997-1998 the global temperatures suddenly rose to new levels, and from that new level the temperatures indeed stayed more or less the same(*).

Look at the dots in the upper right corner. These higher values are all of a sudden common after 1998, whereas before 1997 there are none of them except for a couple of outliers. After 1998 more than 50% of all months were warmer than any month before (except those 2 outliers).
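A count like this is simple to reproduce for any anomaly series. The sketch below uses a made-up function name, `share_above_earlier`, and a tiny invented vector instead of the real data; `n_outliers` is the number of earlier extreme months to set aside.

```r
# Share of months from `cutoff` onward that are warmer than every earlier month,
# after setting aside the `n_outliers` warmest earlier months.
share_above_earlier <- function(anom, cutoff, n_outliers = 0) {
  before    <- sort(anom[seq_len(cutoff - 1)], decreasing = TRUE)
  after     <- anom[cutoff:length(anom)]
  threshold <- before[n_outliers + 1]   # warmest earlier month that is not set aside
  mean(after > threshold)
}

# Invented toy values (NOT real anomalies); the 0.9 plays the role of an outlier.
anom <- c(0.1, 0.9, 0.2, 0.3, 0.5, 0.4, 0.25, 0.6)
share_above_earlier(anom, cutoff = 5, n_outliers = 1)
share_above_earlier(anom, cutoff = 5, n_outliers = 0)
```

With the outlier set aside, three of the four later values exceed everything before; with it included, none do, which shows how much the statement depends on treating those two months as exceptions.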

What we witness is a steep increase of 0.18 degrees C in 5 years' time, followed by a stationary period. You cannot call that a stagnation IMHO. The next blog item gives a more detailed analysis.

(*) Using only values after 1997 in an autocorrelated series is a statistical blunder.
