Not all numbers are mean

In [1] D. Stapel mentioned a test with 32 students with two attributes: secure / insecure and chose meat / chose vegetarian dish. He gave the following table:

60% || 40%
-----------
20% || 80%

The number theorist in me quickly noticed something funny: these percentages are not possible. For integers n1 + n2 = 32, 60% * n1 and 20% * n2 are never close enough to integers at the same time. I pointed this out to the authors, but got no reply. A week later Stapel’s fraud was published.
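
A brute-force check makes this claim concrete. The sketch below assumes that the table reports row percentages rounded to whole percents, and that the two rows correspond to groups of sizes n1 and n2 = 32 - n1. The helper name can_round_to is made up for this illustration.

```r
# Can some count out of n be reported as the whole percentage p?
can_round_to <- function(p, n) any(round(100 * (0:n) / n) == p)

total <- 32
n1 <- 1:(total - 1)   # candidate sizes of the first group

# A split is consistent if 60% is attainable in group 1
# and 20% is attainable in group 2 (assumed row percentages).
ok <- sapply(n1, function(a) can_round_to(60, a) && can_round_to(20, total - a))

n1[ok]   # group sizes compatible with both percentages
```

Under these assumptions no split of the 32 students survives the check.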

This is a simple example of a much more general phenomenon for discrete distributions: not all values are possible for the mean, the standard deviation, etc. I will give some examples using R code.

Take N samples with replacement from a probability distribution on the numbers 1 to k. The sum of these samples ranges from N to k*N, so there are 1+(k-1)*N possible sums. The sample mean, on the other hand, is a number between 1 and k, and written with 2 decimals it could take 1+(k-1)*100 different values. For N < 100 only about N% of these 2-decimal numbers can actually occur: for an integer n < k, precisely N of the possible means lie in the interval [n, n+1). The following test checks whether a mean m is attainable with N samples by rounding m*N to the nearest integer sum and converting back:

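> m <- 4.13; N <- 16
> # nearest attainable mean: round the implied sum to an integer, then divide by N
> round(round(m*N, 0)/N, 2)
[1] 4.12

The test returns 4.12 rather than 4.13, so under R's rounding a mean of 4.13 does not arise from 16 integer scores.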
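
The counting argument can also be checked by listing every possible mean. This is only a sketch for the case N = 16 and k = 7; the variable names are chosen for this illustration.

```r
N <- 16; k <- 7

sums  <- N:(k * N)   # every possible total score
means <- sums / N    # every possible sample mean

n_sums   <- length(sums)                  # should equal 1 + (k - 1) * N
n_in_4_5 <- sum(means >= 4 & means < 5)   # should equal N

# Is a reported mean among the attainable ones (to 2 decimals)?
# Compare with a tolerance instead of ==, since these are floating-point values.
reported <- 4.13
hit <- any(abs(round(means, 2) - reported) < 1e-8)
```

Here n_sums and n_in_4_5 give the two counts from the argument above, and hit tells whether the reported mean appears in the list.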

Let’s try the situation of 16 students that score some items on a scale of 1 to 7 (a Likert scale, very popular in circles of social psychology).

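# draw N scores from 1..k (uniform by default; extra arguments such as prob go to sample)
v <- function(N, k, ...){sample(1:k, N, replace = TRUE, ...)}
# distinct sample means seen in 10000 simulated groups of 16 students
u <- unique(sapply(1:10000, function(i){mean(v(16,7))}))
round(sort(u), 2)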

This gives a row of numbers starting with 2.31 2.38 2.44 2.50 2.56 2.62 2.69 2.75 2.81 2.88 2.94 3.00 or similar, depending on the random draws. These are the means that can actually occur: all of them are multiples of 1/16, rounded to 2 decimals. Using the test above we can list, for every 2-decimal mean, the small N’s that can produce it:

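> b <- list()   # b[[i]] will hold the N <= 100 consistent with a mean of i/100
> b[100:700] <- lapply(100:700, function(i){m <- round(i/100, 2); (1:100)[m == round(round(m*(1:100), 0)/(1:100), 2)]})
> fN <- function(m){b[round(100*m)]}   # round() guards against 100*m falling just below an integer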
> fN(4.13)
[[1]]
[1] 15 23 30 31 38 39 45 46 47 52 53 54 55 60 61
[16] 62 63 67 68 69 70 71 75 76 77 78 79 82 83 84
[31] 85 86 87 89 90 91 92 93 94 95 97 98 99 100
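
The same list can be computed for a single mean without building the lookup table first. The function below is a sketch, not part of the original code: the name candidate_N and the default N_max = 100 are assumptions made for this illustration, and the comparison uses a small tolerance instead of ==.

```r
# All sample sizes N <= N_max for which the 2-decimal mean m is attainable
candidate_N <- function(m, N_max = 100) {
  N <- 1:N_max
  nearest <- round(round(m * N) / N, 2)   # closest attainable mean for each N
  N[abs(nearest - m) < 1e-8]
}

head(candidate_N(4.13))
```

For m = 4.13 this should reproduce the start of the list above.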

— o —

[1] In 2011 R. Vonk circulated to the press an unpublished note (in Dutch) by D. Stapel et al. about “selfish” meat eaters; it contained faked numbers.
